My Digital Papers System
January 11, 2021
If they don’t got books, don’t fuck em
Long time ago we valued books. Every house would have at least a bible, a few books on a shelf. Manors had nooks. Castles had full on libraries, with maybe a monk chained tween the stacks. We weren’t so drenched in a torrent of meaningless data, so I suspect we had time to think, and with thinking comes organization, and maybe even some appreciation of the hard earned work it takes to accumulate knowledge, of the value of long-records-made-useful. When the mark of the wilderness was to swallow all events into the murky dark, empty burned out farmhouses at the edge of the woods with no one to repeat their stories or tell what happened, to value the torch of civilization was to value writing.
I made it a goal leaving college to keep my books, scattered around my shelves and also various friends’ and acquaintances’ shelves as good books travel. My other goal was to keep some equivalent of my pile of books in the digital world. Used-to-be I’d keep a folded-up sheet of paper in my back pocket with a rotating scribble of things, to do, to look up, to read, to watch, to recommend. Then there were Field Notes. You stumble into the fetishization of things and stuff when you get into Field Notes. Fountain pens. Bookmarks. Book bindings. Not a bad fetish as far as things go, certainly better in my estimation than the one girl who liked strings of drilled teeth. I can understand the love people invest in a piece of paper and a quick scribble as they are oh-so-convenient.
My digital notes were just a folder. I’d never come back to them was the problem. They were all so hard to read and most of them disintegrated, as hard drive replacements and upgrades conspired with entropy to gnaw on em. I tolerated that for some period of time until I noticed that I would learn more in some weeks than I could ever use, and worried that a few years on, when I would eventually need something I’d found on the big ol wide world of internet reading, seminars and conferences and conversations in alleys behind bars with queer strangers, I wouldn’t have it anymore. I also have to admit that the up-front cost of the computers I wanted to invest in, Network Attached Storage (some 800$ abouts), put me off. The cobbler’s kids have no shoes, and working on the computers made me neglect my own systems for years.
“the universe is made of stories, not of atoms”
Impetus
I drove up along the lake to see a friend, a fellow of similar madness but more of a fine weave disposition than my own matted haberdashery if that makes any sense. I got some mad respect for his ability to gitrdoin and done. He was living on his boat at the time, anchored out of an industrial looking harbor on the edge of a rougher spot of town. Gates and warehouses and millions of joydollars of mostly working boats lined a way back to an adorable vine covered lake cottage, vacant because I assume the zoning only permits storybook characters to live whimsically these days.
I’d only had GPS coordinates to a pin in the middle of the harbor for directions, which I assumed was a mistake until after a few text messages and awkward milling bout, he rolls up in a zodiac boat. Until the goodwife started watching Sailing Uma I hadn’t realized that you could live on a boat, mostly free of the world. This has kept my friend as a bright spot of light in my memory, fighting the storybook characters for the landgrab.
After trading some very nice LiFePo batteries out of a rusted tool van along the side of the way, we got chatting about all the bits and pieces and materials that go into any decent project. And the odd communities we follow that trade in such knowledge. My notes here say I learned about G-10, a lightweight very tough fiberglass composite then, something he uses to fix rotting wood parts of his boat. I haven’t had a need for it yet, but it’s there waiting, in my notes.
He said:
“You know wouldn’t it be great, if all of these people who delve real deep into something and have all these notebooks and ways of looking at things, could store them on some kind of ‘life drive’ that could be shared around, like checking out a dedicated library for a human, so you could go through and learn in context of what the person was doing.”
I firmly believe all genuine knowledge originates from direct experience, something hillbillies and Mao can agree upon. Our tooling progresses as a list of things that didn’t work, and the only way you build that list is by trying and failing. I told my friend I had something of a digital library, and that I would send him the notes on it whenever I got around to writing it up. This got me thinking about a library of notes, manuals, etc, filtered through one’s experience is the meta-tool behind all tools. And while I don’t have the whole “life drive” publishing idea figured out, maybe posting about what I’ve gotten figured out will be helpful to someone as we move our libraries from paper onto the infernal thinking-machines.
The System
Summary: I use the Joplin note taking application to record notes, pictures and attachments (like PDF manuals). These are immediately synchronized to all full sized computers via Syncthing, exposed over the WebDAV protocol to lightweight mobile clients by a central home server that participates in Syncthing, and backed up incrementally to an off-site server.
If that seems like a lot, it is, but hopefully you come away from this article understanding a few patterns that will outlast any of this specific software.
Namely:
- Use text files and simple formats.
- Interface matters, using your notes needs to be an effortless workflow.
- Digital storage is only reliable by being constantly copied.
- Do not trust tech companies. You don’t need to depend on centralized servers, though they can make things simpler.
- Synchronization is not backup.
Computing is extremely young (70 years maybe?), so I should mention that whatever you do, the system will take more maintenance than a library of books. It will need to be migrated every so often, both between data formats and between mediums. The goal of my system is to make that migration take much less energy than the utility it provides me. I’d like to get that maintenance work down to a few hours every decade. Sadly it’s been more like 8 hours every 2 years or so, but at least the gap between migrations has gotten longer as time goes on.
Data format (avoiding artificial landmines)
You ever notice there’s not a bunch of publishing nerds changing around the format of books every quarter in a need to prove their own existence? A book is made of paper, contains text in some language, printed with ink. It’s meant to be read by humans. Maybe clever dogs. Not so for software. Software attracts investors like the Oak Island treasure hole, with billions of dollars of VC’s money thrown into an immature, fashion-chasing engineering culture hunting for that next virgin industry to sacrifice on the altar of “digital transformation.” Change is made for the sake of padding resumes and getting raises. Very little software advances “the state of the art”, even less improves upon its non-digital predecessor, and even less makes it past the filter of “can I extract rent from it.”
I would not use a proprietary commercial format for my papers for the same reason I don’t buy my clothes based on the latest GQ.
Not to harp on adversarial capitalism, but it gets worse when you consider the anti-competitive behavior from companies that don’t even bother hunting “innovation”, but instead intentionally break things to gain and maintain control of their revenue stream: you. Don’t let someone gatekeep your data. Don’t stick it in Microsoft Word, because Microsoft has an incentive to break your shit so you keep paying for their new stuff. Evernote has the same incentive.
Keeping the problem train rolling, there’s the issue that the bigger (read: monopolistic) these companies are, the more of a target they are for someone to hack and mine that data. Or if you want to join me down in the well of rebellion, that a democratically free society must be “secure in their persons, houses, papers, and effects, against unreasonable searches and seizures”, meaning I’m not going to use a SaaS provided data format. I would not send all my personal papers and effects to companies which Edward Snowden showed are having all of their data slurped into a datacenter in Utah for perusal by whichever political group is in power right now. What’s fine today is illegal tomorrow (and parallelly constructed).
This leaves us with the many open source formats, which are created as good-hearted or practical patronage from tangential businesses, as marketing for developers looking to build their “personal brand”, and from naive ideologues who have been unable to come up with a better business model than begging. While I am most sympathetic to this group, depending on their software in many cases and paying the patreons of those I would like to succeed, it remains on shaky foundations.
So the system I chose and the one I’m going to suggest here to everyone, is that when at all possible:
USE TEXT FILES.
The venerable text file is old, easily 50 years at this point. Text files can be archived. Text files can be read by damn near everything. A large text file can be partially read off of a broken, nearly destroyed medium and somewhat recreated. Text files will survive. You can create formatting and notations on the fly with scripts, or even just by tapping your keyboard a bunch. And you don’t have to ask a programmer, app developer or operating system for permission or support. Text files are already supported. Text files can often be printed out easily on paper with a cheap, reliable laser printer and last even longer. There’s even a cult behind them, check out textfiles.com, from Jason Scott, the software fellow at archive.org, a group who fights to preserve the records of this point in history from being lost forever.
Interface (it’s important)
So with text files in mind, there are lots of options when it comes to editing them. I use Joplin as my text file editor and organizer, and vim in other cases, which Joplin supports as an “external editor.” Ignoring the madness of the javascript community, it’s not bad-to-use software. The main thing that’s important for me is that it runs a version on every desktop and mobile device I have. If you wonder about formatting text files, Joplin supports an ever growing set of conventions around Markdown that annotate your text in a semi-standard way, heavily adopted by programmers everywhere. Joplin stores everything as text files with YAML-style footers for its custom structured data, which makes a super easy-to-read, two part file. If Joplin disappeared tomorrow, the data in those files would be searchable and recoverable. Which “Notebook” a note resides in and which “Tags” are applied are just a few extra fields on the text file. And this is a pattern I’m expecting to continue, as people continue to think about minimized data notations, like generating charts from text with mermaid.js.
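To make the two part file idea concrete, here is a minimal sketch in Python that splits a note into its title, body and trailing metadata. It assumes the layout described above: a title line, the body, then a final block of `key: value` lines after the last blank line. The sample note and its field names are made up for illustration, and the exact layout Joplin writes may differ between versions, so treat this as a sketch and not a parser for the real thing.

```python
def split_note(text):
    """Split a note into (title, body, metadata dict).

    Assumes the metadata is the last blank-line-separated block and that
    every line in it looks like `key: value`. If the last block does not
    look like that, the whole file is treated as title plus body.
    """
    stripped = text.rstrip("\n")
    head, _, tail = stripped.rpartition("\n\n")
    metadata = {}
    for line in tail.splitlines():
        key, colon, value = line.partition(":")
        if not colon:
            # Not a metadata block after all: keep everything as content.
            head, metadata = stripped, {}
            break
        metadata[key.strip()] = value.strip()
    title, _, body = head.partition("\n")
    return title.strip(), body.strip(), metadata


# Hypothetical sample note, written by hand for this example.
SAMPLE = """G-10 fiberglass

Lightweight, very tough composite.
Good for replacing rotted wood on a boat.

id: 0123abcd
parent_id: 4567ef00
type_: 1"""

if __name__ == "__main__":
    title, body, meta = split_note(SAMPLE)
    print(title)
    print(meta)
```

The point is less the code than how little of it there is: a dozen lines of any scripting language can get the notes back out, which is the whole argument for text files. One known weakness of this sketch is that a note whose final paragraph happens to be all `key: value` lines would be mistaken for metadata.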
But what about pictures? Stuff I get from other people? When considering other data formats, apply the Lindy Effect, which says that a technology’s remaining lifespan is roughly proportional to its current age. JPG was invented in 1992, PDF in 1993. You can probably expect both to be around till 2050, but as those formats become increasingly complex, parts of them could break. Video is awful, good luck. In the case of my manuals I can download them, attach them to a note with a text title and they’ll be there, ready to search later even if the PDF format itself is huffing glue.
Something to note on that front, all attachments in Joplin are created as textfiles with the hash (fingerprint) of the attachment and its filetype, and then stored raw along with the textfiles. If I needed to move all the manuals I keep there to a better set of file names for instance, it would take some scripting but would be easy to do.
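As a rough idea of what that scripting could look like, here is a sketch in Python that reads the small metadata text files and works out a friendlier file name for each raw attachment. Everything about the layout is an assumption for illustration: a folder of `.md` metadata files containing `id`, `title` and `file_extension` fields, and a second folder holding the raw files named by that id. Check what your own copy of the data actually looks like before trusting any of it. The function only builds a plan and moves nothing.

```python
from pathlib import Path


def read_fields(path):
    """Collect `key: value` lines from one metadata text file."""
    fields = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        key, colon, value = line.partition(":")
        if colon:
            fields[key.strip()] = value.strip()
    return fields


def plan_renames(meta_dir, raw_dir):
    """Return (source_path, new_name) pairs; no files are touched."""
    plan = []
    for meta_file in sorted(Path(meta_dir).glob("*.md")):
        fields = read_fields(meta_file)
        item_id = fields.get("id", "")
        raw = Path(raw_dir) / item_id
        if not item_id or not raw.is_file():
            continue  # not an attachment record, or the raw file is missing
        ext = fields.get("file_extension", "")
        stem = Path(fields.get("title") or item_id).stem
        # Replace anything that is awkward in a file name with "_".
        safe = "".join(c if c.isalnum() or c in "-_ ." else "_" for c in stem)
        plan.append((raw, f"{safe}.{ext}" if ext else safe))
    return plan
```

Printing the plan and eyeballing it before copying anything is the cheap insurance here. Copy to a new folder instead of renaming in place so the original store stays intact.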
Synchronization (avoiding natural landmines)
Books have survived because they are simple. We have recovered scrolls from thousands of years ago. A wood crate of paper from a hundred years ago, kept dry and away from light, will probably be readable today. Meanwhile an “archive” grade CD (non organic dyes?) can have errors after less than 10 years. Those digital photos you have from pre-facebook times (2006?), yeah it’s probably good to check those. Hope you still have a CD drive nowadays. Anyone remember ZIP drives? Floppies? Oo how about all that magnetic tape. I hear commercially produced cassette tapes made in the 70s will finally be failing about now, though being analog their degradation will have been incremental. With digital it’s worse, you don’t get a scratchy noise filled audio track like analog. With a digital file you often either read all of it, or none of it.
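If you want to actually check those old files instead of just worrying about them, one common approach is to record a checksum for every file once, then compare against it later. Here is a minimal sketch in Python using only the standard library. The function names are my own, and where you keep the resulting manifest (a text file, a note, a printout) is up to you.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return names that are missing or whose contents have changed."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items()
        if current.get(name) != digest
    )
```

A changed checksum does not tell you which copy is the good one, only that they differ, which is one more reason to keep several copies around.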
I have a fair few computers. And I trust none of them. Enter: Syncthing. Syncthing is a tool that notices when you write locally to a hard drive and immediately ships the change to whichever other machines you have running Syncthing.
The main thing is that on full desktop/laptop machines, this process is super fast. As in sub 5 seconds. Often less than 2. There is no forgetting a backup job that runs every other day. Under the hood it uses a BitTorrent-like system, though for text files you’ll never notice. For large files, if two of your devices have a file and a third wakes up, it can download simultaneously from both. I can say it scales to at least the 50,000 files I have under shares right now, with roughly 200MB of memory used, though of course a good kernel will use whatever extra memory it can caching your files so don’t be surprised when some tools report it as using 1GB+ (it’s not).
So all computers have full copies of your files. Hard drive space per dollar has increased more in the last 20 years than anything save maybe display resolution. (Meanwhile healthcare and tuition have risen the most so take what you can get ya know.) Misewell use it to add a layer of resiliency to your digital world, so you’re prepared when you visit friends in other parts of the Midwest continually drained of any and all funding for rural broadband (across the last 3 administrations) and the cellphone signal goes out.
At some point I plan on making a “solar cycle aware” media caching computer in our travel trailer, where a server at home loads the last X newest downloads into a share that the camper computer will attempt to sync during midday when wifi is available. Why? Because.
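I haven’t built this yet, so what follows is only a sketch of the two decisions involved, written in Python: which files count as “the last X newest”, and whether it is currently midday. The function names and the 11:00 to 15:00 window are placeholders I picked for illustration, not measured values.

```python
from datetime import time
from pathlib import Path


def newest_files(folder, count):
    """Return the `count` most recently modified files in `folder`."""
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:count]


def in_sync_window(now, start=time(11, 0), end=time(15, 0)):
    """True when the clock time `now` falls inside the midday window."""
    return start <= now <= end
```

The home server would copy whatever `newest_files` returns into the shared folder, and the camper side would only start syncing while `in_sync_window` is true.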
So you are aware of what you need to decide up front, each machine you install SyncThing on will need:
- A name
- Each share you’d like to add to it, added via a link that shares the cryptographic parameters
- A place locally to store each share. Consider the relative size of each of your drives
- A decision on whether you want a one way sync or bi-directional
- A decision on how you’d like to deal with deletes, from not at all, to a simple trashcan, to multiple backup versions.
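It can help to write those decisions down before installing anything. The checklist above can be sketched as a small Python structure that flags anything you forgot to decide. To be clear, this is a planning aid with labels I made up (`two-way`, `trashcan` and so on). It is not Syncthing’s configuration format and it does not talk to Syncthing at all.

```python
from dataclasses import dataclass, field

# Labels invented for this sketch, mirroring the checklist above.
DIRECTIONS = {"two-way", "send-only", "receive-only"}
DELETE_POLICIES = {"none", "trashcan", "versions"}


@dataclass
class Share:
    label: str
    local_path: str
    direction: str = "two-way"
    on_delete: str = "trashcan"

    def problems(self):
        """List the decisions for this share that are missing or invalid."""
        found = []
        if self.direction not in DIRECTIONS:
            found.append(f"{self.label}: unknown direction {self.direction!r}")
        if self.on_delete not in DELETE_POLICIES:
            found.append(f"{self.label}: unknown delete policy {self.on_delete!r}")
        if not self.local_path:
            found.append(f"{self.label}: no local path chosen")
        return found


@dataclass
class Machine:
    name: str
    shares: list = field(default_factory=list)

    def problems(self):
        """List every undecided item for this machine and its shares."""
        found = [] if self.name else ["machine has no name"]
        for share in self.shares:
            found.extend(share.problems())
        return found


if __name__ == "__main__":
    laptop = Machine("laptop", [Share("notes", "/home/me/notes")])
    print(laptop.problems())
```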
In this way you can carve up your files into “shares”, spread them around your machines without any centralized server, and so long as one machine is awake your changes will travel from machine to machine automatically. I can work on a document at my desk, open Joplin on my phone and continue editing it on my way out the door. In a completely different workflow, this website is synced to the webserver hosting it via SyncThing.
Home Server
If you wanted to get started with your own notes system, you could stop reading right here. Most people don’t need a server. Find your computer-using friend who’s willing to maintain one, and use theirs. Encrypt your stuff and they can host it for you. Most computers sit idle most of the time, and when we consider the power draw used between their lowest and highest power state (a small gap on everything not intentionally designed for it), it’s usually more efficient to keep one server filled with work than 3 servers idling. This is actually the source of the use of WebDAV on phones with Joplin, as WebDAV on demand is much less energy hungry than a constantly running Syncthing.
While I would really love to live in a world of no servers, the unfortunate truth is if you want to minimize power usage (because our energy footprint is heating the planet and killing us and our descendants), it actually makes sense to have a separate “always-on” computer that is optimized to use the least amount of power possible. This means any machines like desktops with their power hungry GPUs are timed to shut down when a human is done with them, leaving only the little core up doing things.
Honestly it would be better if we had fewer computers entirely, as producing new shit and then transporting it all over the planet is the primary way energy is spent. With that said, more and more functions are run by computers in the home like lights, HVAC and locks, and if we’re going to continue on into the future and not just stagnate at a primitive level, our energy systems are going to need to get more complex to use more sources efficiently. That will require some type of computer, so the best I think we can do is keep them running for as long as possible between replacements.
So my server started out as a file server, but has further goals.
- Be able to browse my library all in one place. This means no unlabeled stack of CDs or SD cards I can’t access without manual work.
- Be a set of storage that I own, with multiple drives for backup as they will fail, and I’d like to keep it up and running while a new drive is on the way. I think paying in perpetuity for a cloud server is a kind of hostage situation where you never know if the guy holding the gun is planning on checking out to meet the glorious investor alien-angels in the sky to get his promised 72 IPO bux. https://ourincrediblejourney.tumblr.com
- Be kind to myself and my partner, make it easy to add things to it and move things. Using it should not be a chore.
- Progressively be able to do more tasks in the house.
My current solution is a Synology NAS, because while I am capable of managing all manner of complex software, I don’t want to at home. And I’d bet neither do you. Synology has a web interface that looks like a Windows desktop and you can mostly just click your way to what you want. It runs Docker so I’ve been able to add more and more things to it, like how it currently runs my lights via some terrible automation.
Making some pieces work has exposed some ugly things, like this deploy script for Let’s Encrypt certs, and the networking is screwy if you want to do complex isolation of containers, but overall it does pretty well.
Some notes:
- DS918+ was the cheapest 4 core drive NAS I could get. Synology Home NASes
- Most Synology home devices support “SHR” (Synology Hybrid RAID), which lets you add disks as you get them, growing the storage as needed.
- Dual NVMe SSDs (~100$) in the belly mean all 4 bays are open for slow, cheap 5400RPM drives because the SSDs are configured as a RAID1 read/write cache for great performance.
- This also runs my cameras via separately purchased, perpetual licenses for Surveillance Station, whose client software works well in HTML5/linux/android etc.
- The “+” notation and formerly “play” notation means a bit of extra horsepower for stuff like video transcoding. It can act as a Plex server, but I use plain ol NFS mounted via another machine running Kodi.
The DS415+ I had previously died due to an Intel chip bug of some kind. I found out after buying a replacement that it could be fixed via soldering a resistor to the right place on the board. Thus I’ve ended up with a second, which is nice as it keeps a snapshot of the main server on a progressive basis, specifically to prevent an accidental deletion event from synchronizing to everything and deleting all my files.
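This is the “synchronization is not backup” rule in practice: a delete syncs everywhere just as eagerly as an edit does. The Synology does its snapshots with its own tooling, but the idea can be sketched in a few lines of Python. This version makes a full dated copy each time and prunes the oldest ones, which is simpler (and hungrier for disk) than a true incremental snapshot. The function name and the `keep` default are my own choices for illustration.

```python
import shutil
from datetime import datetime
from pathlib import Path


def take_snapshot(source, snapshot_root, keep=7, stamp=None):
    """Copy `source` into a new dated folder, then prune old snapshots."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    snapshot_root = Path(snapshot_root)
    snapshot_root.mkdir(parents=True, exist_ok=True)
    stamp = stamp or datetime.now().strftime("%Y%m%d-%H%M%S")
    target = snapshot_root / stamp
    shutil.copytree(source, target)  # raises if this stamp already exists
    # The stamp format sorts chronologically, so the oldest come first.
    snapshots = sorted(p for p in snapshot_root.iterdir() if p.is_dir())
    for old in snapshots[:-keep]:
        shutil.rmtree(old)
    return target
```

Because each snapshot is a separate copy that nothing syncs into, a file deleted from the live share is still sitting in yesterday’s folder.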
Kitchen Computer
Part of the notes system is making our data more useful. To that end I wired up an old work laptop with Ubuntu and a 75$ Acer touchscreen I found on craigslist from a bankrupt dental software office, and stuck that on a 100$ clamping swingarm. Now we can pull up recipes and keep them displayed while we work. It’s also wonderful for fullscreen YouPorn playlists when that need for Chicken and Waffles hits (use real maple syrup y’all.)
Eventually that laptop which draws considerable power will be replaced by something like a Pine64.
Johann Rudolf