Archiving Articles

Revision as of Thursday, 18 March 2021 at 02:56 UTC

I looked at three options: SingleFile, ArchiveBox, and readability-cli.

Requirement: I don’t need any JavaScript in my archive. Don’t really care about the images either. Just the text.

SingleFile

Not bad at all. Everything but the JavaScript, all scrunched into a… single file.

# Use JsDOM instead of Chrome/Puppeteer to avoid JavaScript
npm i -g jsdom
npm i -g "gildas-lormeau/SingleFile#master"
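Since the whole point is to end up with no JavaScript, a quick sanity check on the saved page can help. Here is a minimal sketch, assuming the page was already saved to a local HTML file; `count_script_lines` is a hypothetical helper name, not part of SingleFile, and it only counts lines that contain a `<script` tag (inline event handlers such as `onclick` would not be caught).

```shell
# Hypothetical helper (not part of SingleFile): count the lines of a saved
# HTML file that contain a <script tag, matched case-insensitively.
count_script_lines() {
  # grep -c prints the number of matching lines. It exits non-zero when
  # nothing matches, so "|| true" keeps the helper usable under "set -e".
  grep -ci '<script' "$1" || true
}
```

Running `count_script_lines article.html` on a saved page (the file name here is just a placeholder) should print 0 if no script tags survived.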

ArchiveBox

Saves everything, kinda like archive.is: images, CSS, JS, fonts, the lot.

# On macOS
brew install archivebox/archivebox/archivebox

# Get the Readability driver
npm install --prefix . "git+https://github.com/ArchiveBox/ArchiveBox.git"

archivebox init
archivebox add https://www.washingtonpost.com/politics/2021/01/15/pillow-salesman-apparently-has-some-ideas-about-declaring-martial-law/?utm_source=reddit.com
archivebox server
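The URL above still carries a `?utm_source=...` tracking parameter. A small sketch of trimming that off before archiving, assuming the page does not need its query string to load; `strip_query` is a hypothetical helper, not an ArchiveBox feature, and it drops the entire query string rather than only the `utm_*` parts.

```shell
# Hypothetical helper (not part of ArchiveBox): print a URL with everything
# from the first "?" onward removed. URLs without a "?" pass through unchanged.
strip_query() {
  # ${1%%[?]*} removes the longest suffix that starts with a literal "?".
  printf '%s\n' "${1%%[?]*}"
}
```

The cleaned URL could then be passed along with something like `archivebox add "$(strip_query "$url")"`, where `$url` holds the original link.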

readability-cli

Saves just the DOM. No styling. Example.

Result

ArchiveBox is awesome, but it saves far more than I need, so I ended up using SingleFile as a good balance. On top of that, readability-cli had some encoding issues.