SQLite with Litestream has little to no downside

It might sound cynical, but it isn't often that a new piece of software comes along and I'm immediately won over.

# SQLite

SQLite's utility in proof-of-concept and testing environments is something I've been leaning on for a number of years. For demos and personal projects it is my default choice, simply because it is easy to use and embedded right into your application.

Using it with the :memory: connection string means you can initialize the application with exactly the state you want to demo, and if something gets messed up, you just restart the application.
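
As a quick illustration using the sqlite3 command-line shell (the table and values here are made up for the example), the whole database lives and dies with the process:

```sh
# An in-memory database seeded with demo state; everything vanishes
# when the process exits, so a restart gives you a clean slate.
sqlite3 :memory: \
  "CREATE TABLE fruits(name TEXT);
   INSERT INTO fruits VALUES ('apple'), ('banana');
   SELECT * FROM fruits;"
```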

However, if I want to take that same SQLite demo and build it into something bigger, with any kind of automated disaster recovery, I usually reach straight for PostgreSQL. PostgreSQL is still my default for large applications, but there are a lot of little toy systems where I really wish I didn't need to manage a whole other server or process just to make them more resilient.

# Litestream

Litestream is a process that injects itself into SQLite's journaling and checkpointing process to create its own WAL (write-ahead log) files and snapshots, which can be replicated elsewhere on a variety of different storage media. It does this by saving transactions into separate WAL files every second.

For example, say you have an application using SQLite, saving to a file on the local machine to persist data between restarts. This works fine, but you now have to come up with your own backup/restore strategy for when the host or storage inevitably fails.

This is not too difficult if your application isn't "mission critical", like a personal app or dev tool. However, it is just another thing you have to remember how to fix when it goes down.

"Where did I back up the file?", "What server was it on?" are the types of question you have to hunt around to solve, cause even if well documented, it is still some kind of manual intervention.

The effort required to automate recovery is usually reserved for money-making systems where downtime has a larger impact. And even then, automating recovery for something like SQLite would require the replacement host to know where the last good version of the database was, pull it down, and hope that not too much data was lost in between.

Litestream makes this genuinely simple by making it feel like the ability to replicate data from your SQLite database to S3, Azure Blob Storage, Backblaze B2, DigitalOcean Spaces or even SFTP is baked right into the database itself.

# Using Litestream

Litestream is an external process that needs access to your SQLite file to watch for transactions, plus configuration for authenticating with the remote storage solution. It can be run on the same host VM with a single command; for example, replicating a local SQLite database to AWS S3:

```sh
litestream replicate fruits.db s3://my-sqlite-backup-bucket/fruits.db
```

Litestream will look for the LITESTREAM_ACCESS_KEY_ID and LITESTREAM_SECRET_ACCESS_KEY environment variables to authenticate with S3. Alternatively, you can specify the details of your replica in the Litestream configuration file:

```yaml
access-key-id: AKIAxxxxxxxxxxxxxxxx
secret-access-key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxx

dbs:
  - path: /home/me/fruits.db
    replicas:
      - url: s3://my-sqlite-backup-bucket/fruits.db
```
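
With that file in place (Litestream reads /etc/litestream.yml by default), replication for every database listed in the config starts with a bare:

```sh
litestream replicate
```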

And since open-source, S3-compatible alternatives like MinIO exist, you can point it at your own storage service in the same way.

Litestream's own getting started guide uses exactly this setup as its step-by-step walkthrough, with MinIO running in Docker and the SQLite command-line tool.

Inserting values, you can see new .wal files appear in MinIO almost immediately, and you can also configure the interval between snapshots.
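
If I'm reading the replica settings correctly, those intervals are per-replica options in the configuration file; a sketch (the values here are arbitrary examples, not recommendations):

```yaml
dbs:
  - path: /home/me/fruits.db
    replicas:
      - url: s3://my-sqlite-backup-bucket/fruits.db
        sync-interval: 1s      # how often new WAL pages are shipped
        snapshot-interval: 4h  # how often a fresh full snapshot is taken
        retention: 72h         # how long snapshots and WAL files are kept
```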

I did a basic stress test: starting from an empty snapshot, I inserted 100k rows, which produced roughly 300 .wal files on S3. I then deleted the SQLite file and ran the litestream restore command.

```sh
litestream restore -o fruits2.db s3://mybkt.localhost:9000/fruits.db
```

Restoring from S3 to a DigitalOcean droplet took about 25 seconds. After the restore, a new snapshot was created containing all 100k rows. Performing the same restore test again was then basically instant.

This comes down to how Litestream organises WAL files into "generations": on restore, it finds the latest snapshot, restores that as a starting point, and then replays the transactions from the series of .wal files to reach the same consistent state the database was in when Litestream stopped replicating. The second restore was near-instant because the fresh snapshot already contained all 100k rows, leaving almost nothing to replay.
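
You can poke at this yourself with the Litestream CLI, which can list what is stored on a replica (the bucket URL here is the same assumed one as above):

```sh
# List the generations and snapshots Litestream has stored on the replica.
litestream generations s3://my-sqlite-backup-bucket/fruits.db
litestream snapshots s3://my-sqlite-backup-bucket/fruits.db
```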

# Disaster Recovery For Free

The reason I am so impressed by this technology is that it takes a problem that can be quite complex and provides an extremely easy-to-use solution that is still quite robust.

For example, if you have an application already using SQLite file storage and want to automate backup and restore, you can run these commands when a replacement host comes online:

```sh
litestream restore -if-db-not-exists -o /data/MyApp.sqlite s3://my-sqlite-backup-bucket/mydb.sqlite
litestream replicate /data/MyApp.sqlite s3://my-sqlite-backup-bucket/mydb.sqlite
```

The first command checks whether the local database file already exists; if it does, it leaves it alone, and the second command just starts a new replication watch on your database. If it doesn't exist, the restore brings your database back to the last known good state before the same replication watch starts, guarding against future outages.

And you can run both your application and Litestream in containerized environments. Using a shared volume, Litestream can run as a sidecar container, backing up or restoring your database as your containers come and go. If this happens on the same host, Litestream won't need to restore and will carry on making snapshots and writing the WAL to S3.

If, however, your container is on a cluster and gets rescheduled onto a different host, the Litestream sidecar will restore your database and continue watching for changes.
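
A rough sketch of that sidecar wiring with Docker Compose might look like the following (the application image, paths and bucket are illustrative assumptions, not an official example):

```yaml
services:
  app:
    image: my-app:latest  # hypothetical application image using /data/MyApp.sqlite
    volumes:
      - dbdata:/data
  litestream:
    image: litestream/litestream:latest
    # Assumes the official image's entrypoint is the litestream binary,
    # so the command is just the subcommand and its arguments.
    command: replicate /data/MyApp.sqlite s3://my-sqlite-backup-bucket/mydb.sqlite
    environment:
      LITESTREAM_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID}
      LITESTREAM_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY}
    volumes:
      - dbdata:/data
volumes:
  dbdata:
```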

# Cost efficient

For small databases (up to, say, a few million rows), even services like S3 or Azure Blob Storage are extremely cost-efficient. Daily snapshots might be a few hundred megabytes, and with a simple lifecycle policy on the bucket, you could keep weeks of backups that track transactions roughly every second for around 25 cents a month.
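
The kind of lifecycle rule I mean is just an expiry on old objects; something like this sketch (the bucket name and 30-day window are arbitrary examples):

```sh
# Expire backup objects after 30 days so old snapshots and WAL files age out.
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-sqlite-backup-bucket \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-old-backups",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Expiration": {"Days": 30}
    }]
  }'
```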

As your database starts to approach 10-100GB, which for SQLite is very large in my opinion, you could use a service like Hetzner Storage, Backblaze, or your own machines if you really want to penny-pinch.

# Nothing is a silver bullet

Now, this is not to say SQLite is the best fit for applications of all sizes; it really isn't. The type system in SQLite is enough to make anyone cringe, but for constrained domains it really can simplify a lot.


Litestream expands where SQLite can be used with confidence into production settings. Before knowing about Litestream, I would not have used SQLite for anything beyond personal tools or demos. But now, with Litestream filling that backup/restore gap in such an easy-to-use, cheap and flexible way, it is definitely a combination I will be reaching for in small production systems.

# Making tech more boring

As I said, I don't often get that impressed by new software or technology these days (let's face it, most of tech is a capitalistic hellhole), but the stuff that does impress me makes tech easier, more accessible and, well... more boring.

Litestream seems like one of those patterns that will get a lot of use, because it provides a huge amount of utility by accepting some sensible tradeoffs and making an existing, commonly used tool like SQLite easier to manage in a wider variety of use cases.