AskAnna Backup Helper

Robbert, 12 July 2022

At AskAnna we like to keep things simple. For our backups we recently switched to a new setup. The new approach not only makes creating a backup as easy as running a single command, it also makes it easier for us to set up review environments by restoring a backup.

The AskAnna Backup Helper is able to:

  • make backups of files and PostgreSQL databases
  • restore file and database backups
  • upload backups to and download them from cloud storage
  • clean up old backup files
  • schedule backups via cron

The AskAnna Backend that contains all the data is a Django application. We want to back up the user-generated data stored in a PostgreSQL database and the files uploaded to the AskAnna Backend. Per environment we have a different configuration, and the backup helper should be able to deal with this. Our approach to making backups can be used not only for Django applications, but for any application that needs backups of files or databases.

In our setup we use Docker Compose to start the various services we need. We plan to add examples and support for Kubernetes as well. We wanted a separate service that handles all the backup-related tasks, so we don't have to include them in one of the other services. The AskAnna Backup Helper runs as a separate container in our stack.

How we use it

One of the goals was to keep the configuration simple. For example, for setting up the PostgreSQL service we use an environment file that contains the host, port, database name, user and password. The Backup Helper can reuse this environment file. You only have to make sure that you refer to the correct environment file containing the required environment variables.

To get an idea of how we use it, here is the Backup Helper service as specified in our Docker Compose file:

backup_helper:
  image: askanna/backup-helper
  command: crond -f
  volumes:
    - backup_volume:/backups
    - storage_volume:/data
    - ~/gcs-key.json:/keys/gcs-key.json:ro
  env_file:
    - ./postgres.env
  environment:
    GCS_BUCKET: <Google Cloud Storage Bucket name>
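
The snippet above refers to two named volumes. For completeness, a minimal sketch of how these could be declared at the top level of the Compose file (the exact volume configuration is an assumption here and depends on your setup):

volumes:
  backup_volume:
  storage_volume: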

image

You can find the Backup Helper on Docker Hub.

command

By default, when you run the container, you can use it to run backup-related commands. In our infrastructure we start the Backup Helper service with the cron daemon running. For more information, see schedule backups.

volumes

We attach three volumes to the Backup Helper. The backup_volume contains the “local” backup files. storage_volume contains the files uploaded to and generated by the AskAnna platform.

We also mount a Google Cloud Storage JSON key. This JSON key is used by gsutil to authenticate when uploading files to and downloading files from Google Cloud Storage. For more information, see upload backups to & download from cloud storage.

By default, the Backup Helper makes a backup of the files in the directory /data and places the archive file with the backup in the directory /backups. If this directory structure does not match your setup, you can change the locations to be used via environment variables. For the configuration options, see the AskAnna Backup Helper description.

env_file

In our configuration we have a postgres.env file that is used by PostgreSQL clients to connect to the database. The same environment file can be used by the Backup Helper. The environment file specifies the following environment variables (an example follows after the list):

  • POSTGRES_HOST
  • POSTGRES_PORT
  • POSTGRES_DB or POSTGRES_DATABASES (we have multiple databases we want to back up)
  • POSTGRES_USER
  • POSTGRES_PASSWORD
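
For illustration, a postgres.env file could look like this (the values are placeholders, not our actual configuration):

POSTGRES_HOST=postgres
POSTGRES_PORT=5432
POSTGRES_DB=askanna
POSTGRES_USER=askanna
POSTGRES_PASSWORD=change-me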

environment

In our setup we use most of the default configuration. For uploading the data to Google Cloud Storage, we need to provide the image with the bucket name. Therefore we set the GCS_BUCKET environment variable.

In case you want to change other configuration options, you can add them to this list. The full set of configuration options can be found in the AskAnna Backup Helper README.

Making and restoring backups

For the AskAnna system we want to make two types of backups:

  1. Files
  2. PostgreSQL databases

Files

All files we want to back up are on the storage_volume, which is mounted in the container at the directory /data. This source is set via the environment variable BACKUP_SOURCE. If your files are in another directory, you can change this environment variable to the directory you want to back up. To create a backup, you can run:

docker-compose run --rm backup_helper backup_files

Or, if you want to back up another directory:

docker-compose run --rm --env BACKUP_SOURCE=/dir_to_backup backup_helper backup_files

The backups are saved on the Docker volume backup_volume. If you want to see which backups are saved on the volume, you can list them via:

docker-compose run --rm backup_helper backup_ls

If you want to restore a file backup, look for the backup file you want to use and copy its name. Then you can run the restore command:

docker-compose run --rm backup_helper restore_files <files backup>.tar.gz

By default, the files are restored to the /data location. If you want to change the location the files are restored to, you can change BACKUP_SOURCE or set the BACKUP_TARGET variable:

docker-compose run --rm --env BACKUP_TARGET=/restore_backup_to_dir backup_helper restore_files <files backup>.tar.gz

Note: when we restore files, the script will overwrite existing files, but it will not remove files that are not in the backup. If you want to make sure you end up with only the files from the backup, you can first remove or empty the backup target directory. When you run the restore script, it checks if the target directory exists; if not, the script first creates the directory before restoring the files.
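
For example, one way to first empty the default target directory and then restore (a sketch that assumes the image ships a shell and that /data is the restore target):

docker-compose run --rm backup_helper sh -c "find /data -mindepth 1 -delete"
docker-compose run --rm backup_helper restore_files <files backup>.tar.gz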

PostgreSQL databases

On the AskAnna Backup Helper we installed the postgresql-client, which makes it possible to make backups of PostgreSQL databases. We use pg_dump to extract a PostgreSQL database into an archive file.

See the description of the env_file above to read more about the configuration. In the AskAnna Backend we have multiple databases, but if you have only one database to back up you can set POSTGRES_DB and ignore the POSTGRES_DATABASES variable.

You can make the backup by running:

docker-compose run --rm backup_helper backup_postgres

And similar to restore files, you can restore the Postgres backup via:

docker-compose run --rm backup_helper restore_postgres <postgres backup>.tar.gz

Note: when you restore a database backup, we always drop the current databases if they exist. If you don't want to lose the current data, you can first make a backup of the database before restoring the PostgreSQL backup.
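
For example, to keep a safety copy of the current data before restoring an older backup:

docker-compose run --rm backup_helper backup_postgres
docker-compose run --rm backup_helper restore_postgres <postgres backup>.tar.gz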

Upload to and download from cloud storage

We could keep the backups on the local system in the Docker volume backup_volume, but a complete system crash would then also mean losing the backup files. It's always a good idea to save your backups in an external location.

We decided to use Google Cloud Storage. Our current infrastructure is hosted on Google Cloud, and in this setup it's cheaper to upload the backups to a bucket on Google Cloud. Therefore we installed gsutil in the image.

In our Google Cloud environment, we created a bucket and a GCS service account key file. To get the key file, you need an existing Google service account or you can create a new one. From this service account, you can get the associated private JSON key or create a new service account JSON key.

In the Docker Compose file, you can now set the GCS_BUCKET environment variable to the GCS Bucket that should be used for uploading the backup files. And you can point to the local location of the JSON key by modifying the first part (~/gcs-key.json) of the volume definition ~/gcs-key.json:/keys/gcs-key.json:ro.
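
For example, the relevant part of the service definition could then look like this (the key path and bucket name are placeholders):

  volumes:
    - /path/to/your/gcs-key.json:/keys/gcs-key.json:ro
  environment:
    GCS_BUCKET: my-backup-bucket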

You can upload the backup archive files to the GCS Bucket by running:

docker-compose run --rm backup_helper gcs_upload

Managing backups for different deployments

At AskAnna we maintain several deployments of our platform. We could create a bucket for each deployment, but we decided to do it slightly differently: for each environment we add a directory with the environment name to the GCS Bucket.

If you want to upload files to a specific directory in the GCS Bucket, you only have to add the directory name to the GCS_BUCKET environment variable: <GCS Bucket>/<directory>.
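
For example, for a hypothetical production deployment the environment section could contain:

  environment:
    GCS_BUCKET: my-backup-bucket/production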

Listing backups available in the GCS Bucket

To list the backups that you have uploaded to the GCS Bucket, you can run:

docker-compose run --rm backup_helper gcs_ls

Download backups from a GCS Bucket

To download a backup from the GCS Bucket to the Docker Stack, you can run:

docker-compose run --rm backup_helper gcs_download <backup file name>

With the options to upload files to and download files from a GCS Bucket, it’s not necessary to keep all backup files on the local system. This can save a lot of disk space.

Clean up backup files

In our setup, backup files are saved in two locations: locally in the Docker Stack and in the GCS Bucket. You could keep backups indefinitely, but in most cases that's not necessary.

Clean up local backup files

In the Docker Stack we can run a command to clean up files:

docker-compose run --rm backup_helper backup_clean

Before you run the clean-up command, we assume you have first uploaded all backup files to the GCS Bucket. If you want to keep the backups locally for a couple of days, you can use the environment variable BACKUP_KEEP_DAYS.

The default value of the environment variable BACKUP_KEEP_DAYS is NONE. You can also set it to a numeric value. If you set it to n, the job will keep backup files that were modified less than (n + 1) * 24 hours ago. For example:

  • 0: keep backups that are modified less than 24 hours ago
  • 1: keep backups that are modified less than 48 hours ago
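
For example, to keep backups that were modified less than 48 hours ago, you could extend the environment section of the Backup Helper service (a sketch based on the service definition shown earlier):

  environment:
    GCS_BUCKET: <Google Cloud Storage Bucket name>
    BACKUP_KEEP_DAYS: "1"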

To find the files to remove, we use the find command with the mtime option set to the value of BACKUP_KEEP_DAYS.
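
Roughly, the clean-up boils down to something like the following command (a sketch; the exact options used in the script may differ):

find /backups -type f -mtime +"$BACKUP_KEEP_DAYS" -delete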

Clean up GCS Bucket files

In the GCS Bucket we use Object Lifecycle Management to remove backup files after a certain period. In our case, we added a rule that deletes an object 30 days after it was last updated.
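
If you manage the bucket with gsutil, such a rule can be set with a small JSON configuration file and the gsutil lifecycle command. A sketch of a rule like ours (note that the age condition counts from the moment the object was created):

{
  "rule": [
    {"action": {"type": "Delete"}, "condition": {"age": 30}}
  ]
}

You can then apply it with: gsutil lifecycle set lifecycle.json gs://<GCS Bucket>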

Another tip is to look at the storage classes. This can save money on your monthly bill (see pricing). But keep in mind that if you download backup files often, it might be better to use standard storage. In our case, we remove backup files after 30 days, and using the Nearline storage class saves us around 50%.

Schedule backups

By design, the daily backup is not scheduled when you run the container, because the cron daemon is not started. But the container already contains a script backup-upload-clean.sh that you can use to run a backup procedure that:

  • uploads all existing backup files to the cloud storage
  • removes all local backup files, unless you changed the BACKUP_KEEP_DAYS variable
  • makes backups of the Postgres databases and the files
  • uploads the new backup files to the cloud storage

This script is stored in the directory with daily cron scripts. If you run the container with the command crond -f the cron daemon starts and a daily backup is scheduled.

For example, if you want to schedule backups hourly, you can move the backup script from the directory /etc/periodic/daily to /etc/periodic/hourly and start the cron daemon. To do this, you could replace the command in the Docker Compose file with:

sh -c "mv /etc/periodic/daily/backup-upload-clean.sh /etc/periodic/hourly && crond -f"

Inspiration

For the AskAnna Backup Helper, we found inspiration in cookiecutter/cookiecutter-django and diogopms/postgres-gcs-backup. With our version, we tried to make it easier to configure several options and we introduced scheduled backups.

Contribute

The project is published on GitLab: https://gitlab.com/askanna/backup-helper/. You can add an issue with any questions, improvements or suggestions. Or feel free to create a merge request with anything you want to change.

If you want to ask us something, you can also email me at [email protected].