At AskAnna we like to keep things simple. For our backups we recently switched to a new setup. The new approach makes creating a backup as easy as running a command. It also makes it easier for us to set up review environments by restoring a review backup.
The AskAnna Backup Helper is able to:
- make backups of files and databases
- restore file and database backups
- upload backups to & download from cloud storage
- schedule backups
The AskAnna Backend that contains all the data is a Django application. We want to back up the user-generated data saved in a PostgreSQL database and the files uploaded to the AskAnna Backend. Per environment we have a different configuration, and the backup helper should be able to deal with this. Our approach to making backups can be used not only for Django applications but for any application that needs backups of files or databases.
In our setup we use Docker Compose to start the several services we need. We plan to add examples and support for Kubernetes as well. We wanted a separate service that could handle all the backup-related tasks, so we don’t have to include that in one of the other services. The AskAnna Backup Helper is a separate container running in our stack.
How we use it
One of the goals was to keep the configuration simple. For example, for setting up the PostgreSQL service we use an environment file that contains the host, port, database name, user and password. The Backup Helper can reuse this environment file. You only have to make sure that you refer to the correct environment file containing the required environment variables.
To give you an idea of how we use it, here is the Backup Helper service we specified in our Docker Compose file:
backup_helper:
  image: askanna/backup-helper
  command: crond -f
  volumes:
    - backup_volume:/backups
    - storage_volume:/data
    - ~/gcs-key.json:/keys/gcs-key.json:ro
  env_file:
    - ./postgres.env
  environment:
    GCS_BUCKET: <Google Cloud Storage Bucket name>
You can find the Backup Helper on Docker Hub.
By default, when you run the container, you can use it to run backup-related commands. In our infrastructure we start the Backup Helper service with the cron daemon running. For more information see schedule backups.
We attach three volumes to the Backup Helper. The backup_volume contains the “local” backup files. The storage_volume contains the files uploaded to and generated by the AskAnna platform.
We also link a Google Cloud Storage JSON key. This JSON key is used by gsutil to authenticate when uploading files to and downloading files from Google Cloud Storage. For more information see upload backups to & download from cloud storage.
By default the backup helper will make a backup of the files in the directory /data and place the archive file with the backup in the directory /backups. If this directory structure does not match your setup, you can specify the locations to be used via environment variables. For the configuration options, see the AskAnna Backup Helper description.
In our configuration we have a postgres.env file that is used by PostgreSQL clients to connect to the database. The same environment file can be used for the Backup Helper. The environment file specifies the following environment variables:
- POSTGRES_DB or POSTGRES_DATABASES (we have multiple databases we want to back up)
In our setup we use most of the default configuration. For uploading the data to Google Cloud Storage, we need to provide the image with the bucket name. Therefore we set the GCS_BUCKET environment variable.
If you want to change other configuration options, you can add them to this list. The full set of configuration options can be found in the AskAnna Backup Helper README.
Making and restoring backups
For the AskAnna system we want to make two types of backups:
- files
- PostgreSQL databases
All files we want to back up are on the storage_volume, which is attached to the container at the directory /data. This source is set via the environment variable BACKUP_SOURCE. If your files are in another directory, you can change this environment variable to the directory you want to back up. To create a backup, you can run:
docker-compose run --rm backup_helper backup_files
Or, if you want to back up another directory:
docker-compose run --rm --env BACKUP_SOURCE=/dir_to_backup backup_helper backup_files
The backups are saved on the Docker volume backup_volume. If you want to see which backups are saved on the volume, you can list them via:
docker-compose run --rm backup_helper backup_ls
If you want to restore a file backup, look for the backup file you want to use and copy its name. Then run the restore command:
docker-compose run --rm backup_helper restore_files <files backup>.tar.gz
By default the files are restored to the /data location. If you want to restore the files to another location, you can change BACKUP_SOURCE or set the BACKUP_TARGET environment variable:
docker-compose run --rm --env BACKUP_TARGET=/restore_backup_to_dir backup_helper restore_files <files backup>.tar.gz
Note: when we restore files, the script will overwrite existing files, but it will not remove files that are not in the backup. If you want to make sure you only end up with the files in the backup, you can first remove the backup target directory. When you run the restore script, it checks whether the target directory exists. If not, the script first creates the directory before restoring the files.
On the AskAnna Backup Helper we installed the postgresql-client, which makes it possible to make backups of a PostgreSQL database. We use pg_dump to extract a PostgreSQL database into an archive file.
See the above description of the env_file to read more about the configuration. In the AskAnna Backend we have multiple databases, but if you have only one database to back up you can set POSTGRES_DB and ignore POSTGRES_DATABASES.
You can make the backup by running:
docker-compose run --rm backup_helper backup_postgres
Similar to restoring files, you can restore the Postgres backup via:
docker-compose run --rm backup_helper restore_postgres <postgres backup>.tar.gz
Note: when you restore the database, we always drop the current databases if they exist. If you don’t want to lose the current data, you could first make a backup of the database before restoring the PostgreSQL backup.
Upload to and download from cloud storage
We could keep the backups on the local system in the Docker volume backup_volume. In case of a complete system crash, this would also result in losing the backup files. It’s always a good idea to save your backups in an external location.
We decided to use Google Cloud Storage. Our current infrastructure is hosted on Google Cloud and in this setup it’s cheaper to upload the backups to a bucket on the Google Cloud. Therefore we installed gsutil in the image.
In our Google Cloud environment, we created a bucket and a GCS service account key file. To get the key file, you need to have a Google service account or create a new one. From this service account, you can get the associated private JSON key or create a new service account JSON key.
In the Docker Compose file, you can now set the environment variable GCS_BUCKET to the GCS Bucket that should be used for uploading the backup files. You can also point to the local location of your JSON key by modifying the first part (~/gcs-key.json) of the corresponding volume definition.
You can upload the backup archive files to the GCS Bucket by running:
docker-compose run --rm backup_helper gcs_upload
Managing backups for different deployments
At AskAnna we maintain several deployments of our platform. We could create a bucket for each deployment, but we decided to do it slightly differently. For each environment we add a directory with the environment name in the GCS Bucket.
If you want to upload files to a specific directory in the GCS Bucket, you only have to add the directory name to the GCS_BUCKET environment variable.
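As a minimal sketch, assuming the directory is appended to the bucket name with a slash (this format is an assumption, so please check the Backup Helper README), the value could be composed as shown below. The helper name gcs_bucket_for, the bucket name my-backup-bucket and the deployment name review are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compose a GCS_BUCKET value of the form <bucket>/<deployment>.
# The "<bucket>/<directory>" format is an assumption, not taken from the README.
gcs_bucket_for() {
    bucket="$1"
    deployment="$2"
    printf '%s/%s\n' "$bucket" "$deployment"
}

# Example usage (placeholder names), reusing the --env option shown earlier:
# docker-compose run --rm --env GCS_BUCKET="$(gcs_bucket_for my-backup-bucket review)" backup_helper gcs_upload
```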
Listing backups available in the GCS Bucket
To list the backups that you have uploaded to the GCS Bucket, you can run:
docker-compose run --rm backup_helper gcs_ls
Download backups from a GCS Bucket
To download a backup from the GCS Bucket to the Docker Stack, you can run:
docker-compose run --rm backup_helper gcs_download <backup file name>
With the options to upload files to and download files from a GCS Bucket, it’s not necessary to keep all backup files on the local system. This can save a lot of disk space.
Clean up backup files
In our setup, backup files are saved in two locations: locally in the Docker Stack and in the GCS Bucket. You could keep backups indefinitely, but this is probably not necessary in most cases.
Clean up local backup files
In the Docker Stack we can run a command to clean up files:
docker-compose run --rm backup_helper backup_clean
Before you run the clean-up command, we assume you have uploaded all backup files to the GCS Bucket. You may want to keep the backups locally for a couple of days. For this you can use an environment variable (see the AskAnna Backup Helper README for the configuration options). Its default value is NONE. You can also set it to a numeric value. If you set the value to n, the job will keep backup files that were modified less than (n + 1) * 24 hours ago. For example:
- 0: keep backups that were modified less than 24 hours ago
- 1: keep backups that were modified less than 48 hours ago
To find the files to remove, we use the command find with the option mtime set to the configured value.
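To make the retention rule concrete, here is a small sketch of how find selects files by modification time. It assumes the value n is passed as -mtime +n, which matches the (n + 1) * 24 hours rule: find counts the age in whole 24-hour periods and +n means "more than n". The function name list_expired_backups is hypothetical, and this is not the Backup Helper's actual clean-up script.

```shell
#!/bin/sh
# Hypothetical helper: list the files in a directory that a clean-up with
# retention value n would consider expired.
# Assumption: the retention value is used as "find -mtime +n".
list_expired_backups() {
    backup_dir="$1"
    keep_days="$2"
    # -mtime +n matches files last modified at least (n + 1) * 24 hours ago.
    find "$backup_dir" -type f -mtime +"$keep_days"
}

# Example usage: show which files in /backups are older than 48 hours.
# list_expired_backups /backups 1
```

Listing the files first, before deleting anything, is a cheap way to check that the retention value does what you expect.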
Clean up GCS Bucket files
In the GCS Bucket we use Object Lifecycle Management to remove backup files after a certain period. In our case, we added a rule that deletes an object 30 days after it was last updated.
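A lifecycle rule like this can also be managed from the command line. The sketch below writes a rule file and shows, as a comment, how gsutil could apply it. Note that the age condition counts days since the object was created; for backup archives that are uploaded once, this should be close to the rule described here. The file name lifecycle.json and the bucket placeholder are assumptions.

```shell
#!/bin/sh
# Write a lifecycle configuration that deletes objects older than 30 days.
# "age" is measured in days since the object was created.
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"age": 30}
    }
  ]
}
EOF

# Apply the configuration to a bucket (replace the placeholder first):
# gsutil lifecycle set lifecycle.json gs://<bucket name>
```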
Another tip is to look at the storage classes. This can save you money on your monthly bill (see pricing). But please keep in mind that if you download backup files a lot, it might be better to use the standard storage. In our case, we remove backup files after 30 days. Using the Nearline storage class saves us around 50%.
Schedule backups
By design, the daily backup is not scheduled when you run the container, because the cron daemon is not started. But the container already contains a script, backup-upload-clean.sh, that you can use to run a backup procedure that:
- uploads all backup files to the cloud storage
- removes all local backup files, unless you changed the retention setting for local backups
- makes a backup of the Postgres databases and files
- uploads the backup files to the cloud storage
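The steps in this list could be sketched with the commands used earlier in this post. This only illustrates the order of the steps and is not the actual content of backup-upload-clean.sh. The function name run_backup_cycle is hypothetical, and the sketch assumes that backup_postgres, backup_files, gcs_upload and backup_clean are available as commands inside the container.

```shell
#!/bin/sh
# Hypothetical sketch of the backup procedure; stops at the first failing step.
run_backup_cycle() {
    # Upload backup files that are still on the local volume.
    gcs_upload || return 1
    # Remove local backup files (subject to the retention setting).
    backup_clean || return 1
    # Make fresh backups of the databases and the files.
    backup_postgres || return 1
    backup_files || return 1
    # Upload the new backup files.
    gcs_upload || return 1
}
```

Stopping at the first failure means the clean-up never runs if the preceding upload did not succeed.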
This script is stored in the directory with daily cron scripts. If you run the container with the command crond -f, the cron daemon starts and a daily backup is scheduled.
For example, if you want to schedule backups hourly, you can move the backup script from the directory /etc/periodic/daily to /etc/periodic/hourly and start the cron daemon. To do this, you could replace the command in the Docker Compose file with:
mv /etc/periodic/daily/backup-upload-clean.sh /etc/periodic/hourly && crond -f
For the AskAnna Backup Helper, we found inspiration in cookiecutter/cookiecutter-django and diogopms/postgres-gcs-backup. With our version, we tried to make it easier to configure several options and we introduced scheduled backups.
The project is published on GitLab: https://gitlab.com/askanna/backup-helper/. You can open an issue with any questions, improvements or suggestions. Or feel free to create a merge request with anything you want to change.
If you want to ask us something, you can also email me at [email protected].