Organizations using a self-managed GitLab instance usually rely on it to hold their source code, project management, and operational tooling. It's vital to have functioning backups so your data's protected in case of a hardware failure, unsuccessful server update, or malicious compromise.

GitLab has a built-in backup component that can create a complete archive of your installation's data. The archive can be restored onto a fresh server running the same GitLab version.

Here's how to set up backups to your local filesystem or an Amazon S3 bucket. These steps are intended for use with GitLab Omnibus editions. If your instance was built from source, you'll need to modify the GitLab CLI commands by prefixing them with bundle exec rake.

Making an On-Demand Backup

The simplest way to create a backup is with the on-demand creation command. Run the following command in your shell:

sudo gitlab-backup create

This works on GitLab 12.2 and newer. Older versions should use an alternative command instead:

sudo gitlab-rake gitlab:backup:create

The backup will be saved as a tar archive in the directory defined by your GitLab configuration file. Omnibus installations default to using /var/opt/gitlab/backups. Each backup archive is named with its creation timestamp and GitLab version.
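Because each archive name starts with its creation timestamp, it's easy to find the most recent backup from a script. The following is a minimal sketch rather than an official GitLab tool. It assumes archive names begin with a Unix timestamp and end in _gitlab_backup.tar, and the latest_backup function name is made up for this example.

```shell
# Print the file name of the newest backup archive in a directory.
# Assumption: names look like <unix timestamp>_..._gitlab_backup.tar.
# Returns a non-zero status when no archive is found.
latest_backup() (
  dir="${1:-/var/opt/gitlab/backups}"
  newest=""
  newest_ts=-1
  for f in "$dir"/*_gitlab_backup.tar; do
    [ -e "$f" ] || continue            # the glob matched nothing
    name="${f##*/}"                    # strip the directory part
    ts="${name%%_*}"                   # leading timestamp, up to the first "_"
    case "$ts" in
      ''|*[!0-9]*) continue ;;         # skip names without a numeric prefix
    esac
    if [ "$ts" -gt "$newest_ts" ]; then
      newest_ts="$ts"
      newest="$name"
    fi
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
)

# Usage on an Omnibus server (reading the directory usually needs root):
#   sudo sh -c '. ./latest_backup.sh; latest_backup /var/opt/gitlab/backups'
```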

What's Included in a Backup?

GitLab's built-in backup utility exports data created by users on your GitLab instance. This includes everything in the GitLab database and your on-disk Git repositories.

Restoring the backup will reinstate your projects, groups, users, issues, uploaded file attachments, and CI/CD job logs. The backup also covers GitLab Pages websites and Docker images uploaded to the integrated container registry.

Packages added to GitLab's package registries are not included in the backup. You should configure your installation to save packages to an external object storage provider if you need them to be recoverable without a manual rebuild.

Creating a Backup Schedule

There's no integrated mechanism to define an automated backup schedule. You should set up your own cron task to run the backup command shown above.

Run sudo crontab -e to open root's crontab, then add the following contents to the file:

0 21 * * * /opt/gitlab/bin/gitlab-backup create CRON=1

Save and close the file to apply your crontab change. This example will create a new backup at 9pm each day. Setting the CRON environment variable instructs GitLab to hide the backup progress display so you don't receive redundant cron emails with the job output.

Using this task as-is will keep every backup indefinitely until you manually clean them up. This can quickly consume a lot of storage space if you're running an active GitLab instance containing large projects.

An optional configuration key lets you delete old archives as part of the backup creation script. Open your GitLab configuration file at /etc/gitlab/gitlab.rb. Search for backup_keep_time, uncomment the line, and set the number of seconds you want to keep each backup for.

gitlab_rails['backup_keep_time'] = 432000

Here backups are retained for five days. GitLab will delete all eligible archives in the backup directory each time the backup creation command is executed.
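The value is easier to get right if you derive it from a number of days instead of typing the seconds by hand. This small sketch shows the arithmetic; the keep_days and keep_seconds variable names are only illustrative.

```shell
# Convert a retention period in days into the seconds backup_keep_time expects.
keep_days=5
keep_seconds=$((keep_days * 24 * 60 * 60))

# Print the line to place in /etc/gitlab/gitlab.rb.
echo "gitlab_rails['backup_keep_time'] = ${keep_seconds}"
# prints: gitlab_rails['backup_keep_time'] = 432000
```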

You need to reconfigure GitLab whenever the configuration file changes. Run sudo gitlab-ctl reconfigure to apply your new setting.

Excluding Data Types

Sometimes you might want to run a backup with a subset of the supported data types. Defining the SKIP environment variable lets you exclude specific operations from running, slimming down your final archive.

The environment variable takes a comma-separated list of data types. You can find the currently supported options in the GitLab documentation.

Here's how to back up everything except container registry images:

sudo gitlab-backup create SKIP=registry

Excluding the registry content is often an easy way to significantly reduce your backup size and accelerate its creation speed. A team with several active projects building multiple Docker images a day can quickly accumulate gigabytes of registry data. Excluding them from backup is not necessarily too big a risk, as you can always rebuild the images using the Dockerfile in your repository.
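To skip more than one data type, join the names with commas and no spaces. The sketch below builds that list and prints the resulting command instead of running it. The skip_list helper is hypothetical, and artifacts is used here on the assumption that it's a data type name your GitLab version accepts, so check it against the supported options first.

```shell
# Join data type names into the comma-separated form SKIP expects.
# The function body runs in a subshell so the IFS change stays local.
skip_list() (
  IFS=','
  printf '%s\n' "$*"
)

# Print the command so it can be reviewed before being run.
echo "sudo gitlab-backup create SKIP=$(skip_list registry artifacts)"
# prints: sudo gitlab-backup create SKIP=registry,artifacts
```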

Backing Up to S3

GitLab can automatically save your backups to S3-compatible object storage providers. Open /etc/gitlab/gitlab.rb, uncomment the backup_upload_connection lines, and add your connection details:

gitlab_rails['backup_upload_connection'] = {
  "provider" => "AWS",
  "region" => "eu-west-1",
  "aws_access_key_id" => "access_key",
  "aws_secret_access_key" => "secret_key",
  # "endpoint" => "https://..."
}

Add your own access key, secret key, and AWS region ID to complete the connection. You should set the endpoint field too if you're connecting to a provider other than AWS. Supply the URL of your object storage server so GitLab can upload to it.

You must also set a backup_upload_remote_directory key. Find this line in the config file, uncomment it, and set an S3 bucket name to upload your backups into:

gitlab_rails['backup_upload_remote_directory'] = 'gitlab-backups'

Run sudo gitlab-ctl reconfigure to apply your changes.

The backup creation command will now upload its archives to your configured S3 bucket. This gives you much greater redundancy by storing your backups off-site, protecting you against physical hardware failure.

Beware that the backup_keep_time setting isn't supported when you're using S3 storage. It only applies to locally stored backup archives. You can achieve something similar by using S3's built-in expiration policies to automatically delete uploads after a set time period has elapsed.
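As a rough sketch of such an expiration policy on AWS, the function below prints a lifecycle configuration that deletes every object in a bucket after a given number of days. The rule ID, the lifecycle_rule_json function name, and the five-day period are assumptions for illustration. The empty prefix applies the rule to the whole bucket, so only use it on a bucket dedicated to backups.

```shell
# Print an S3 lifecycle configuration that expires objects after N days.
lifecycle_rule_json() {
  days="$1"
  cat <<EOF
{
  "Rules": [
    {
      "ID": "expire-gitlab-backups",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Expiration": { "Days": ${days} }
    }
  ]
}
EOF
}

# Usage: save the rule, then apply it with the AWS CLI. This needs credentials
# that are allowed to change the bucket's lifecycle configuration.
#   lifecycle_rule_json 5 > lifecycle.json
#   aws s3api put-bucket-lifecycle-configuration \
#     --bucket gitlab-backups \
#     --lifecycle-configuration file://lifecycle.json
```

Other S3-compatible providers may offer a similar feature under a different interface, so consult your provider's documentation before relying on this.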

The Copy Backup Strategy

GitLab's default backup strategy is to stream data continuously to the tar archive. This generally works well but can present problems on very active GitLab instances. Data might change in the source directory before it's finished reaching the archive, causing tar to skip it with a "file changed as we read it" error.

To combat this, GitLab introduced an optional copy strategy. This copies all eligible backup data to a temporary directory, then streams the copied content into the final tar archive. This ensures tar isn't reading from a live GitLab instance but has the side-effect of temporarily increasing GitLab's storage consumption. Backup performance can also take a noticeable hit, especially on slower storage devices.

The copy strategy is activated by setting the STRATEGY environment variable when running the backup command. You should make sure you've got enough disk space available. GitLab runs the backup in stages, one data type at a time, so the extra free space you need is equal to the size of your largest data type, not the sum of all of them. As an example, if you have 5GB of Git repositories and 10GB of container registry data, you'd need to have 10GB of extra available space, not 15GB.

sudo gitlab-backup create STRATEGY=copy
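The space calculation for the copy strategy comes down to taking the largest data type rather than the total. Here's a minimal sketch of that arithmetic using the figures from the example above; largest_size is a hypothetical helper, and on a real server you'd feed it sizes measured with a tool such as du.

```shell
# Return the largest of the given sizes. With staged copying, this is the
# most extra space needed at any one time.
largest_size() {
  max=0
  for size in "$@"; do
    if [ "$size" -gt "$max" ]; then
      max="$size"
    fi
  done
  echo "$max"
}

# 5GB of Git repositories and 10GB of container registry data.
echo "Extra space needed: $(largest_size 5 10)GB"
# prints: Extra space needed: 10GB
```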

Don't Forget: Back Up Your Config File!

GitLab's backup script only manages user-created data. Two other files are essential to the operation of your GitLab server. These must be backed up too to ensure successful recovery of your instance.

  • /etc/gitlab/gitlab.rb - This is your GitLab configuration file. All but the most basic of installations will usually acquire many modifications over time. Backing up this file lets you drop it into a new GitLab installation without having to start from scratch.
  • /etc/gitlab/gitlab-secrets.json - This file must be backed up. It includes your database encryption key, secrets used for two-factor authentication, and other non-recoverable sensitive data. Misplacing this file could render any recovery effort impossible, even if you've got a functioning backup archive available.

You could use another cron task to back up these two files. They should be copied off your server so you can still access them if you face a hardware failure.
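One way to do this is a short script that bundles both files into a timestamped tar archive, which your cron task can then copy off the server. This is only a sketch: the backup_gitlab_config function name, the archive naming, and the destination directory are all assumptions, and the resulting archive contains secrets, so it's created readable by its owner only.

```shell
# Bundle gitlab.rb and gitlab-secrets.json into a timestamped tar archive.
#   $1: directory holding the config files (/etc/gitlab on Omnibus installs)
#   $2: existing directory to write the archive into (hypothetical, your choice)
# The body runs in a subshell so the umask change doesn't leak to the caller.
backup_gitlab_config() (
  src="$1"
  dest="$2"
  stamp="$(date +%Y%m%d%H%M%S)"
  archive="$dest/gitlab_config_${stamp}.tar"
  umask 077                      # new archive is readable by its owner only
  tar -cf "$archive" -C "$src" gitlab.rb gitlab-secrets.json || exit 1
  printf '%s\n' "$archive"       # report where the archive was written
)

# Usage, run as root because the secrets file isn't world-readable:
#   backup_gitlab_config /etc/gitlab /path/to/config-backups
```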

Conclusion

Backups are vital to any GitLab administrator. The software is usually critical to every team in an organization, so any unexpected downtime could cause severe operational challenges.

GitLab comes with everything you need to make regular backups. The best approach is to create a cron task that runs the built-in script on a regular schedule. Save your backup archives to external object storage to protect against hardware loss, failure, or damage. Remember to manually back up your GitLab config and secrets files too, as otherwise the recovery process will be significantly more complicated.