How-To Geek

How to Use rsync to Backup Your Data on Linux

rsync is a file-copying tool (and protocol) built for Unix-like systems that is remarkably versatile for backing up and synchronizing data. It can be used locally to back up files to different directories, or it can be configured to sync across the Internet to other hosts.

It can be used on Windows systems but is only available through various ports (such as Cygwin), so in this how-to we will be talking about setting it up on Linux.  First, we need to install/update the rsync client.  On Red Hat distributions, the command is “yum install rsync” and on Debian it is “sudo apt-get install rsync.”

[Screenshot: the command on Red Hat/CentOS, after logging in as root (note that some recent distributions of Red Hat support the sudo method).]

[Screenshot: the command on Debian/Ubuntu.]

Using rsync for local backups

In the first part of this tutorial, we will back up the files from Directory1 to Directory2. Both of these directories are on the same hard drive, but this would work exactly the same if the directories existed on two different drives. There are several different ways we can approach this, depending on what kind of backups you want to configure. For most purposes, the following line of code will suffice:

$ rsync -av --delete /Directory1/ /Directory2/

The code above will synchronize the contents of Directory1 to Directory2, and leave no differences between the two. If rsync finds that Directory2 has a file that Directory1 does not, it will delete it. If rsync finds a file that has been changed, created, or deleted in Directory1, it will reflect those same changes to Directory2.

There are a lot of different switches that you can use for rsync to personalize it to your specific needs. Here is what the aforementioned code tells rsync to do with the backups:

1. -a = recursive (recurse into directories), links (copy symlinks as symlinks), perms (preserve permissions), times (preserve modification times), group (preserve group), owner (preserve owner), preserve device files, and preserve special files.
2. -v = verbose. The reason I think verbose is important is so you can see exactly what rsync is backing up. Think about this: What if your hard drive is going bad, and starts deleting files without your knowledge, then you run your rsync script and it pushes those changes to your backups, thereby deleting all instances of a file that you did not want to get rid of?
3. --delete = This tells rsync to delete any files that are in Directory2 that aren’t in Directory1. If you choose to use this option, I recommend also using the verbose option, for the reasons mentioned above.
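
Because --delete removes files from the destination, it can help to preview a run before doing it for real. The sketch below is one way to do that, assuming throwaway directories created with mktemp as stand-ins for Directory1 and Directory2 (they are not paths from this article). It adds rsync's --dry-run (-n) option, which reports what would be copied and deleted without changing anything.

```shell
# Throwaway stand-ins for /Directory1 and /Directory2 (hypothetical paths).
work=$(mktemp -d)
mkdir -p "$work/Directory1" "$work/Directory2"
echo "keep me" > "$work/Directory1/File1.txt"
echo "stale"   > "$work/Directory2/old.txt"

# Preview: --dry-run (-n) reports the planned changes but performs none.
preview=$(rsync -av --delete --dry-run "$work/Directory1/" "$work/Directory2/")
echo "$preview"

# Nothing has changed yet, so old.txt is still the only file in the destination.
after_preview=$(ls "$work/Directory2")
echo "Destination after the preview: $after_preview"

# Real run: File1.txt is copied and old.txt is deleted.
rsync -av --delete "$work/Directory1/" "$work/Directory2/"
```

If the preview lists deletions you did not expect, you can stop and investigate before any backup copy is lost.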

Using the command above, here’s the output generated by using rsync to back up Directory1 to Directory2. Note that without the verbose switch, you wouldn’t receive such detailed information.

[Screenshot: verbose rsync output from backing up Directory1 to Directory2]

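The screenshot above tells us that File1.txt and File2.jpg were detected as either new or changed compared with the copies in Directory2, and so they were backed up. Noob tip: Notice the trailing slash on the source directory in my rsync command. With the slash, rsync copies the contents of Directory1 into Directory2; without it, rsync would create a Directory1 folder inside Directory2. A trailing slash on the destination generally makes no difference, but be sure to remember the one on the source.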
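
The effect of the trailing slash on the source is easy to see with a quick experiment. This is a minimal sketch using throwaway directories from mktemp (the with_slash and without_slash names are made up for illustration):

```shell
# Throwaway source and destinations (hypothetical paths).
work=$(mktemp -d)
mkdir -p "$work/Directory1" "$work/with_slash" "$work/without_slash"
echo "hello" > "$work/Directory1/File1.txt"

# Trailing slash on the source: copy the CONTENTS of Directory1.
rsync -a "$work/Directory1/" "$work/with_slash/"

# No trailing slash on the source: copy the directory itself,
# so the files land one level deeper.
rsync -a "$work/Directory1" "$work/without_slash/"

# List where File1.txt ended up in each destination.
find "$work/with_slash" "$work/without_slash" -type f
```

The first destination ends up containing File1.txt directly, while the second contains Directory1/File1.txt.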

We will go over a few more handy switches towards the end of this tutorial, but just remember that to see a full listing you can type “man rsync” and view a complete list of switches to use.

That about covers it as far as local backups are concerned. As you can tell, rsync is very easy to use. It gets slightly more complex when using it to sync data with an external host over the Internet, but we will show you a simple, fast, and secure way to do that.

Using rsync for external backups

rsync can be configured in several different ways for external backups, but we will go over the most practical (and also the easiest and most secure) method: tunneling rsync through SSH. Most servers and even many clients already have SSH, and it can be used for your rsync backups. We will show you how to get one Linux machine to back up to another on a local network. The process is exactly the same if one host is out on the Internet somewhere; just note that port 22 (or whatever port you have SSH configured on) would need to be forwarded on any network equipment on the server’s side.

On the server (the computer that will be receiving the backups), make sure SSH and rsync are installed.

# yum -y install openssh-server rsync

$ sudo apt-get install ssh rsync

Other than installing SSH and rsync on the server, all that really needs to be done is to set up the directories on the server where you would like the files backed up, and to make sure that SSH is locked down. Make sure the user you plan on using has a complex password; it may also be a good idea to switch the port that SSH listens on (the default is 22).

We will run the same command that we did for using rsync on a local computer, but include the necessary additions for tunneling rsync through SSH to a server on my local network. For user “geek” connecting to “192.168.235.137” and using the same switches as above (-av --delete) we will run the following:

$ rsync -av --delete -e ssh /Directory1/ geek@192.168.235.137:/Directory2/

If you have SSH listening on some port other than 22, you would need to specify the port number, such as in this example where I use port 12345:

$ rsync -av --delete -e 'ssh -p 12345' /Directory1/ geek@192.168.235.137:/Directory2/

[Screenshot: rsync output from the backup over SSH, including the password prompt]

As you can see from the screenshot above, the output when backing up across the network is much the same as when backing up locally; the only thing that changes is the command you use. Notice also that it prompted for a password, which is needed to authenticate with SSH. You can set up SSH keys (RSA keys, for example) to skip this prompt, which also makes automating rsync simpler.
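
Key-based login can be sketched roughly as follows. The key file name rsync_backup_key is made up for illustration, and the key is written to a temporary directory here only so the example is safe to run; in practice it would usually live in ~/.ssh. The steps that contact the server are left as comments because they need the real host.

```shell
# Generate a dedicated RSA key pair for backups. The file name is hypothetical,
# and the empty passphrase (-N "") is a convenience for unattended runs:
# anyone who can read the private key can log in as the backup user.
keydir=$(mktemp -d)            # in practice this would usually be ~/.ssh
key="$keydir/rsync_backup_key"
ssh-keygen -q -t rsa -b 4096 -N "" -f "$key"

# The remaining steps need the real server, so they are shown commented out.
# Install the public key for user "geek" on the server (asks for the password once):
#   ssh-copy-id -i "$key.pub" geek@192.168.235.137
# Then point rsync's ssh at the private key, after which no password prompt should appear:
#   rsync -av --delete -e "ssh -i $key" /Directory1/ geek@192.168.235.137:/Directory2/

# Show the private key and the public key that were created.
ls -l "$key" "$key.pub"
```

Using a separate key just for backups is a design choice: it can be removed from the server later without affecting your other logins.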

Automating rsync backups

Cron can be used on Linux to automate the execution of commands, such as rsync. Using Cron, we can have our Linux system run nightly backups, or however often you would like them to run.

To edit the cron table file for the user you are logged in as, run:

$ crontab -e

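This opens the file in your default editor (set by the VISUAL or EDITOR environment variable); the steps here assume it is vi, so you will need to be familiar with it. Press “i” to enter insert mode, and then begin editing the cron table file.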

Cron uses the following syntax: minute of the hour, hour of the day, day of the month, month of the year, day of the week, command.

It can be a little confusing at first, so let me give you an example. The following command will run the rsync command every night at 10 PM:

0 22 * * * rsync -av --delete /Directory1/ /Directory2/

The first “0” specifies the minute of the hour, and “22” specifies 10 PM. Since we want this command to run daily, we will leave the rest of the fields with asterisks and then paste the rsync command.

After you are done configuring Cron, press escape, and then type “:wq” (without the quotes) and press enter. This will save your changes in vi.

Cron can get a lot more in-depth than this, but to go on about it would be beyond the scope of this tutorial. Most people will just want a simple weekly or daily backup, and what we have shown you can easily accomplish that. For more info about Cron, please see the man pages.
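
Since cron runs commands without a terminal, the verbose output discussed earlier is not shown on screen. One option is to wrap the rsync command in a small script that appends its output to a log file. The sketch below assumes a function name (run_backup), a log location, and a script path that are all made up for illustration, and it uses throwaway directories for the demonstration run.

```shell
# run_backup SRC DEST LOG: mirror SRC into DEST and append rsync's output to LOG.
# The function name and the log location are hypothetical choices.
run_backup() {
    src=$1; dest=$2; log=$3
    echo "=== backup started: $(date) ===" >> "$log"
    rsync -av --delete "$src" "$dest" >> "$log" 2>&1
    status=$?
    echo "=== backup finished: $(date), rsync exit status $status ===" >> "$log"
    return "$status"
}

# Demonstration with throwaway directories instead of the real ones.
work=$(mktemp -d)
mkdir -p "$work/Directory1" "$work/Directory2"
echo "data" > "$work/Directory1/File1.txt"
run_backup "$work/Directory1/" "$work/Directory2/" "$work/backup.log"
cat "$work/backup.log"

# Saved as an executable script (for example /usr/local/bin/backup.sh, a
# hypothetical path) that calls run_backup with the real directories, the
# nightly crontab entry would become:
#   0 22 * * * /usr/local/bin/backup.sh
```

The log then gives you a record of what each nightly run copied or deleted, along with rsync's exit status.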

Other useful features

Another useful thing you can do is put your backups into a zip file. You will need to specify where you would like the zip file to be placed, and then rsync that directory to your backup directory. For example:

$ zip -r /ZippedFiles/archive.zip /Directory1/ && rsync -av --delete /ZippedFiles/ /Directory2/

[Screenshot: output of the zip and rsync command]

The command above takes the files from Directory1, puts them in /ZippedFiles/archive.zip, and then rsyncs that directory to Directory2. You might expect this method to be inefficient for large backups, since the zip file changes every time the slightest alteration is made to a file. However, when syncing to a remote host, rsync's delta-transfer algorithm sends only the parts of a file that changed. So if your zip file is 10 GB and you add a text file to Directory1, roughly only the new data needs to cross the network, provided the rest of the archive's bytes are unchanged. Note that for purely local copies like the one above, rsync copies whole files by default, because that is usually faster than computing differences on the same machine.
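
You can check how much data rsync's delta algorithm actually sends by using the --stats option. In the sketch below, a 1 MiB file of random bytes stands in for a large archive, and all paths are throwaway ones from mktemp. Because both directories are local, rsync would normally copy the whole file, so the example passes --no-whole-file to force the delta algorithm that rsync uses by default over a network.

```shell
# Throwaway source and destination (hypothetical paths).
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dest"

# A 1 MiB file of random bytes stands in for a large archive.
head -c 1048576 /dev/urandom > "$work/src/archive.bin"
rsync -a "$work/src/" "$work/dest/"

# Append a few bytes, as if one small file had been added to the archive.
echo "small change" >> "$work/src/archive.bin"

# --no-whole-file forces the delta algorithm for this local copy.
# --stats reports how much data was sent literally and how much was
# matched against the copy already at the destination.
stats=$(rsync -a --no-whole-file --stats "$work/src/" "$work/dest/")
echo "$stats" | grep -E "Literal data|Matched data"
```

A small "Literal data" figure next to a large "Matched data" figure indicates that only the changed portion was sent.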

There are a couple of different ways you can encrypt your rsync backups. The easiest method is to enable encryption on the hard drive itself (the one that your files are being backed up to). Another way is to encrypt your files before sending them to a remote server (or other hard drive, whatever you happen to be backing up to). We’ll cover these methods in later articles.

Whatever options and features you choose, rsync proves to be one of the most efficient and versatile backup tools to date, and even a simple rsync script can save you from losing your data.

Korbin Brown is an IT enthusiast with a passion for writing. He enjoys troubleshooting complex Windows, Linux, and networking issues and sharing his experiences with fellow geeks.

  • Published 02/5/13

Comments (15)

  1. billymort

    Superb article. Keep up with the good work!

  2. Cal

    Excellent! I have been looking forward to a good linux how-to. also nice that you include directions for both Red Hat and Debian based systems. Rsync is amazing. I use its derivative rsnapshot for local and remote backups.
    This is a great beginner tutorial, I look forward to the next intermediate tutorial.

  3. Grant

    In addition to being useful for backups, it is also very useful for poor quality connections. Not only does it compress before copying, but if the copy is partially done and fails, run it again, and it will pick up where it left off, finishing the job nicely.

  4. tecn0tarded

    more…

  5. Dave

    Just the guide i was looking for, thanks and keep up the good work!

  6. marty331

    Is there a way to sync the two directories, so say Directory 1 has file A and Directory 2 has file B, can you set it to sync where you would end up with Directory 1 having A & B and Directory 2 having A & B?

  7. Al Reid

    Can you do a tutorial on Cygwin and some of the benefits and features on Windows ?.

    I have used Deltacopy that does not rely on Cygwin for Windows rsync
    http://www.aboutmyip.com/AboutMyXApp/DeltaCopy.jsp

  8. John Higgins

    Brilliant guide, I am not too techno minded so I use backupbranch for my rsync backups because they setup everything for me.

  9. Ionut

    @marty331 you can use “unison” ( http://www.cis.upenn.edu/~bcpierce/unison/ ) in this case

  10. michel

    This is why linux is still for geeks. Command line? Where’s the gui?

  11. Kiko

    If you don’t want to learn command line (It’s actually fun for some) then use grsync its a simple, nice GUI front end. “In Linux there’s always an answer or someone will create one” that’s what makes it great.

  12. presence1960

    Nice article. I have been backing up my data (including NTFS partitions) with rsync for 5 years now. But when I first started using Linux I used grsync, the GUI of rsync, until I became more comfortable with the command line. The terminal in Linux and command prompt in Windows are well worth the effort of learning because you can fix any machine and get it running with the terminal and command prompt. Most people are afraid of it and say it is confusing, but it is like anything else new. It is just different or unfamiliar.

  13. SGeek

    Rsnapshot is a neat ‘layer’ on top of rsync. First sync is normal, then each successive sync creates links to the prior ones, saving space for unchanged files. Acts like Apple’s “Time Machine” or related. Be careful when working across file systems, as not all targets support hard links (FAT32 – a standard of USB and external drives, for one).

    It’s even available on Cygwin, so you can do Windows machines to Windows, or to an external Linux machine.

  14. SGeek

    I once rigged up a small script for my dad’s Windows machine to run as a scheduled task – every four hours it would attempt to back up his profile directories (appdata, my documents, desktop, music, pictures) to my linux VM across the internet. The script would first kill any prior instances of rsync, then start a new one. I eventually moved to a command-line instance of WinSCP, as it had better controls for me.

  15. TheFu

    rsync is fantastic when you want a mirror, but terrible if you want efficient backups.

    Backups need to be
    * automatic
    * versioned
    * on different media, far away
    * network efficient
    * storage efficient
    * encrypted in transit AND on disk

    rdiff-backup is my tool of choice for Linux backups. rdiff-backup uses librsync just like rsync does. In fact the rdiff-backup command options are very similar to the rsync options. A simple rdiff-backup command would be:

    $ rdiff-backup user@remote::/home/user /backups/$remote/

    Simple. Best of all, the current backup looks like a mirror, but the prior versions are highly compressed and store only the differences. Keeping 30-60 versions takes only 10-15% more storage on my systems. Highly efficient. More efficient than rsync with hardlinks (like rbackup uses).

    These days on GUI Linux systems, the backup commands are usually DejaDup or Duplicity or Duplicati. These are each great tools, but I really have a strong preference for tools that don’t make me hunt down some special program when I need to restore a single file or an either OS. rdiff-backup works for me.
