Adam Oudad

(Machine) Learning log.

Backing up data is a wise choice, but it can become a hassle for large amounts of data.

Enter rsync. By reading this article, you will be able to synchronize and back up your data with rsync within one minute!

What is rsync

rsync stands for remote synchronization. It is a tool written in C that lets you synchronize your data seamlessly, either between two local folders or with a remote machine, over a remote shell such as SSH or through an rsync daemon.

How to synchronize your data with rsync

This is quite simple: all you need to know are three options and a few details to keep in mind.

-a
is the option for archive mode. It is a shorthand for using many handy options at the same time (-rlptgoD). In a nutshell, it

  • recursively synchronizes subdirectories as well,
  • copies symbolic links as symbolic links, keeping the path they point to,
  • keeps the same user, group and other permissions of each file,
  • keeps the same modification times,
  • keeps the same owner for each file (this requires super-user rights on the receiving side),
  • keeps the same group for each file,
  • preserves device files and special files.
-v
is the option for verbose, which displays information about what the command is doing and which files it has processed. This is ideal for keeping track of the current status of the command on a directory with many files.
-P
is a shorthand for --partial --progress. The first keeps partially transferred files, so that an interrupted transfer can pick up where it left off; the second displays the progress of the file currently being processed. When copying large files, this option shows the status in real time and keeps you up to date with the transfer.
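
To make the shorthand concrete, here is a minimal sketch that runs the short flags and their long-form spelling side by side. It assumes rsync is installed and works in a throwaway directory created with mktemp; the folder names out_short and out_long are made up for this illustration.

```shell
#!/bin/sh
# Sketch: -avP and its long-form spelling should produce the same copy.
# All paths are hypothetical and live in a temporary directory.
set -eu
command -v rsync >/dev/null 2>&1 || { echo "rsync not found, skipping" >&2; exit 0; }
work=$(mktemp -d)
cd "$work"
mkdir -p dir1/sub out_short out_long
echo "hello" > dir1/sub/hello.txt

# Short form, as used throughout this article.
rsync -avP ./dir1 out_short/

# Same options spelled out: --archive is -a, --verbose is -v,
# and --partial --progress together are -P.
rsync --archive --verbose --partial --progress ./dir1 out_long/

# Both destinations should now hold identical trees.
diff -r out_short out_long && echo "identical"
```

The long form is handy in scripts, where readability matters more than typing speed.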

Here is an example where we copy dir1 from the current folder into /path/to/dir2.

  rsync -avP ./dir1 /path/to/dir2/

The result of the above command is a folder /path/to/dir2/dir1 with the same content as dir1 in the current working directory of your command line.

Heads up on folder paths!

The first detail I would like to note here is that rsync cares about whether you write the source folder as dir1 or with a trailing slash, as dir1/. The trailing slash on the destination makes no difference. If we change the previous command by adding a trailing slash to the source,

  rsync -avP ./dir1/ /path/to/dir2

The command will copy all of the content of dir1 into /path/to/dir2. It is important to note that, in this case, a /path/to/dir2/dir1 folder will NOT be created.
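
The difference is easy to check for yourself. The following is a small sketch, assuming rsync is installed; it works in a temporary directory, and the destination names no_slash and with_slash are invented for the comparison.

```shell
#!/bin/sh
# Sketch: compare a source path without and with a trailing slash.
# All paths are hypothetical and live in a temporary directory.
set -eu
command -v rsync >/dev/null 2>&1 || { echo "rsync not found, skipping" >&2; exit 0; }
work=$(mktemp -d)
cd "$work"
mkdir -p dir1 no_slash with_slash
echo "hello" > dir1/hello.txt

rsync -a ./dir1  no_slash     # copies the folder itself
rsync -a ./dir1/ with_slash   # copies only the folder's contents

find no_slash with_slash -type f
# expected listing:
#   no_slash/dir1/hello.txt
#   with_slash/hello.txt
```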

Be careful if you already have files in /path/to/dir2 before using the previous command: by default, rsync overwrites any destination file that has the same name as a source file but a different size or modification time, even when the destination copy is the more recent one!
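
One way to stay safe is to preview a transfer before running it for real. The sketch below, assuming rsync is installed, uses the -n (--dry-run) and -i (--itemize-changes) options in a temporary directory; the file contents are made up for the demonstration.

```shell
#!/bin/sh
# Sketch: preview what rsync would change, without writing anything.
# All paths are hypothetical and live in a temporary directory.
set -eu
command -v rsync >/dev/null 2>&1 || { echo "rsync not found, skipping" >&2; exit 0; }
work=$(mktemp -d)
cd "$work"
mkdir -p dir1 dir2
echo "from source" > dir1/hello.txt
echo "already in destination" > dir2/hello.txt

# -n (--dry-run) performs a trial run with no changes made;
# -i (--itemize-changes) prints one line per item that would be updated.
preview=$(rsync -ain ./dir1/ dir2)
echo "$preview"

# The destination file is still untouched after the dry run.
cat dir2/hello.txt
```

If hello.txt shows up in the preview, the real command would replace the destination copy.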

Careful when modifying both source and destination folders

The second detail to take away arises when one wants to synchronize two different versions of the same file.

Say dir1/hello.txt and dir2/dir1/hello.txt are two files with different contents.

If you use the command I showed above, rsync -avP ./dir1 /path/to/dir2, dir1/hello.txt will overwrite the destination file /path/to/dir2/dir1/hello.txt.

There are two ways to keep the destination file, if it differs from the source file.

  1. If you want to keep the destination file only when its modification date is more recent than the source file, use the --update option.
  2. If you want to keep the destination file anyway, use the --ignore-existing option.

Now that you've made it this far, here is an exercise.

Say the source file ./dir1/hello.txt was last modified 4 days ago, while the destination file /path/to/dir2/dir1/hello.txt was last modified yesterday. What is the result of each of the three following commands?

  rsync -avP ./dir1 /path/to/dir2                    # 1.
  rsync -avP --update ./dir1 /path/to/dir2           # 2.
  rsync -avP --ignore-existing ./dir1 /path/to/dir2  # 3.

I'll give you a minute…

Alright, the solution.

  1. Source file overwrites destination. Indeed, the basic command just updates the destination with the source.
  2. Destination remains unchanged. Due to the --update option, the command checks the modification times. The destination file has the most recent modification time, so it is preserved.
  3. Destination remains unchanged. The command simply ignores the source file, since it already exists in the destination, regardless of its modification time.
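
If you would rather verify the answers than take my word for it, here is a sketch that replays the exercise in a temporary directory. It assumes rsync is installed; the fixed timestamps set with touch -t simply stand in for "4 days ago" and "yesterday", and the helper setup_files is a name made up for this script.

```shell
#!/bin/sh
# Sketch: replay the three commands of the exercise on throwaway files.
# All paths are hypothetical and live in a temporary directory.
set -eu
command -v rsync >/dev/null 2>&1 || { echo "rsync not found, skipping" >&2; exit 0; }
work=$(mktemp -d)
cd "$work"

# Recreate the starting state: an older source and a newer destination.
setup_files() {
    rm -rf dir1 dir2
    mkdir -p dir1 dir2/dir1
    echo "source version" > dir1/hello.txt
    echo "destination version" > dir2/dir1/hello.txt
    touch -t 202401010000 dir1/hello.txt        # source: 1 Jan 2024
    touch -t 202401040000 dir2/dir1/hello.txt   # destination: 4 Jan 2024
}

setup_files
rsync -a ./dir1 dir2
plain=$(cat dir2/dir1/hello.txt)

setup_files
rsync -a --update ./dir1 dir2
with_update=$(cat dir2/dir1/hello.txt)

setup_files
rsync -a --ignore-existing ./dir1 dir2
with_ignore=$(cat dir2/dir1/hello.txt)

echo "1. plain:             $plain"
echo "2. --update:          $with_update"
echo "3. --ignore-existing: $with_ignore"
```

The first line should report the source version, and the other two the destination version.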

Conclusion

As a rule of thumb, I suggest you stick to the practice of omitting the trailing slash at the end of the path to the source folder dir1.

  rsync -avP ./dir1 /path/to/dir2

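What if you want the copy of dir1 to have another name, dir1new? You should be able to do it on your own. ;)

Ok, here is the solution.

  rsync -avP ./dir1/ /path/to/dir2/dir1new

This is the one case where the trailing slash on the source is needed: without it, rsync would create /path/to/dir2/dir1new/dir1 instead of putting the content of dir1 directly into dir1new.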
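
To check the renaming behavior safely, here is a short sketch in a temporary directory, assuming rsync is installed. It uses a trailing slash on the source so that dir1new receives the content of dir1 rather than a nested dir1 folder.

```shell
#!/bin/sh
# Sketch: copy dir1 under a new name, dir1new.
# All paths are hypothetical and live in a temporary directory.
set -eu
command -v rsync >/dev/null 2>&1 || { echo "rsync not found, skipping" >&2; exit 0; }
work=$(mktemp -d)
cd "$work"
mkdir -p dir1 dir2
echo "hello" > dir1/hello.txt

# Trailing slash on the source: the content of dir1 lands directly in
# dir1new, which rsync creates because its parent dir2 already exists.
rsync -a ./dir1/ dir2/dir1new

find dir2 -type f
# expected listing:
#   dir2/dir1new/hello.txt
```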