The Turnstone's Bill

Tricks With Rsync Filter Rules

The rsync program is one of the most powerful, and most complex, Unix tools. Its man page describes well over 100 command-line options as well as a fully fledged file filter syntax. Basic usage of the program isn’t too complex, but by taking the time to understand some of rsync’s lesser-known options, and especially its file filtering syntax, it’s possible to achieve almost complete control over which files are synced and how. In this post I’m going to outline a couple of examples of neat things that can be achieved with rsync filter rules. I won’t be going over the basics of filter rule syntax though, because that’s really well described in the rsync man page itself.

Include only files matching a specific pattern

Given that this seems like a common use case you’d think it would be easy, but it turns out that some rsync filtering magic is needed to make this happen. Remembering that the order of filter arguments is important, we could include only .html and .png files in a recursive transfer with the following additional options (in order from first to last).

--include='*.html'
--include='*.png'
--filter='-! */'

The first two include options are pretty self-explanatory, and are needed to explicitly include the files we want. By default, however, all the other files (the ones we don’t want) will be included anyway, so the last option is designed to explicitly exclude those. The last option does more than just exclude though, so let’s unpick it. The - at the start of the rule specifies that it is an exclude rule. The ! means only apply the exclude rule to files that don’t match the pattern, and the pattern */ matches all directories. In other words, this last rule will exclude any file that is not a directory. We need to avoid excluding the directories because if those aren’t included rsync won’t even scan them to look for the files we actually want synced (e.g. the .html and .png files).

And finally, to keep things clean and avoid copying unnecessary directories, we prune empty directories from the transfer by adding the --prune-empty-dirs (-m) option.


Better excludes with the “perishable” modifier

Recent versions of rsync allow filter rules to be flagged with a perishable modifier. The rsync man page describes this modifier as follows:

A p indicates that a rule is perishable, meaning that it is ignored in directories that are being deleted

This is particularly useful in cases where you would like to ignore the presence of a file unless that file happens to be the last remaining file in a directory (and would otherwise block deletion of that directory). A great example of this is the .DS_Store file, which is only used to tell the OS X Finder how to display a directory. Usually when rsyncing between folders on two different Macs you would want to ignore .DS_Store files, perhaps with an exclude filter, since these files are only useful on the computer they were generated on. This becomes a problem though when paired with the --delete option, because excluded .DS_Store files will block deletion of otherwise empty directories and generate unpleasant rsync warnings like cannot delete non-empty directory.

With the perishable modifier this issue can be solved in an elegant way by adding a filter rule for .DS_Store files like this:

--filter='-p .DS_Store'

which tells rsync to exclude .DS_Store files, except inside a directory that is itself being deleted. In that case the rule is ignored, so a leftover .DS_Store file no longer blocks removal of the directory.

Beware of hide rules and “delete”

This one is just a cautionary note, because failure to understand the difference between rsync’s hide and exclude rules could lead to unwanted deletion of files. OK, so what is the difference between hide and exclude? From rsync’s point of view an excluded part of the file tree is something it should not examine or touch in any way, whereas a hidden part of the file tree behaves as if those files were not present on the sending side at all. These might sound like the same thing, but when deletion comes into play they are quite different. For example the rsync command:

rsync -r --delete-after --filter='H /some/dir' left/ right/

will delete all files under right/some/dir, if any exist, regardless of whether equivalent files exist in left/some/dir. Another way to understand hide filters is to note that a combination of hide and protect (see below) is equivalent to exclude. So, for example, the following are equivalent:

#Exclude rule
--filter='- /some/dir'

#Combined Hide and protect
--filter='H /some/dir' --filter='P /some/dir'