File synchronisation that just works
If you’re a System Admin or Engineer/Architect, chances are you’ve spent many hours trawling the internet looking for a reliable and stable bidirectional file synchronisation tool that just works.
Before we dig deeper: there are many pros and cons to syncing files bidirectionally with software such as rsync, lsyncd or Unison, compared with setting up a SAN (which, I might add, is not cheap!) or a DFS (Distributed File System) or similar replicated storage such as DRBD, Gluster or, on Windows, the classic MSDFS. This article is not aimed at those heavyweight use cases. It is aimed at robust setups that require minimal maintenance and high availability/scalability (for example, my medium-sized WordPress blog, an online community forum, or the configuration of my server farm).
Csync2?
So, where does csync2 (short for “Cluster Synchronisation tool, 2nd Generation”) fit in? Well, the answer is as simple or as complex as you choose to make it. In simple terms, it’s highly customisable and it actually does what it’s supposed to do. Csync2 is a tool for asynchronous file synchronisation in clusters. It has been around for a long time and is built on librsync. It is stable, and it isn’t going anywhere any time soon.
Asynchronous file synchronisation suits files that are seldom modified, such as configuration files or application images, and this is what csync2 does best. Note that csync2 is not an adequate choice for every type of data. For instance, a database with continuous write access should be synced synchronously to ensure data integrity. That does not mean synchronous synchronisation is better. It is simply different, and there are many cases where asynchronous synchronisation is preferred.
Let’s take a quick look at what LINBIT have to say about Csync2:
Csync2 is a cluster synchronisation tool. It can be used to keep files on multiple hosts in a cluster in sync. Csync2 can handle complex setups with much more than just 2 hosts, handle file deletions and can detect conflicts.
It is expedient for HA-clusters, HPC-clusters, COWs and server farms. If you are looking for a tool to sync your laptop with your workstation you better have a look at Unison (http://www.cis.upenn.edu/~bcpierce/unison/) too.
Diving right in
Csync2 is made up of two main components: the sync client (the csync2 command you run to start a sync) and the sync daemon. The daemon runs in the background at all times, waiting for other nodes to connect to it (just like the rsync daemon).
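Our install script takes care of the daemon side for you, but it helps to see what that looks like. Under xinetd, each incoming connection on port 30865 starts a fresh `csync2 -i` process (`-i` is csync2's inetd mode). The sketch below writes an xinetd service definition similar to the example file that ships with the csync2 source. It is not necessarily the exact file our script installs. The helper name `write_csync2_xinetd` and the default binary path `/usr/local/sbin/csync2` are assumptions, so adjust them to match your build.

```shell
#!/usr/bin/env bash
# Sketch: write an xinetd service definition for the csync2 daemon.
# ASSUMPTIONS: write_csync2_xinetd is a hypothetical helper, and the default
# binary path /usr/local/sbin/csync2 may differ from where your build put it.
write_csync2_xinetd() {
  local dir="${1:-/etc/xinetd.d}"            # where xinetd reads service files
  local bin="${2:-/usr/local/sbin/csync2}"   # path to the csync2 binary (assumed)
  mkdir -p "$dir"
  # server_args -i: xinetd accepts the TCP connection on 30865 and hands it
  # to a new csync2 process running in inetd mode.
  # type UNLISTED + explicit port: no /etc/services entry is needed.
  cat > "$dir/csync2" <<EOF
service csync2
{
    flags          = REUSE
    socket_type    = stream
    wait           = no
    user           = root
    group          = root
    server         = $bin
    server_args    = -i
    port           = 30865
    type           = UNLISTED
    log_on_failure += USERID
    disable        = no
}
EOF
}

# Example (as root): write_csync2_xinetd && service xinetd reload
```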
So what’s required to get going? Well, almost everything you need is covered in the thorough install document at http://oss.linbit.com/csync2/paper.pdf (thanks, Clifford Wolf!). There is one catch, however: it may not be easy to follow for Amazon Linux, CentOS, RHEL and other RPM-based users without a technical background.
To make things easier, we’ve put together a simple build script (see the bottom of the page for the actual script) that does all the hard work, leaving you with just one job: configure and use it.
To get started, run the following on your Amazon Linux or RHEL/CentOS box (you might want to make a cup of coffee while csync2 compiles; it takes a while):
sudo -s
wget http://autobuild.itoc.com.au/csync2/csync2-build-and-install.sh
chmod +x csync2-build-and-install.sh
./csync2-build-and-install.sh
Configuring csync2

Once the script has done its work, you should be left with a window that looks something like the following:
Create yourself a preshared key file by running
csync2 -k /etc/csync-production-group.key
and copy this file to all the other machines you plan to synchronise with.
By default, csync2 reads its configuration from the “/etc/csync2.cfg” file. The default file we have bundled for you is a little different from the standard out-of-the-box install, so here’s a copy of what should be installed, along with some things to note:
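Copying the key to each node by hand gets tedious as the cluster grows. Here is a minimal sketch of a hypothetical helper, `distribute_csync2_key`. It assumes you have root SSH access to every node and want the key at the same path on each one. It restricts the key to root and then pushes it to each peer with scp:

```shell
#!/usr/bin/env bash
# Sketch: copy a csync2 preshared key to a list of peers.
# ASSUMPTIONS: distribute_csync2_key is a hypothetical helper; it relies on
# root SSH access to every host and uses the same key path remotely.
distribute_csync2_key() {
  local key="$1"; shift
  if [ ! -f "$key" ]; then
    echo "key file $key not found (create it with: csync2 -k $key)" >&2
    return 1
  fi
  chmod 600 "$key"    # the key is a shared secret, keep it root-only
  local host
  for host in "$@"; do
    # -p preserves the 600 mode on the remote copy
    scp -p "$key" "root@${host}:${key}" || return 1
  done
}

# Example: distribute_csync2_key /etc/csync-production-group.key wsinl2-01 wsinl1-02 wsinl2-02
```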
# Csync2 Configuration File
# Preconfigured using the csync2 config and install script from ITOC Autobuild service
# -----------------------------------
#
# Please read the doco at:
# http://oss.linbit.com/csync2/paper.pdf
nossl * *;
group production {
    host wsinl1-01;
    host wsinl2-01;
    host (wsinl1-02);
    host (wsinl2-02);
    key /etc/csync-production-group.key;
    include /etc/hosts;
    include /etc/httpd/conf/;
    # include /data/httpd/;
    # exclude *~ .*; ## don't sync backup files (*~) or files starting with a dot (.)
    exclude *.log;
    action {
        pattern /etc/httpd/conf/httpd.conf;
        exec "/etc/init.d/httpd reload";
        logfile "/data/logs/csync2/csync2_action.log";
        do-local;
    }
    backup-directory /data/sync-conflicts/;
    backup-generations 2;
    auto younger;
}
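Csync2 works out which groups apply to a machine by looking for that machine's hostname among the host entries. A quick sanity check before the first run is therefore worthwhile. The sketch below uses two hypothetical helpers, `csync2_hosts` and `csync2_is_member`. Its deliberately simple parser only understands `host` lines like the ones above. It lists each host as a peer or a bracketed slave and checks whether a given hostname (by default the local one) appears:

```shell
#!/usr/bin/env bash
# Sketch: inspect the host entries of a csync2 config.
# ASSUMPTIONS: csync2_hosts / csync2_is_member are hypothetical helpers, and
# the parser only handles simple "host name;" / "host (name);" lines.

# Print "peer NAME" or "slave NAME" for every host entry in the config.
csync2_hosts() {
  awk '
    $1 == "host" {
      for (i = 2; i <= NF; i++) {
        name = $i
        sub(/;$/, "", name)          # drop the trailing semicolon
        if (name == "") continue
        if (name ~ /^\(.*\)$/) {     # bracketed: only receives updates
          gsub(/[()]/, "", name)
          print "slave " name
        } else {
          print "peer " name
        }
      }
    }' "$1"
}

# Succeed if NAME (default: this machine's hostname) is listed in the config.
csync2_is_member() {
  local cfg="$1" me="${2:-$(hostname)}"
  csync2_hosts "$cfg" | awk -v me="$me" '$2 == me { found = 1 } END { exit (found ? 0 : 1) }'
}

# Example: csync2_hosts /etc/csync2.cfg
#          csync2_is_member /etc/csync2.cfg || echo "this host is not listed"
```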
- nossl
  - This is fairly self-explanatory. We’re not running csync2 over SSL because all our nodes sit inside our VPC, and running it over SSL creates a bigger memory footprint. The two * patterns disable SSL between any pair of hosts. (If you want to use SSL, you may need to recompile csync2 with SSL support; our script installs openssl-devel so you can do this if required.)
- group
  - In our example we have called the group “production”. In a real-world setup you can define multiple groups, each with its own key. For example, one group could sync common configuration across all servers while another syncs certain files only among a particular set of servers.
- host
  - Some hosts are listed in brackets ( ) and some are not. Hosts in brackets are “slaves”: they only receive updates and are never read from or compared against. Files on hosts listed without brackets are compared with each other, and conflicts are resolved using whatever method you set (here, auto younger;).
- key
  - The preshared key, which must be present on every node you wish to sync with. You can use multiple keys in your configuration, but only one key per group (see group above).
- include/exclude
  - Also fairly self-explanatory: include adds files or directories to the group, and exclude removes matching paths from it. The examples above should help paint a better picture.
- action
  - Action sections specify shell commands to run when a synchronised file matches a pattern you have specified.
  - In the example above, Apache reloads its configuration whenever httpd.conf is updated (cool, huh!). do-local makes the action run on the node where the change was made as well, not only on the nodes receiving it.
  - If no logfile is specified, the command output goes to /dev/null, so it’s important to set one when testing or debugging.
- backup-directory/backup-generations
  - backup-directory is where csync2 keeps copies of files before overwriting them, and backup-generations sets how many old versions are kept. Make sure the directory exists on every node before running csync2 with this configured, or you will run into all sorts of problems.
- auto
  - This sets automatic conflict resolution. With younger, the newer copy of a file overwrites the older one. Other methods you can specify here are none, first, older, bigger, smaller, left and right. Consult the csync2 documentation carefully before setting this option. If you are unsure exactly what your auto-resolution is doing, you may find yourself cleaning up a rather large mess.
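The backup directory must exist before csync2 runs. An action's logfile also cannot be written if its parent directory is missing. The minimal sketch below reads both settings from the config and creates the directories. It uses hypothetical helpers `csync2_required_dirs` and `csync2_prepare_dirs` and assumes paths without spaces and one statement per line, as in our example config:

```shell
#!/usr/bin/env bash
# Sketch: create the directories a csync2 config expects to exist.
# ASSUMPTIONS: csync2_required_dirs / csync2_prepare_dirs are hypothetical
# helpers; paths must not contain spaces and each statement is on its own line.

# Print the backup-directory and the parent directory of every action logfile.
csync2_required_dirs() {
  awk '
    $1 == "backup-directory" { d = $2; sub(/;$/, "", d); print d }
    $1 == "logfile" {
      f = $2
      sub(/;$/, "", f)           # drop the trailing semicolon
      gsub(/"/, "", f)           # drop the surrounding quotes
      sub(/\/[^\/]*$/, "", f)    # keep only the directory part
      print f
    }' "$1"
}

csync2_prepare_dirs() {
  local dir
  csync2_required_dirs "$1" | while IFS= read -r dir; do
    if [ -n "$dir" ]; then
      mkdir -p "$dir" && echo "ok: $dir"
    fi
  done
}

# Example (as root, on every node): csync2_prepare_dirs /etc/csync2.cfg
```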
Firing it up!
Once you’re happy with your configuration and have set it up on each server, you should be able to do a few test runs.
Our install script sets up csync2 as an xinetd service, so to check that it’s all working, simply run the following:
netstat -vatn | grep 30865
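You should get back something along the lines of “tcp 30865 LISTEN”, which is good: the daemon is waiting for connections.
You should then be able to run “csync2 -x -v” manually on any server to trigger a complete sync process (-x checks for changes and updates the other hosts; -v makes the output verbose).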
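To script that check (for example, in a monitoring job), the sketch below wraps it in a hypothetical helper, `csync2_listening`. It uses `netstat -ltn` (listening TCP sockets, numeric output) instead of `-vatn`. It assumes the classic net-tools column layout and only counts a socket when the local address ends in `:30865` and the state is LISTEN:

```shell
#!/usr/bin/env bash
# Sketch: succeed if something is listening on the csync2 port.
# ASSUMPTION: csync2_listening is a hypothetical helper that parses the
# net-tools netstat layout (local address in column 4, state in column 6).
csync2_listening() {
  local port="${1:-30865}"    # csync2's default port
  netstat -ltn 2>/dev/null | awk -v want=":$port" '
    # compare the port as a suffix so :2222 is not mistaken for :222
    $6 == "LISTEN" && substr($4, length($4) - length(want) + 1) == want { found = 1 }
    END { exit (found ? 0 : 1) }'
}

# Example: csync2_listening && echo "csync2 daemon is listening" || echo "check xinetd"
```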
There are several ways to trigger a file sync. We’ve installed an example cron job in /etc/crontab.
(Make sure you comment it out if you are not syncing via cron!)
You may prefer to trigger a sync with inotifywait, or with an inotify tool such as incrond (http://inotify.aiken.cz/). This inotify-based approach is our preferred method, and at the time of writing it works flawlessly for us on our Amazon Linux 2012.03 AMI release.
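One thing to watch with cron is overlap: if a sync takes longer than the cron interval, runs can pile up on top of each other. Here is a minimal sketch of a wrapper that skips a run while the previous one is still active, using mkdir as an atomic lock. The helper name `csync2_run_locked`, the default lock path and the `CSYNC2` override variable are all assumptions:

```shell
#!/usr/bin/env bash
# Sketch: run "csync2 -x" from cron, skipping the run if the previous one
# is still going. ASSUMPTIONS: csync2_run_locked, the default lock path and
# the CSYNC2 variable (path to the binary) are hypothetical.
csync2_run_locked() (
  lockdir="${1:-/var/run/csync2-cron.lock}"
  # mkdir is atomic: only one run can create the lock directory
  if ! mkdir "$lockdir" 2>/dev/null; then
    echo "previous csync2 run still active, skipping" >&2
    exit 0
  fi
  trap 'rmdir "$lockdir"' EXIT    # release the lock when this subshell exits
  "${CSYNC2:-csync2}" -x          # check for changes and update the other hosts
)

# Example /etc/crontab entry for a hypothetical script that defines and calls
# csync2_run_locked (cron's PATH is minimal, so give the full binary path):
#   */5 * * * * root CSYNC2=/usr/local/sbin/csync2 /usr/local/bin/csync2-cron.sh
```

If a run is killed outright (for example with SIGKILL), the lock directory is left behind and has to be removed by hand. flock(1) from util-linux avoids that if you prefer it.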
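For the inotifywait route, a minimal sketch might look like the following. It assumes inotify-tools is installed and uses a hypothetical helper, `csync2_watch`, plus `CSYNC2_QUIET` and `CSYNC2` variables. A burst of changes is collapsed into a single `csync2 -x` once no new events have arrived for a couple of seconds. inotify only sees local changes, so each node needs its own watcher. Files that csync2 writes on a receiving node will also fire events there, which just triggers an extra sync that finds little or nothing to do.

```shell
#!/usr/bin/env bash
# Sketch: trigger csync2 whenever files under the given paths change.
# ASSUMPTIONS: inotify-tools is installed; csync2_watch, CSYNC2_QUIET and
# CSYNC2 (path to the binary) are hypothetical names.
csync2_watch() {
  local quiet="${CSYNC2_QUIET:-2}"    # seconds without events before syncing
  local path
  # -m: keep running, -r: recurse into directories, -q: no startup banner
  inotifywait -m -r -q -e close_write,create,delete,move --format '%w%f' "$@" |
  while IFS= read -r path; do
    # drain further events until the stream has been quiet for $quiet seconds,
    # so one burst of changes leads to a single sync
    while IFS= read -r -t "$quiet" path; do :; done
    "${CSYNC2:-csync2}" -x
  done
}

# Example (as root): csync2_watch /etc/hosts /etc/httpd/conf &
```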
In the real world?
Since our initial endeavours with csync2, we have deployed several installations, ranging from medium-sized blogs and forums to several large, well-known brands, synchronising everything from images and user-generated content to mass server configurations and fixes.
We’ve compiled csync2 for Windows (don’t ask why…) and successfully kept well over 100,000 files in bidirectional sync across four Windows boxes, taking only about 30-120 seconds to cross-check all files and sync updates on all nodes.
While this may not be the ultimate solution for everyone, it sure as hell beats complicated cron jobs and rsyncs, or setting up a full-blown DFS, just to keep some server configurations in check. For the effort involved (10 minutes of compile time and 10 minutes of configuration), it’s well worth checking out.

