Distributing cron jobs efficiently
When you have many servers executing the same cron job, it's usually a good idea not to run them all at the same time. If all the jobs access a common server (for example, when running backups), it may put too much load on that server, and even if they don't, all the servers will be busy at the same time, which may affect their capacity to provide other services.
As usual, Puppet can help; this time, using the inline_template function to calculate a unique time for each job.
How to do it...
Here's how to have Puppet schedule the same job at a different time for each machine:
- Modify your site.pp file as follows:

    node 'cookbook' {
      cron { 'run-backup':
        ensure  => present,
        command => '/usr/local/bin/backup',
        hour    => inline_template('<%= @hostname.sum % 24 %>'),
        minute  => '00',
      }
    }
- Run Puppet:
    [root@cookbook ~]# puppet agent -t
    Info: Caching catalog for cookbook.example.com
    Info: Applying configuration version '1413730771'
    Notice: /Stage[main]/Main/Node[cookbook]/Cron[run-backup]/ensure: created
    Notice: Finished catalog run in 0.11 seconds
- Run crontab -l to see how the job has been configured:

    [root@cookbook ~]# crontab -l
    # HEADER: This file was autogenerated at Sun Oct 19 10:59:32 -0400 2014 by puppet.
    # HEADER: While it can still be managed manually, it is definitely not recommended.
    # HEADER: Note particularly that the comments starting with 'Puppet Name' should
    # HEADER: not be deleted, as doing so could cause duplicate cron jobs.
    # Puppet Name: run-backup
    0 15 * * * /usr/local/bin/backup
How it works...
We want to distribute the hour at which the cron job runs across all our nodes. We choose something that is unique to each machine and convert it to a number: the values will then be spread across the nodes, but each node's value will stay the same from run to run.
We can do the conversion using Ruby's sum method, which computes a numerical value from a string that is unique to the machine (in this case, the machine's hostname). The sum method generates a large integer (in the case of the string cookbook, the sum is 855), and we want values for hour between 0 and 23, so we use Ruby's % (modulo) operator to restrict the result to this range. We should get a reasonably good (though not statistically uniform) distribution of values, depending on your hostnames. Another option here is to use the fqdn_rand() function, which works in much the same way as our example.
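For comparison, here is a minimal sketch of the same resource using fqdn_rand(), which hashes the node's fqdn (optionally combined with a seed string) to give a repeatable number between 0 and one less than the limit you pass in:

    cron { 'run-backup':
      ensure  => present,
      command => '/usr/local/bin/backup',
      hour    => fqdn_rand(24),
      minute  => '00',
    }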
If all your machines have the same name (it does happen), don't expect this trick to work! In this case, you can use some other string that is unique to the machine, such as ipaddress or fqdn.
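For instance, a sketch of the same technique keyed on the fqdn fact instead of the hostname (ipaddress works the same way):

    cron { 'run-backup':
      ensure  => present,
      command => '/usr/local/bin/backup',
      hour    => inline_template('<%= @fqdn.sum % 24 %>'),
      minute  => '00',
    }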
There's more...
If you have several cron jobs per machine and you want to run them a certain number of hours apart, add this number to the hostname.sum value before taking the modulus. Let's say we want to run the dump_database job at some arbitrary time and the run_backup job an hour later; this can be done using the following code snippet:
    cron { 'dump-database':
      ensure  => present,
      command => '/usr/local/bin/dump_database',
      hour    => inline_template('<%= @hostname.sum % 24 %>'),
      minute  => '00',
    }

    cron { 'run-backup':
      ensure  => present,
      command => '/usr/local/bin/backup',
      hour    => inline_template('<%= (@hostname.sum + 1) % 24 %>'),
      minute  => '00',
    }
The two jobs will end up with different hour values on each machine Puppet runs on, but run_backup will always be one hour after dump_database.
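For example, on our cookbook node, where the hostname's sum is 855, dump-database is scheduled for hour 855 % 24 = 15 and run-backup for hour (855 + 1) % 24 = 16.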
Most cron implementations have directories for hourly, daily, weekly, and monthly tasks. The directories /etc/cron.hourly, /etc/cron.daily, /etc/cron.weekly, and /etc/cron.monthly exist on both our Debian and Enterprise Linux machines. These directories hold executables, which are run on the referenced schedule (hourly, daily, weekly, or monthly). I find it better to describe all the jobs in these folders and push the jobs as file resources; an admin on the box searching for your script will be able to find it with grep in these directories. To use the same trick here, we push a cron task into /etc/cron.hourly and have the script itself check whether the current hour is the one it should run in. To create the cron jobs using the cron directories, follow these steps:
- First, create a cron class in modules/cron/init.pp:

    class cron {
      file { '/etc/cron.hourly/run-backup':
        content => template('cron/run-backup'),
        mode    => '0755',
      }
    }
- Include the cron class in your cookbook node in site.pp:

    node 'cookbook' {
      include cron
    }
- Create a template, modules/cron/templates/run-backup, to hold the cron task:

    #!/bin/bash
    runhour=<%= @hostname.sum % 24 %>
    hour=$(date +%H)
    if [ "$runhour" -ne "$hour" ]; then
      exit 0
    fi

    echo run-backup
- Then, run Puppet:
    [root@cookbook ~]# puppet agent -t
    Info: Caching catalog for cookbook.example.com
    Info: Applying configuration version '1413732254'
    Notice: /Stage[main]/Cron/File[/etc/cron.hourly/run-backup]/ensure: defined content as '{md5}5e50a7b586ce774df23301ee72904dda'
    Notice: Finished catalog run in 0.11 seconds
- Verify that the deployed script contains the same value we calculated before, 15:

    #!/bin/bash
    runhour=15
    hour=$(date +%H)
    if [ "$runhour" -ne "$hour" ]; then
      exit 0
    fi

    echo run-backup
Now, this job will run every hour, but the rest of the script will only run when the hour returned by $(date +%H) is equal to 15. In a large organization, creating your cron jobs as file resources makes it easier for your fellow administrators to find them. When you have a very large number of machines, it can be advantageous to add another random wait at the beginning of your job. To do this, add the following just before the echo run-backup line:
    MAXWAIT=600
    sleep $((RANDOM%MAXWAIT))
This will sleep a maximum of 600 seconds, but will sleep a different amount each time it runs (assuming your random number generator is working). This sort of random wait is useful when you have thousands of machines all running the same task and you need to stagger the runs as much as possible.
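Putting the pieces together, the complete template might look something like this (the 600-second cap is just an example value; adjust MAXWAIT to suit):

    #!/bin/bash
    runhour=<%= @hostname.sum % 24 %>
    hour=$(date +%H)
    if [ "$runhour" -ne "$hour" ]; then
      exit 0
    fi

    # Stagger the start by a random delay of up to MAXWAIT seconds
    MAXWAIT=600
    sleep $((RANDOM%MAXWAIT))

    echo run-backup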
See also
- The Running Puppet from cron recipe in Chapter 2, Puppet Infrastructure