DevOps: Puppet, Docker, and Kubernetes

Distributing cron jobs efficiently

When you have many servers executing the same cron job, it's usually a good idea not to run them all at the same time. If all the jobs access a common server (for example, when running backups), it may put too much load on that server, and even if they don't, all the servers will be busy at the same time, which may affect their capacity to provide other services.

As usual, Puppet can help; this time, using the inline_template function to calculate a unique time for each job.

How to do it...

Here's how to have Puppet schedule the same job at a different time for each machine:

  1. Modify your site.pp file as follows:
    node 'cookbook' {
      cron { 'run-backup':
        ensure  => present,
        command => '/usr/local/bin/backup',
        hour    => inline_template('<%= @hostname.sum % 24 %>'),
        minute  => '00',
      }
    }
  2. Run Puppet:
    [root@cookbook ~]# puppet agent -t
    Info: Caching catalog for cookbook.example.com
    Info: Applying configuration version '1413730771'
    Notice: /Stage[main]/Main/Node[cookbook]/Cron[run-backup]/ensure: created
    Notice: Finished catalog run in 0.11 seconds
    
  3. Run crontab to see how the job has been configured:
    [root@cookbook ~]# crontab -l
    # HEADER: This file was autogenerated at Sun Oct 19 10:59:32 -0400 2014 by puppet.
    # HEADER: While it can still be managed manually, it is definitely not recommended.
    # HEADER: Note particularly that the comments starting with 'Puppet Name' should
    # HEADER: not be deleted, as doing so could cause duplicate cron jobs.
    # Puppet Name: run-backup
    0 15 * * * /usr/local/bin/backup
    

How it works...

We want to distribute the hours at which the cron job runs across all our nodes. We choose something that is unique to each machine and convert it to a number. This way, the value differs from node to node but stays the same on any given node between Puppet runs.

We can do the conversion using Ruby's sum method, which computes a numerical value from a string that is unique to the machine (in this case, the machine's hostname). The sum method generates a fairly large integer (for the string cookbook, the sum is 855), and we want values for hour between 0 and 23, so we use Ruby's % (modulo) operator to restrict the result to this range. We should get a reasonably good (though not statistically uniform) distribution of values, depending on your hostnames. Another option here is to use the fqdn_rand() function, which works in much the same way as our example.
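You can verify the arithmetic yourself in irb; this is a quick sketch of what the ERB expression evaluates to for the hostname cookbook:

```ruby
# String#sum adds up the byte values of the characters in the string,
# so the result is stable for a given hostname across runs.
hostname = 'cookbook'
puts hostname.sum        # => 855
puts hostname.sum % 24   # => 15, the hour that appears in the crontab above
```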

If all your machines have the same hostname (it does happen), don't expect this trick to work! In that case, you can use some other string that is unique to the machine, such as the ipaddress or fqdn fact.
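For instance, two machines that share a hostname but live in different domains still get distinct hours if you sum the fqdn instead (the domain names below are invented for illustration):

```ruby
# Hypothetical fqdns for two machines that both have the hostname 'web01';
# the differing domain part is enough to produce different sums.
hour_a = 'web01.site-a.example.com'.sum % 24
hour_b = 'web01.site-b.example.com'.sum % 24
puts hour_a   # => 15
puts hour_b   # => 16
```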

There's more...

If you have several cron jobs per machine and you want to run them a certain number of hours apart, add this number to the hostname.sum value before taking the modulus. Let's say we want to run the dump_database job at some arbitrary time and the run_backup job an hour later; this can be done using the following code snippet:

cron { 'dump-database':
  ensure  => present,
  command => '/usr/local/bin/dump_database',
  hour    => inline_template('<%= @hostname.sum % 24 %>'),
  minute  => '00',
}

cron { 'run-backup':
  ensure  => present,
  command => '/usr/local/bin/backup',
  hour    => inline_template('<%= (@hostname.sum + 1) % 24 %>'),
  minute  => '00',
}

The two jobs will end up with different hour values for each machine Puppet runs on, but run_backup will always be one hour after dump_database.

Most cron implementations have directories for hourly, daily, weekly, and monthly tasks. The directories /etc/cron.hourly, /etc/cron.daily, /etc/cron.weekly, and /etc/cron.monthly exist on both our Debian and Enterprise Linux machines. These directories hold executables, which are run on the corresponding schedule (hourly, daily, weekly, or monthly). I find it better to describe all the jobs as file resources pushed into these directories; an admin on the box searching for your script can then find it with grep. To use the same trick here, we push a cron task into /etc/cron.hourly and have the script itself check whether the current hour is the right one before doing any work. To create the cron jobs using the cron directories, follow these steps:

  1. First, create a cron class in modules/cron/init.pp:
    class cron {
      file { '/etc/cron.hourly/run-backup':
        content => template('cron/run-backup'),
        mode    => '0755',
      }
    }
  2. Include the cron class in your cookbook node in site.pp:
    node 'cookbook' {
      include cron
    }
  3. Create a template to hold the cron task:
    #!/bin/bash
    
    runhour=<%= @hostname.sum % 24 %>
    hour=$(date +%H)
    if [ "$runhour" -ne "$hour" ]; then
      exit 0
    fi
    
    echo run-backup
  4. Then, run Puppet:
    [root@cookbook ~]# puppet agent -t
    Info: Caching catalog for cookbook.example.com
    Info: Applying configuration version '1413732254'
    Notice: /Stage[main]/Cron/File[/etc/cron.hourly/run-backup]/ensure: defined content as '{md5}5e50a7b586ce774df23301ee72904dda'
    Notice: Finished catalog run in 0.11 seconds
    
  5. Verify that the script has the same value we calculated before, 15:
    #!/bin/bash
    
    runhour=15
    hour=$(date +%H)
    if [ "$runhour" -ne "$hour" ]; then
      exit 0
    fi
    
    echo run-backup

Now, this job will run every hour, but the rest of the script will only execute when the hour returned by $(date +%H) is equal to 15. Creating your cron jobs as file resources in a large organization makes it easier for your fellow administrators to find them. When you have a very large number of machines, it can be advantageous to add a random wait at the beginning of your job. To do this, add the following just before the echo run-backup line:

MAXWAIT=600
sleep $((RANDOM%MAXWAIT))

This will sleep for at most 600 seconds, but for a different amount each time it runs (assuming your random number generator is working). This sort of random wait is useful when you have thousands of machines all running the same task and you need to stagger the runs as much as possible.
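As a quick sanity check of the jitter expression, the snippet below prints the delay instead of sleeping, so you can confirm that it always falls between 0 and MAXWAIT - 1 seconds:

```shell
#!/bin/bash
# $RANDOM yields an integer between 0 and 32767; taking it modulo
# MAXWAIT bounds the delay to the range 0..MAXWAIT-1.
MAXWAIT=600
delay=$((RANDOM % MAXWAIT))
echo "$delay"
# In the real job you would use: sleep "$delay"
```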

See also

  • The Running Puppet from cron recipe in Chapter 2, Puppet Infrastructure