Using Amazon an EC2 instance as an off-site CronJob

I run the CCHits.net website, and part of the day-to-day tasks that running the site entailed was the daily show creation which involved creating some text-to-speech audio for insertion into the podcasts. As I run the website on shared hosting, to which I didn’t have full access to the OS, I couldn’t just install Festival on the platform, and for whatever reason (I never did figure out what went wrong), I couldn’t build Festival to run on the shared host either.

Until “The Big Rewrite” (the capitals are totally worth it), I’d been doing the text-to-speech on my home server, but frankly, I’m on DSL, which meant I needed to set up Dynamic DNS, I had to be sure the server was always up (and it wasn’t!), etc, etc, etc. While I was looking into why I couldn’t get Festival to build, someone said “Well, why not just use EC2 to do it”.

After nearly a year of faffing about trying to make the …….. stupid thing work (as is testified by the draft in this very blog called “How I built my Audio Stack for CCHits”), I finally decided to spin up an EC2 instance for just this one task.

Now, I’m not the greenest guy on the block – hell, I drive 45 minutes into work each day, but I figured, why keep an EC2 instance running all the time, when I only need it for less than 20 minutes each day, so I did some reading, and found a post on making EC2 do the hard work for you, using the scalable computing APIs, but frankly, all I actually needed was the code to make it spin up, run the task and shut down again, especially as with using his methods, I’d have needed to either create an AMI image, or download the festival voice files each time… at around 100Mb. Not good. I ended up using the stuff I did know (bash scripting, cron tasks) and ditching the stuff I didn’t (AMI files, scalable computing API). I may revisit this later to do it the way he said instead. We’ll see :)

So, here’s the crack.

Create your EC2 image. It doesn’t need to do anything fancy yet – just boot up and keep running. You’ll do some tweaks later. Make a note of the instance number – it’ll probably start i- and then 8 or so hexedecimal digits, like this: i-12345678.

On your shared web host, download the EC2 API tools. According to this page, the API tools are available from here. The first link of those two is unlikely to change for a while, the second, maybe more so. You’ll need to make sure you have Java installed on that web host.

Once you’ve got the tools, you’ll need to create an X.509 certificate and key pair. See this page to find instructions. It was pretty straightforward.

So, you now have in, let’s say for sake of argument, your home directory:

  • /home/user/ec2-api-tools-x.x.x.x # The directory containing all the EC2 API tools
  • /home/user/ec2-keys/pk-{32[A-Za-z1-0]}.pem
  • /home/user/ec2-keys/cert-{32[A-Za-z1-0]}.pem

Also, you have java in /usr/bin/java.

Create the following script also in /home/user/ec2-api-tools-x.x.x.x – I called it ec2-wrapper.

#!/bin/bash
export EC2_HOME=/home/user/ec2-api-tools-x.x.x.x
export JAVA_HOME=/usr
export EC2_KEY=/home/user/ec2-keys/pk-{32[A-Za-z1-0]}.pem
export EC2_CERT=/home/user/ec2-keys/cert-{32[A-Za-z1-0]}.pem
${EC2_HOME}/bin/$* -K ${EC2_KEY} -C ${EC2_CERT

Obviously, you should change your paths to match what you have. What this script does is to add the X.509 certs to every EC2 request, plus adds the appropriate java and EC2_HOME paths to the script before running it.

I set up a CRON job (using crontab -e) to schedule the regular startup of the instance. Here’s the entry from my crontab:

#M   H  DoM Mth DoW  Command (Regular Crontab columns)
30   1   *   *   *   /home/user/ec2-api-tools-x.x.x.x/ec2-wrapper ec2-start-instances i-12345678
30   2   *   *   *   /home/user/ec2-api-tools-x.x.x.x/ec2-wrapper ec2-stop-instances i-12345678

So, this runs the start task at 30 minutes past 1am, local server time, and the stop task at 30 minutes past 2am. The second one there is just to be on the safe side, as we’ll try to shut down the box once it’s finished processing anyway. This way, the maximum time you’ll be billed for is 1 hour of time each day.

I then logged into my EC2 machine, and created, then tweaked the script from the earlier blog post (the scalable computing one).

#!/bin/bash -x
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
#
# This code is licensed under an Apache License - for the original 
# code and license, please see the footer of this script.
#
# !!!IMPORTANT!!!
# Edit this file and change this next line to your own email address:
#

EMAIL=user@example.com

# Get some information about the running instance
instance_id=$(wget -qO- instance-data/latest/meta-data/instance-id)
public_hostname=$(wget -qO- instance-data/latest/meta-data/public-hostname)

# Wait 5 minutes in case we want to get in to stop this from doing stuff
Sleep 300

if [ -f /home/ubuntu/donotrun ]
then
  exit 0
fi

# Send status email
/usr/sbin/sendmail -oi -t -f $EMAIL <<EOM
From: $EMAIL
To: $EMAIL
Subject: Running EC2 Scripts

== Making sure everything is up to date ==

`cd /home/ubuntu/website-rewrite && su -c "git pull" ubuntu 2>&1`

== Running the scheduled task ==

`php /home/ubuntu/website-rewrite/scheduled_task.php`

== Notes ==

This email was generated on the EC2 instance: $instance_id

If the instance is still running, you can monitor the output of this
job using a command like:

  ssh ubuntu@$public_hostname tail -1000f /var/log/user-data.log

EOM

# Give the email some time to be queued and delivered
sleep 300 # 5 minutes

if [ -f /home/ubuntu/shutdownwhendone ]
then
  shutdown -h now
fi

exit 0

########################################################################
# For more information about the original version of this code see:
#   http://alestic.com/2011/11/ec2-schedule-instance
# The original code and its license are available on github:
#   https://github.com/alestic/demo-ec2-schedule-instance
########################################################################

So, with that, I get a script which runs on schedule, on an EC2 platform, I get a confirmation e-mail it’s running. It shuts itself down, and hopefully, keeps on trucking :)

JonTheNiceGuy

He/Him. Husband and father. Linux advocating geek. Co-Host on the AdminAdmin Podcast, occasional conference speaker.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.