Using Expect to SFTP a file

Because of technical limitations on a pair of platforms I’m using at work, I am unable to set-up key-based SFTP or SCP to transfer files between the pair of them, so I knocked together this short script using the TCL based Expect language.


#!/usr/bin/expect
set arg1 [lindex $argv 0]
set arg2 [lindex $argv 1]
set arg3 [lindex $argv 2]
set timeout 1000
spawn sftp "$arg2"
expect {
yes {
send "yes\r"
exp_continue
}
ass {
send "$arg3\r"
exp_continue
}
sftp {
send "put $arg1\r"
expect {
100% {
send "quit\r"
exp_continue
}
}
}
}

view raw

upload.exp

hosted with ❤ by GitHub

There’s no error checking here, which isn’t great, but as a quick-and-dirty script to SFTP files to a box which needs the password each run… it works! :)

GPG Encrypting files using a keyserver

Another “at work” post!

I’ve been generating files which need to be distributed via a file server, but need to be encrypted using GPG (the open source PGP application). Rather than managing keys for a large number of users, instead, I have a text file with the user names in, and a batch file. Please see the below gist for details :)

Starting EC2 instances using PHP

I run a small podcast website called CCHits.net. It runs on Dreamhost because they offer unlimited storage and bandwidth, but while it’s a great service for storage, it’s not really useful for running a batch process because long running processes are killed regularly (in my case, building the cchits podcasts on a daily basis).

As a result, I built an EC2 instance which I trigger every day using a cronjob. Previously, I used the “AWS CLI tools”, but as this uses a Java Virtual Machine, it was taking an awful lot of resources just to spin up an instance, and Dreamhost kept killing the task off. As a result, I found the AWS PHP SDK, and coded up this little snippet to spin up the EC2 instance.

Development Environment Replication with Vagrant and Puppet

This week, I was fortunate enough to meet up with the Cheadle Geeks group. I got talking to a couple of people about Vagrant and Puppet, and explaining how it works, and I thought the best thing to do would be to also write that down here, so that I can point anyone who missed any of what I was saying to it.

Essentially, Vagrant is program to read a config file which defines how to initialize a pre-built virtual machine. It has several virtual machine engines which it can invoke (see [1] for more details on that), but the default virtual machine to use is VirtualBox.

To actually find a virtual box to load, there’s a big list over at vagrantbox.es which have most standard cloud servers available to you. Personally I use the Ubuntu Precise 32bit image from VagrantUp.com for my open source projects (which means more developers can get involved). Once you’ve picked an image, use the following command to get it installed on your development machine (you only need to do this step once per box!):

vagrant box add {YourBoxName} {BoxURL}

After you’ve done that, you need to set up the Vagrant configuration file.

cd /path/to/your/dev/environment
mkdir Vagrant
cd Vagrant
vagrant init {YourBoxName}

This will create a file called Vagrantfile in /path/to/your/dev/environment/Vagrant. It looks overwhelming at first, but if you trim out some of the notes (and tweak one or two of the lines), you’ll end up with a file which looks a bit like this:

Vagrant.configure("2") do |config|
  config.vm.box = "{YourBoxName}"
  config.vm.hostname = "{fqdn.of.your.host}"
  config.vm.box_url = "{BoxURL}"
  config.vm.network :forwarded_port, guest: 80, host: 8080
  # config.vm.network :public_network
  config.vm.synced_folder "../web", "/var/www"
  config.vm.provision :puppet do |puppet|
    puppet.manifests_path = "manifests"
    puppet.manifest_file  = "site.pp"
  end
end

This assumes you’ve replaced anything with {}’s in it with a real value, and that you want to forward TCP/8080 on your machine to TCP/80 on that box (there are other work arounds, using more Vagrant plugins, different network types, or other services such as pagekite, but this will do for now).

Once you’ve got this file, you could start up your machine and get a bare box, but that’s not much use to you, as you’d have to tell people how to configure your development environment every time they started up a new box. Instead, we’ll be using a Provisioning service, and we’re going to use Puppet for that.

Puppet was originally designed as a way of defining configuration across all an estate’s servers, and a lot of tutorials I’ve found online explain how to use it for that, but when we’re setting up Puppet for a development environment, we just need a simple file. This is the site.pp manifest, and in here we define the extra files and packages we need, plus any commands we need to run. So, let’s start with a basic manifest file:

node default {

}

Wow, isn’t that easy? :) We need some more detail than that though. First, let’s make sure the timezone is set. I live in the UK, so my timezone is “Europe/London”. Let’s put that in. We also need to make sure that any commands we run have the right path in them. So here’s our revised, debian based, manifest file.

node default {
    Exec {
        path => '/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/sbin:/usr/sbin'
    }

    package { "tzdata":
        ensure => "installed"
    }

    file { "/etc/timezone":
        content => "Europe/London\n",
        require => Package["tzdata"]
    }

    exec { "Set Timezone":
        unless => "diff /etc/localtime /usr/share/zoneinfo/`cat /etc/timezone`",
        command => "dpkg-reconfigure -f noninteractive tzdata",
        require => File["/etc/timezone"]
    }
}

OK, so we’ve got some pretty clear examples of code to run here. The first Exec statement must always be in there, otherwise it gets a bit confused, but after that, we’re making sure the package tzdata is installed, we then make sure that, once the tzdata package is installed, we create or update the /etc/timezone file with the value we want, and then we use the dpkg-reconfigure command to set the timezone, but only if the timezone isn’t already set to that.

Just to be clear, this file describes what the system should look like at the end of it running, not a step-by-step guide to getting it running, so you might find that some of these packages install out of sequence, or something else might run before or after when you were expecting it to run. As a result, you should make good use of the “require” and “unless” statements if you want a proper sequence of events to occur.

Now, so far, all this does is set the timezone for us, it doesn’t set up anything like Apache or MySQL… perhaps you want to install something like WordPress here? Well, let’s see how we get other packages installed.

In the following lines of code, we’ll assume you’re just adding this text above the last curled bracket (the “}” at the end).

First, we need to ensure our packages are up to date:

exec { "Update packages":
    command => "sudo apt-get update && sudo apt-get dist-upgrade -y",
}

Here’s Apache getting installed:

package { "apache2":
    ensure => "installed",
    require => Exec['Update packages']
}

And, maybe you’ll want to set up something that needs mod_rewrite and a custom site? Add this to your Vagrantfile

config.vm.synced_folder "../Apache_Site", "/etc/apache2/shared_config"

Create a directory called /path/to/your/dev/environment/Apache_Site which should contain your apache site configuration file called “default”. Then add this to your site.pp

exec { "Enable rewrite":
    command => 'a2enmod rewrite',
    onlyif => 'test ! -e /etc/apache2/mods-enabled/rewrite.load',
    require => Package['apache2']
}

file { "/etc/apache2/sites-enabled/default":
  ensure => link,
  target => "/etc/apache2/shared_config/default",
}

So, at the end of all this, we have the following file structure:

/path/to/your/dev/environment
+ -- /Apache_Site
|    + -- default
+ -- /web
|    + -- index.html
+ -- /Vagrant
     + -- /manifests
     |    + -- site.pp
     + -- Vagrantfile

And now, you can add all of this to your Git repository [2], and off you go! To bring up your Vagrant machine, type (from the Vagrant directory):

vagrant up

And then to connect into it:

vagrant ssh

And finally to halt it:

vagrant halt

Or if you just want to kill it off…

vagrant destroy

If you’re tweaking the provisioning code, you can run this instead of destroying it and bringing it back up again:

vagrant provision

You can do some funky stuff with running several machines, and using the same puppet file for all of those, but frankly, that’s a topic for another day.

[1] Vagrant is extended using plugins. There is a list of plugins on this Github Wiki Page. The plugins here can include additional virtual machine back ends (called Providers in Vagrant terminology), and methods of configuring the OS after bootup (called Provisioners), but also anything around defining where to find resources, to define network addresses, even to handle caches and proxies.

[2] If you’re not using Git, you should be! However, you might want to add some stuff to your .gitignore – in particular, Vagrant adds a directory called /path/to/your/dev/environment/Vagrant/.vagrant where it puts the VMs it creates.

A quick note on autoloaders for PHP

Over the past few days, as you may have noticed, I’ve been experimenting with PHPUnit, and writing up notes on what I’ve learned. Here’s a biggie, but it’s such a small actual change, I didn’t want to miss it.

So, when you have your autoloader written, you’ll have a function like this (probably):

<?php
function __autoload($classname)
{
    if (file_exists(dirname(__FILE__) . '/classes/' . $classname . '.php')) {
        require_once dirname(__FILE__) . '/classes/' . $classname . '.php';
    }
}

Load this from your test, or in a bootstrap file (more to come on that particular subject, I think!), like this:

<?php
require_once dirname(__FILE__) . '/../autoloader.php';
class SomeClassTest extends ........

And you’ll probably notice the autoloader doesn’t do anything… but why is this? Because PHPUnit has it’s own autoloader, and you need to chain our autoloader to the end. So, in your autoloader file, add this line to the end:

<?php
function __autoload($classname)
{
    if (file_exists(dirname(__FILE__) . '/classes/' . $classname . '.php')) {
        require_once dirname(__FILE__) . '/classes/' . $classname . '.php';
    }
}

spl_autoload_register('__autoload');

And it all should just work, which is nice :)

A quick word on salting your hashes.

If you don’t know what hashing is in relation to coding, the long version is here: Cryptographic Hash Function but the short version is that it performs a mathermatical formula to components of the file, string or data, and returns a much shorter number with a slim chance of “collisions”.

I don’t know whether it’s immediately clear to anyone else, but I used to think this was a good idea.

<?php
$password = sha1($_POST['password']);

Then I went to a PHPNW session, and asked someone to take a look at my code, and got a thorough drubbing for not adding a cryptographic salt (wikipedia).

For those who don’t know, a salt is a set of characters you add before or after the password (or both!) to make it so that a simple “rainbow table analysis” doesn’t work (essentially a brute-force attack against the authentication data by hashing lots and lots of strings looking for another hash which matches the stored hash). In order to make it possible to actually authenticate with that string again in the future, the string should be easily repeatable, and a way to do that is to use other data that’s already in the user record.

For example, this is a simple salt:

<?php
$password = sha1('salt' . $_POST['password']);

I read in the April 2012 edition of 2600 magazine something that I should have been doing with my hashes all along. How’s this for more secure code?

<?php
$site_salt = 'pepper';
$SQL = "SELECT intUserID FROM users WHERE strUsername = ?";
$DB = new PDO($dsn);
$query = $DB->prepare($SQL);
$query->execute(array(strtolower($_POST['username'])));
$userid = $query->fetch();
if ($userid == false) {
  return false;
}
$prefix = '';
$suffix = '';
if ($userid % 2 == 0) {
  $prefix = $site_salt;
} else {
  $suffix = $site_salt;
}
if ($userid % 3 == 0) {
  $prefix .= strtolower($_POST['username']);
} else {
  $suffix .= strtolower($_POST['username']);
}
if ($userid % 4 == 0) {
  $prefix = strrev($prefix);
}
if ($userid % 5 == 0) {
  $suffix = strrev($suffix);
}
$hashedPassword = sha1($prefix . $_POST['password'] . $suffix);

So, this gives you an easily repeatable string, that’s relatively hard to calculate without easy access to the source code :)

Getting started with Unit Testing for PHP

Unit testing seems like a bit of a dark art when you’re first introduced to it. “Create this new file. Tell it what is supposed to be the result when you run a test, and it’ll tell you if you’re right nor not.”

Let’s start with a pseudocode example:

test->assertTrue(1+1 = 2); // Test returns true, huzzah!
test->assertFalse(1+1 = 3); // Test returns false. Those integers must not have been large enough

I want to use PHPUnit, and for me the easiest way to get this and the rest of the tools I’ll be referring to in this collection of posts is to install “The PHP Quality Assurance Toolchain“. On my Ubuntu install, this was done as follows:

sudo pear upgrade PEAR
sudo pear config-set auto_discover 1
sudo pear install --all-deps pear.phpqatools.org/phpqatools

Now we’ve got the tools in place, let’s set up the directory structure.

/
+ -- Classes
|    + -- Config.php
+ -- Tests
     + -- ConfigTest.php

In here, you see we’ve created two files, one contains the class we want to use, and the other contains the tests we will be running.

So, let’s slap on the veneer of coating that these two files need to be valid to test.

/Classes/Config.php

<?php
class Config
{
}

/Tests/Config.php

<?php

include dirname(__FILE__) . '/../Classes/Config.php';

class ConfigTest extends PHPUnit_Framework_TestCase
{
}

So, just to summarise, here we have two, essentially empty classes.

Let’s put some code into the test file.

<?php

include dirname(__FILE__) . '/../Classes/Config.php';

class ConfigTest extends PHPUnit_Framework_TestCase
{
  public function testCreateObject()
  {
    $config = new Config();
    $this->assertTrue(is_object($config));
  }
}

We can now run this test from the command line as follows:

phpunit Tests/ConfigTest.php

phpunit Tests/01_ConfigTest.php
PHPUnit 3.6.10 by Sebastian Bergmann.

.

Time: 1 second, Memory: 3.00Mb

OK (1 test, 1 assertion)

That was nice and straightforward!

Let’s add some more code!

In ConfigTest, let’s tell it to load some configuration, using a config file.

<?php

include dirname(__FILE__) . '/../Classes/Config.php';

class ConfigTest extends PHPUnit_Framework_TestCase
{
  public function testCreateObject()
  {
    $config = new Config();
    $this->assertTrue(is_object($config));
  }

  public function testLoadConfig()
  {
    $config = new Config();
    $config->load();
  }
}

And now when we run it?

PHP Fatal error:  Call to undefined method Config::load() in /var/www/PhpBetterPractices/Tests/ConfigTest.php on line 16

Ah, perhaps we need to write some code into /Classes/Config.php

<?php
class Config
{
  public function load()
  {
    include dirname(__FILE__) . '/../Config/default_config.php';
  }
}

But, running this, again, we get an error message!

PHPUnit 3.6.10 by Sebastian Bergmann.

.E

Time: 0 seconds, Memory: 3.00Mb

There was 1 error:

1) ConfigTest::testLoadConfig
include(/var/www/PhpBetterPractices/Config/default_config.php): failed to open stream: No such file or directory

/var/www/PhpBetterPractices/Classes/Config.php:7
/var/www/PhpBetterPractices/Classes/Config.php:7
/var/www/PhpBetterPractices/Tests/ConfigTest.php:16

FAILURES!
Tests: 2, Assertions: 1, Errors: 1.

So, we actually need to check that the file exists first, perhaps we should throw an error if it doesn’t? We could also pass the name of the config file to pass to the script, which would let us test more and different configuration options, should we need them.

class Config
{
    public function load($file = null)
    {
        if ($file == null) {
            $file = 'default.config.php';
        }

        $filename = dirname(__FILE__) . '/../Config/' . $file;

        if (file_exists($filename)) {
            include $filename;
        } else {
            throw new InvalidArgumentException("File not found");
        }
    }
}

So, here’s the new UnitTest code:

class ConfigTest extends PHPUnit_Framework_TestCase
{
    public function testCreateObject()
    {
        $config = new Config();
        $this->assertTrue(is_object($config));
    }

    public function testLoadConfig()
    {
        $config = new Config();
        $config->load();
    }

    /**
     * @expectedException InvalidArgumentException
     */
    public function testFailLoadingConfig()
    {
        $config = new Config();
        @$config->load('A file which does not exist');
    }
}

This assumes the file /Config/default.config.php exists, albeit as an empty file.

So, let’s run those tests and see what happens?

PHPUnit 3.6.10 by Sebastian Bergmann.

...

Time: 0 seconds, Memory: 3.25Mb

OK (3 tests, 2 assertions)

Huzzah! That’s looking good. Notice that to handle a test of something which should throw an exception, you can either wrapper the function in a try/catch loop and, in the try side of the loop, have $this->assertTrue(false) to prevent false positives and in the catch side, do your $this->assertBlah() on the exception. Alternatively, (and much more simplely), use a documentation notation of @expectedException NameOfException and then prefix the function you are testing with the @ symbol. This is how I did it with the test “testFailLoadingConfig()”.

This obviously doesn’t handle setting and getting configuration values, so let’s add those.

Here’s the additions to the Config.php file:

    public function set($key = null, $value = null)
    {
        if ($key == null) {
            throw new BadFunctionCallException("Key not set");
        }
        if ($value == null) {
            unset ($this->arrValues[$key]);
            return true;
        } else {
            $this->arrValues[$key] = $value;
            return true;
        }
    }

    public function get($key = null)
    {
        if ($key == null) {
            throw new BadFunctionCallException("Key not set");
        }
        if (isset($this->arrValues[$key])) {
            return $this->arrValues[$key];
        } else {
            return null;
        }
    }

And the default.config.php file:

<?php
$this->set('demo', true);

And lastly, the changes to the ConfigTest.php file:

    public function testLoadConfig()
    {
        $config = new Config();
        $this->assertTrue(is_object($config));
        $config->load('default.config.php');
        $this->assertTrue($config->get('demo'));
    }

    /**
     * @expectedException BadFunctionCallException
     */
    public function testFailSettingValue()
    {
        $config = new Config();
        @$config->set();
    }

    /**
     * @expectedException BadFunctionCallException
     */
    public function testFailGettingValue()
    {
        $config = new Config();
        @$config->get();
    }

We’ve not actually finished testing this yet. Not sure how I can tell?

phpunit --coverage-text Tests/ConfigTest.php
PHPUnit 3.6.10 by Sebastian Bergmann.

....

Time: 0 seconds, Memory: 3.75Mb

OK (4 tests, 5 assertions)

Generating textual code coverage report, this may take a moment.

Code Coverage Report
  2012-05-08 18:54:16

 Summary:
  Classes: 0.00% (0/1)
  Methods: 0.00% (0/3)
  Lines:   76.19% (16/21)

@Config::Config
  Methods: 100.00% ( 3/ 3)   Lines:  76.19% ( 16/ 21)

Notice that there are 5 lines outstanding – probably around the unsetting values and using default values. If you use an IDE (like NetBeans) you can actually get the editor to show you, using coloured lines, exactly which lines you’ve not yet tested! Nice.

So, the last thing to talk about is Containers and Dependency Injection. We’ve already started with the Dependency Injection here – that $config->load(‘filename’); function handles loading config files, or you could just bypass that with $config->set(‘key’, ‘value); but once you get past a file or two, you might just end up with a lot of redundant re-loading of config files, or worse, lots of database connections open.

So, this is where Containers come in (something I horrifically failed to understand before).

Here’s a container:

class ConfigContainer
{
  protected static $config = null;

  public static function Load()
  {
    if (self::$config == null) {
      self::$config = new Config();
      self::$config->load();
    }
    return self::$Config;
  }
}

It’s purpose (in this case) is to load the config class, including any dependencies that you may need for that class, and then return that class to you. You could conceivably create a Database container, or a Request container or a User container with very little extra work, and with a few short calls, have a single function for each of your regular and routine sources of processing data, but without preventing you from being able to easily and repeatably test that data – by not going through the container.

Of course, there’s nothing to stop you just having these created in a registry class, or store them in a global from the get-go, but, I am calling these “Better Practices” after all, and these are considered to be not-so-good-practices.

Just as a note, code from this section can be seen at GitHub, if you want to use them at all.

Update 2012-05-11: Added detail to the try/catch exception catching as per frimkron’s comment. Thanks!

php-PDO-MySQL versus php-mysql

PHP and MySQL was the duo that took dynamic web development from /cgi/formmail.cgi to… well, where we are today for me. My first proper experience of working with a web application, as opposed to a static web site with a few perl scripts bolted on was PHPNuke, and I’ve not looked back since.

Almost the first lines of PHP you learn to code are

<?php
mysql_connect("localhost", "root", "");
mysql_use_db("mysite");

Followed closely by mysql_query(), then mysql_fetch_array(), mysl_num_rows(), and if the rest of us are lucky, mysql_real_escape_string().

So you see, for me, it’s really weird to find that even though this is what we’re teaching everyone to use, this weird incantation where we have to wrap every query up in a cotton wool ball of SQL injection protection… there’s an easier way where you don’t have to wrapper everything, which, even better, is more efficient at your SQL server and, if you’ve made the transition to using Classes in your code, will give you an object per-result rather than an array.

Let’s look at some examples of code, and see how we would transition things.

Here’s some rather unoriginal code to get the contents of a user table.

<?php
$db = mysql_connect("localhost", "root", "");
mysql_use_db("mysite");
$sql = "select userid, realname, lastloggedin from users "
     . "where username = '" . mysql_real_escape_string($_POST['username'])
     . "' and password = '" . mysql_real_escape_string($_POST['password']) . "'";
$query = mysql_query($sql);
$data = false;
if (mysql_num_rows($query) == 1) {
  $data = mysql_fetch_array($query);
  echo "Hello {$data['realname']}, your userid is {$data['userid']} and "
     . "you last logged in at " . date("H:i:s", strtotime($data['lastloggedin']))
     . " on " . date("d-M-Y" , strtotime($data['lastloggedin']));
}

There, now there’s not much wrong with that (ignoring the security hole of having your passwords in cleartext) is there? All pretty readable, right? Well… what happens if your database is down, how do you handle that in some kind of sensible way? You need to wrap some code around the mysql_connect to show it actually connected. What about if someone dropped the database by mistake, or you connected to the wrong database server and that table wasn’t there, now you’re wrappering the use_db line too. Did you make a typo in the SQL? Are you sure you didn’t miss a mysql_real_escape_string in there somewhere…

Here’s the same thing in PDO, protected against all (again, except the cleartext password) of those things.

<?php
try {
  $db = new PDO("mysql:host=localhost;dbname=mysite", "root", "");
  $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
  $sql = "select userid, realname, lastloggedin from users where username = ? and password = ?";
  $query = $db->prepare($sql);
  $query->execute(array($_POST['username'], $_POST['password']));
  $data = $query->fetch();
  if ($data != false) {
    echo "Hello {$data['realname']}, your userid is {$data['userid']} and "
       . "you last logged in at " . date("H:i:s", strtotime($data['lastloggedin']))
       . " on " . date("d-M-Y" , strtotime($data['lastloggedin']));
  }
} catch (PDOException $e) {
  error_log("User unable to login: " . $e->getMessage());
}

So, let’s look at what transforms this into something much more interesting: $query->fetchObject();

<?php
try {
  $db = new PDO("mysql:host=localhost;dbname=mysite", "root", "");
  $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
  $sql = "select userid, realname, lastloggedin, username, password " 
       . "from users where username = ? and password = ?";
  $query = $db->prepare($sql);
  $query->execute(array($_POST['username'], $_POST['password']));
  $data = $query->fetchObject('user');
  if ($data != false) {
    echo $data;
  }
} catch (PDOException $e) {
  error_log("User unable to login: " . $e->getMessage());
}

class user
{
  // Columns from the database
  protected $userid = null;
  protected $realname = null;
  protected $lastloggedin = null;
  protected $username = null;
  protected $password = null;
  // Processed Data
  protected $transformed_lastloggedin = null

  public function __construct()
  {
    if ($this->lastloggedin != null) {
      $this->transformed_lastloggedin = date("H:i:s", strtotime($this->lastloggedin)) 
                             . " on " . date("d-M-Y" , strtotime($this->lastloggedin);
    }
  }

  public function toString()
  {
    echo "Hello {$this->realname}, your userid is {$this->userid} and " 
       . "you last logged in at {$this->transformed_lastloggedin}";
  }
}

I’m starting to stretch the analogy here a little, but the important things here are:

  1. The __construct() function of the class is run after the data is inserted into it, meaning you can post-process any data you receive from the database without needing any new functions. Arguably, you could make the construct function accept the same values you’d receive from the database, and process those in the same way.
  2. The toString() function means you can mangle the content you want to output, without having to apparently run another function on the class you’ve just created.
  3. By using classes you get to do wonderful things like Unit Testing (this is a topic to follow), have automatic code documentation (err… to follow), reduce code duplication by extending classes and check on that, using “Copy and Paste Detection”.

Using Amazon an EC2 instance as an off-site CronJob

I run the CCHits.net website, and part of the day-to-day tasks that running the site entailed was the daily show creation which involved creating some text-to-speech audio for insertion into the podcasts. As I run the website on shared hosting, to which I didn’t have full access to the OS, I couldn’t just install Festival on the platform, and for whatever reason (I never did figure out what went wrong), I couldn’t build Festival to run on the shared host either.

Until “The Big Rewrite” (the capitals are totally worth it), I’d been doing the text-to-speech on my home server, but frankly, I’m on DSL, which meant I needed to set up Dynamic DNS, I had to be sure the server was always up (and it wasn’t!), etc, etc, etc. While I was looking into why I couldn’t get Festival to build, someone said “Well, why not just use EC2 to do it”.

After nearly a year of faffing about trying to make the …….. stupid thing work (as is testified by the draft in this very blog called “How I built my Audio Stack for CCHits”), I finally decided to spin up an EC2 instance for just this one task.

Now, I’m not the greenest guy on the block – hell, I drive 45 minutes into work each day, but I figured, why keep an EC2 instance running all the time, when I only need it for less than 20 minutes each day, so I did some reading, and found a post on making EC2 do the hard work for you, using the scalable computing APIs, but frankly, all I actually needed was the code to make it spin up, run the task and shut down again, especially as with using his methods, I’d have needed to either create an AMI image, or download the festival voice files each time… at around 100Mb. Not good. I ended up using the stuff I did know (bash scripting, cron tasks) and ditching the stuff I didn’t (AMI files, scalable computing API). I may revisit this later to do it the way he said instead. We’ll see :)

So, here’s the crack.

Create your EC2 image. It doesn’t need to do anything fancy yet – just boot up and keep running. You’ll do some tweaks later. Make a note of the instance number – it’ll probably start i- and then 8 or so hexedecimal digits, like this: i-12345678.

On your shared web host, download the EC2 API tools. According to this page, the API tools are available from here. The first link of those two is unlikely to change for a while, the second, maybe more so. You’ll need to make sure you have Java installed on that web host.

Once you’ve got the tools, you’ll need to create an X.509 certificate and key pair. See this page to find instructions. It was pretty straightforward.

So, you now have in, let’s say for sake of argument, your home directory:

  • /home/user/ec2-api-tools-x.x.x.x # The directory containing all the EC2 API tools
  • /home/user/ec2-keys/pk-{32[A-Za-z1-0]}.pem
  • /home/user/ec2-keys/cert-{32[A-Za-z1-0]}.pem

Also, you have java in /usr/bin/java.

Create the following script also in /home/user/ec2-api-tools-x.x.x.x – I called it ec2-wrapper.

#!/bin/bash
export EC2_HOME=/home/user/ec2-api-tools-x.x.x.x
export JAVA_HOME=/usr
export EC2_KEY=/home/user/ec2-keys/pk-{32[A-Za-z1-0]}.pem
export EC2_CERT=/home/user/ec2-keys/cert-{32[A-Za-z1-0]}.pem
${EC2_HOME}/bin/$* -K ${EC2_KEY} -C ${EC2_CERT

Obviously, you should change your paths to match what you have. What this script does is to add the X.509 certs to every EC2 request, plus adds the appropriate java and EC2_HOME paths to the script before running it.

I set up a CRON job (using crontab -e) to schedule the regular startup of the instance. Here’s the entry from my crontab:

#M   H  DoM Mth DoW  Command (Regular Crontab columns)
30   1   *   *   *   /home/user/ec2-api-tools-x.x.x.x/ec2-wrapper ec2-start-instances i-12345678
30   2   *   *   *   /home/user/ec2-api-tools-x.x.x.x/ec2-wrapper ec2-stop-instances i-12345678

So, this runs the start task at 30 minutes past 1am, local server time, and the stop task at 30 minutes past 2am. The second one there is just to be on the safe side, as we’ll try to shut down the box once it’s finished processing anyway. This way, the maximum time you’ll be billed for is 1 hour of time each day.

I then logged into my EC2 machine, and created, then tweaked the script from the earlier blog post (the scalable computing one).

#!/bin/bash -x
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
#
# This code is licensed under an Apache License - for the original 
# code and license, please see the footer of this script.
#
# !!!IMPORTANT!!!
# Edit this file and change this next line to your own email address:
#

EMAIL=user@example.com

# Get some information about the running instance
instance_id=$(wget -qO- instance-data/latest/meta-data/instance-id)
public_hostname=$(wget -qO- instance-data/latest/meta-data/public-hostname)

# Wait 5 minutes in case we want to get in to stop this from doing stuff
Sleep 300

if [ -f /home/ubuntu/donotrun ]
then
  exit 0
fi

# Send status email
/usr/sbin/sendmail -oi -t -f $EMAIL <<EOM
From: $EMAIL
To: $EMAIL
Subject: Running EC2 Scripts

== Making sure everything is up to date ==

`cd /home/ubuntu/website-rewrite && su -c "git pull" ubuntu 2>&1`

== Running the scheduled task ==

`php /home/ubuntu/website-rewrite/scheduled_task.php`

== Notes ==

This email was generated on the EC2 instance: $instance_id

If the instance is still running, you can monitor the output of this
job using a command like:

  ssh ubuntu@$public_hostname tail -1000f /var/log/user-data.log

EOM

# Give the email some time to be queued and delivered
sleep 300 # 5 minutes

if [ -f /home/ubuntu/shutdownwhendone ]
then
  shutdown -h now
fi

exit 0

########################################################################
# For more information about the original version of this code see:
#   http://alestic.com/2011/11/ec2-schedule-instance
# The original code and its license are available on github:
#   https://github.com/alestic/demo-ec2-schedule-instance
########################################################################

So, with that, I get a script which runs on schedule, on an EC2 platform, I get a confirmation e-mail it’s running. It shuts itself down, and hopefully, keeps on trucking :)