Starting EC2 instances using PHP

I run a small podcast website called CCHits.net. It runs on Dreamhost because they offer unlimited storage and bandwidth, but while it’s a great service for storage, it’s not really useful for running a batch process because long running processes are killed regularly (in my case, building the cchits podcasts on a daily basis).

As a result, I built an EC2 instance which I trigger every day using a cronjob. Previously, I used the “AWS CLI tools”, but as this uses a Java Virtual Machine, it was taking an awful lot of resources just to spin up an instance, and Dreamhost kept killing the task off. As a result, I found the AWS PHP SDK, and coded up this little snippet to spin up the EC2 instance.

Development Environment Replication with Vagrant and Puppet

This week, I was fortunate enough to meet up with the Cheadle Geeks group. I got talking to a couple of people about Vagrant and Puppet, and explaining how it works, and I thought the best thing to do would be to also write that down here, so that I can point anyone who missed any of what I was saying to it.

Essentially, Vagrant is program to read a config file which defines how to initialize a pre-built virtual machine. It has several virtual machine engines which it can invoke (see [1] for more details on that), but the default virtual machine to use is VirtualBox.

To actually find a virtual box to load, there’s a big list over at vagrantbox.es which have most standard cloud servers available to you. Personally I use the Ubuntu Precise 32bit image from VagrantUp.com for my open source projects (which means more developers can get involved). Once you’ve picked an image, use the following command to get it installed on your development machine (you only need to do this step once per box!):

vagrant box add {YourBoxName} {BoxURL}

After you’ve done that, you need to set up the Vagrant configuration file.

cd /path/to/your/dev/environment
mkdir Vagrant
cd Vagrant
vagrant init {YourBoxName}

This will create a file called Vagrantfile in /path/to/your/dev/environment/Vagrant. It looks overwhelming at first, but if you trim out some of the notes (and tweak one or two of the lines), you’ll end up with a file which looks a bit like this:

Vagrant.configure("2") do |config|
  config.vm.box = "{YourBoxName}"
  config.vm.hostname = "{fqdn.of.your.host}"
  config.vm.box_url = "{BoxURL}"
  config.vm.network :forwarded_port, guest: 80, host: 8080
  # config.vm.network :public_network
  config.vm.synced_folder "../web", "/var/www"
  config.vm.provision :puppet do |puppet|
    puppet.manifests_path = "manifests"
    puppet.manifest_file  = "site.pp"
  end
end

This assumes you’ve replaced anything with {}’s in it with a real value, and that you want to forward TCP/8080 on your machine to TCP/80 on that box (there are other work arounds, using more Vagrant plugins, different network types, or other services such as pagekite, but this will do for now).

Once you’ve got this file, you could start up your machine and get a bare box, but that’s not much use to you, as you’d have to tell people how to configure your development environment every time they started up a new box. Instead, we’ll be using a Provisioning service, and we’re going to use Puppet for that.

Puppet was originally designed as a way of defining configuration across all an estate’s servers, and a lot of tutorials I’ve found online explain how to use it for that, but when we’re setting up Puppet for a development environment, we just need a simple file. This is the site.pp manifest, and in here we define the extra files and packages we need, plus any commands we need to run. So, let’s start with a basic manifest file:

node default {

}

Wow, isn’t that easy? :) We need some more detail than that though. First, let’s make sure the timezone is set. I live in the UK, so my timezone is “Europe/London”. Let’s put that in. We also need to make sure that any commands we run have the right path in them. So here’s our revised, debian based, manifest file.

node default {
    Exec {
        path => '/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/sbin:/usr/sbin'
    }

    package { "tzdata":
        ensure => "installed"
    }

    file { "/etc/timezone":
        content => "Europe/London\n",
        require => Package["tzdata"]
    }

    exec { "Set Timezone":
        unless => "diff /etc/localtime /usr/share/zoneinfo/`cat /etc/timezone`",
        command => "dpkg-reconfigure -f noninteractive tzdata",
        require => File["/etc/timezone"]
    }
}

OK, so we’ve got some pretty clear examples of code to run here. The first Exec statement must always be in there, otherwise it gets a bit confused, but after that, we’re making sure the package tzdata is installed, we then make sure that, once the tzdata package is installed, we create or update the /etc/timezone file with the value we want, and then we use the dpkg-reconfigure command to set the timezone, but only if the timezone isn’t already set to that.

Just to be clear, this file describes what the system should look like at the end of it running, not a step-by-step guide to getting it running, so you might find that some of these packages install out of sequence, or something else might run before or after when you were expecting it to run. As a result, you should make good use of the “require” and “unless” statements if you want a proper sequence of events to occur.

Now, so far, all this does is set the timezone for us, it doesn’t set up anything like Apache or MySQL… perhaps you want to install something like WordPress here? Well, let’s see how we get other packages installed.

In the following lines of code, we’ll assume you’re just adding this text above the last curled bracket (the “}” at the end).

First, we need to ensure our packages are up to date:

exec { "Update packages":
    command => "sudo apt-get update && sudo apt-get dist-upgrade -y",
}

Here’s Apache getting installed:

package { "apache2":
    ensure => "installed",
    require => Exec['Update packages']
}

And, maybe you’ll want to set up something that needs mod_rewrite and a custom site? Add this to your Vagrantfile

config.vm.synced_folder "../Apache_Site", "/etc/apache2/shared_config"

Create a directory called /path/to/your/dev/environment/Apache_Site which should contain your apache site configuration file called “default”. Then add this to your site.pp

exec { "Enable rewrite":
    command => 'a2enmod rewrite',
    onlyif => 'test ! -e /etc/apache2/mods-enabled/rewrite.load',
    require => Package['apache2']
}

file { "/etc/apache2/sites-enabled/default":
  ensure => link,
  target => "/etc/apache2/shared_config/default",
}

So, at the end of all this, we have the following file structure:

/path/to/your/dev/environment
+ -- /Apache_Site
|    + -- default
+ -- /web
|    + -- index.html
+ -- /Vagrant
     + -- /manifests
     |    + -- site.pp
     + -- Vagrantfile

And now, you can add all of this to your Git repository [2], and off you go! To bring up your Vagrant machine, type (from the Vagrant directory):

vagrant up

And then to connect into it:

vagrant ssh

And finally to halt it:

vagrant halt

Or if you just want to kill it off…

vagrant destroy

If you’re tweaking the provisioning code, you can run this instead of destroying it and bringing it back up again:

vagrant provision

You can do some funky stuff with running several machines, and using the same puppet file for all of those, but frankly, that’s a topic for another day.

[1] Vagrant is extended using plugins. There is a list of plugins on this Github Wiki Page. The plugins here can include additional virtual machine back ends (called Providers in Vagrant terminology), and methods of configuring the OS after bootup (called Provisioners), but also anything around defining where to find resources, to define network addresses, even to handle caches and proxies.

[2] If you’re not using Git, you should be! However, you might want to add some stuff to your .gitignore – in particular, Vagrant adds a directory called /path/to/your/dev/environment/Vagrant/.vagrant where it puts the VMs it creates.

The Apathy of the Lone Coder

I think I might be having a bit of a mid-life crisis. It’ll be my 35th birthday this year, and I’ve started to realise that I don’t really want to do much more of the Open Source’y stuff that I’ve been a part of for the past 10 or so years.

Don’t get me wrong. This isn’t me saying I want to hang up my linux user hat, put away the android phone, wipe the PHP manuals from my kindle or return an HTTP 410 code for everything I’ve ever published… but it’s getting close.

The rot has been setting in for some time.

November 2011 was the “first birthday” of CCHits.net – I’d planned to have my site-wide re-write of the whole code base ready for the birthday, but frankly, I’d massively underestimated the amount of work involved, so it wasn’t ready for November. As it was, a critical failure on my web host prompted me to “make” the rewrite work in April – nearly half a year after it was supposed to be in by. I’m not at all happy with the site layout, the way the tracks are build, the lack of adoption of the service by any other podcasters than the three who currently submit to the site (no criticisms there for anyone else, just a frustration really) and, well, the fact it never really achieved the vision I had for it.

In April, I helped to organise UCubed – a one day unconference about Linux and Open Source [1], held at MadLab, Manchester. We put less effort into organising it than we had the last few times, I pretty much wimped out on the day, taking my son to his swimming lesson (which meant leaving two hours after the event started, and returning an hour before it finished), and after the event, I felt like all I’d done was go to get the refreshments.

In July and August, I pulled a lot of 2 and 3AM finishes to get CampFireManager ready for OggCamp. I had some solid support from a guy called Jack who committed a load of great code to the project, plus loads of encouragement from the organisation team for OggCamp, the big day came, and, well, let’s just say there were issues. Quite a lot of issues really. I missed all of both mornings of talks because I was fire fighting those issues, and on the second day, I was held up as an example of “why not to code something instead of just doing it”. I had a top notch PHP engineer [2] sitting next to me while I was looking through issues, and even though I’ve gone through the theory of how the site works with her before, she couldn’t get her head around it. OK, I was skimming through the code pretty fast and I know most of it like the back of my hand so I knew roughly where code had gone and was going to next but still… code is code, right? Not if it’s crap code with unusual structure, insufficient testing, incomprehensible logic and, well, it’s just crap…

Before OggCamp, I inadvertently became the project lead for something I still don’t fully understand (although I’m a lot closer on it, to be fair): MOTP-AS. An implementation of the Mobile One Time PIN algorithm, written in PHP, tied up to a FreeRadius Server with a pretty web UI to give something a bit like RSA SecurID Authentication Manager server. Essentially, I made some suggestions on how to improve the code, and was told “Well, actually, we were pretty much going to kill off the project after the next release – do you want to take it over?” and I, in hindsight, stupidly said “Oh, OK”. I said that from October, I’d have “loads” of time, and was going to re-write the code base using Object Oriented principals, was going to roll in Unit Testing, PHPDocumentor and, theoretically, move to using a sensible framework to render the whole thing.

The hindsight thing I mentioned there? On the 28th August, my father passed away. I’ve not really talked about it much on Social Media. It’s a pretty hard thing to do, as it may mean airing an awful lot of dirty laundry as a result, but I guess the outcome of that was that I’ve been spending a lot more time away from my home, staying instead at my fathers home where I have been clearing it to sell it, and when I’ve not been away from home, I’ve wanted to spend more time with Jules and Daniel.

The first couple of trips down to my Dad’s house were on the train. I tried to break open a text editor and start turning out reusable PHP which I could form into something in MOTP-AS, but let’s be serious about this, it was like trying to read a book in the same circumstances – you just keep reading the same page over and over again, but nothing “right” comes out the other side. I’ve not had the enthusiasm to even start to look at that project since then.

Everyone I was working with – CCHits, CampFireManager, MOTP-AS – all knew I was offline, and would be “for some time”, but the funk that set in on that train hasn’t shifted yet, and I still can’t work out if it’s something to do with my Dad, or just the fact that I’m not really feeling the code right now.

At a recent PHPNW session, Lorna said (although I am paraphrasing) that most of my bad practices come from a lack of exposure to other PHP developers, and that working as part of a team towards something would help. My day job has nothing to do with coding (and there’s no scope to bring it into my role, and the few times I’ve tried to bring it in, it’s caused me more issues with my work than if I hadn’t) and 5% to do with open source software (the 5% is due to the OS that many of the devices we support are RedHat, BSD or Solaris based). I don’t want to, and can’t afford to make a career change now (aside from anything else, I still love my job, especially what I’m doing at the moment) to get that experience, and I’m getting closer and closer to burning out on the projects I’m involved in – just because there’s no one else who understands it like I do… which is sad.

When I do start to code in the evenings, what I tend to do is think of something I’d like to write (yep, starting a new project will fix *everything* Jon!), open my IDE, try and work out what I want to learn to use this time, and start reading the documentation for it… and not actually start working on the project. And then 2 hours have passed, I’ve done nothing, and frankly I could do with going to bed.

So, how do I beat this apathy folks. Is there anyone out there who can help?

I think if I’ve not sorted something out by June, I’ll close down CCHits.net. It’s been a great blast, but I’m so nervous of something going wrong with the system and it collapsing like a pack of cards… which is a real shame as HPR [3] have just said they’ll be running the daily shows in their Icecast server when “real” feeds aren’t being streamed, that and I love discovering, or re-discovering the music which is played through the system.

Likewise, I think I’ll probably try and find someone to hand CFM over to during OggCamp this year, and if I can’t find someone to hand it over to, I’ll shut it down. Again, it’s been fun, but I don’t need 2 months of sleepless nights and 2 days of sheer panic for something which ultimately could be replaced by a sheet of paper and some post-it notes.

Of all of the projects I’ve mentioned, the MOTP-AS part is most likely to be something of use to me in my day job (which was, in fact, how I came across it… for our lab network), so I might make more of an effort with that, but again, I really can’t see me being happy with it at the end of it all.

[1] It used to be about more than that, but frankly, it’s what it turned into.
[2] Plug for that top notch PHP engineer who, fortunately for me, was happy (or if not actually happy, appeared to be happy enough) to be an observer, a person to bounce ideas off, a muse and cheerleader (sort-of) for those two days of hell – http://LornaJane.net
[3] HackerPublicRadio.net – a podcast network made up from individual posts by the community.

A quick note on autoloaders for PHP

Over the past few days, as you may have noticed, I’ve been experimenting with PHPUnit, and writing up notes on what I’ve learned. Here’s a biggie, but it’s such a small actual change, I didn’t want to miss it.

So, when you have your autoloader written, you’ll have a function like this (probably):

<?php
function __autoload($classname)
{
    if (file_exists(dirname(__FILE__) . '/classes/' . $classname . '.php')) {
        require_once dirname(__FILE__) . '/classes/' . $classname . '.php';
    }
}

Load this from your test, or in a bootstrap file (more to come on that particular subject, I think!), like this:

<?php
require_once dirname(__FILE__) . '/../autoloader.php';
class SomeClassTest extends ........

And you’ll probably notice the autoloader doesn’t do anything… but why is this? Because PHPUnit has it’s own autoloader, and you need to chain our autoloader to the end. So, in your autoloader file, add this line to the end:

<?php
function __autoload($classname)
{
    if (file_exists(dirname(__FILE__) . '/classes/' . $classname . '.php')) {
        require_once dirname(__FILE__) . '/classes/' . $classname . '.php';
    }
}

spl_autoload_register('__autoload');

And it all should just work, which is nice :)

A quick word on salting your hashes.

If you don’t know what hashing is in relation to coding, the long version is here: Cryptographic Hash Function but the short version is that it performs a mathermatical formula to components of the file, string or data, and returns a much shorter number with a slim chance of “collisions”.

I don’t know whether it’s immediately clear to anyone else, but I used to think this was a good idea.

<?php
$password = sha1($_POST['password']);

Then I went to a PHPNW session, and asked someone to take a look at my code, and got a thorough drubbing for not adding a cryptographic salt (wikipedia).

For those who don’t know, a salt is a set of characters you add before or after the password (or both!) to make it so that a simple “rainbow table analysis” doesn’t work (essentially a brute-force attack against the authentication data by hashing lots and lots of strings looking for another hash which matches the stored hash). In order to make it possible to actually authenticate with that string again in the future, the string should be easily repeatable, and a way to do that is to use other data that’s already in the user record.

For example, this is a simple salt:

<?php
$password = sha1('salt' . $_POST['password']);

I read in the April 2012 edition of 2600 magazine something that I should have been doing with my hashes all along. How’s this for more secure code?

<?php
$site_salt = 'pepper';
$SQL = "SELECT intUserID FROM users WHERE strUsername = ?";
$DB = new PDO($dsn);
$query = $DB->prepare($SQL);
$query->execute(array(strtolower($_POST['username'])));
$userid = $query->fetch();
if ($userid == false) {
  return false;
}
$prefix = '';
$suffix = '';
if ($userid % 2 == 0) {
  $prefix = $site_salt;
} else {
  $suffix = $site_salt;
}
if ($userid % 3 == 0) {
  $prefix .= strtolower($_POST['username']);
} else {
  $suffix .= strtolower($_POST['username']);
}
if ($userid % 4 == 0) {
  $prefix = strrev($prefix);
}
if ($userid % 5 == 0) {
  $suffix = strrev($suffix);
}
$hashedPassword = sha1($prefix . $_POST['password'] . $suffix);

So, this gives you an easily repeatable string, that’s relatively hard to calculate without easy access to the source code :)

Getting started with Unit Testing for PHP

Unit testing seems like a bit of a dark art when you’re first introduced to it. “Create this new file. Tell it what is supposed to be the result when you run a test, and it’ll tell you if you’re right nor not.”

Let’s start with a pseudocode example:

test->assertTrue(1+1 = 2); // Test returns true, huzzah!
test->assertFalse(1+1 = 3); // Test returns false. Those integers must not have been large enough

I want to use PHPUnit, and for me the easiest way to get this and the rest of the tools I’ll be referring to in this collection of posts is to install “The PHP Quality Assurance Toolchain“. On my Ubuntu install, this was done as follows:

sudo pear upgrade PEAR
sudo pear config-set auto_discover 1
sudo pear install --all-deps pear.phpqatools.org/phpqatools

Now we’ve got the tools in place, let’s set up the directory structure.

/
+ -- Classes
|    + -- Config.php
+ -- Tests
     + -- ConfigTest.php

In here, you see we’ve created two files, one contains the class we want to use, and the other contains the tests we will be running.

So, let’s slap on the veneer of coating that these two files need to be valid to test.

/Classes/Config.php

<?php
class Config
{
}

/Tests/Config.php

<?php

include dirname(__FILE__) . '/../Classes/Config.php';

class ConfigTest extends PHPUnit_Framework_TestCase
{
}

So, just to summarise, here we have two, essentially empty classes.

Let’s put some code into the test file.

<?php

include dirname(__FILE__) . '/../Classes/Config.php';

class ConfigTest extends PHPUnit_Framework_TestCase
{
  public function testCreateObject()
  {
    $config = new Config();
    $this->assertTrue(is_object($config));
  }
}

We can now run this test from the command line as follows:

phpunit Tests/ConfigTest.php

phpunit Tests/01_ConfigTest.php
PHPUnit 3.6.10 by Sebastian Bergmann.

.

Time: 1 second, Memory: 3.00Mb

OK (1 test, 1 assertion)

That was nice and straightforward!

Let’s add some more code!

In ConfigTest, let’s tell it to load some configuration, using a config file.

<?php

include dirname(__FILE__) . '/../Classes/Config.php';

class ConfigTest extends PHPUnit_Framework_TestCase
{
  public function testCreateObject()
  {
    $config = new Config();
    $this->assertTrue(is_object($config));
  }

  public function testLoadConfig()
  {
    $config = new Config();
    $config->load();
  }
}

And now when we run it?

PHP Fatal error:  Call to undefined method Config::load() in /var/www/PhpBetterPractices/Tests/ConfigTest.php on line 16

Ah, perhaps we need to write some code into /Classes/Config.php

<?php
class Config
{
  public function load()
  {
    include dirname(__FILE__) . '/../Config/default_config.php';
  }
}

But, running this, again, we get an error message!

PHPUnit 3.6.10 by Sebastian Bergmann.

.E

Time: 0 seconds, Memory: 3.00Mb

There was 1 error:

1) ConfigTest::testLoadConfig
include(/var/www/PhpBetterPractices/Config/default_config.php): failed to open stream: No such file or directory

/var/www/PhpBetterPractices/Classes/Config.php:7
/var/www/PhpBetterPractices/Classes/Config.php:7
/var/www/PhpBetterPractices/Tests/ConfigTest.php:16

FAILURES!
Tests: 2, Assertions: 1, Errors: 1.

So, we actually need to check that the file exists first, perhaps we should throw an error if it doesn’t? We could also pass the name of the config file to pass to the script, which would let us test more and different configuration options, should we need them.

class Config
{
    public function load($file = null)
    {
        if ($file == null) {
            $file = 'default.config.php';
        }

        $filename = dirname(__FILE__) . '/../Config/' . $file;

        if (file_exists($filename)) {
            include $filename;
        } else {
            throw new InvalidArgumentException("File not found");
        }
    }
}

So, here’s the new UnitTest code:

class ConfigTest extends PHPUnit_Framework_TestCase
{
    public function testCreateObject()
    {
        $config = new Config();
        $this->assertTrue(is_object($config));
    }

    public function testLoadConfig()
    {
        $config = new Config();
        $config->load();
    }

    /**
     * @expectedException InvalidArgumentException
     */
    public function testFailLoadingConfig()
    {
        $config = new Config();
        @$config->load('A file which does not exist');
    }
}

This assumes the file /Config/default.config.php exists, albeit as an empty file.

So, let’s run those tests and see what happens?

PHPUnit 3.6.10 by Sebastian Bergmann.

...

Time: 0 seconds, Memory: 3.25Mb

OK (3 tests, 2 assertions)

Huzzah! That’s looking good. Notice that to handle a test of something which should throw an exception, you can either wrapper the function in a try/catch loop and, in the try side of the loop, have $this->assertTrue(false) to prevent false positives and in the catch side, do your $this->assertBlah() on the exception. Alternatively, (and much more simplely), use a documentation notation of @expectedException NameOfException and then prefix the function you are testing with the @ symbol. This is how I did it with the test “testFailLoadingConfig()”.

This obviously doesn’t handle setting and getting configuration values, so let’s add those.

Here’s the additions to the Config.php file:

    public function set($key = null, $value = null)
    {
        if ($key == null) {
            throw new BadFunctionCallException("Key not set");
        }
        if ($value == null) {
            unset ($this->arrValues[$key]);
            return true;
        } else {
            $this->arrValues[$key] = $value;
            return true;
        }
    }

    public function get($key = null)
    {
        if ($key == null) {
            throw new BadFunctionCallException("Key not set");
        }
        if (isset($this->arrValues[$key])) {
            return $this->arrValues[$key];
        } else {
            return null;
        }
    }

And the default.config.php file:

<?php
$this->set('demo', true);

And lastly, the changes to the ConfigTest.php file:

    public function testLoadConfig()
    {
        $config = new Config();
        $this->assertTrue(is_object($config));
        $config->load('default.config.php');
        $this->assertTrue($config->get('demo'));
    }

    /**
     * @expectedException BadFunctionCallException
     */
    public function testFailSettingValue()
    {
        $config = new Config();
        @$config->set();
    }

    /**
     * @expectedException BadFunctionCallException
     */
    public function testFailGettingValue()
    {
        $config = new Config();
        @$config->get();
    }

We’ve not actually finished testing this yet. Not sure how I can tell?

phpunit --coverage-text Tests/ConfigTest.php
PHPUnit 3.6.10 by Sebastian Bergmann.

....

Time: 0 seconds, Memory: 3.75Mb

OK (4 tests, 5 assertions)

Generating textual code coverage report, this may take a moment.

Code Coverage Report
  2012-05-08 18:54:16

 Summary:
  Classes: 0.00% (0/1)
  Methods: 0.00% (0/3)
  Lines:   76.19% (16/21)

@Config::Config
  Methods: 100.00% ( 3/ 3)   Lines:  76.19% ( 16/ 21)

Notice that there are 5 lines outstanding – probably around the unsetting values and using default values. If you use an IDE (like NetBeans) you can actually get the editor to show you, using coloured lines, exactly which lines you’ve not yet tested! Nice.

So, the last thing to talk about is Containers and Dependency Injection. We’ve already started with the Dependency Injection here – that $config->load(‘filename’); function handles loading config files, or you could just bypass that with $config->set(‘key’, ‘value); but once you get past a file or two, you might just end up with a lot of redundant re-loading of config files, or worse, lots of database connections open.

So, this is where Containers come in (something I horrifically failed to understand before).

Here’s a container:

class ConfigContainer
{
  protected static $config = null;

  public static function Load()
  {
    if (self::$config == null) {
      self::$config = new Config();
      self::$config->load();
    }
    return self::$Config;
  }
}

It’s purpose (in this case) is to load the config class, including any dependencies that you may need for that class, and then return that class to you. You could conceivably create a Database container, or a Request container or a User container with very little extra work, and with a few short calls, have a single function for each of your regular and routine sources of processing data, but without preventing you from being able to easily and repeatably test that data – by not going through the container.

Of course, there’s nothing to stop you just having these created in a registry class, or store them in a global from the get-go, but, I am calling these “Better Practices” after all, and these are considered to be not-so-good-practices.

Just as a note, code from this section can be seen at GitHub, if you want to use them at all.

Update 2012-05-11: Added detail to the try/catch exception catching as per frimkron’s comment. Thanks!

php-PDO-MySQL versus php-mysql

PHP and MySQL was the duo that took dynamic web development from /cgi/formmail.cgi to… well, where we are today for me. My first proper experience of working with a web application, as opposed to a static web site with a few perl scripts bolted on was PHPNuke, and I’ve not looked back since.

Almost the first lines of PHP you learn to code are

<?php
mysql_connect("localhost", "root", "");
mysql_use_db("mysite");

Followed closely by mysql_query(), then mysql_fetch_array(), mysl_num_rows(), and if the rest of us are lucky, mysql_real_escape_string().

So you see, for me, it’s really weird to find that even though this is what we’re teaching everyone to use, this weird incantation where we have to wrap every query up in a cotton wool ball of SQL injection protection… there’s an easier way where you don’t have to wrapper everything, which, even better, is more efficient at your SQL server and, if you’ve made the transition to using Classes in your code, will give you an object per-result rather than an array.

Let’s look at some examples of code, and see how we would transition things.

Here’s some rather unoriginal code to get the contents of a user table.

<?php
$db = mysql_connect("localhost", "root", "");
mysql_use_db("mysite");
$sql = "select userid, realname, lastloggedin from users "
     . "where username = '" . mysql_real_escape_string($_POST['username'])
     . "' and password = '" . mysql_real_escape_string($_POST['password']) . "'";
$query = mysql_query($sql);
$data = false;
if (mysql_num_rows($query) == 1) {
  $data = mysql_fetch_array($query);
  echo "Hello {$data['realname']}, your userid is {$data['userid']} and "
     . "you last logged in at " . date("H:i:s", strtotime($data['lastloggedin']))
     . " on " . date("d-M-Y" , strtotime($data['lastloggedin']));
}

There, now there’s not much wrong with that (ignoring the security hole of having your passwords in cleartext) is there? All pretty readable, right? Well… what happens if your database is down, how do you handle that in some kind of sensible way? You need to wrap some code around the mysql_connect to show it actually connected. What about if someone dropped the database by mistake, or you connected to the wrong database server and that table wasn’t there, now you’re wrappering the use_db line too. Did you make a typo in the SQL? Are you sure you didn’t miss a mysql_real_escape_string in there somewhere…

Here’s the same thing in PDO, protected against all (again, except the cleartext password) of those things.

<?php
try {
  $db = new PDO("mysql:host=localhost;dbname=mysite", "root", "");
  $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
  $sql = "select userid, realname, lastloggedin from users where username = ? and password = ?";
  $query = $db->prepare($sql);
  $query->execute(array($_POST['username'], $_POST['password']));
  $data = $query->fetch();
  if ($data != false) {
    echo "Hello {$data['realname']}, your userid is {$data['userid']} and "
       . "you last logged in at " . date("H:i:s", strtotime($data['lastloggedin']))
       . " on " . date("d-M-Y" , strtotime($data['lastloggedin']));
  }
} catch (PDOException $e) {
  error_log("User unable to login: " . $e->getMessage());
}

So, let’s look at what transforms this into something much more interesting: $query->fetchObject();

<?php
try {
  $db = new PDO("mysql:host=localhost;dbname=mysite", "root", "");
  $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
  $sql = "select userid, realname, lastloggedin, username, password " 
       . "from users where username = ? and password = ?";
  $query = $db->prepare($sql);
  $query->execute(array($_POST['username'], $_POST['password']));
  $data = $query->fetchObject('user');
  if ($data != false) {
    echo $data;
  }
} catch (PDOException $e) {
  error_log("User unable to login: " . $e->getMessage());
}

class user
{
  // Columns from the database
  protected $userid = null;
  protected $realname = null;
  protected $lastloggedin = null;
  protected $username = null;
  protected $password = null;
  // Processed Data
  protected $transformed_lastloggedin = null

  public function __construct()
  {
    if ($this->lastloggedin != null) {
      $this->transformed_lastloggedin = date("H:i:s", strtotime($this->lastloggedin)) 
                             . " on " . date("d-M-Y" , strtotime($this->lastloggedin);
    }
  }

  public function toString()
  {
    echo "Hello {$this->realname}, your userid is {$this->userid} and " 
       . "you last logged in at {$this->transformed_lastloggedin}";
  }
}

I’m starting to stretch the analogy here a little, but the important things here are:

  1. The __construct() function of the class is run after the data is inserted into it, meaning you can post-process any data you receive from the database without needing any new functions. Arguably, you could make the construct function accept the same values you’d receive from the database, and process those in the same way.
  2. The toString() function means you can mangle the content you want to output, without having to apparently run another function on the class you’ve just created.
  3. By using classes you get to do wonderful things like Unit Testing (this is a topic to follow), have automatic code documentation (err… to follow), reduce code duplication by extending classes and check on that, using “Copy and Paste Detection”.

Dependency Hell – and it’s not my package manager to blame

I’m writing a web service for conferences. I’ve been writing it, on and off for 3 years, and I think it would be fair to say the coding reflects my learning over those three years. The script is written in PHP. It has a MySQL backend. It has undergone a lot of changes.

In the beginning, was a concept. The concept was to have a digital timetable. One where you could register your interest in a talk, and the busiest talks got the biggest rooms. It should use any tech the user had to use the system including SMS. It would not expect you to have a user name and password combo (heaven forbid!), but would use OpenID.

The concept was implemented and demo’d at an event 2 years ago, and a friend [1] asked where the API was. “API?” I replied, feeling somewhat foolish, “It doesn’t have an API“.

[1] Lorna Jane Bowman (neé Mitchell aka @lornajane)

I realised the foolishness of my coding when discussing this issue [2]. “It’s dead simple, just create another view which returns just JSON” said someone. “View? I don’t know what you mean“… “Model, View, Controller? MVC – it’s a pretty common pattern”. “Oh no” I replied, “my project has inline PHP. It just seemed simpler that way.” “Well, how about you add a toString() function to your classes, and use that to render the output?” “Classes. Another thing my project doesn’t have. Sorry. Can’t do that either.

[2] With Lorna again, and adding Katherine Reeve (aka @BinaryKitten)

Did you ever get that slightly sinking feeling that maybe you’re making a fool of yourself?

“Well, there are lots of benefits to Classes in PHP, not least of which that you can use PHP CodeSniffer to enforce your coding standards” I start to laugh a little at this point “and you can use PHP Documenter to create all your developer documentation” which shut me right back up again “you can use PDO to create your objects from your database results” wow… mind blown – I’ve jumped on this little trick of a pony… “and of course, you can’t really do unit testing without classes”. Uh oh. “What’s unit testing?” “It’s a set of automated tests you can run against your code every time you commit to your version control software” whew! I’m using that at least! “to make sure that the code you’re committing hasn’t broken anything”.

Fast forward to this week, and I asked on Facebook whether anyone could teach me how to do Unit Testing. See, I’ve managed to cobble together my own MVC – I have classes called “Object_” which are the Models, the controller is my routing script – otherwise known as index.php, and then the Views are templates in Smarty, or just a json_encode((array) $object) [3] for my API. I have my own set of tests – not unit tests, it’s a script called “Test.sh” which runs PHP against the file (to make sure that the pages don’t have brackets missing or similar), then runs PHP Code Sniffer against it, and finally, if all the files are OK, it then runs PHPDoc against the whole mass of it.

[3] As the Apple iPhone/iPad adverts say – some sequences shortened

So, one of the books suggested to me, again by the lovely Lorna Jane, was The Grumpy Programmer’s Guide To Building Testable PHP Applications which mentioned that it’s much easier to do unit testing if you set up dependency injection. Note, I’m still not yet doing Unit testing here.

Woah. What the hell is Dependency Injection? Well, fortunately, there were code examples in that book. Oh boy, were there code examples. So let’s look through this idea. Instead of writing

$stuff = new MyStuff();

class MyStuff()
{
    function construct()
    {
        $this->db = mysql_connect('localhost', 'username', 'password');
        mysql_use_database('production', $this->db);
    }
}

You could instead write this:

$db = mysql_connect('localhost', 'username', 'password');
mysql_use_database('production', $this->db);
$stuff = new MyStuff($db);

class MyStuff()
{
    function construct($db = null)
    {
        $this->db = $db;
    }
}

So, this now means that in your testing framework, you could pass it a non-production database, or a testing database, or, well, anything that relies on something outside your script.

I did a bit of digging around to find some other examples of dependency injection code that might be a bit easier to use, which is to say, something so I don’t need to amend all my constructor functions.

I found this slideshare from a talk at PHP Barcelona about dependency injection which says you can do dependency injection like this:

$thing = new MyClass($dependency);

OR

$thing = new MyClass();
 $thing->setDependency($dependency);

OR

$thing = new MyClass();
 $thing->dependency = $dependency;

but somewhat weirdly, it also says that you can create a container class which holds your dependencies, and refer to that later – and that this isn’t a singleton. Sadly, I didn’t understand all that code fully (and have gone on to file a bug for the PHP documentation for the functions I didn’t understand to help people who follow behind with it!), but (and I’ve copied this verbatim from the slideshare) essentially, it looks like this:

class Container { protected $values = array(); function __set($id,$value) {  $this->values[$id] = $value;  }  function __get($id) {  if (!isset($this->values[$id])) {  throw new InvalidArgumentException(sprintf('Value "%s" is not defined.', $id));  }  if (is_callable($this->values[$id])) {  return $this->values[$id]($this);  } else { return $this->values[$id];  }  }  function asShared ($callable) {  return function($c) use ($callable) {  static $object; if (is_null($object)) {  $object=$callable($c);  }  return $object; }; } } $container = new Container(); $container->session_name='SESSION_ID'; $container->storage_class='SessionStorage'; $container->user = $container->asShared( function($c) {  return new User($c->storage);  } ); $container->storage = $container->asShared(  function($c) {  return new $c->storage_class($c->session_name); } );

Now, I have to be honest, this confuses the hell out of me. How on earth do I use this in my code? I’ve been doing this in my code thus far:

class Object_User{
  protected $intUserID    = null; // Obtained by the getCurrentUser() function
  protected $strUsername  = null;
  protected $hashPassword = null; // More of these, depending on the SQL

  function getCurrentUser() { // Called as $user = Base_User::getCurrentUser();
 $objCache  = Base_Cache::getHandler(); // A singleton to "cache" any data we've pulled to save getting it repeatedly
    if (
          isset($objCache->arrCache['Object_User']['current'])
          && $objCache->arrCache['Object_User']['current'] != false
        ) {
      return $objCache->arrCache['Object_User']['current'];
    }
    $arrRequest  = Base_Request::getRequest(); // Returns array of parsed request data
    $objDatabase = Base_Database::getConnection(); // Returns a PDO object

    $sql = "SELECT * FROM users WHERE strUsername = ? and hashPassword = ?";
    $query = $db->prepare($sql);
    $query->execute(array($request['username'], $request['password'])); $result = $query->fetchObject('Object_User');
    if ($result != false) {
      $objCache->arrCache['Object_User']['id'][$result->getVal('intUserID')] = $result;
      $objCache->arrCache['Object_User']['current'] = $result;
    }
    return $result;
  }

  function getVal($key) { // Return protected variables
    if (!isset($this->$key)) {
      return false;
    }
    return $this->$key;
  }
}

I know that singleton methods are considered “Bad” because they’re (apparently) difficult to unit test, but I would have thought that it would have been pretty straightforward to create a singleton class which holds all the dependencies (see following)

class Base_Dependencies
{
  protected static $handler = null; protected $arrDependencies = array();   protected function GetHandler() {
    if (self::$handler == null) { self::$handler = new self(); }
    return self::$handler;
  }

  function set($key, $dependency) {
    $handler = self::GetHandler();
    $handler->arrDependencies[$key] = $dependency;
  }

  function get($key) {
    $handler = self::GetHandler();
    if (isset($handler->arrDependencies[$key])) {
      return $handler->arrDependencies[$key];
    } else {
      return false;
    }
  }

  function unset($key) { // Only used for Unit Testing, I would imagine
    $handler = self::GetHandler();
    if (isset($handler->arrDependencies[$key])) {
      unset($handler->arrDependencies[$key]);
    }
  }
}

Doing it this way means I can, from, for example, my database class, which is currently a singleton, say instead:

function GetConnection() {
  $db = Base_Dependencies::get("Database");
  if ($db != false) {
    return $db;
  }
  $config = Base_Config::getAllConfig();
  $db = new PDO($config['DSN'], $config['DB_User'], $config['DB_Pass']);
  Base_Dependencies::set("Database", $db);
  return $db;
}

Is this wrong? Is this just not best practice? Given the above, how can I fix my dependencies in such a way that the poor schmuck who wants to commit a patch can figure out what they’re sending? Or do I just need to fix how my head works with this stuff? If it’s the latter, can someone provide some samples of what I’m doing wrong?

Thanks for reading this mammoth post!

Using Amazon an EC2 instance as an off-site CronJob

I run the CCHits.net website, and part of the day-to-day tasks that running the site entailed was the daily show creation which involved creating some text-to-speech audio for insertion into the podcasts. As I run the website on shared hosting, to which I didn’t have full access to the OS, I couldn’t just install Festival on the platform, and for whatever reason (I never did figure out what went wrong), I couldn’t build Festival to run on the shared host either.

Until “The Big Rewrite” (the capitals are totally worth it), I’d been doing the text-to-speech on my home server, but frankly, I’m on DSL, which meant I needed to set up Dynamic DNS, I had to be sure the server was always up (and it wasn’t!), etc, etc, etc. While I was looking into why I couldn’t get Festival to build, someone said “Well, why not just use EC2 to do it”.

After nearly a year of faffing about trying to make the …….. stupid thing work (as is testified by the draft in this very blog called “How I built my Audio Stack for CCHits”), I finally decided to spin up an EC2 instance for just this one task.

Now, I’m not the greenest guy on the block – hell, I drive 45 minutes into work each day, but I figured, why keep an EC2 instance running all the time, when I only need it for less than 20 minutes each day, so I did some reading, and found a post on making EC2 do the hard work for you, using the scalable computing APIs, but frankly, all I actually needed was the code to make it spin up, run the task and shut down again, especially as with using his methods, I’d have needed to either create an AMI image, or download the festival voice files each time… at around 100Mb. Not good. I ended up using the stuff I did know (bash scripting, cron tasks) and ditching the stuff I didn’t (AMI files, scalable computing API). I may revisit this later to do it the way he said instead. We’ll see :)

So, here’s the crack.

Create your EC2 image. It doesn’t need to do anything fancy yet – just boot up and keep running. You’ll do some tweaks later. Make a note of the instance number – it’ll probably start i- and then 8 or so hexedecimal digits, like this: i-12345678.

On your shared web host, download the EC2 API tools. According to this page, the API tools are available from here. The first link of those two is unlikely to change for a while, the second, maybe more so. You’ll need to make sure you have Java installed on that web host.

Once you’ve got the tools, you’ll need to create an X.509 certificate and key pair. See this page to find instructions. It was pretty straightforward.

So, you now have in, let’s say for sake of argument, your home directory:

  • /home/user/ec2-api-tools-x.x.x.x # The directory containing all the EC2 API tools
  • /home/user/ec2-keys/pk-{32[A-Za-z1-0]}.pem
  • /home/user/ec2-keys/cert-{32[A-Za-z1-0]}.pem

Also, you have java in /usr/bin/java.

Create the following script also in /home/user/ec2-api-tools-x.x.x.x – I called it ec2-wrapper.

#!/bin/bash
export EC2_HOME=/home/user/ec2-api-tools-x.x.x.x
export JAVA_HOME=/usr
export EC2_KEY=/home/user/ec2-keys/pk-{32[A-Za-z1-0]}.pem
export EC2_CERT=/home/user/ec2-keys/cert-{32[A-Za-z1-0]}.pem
${EC2_HOME}/bin/$* -K ${EC2_KEY} -C ${EC2_CERT

Obviously, you should change your paths to match what you have. What this script does is to add the X.509 certs to every EC2 request, plus adds the appropriate java and EC2_HOME paths to the script before running it.

I set up a CRON job (using crontab -e) to schedule the regular startup of the instance. Here’s the entry from my crontab:

#M   H  DoM Mth DoW  Command (Regular Crontab columns)
30   1   *   *   *   /home/user/ec2-api-tools-x.x.x.x/ec2-wrapper ec2-start-instances i-12345678
30   2   *   *   *   /home/user/ec2-api-tools-x.x.x.x/ec2-wrapper ec2-stop-instances i-12345678

So, this runs the start task at 30 minutes past 1am, local server time, and the stop task at 30 minutes past 2am. The second one there is just to be on the safe side, as we’ll try to shut down the box once it’s finished processing anyway. This way, the maximum time you’ll be billed for is 1 hour of time each day.

I then logged into my EC2 machine, and created, then tweaked the script from the earlier blog post (the scalable computing one).

#!/bin/bash -x
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
#
# This code is licensed under an Apache License - for the original 
# code and license, please see the footer of this script.
#
# !!!IMPORTANT!!!
# Edit this file and change this next line to your own email address:
#

EMAIL=user@example.com

# Get some information about the running instance
instance_id=$(wget -qO- instance-data/latest/meta-data/instance-id)
public_hostname=$(wget -qO- instance-data/latest/meta-data/public-hostname)

# Wait 5 minutes in case we want to get in to stop this from doing stuff
Sleep 300

if [ -f /home/ubuntu/donotrun ]
then
  exit 0
fi

# Send status email
/usr/sbin/sendmail -oi -t -f $EMAIL <<EOM
From: $EMAIL
To: $EMAIL
Subject: Running EC2 Scripts

== Making sure everything is up to date ==

`cd /home/ubuntu/website-rewrite && su -c "git pull" ubuntu 2>&1`

== Running the scheduled task ==

`php /home/ubuntu/website-rewrite/scheduled_task.php`

== Notes ==

This email was generated on the EC2 instance: $instance_id

If the instance is still running, you can monitor the output of this
job using a command like:

  ssh ubuntu@$public_hostname tail -1000f /var/log/user-data.log

EOM

# Give the email some time to be queued and delivered
sleep 300 # 5 minutes

if [ -f /home/ubuntu/shutdownwhendone ]
then
  shutdown -h now
fi

exit 0

########################################################################
# For more information about the original version of this code see:
#   http://alestic.com/2011/11/ec2-schedule-instance
# The original code and its license are available on github:
#   https://github.com/alestic/demo-ec2-schedule-instance
########################################################################

So, with that, I get a script which runs on schedule, on an EC2 platform, I get a confirmation e-mail it’s running. It shuts itself down, and hopefully, keeps on trucking :)