Today I learned… Cloud-init doesn’t like you repeating the same things

Because of templates I was building in my post “Today I learned… Ansible Include Templates”, I thought you could repeat the same sections over again. Here’s a snippet of something like what I’d built (after combining lots of templates together):

Note this is a non-working code sample!


#cloud-config
packages:
- iperf
- git

write_files:
- content: {% include 'files/public_key.j2' %}
  path: /root/.ssh/authorized_keys
  owner: root:root
  permission: '0600'
- content: {% include 'files/private_key.j2' %}
  path: /root/.ssh/id_rsa
  owner: root:root
  permission: '0600'

packages:
- byobu

write_files:
- content: |
    #!/bin/bash
    git clone {{ test_scripts }} /root/iperf_scripts
    bash /root/iperf_scripts/run_test.sh
  path: /root/run_test
  owner: root:root
  permission: '0700'

runcmd:
- /root/run_test

I’d get *bits* of it to run – basically, the last file, the last package and the last runcmd… but not all of it.

Turns out, cloud-init doesn’t like having to rebuild all the fragments together. Instead, you need to put them all together, so the write_files items, and the packages items all live in the same area.

Which, when you think about what it’s doing, which is that the parent lines are defining a variable called… well, whatever that line is, and if you replace it, it’s only going to keep the last one, then it all makes sense really!

Today I learned… that you can look at the “cloud-init” files on your target server…

Today I have been debugging why my Cloud-init scripts weren’t triggering on my Openstack environment.

I realised that something was wrong when I tried to use the noVNC console[1] with a password I’d set… no luck. So, next I ran a command to review the console logs[2], and saw a message (now, sadly, long gone – so I can’t even include it here!) suggesting there was an issue parsing my YAML file. Uh oh!

I’m using Ansible’s os_server module, and using templates to complete the userdata field, which in turn gets populated as cloud-init scripts…. and so clearly I had two ways to debug this – prefix my ansible playbook with a few debug commands, but then that can get messy… OR SSH into the box, and look through the logs. I knew I could SSH in, so the cloud-init had partially fired, but it just wasn’t parsing what I’d submitted. I had a quick look around, and found a post which mentioned debugging cloud-init. This mentioned that there’s a path (/var/lib/cloud/instances/$UUID/) you can mess around in, to remove some files to “fool” cloud-init into thinking it’s not been run… but, I reasoned, why not just see what’s there.

And in there, was the motherlode – user-data.txt…. bingo.

In the jinja2 template I was using to populate the userdata, I’d referenced another file, again using a template. It looks like that template needs an extra line at the end, otherwise, it all runs together.

Whew!

This does concern me a little, as I had previously been using this stanza to “simply” change the default user password to something a little less complicated:


#cloud-config
ssh_pwauth: True
chpasswd:
  list: |
    ubuntu:{{ default_password }}
  expire: False

But now that I look at the documentation, I realise you can also specify that as a pre-hashed value (in which case, you would suffix that default_password item above with |password_hash('sha512')) which makes it all better again!

[1] If you run openstack --os-cloud cloud_a console url show servername gives you a URL to visit that has an HTML5 based VNC-ish client. Note the “cloud_a” and “servername” should be replaced by your clouds.yml reference and the server name or server ID you want to connect to.
[2] Like before, openstack --os-cloud cloud_a console log show servername gives you the output of the boot sequence (e.g. dmesg plus the normal startup commands, and finally, cloud-init). It can be useful. Equally, it’s logs… which means there’s a lot to wade through!

One to read: “Test Driven Development (TDD) for networks, using Ansible”

Thanks to my colleague Simon (@sipart on Twitter), I spotted this post (and it’s companion Github Repository) which explains how to do test-driven development in Ansible.

Essentially, you create two roles – test (the author referred to it as “validate”) and one to actually do the thing you want it to do (in the author’s case “add_vlan”).

In the testing role, you’d have the following layout:

/path/to/roles/testing/tasks/main.yml
/path/to/roles/testing/tasks/SOMEFEATUREtest.yml

In the main.yml file, you have a simple stanza:

---
- name: Include all the test files
  include: "{{ outer_item }}"
  with_fileglob:"/path/to/roles/validate/tasks/*test.yml"
  loop_control: loop_var=outer_item

I’m sure that “with_fileglob” line could be improved to not actually need a full path… anyway

Then in your YourFeature_test.yml file, you do things like this:

---
- name: "Pseudocode in here. Use real modules for your testing!!"
  get_vlan_config: filter_for=needle_vlan
  register:haystack_var

- assert: that=" {{ needle_item }} in haystack_var "

When you run the play of the role the first time, the response will be “failed” (because “needle_vlan” doesn’t exist). Next do the “real” play of the role (so, in the author’s case, add_vlan) which creates the vlan. Then re-run the test role, your response should now be “ok”.

I’d probably script this so that it goes:

      reset-environment set_testing=true (maybe create a random little network)
      test
      run-action
      test
      reset-environment set_testing=false

The benefit to doing it that way is that you “know” your tests aren’t running if the environment doesn’t have the “set_testing” thing in place, you get to run all your tests in a “clean room”, and then you clear it back down again afterwards, leaving it clear for the next pass of your automated testing suite.

Fun!