"LEGO Factory Playset" from Brickset on Flickr

Building Azure Environments in Ansible

Recently, I’ve been migrating my POV (proof of value) and POC (proof of concept) environment from K5 to Azure to be able to test vendor products inside Azure. I ran a few tests to build the environment using the native tools (the powershell scripts) and found that the Powershell way of delivering Azure environments seems overly complicated… particularly as I’m comfortable with how Ansible works.

To be fair, I also need to look at Terraform, but that isn’t what I’m looking at today :)

So, let’s start with the scaffolding. Any Ansible Playbook which deals with creating virtual machines needs to have some extra modules installed. Make sure you’ve got ansible 2.7 or later and the python azure library 2.0.0 or later (you can get both with pip for python).

Next, let’s look at the group_vars for this playbook.

This file has several pieces. We define the project settings (anything prefixed project_ is a project setting), including the prefix used for all resources we create (in this case “env01“), and a standard password used for all VMs we create (in this case “My$uper$ecret$Passw0rd“).

Next we define the standard images to load from the Marketplace. You can extend this with other images, these are just the “easiest” ones that I’m most familiar with (your mileage may vary). Next up is the networks to build inside the VNet, and lastly we define the actual machines we want to build. If you’ve got questions about any of the values we define here, just let me know in the comments below :)

Next, we’ll start looking at the playbook (this has been exploded out – the full playbook is also in the gist).

Here we start by pulling in the variables we might want to override, and we do this by reading system environment variables (ANSIBLE_PREFIX and BREAKGLASS) and using them if they’re set. If they’re not, use the project defaults, and if that hasn’t been set, use some pre-defined values… and then tell us what they are when we’re running the tasks (those are the debug: lines).

This block is where we create our “Static Assets” – individual items that we will be consuming later. This shows a clear win here over the Powershell methods endorsed by Microsoft – here you can create a Resource Group (RG) as part of the playbook! We also create a single Storage Account for this RG and a single VNET too.

These creation rules are not suitable for production use, as this defines an “Any-Any” Security group! You should tailor your security groups for your need, not for blanket access in!

This is where things start to get a bit more interesting – We’re using the “async/async_status” pattern here (and the rest of these sections) to start creating the resources in parallel. As far as I can tell, sometimes you’ll get a case where the async doesn’t quite get set up fast enough, then the async_status can’t track the resources properly, but re-running the playbook should be enough to sort that out, without slowing things down too much.

But what are we actually doing with this block of code? A UDR is a “User Defined Route” or routing table for Azure. Effectively, you treat each network interface as being plumbed directly to the router (none of this “same subnet broadcast” stuff works here!) so you can do routing at the router for all the networks.

By default there are some existing network routes (stuff to the internet flows to the internet, RFC1918 addresses are dropped with the exception of any RFC1918 addresses you have covered in your VNETs, and each of your subnets can reach each other “directly”). Adding a UDR overrides this routing table. The UDRs we’re creating here are applied at a subnet level, but currently don’t override any of the existing routes (they’re blank). We’ll start putting routes in after we’ve added the UDRs to the subnets. Talking of which….

Again, this block is not really suitable for production use, and assumes the VNET supernet of /8 will be broken down into several /24’s. In the “real world” you might deliver a handful of /26’s in a /24 VNET… or you might even have lots of disparate /24’s in the VNET which are then allocated exactly as individual /24 subnets… this is not what this model delivers but you might wish to investigate further!

Now that we’ve created our subnets, we can start adding the routing table to the UDR. This is a basic one – add a 0.0.0.0/0 route (internet access) from the “protected” network via the firewall. You can get a lot more specific than this – most people are likely to want to add the VNET range (in this case 10.0.0.0/8) via the firewall as well, except for this subnet (because otherwise, for example, 10.0.0.100 trying to reach 10.0.0.101 will go via the firewall too).

Without going too much into the intricacies of network architecture, if you are routing your traffic between subnets to the firewall, it’s probably better to get an appliance with more interfaces, so you can route traffic across the appliance, rather than going across a single interface as this will halve your traffic bandwidth (it’s currently capped 1Gb/s – so 500Mb/s).

Having mentioned “The Internet” – let’s give our firewall a public IP address, and create the rest of the interfaces as well.

This script creates a public IP address by default for each interface unless you explicitly tell it not to (see lines 40, 53 and 62 in the group_vars file I rendered above). You could easily turn this around by changing the lines which contain this:

item.1.public is not defined or (item.1.public is defined and item.1.public == 'true')

into lines which contain this:

item.1.public is defined and item.1.public == 'true'

OK, having done all that, we’re now ready to build our virtual machines. I’ve introduced a “Priority system” here – VMs with priority 0 go first, then 1, and 2 go last. The code snippet below is just for priority 0, but you can easily see how you’d extrapolate that out (and in fact, the full code sample does just that).

There are a few blocks here to draw attention to :) I’ve re-jigged them a bit here so it’s clearer to understand, but when you see them in the main playbook they’re a bit more compact. Let’s start with looking at the Network Interfaces section!

network_interfaces: |
  [
    {%- for nw in item.value.ports -%}
      '{{ prefix }}{{ item.value.name }}port{{ nw.subnet.name }}'
      {%- if not loop.last -%}, {%- endif -%} 
    {%- endfor -%}
  ]

In this part, we loop over the ports defined for the virtual machine. This is because one device may have 1 interface, or four interfaces. YAML is parsed to make a JSON variable, so here we can create a JSON variable, that when the YAML is parsed it will just drop in. We’ve previously created all the interfaces to have names like this PREFIXhostnamePORTsubnetname (or aFW01portWAN in more conventional terms), so here we construct a JSON array, like this: ['aFW01portWAN'] but that could just as easily have been ['aFW01portWAN', 'aFW01portProtect', 'aFW01portMGMT', 'aFW01portSync']. This will then attach those interfaces to the virtual machine.

Next up, custom_data. This section is sometimes known externally as userdata or config_disk. My code has always referred to it as a “Provision Script” – hence the variable name in the code below!

custom_data: |
  {%- if item.value.provision_script is defined and item.value.provision_script != '' -%}
    {%- include(item.value.provision_script) -%}
  {%- elif item.value.image.provision_script is defined and item.value.image.provision_script != '' -%}
    {%- include(item.value.image.provision_script) -%}
  {%- else -%}
    {{ omit }}
  {%- endif -%}

Let’s pick this one apart too. If we’ve defined a provisioning script file for the VM, include it, if we’ve defined a provisioning script file for the image (or marketplace entry), then include that instead… otherwise, pretend that there’s no “custom_data” field before you submit this to Azure.

One last quirk to Azure, is that some images require a “plan” to go with it, and others don’t.

plan: |
  {%- if item.value.image.plan is not defined -%}{{ omit }}{%- else -%}
    {'name': '{{ item.value.image.sku }}',
     'publisher': '{{ item.value.image.publisher }}',
     'product': '{{ item.value.image.offer }}'
    }
  {%- endif -%}

So, here we say “if we’ve not got a plan, omit the value being passed to Azure, otherwise use these fields we previously specified. Weird huh?

The very last thing we do in the script is to re-render the standard password we’ve used for all these builds, so that we can check them out!

Want to review this all in one place?

Here’s the link to the full playbook, as well as the group variables (which should be in ./group_vars/all.yml) and two sample userdata files (which should be in ./userdata) for an Ubuntu machine (using cloud-init) and one for a FortiGate Firewall.

All the other files in that gist (prefixes from 10-16 and 00) are for this blog post only, and aren’t likely to work!

If you do end up using this, please drop me a note below, or star the gist! That’d be awesome!!

Image credit: “Lego Factory Playset” from Flickr by “Brickset” released under a CC-BY license. Used with Thanks!

Troubleshooting FortiGate API issues with the CLI?

One of my colleagues has asked me for some help with an Ansible script he’s writing to push some policy to a cloud hosted FortiGate appliance. Unfortunately, he kept getting some very weird error messages, like this one:

fatal: [localhost]: FAILED! => {"changed": false, "meta": {"build": 200, "error": -651, "http_method": "PUT", "http_status": 500, "mkey": "vip8080", "name": "vip", "path": "firewall", "revision": "36.0.0.10745196634707694665.1544442857", "serial": "CENSORED", "status": "error", "vdom": "root", "version": "v6.0.3"}, "msg": "Error in repo"}

This is using Fortinet’s own Ansible Modules, which, in turn use the fortiosapi python module.

This same colleague came across a post on the Fortinet Developer Network site (access to the site requires vendor approval), which said “this might be an internal bug, but to debug it, use the following”

fgt # diagnose debug enable

fgt # diagnose debug cli 8
Debug messages will be on for 30 minutes.

And then run your API commands. Your error message will be surfaced there… so here’s mine! (Mapped port doesn’t match extport in a vip).

0: config firewall vip
0: edit "vip8080"
0: unset src-filter
0: unset service
0: set extintf "port1"
0: set portforward enable
0: unset srcintf-filter
0: set mappedip "192.0.2.1-192.0.2.1"
0: unset extport
0: set extport 8080-8081
0: unset mappedport
0: set mappedport 8080
-651: end

Creating Self Signed certificates in Ansible

In my day job, I sometimes need to use a self-signed certificate when building a box. As I love using Ansible, I wanted to make the self-signed certificate piece something that was part of my Ansible workflow.

Here follows a bit of basic code that you could use to work through how the process of creating a self-signed certificate would work. I would strongly recommend using something more production-ready (e.g. LetsEncrypt) when you’re looking to move from “development” to “production” :)

---
- hosts: localhost
  vars:
  - dnsname: your.dns.name
  - tmppath: "./tmp/"
  - crtpath: "{{ tmppath }}{{ dnsname }}.crt"
  - pempath: "{{ tmppath }}{{ dnsname }}.pem"
  - csrpath: "{{ tmppath }}{{ dnsname }}.csr"
  - pfxpath: "{{ tmppath }}{{ dnsname }}.pfx"
  - private_key_password: "password"
  tasks:
  - file:
      path: "{{ tmppath }}"
      state: absent
  - file:
      path: "{{ tmppath }}"
      state: directory
  - name: "Generate the private key file to sign the CSR"
    openssl_privatekey:
      path: "{{ pempath }}"
      passphrase: "{{ private_key_password }}"
      cipher: aes256
  - name: "Generate the CSR file signed with the private key"
    openssl_csr:
      path: "{{ csrpath }}"
      privatekey_path: "{{ pempath }}"
      privatekey_passphrase: "{{ private_key_password }}"
      common_name: "{{ dnsname }}"
  - name: "Sign the CSR file as a CA to turn it into a certificate"
    openssl_certificate:
      path: "{{ crtpath }}"
      privatekey_path: "{{ pempath }}"
      privatekey_passphrase: "{{ private_key_password }}"
      csr_path: "{{ csrpath }}"
      provider: selfsigned
  - name: "Convert the signed certificate into a PKCS12 file with the attached private key"
    openssl_pkcs12:
      action: export
      path: "{{ pfxpath }}"
      name: "{{ dnsname }}"
      privatekey_path: "{{ pempath }}"
      privatekey_passphrase: "{{ private_key_password }}"
      passphrase: password
      certificate_path: "{{ crtpath }}"
      state: present

Podcast Summary – “TechSNAP Episode 384: Interplanetary Peers”

Last night I was a guest host on TechSNAP, a systems, network and administration podcast.

The episode I recorded was: TechSNAP Episode 384: Interplanetary Peers

In this episode, I helped cover the news items, mostly talking about the breach over at NewEgg by the MagePay group and a (now fixed) vulnerability in Alpine Linux, and then did a bit of a dive into IPFS.

It’s a good listen, but the audio right at the end was quite noisy as a storm settled in just as I was recording my outro.

“You can’t run multiple commands in sudo” – and how to work around this

At work, we share tips and tricks, and one of my colleagues recently called me out on the following stanza I posted:

I like this [ansible] one for Debian based systems:
  - name: "Apt update, Full-upgrade, autoremove, autoclean"
    become: yes
    apt:
      upgrade: full
      update_cache: yes
      autoremove: yes
      autoclean: yes

And if you’re trying to figure out how to do that in Shell:
apt-get update && apt-get full-update -y && apt-get autoremove -y && apt-get autoclean -y

His response was “Surely you’re not logging into bash as root”. I said “I normally sudo -i as soon as I’ve logged in. I can’t recall offhand how one does a sudo for a string of command && command statements”

Well, as a result of this, I looked into it. Here’s one comment from the first Stack Overflow page I found:

You can’t run multiple commands from sudo – you always need to trick it into executing a shell which may accept multiple commands to run as parameters

So here are a few options on how to do that:

  1. sudo -s whoami \; whoami (link to answer)
  2. sudo sh -c "whoami ; whoami" (link to answer)
  3. But, my favourite is from this answer:

    An alternative using eval so avoiding use of a subshell: sudo -s eval 'whoami; whoami'

Why do I prefer the last one? Well, I already use eval for other purposes – mostly for starting my ssh-agent over SSH, like this: eval `ssh-agent` ; ssh-add

One to read/watch: IPsec and IKE Tutorial

Ever been told that IPsec is hard? Maybe you’ve seen it yourself? Well, Paul Wouters and Sowmini Varadhan recently co-delivered a talk at the NetDev conference, and it’s really good.

Sowmini’s and Paul’s slides are available here: https://www.files.netdevconf.org/d/a18e61e734714da59571/

A complete recording of the tutorial is here. Sowmini’s part of the tutorial (which starts first in the video) is quite technically complex, looking at specifically the way that Linux handles the packets through the kernel. I’ve focused more on Paul’s part of the tutorial (starting at 26m23s)… but my interest was piqued from 40m40s when he starts to actually show how “easy” configuration is. There are two quick run throughs of typical host-to-host IPsec and subnet-to-subnet IPsec tunnels.

A key message for me, which previously hadn’t been at all clear in IPsec using {free,libre,open}swan is that they refer to Left and Right as being one party and the other… but the node itself works out if it’s “left” or “right” so the *SAME CONFIG* can be used on both machines. GENIUS.

Also, when you’re looking at the config files, anything prefixed with an @ symbol is something that doesn’t need resolving to something else.

It’s well worth a check-out, and it’s inspired me to take another look at IPsec for my personal VPNs :)

I should note that towards the end, Paul tried to run a selection of demonstrations in Opportunistic Encryption (which basically is a way to enable encryption between two nodes, even if you don’t have a pre-established VPN with them). Because of issues with the conference wifi, plus the fact that what he’s demoing isn’t exactly production-grade yet, it doesn’t really work right, and much of the rest of the video (from around 1h10m) is him trying to show that working while attendees are running through the lab, and having conversations about those labs with the attendees.