If you follow any of the content I postaround the internet, you might have seen me asking questions about trying to get data out of azure_rm_*_facts into something that’s usable. I can’t go into why I needed that data yet (it’s a little project I’m working on), but the upshot is that trying to manipulate data using “set_fact” with jinja is *doable* but *messy*. In the end, I decided to hand it all off to a new ansible module I’m writing. So, here are the things I learned about this.
There’s lots more documentation about writing a module (a plugin that let’s you do stuff) than there is about writing filters (things that change text inline) or lookups (things that let you search other data stores). In the end, while I could have spent the time to figure out how better to write a filter or a lookup, it actually makes more sense in my context to hand a module all my data, and say “Parse this” and register the result than it would have done to have the playbook constantly check whether things were in other things. I still might have to do that, but… you know, for now, I’ve got the bits I want! :)
I did start looking at writing a filter, and discovered that the “debugging advice” on the ansible site is all geared up to enable more modules than enabling filters… but I did discover that modules execute on their target (e.g. WebHost01) while filters and lookups execute on the local machine. Why does this matter? Well…..
While I was looking for documentation about debugging Ansible code, I stumbled over this page on debugging modules that makes it all look easy. Except, it’s only for debugging *MODULES* (very frustrating. Well, what does it actually mean? The modules get zipped up and sent to the host that will be executing the code, which means that with an extra flag to your playbook (ANSIBLE_KEEP_REMOTE_FILES – even if it’s going to be run on “localhost”), you get the combined output of the script placed into a path on your machine, which means you can debug that specific play. That doesn’t work for filters…
Oh man, this is the motherlode. So, q makes life *VERY* easy. Assuming this is valid code: All you’d need to do would be to add two lines, as you’ll see here: This then dumps the output from each of the q(something) lines into /tmp/q for you to read at your leisure! (To be fair, I’d probably remove it after you’ve finished, so you don’t fill a disk :) )
And that’s when I discovered that it’s actually easier to use q() for all my python debugging purposes than it is to follow the advice above about debugging modules. Yehr, it’s basically a load of print statements, so you don’t get to see stack traces, or read all the variables, and you don’t get to step through code to see why decisions were taken… but for the rubbish code I produce, it’s easily enough for me!
Recently, I’ve been migrating my POV (proof of value) and POC (proof of concept) environment from K5 to Azure to be able to test vendor products inside Azure. I ran a few tests to build the environment using the native tools (the powershell scripts) and found that the Powershell way of delivering Azure environments seems overly complicated… particularly as I’m comfortable with how Ansible works.
To be fair, I also need to look at Terraform, but that isn’t what I’m looking at today :)
So, let’s start with the scaffolding. Any Ansible Playbook which deals with creating virtual machines needs to have some extra modules installed. Make sure you’ve got ansible 2.7 or later and the python azure library 2.0.0 or later (you can get both with pip for python).
Next, let’s look at the group_vars for this playbook.
This file has several pieces. We define the project settings (anything prefixed project_ is a project setting), including the prefix used for all resources we create (in this case “env01“), and a standard password used for all VMs we create (in this case “My$uper$ecret$Passw0rd“).
Next we define the standard images to load from the Marketplace. You can extend this with other images, these are just the “easiest” ones that I’m most familiar with (your mileage may vary). Next up is the networks to build inside the VNet, and lastly we define the actual machines we want to build. If you’ve got questions about any of the values we define here, just let me know in the comments below :)
Here we start by pulling in the variables we might want to override, and we do this by reading system environment variables (ANSIBLE_PREFIX and BREAKGLASS) and using them if they’re set. If they’re not, use the project defaults, and if that hasn’t been set, use some pre-defined values… and then tell us what they are when we’re running the tasks (those are the debug: lines).
This block is where we create our “Static Assets” – individual items that we will be consuming later. This shows a clear win here over the Powershell methods endorsed by Microsoft – here you can create a Resource Group (RG) as part of the playbook! We also create a single Storage Account for this RG and a single VNET too.
These creation rules are not suitable for production use, as this defines an “Any-Any” Security group! You should tailor your security groups for your need, not for blanket access in!
This is where things start to get a bit more interesting – We’re using the “async/async_status” pattern here (and the rest of these sections) to start creating the resources in parallel. As far as I can tell, sometimes you’ll get a case where the async doesn’t quite get set up fast enough, then the async_status can’t track the resources properly, but re-running the playbook should be enough to sort that out, without slowing things down too much.
But what are we actually doing with this block of code? A UDR is a “User Defined Route” or routing table for Azure. Effectively, you treat each network interface as being plumbed directly to the router (none of this “same subnet broadcast” stuff works here!) so you can do routing at the router for all the networks.
By default there are some existing network routes (stuff to the internet flows to the internet, RFC1918 addresses are dropped with the exception of any RFC1918 addresses you have covered in your VNETs, and each of your subnets can reach each other “directly”). Adding a UDR overrides this routing table. The UDRs we’re creating here are applied at a subnet level, but currently don’t override any of the existing routes (they’re blank). We’ll start putting routes in after we’ve added the UDRs to the subnets. Talking of which….
Again, this block is not really suitable for production use, and assumes the VNET supernet of /8 will be broken down into several /24’s. In the “real world” you might deliver a handful of /26’s in a /24 VNET… or you might even have lots of disparate /24’s in the VNET which are then allocated exactly as individual /24 subnets… this is not what this model delivers but you might wish to investigate further!
Now that we’ve created our subnets, we can start adding the routing table to the UDR. This is a basic one – add a 0.0.0.0/0 route (internet access) from the “protected” network via the firewall. You can get a lot more specific than this – most people are likely to want to add the VNET range (in this case 10.0.0.0/8) via the firewall as well, except for this subnet (because otherwise, for example, 10.0.0.100 trying to reach 10.0.0.101 will go via the firewall too).
Without going too much into the intricacies of network architecture, if you are routing your traffic between subnets to the firewall, it’s probably better to get an appliance with more interfaces, so you can route traffic across the appliance, rather than going across a single interface as this will halve your traffic bandwidth (it’s currently capped 1Gb/s – so 500Mb/s).
Having mentioned “The Internet” – let’s give our firewall a public IP address, and create the rest of the interfaces as well.
This script creates a public IP address by default for each interface unless you explicitly tell it not to (see lines 40, 53 and 62 in the group_vars file I rendered above). You could easily turn this around by changing the lines which contain this:
item.1.public is not defined or (item.1.public is defined and item.1.public == 'true')
into lines which contain this:
item.1.public is defined and item.1.public == 'true'
OK, having done all that, we’re now ready to build our virtual machines. I’ve introduced a “Priority system” here – VMs with priority 0 go first, then 1, and 2 go last. The code snippet below is just for priority 0, but you can easily see how you’d extrapolate that out (and in fact, the full code sample does just that).
There are a few blocks here to draw attention to :) I’ve re-jigged them a bit here so it’s clearer to understand, but when you see them in the main playbook they’re a bit more compact. Let’s start with looking at the Network Interfaces section!
network_interfaces: |
[
{%- for nw in item.value.ports -%}
'{{ prefix }}{{ item.value.name }}port{{ nw.subnet.name }}'
{%- if not loop.last -%}, {%- endif -%}
{%- endfor -%}
]
In this part, we loop over the ports defined for the virtual machine. This is because one device may have 1 interface, or four interfaces. YAML is parsed to make a JSON variable, so here we can create a JSON variable, that when the YAML is parsed it will just drop in. We’ve previously created all the interfaces to have names like this PREFIXhostnamePORTsubnetname (or aFW01portWAN in more conventional terms), so here we construct a JSON array, like this: ['aFW01portWAN'] but that could just as easily have been ['aFW01portWAN', 'aFW01portProtect', 'aFW01portMGMT', 'aFW01portSync']. This will then attach those interfaces to the virtual machine.
Next up, custom_data. This section is sometimes known externally as userdata or config_disk. My code has always referred to it as a “Provision Script” – hence the variable name in the code below!
custom_data: |
{%- if item.value.provision_script is defined and item.value.provision_script != '' -%}
{%- include(item.value.provision_script) -%}
{%- elif item.value.image.provision_script is defined and item.value.image.provision_script != '' -%}
{%- include(item.value.image.provision_script) -%}
{%- else -%}
{{ omit }}
{%- endif -%}
Let’s pick this one apart too. If we’ve defined a provisioning script file for the VM, include it, if we’ve defined a provisioning script file for the image (or marketplace entry), then include that instead… otherwise, pretend that there’s no “custom_data” field before you submit this to Azure.
One last quirk to Azure, is that some images require a “plan” to go with it, and others don’t.
plan: |
{%- if item.value.image.plan is not defined -%}{{ omit }}{%- else -%}
{'name': '{{ item.value.image.sku }}',
'publisher': '{{ item.value.image.publisher }}',
'product': '{{ item.value.image.offer }}'
}
{%- endif -%}
So, here we say “if we’ve not got a plan, omit the value being passed to Azure, otherwise use these fields we previously specified. Weird huh?
The very last thing we do in the script is to re-render the standard password we’ve used for all these builds, so that we can check them out!
One of my colleagues has asked me for some help with an Ansible script he’s writing to push some policy to a cloud hosted FortiGate appliance. Unfortunately, he kept getting some very weird error messages, like this one:
This same colleague came across a post on the Fortinet Developer Network site (access to the site requires vendor approval), which said “this might be an internal bug, but to debug it, use the following”
fgt # diagnose debug enable
fgt # diagnose debug cli 8
Debug messages will be on for 30 minutes.
And then run your API commands. Your error message will be surfaced there… so here’s mine! (Mapped port doesn’t match extport in a vip).
0: config firewall vip
0: edit "vip8080"
0: unset src-filter
0: unset service
0: set extintf "port1"
0: set portforward enable
0: unset srcintf-filter
0: set mappedip "192.0.2.1-192.0.2.1"
0: unset extport
0: set extport 8080-8081
0: unset mappedport
0: set mappedport 8080
-651: end
Late edit 2020-03-27: I spotted a bug in the Ansible issues tracker today, and I added a note to the end of that bug mentioning that as well as diagnose debug cli 8, if that doesn’t give you enough logs to figure out what’s up, you can also try diagnose debug application httpsd -1 but this enables LOTS AND LOTS of logs, so really think twice before turning that one on!
Oh, and if 30 minutes isn’t enough, try diagnose debug duration 480 or however many minutes you think you need. Beware that it will write event logs out to the serial console even when you’ve logged out.
In my day job, I sometimes need to use a self-signed certificate when building a box. As I love using Ansible, I wanted to make the self-signed certificate piece something that was part of my Ansible workflow.
Here follows a bit of basic code that you could use to work through how the process of creating a self-signed certificate would work. I would strongly recommend using something more production-ready (e.g. LetsEncrypt) when you’re looking to move from “development” to “production” :)
Saw this picture showing what your log levels should actually be and figured it was awesome. Thought you might like it too! (Also Jamie’s content is fab generally)
At work, we share tips and tricks, and one of my colleagues recently called me out on the following stanza I posted:
I like this [ansible] one for Debian based systems: - name: "Apt update, Full-upgrade, autoremove, autoclean"
become: yes
apt:
upgrade: full
update_cache: yes
autoremove: yes
autoclean: yes
And if you’re trying to figure out how to do that in Shell: apt-get update && apt-get full-update -y && apt-get autoremove -y && apt-get autoclean -y
His response was “Surely you’re not logging into bash as root”. I said “I normally sudo -i as soon as I’ve logged in. I can’t recall offhand how one does a sudo for a string of command && command statements”
Well, as a result of this, I looked into it. Here’s one comment from the first Stack Overflow page I found:
You can’t run multiple commands from sudo – you always need to trick it into executing a shell which may accept multiple commands to run as parameters
An alternative using eval so avoiding use of a subshell: sudo -s eval 'whoami; whoami'
Why do I prefer the last one? Well, I already use eval for other purposes – mostly for starting my ssh-agent over SSH, like this: eval `ssh-agent` ; ssh-add
For those of you who are working with #Ansible… Ansible 2.5 is out, and has an unusual documentation change around a key Ansible concept – `with_`loops Where you previously had:
Fear not, I hear you say, It’s fine, of course the documentation suggests that this is “how it’s always been”…… HA HA HA Nope. This behaviour is new as of 2.5, and needs ansible to be updated to the latest version. As far as I can tell, there’s no way to indicate to Ansible “Oh, BTW, this needs to be running on 2.5 or later”… so I wrote a role that does that for you.
This post came about after a couple of hours of iterations, so I can’t necessarily quote all the sources I worked from, but I’ll do my best!
In OpenStack (particularly in the “Kilo” release I’m working with), if you have a networking device that will pass traffic on behalf of something else (e.g. Firewall, IDS, Router, Transparent Proxy) you need to tell the virtual NIC that the interface is allowed to pass traffic for other IP addresses, as OpenStack applies by default a “Same Origin” firewall rule to the interface. Defining this in OpenStack is more complex than it could be, because for some reason, you can’t define 0.0.0.0/0 as this allowed address pair, so instead you have to define 0.0.0.0/1 and 128.0.0.0/1.
Here’s how you define those allowed address pairs (note, this assumes you’ve got some scaffolding in place to define things like “network_appliance”):
allowed_address_pairs: "{% if (item.0.network_appliance|default('false')|lower() == 'true') or (item.1.network_appliance|default('false')|lower() == 'true') %}[{'ip_address': '0.0.0.0/1'}, {'ip_address': '128.0.0.0/1'}]{% else %}{{ item.0.allowed_address_pairs|default(omit) }}{% endif %}"
OK, so we’ve defined the allowed address pairs! We can pass traffic across our firewall. But (and there’s always a but), the product I’m working with at the moment has a floating MAC address in a cluster, when you define an HA pair. They have a standard schedule for how each port’s floating MAC is assigned… so here’s what I’ve ended up with (and yes, I know it’s a mess!)
allowed_address_pairs: "{% if (item.0.network_appliance|default('false')|lower() == 'true') or (item.1.network_appliance|default('false')|lower() == 'true') %}[{'ip_address': '0.0.0.0/1'},{'ip_address': '128.0.0.0/1'}{% if item.0.ha is defined and item.0.ha != '' %}{% for vdom in range(0,40, 10) %},{'ip_address': '0.0.0.0/1','mac_address': '{{ item.0.floating_mac_prefix|default(item.0.image.floating_mac_prefix|default(floating_mac_prefix)) }}:{% if item.0.ha.group_id|default(0) < 16 %}0{% endif %}{{ '%0x' | format(item.0.ha.group_id|default(0)|int) }}:{% if vdom+(item.1.interface|default('1')|replace('port', '')|int)-1 < 16 %}0{% endif %}{{ '%0x' | format(vdom+(item.1.interface|default('1')|replace('port', '')|int)-1) }}'}, {'ip_address': '128.0.0.0/1','mac_address': '{{ item.0.floating_mac_prefix|default(item.0.image.floating_mac_prefix|default(floating_mac_prefix)) }}:{% if item.0.ha.group_id|default(0) < 16 %}0{% endif %}{{ '%0x' | format(item.0.ha.group_id|default(0)|int) }}:{% if vdom+(item.1.interface|default('0')|replace('port', '')|int)-1 < 16 %}0{% endif %}{{ '%0x' | format(vdom+(item.1.interface|default('1')|replace('port', '')|int)-1) }}'}{% endfor %}{% endif %}]{% else %}{{ item.0.allowed_address_pairs|default(omit) }}{% endif %}"
Let's break this down a bit. The vendor says that each port gets a standard prefix, (e.g. DE:CA:FB:AD) then the penultimate octet is the "Cluster ID" number in hex, and then the last octet is the sum of the port number (zero-indexed) added to a VDOM number, which increments in 10's. We're only allowed to assign 10 "allowed address pairs" to an interface, so I've got the two originals (which are assigned to "whatever" the defined mac address is of the interface), and four passes around. Other vendors (e.g. this one) do things differently, so I'll probably need to revisit this once I know how the next one (and the next one... etc.) works!
So, we have here a few parts to make that happen.
The penultimate octet, which is the group ID in hex needs to be two hex digits long, and without adding more python modules to our default machines, we can't use a "pad" filter (to add 0's to the beginning of the mac octets), so we do that by hand:
{% if item.0.ha.group_id|default(0) < 16 %}0{% endif %}
And here's how to convert the group ID into a hex number:
Then the next octet is the sum of the VDOM and PortID. First we need to loop around the VDOMs. We don't always know whether we're going to be adding VDOMs until after deployment has started, so here we will assume we've got 3 VDOMs (plus VDOM "0" for management) as it doesn't really matter if we don't end up using them. We create the vdom variable like this:
{% for vdom in range(0, 40, 10) %} STUFF {% endfor %}
We need to put the actual port ID in there too. As we're using a with_subelement loop we can't create an increment, but what we can do is ensure we're recording the interface number. This only works here because the vendor has a sequential port number (port1, port2, etc). We'll need to experiment further with other vendors! So, here's how we're doing this. We already know how to create a hex number, but we do need to use some other Jinja2 filters here:
Let's pull this apart a bit further. item.1.interface is the name of the interface, and if it doesn't exist (using the |default('1') part) we replace it with the string "1". So, let's replace that variable with a "normal" value.
Next, we need to remove the word "port" from the string "port1" to make it just "1", so we use the replace filter to strip part of that value out. Let's do that:
{{ '%0x' | format(vdom+("1"|int)-1) }}
After that, we need to turn the string "1" into the literal number 1:
{{ '%0x' | format(vdom+1-1) }}
We loop through vdom several times, but let's pick one instance of that at random - 30 (the fourth iteration of the vdom for-loop):
{{ '%0x' | format(30+1-1) }}
And then we resolve the maths:
{{ '%0x' | format(30) }}
And then the |format(30) turns the '%0x' into the value "1e"
Assuming the vendor prefix is, as I mentioned, 'de:ca:fb:ad:' and the cluster ID is 0, this gives us the following resulting allowed address pairs:
In my day job, I’m using Ansible to provision networks in OpenStack. One of the complaints I’ve had about the way I now define them is that the person implementing the network has to spell out all the network elements – the subnet size, DHCP pool, the addresses of the firewalls and names of those items. This works for a manual implementation process, but is seriously broken when you try to hand that over to someone else to implement. Most people just want something which says “Here is the network I want to implement – 192.0.2.0/24″… and let the system make it for you.
So, I wrote some code to make that happen. It’s not perfect, and it’s not what’s in production (we have lots more things I need to add for that!) but it should do OK with an IPv4 network.
Hope this makes sense!
---
- hosts: localhost
vars:
- networks:
# Defined as a subnet with specific router and firewall addressing
external:
subnet: "192.0.2.0/24"
firewall: "192.0.2.1"
router: "192.0.2.254"
# Defined as an IP address and CIDR prefix, rather than a proper network address and CIDR prefix
internal_1:
subnet: "198.51.100.64/24"
# A valid smaller network and CIDR prefix
internal_2:
subnet: "203.0.113.0/27"
# A tiny CIDR network
internal_3:
subnet: "203.0.113.64/30"
# These two CIDR networks are unusable for this environment
internal_4:
subnet: "203.0.113.128/31"
internal_5:
subnet: "203.0.113.192/32"
# A massive CIDR network
internal_6:
subnet: "10.0.0.0/8"
tasks:
# Based on https://stackoverflow.com/a/47631963/5738 with serious help from mgedmin and apollo13 via #ansible on Freenode
- name: Add router and firewall addressing for CIDR prefixes < 30
set_fact:
networks: >
{{ networks | default({}) | combine(
{item.key: {
'subnet': item.value.subnet | ipv4('network'),
'router': item.value.router | default((( item.value.subnet | ipv4('network') | ipv4('int') ) + 1) | ipv4),
'firewall': item.value.firewall | default((( item.value.subnet | ipv4('broadcast') | ipv4('int') ) - 1) | ipv4),
'dhcp_start': item.value.dhcp_start | default((( item.value.subnet | ipv4('network') | ipv4('int') ) + 2) | ipv4),
'dhcp_end': item.value.dhcp_end | default((( item.value.subnet | ipv4('broadcast') | ipv4('int') ) - 2) | ipv4)
}
}) }}
with_dict: "{{ networks }}"
when: item.value.subnet | ipv4('prefix') < 30
- name: Add router and firewall addressing for CIDR prefixes = 30
set_fact:
networks: >
{{ networks | default({}) | combine(
{item.key: {
'subnet': item.value.subnet | ipv4('network'),
'router': item.value.router | default((( item.value.subnet | ipv4('network') | ipv4('int') ) + 1) | ipv4),
'firewall': item.value.firewall | default((( item.value.subnet | ipv4('broadcast') | ipv4('int') ) - 1) | ipv4)
}
}) }}
with_dict: "{{ networks }}"
when: item.value.subnet | ipv4('prefix') == 30
- debug:
var: networks
Essentially, inspec is a way of explaining “this is what my server must look like”, so you can then test these statements against a built machine… effectively letting you unit test your provisioning scripts.
I’ve already built a fair bit of my current personal project using Ansible, so I wasn’t exactly keen to re-write everything from scratch, but it did make me think that maybe I should have a common set of tests to see how close my server was to the hardening “Benchmark” guides from CIS… and that’s pretty easy to script in inspec, particularly as the tests in those documents list the “how to test” and “how to remediate” commands to execute.
These are in the process of being drawn up (so far, all I have is an inspec test saying “confirm you’re running on Ubuntu 16.04″… not very complex!!) but, from the looks of things, the following playbook would work relatively well!