Opening to my video: Screencast 003 - Gitlab

Screencast 003: Gitlab

I’ve done a new mentoring style video, talking about how to use a self-hosted version of Gitlab for basic group projects and individual projects.

Screencast 003: Gitlab

Also available on Archive.org and LBRY.

Late edit 2020-03-25: To build the Gitlab environment I created, take a look at this git repository, which uses Terraform, some cloud init scripts and an ansible playbook. In particular, look at the following files:

If you just want to build the Gitlab environment, then it’s worth removing or renaming (to anything that isn’t .tf – I use .tf_unload) the files load_aws_module.tf, load_awx_module.tf, load_azure_module.tf

Opening to my video: Screencast 002 - A quick walk through Git

Screencast 002: A quick walk through Git (a mentoring style video)

I have done a follow-up Mentoring style video to support my last one. This video shows how to fix some of the issues in Git I came across in my last mentoring video!

Screencast 002: A quick walk through Git

I took some advice from a colleague who noticed that I skipped past a couple of issues with my Git setup, so I re-did them :) I hope this makes sense, and at 35 minutes, is a bit more understandable than the last 1h15 video!

Also on LBRY and Archive.org

Opening to my video: Screencast 001 - Ansible and Inspec using Vagrant

Screencast 001: Ansible and Inspec with Vagrant and Git (a mentoring style video)

If you’ve ever wondered how I use Ansible and Inspec, or wondered why some of my Vagrant files look like they do, well, I want to start recording some “mentor” style videos… You know how, if you were sitting next to someone who’s a mentor to you, and you watch how they build a solution.

The first one was released last night!

I recently saw a video by Chris Hartjes on how he creates his TDD (Test driven development) based PHP projects, and I really wanted to emulate that style, but talking about the things I use.

This was my second attempt at recording a mentoring style video yesterday, the first was shown to the Admin Admin Podcast listeners group on Telegram, and then sacrificed to the demo gods (there were lots of issues in that first video) never to be seen again.

From a tooling perspective, I’m using a remote virtual machine running Ubuntu Mate 18.04 over RDP (to improve performance) with xrdp and Remmina, OBS is running locally to record the content, and I’m using Visual Studio Code, git, Vagrant and Virtualbox, as well as Ansible and Inspec.

Late edit 2020-02-29: Like videos like this, hate YouTube? It’s also on archive.org: https://archive.org/details/JonTheNiceGuyScreencast001

Late edit 2020-03-01: Popey told me about LBRY.tv when I announced this on the Admin Admin Podcast telegram channel, and so I’ve also copied the video to there: https://lbry.tv/@JonTheNiceGuy:b/Screencast001-Ansible-and-Inspec-with-Vagrant:8

"vieux port Marseille" by "Jeanne Menjoulet" on Flickr

Networking tricks with Multipass in Virtualbox on Windows (Bridged interfaces and Port Forwards)

TL;DR? Want to “just” bridge one or more interfaces to a Multipass instance when you’re using Virtualbox? See the Bridging Summary below. Want to do a port forward? See the Port Forward section below. You will need the psexec command and to execute this as an administrator. The use of these two may be considered a security incident on your computing environment, depending on how your security processes and infrastructure are defined and configured.

Ah Multipass. This is a tool created by Canonical to create a “A mini-cloud on your Mac or Windows workstation.” (from their website)…

I’ve often seen this endorsed as the tool of choice from Canonical employees to do “stuff” like run Kubernetes, develop tools for UBPorts (previously Ubuntu Touch) devices, and so on.

So far, it seems interesting. It’s a little bit like Vagrant with an in-built cloud-init Provisioner, and as I want to test out the cloud-init files I’m creating for AWS and Azure, that’d be so much easier than actually building the AWS or Azure machines, or finding a viable cloud-init plugin for Vagrant to test it out.

BUT… Multipass is really designed for Linux systems (running LibVirt), OS X (running HyperKit) and Windows (running Hyper-V). Even if I were using Windows 10 Pro on this machine, I use Virtualbox for “things” on my Windows Machine, and Hyper-V steals the VT-X bit, which means that VirtualBox can’t run x64 code…. Soooo I can’t use the Hyper-V mode.

Now, there is a “fix” for this. You can put Multipass into Virtualbox mode, which lets you run Multipass on Windows or OS X without using their designed-for hypervisor, but this has a downside, you see, VirtualBox doesn’t give MultiPass the same interface to route networking connections to the VM, and there’s currently no CLI or GUI options to say “bridge my network” or “forward a port” (in part because it needs to be portable to the native hypervisor options, apparently). So, I needed to fudge some things so I can get my beloved bridged connections.

I got to the point where I could do this, thanks to the responses to a few issues I raised on the Multipass Github issues, mostly #1333.

The first thing you need to install in Windows is PsExec, because Multipass runs it’s Virtual Machines as the SYSTEM account, and talking to SYSTEM account processes is nominally hard. Get PsExec from the SysInternals website. Some IT Security professionals will note the addition of PsExec as a potential security incident, but then again, they might also see the running of a virtual machine as a security incident too, as these aren’t controlled with a central image. Anyway… Just bear it in mind, and don’t shout at me if you get frogmarched in front of your CISO.

I’m guessing if you’re here, you’ve already installed Multipass, (but if not, and it seems interesting – it’s over at https://multipass.run. Get it and install it, then carry on…) and you’ve probably enabled the VirtualBox mode (if not – open a command prompt as administrator, and run “multipass set local.driver=virtualbox“). Now, you can start sorting out your bridges.

Sorting out bridges

First things first, you need to launch a virtual machine. I did, and it generated a name for my image.

C:\Users\JON>multipass launch
Launched: witty-kelpie

Fab! We have a running virtual machine, and you should be able to get a shell in there by running multipass shell "witty-kelpie" (the name of the machine it launched before). But, uh-oh. We have the “default” NAT interface of this device mapped, not a bridged interface.

C:\Users\JON>multipass shell "witty-kelpie"
Welcome to Ubuntu 18.04.3 LTS (GNU/Linux 4.15.0-76-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Thu Feb  6 10:56:38 GMT 2020

  System load:  0.3               Processes:             82
  Usage of /:   20.9% of 4.67GB   Users logged in:       0
  Memory usage: 11%               IP address for enp0s3: 10.0.2.15
  Swap usage:   0%


0 packages can be updated.
0 updates are security updates.


To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

ubuntu@witty-kelpie:~$

So, exit the machine, and issue a multipass stop "witty-kelpie" command to ask Virtualbox to shut it down.

So, this is where the fun[1] part begins.
[1] The “Fun” part here depends on how you view this specific set of circumstances 😉

We need to get the descriptions of all the interfaces we might want to bridge to this device. I have three interfaces on my machine – a WiFi interface, a Ethernet interface on my laptop, and an Ethernet interface on my USB3 dock. At some point in the past, I renamed these interfaces, so I’d recognise them in the list of interfaces, so they’re not just called “Connection #1”, “Connection #2” and so on… but you should recognise your interfaces.

To get this list of interfaces, open PowerShell (as a “user”), and run this command:

PS C:\Users\JON> Get-NetAdapter -Physical | format-list -property "Name","DriverDescription"

Name              : On-Board Network Connection
DriverDescription : Intel(R) Ethernet Connection I219-LM

Name              : Wi-Fi
DriverDescription : Intel(R) Dual Band Wireless-AC 8260

Name              : Dock Network Connection
DriverDescription : DisplayLink Network Adapter NCM

For reasons best known to the Oracle team, they use the “Driver Description” to identify the interfaces, not the name assigned to the device by the user, so, before we get started, find your interface, and note down the description for later. If you want to bridge “all” of them, make a note of all the interfaces in question, and in the order you want to attach them. Note that Virtualbox doesn’t really like exposing more than 8 NICs without changing the Chipset to ICH9 (but really… 9+ NICs? really??) and the first one is already consumed with the NAT interface you’re using to connect to it… so that gives you 7 bridgeable interfaces. Whee!

So, now you know what interfaces you want to bridge, let’s configure the Virtualbox side. Like I said before you need psexec. I’ve got psexec stored in my Downloads folder. You can only run psexec as administrator, so open up an Administrator command prompt or powershell session, and run your command.

Just for clarity, your commands are likely to have some different paths, so remember that wherever “your” PsExec64.exe command is located, mine is in C:\Users\JON\Downloads\sysinternals\PsExec64.exe, and wherever your vboxmanage.exe is located, mine is in C:\Program Files\Oracle\VirtualBox\vboxmanage.exe.

Here, I’m going to attach my dock port (“DisplayLink Network Adapter NCM”) to the second VirtualBox interface, the Wifi adaptor to the third interface and my locally connected interface to the fourth interface. Your interfaces WILL have different descriptions, and you’re likely not to need quite so many of them!

C:\WINDOWS\system32>C:\Users\JON\Downloads\sysinternals\PsExec64.exe -s "c:\program files\oracle\virtualbox\vboxmanage" modifyvm "witty-kelpie" --nic2 bridged --bridgeadapter2 "DisplayLink Network Adapter NCM" --nic3 bridged --bridgeadapter3 "Intel(R) Dual Band Wireless-AC 8260" --nic4 bridged --bridgeadapter4 "Intel(R) Ethernet Connection I219-LM"

PsExec v2.2 - Execute processes remotely
Copyright (C) 2001-2016 Mark Russinovich
Sysinternals - www.sysinternals.com

c:\program files\oracle\virtualbox\vboxmanage exited on MINILITH with error code 0.

An error code of 0 means that it completed successfuly and with no issues.

If you wanted to use a “Host Only” network (if you’re used to using Vagrant, you might know it as “Private” Networking), then change the NIC you’re interested in from --nicX bridged --bridgeadapterX "Some Description" to --nicX hostonly --hostonlyadapterX "VirtualBox Host-Only Ethernet Adapter" (where X is replaced with the NIC number you want to swap, ranged between 2 and 8, as 1 is the NAT interface you use to SSH into the virtual machine.)

Now we need to check to make sure the machine has it’s requisite number of interfaces. We use the showvminfo flag to the vboxmanage command. It produces a LOT of content, so I’ve manually filtered the lines I want, but you should spot it reasonably quickly.

C:\WINDOWS\system32>C:\Users\JON\Downloads\sysinternals\PsExec64.exe -s "c:\program files\oracle\virtualbox\vboxmanage" showvminfo "witty-kelpie"

PsExec v2.2 - Execute processes remotely
Copyright (C) 2001-2016 Mark Russinovich
Sysinternals - www.sysinternals.com


Name:                        witty-kelpie
Groups:                      /Multipass
Guest OS:                    Ubuntu (64-bit)
<SNIP SOME CONTENT>
NIC 1:                       MAC: 0800273CCED0, Attachment: NAT, Cable connected: on, Trace: off (file: none), Type: 82540EM, Reported speed: 0 Mbps, Boot priority: 0, Promisc Policy: deny, Bandwidth group: none
NIC 1 Settings:  MTU: 0, Socket (send: 64, receive: 64), TCP Window (send:64, receive: 64)
NIC 1 Rule(0):   name = ssh, protocol = tcp, host ip = , host port = 53507, guest ip = , guest port = 22
NIC 2:                       MAC: 080027303758, Attachment: Bridged Interface 'DisplayLink Network Adapter NCM', Cable connected: on, Trace: off (file: none), Type: 82540EM, Reported speed: 0 Mbps, Boot priority: 0, Promisc Policy: deny, Bandwidth group: none
NIC 3:                       MAC: 0800276EA174, Attachment: Bridged Interface 'Intel(R) Dual Band Wireless-AC 8260', Cable connected: on, Trace: off (file: none), Type: 82540EM, Reported speed: 0 Mbps, Boot priority: 0, Promisc Policy: deny, Bandwidth group: none
NIC 4:                       MAC: 080027042135, Attachment: Bridged Interface 'Intel(R) Ethernet Connection I219-LM', Cable connected: on, Trace: off (file: none), Type: 82540EM, Reported speed: 0 Mbps, Boot priority: 0, Promisc Policy: deny, Bandwidth group: none
NIC 5:                       disabled
NIC 6:                       disabled
NIC 7:                       disabled
NIC 8:                       disabled
<SNIP SOME CONTENT>

Configured memory balloon size: 0MB

c:\program files\oracle\virtualbox\vboxmanage exited on MINILITH with error code 0.

Fab! We now have working interfaces… But wait, let’s start that VM back up and see what happens.

C:\Users\JON>multipass shell "witty-kelpie"
Welcome to Ubuntu 18.04.3 LTS (GNU/Linux 4.15.0-76-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Thu Feb  6 11:31:08 GMT 2020

  System load:  0.1               Processes:             84
  Usage of /:   21.1% of 4.67GB   Users logged in:       0
  Memory usage: 11%               IP address for enp0s3: 10.0.2.15
  Swap usage:   0%


0 packages can be updated.
0 updates are security updates.


Last login: Thu Feb  6 10:56:45 2020 from 10.0.2.2
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

ubuntu@witty-kelpie:~$

Wait, what….. We’ve still only got the one interface up with an IP address… OK, let’s fix this!

As of Ubuntu 18.04, interfaces are managed using Netplan, and, well, when the VM was built, it didn’t know about any interface past the first one, so we need to get Netplan to get them enabled. Let’s check they’re detected by the VM, and see what they’re all called:

ubuntu@witty-kelpie:~$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 08:00:27:3c:ce:d0 brd ff:ff:ff:ff:ff:ff
3: enp0s8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 08:00:27:30:37:58 brd ff:ff:ff:ff:ff:ff
4: enp0s9: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 08:00:27:6e:a1:74 brd ff:ff:ff:ff:ff:ff
5: enp0s10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 08:00:27:04:21:35 brd ff:ff:ff:ff:ff:ff
ubuntu@witty-kelpie:~$ 

If you compare the link/ether lines to the output from showvminfo we executed before, you’ll see that the MAC address against enp0s3 matches the NAT interface, while enp0s8 matches the DisplayLink adapter, and so on… So we basically want to ask NetPlan to do a DHCP lookup for all the new interfaces we’ve added to it. If you’ve got 1 NAT and 7 physical interfaces (why oh why…) then you’d have enp0s8, 9, 10, 16, 17, 18 and 19 (I’ll come back to the random numbering in a tic)… so we now need to ask Netplan to do DHCP on all of those interfaces (assuming we’ll be asking for them all to come up!)

If we want to push that in, then we need to add a new file in /etc/netplan called something like 60-extra-interfaces.yaml, that should contain:

network:
  ethernets:
    enp0s8:
      optional: yes
      dhcp4: yes
      dhcp4-overrides:
        route-metric: 10
    enp0s9:
      optional: yes
      dhcp4: yes
      dhcp4-overrides:
        route-metric: 11
    enp0s10:
      optional: yes
      dhcp4: yes
      dhcp4-overrides:
        route-metric: 12
    enp0s16:
      optional: yes
      dhcp4: yes
      dhcp4-overrides:
        route-metric: 13
    enp0s17:
      optional: yes
      dhcp4: yes
      dhcp4-overrides:
        route-metric: 14
    enp0s18:
      optional: yes
      dhcp4: yes
      dhcp4-overrides:
        route-metric: 15
    enp0s19:
      optional: yes
      dhcp4: yes
      dhcp4-overrides:
        route-metric: 16

Going through this, we basically ask netplan not to assume the interfaces are attached. This stops the boot process for waiting for a timeout to configure each of the interfaces before proceeding, so it means your boot should be reasonably fast, particularly if you don’t always attach a network cable or join a Wifi network on all your interfaces!

We also say to assume we want IPv4 DHCP on each of those interfaces. I’ve done IPv4 only, as most people don’t use IPv6 at home, but if you are doing IPv6 as well, then you’d also need the same lines that start dhcp4 copied to show dhcp6 (like dhcp6: yes and dhcp6-overrides: route-metric: 10)

The eagle eyed of you might notice that the route metric increases for each extra interface. This is because realistically, if you have two interfaces connected (perhaps if you’ve got wifi enabled, and plug a network cable in), then you’re more likely to want to prioritize traffic going over the lower numbered interfaces than the higher number interfaces.

Once you’ve created this file, you need to run netplan apply or reboot your machine.

So, yehr, that gets you sorted on the interface front.

Bridging Summary

To review, you launch your machine with multipass launch, and immediately stop it with multipass stop "vm-name", then, as an admin, run psexec vboxmanage modifyvm "vm-name" --nic2 bridged --bridgedadapter2 "NIC description", and then start the machine with multipass start "vm-name". Lastly, ask the interface to do DHCP by manipulating your Netplan configuration.

Interface Names in VirtualBox

Just a quick note on the fact that the interface names aren’t called things like eth0 any more. A few years back, Ubuntu (amongst pretty much all of the Linux distribution vendors) changed from using eth0 style naming to what they call “Predictable Network Interface Names”. This derives the names from things like, what the BIOS provides for on-board interfaces, slot index numbers for PCI Express ports, and for this case, the “geographic location of the connector”. In Virtualbox, these interfaces are provided as the “Geographically” attached to “port 0” (so enp0 are all on port 0), but for some reason, they broadcast themselves as being attached to the port 0 at “slots” 3, 8, 9, 10, 16, 17, 18 and 19… hence enp0s3 and so on. shrug It just means that if you don’t have the interfaces coming up on the interfaces you’re expecting, you need to run ip link to confirm the MAC addresses match.

Port Forwarding

Unlike with the Bridging, we don’t need to power down the VM to add the extra interfaces, we just need to use psexec (as an admin again) to execute a vboxmanage command – in this case, it’s:

C:\WINDOWS\system32>C:\Users\JON\Downloads\sysinternals\PsExec64.exe -s "c:\program files\oracle\virtualbox\vboxmanage" controlvm "witty-kelpie" --natpf1 "myport,tcp,,1234,,2345"

OK, that’s a bit more obscure. Basically it says “Create a NAT rule on NIC 1 called ‘myport’ to forward TCP connections from port 1234 attached to any IP associated to the host OS to port 2345 attached to the DHCP supplied IP on the guest OS”.

If we wanted to run a DNS server in our VM, we could run multiple NAT rules in the same command, like this:

C:\WINDOWS\system32>C:\Users\JON\Downloads\sysinternals\PsExec64.exe -s "c:\program files\oracle\virtualbox\vboxmanage" controlvm "witty-kelpie" --natpf1 "TCP DNS,tcp,127.0.0.1,53,,53" --natpf1 "UDP DNS,udp,127.0.0.1,53,,53"

If we then decide we don’t need those NAT rules any more, we just (with psexec and appropriate paths) issue: vboxmanage controlvm "vm-name" --natpf1 delete "TCP DNS"

Using ifupdown instead of netplan

Late Edit 2020-04-01: On Github, someone asked me how they could use the same type of config with netplan, but instead on a 16.04 system. Ubuntu 16.04 doesn’t use netplan, but instead uses ifupdown instead. Here’s how to configure the file for ifupdown:

You can either add the following stanzas to /etc/network/interfaces, or create a separate file for each interface in /etc/network/interfaces.d/<number>-<interface>.cfg (e.g. /etc/network/interfaces.d/10-enp0s8.cfg)

allow-hotplug enp0s8
iface enp0s8 inet dhcp
  metric 10

To re-iterate, in the above netplan file, the interfaces we identified were: enp0s8, enp0s9, enp0s10, enp0s16, enp0s17, enp0s18 and enp0s19. Each interface was incrementally assigned a route metric, starting at 10 and ending at 16, so enp0s8 has a metric of 10, while enp0s16 has a metric of 13, and so on. To build these files, I’ve created this brief shell script you could use:

export metric=10
for int in 8 9 10 16 17 18 19
do
  echo -e "allow-hotplug enp0s${int}\niface enp0s${int} inet dhcp\n  metric $metric" > /etc/network/interfaces.d/enp0s${int}.cfg
  ((metric++))
done

As before, you could reboot to make the changes to the interfaces. Bear in mind, however, that unlike Netplan, these interfaces will try and DHCP on boot with this configuration, so boot time will take longer if every interface attached isn’t connected to a network.

Using NAT Network instead of NAT Interface

Late update 2020-05-26: Ruzsinsky contacted me by email to ask how I’d use a “NAT Network” instead of a “NAT interface”. Essentially, it’s the same as the Bridged interface above, with one other tweak first, we need to create the Net Network, with this command (as an Admin)

C:\WINDOWS\system32>C:\Users\JON\Downloads\sysinternals\PsExec64.exe -s "c:\program files\oracle\virtualbox\vboxmanage" natnetwork add --netname MyNet --network 192.0.2.0/24

Next, stop your multipass virtual machine with multipass stop "witty-kelpie", and configure your second interface, like this:

C:\WINDOWS\system32>C:\Users\JON\Downloads\sysinternals\PsExec64.exe -s "c:\program files\oracle\virtualbox\vboxmanage" modifyvm "witty-kelpie" --nic2 natnetwork --nat-network2 "MyNet"

PsExec v2.2 - Execute processes remotely
Copyright (C) 2001-2016 Mark Russinovich
Sysinternals - www.sysinternals.com

c:\program files\oracle\virtualbox\vboxmanage exited on MINILITH with error code 0.

Start the vm with multipass start "witty-kelpie", open a shell with it multipass shell "witty-kelpie", become root sudo -i and then configure the interface in /etc/netplan/60-extra-interfaces.yaml like we did before:

network:
  ethernets:
    enp0s8:
      optional: yes
      dhcp4: yes
      dhcp4-overrides:
        route-metric: 10

And then run netplan apply or reboot.

What I would say, however, is that the first interface seems to be expected to be a NAT interface, at which point, having a NAT network as well seems a bit pointless. You might be better off using a “Host Only” (or “Private”) network for any inter-host communications between nodes at a network level… But you know your environments and requirements better than I do :)

Featured image is “vieux port Marseille” by “Jeanne Menjoulet” on Flickr and is released under a CC-BY-ND license.

“code crunching” by “Ruben Molina” on Flickr

Getting Started with Terraform on AWS

I recently wrote a blog post about Getting Started with Terraform on Azure. You might have read it (I know I did!).

Having got a VM stood up in Azure, I wanted to build a VM in AWS, after all, it’s more-or-less the same steps. Note, this is a work-in-progress, and shouldn’t be considered “Final” – this is just something to use as *your* starting block.

What do you need?

You need an AWS account for this. If you’ve not got one, signing up for one is easy, but bear in mind that while there are free resource on AWS (only for the first year!), it’s also quite easy to suddenly enable a load of features that cost you money.

Best practice suggests (or rather, INSISTS) you shouldn’t use your “root” account for AWS. It’s literally just there to let you define the rest of your admin accounts. Turn on MFA (Multi-Factor Authentication) on that account, give it an exceedingly complex password, write that on a sheet of paper, and lock it in a box. You should NEVER use it!

Create your admin account, log in to that account. Turn on MFA on *that* account too. Then, create an “Access Token” for your account. This is in IAM (Identity and Access Management). These are what we’ll use to let Terraform perform actions in AWS, without you needing to actually “log in”.

On my machine, I’ve put the credentials for this in /home/<MYUSER>/.aws/credentials and it looks like this:

[default]
aws_access_key_id = ABC123DEF456
aws_secret_access_key = AaBbCcDd1234EeFf56

This file should be chmod 600 and make sure it’s only your account that can access this file. With this token, Terraform can perform *ANY ACTION* as you, including anything that charges you money, or creating servers that can mine a “cryptocurrency” for someone malicious.

I’m using Windows Subsystem for Linux (WSL). I’m using the Ubuntu 18.04 distribution obtained from the Store. This post won’t explain how to get *that*. Also, you might want to run Terraform on Mac, in Windows or on Linux natively… so, yehr.

Next, we need to actually install Terraform. Excuse the long, unwrapped code block, but it gets what you need quickly (assuming the terraform webpage doesn’t change any time soon!)

mkdir -p ~/bin
cd ~/bin
sudo apt update && sudo apt install unzip
curl -sLO $(curl https://www.terraform.io/downloads.html | grep "linux_amd64.zip" | cut -d\" -f 2) && unzip terraform*.zip && rm terraform*.zip && chmod 755 terraform

Starting coding your infrastructure

Before you can build your first virtual machine on AWS, you need to stand up the supporting infrastructure. These are:

  • An SSH Keypair (no password logins here!)
  • A VPC (“Virtual Private Cloud”, roughly the same as a VNet on Azure, or somewhat like a L3 switch in the Physical Realm).
  • An Internet Gateway (if your VPC isn’t classed as “the default one”)
  • A Subnet.
  • A Security Group.

Once we’ve got these, we can build our Virtual Machine on EC2 (“Elastic Cloud Compute”), and associate a “Public IP” to it.

To quote my previous post:

One quirk with Terraform, versus other tools like Ansible, is that when you run one of the terraform commands (like terraform init, terraform plan or terraform apply), it reads the entire content of any file suffixed “tf” in that directory, so if you don’t want a file to be loaded, you need to either move it out of the directory, comment it out, or rename it so it doesn’t end .tf. By convention, you normally have three “standard” files in a terraform directory – main.tf, variables.tf and output.tf, but logically speaking, you could have everything in a single file, or each instruction in it’s own file.

Getting Started with Terraform on Azure – Building the file structure

For the sake of editing and annotating the files for this post, these code blocks are all separated, but on my machine, they’re all currently one big file called “main.tf“.

In that file, I start by telling it that I’m working with the Terraform AWS provider, and that it should target my nearest region.

If you want to risk financial ruin, you can put things like your access tokens in here, but I really wouldn’t chance this!

Next, we create our network infrastructure – VPC, Internet Gateway and Subnet. We also change the routing table.

I suspect, if I’d created the VPC as “The Default” VPC, then I wouldn’t have needed to amend the routing table, nor added an Internet Gateway. To help us make the routing table change, there’s a “data” block in this section of code. A data block is an instruction to Terraform to go and ask a resource for *something*, in this case, we need AWS to tell Terraform what the routing table is that it created for the VPC. Once we have that we can ask for the routing table change.

AWS doesn’t actually give “proper” names to any of it’s assets. To provide something with a “real” name, you need to tag that thing with the “Name” tag. These can be practically anything, but I’ve given semi-sensible names to everything. You might want to name everything “main” (like I nearly did)!

We’re getting close to being able to create the VM now. First of all, we’ll create the Security Groups. I want to separate out my “Allow Egress Traffic” rule from my “Inbound SSH” rule. This means that I can clearly see what hosts allow inbound SSH access. Like with my Azure post, I’m using a “data provider” to get my public IP address, but in a normal “live” network, you’d specify a collection of valid source address ranges.

Last steps before we create the Virtual Machine. We need to upload our SSH key, and we need to find the “AMI” (AWS Machine ID) of the image we’ll be using. To create the key, in this directory, along side the .tf files, I’ve put my SSH public key (called id_rsa.pub), and we load that key when we create the “my_key” resource. To find the AMI, we need to make another data call, this time asking the AMI index to find the VM with the name containing ubuntu-bionic-18.04 and some other stuff. AMIs are region specific, so the image I’m using in eu-west-2 will not be the same AMI in eu-west-1 or us-east-1 and so on. This filtering means that, as long as the image exists in that region, we can use “the right one”. So let’s take a look at this file.

So, now we have everything we need to create our VM. Let’s do that!

In here, we specify a “user_data” file to upload, in this case, the contents of a file – CloudDev.sh, but you can load anything you want in here. My CloudDev.sh is shown below, so you can see what I’m doing with this file :)

So, having created all this lot, you need to execute the terraform workload. Initially you do terraform init. This downloads all the provisioners and puts them into the same tree as these .tf files are stored in. It also resets the state of the terraform discovered or created datastore.

Next, you do terraform plan -out tfout. Technically, the tfout part can be any filename, but having something like tfout marks it as clearly part of Terraform. This creates the tfout file with the current state, and whatever needs to change in the Terraform state file on it’s next run. Typically, if you don’t use a tfout file within about 20 minutes, it’s probably worth removing it.

Finally, once you’ve run your plan stage, now you need to apply it. In this case you execute terraform apply tfout. This tfout is the same filename you specified in terraform plan. If you don’t include -out tfout on your plan (or even run a plan!) and tfout in your apply, then you can skip the terraform plan stage entirely.

Once you’re done with your environment, use terraform destroy to shut it all down… and enjoy :)

Featured image is “code crunching” by “Ruben Molina” on Flickr and is released under a CC-ND license.

"Key mess" by "Alper Çuğun" on Flickr

Making Ansible Keys more useful in complex and nested data structures

One of the things I miss about Jekyll when I’m working with Ansible is the ability to fragment my data across multiple files, but still have it as a structured *whole* at the end.

For example, given the following directory structure in Jekyll:

+ _data
|
+---+ members
|   +--- member1.yml
|   +--- member2.yml
|
+---+ groups
    +--- group1.yml
    +--- group2.yml

The content of member1.yml and member2.yml will be rendered into site.data.members.member1 and site.data.members.member2 and likewise, group1 and group2 are loaded into their respective variables.

This kind of structure isn’t possible in Ansible, because all the data files are compressed into one vars value that we can read. To work around this on a few different projects I’ve worked on, I’ve ended up doing the following:

- set_fact:
    my_members: |-
      [
        {%- for var in vars | dict2items -%}
          {%- if var.key | regex_search(my_regex) is not none -%}
            "{{ var.key | regex_replace(my_regex, '') }}": 
              {%- if var.value | string %}"{% endif -%}
              {{ var.value }}
              {%- if var.value | string %}"{% endif %},
          {%- endif -%}
        {%- endfor -%}
      ]
  vars:
    my_regex: '^member_'

So, what this does is to step over all the variables defined (for example, in host_vars\*, group_vars\*, from the gathered facts and from the role you’re in – following Ansible’s loading precedence), and then checks to see whether the key of that variable name (e.g. “member_i_am_a_member” or “member_1”) matches the regular expression (click here for more examples). If it does, the key (minus the regular expression matching piece [using regex_replace]) is added to a dictionary, and the value attached. If the value is actually a string, then it wraps it in quotes.

So, while this doesn’t give me my expressive data structure that Jekyll does (no site.data.members.member1.somevalue for me), I do at least get to have my_members.member1.somevalue if I put the right headers in! :)

I’ll leave extending this model for doing other sorts of building variables out (for example, something like if var.value['variable_place'] | default('') == 'my_members.member' + current_position) to the reader to work out how they could use something like this in their workflows!

Featured image is “Key mess” by “Alper Çuğun” on Flickr and is released under a CC-BY license.

"Seca" by "Olearys" on Flickr

Getting Started with Terraform on Azure

I’m strongly in the “Ansible is my tool, what needs fixing” camp, when it comes to Infrastructure as Code (IaC) but, I know there are other tools out there which are equally as good. I’ve been strongly advised to take a look at Terraform from HashiCorp. I’m most familiar at the moment with Azure, so this is going to be based around resources available on Azure.


Late edit: I want to credit my colleague, Pete, for his help getting started with this. While many of the code samples have been changed from what he provided me with, if it hadn’t been for these code samples in the first place, I’d never have got started!

Late edit 2: This post was initially based on Terraform 0.11, and I was prompted by another colleague, Jon, that the available documentation still follows the 0.11 layout. 0.12 was released in May, and changes how variables are reused in the code. This post now *should* follow the 0.12 conventions, but if you spot something where it doesn’t, check out this post from the Terraform team.


As with most things, there’s a learning curve, and I struggled to find a “simple” getting started guide for Terraform. I’m sure this is a failing on my part, but I thought it wouldn’t hurt to put something out there for others to pick up and see if it helps someone else (and, if that “someone else” is you, please let me know in the comments!)

Pre-requisites

You need an Azure account for this. This part is very far outside my spectrum of influence, but I’m assuming you’ve got one. If not, look at something like Digital Ocean, AWS or VMWare :) For my “controller”, I’m using Windows Subsystem for Linux (WSL), and wrote the following notes about getting my pre-requisites.

Building the file structure

One quirk with Terraform, versus other tools like Ansible, is that when you run one of the terraform commands (like terraform init, terraform plan or terraform apply), it reads the entire content of any file suffixed “tf” in that directory, so if you don’t want a file to be loaded, you need to either move it out of the directory, comment it out, or rename it so it doesn’t end .tf. By convention, you normally have three “standard” files in a terraform directory – main.tf, variables.tf and output.tf, but logically speaking, you could have everything in a single file, or each instruction in it’s own file. Because this is a relatively simple script, I’ll use this standard layout.

The actions I’ll be performing are the “standard” steps you’d perform in Azure to build a single Infrastructure as a Service (IAAS) server service:

  • Create your Resource Group (RG)
  • Create a Virtual Network (VNET)
  • Create a Subnet
  • Create a Security Group (SG) and rules
  • Create a Public IP address (PubIP) with a DNS name associated to that IP.
  • Create a Network Interface (NIC)
  • Create a Virtual Machine (VM), supplying a username and password, the size of disks and VM instance, and any post-provisioning instructions (yep, I’m using Ansible for that :) ).

I’m using Visual Studio Code, but almost any IDE will have integrations for Terraform. The main thing I’m using it for is auto-completion of resource, data and output types, also the fact that control+clicking resource types opens your browser to the documentation page on terraform.io.

So, creating my main.tf, I start by telling it that I’m working with the Terraform AzureRM Provider (the bit of code that can talk Azure API).

This simple statement is enough to get Terraform to load the AzureRM, but it still doesn’t tell Terraform how to get access to the Azure account. Use az login from a WSL shell session to authenticate.

Next, we create our basic resource, vnet and subnet resources.

But wait, I hear you cry, what are those var.something bits in there? I mentioned before that in the “standard” set of files is a “variables.tf” file. In here, you specify values for later consumption. I have recorded variables for the resource group name and location, as well as the VNet name and subnet name. Let’s add those into variables.tf.

When you’ve specified a resource, you can capture any of the results from that resource to use later – either in the main.tf or in the output.tf files. By creating the resource group (called “rg” here, but you can call it anything from “demo” to “myfirstresourcegroup”), we can consume the name or location with azurerm_resource_group.rg.name and azurerm_resource_group.rg.location, and so on. In the above code, we use the VNet name in the subnet, and so on.

After the subnet is created, we can start adding the VM specific parts – a security group (with rules), a public IP (with DNS name) and a network interface. I’ll create the VM itself later. So, let’s do this.

BUT WAIT, what’s that ${trimspace(data.http.icanhazip.body)}/32 bit there?? Any resources we want to load from the terraform state, but that we’ve not directly defined ourselves needs to come from somewhere. These items are classed as “data” – that is, we want to know what their values are, but we aren’t *changing* the service to get it. You can also use this to import other resource items, perhaps a virtual network that is created by another team, or perhaps your account doesn’t have the rights to create a resource group. I’ll include a commented out data block in the overall main.tf file for review that specifies a VNet if you want to see how that works.

In this case, I want to put the public IP address I’m coming from into the NSG Rule, so I can get access to the VM, without opening it up to *everyone*. I’m not that sure that my IP address won’t change between one run and the next, so I’m using the icanhazip.com service to determine my IP address. But I’ve not defined how to get that resource yet. Let’s add it to the main.tf for now.

So, we’re now ready to create our virtual machine. It’s quite a long block, but I’ll pull certain elements apart once I’ve pasted this block in.

So, this is broken into four main pieces.

  • Virtual Machine Details. This part is relatively sensible. Name RG, location, NIC, Size and what happens to the disks when the machine powers on. OK.
name                             = "iaas-vm"
location                         = azurerm_resource_group.rg.location
resource_group_name              = azurerm_resource_group.rg.name
network_interface_ids            = [azurerm_network_interface.iaasnic.id]
vm_size                          = "Standard_DS1_v2"
delete_os_disk_on_termination    = true
delete_data_disks_on_termination = true
  • Disk details.
storage_image_reference {
  publisher = "Canonical"
  offer     = "UbuntuServer"
  sku       = "18.04-LTS"
  version   = "latest"
}
storage_os_disk {
  name              = "iaas-os-disk"
  caching           = "ReadWrite"
  create_option     = "FromImage"
  managed_disk_type = "Standard_LRS"
}
  • OS basics: VM Hostname, username of the first user, and it’s password. Note, if you want to use an SSH key, this must be stored for Terraform to use without passphrase. If you mention an SSH key here, as well as a password, this can cause all sorts of connection issues, so pick one or the other.
os_profile {
  computer_name  = "iaas"
  admin_username = var.ssh_user
  admin_password = var.ssh_password
}
os_profile_linux_config {
  disable_password_authentication = false
}
  • And lastly, provisioning. I want to use Ansible for my provisioning. In this example, I have a basic playbook stored locally on my Terraform host, which I transfer to the VM, install Ansible via pip, and then execute ansible-playbook against the file I uploaded. This could just as easily be a git repo to clone or a shell script to copy in, but this is a “simple” example.
provisioner "remote-exec" {
  inline = ["mkdir /tmp/ansible"]

  connection {
    type     = "ssh"
    host     = azurerm_public_ip.iaaspubip.fqdn
    user     = var.ssh_user
    password = var.ssh_password
  }
}

provisioner "file" {
  source = "ansible/"
  destination = "/tmp/ansible"

  connection {
    type     = "ssh"
    host     = azurerm_public_ip.iaaspubip.fqdn
    user     = var.ssh_user
    password = var.ssh_password
  }
}

provisioner "remote-exec" {
  inline = [
    "sudo apt update > /tmp/apt_update || cat /tmp/apt_update",
    "sudo apt install -y python3-pip > /tmp/apt_install_python3_pip || cat /tmp/apt_install_python3_pip",
    "sudo -H pip3 install ansible > /tmp/pip_install_ansible || cat /tmp/pip_install_ansible",
    "ansible-playbook /tmp/ansible/main.yml"
  ]

  connection {
    type     = "ssh"
    host     = azurerm_public_ip.iaaspubip.fqdn
    user     = var.ssh_user
    password = var.ssh_password
  }
}

This part of code is done in three parts – create upload path, copy the files in, and then execute it. If you don’t create the upload path, it’ll upload just the first file it comes to into the path specified.

Each remote-exec and file provisioner statement must include the hostname, username and either the password, or SSH private key. In this example, I provide just the password.

So, having created all this lot, you need to execute the terraform workload. Initially you do terraform init. This downloads all the provisioners and puts them into the same tree as these .tf files are stored in. It also resets the state of the terraform discovered or created datastore.

Next, you do terraform plan -out tfout. Technically, the tfout part can be any filename, but having something like tfout marks it as clearly part of Terraform. This creates the tfout file with the current state, and whatever needs to change in the Terraform state file on it’s next run. Typically, if you don’t use a tfout file within about 20 minutes, it’s probably worth removing it.

Finally, once you’ve run your plan stage, now you need to apply it. In this case you execute terraform apply tfout. This tfout is the same filename you specified in terraform plan. If you don’t include -out tfout on your plan (or even run a plan!) and tfout in your apply, then you can skip the terraform plan stage entirely.

When I ran this, with a handful of changes to the variable files, I got this result:

Once you’re done with your environment, use terraform destroy to shut it all down… and enjoy :)

The full source is available in the associated Gist. Pull requests and constructive criticism are very welcome!

Featured image is “Seca” by “Olearys” on Flickr and is released under a CC-BY license.

"Tower" by " Yijun Chen" on Flickr

Building a Gitlab and Ansible Tower (AWX) Demo in Vagrant with Ansible

TL;DR – I created a repository on GitHub‌ containing a Vagrantfile and an Ansible Playbook to build a VM running Docker. That VM hosts AWX (Ansible Tower’s upstream open-source project) and Gitlab.

A couple of years ago, a colleague created (and I enhanced) a Vagrant and Ansible playbook called “Project X” which would run an AWX instance in a Virtual Machine. It’s a bit heavy, and did a lot of things to do with persistence that I really didn’t need, so I parked my changes and kept an eye on his playbook…

Fast-forward to a week-or-so ago. I needed to explain what a Git/Ansible Workflow would look like, and so I went back to look at ProjectX. Oh my, it looks very complex and consumed a lot of roles that, historically, I’ve not been that impressed with… I just needed the basics to run AWX. Oh, and I also needed a Gitlab environment.

I knew that Gitlab had a docker-based install, and so does AWX, so I trundled off to find some install guides. These are listed in the playbook I eventually created (hence not listing them here). Not all the choices I made were inspired by those guides – I wanted to make quite a bit of this stuff “build itself”… this meant I wanted users, groups and projects to be created in Gitlab, and users, projects, organisations, inventories and credentials to be created in AWX.

I knew that you can create Docker Containers in Ansible, so after I’d got my pre-requisites built (full upgrade, docker installed, pip libraries installed), I add the gitlab-ce:latest docker image, and expose some ports. Even now, I’m not getting the SSH port mapped that I was expecting, but … it’s no disaster.

I did notice that the Gitlab service takes ages to start once the container is marked as running, so I did some more digging, and found that the uri module can be used to poll a URL. It wasn’t documented well how you can make it keep polling until you get the response you want, so … I added a PR on the Ansible project’s github repo for that one (and I also wrote a blog post about that earlier too).

Once I had a working Gitlab service, I needed to customize it. There are a bunch of Gitlab modules in Ansible but since a few releases back of Gitlab, these don’t work any more, so I had to find a different way. That different way was to run an internal command called “gitlab-rails”. It’s not perfect (so it doesn’t create repos in your projects) but it’s pretty good at giving you just enough to build your demo environment. So that’s getting Gitlab up…

Now I need to build AWX. There’s lots of build guides for this, but actually I had most luck using the README in their repository (I know, who’d have thought it!??!) There are some “Secrets” that should be changed in production that I’m changing in my script, but on the whole, it’s pretty much a vanilla install.

Unlike the Gitlab modules, the Ansible Tower modules all work, so I use these to create the users, credentials and so-on. Like the gitlab-rails commands, however, the documentation for using the tower modules is pretty ropey, and I still don’t have things like “getting your users to have access to your organisation” working from the get-go, but for the bulk of the administration, it does “just work”.

Like all my playbooks, I use group_vars to define the stuff I don’t want to keep repeating. In this demo, I’ve set all the passwords to “Passw0rd”, and I’ve created 3 users in both AWX and Gitlab – csa, ops and release – indicative of the sorts of people this demo I ran was aimed at – Architects, Operations and Release Managers.

Maybe, one day, I’ll even be able to release the presentation that went with the demo ;)

On a more productive note, if you’re doing things with the tower_ modules and want to tell me what I need to fix up, or if you’re doing awesome things with the gitlab-rails tool, please visit the repo with this automation code in, and take a look at some of my “todo” items! Thanks!!

Featured image is “Tower” by “Yijun Chen” on Flickr and is released under a CC-BY-SA license.

"funfair action" by "Jon Bunting" on Flickr

Improving the speed of Azure deployments in Ansible with Async

Recently I was building a few environments in Azure using Ansible, and found this stanza which helped me to speed things up.

  - name: "Schedule UDR Creation"
    azure_rm_routetable:
      resource_group: "{{ resource_group }}"
      name: "{{ item.key }}_udr"
    loop: "{{ routetables | dict2items }}"
    loop_control:
        label: "{{ item.key }}_udr"
    async: 1000
    poll: 0
    changed_when: False
    register: sleeper

  - name: "Check UDRs Created"
    async_status:
      jid: "{{ item.ansible_job_id }}"
    register: sleeper_status
    until: sleeper_status.finished
    retries: 500
    delay: 4
    loop: "{{ sleeper.results|flatten(levels=1) }}"
    when: item.ansible_job_id is defined
    loop_control:
      label: "{{ item._ansible_item_label }}"

What we do here is to start an action with an “async” time (to give the Schedule an opportunity to register itself) and a “poll” time of 0 (to prevent the Schedule from waiting to be finished). We then tell it that it’s “never changed” (changed_when: False) because otherwise it always shows as changed, and to register the scheduled item itself as a “sleeper”.

After all the async jobs get queued, we then check the status of all the scheduled items with the async_status module, passing it the registered job ID. This lets me spin up a lot more items in parallel, and then “just” confirm afterwards that they’ve been run properly.

It’s not perfect, and it can make for rather messy code. But, it does work, and it’s well worth giving it the once over, particularly if you’ve got some slow-to-run tasks in your playbook!

Featured image is “funfair action” by “Jon Bunting” on Flickr and is released under a CC-BY license.

A web browser with the example.com web page loaded

Working around the fact that Ansible’s URI module doesn’t honour the no_proxy variable…

An Ansible project I’ve been working on has tripped me up this week. I’m working with some HTTP APIs and I need to check early whether I can reach the host. To do this, I used a simple Ansible Core Module which lets you call an HTTP URI.

- uri:
    follow_redirects: none
    validate_certs: False
    timeout: 5
    url: "http{% if ansible_https | default(True) %}s{% endif %}://{{ ansible_host }}/login"
  register: uri_data
  failed_when: False
  changed_when: False

This all seems pretty simple. One of the environments I’m working in uses the following values in their environment:

http_proxy="http://192.0.2.1:8080"
https_proxy="http://192.0.2.1:8080"
no_proxy="10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 192.0.2.0/24, 198.51.100.0/24, 203.0.113.0/24"

And this breaks the uri module, because it tries to punt everything through the proxy if the “no_proxy” contains CIDR values (like 192.0.2.0/24) (there’s a bug raised for this)… So here’s my fix!

- set_fact:
    no_proxy_match: |
      {
        {% for no_proxy in (lookup('env', 'no_proxy') | replace(',', '') ).split() %}
          {% if no_proxy| ipaddr | type_debug != 'NoneType' %}
            {% if ansible_host | ipaddr(no_proxy) | type_debug != 'NoneType' %}
              "match": "True"
            {% endif %}
          {% endif %}
        {% endfor %}
      }

- uri:
    follow_redirects: none
    validate_certs: False
    timeout: 5
    url: "http{% if ansible_https | default(True) %}s{% endif %}://{{ ansible_host }}/login"
  register: uri_data
  failed_when: False
  changed_when: False
  environment: "{ {% if no_proxy_match.match | default(False) %}'no_proxy': '{{ ansible_host }}'{% endif %} }"

So, let’s break this down.

The key part to this script is that we need to override the no_proxy environment variable with the IP address that we’re trying to address (so that we’re not putting 16M addresses for 10.0.0.0/8 into no_proxy, for example). To do that, we use the exact same URI block, except for the environment line at the end.

In turn, the set_fact block steps through the no_proxy values, looking for IP Addresses to check ({% if no_proxy | ipaddr ... %}‌ says “if the no_proxy value is an IP Address, return it, but if it isn’t, return a ‘None’ value”) and if it’s an IP address or subnet mask, it checks to see whether the IP address of the host you’re trying to reach falls inside that IP Address or Subnet Mask ({% if ansible_host | ipaddr(no_proxy) ... %} says “if the ansible_host address falls inside the no_proxy range, then return it, otherwise return a ‘None’ value”). Both of these checks say “If this previous check returns anything other than a ‘None’ value, do the next thing”, and on the last check, the “next” thing is to set the flag ‘match’ to ‘true’. When we get to the environment variable, we say “if match is not true, it’s false, so don’t put a value in there”.

So that’s that! Yes, I could merge the set_fact block into the environment variable, but I do end up using that a fair amount. And really, if it was merged, that would be even MORE complicated to pick through.

I have raised a pull request on the Ansible project to update the documentation, so we’ll see whether we end up with people over here looking for ways around this issue. If so, let me know in the comments below! Thanks!!