"From one bloody orange!" by "Terry Madeley" on Flickr

Making Vagrant install the latest version of Ansible using Pip and run it as root in Ubuntu Virtual Machines

As previously mentioned, I use Ansible a lot inside virtual machines orchestrated with Vagrant. Today’s brief tip is how to make Vagrant install the absolute latest version of Ansible on Ubuntu boxes with Pip.

Here’s your Vagrantfile

Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/focal64"
  config.vm.provision "ansible_local", run: "always" do |ansible|
    ansible.playbook         = "setup.yml"
    ansible.playbook_command = "sudo ansible-playbook"
    ansible.install_mode     = "pip"
    ansible.pip_install_cmd  = "(until sudo apt update ; do sleep 1 ; done && sudo apt install -y python3-pip && sudo rm -f /usr/bin/pip && sudo ln -s /usr/bin/pip3 /usr/bin/pip && sudo -H pip install --upgrade pip) 2>&1 | tee -a /var/log/vagrant-init"
  end
end

“But, that pip_install_cmd block is huge”, I hear you cry!

Well, yes, but let’s split that out into a slightly more readable code block! (Yes, I’ve removed the “&&” for clarity’s sake – it just means “only execute the next command if this one worked”.)

(
  # Wait until we get the apt "package lock" released
  until sudo apt update
  do
    # By sleeping for 1 second increments until it works
    sleep 1
  done

  # Then install python3-pip
  sudo apt install -y python3-pip

  # Just in case the Python 2 version of pip is installed, delete its /usr/bin/pip binary
  sudo rm -f /usr/bin/pip

  # And symbolically link pip3 to pip
  sudo ln -s /usr/bin/pip3 /usr/bin/pip

  # And then do a pip self-upgrade
  sudo -H pip install --upgrade pip

# And output this to the end of the file /var/log/vagrant-init, including any error messages
) 2>&1 | tee -a /var/log/vagrant-init

What does this actually do? Well, pip is the Python package manager, so we’re asking for the latest packaged version of it to be installed (it often isn’t, particularly with older releases of, well, frankly, any Linux distribution) – this is the “pip_install_cmd” block. Then, once pip is installed, Vagrant will run “pip install ansible” – which gives us the latest version available to Pip – and when that’s all done, it will run “sudo ansible-playbook /vagrant/setup.yml”.
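If you want to check what actually got installed, a quick sanity check from the host (assuming the box is already up) might look like this – the exact version numbers you see will obviously differ:

# Ask the guest which Ansible version pip installed
vagrant ssh -c "ansible --version | head -n 1"
# And confirm it came from pip rather than apt
vagrant ssh -c "pip show ansible"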

Featured image is “From one bloody orange!” by “Terry Madeley” on Flickr and is released under a CC-BY license.

"Milestone, Otley" by "Tim Green" on Flickr

Changing the default routing metric with Netplan, NetworkManager and ifupdown

In the past few months I’ve been working on a project, and I’ve been doing the bulk of that work using Vagrant.

By default and convention, all Vagrant machines set up using VirtualBox have a “NAT” interface defined as the first network interface, but I like to configure a second, “Bridged”, interface which gives the virtual machine a “real” IP address on the network. This means that any security appliances I have on my network can see which device is causing which traffic, and I can quickly identify which hosts are misbehaving.

By default, Virtualbox uses the network 10.0.2.0/24 for the NAT interface, and runs a DHCP server for that interface. In the past, I’ve removed the default route which uses 10.0.2.2 (the IP address of the NAT interface on the host device), but with Ubuntu 20.04, this route keeps being re-injected, so I had to come up with a solution.
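Before fixing anything, it’s worth seeing what you’re up against; a quick check from inside the guest (nothing clever here, just iproute2) shows each route and the metric it carries:

# List every route and its metric
ip route
# Or just the default route(s) - the one via 10.0.2.2 is the NAT route we want to demote
ip route show default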

Fixing Netplan

Ubuntu, in at least 20.04, but (according to Wikipedia) probably since 17.10, has used Netplan to define network interfaces, superseding the earlier ifupdown package (which uses /etc/network/interfaces and /etc/network/interfaces.d/* files to define the network). Netplan is a kind of meta-configuration which instructs systemd-networkd or NetworkManager to reconfigure the network interfaces, so making the configuration changes here seemed most sensible.

Vagrant configures the file /etc/netplan/50-cloud-init.yaml with a network configuration to support this DHCP interface, and then applies it. To fix it, we need to rewrite this file completely.

#!/bin/bash

# Find details about the interface
ifname="$(grep -A1 ethernets "/etc/netplan/50-cloud-init.yaml" | tail -n1 | sed -Ee 's/[ ]*//' | cut -d: -f1)"
match="$(grep macaddress "/etc/netplan/50-cloud-init.yaml" | sed -Ee 's/[ ]*//' | cut -d\  -f2)"

# Configure the netplan file
{
  echo "network:"
  echo "  ethernets:"
  echo "    ${ifname}:"
  echo "      dhcp4: true"
  echo "      dhcp4-overrides:"
  echo "        route-metric: 250"
  echo "      match:"
  echo "        macaddress: ${match}"
  echo "      set-name: ${ifname}"
  echo "  version: 2"
} >/etc/netplan/50-cloud-init.yaml

# Apply the config
netplan apply

When I then came to a box running Fedora, I had a similar issue, except now I didn’t have Netplan to work with. How do I resolve this one?!

Actually, this is a four line script!

#!/bin/bash

# Get the name of the interface which has the IP address 10.0.2.2
netname="$(ip route | grep 10.0.2.2 | head -n 1 | sed -Ee 's/^(.*dev )(.*)$/\2/;s/proto [A-Za-z0-9]+//;s/metric [0-9]+//;s/[ \t]+$//')"

# Ask NetworkManager for a list of all the active connections, look for the interface name we found above, and then just get the connection name.
nm="$(nmcli connection show --active | grep "${netname}" | sed -Ee 's/^(.*)([ \t][-0-9a-f]{36})(.*)$/\1/;s/[\t ]+$//g')"
# Set the network to have a metric of 250
nmcli connection modify "$nm" ipv4.route-metric 250
# And then re-apply the network config
nmcli connection up "$nm"

The last major interface management tool I’ve experienced on standard server Linux is “ifupdown” – /etc/network/interfaces. This is mostly used on Debian. How do we fix that one? Well, that’s a bit more tricky!

#!/bin/bash

# Get the name of the interface with the IP address 10.0.2.2
netname="$(ip route | grep 10.0.2.2 | head -n 1 | sed -Ee 's/^(.*dev )(.*)$/\2/;s/proto [A-Za-z0-9]+//;s/metric [0-9]+//;s/[ \t]+$//')"

# Create a new /etc/network/interfaces file which just looks in "interfaces.d"
echo "source /etc/network/interfaces.d/*" > /etc/network/interfaces

# Create the loopback interface file
{
  echo "auto lo"
  echo "iface lo inet loopback"
} > "/etc/network/interfaces.d/lo"
# Bounce the interface
ifdown lo ; ifup lo

# Create the first "real" interface file
{
  echo "allow-hotplug ${netname}"
  echo "iface ${netname} inet dhcp"
  echo "  metric 1000"
} > "/etc/network/interfaces.d/${netname}"
# Bounce the interface
ifdown "${netname}" ; ifup "${netname}"

# Loop through the rest of the interfaces
ip link | grep UP | grep -v lo | grep -v "${netname}" | cut -d: -f2 | sed -Ee 's/[ \t]+([A-Za-z0-9.]+)[ \t]*/\1/' | while IFS= read -r int
do
  # Create the interface file for this interface, assuming DHCP
  {
    echo "allow-hotplug ${int}"
    echo "iface ${int} inet dhcp"
  } > "/etc/network/interfaces.d/${int}"
  # Bounce the interface
  ifdown "${int}" ; ifup "${int}"
done

Looking for one consistent script which does this all?

#!/bin/bash
# This script ensures that the metric of the first "NAT" interface is set to 1000,
# while resetting the rest of the interfaces to "whatever" the DHCP server offers.

function netname() {
  ip route | grep 10.0.2.2 | head -n 1 | sed -Ee 's/^(.*dev )(.*)$/\2/;s/proto [A-Za-z0-9]+//;s/metric [0-9]+//;s/[ \t]+$//'
}

if command -v netplan
then
  ################################################
  # NETPLAN
  ################################################

  # Find details about the interface
  ifname="$(grep -A1 ethernets "/etc/netplan/50-cloud-init.yaml" | tail -n1 | sed -Ee 's/[ ]*//' | cut -d: -f1)"
  match="$(grep macaddress "/etc/netplan/50-cloud-init.yaml" | sed -Ee 's/[ ]*//' | cut -d\  -f2)"

  # Configure the netplan file
  {
    echo "network:"
    echo "  ethernets:"
    echo "    ${ifname}:"
    echo "      dhcp4: true"
    echo "      dhcp4-overrides:"
    echo "        route-metric: 1000"
    echo "      match:"
    echo "        macaddress: ${match}"
    echo "      set-name: ${ifname}"
    echo "  version: 2"
  } >/etc/netplan/50-cloud-init.yaml

  # Apply the config
  netplan apply
elif command -v nmcli
then
  ################################################
  # NETWORKMANAGER
  ################################################

  # Ask NetworkManager for a list of all the active connections, look for the interface name we found, and then just get the connection name.
  nm="$(nmcli connection show --active | grep "$(netname)" | sed -Ee 's/^(.*)([ \t][-0-9a-f]{36})(.*)$/\1/;s/[\t ]+$//g')"
  # Set the network to have a metric of 1000
  nmcli connection modify "$nm" ipv4.route-metric 1000
  nmcli connection modify "$nm" ipv6.route-metric 1000
  # And then re-apply the network config
  nmcli connection up "$nm"
elif command -v ifup
then
  ################################################
  # IFUPDOWN
  ################################################

  # Get the name of the interface with the IP address 10.0.2.2
  netname="$(netname)"
  # Create a new /etc/network/interfaces file which just looks in "interfaces.d"
  echo "source /etc/network/interfaces.d/*" > /etc/network/interfaces
  # Create the loopback interface file
  {
    echo "auto lo"
    echo "iface lo inet loopback"
  } > "/etc/network/interfaces.d/lo"
  # Bounce the interface
  ifdown lo ; ifup lo
  # Create the first "real" interface file
  {
    echo "allow-hotplug ${netname}"
    echo "iface ${netname} inet dhcp"
    echo "  metric 1000"
  } > "/etc/network/interfaces.d/${netname}"
  # Bounce the interface
  ifdown "${netname}" ; ifup "${netname}"
  # Loop through the rest of the interfaces
  ip link | grep UP | grep -v lo | grep -v "${netname}" | cut -d: -f2 | sed -Ee 's/[ \t]+([A-Za-z0-9.]+)[ \t]*/\1/' | while IFS= read -r int
  do
    # Create the interface file for this interface, assuming DHCP
    {
      echo "allow-hotplug ${int}"
      echo "iface ${int} inet dhcp"
    } > "/etc/network/interfaces.d/${int}"
    # Bounce the interface
    ifdown "${int}" ; ifup "${int}"
  done
fi
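To try the combined script out by hand (I’m assuming here you’ve saved it as fix-metric.sh – the filename is entirely up to you), run it as root and then check that the NAT route now carries the higher metric:

# Run as root, since it rewrites files under /etc and bounces interfaces
sudo bash fix-metric.sh
# The route via 10.0.2.2 should now show "metric 1000"
ip route | grep 10.0.2.2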

Featured image is “Milestone, Otley” by “Tim Green” on Flickr and is released under a CC-BY license.

"Platform" by "Brian Crawford" on Flickr

Cross Platform Decision Records/Architectural Decision Records – a HowTo Guide

Several months ago, I wrote a post talking about Architectural Decision Records with adr-tools, but since then I’ve moved on a bit with things, so I wanted to write about alternatives.

I also wanted to comment a bit on why I use the term “Decision Records” (always “decision record”, never “DR” due to the overloading of that particular abbreviation) rather than “Architectural Decision Records” (ADR), but I’ll get to that towards the end of the post 😊

Using Decision Records the Manual Way

A decision record is basically a text file, using the “Markdown” format, which has several “standard” blocks of text in it. The “npryce” version, which most people use, has the following sections in it:

  1. Title (as a “level 1” heading) which also holds the date of the record.
  2. A (level 2 heading) status section, holding the status of this decision (and any links to documents which supersede or relate to this decision).
  3. The context of the decision.
  4. The decision.
  5. The consequences of that decision.

So, somewhat understandably, whatever tooling your organisation uses should still let you create these documents by hand, without needing those tools at all.

There are conventions about how the index-critical details will be stored:

  1. Your title block should follow the format # 1. Decision Title. The # symbol means it is the primary heading for the document, then the number, which should probably be lower than 9999, is used as an index for linking to other records and then the text of the title should also be the name of the file you’ve created. In this case, it will likely be 0001-decision-title.md.
  2. The status will usually be one of: Accepted or Proposed. If a document is superseded, this status should be removed. Any other link type will live under the line showing the current status.
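As an illustration (these filenames are invented for the example, and the doc/adr path is just a common convention), a directory of decision records following those rules might look like this:

ls doc/adr/
0001-record-architecture-decisions.md
0002-use-a-postgresql-database.md
0003-use-decision-records-for-all-teams.md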

So, there’s no reason why you couldn’t just use this template for any files you create:

# NUMBER. TITLE

Date: yyyy-mm-dd

## Status

Accepted
Superseded by [2. Another Decision](0002-another-decision.md)

## Context

The context of the decision.

## Decision

The decision.

## Consequences

The consequences of that decision.

BUT, that’s not very automated, is it?

ADRs using Bash

Of course, most people making decision records use the Bash command line….. right? Oh, perhaps not. I’ll get back to you in a tic. If you’re using Bash, the “npryce” tooling I mentioned above is the same one I wrote about those months ago. So, read that, and then crack on with your ADRs.
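For completeness, a typical adr-tools session looks something like this (the titles are just examples):

# Initialise the decision record directory (doc/adr by default)
adr init doc/adr
# Create a new record; this opens your editor with the template pre-filled
adr new "Use a standalone database server"
# Create a record which supersedes record 3
adr new -s 3 "Use a managed database service"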

ADRs using Powershell

So, if you’re using Windows, you might be tempted to find a decision record tool for Powershell. If so, I found “ajoberstar” on Github had produced just such a thing, and you “just”, as an administrator, run:

Install-Module -Name ArchitectureDecisionRecords
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned

Then edit the script you installed (in C:\Program Files\WindowsPowerShell\Modules\ArchitectureDecisionRecords\0.1.1\ArchitectureDecisionRecords.psm1), replace UTF8NoBOM with UTF8, and save it…

And then you can run commands like Initialize-Adr or New-Adr -Title 'Use a database'. However, this script was last touched on 2nd July 2018, and although I’ve raised a few issues, they don’t seem to have been resolved (see also replacing UTF8NoBOM above).

ADRs using VSCode

By far, so far, the best tooling I’ve seen in this space is the adr-tools extension for VSCode. It too, however, has its own caveats, but these are not disastrous. Essentially, you need to create a path in which you store the template to use. You can get this from the author’s repo, here: https://github.com/vincent-ledu/adr-template.git, and put it in .adr-templates in the root directory of your project. This is customisable, however: go to the settings for your user or workspace, search for “ADR” and adjust the paths accordingly.

A settings pane showing the Adr paths in your project’s tree

To add a new decision record, press Ctrl+Shift+P or click the cog icon in the sidebar, and select “Command Palette…”

Opening the Command Palette in VS Code

Then start typing “adr” to select from “ADR New”, “ADR Init”, “ADR Change Status” or “ADR Link”.

The Command Palette showing your options for commands to run

All of these will walk you through some options at the top of the screen, either asking for some text input, or asking you to select between options.

You may be tempted to just run this up now and select “ADR New”, and it’ll look like it’s working, but you first need to have obtained the template and created the directory structure. Selecting “ADR Init” will create the directory structure for your project and will try to perform a git clone of the repo mentioned above, but if you are already in a git repository, or you have some form of MITM proxy in the way, this will break silently. The easiest thing to do is to either manually create the paths in your tree, according to what you have set or selected, or just run the ADR Init, and then obtain the template from the git repo.
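In practice, that manual step is just one command, assuming you’ve kept the default .adr-templates path (adjust it to whatever you set in the extension’s settings):

# From the root of your project, fetch the template into the expected path
git clone https://github.com/vincent-ledu/adr-template.git .adr-templates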

Talking of templates, in the previous scripts, the script would come with a template file built-in, and it would do a simple string replacement of the values “NUMBER”, “TITLE” and “STATUS”. This script instead uses its own template, which is stored in your project’s file tree, and uses parameter substitution, finding strings wrapped in pairs of curly braces (like {{ this }}). The downside to this is that you can’t just reuse the template I listed above… but no worries, get the file from the repo and stick it in your tree where it’s expecting it, or let the ADR Init function clone the template into your path – job done.

What other options are there?

Well, actually, this comes down to why I’m using the term “decision records” rather than “architectural decision record”: because I’m writing my own tool, and all the “adr” namespaces on Github were taken, and I’d seen a fair number of posts suggesting that the “A” in “ADR” should stand for “Any”… and I figured, why should it exist at all?

The tool I’ve written so far is written in Javascript, and is starting from a (somewhat loose) TDD development process. It’s here: https://github.com/DecisionRecords/javascript-decision-records

Why Javascript? Frankly, I needed to learn a modern programming language, and wanted to apply it to a domain I was interested in. It’s not yet complete: it creates the record path and a configuration file, and I’m currently writing the functions to create new records. Also, because it’s Javascript, in theory I can also use the internals to create a VSCode extension with this later… MUCH later!

Why re-implement this at all? Firstly, it looks like most of the development work on those projects halted around 3-4 years ago, with no further interest in updating them to resolve bugs and issues. I didn’t want to fork the projects as-is, as I think they were largely written to scratch a particular itch (which is fine!) but they all miss key things I want to provide, like proper unit testing (only the npryce project comes close to this), internationalisation (none of them have this) and the ability to use a company- or project-wide template (only the VSCode extension does this). I also saw requests to support alternative file formats (like Restructured Text, which was completely rejected) and realised that if you built the script in such a way that these alternate formats could be used, then there was no reason not to support that.

In summary

There are tools you can use, whatever platform you’re using. My preference is the VSCode extension, and eventually will (hopefully!!) be the script I’m writing… but it’s not ready, yet.

Featured image is “Platform” by “Brian Crawford” on Flickr and is released under a CC-BY license.

"Picture in Picture" by "Mats" on Flickr

Hints and Tips when using Vagrant on Windows

I’ve been using HashiCorp’s Vagrant with Oracle’s VirtualBox for several years (probably since 2013, if my blog posts are anything to go by), and I’ve always been pretty comfortable with how it works.

This said, when using a Windows machine running Microsoft’s Hyper-V (built into Windows since Windows 8, back in 2012), VirtualBox is unable (by default) to run 64-bit virtual machines (thanks to Hyper-V “stealing” the VT-x/AMD-V bit from the BIOS/EFI).

Around last year, or maybe even the year before, Microsoft introduced a “Hypervisor Platform” add-on, which lets VirtualBox run 64-bit machines on a Hyper-V host (more on this later). HOWEVER, it is much slower than native mode, and can often freeze on booting…

Meanwhile, Vagrant (a tool that wraps various hypervisors behind a simple configuration file, using VirtualBox by default) boots machines in a “headless” mode by default, so you can’t see the freezing.

I’m trying to use an Ubuntu 18.04 virtual machine for various builds I’m creating, and found that I’d get a few issues on boot, so let’s get these sorted out.

VirtualBox can’t start 64-bit virtual machines when Hyper-V is installed.

You need to confirm that certain Windows features are enabled, including “Hyper-V” and “Windows Hypervisor Platform”. Confirm you’re running at least Windows 10 version 1803 which is the first release with the “Windows Hypervisor Platform”.

GUI mode

Run winver to bring up this box. Confirm the version number is greater than 1803. Mine is 1909.

A screenshot of the “winver” command, highlighting the version number, which in this case shows 1909, but needs to show at least 1803.

Right click on the start menu, and select “Apps and Features”. Click on “Programs and Features”.

The settings panel found by right clicking the “Start Menu” and selecting “Apps and Features”. Note the desired next step – “Programs and Features” is highlighted.

In the “Programs and Features” window, click on “Turn Windows Features on or off”. Note the shield icon here indicates that administrative access is required, and you may be required to authenticate to the machine to progress past this stage.

A fragment of the “Programs and Features” window, with the “Turn Windows features on or off” link highlighted.

Next, ensure that the following “Windows Features” are enabled; “Hyper-V”, “Virtual Machine Platform” and “Windows Hypervisor Platform”. Click on “OK” to install these features, if they’re not already installed.

A screen capture of the “Turn Windows features on or off” dialogue box, with certain features obscured and others highlighted.

Note that once you’ve pressed “OK”, you’ll likely need to reboot your machine, if any of these features were not already installed.

CLI mode

Right click on the start menu, and start an Administrative Powershell session.

Run the command Get-ComputerInfo | select WindowsVersion. You should get a response which looks like this:

WindowsVersion
--------------
1909

Note that the version number needs to be greater than 1803.

Next, find the names of the features you need to install. These features have region specific names, so outside EN-GB, these names may not match your requirements!

Run the command Get-WindowsOptionalFeature -online | select FeatureName,State and you’re looking for the following lines (this has been cropped to just what you need):

FeatureName                                     State
-----------                                     -----
HypervisorPlatform                            Enabled
VirtualMachinePlatform                        Enabled
Microsoft-Hyper-V-All                         Enabled

If any of these three features are not enabled, run Enable-WindowsOptionalFeature -online -FeatureName x where “x” is the name of the feature, listed in the above text block, you want to install. For example: Enable-WindowsOptionalFeature -online -FeatureName HypervisorPlatform,VirtualMachinePlatform,Microsoft-Hyper-V-All. If you run this when they’re already enabled, it should return RestartNeeded : False, but otherwise you’re likely to need to reboot.

After the reboot

After you’ve rebooted, and you start a 64 bit virtual machine in VirtualBox, you’ll see this icon in the bottom corner.

A screen grab of the VirtualBox Status Bar, highlighting the “Slow Mode” icon representing the CPU

Booting the Virtual Machine with Vagrant fails because it takes too long to boot

This was historically a big issue with Vagrant and VirtualBox, particularly with Windows Vagrant boxes, but prior to the Hyper-V/VirtualBox solution it had largely been fixed (or at least, I wasn’t seeing it!). There is a “standard” timeout for booting a virtual machine, I think of approximately 5 minutes, but I might be wrong. To stop this “issue” occurring, add the line config.vm.boot_timeout = 0 to your Vagrantfile, like this:

Vagrant.configure("2") do |config|
  config.vm.boot_timeout = 0
end

This says to Vagrant, don’t worry how long it takes to boot, just keep waiting until it does. Yes, it will be slower, but it should get there in the end!

Booting the Virtual Machine with Vagrant does not fail, but it never authenticates with your Private Key.

Your VM may sit at this block for quite a while:

==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address: 127.0.0.1:2222
    default: SSH username: vagrant
    default: SSH auth method: private key

If this occurs, you may find that your virtual machine has hung during the boot process… but weirdly, a simple work-around to this is to ensure that the VirtualBox GUI is open, and that you’ve got a block like this (config.vm.provider / vb.gui=true / end) in your Vagrantfile:

Vagrant.configure("2") do |config|
  config.vm.provider "virtualbox" do |vb|
    vb.gui = true
  end
end

This forces VirtualBox to open a window with your Virtual Machine’s console on it (much like having a monitor attached to real hardware). You don’t need to interact with it, but any random hangs or halts on your virtual machine may be solved just by bringing this window, or the VirtualBox Machines GUI, to the foreground.

Sometimes you may see, when this happens, a coredump or section of kernel debugging code on the console. Don’t worry about this!

Vagrant refuses to SSH to your virtual machine when using the vagrant ssh command.

Provisioning works like a treat, and you can SSH into the virtual machine from any other environment, but, when you run vagrant ssh, you get an error about keys not being permitted or usable. This is fixable by setting a single environment variable, either system- or user-wide, or by adding a line to your Vagrantfile.

The environment variable is VAGRANT_PREFER_SYSTEM_BIN, and by setting this to 0, it will use bundled versions of ssh or rsync instead of using any versions provided by Windows.
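If you’d rather set it in your shell session than in the Vagrantfile, in a Bash-style shell (Git Bash or WSL, for example) that would simply be:

export VAGRANT_PREFER_SYSTEM_BIN=0
vagrant ssh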

You can add a line like this, ENV['VAGRANT_PREFER_SYSTEM_BIN']="0", to your Vagrantfile, outside of the Vagrant.configure … end block, like this:

ENV['VAGRANT_PREFER_SYSTEM_BIN']="0"
Vagrant.configure("2") do |config|
end


Featured image is “Picture in Picture” by “Mats” on Flickr and is released under a CC-BY-SA license.

"2015_12_06_Visé_135942" by "Norbert Schnitzler" on Flickr

Idea for Reusable “Custom Data” templates across multiple modules with Terraform

A few posts ago I wrote about building Windows virtual machines with Terraform, and a couple of days ago, “YoureInHell” on Twitter reached out and asked what advice I’d give about having several different terraform modules use the same basic build of custom data.

They’re trying to avoid putting the same template file into several repos (I suspect so that one team can manage the “custom-data”, “user-data” or “cloud-init” files, and another can manage the deployment terraform files), and asked if I had any suggestions.

I had three ideas.

Using a New Module

This was my initial thought; create a new module called something like “Standard Build File”, where that module contains just the following Terraform file, plus a template file called “build.tmpl”.

variable "someKey" {
  default = "someVar"
}

variable "hostName" {
  default = "hostName"
}

variable "unsetVar" {}

output "template" {
  value = templatefile("build.tmpl",
    {
      someKey  = var.someKey
      hostName = var.hostName
      unsetVar = var.unsetVar
    }
  )
}

Now, in your calling module, you can do:

module "buildTemplate" {
  source   = "git::https://git.example.net/buildTemplate.git?ref=latestLive"
  # See https://www.terraform.io/docs/language/modules/sources.html
  #   for more details on how to specify the source of this module
  unsetVar = "Set To This String"
}

output "RenderedTemplate" {
  value = module.buildTemplate.template
}

And that means that you can use the module.buildTemplate.template anywhere you’d normally specify your templateFile, and get a consistent, yet customizable template (and note, because I specified a particular tag, you can use that to move to the “current latest” or “the version we released into live on YYYY-MM-DD” by using a tag, or a commit ref.)
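As a sketch of that versioning idea (the tag name here is made up for the example), you’d tag the template repo when you release it, and point ref= at that tag:

# In the buildTemplate repo, mark the commit you've released into live
git tag -a 2021-05-01-live -m "Template released to live"
git push origin 2021-05-01-live
# ...and then reference it with ?ref=2021-05-01-live in the calling module's source URL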

Now, the downside to this is that you’ve now got a whole separate module for creating your instances that needs to be maintained. What are our other options?

Git Submodules for your template

I use Git Submodules a LOT for my code. It’s a bit too easy to get into a mess with them, particularly if you’re not great at keeping on top of them, but… if you are OK with them, you’d create a repo – again, let’s use “https://git.example.net/buildTemplate.git” as our git repo – and put your template in there. In your terraform git repo, you’d run this command: git submodule add https://git.example.net/buildTemplate.git and this would add a directory to your repo called “buildTemplate” which you can then point Terraform’s templatefile function at (like this: templatefile("buildTemplate/build.tmpl", {someVar="var"})).

Now, this means that you’ve effectively got two git repos in one tree, and if any changes occur in your submodule repo, you’d need to go into that directory and run git checkout main ; git pull to get the latest updates from the main branch, and when you check the parent repo out initially on another machine, you’ll need to do git clone https://git.example.net/terraform --recurse-submodules to get the submodules populated at the same time.
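As a quick reminder of what that day-to-day submodule workflow looks like (using the same example URLs as above):

# Add the template repo as a submodule (a one-off step)
git submodule add https://git.example.net/buildTemplate.git
# Later, pull in upstream changes to the template
cd buildTemplate
git checkout main
git pull
cd ..
git add buildTemplate
git commit -m "Update build template submodule"
# And when cloning the parent repo on another machine
git clone https://git.example.net/terraform --recurse-submodules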

A benefit to this is that because it’s “inline” with the rest of your tree, if you need to make any changes to this template, it’s clearly where it’s supposed to be in your tree; you just need to remember about the submodule when it comes to making PRs and so forth.

How about that third idea?

Keep it simple, stupid 😁

Why bother with submodules, or modules from a git repo? Terraform can be quite easy to over-complicate… so why not create all your terraform files in something like this structure:

project\build.tmpl
project\web_servers\main.tf
project\logic_servers\main.tf
project\database_servers\main.tf

And then in each of your terraform files (web_servers, logic_servers and database_servers) just reference the file in your project root, like this: templatefile("../build.tmpl", {someVar="var"})

The downside to this is that you can’t as easily farm off the control of that build script to another team, and they’d be making (change|pull|merge) requests against the same repo as you… but then again, isn’t that the idea for functional teams? 😃

Featured image is “2015_12_06_Visé_135942” by “Norbert Schnitzler” on Flickr and is released under a CC-BY-SA license.

"Router" by "Ryan Hodnett" on Flickr

Post-Config of a RaspberryPi Zero W as an OTG-USB Gadget that routes

In my last post in this series I mentioned that I’d got my Raspberry Pi Zero W to act as a USB Ethernet adaptor via libComposite, and that I was using DNSMasq to provide a DHCP service to the host computer (the one you plug the Pi into). In this part, I’m going to extend what local services I could provide on this device, and start to use this as a router.

Here’s what you missed last time… When you plug the RPi in (to receive power on the data line), it powers up the RPi Zero, and uses a kernel module called “libComposite” to turn the USB interface into an Ethernet adaptor. Because of how Windows and non-Windows devices handle network interfaces, we use two features of libComposite to create an ECM/CDC interface and a RNDIS interface, called usb0 and usb1, and whichever one of these two is natively supported in the OS, that’s which interface comes up. As a result, we can then use DNSMasq to “advertise” a DHCP address for each interface, and use that to advertise services on, like an SSH server.

By making this device into a router, we can use it to access the network, without using the in-built network adaptor (which might be useful if your in-built WiFi adaptor isn’t detected under Linux or Windows without a driver), or to protect your computer from malware (by adding a second firewall that doesn’t share the same network stack as its host), or perhaps to ensure that your traffic is sent over a VPN tunnel.

"DeBugged!" by "Randy Heinitz" on Flickr

Debugging Bash Scripts

Yesterday I was struggling a bit with a bash script I was writing. I needed to stop it from running flat out through every loop, and I wanted to see what certain values were at key points in the script.

Yes, I know I could use “read” to pause the script and “echo” to print values, but that leaves a lot of mess that I need to clean up afterwards… so I went looking for something else I could try.

You can have extensive debug statements, which are enabled with a --debug flag or environment variable… but again, messy.

You can run bash -x ./myscript.sh – and, indeed, I do frequently do that… but that shows you the commands which were run at each point, not what the outcome is of each of those commands.

If my problem had been a syntax one, I could have installed shellcheck, which is basically a linter for Bash and other shell scripting languages, but no, I needed more detail about what was happening during the processing.
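As an aside, if you’ve not tried shellcheck, it’s only a couple of commands away – the package name varies slightly between distributions, but on Debian or Ubuntu it’s simply:

# Install the linter
sudo apt install shellcheck
# And point it at your script
shellcheck myscript.sh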

Instead, I wanted something like xdebug (from PHP)… and I found Bash Debug for VSCode. This doesn’t even need you to install any scripts or services on the target machine – it’s interactive, and has a “watch” section, where you highlight and right-click a variable expression (like $somevar or ${somevar}) to see when it changes. You can see where in the “callstack” you are and see what values are registered by that script.

Shellcheck shows me problems in my code…
But Bash Debug helps me to find out what values are at specific points in the code.

All in all, a worthy addition to my toolbelt!

Featured image is “DeBugged!” by “Randy Heinitz” on Flickr and is released under a CC-BY license.

"raspberry pie" by "stu_spivack" on Flickr

Post-Config of a RaspberryPi Zero W as an OTG-USB Gadget for off-device computing

History

A few months ago, I was working on a personal project that needed a separate, offline linux environment. I tried various different schemes to run what I was doing in the confines of my laptop and I couldn’t make what I was working on actually achieve my goals. So… I bought a Raspberry Pi Zero W and a “Solderless Zero Dongle“, with the intention of running Docker containers on it… unfortunately, while Docker runs on a Pi Zero, it’s really hard to find base images for the ARMv6/armhf platform that the Pi Zero W uses… so I put it back in the drawer, and left it there.

Roll forwards a month or so, and I was doing some experiments with Nebula, and only had an old Chromebook to test it on… except, I couldn’t install the Nebula client for Linux on there, and the Android client wouldn’t give me some features I wanted… so I broke out that old Pi Zero W again…

Now, while the tests with Nebula I was working towards will be documented later, I found that a lot of the documentation about using a Raspberry Pi Zero as a USB gadget was rough and unexplained. So, this post breaks down much of the content of what I found, what I tried, and what did and didn’t work.

Late Edit 2021-06-04: I spotted some typos around providing specific DHCP options for interfaces, based on work I’m doing elsewhere with this script. I’ve updated these values accordingly. I’ve also created a specific branch for this revision.

Late Edit 2021-06-06: I’ve noticed this document doesn’t cover IPv6 at all right now. I started to perform some tweaks to cover IPv6, but as my ISP has decided not to bother with IPv6, and won’t support Hurricane Electric‘s Tunnelbroker system, I can’t test any of it, without building out an IPv6 test environment… maybe soon, eh?

"Observatories Combine to Crack Open the Crab Nebula" by "NASA Goddard Space Flight Center" on Flickr

Nebula Offline Certificate Management with a Raspberry Pi using Bash

I have been playing again, recently, with Nebula, an Open Source Peer-to-Peer VPN product which boasts speed, simplicity and in-built firewalling. Although I only have a few nodes to play with (my VPS, my NAS, my home server and my laptop), I still wanted to simplify, for me, the process of onboarding devices. So, naturally, I spent a few evenings writing a bash script that helps me to automate the creation of my Nebula nodes.

Nebula Certificates

Nebula have implemented their own certificate structure. It’s similar to an x509 “TLS Certificate” (like you’d use to access an HTTPS website, or to establish an OpenVPN connection), but has a few custom fields.

The result of typing “nebula-cert print -path ca.crt” to print the custom fields

In this context, I’ve created a nebula Certificate Authority (CA), using this command:

nebula-cert ca -name nebula.example.org -ips 192.0.2.0/24,198.51.100.0/24,203.0.113.0/24 -groups Mobile,Workstation,Server,Lighthouse,db

So, what does this do?

Well, it creates the certificate and private key files, storing the name for the CA as “nebula.example.org” (there’s a reason for this!) and limiting the subnets and groups (like AWS or Azure Tags) the CA can issue certificates with.

Here, I’ve limited the CA to only issue IP addresses in the RFC5737 “Documentation” ranges, which are 192.0.2.0/24, 198.51.100.0/24 and 203.0.113.0/24, but this can easily be expanded to 10.0.0.0/8 or lots of individual subnets (I tested with 1,026 separate subnets, and it worked fine).

Groups, in Nebula parlance, are building blocks of the Security product, and can act like source or destination filters. In this case, I limited the CA to only being allowed to issue certificates with the groups of “Mobile”, “Workstation”, “Server”, “Lighthouse” and “db”.

As this certificate authority requires no internet access, and only enough access to read and write files, I have created my Nebula CA server on a separate Micro SD card to use with a Raspberry Pi device, and this is used only to generate a new CA certificate each 6 months (in theory, I’ve not done this part yet!), and to sign keys for all the client devices as they come on board.

I copy the ca.crt file to my target machines, and then move on to creating my client certificates

Client Certificates

When you generate key materials for Public Key Cryptographic activities (like this one), you’re supposed to generate the private key on the source device, and the private key should never leave the device on which it’s generated. Nebula allows you to do this, using the nebula-cert command again. That command looks like this:

nebula-cert keygen -out-key host.key -out-pub host.pub

If you notice, there’s a key difference at this point between Nebula’s key signing routine and an x509 TLS-style certificate: this stage would be called a “Certificate Signing Request”, or CSR, in TLS parlance, and it would usually specify the record details for the certificate (normally things like “region”, “organisational unit”, “subject name” and so on) before sending it to the CA for signing (marking it as trusted).

In the Nebula world, you create a key, and send the public part of that (in this case, “host.pub” but it can have any name you like) to the CA, at which point the CA defines what IP addresses it will have, what groups it is in, and so on, so let’s do that.

nebula-cert sign -ca-crt ca.crt -ca-key ca.key -in-pub host.pub -out-crt host.crt -groups Workstation -ip 192.0.2.5/24 -name host.nebula.example.org

Let’s pick apart these options, shall we? The first four flags – “-ca-crt“, “-ca-key“, “-in-pub” and “-out-crt” – all refer to the CSR process: it’s reading the CA certificate and key, as well as the public part of the keypair created for the process, and then defines what the output certificate will be called. The next switch, -groups, identifies the tags we’re assigning to this node, then (the mandatory flag) -ip sets the IP address allocated to the node. Note that the certificate is using one of the valid group names, and has been allocated a valid IP address in the ranges defined above. If you provide a value for the certificate which isn’t valid, you’ll get a warning message.

nebula-cert issues a warning when signing a certificate that tries to specify a value outside the constraints of the CA

In the above screenshot, I’ve bypassed the key generation and asked for the CA to sign with values which don’t match the constraints.

The last part is the name of the certificate. This is relevant because Nebula has a DNS service which can resolve the Nebula IPs to the hostnames assigned on the Certificates.

Anyway… Now that we know how to generate certificates the “hard” way, let’s make life a bit easier for you. I wrote a little script – Nebula Cert Maker, also known as certmaker.sh.

certmaker.sh

So, what does certmaker.sh do that is special?

  1. It auto-assigns an IP address, based on the MD5SUM of the FQDN of the node. It uses (by default) the first CIDR mask (the IP range, written as something like 192.0.2.0/24) specified in the CA certificate. If multiple CIDR masks are specified in the certificate, there’s a flag you can use to select which one to use. You can override this to get a specific increment from the network address.
  2. It takes the provided name (perhaps webserver) and adds, as a suffix, the name of the CA Certificate (like nebula.example.org) to the short name, to make the FQDN. This means that you don’t need to run a DNS service for support staff to access machines (perhaps you’ll have webserver1.nebula.example.org and webserver2.nebula.example.org as well as database.nebula.example.org).
  3. Three “standard” roles have been defined for groups, these are “Server”, “Workstation” and “Lighthouse” [1] (the latter because you can configure Lighthouses to be the DNS servers mentioned in step 2.) Additional groups can also be specified on the command line.

[1] A lighthouse, in Nebula terms, is a publicly accessible node, either with a static IP, or a DNS name which resolves to a known host, that can help other nodes find each other. Because all the nodes connect to it (or a couple of “it”s), this is a prime place to run the DNS server, as, well, it knows where all the nodes are!

So, given these three benefits, let’s see these in a script. This script is (at least currently) at the end of the README file in that repo.

# Create the CA
mkdir -p /tmp/nebula_ca
nebula-cert ca -out-crt /tmp/nebula_ca/ca.crt -out-key /tmp/nebula_ca/ca.key -ips 192.0.2.0/24,198.51.100.0/24 -name nebula.example.org

# First lighthouse, lighthouse1.nebula.example.org - 192.0.2.1, group "Lighthouse"
./certmaker.sh --cert_path /tmp/nebula_ca --name lighthouse1 --ip 1 --lighthouse

# Second lighthouse, lighthouse2.nebula.example.org - 192.0.2.2, group "Lighthouse"
./certmaker.sh -c /tmp/nebula_ca -n lighthouse2 -i 2 -l

# First webserver, webserver1.nebula.example.org - 192.0.2.168, groups "Server" and "web"
./certmaker.sh --cert_path /tmp/nebula_ca --name webserver1 --server --group web

# Second webserver, webserver2.nebula.example.org - 192.0.2.191, groups "Server" and "web"
./certmaker.sh -c /tmp/nebula_ca -n webserver2 -s -g web

# Database Server, db.nebula.example.org - 192.0.2.182, groups "Server" and "db"
./certmaker.sh --cert_path /tmp/nebula_ca --name db --server --group db

# First workstation, admin1.nebula.example.org - 198.51.100.205, group "Workstation"
./certmaker.sh --cert_path /tmp/nebula_ca --index 1 --name admin1 --workstation

# Second workstation, admin2.nebula.example.org - 198.51.100.77, group "Workstation"
./certmaker.sh -c /tmp/nebula_ca -d 1 -n admin2 -w

# First Mobile device - Create the private/public key pairing first
nebula-cert keygen -out-key mobile1.key -out-pub mobile1.pub
# Then sign it, mobile1.nebula.example.org - 198.51.100.217, group "mobile"
./certmaker.sh --cert_path /tmp/nebula_ca --index 1 --name mobile1 --group mobile --public mobile1.pub

# Second Mobile device - Create the private/public key pairing first
nebula-cert keygen -out-key mobile2.key -out-pub mobile2.pub
# Then sign it, mobile2.nebula.example.org - 198.51.100.22, group "mobile"
./certmaker.sh -c /tmp/nebula_ca -d 1 -n mobile2 -g mobile -p mobile2.pub

Technically, the mobile devices are simulating the local creation of the private key, and the sharing of the public part of that key. It also simulates what might happen in a more controlled environment – not where everything is run locally.

So, let’s pick out some spots where this content might be confusing. I’ve run each type of invocation twice, once with the longer versions of the flags and once with the short versions (e.g. -c instead of --cert_path, -n instead of --name, and so on). Before each ./certmaker.sh command, I’ve added a comment, showing what the hostname would be, the IP address, and the Nebula Groups assigned to that node.

It is also possible to override the FQDN with your own FQDN, but this command option isn’t in here. Also, if the CA doesn’t provide a CIDR mask, one will be selected for you (10.44.88.0/24), or you can provide one with the -b/--subnet flag.

If the CA has multiple names (e.g. nebula.example.org and nebula.example.com), then the name for the host certificates will be host.nebula.example.org and also host.nebula.example.com.

Using Bash

So, if you’ve looked at, well, almost anything on my site, you’ll see that I like to use tools like Ansible and Terraform to deploy things, but for something which is going to be run on this machine, I’d like to keep things as simple as possible… and there’s not much in this script that needed more than what Bash offers us.

For those who don’t know, bash is the default shell for most modern Linux distributions and Docker containers. It can perform regular expression parsing (checking that strings, or specific collections of characters, appear in a variable), do mathematics, and run extensive loops and checks on values.

I used a bash template found on a post at BetterDev.blog to give me a basic structure – usage, logging and parameter parsing. I needed two functions to parse and check whether IP addresses were valid, and what ranges of those IP addresses might be available. These were both found online. To get just enough of the MD5SUM to generate a random IPv4 address, I used a function to convert the hexadecimal number that the MD5SUM produces into a decimal number, which I then wrap around the address space in the subnets. Lastly, I made extensive use of Bash Arrays in this. This was largely thanks to an article on OpenSource.com about bash arrays. It’s well worth a read!
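As a rough sketch of that hex-to-offset idea (this isn’t the exact code from certmaker.sh, which also handles the CIDR mask properly – it’s just the core trick):

# Take the first six hex characters of the MD5 of the FQDN...
fqdn="webserver1.nebula.example.org"
hex="$(echo -n "${fqdn}" | md5sum | cut -c1-6)"
# ...convert them to a decimal number...
dec="$((16#${hex}))"
# ...and wrap that around the usable host addresses in a /24
offset="$(( (dec % 254) + 1 ))"
echo "Host part of the address: ${offset}"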

So, take a look at the internals of the script, if you want to know some options on writing bash scripts that manipulate IP addresses and read the output of files!

If you’re looking for some simple tasks to start your portfolio of work, there are some “good first issue” tasks in the “issues” of the repo, and I’d be glad to help you work through them.

Wrap up

I hope you enjoy using this script, and I hope, if you’re planning on writing some bash scripts any time soon, that you take a look over the code and consider using some of the templates I reference.

Featured image is “Observatories Combine to Crack Open the Crab Nebula” by “NASA Goddard Space Flight Center” on Flickr and is released under a CC-BY license.

"Family" by "Ivan" on Flickr

Debian on Docker using Vagrant

I want to use Vagrant-Docker to try standing up some environments. There’s no reasonable justification, it’s just a thing I wanted to do. Normally, I’d go into this long and rambling story about why… but on this occasion, the reason was “Because it’s possible”…

TL;DR?: Get the code from the repo and enjoy 😁

Installing Docker

On Ubuntu you can install Docker following the instructions on the Docker Install Page, which includes a convenience script (that runs all the commands you need), if you want to use it. Similar instructions for Debian, CentOS and Fedora exist.

On Windows or Mac there are downloads you can get from the Docker Hub. The Windows Version requires WSL2. I don’t have a Mac, so I don’t know what the requirements are there! Installing WSL2 has a whole host of extra steps that I can’t really do justice to. See this Microsoft article for details.

Installing Vagrant

On Debian and Ubuntu you can add the HashiCorp Apt Repo and then install Vagrant, using these commands:

curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
sudo apt install vagrant

There are similar instructions for RHEL, CentOS and Fedora users there too.

Windows and Mac users will have to get the application from the download page.

Creating your Dockerfile

A Dockerfile is a simple text file containing a series of prefixed instructions which tell the Docker image builder what to add to the Docker image. I found two pages which helped me with what to add for this; “Ansible. Docker. Vagrant. Bringing together” and the git repo “AkihiroSuda/containerized-systemd“.

You see, while a Docker image is great at starting a single binary file or script, it’s not very good at running SystemD… and I needed SystemD to be able to run the SSH service that Vagrant requires, and to also run the scripts and commands I needed for the image I wanted to build…

Sooooo…. here’s the Dockerfile I created:

# Based on https://vtorosyan.github.io/ansible-docker-vagrant/
# and https://github.com/AkihiroSuda/containerized-systemd/

FROM debian:buster AS debian_with_systemd

# This stuff enables SystemD on Debian based systems
STOPSIGNAL SIGRTMIN+3
RUN DEBIAN_FRONTEND=noninteractive apt update && DEBIAN_FRONTEND=noninteractive apt install -y --no-install-recommends systemd systemd-sysv dbus dbus-user-session
COPY docker-entrypoint.sh /
RUN chmod 755 /docker-entrypoint.sh
ENTRYPOINT [ "/docker-entrypoint.sh" ]
CMD [ "/bin/bash" ]

# This part installs an SSH Server (required for Vagrant)
RUN DEBIAN_FRONTEND=noninteractive apt install -y sudo openssh-server
RUN mkdir /var/run/sshd
#    We enable SSH here, but don't start it with "now" as the build stage doesn't run anything long-lived.
RUN systemctl enable ssh
EXPOSE 22

# This part creates the vagrant user, sets the password to "vagrant", adds the insecure key and sets up password-less sudo.
RUN useradd -G sudo -m -U -s /bin/bash vagrant
#    chpasswd takes a colon delimited list of username/password pairs.
RUN echo 'vagrant:vagrant' | chpasswd
RUN mkdir -m 700 /home/vagrant/.ssh
# This key from https://github.com/hashicorp/vagrant/tree/main/keys. It will be replaced on first run.
RUN echo 'ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA6NF8iallvQVp22WDkTkyrtvp9eWW6A8YVr+kz4TjGYe7gHzIw+niNltGEFHzD8+v1I2YJ6oXevct1YeS0o9HZyN1Q9qgCgzUFtdOKLv6IedplqoPkcmF0aYet2PkEDo3MlTBckFXPITAMzF8dJSIFo9D8HfdOV0IAdx4O7PtixWKn5y2hMNG0zQPyUecp4pzC6kivAIhyfHilFR61RGL+GPXQ2MWZWFYbAGjyiYJnAmCP3NOTd0jMZEnDkbUvxhMmBYSdETk1rRgm+R4LOzFUGaHqHDLKLX+FIPKcF96hrucXzcWyLbIbEgE98OHlnVYCzRdK8jlqm8tehUc9c9WhQ== vagrant insecure public key' > /home/vagrant/.ssh/authorized_keys
RUN chmod 600 /home/vagrant/.ssh/authorized_keys
RUN chown -R vagrant:vagrant /home/vagrant
RUN echo 'vagrant ALL=(ALL:ALL) NOPASSWD:ALL' >> /etc/sudoers

This Dockerfile calls out to a separate script, called docker-entrypoint.sh, taken verbatim from AkihiroSuda’s repo, so here’s that file:

#!/bin/bash
set -ex
container=docker
export container

if [ $# -eq 0 ]; then
	echo >&2 'ERROR: No command specified. You probably want to run `journalctl -f`, or maybe `bash`?'
	exit 1
fi

if [ ! -t 0 ]; then
	echo >&2 'ERROR: TTY needs to be enabled (`docker run -t ...`).'
	exit 1
fi

env >/etc/docker-entrypoint-env

cat >/etc/systemd/system/docker-entrypoint.target <<EOF
[Unit]
Description=the target for docker-entrypoint.service
Requires=docker-entrypoint.service systemd-logind.service systemd-user-sessions.service
EOF
cat /etc/systemd/system/docker-entrypoint.target

quoted_args="$(printf " %q" "${@}")"
echo "${quoted_args}" >/etc/docker-entrypoint-cmd
cat /etc/docker-entrypoint-cmd

cat >/etc/systemd/system/docker-entrypoint.service <<EOF
[Unit]
Description=docker-entrypoint.service

[Service]
ExecStart=/bin/bash -exc "source /etc/docker-entrypoint-cmd"
# EXIT_STATUS is either an exit code integer or a signal name string, see systemd.exec(5)
ExecStopPost=/bin/bash -ec "if echo \${EXIT_STATUS} | grep [A-Z] > /dev/null; then echo >&2 \"got signal \${EXIT_STATUS}\"; systemctl exit \$(( 128 + \$( kill -l \${EXIT_STATUS} ) )); else systemctl exit \${EXIT_STATUS}; fi"
StandardInput=tty-force
StandardOutput=inherit
StandardError=inherit
WorkingDirectory=$(pwd)
EnvironmentFile=/etc/docker-entrypoint-env

[Install]
WantedBy=multi-user.target
EOF
cat /etc/systemd/system/docker-entrypoint.service

systemctl mask systemd-firstboot.service systemd-udevd.service
systemctl unmask systemd-logind
systemctl enable docker-entrypoint.service

systemd=
if [ -x /lib/systemd/systemd ]; then
	systemd=/lib/systemd/systemd
elif [ -x /usr/lib/systemd/systemd ]; then
	systemd=/usr/lib/systemd/systemd
elif [ -x /sbin/init ]; then
	systemd=/sbin/init
else
	echo >&2 'ERROR: systemd is not installed'
	exit 1
fi
systemd_args="--show-status=false --unit=multi-user.target"
echo "$0: starting $systemd $systemd_args"
exec $systemd $systemd_args

Now, if you were to run this straight in Docker, it will fail, because you must pass certain flags to Docker to get this to run. These flags are:

  • -t : pass a “TTY” to the shell
  • --tmpfs /tmp : Create a temporary filesystem in /tmp
  • --tmpfs /run : Create another temporary filesystem in /run
  • --tmpfs /run/lock : Apparently having a tmpfs in /run isn’t enough – you ALSO need one in /run/lock
  • -v /sys/fs/cgroup:/sys/fs/cgroup:ro : Mount the CGroup kernel configuration values into the container

(I found these flags via a RedHat blog post, and a Podman issue on Github.)

So, how would this look, if you were to try and run it?

docker run -t --tmpfs /tmp --tmpfs /run --tmpfs /run/lock -v /sys/fs/cgroup:/sys/fs/cgroup:ro YourImage

Blimey, what a long set of text! Perhaps we could hide that behind something a bit more legible? Enter Vagrant.

Creating your Vagrantfile

Vagrant is an abstraction tool, designed to hide complicated virtualisation scripts behind a simple command. In this case, we’re hiding a containerisation script behind a simple command.

Like with the Dockerfile, I made extensive use of the two pages I mentioned before, as well as the two pages where I found the flags needed to run this.

# Based on https://vtorosyan.github.io/ansible-docker-vagrant/
# and https://github.com/AkihiroSuda/containerized-systemd/
# and https://developers.redhat.com/blog/2016/09/13/running-systemd-in-a-non-privileged-container/
# with tweaks indicated by https://github.com/containers/podman/issues/3295
ENV['VAGRANT_DEFAULT_PROVIDER'] = 'docker'
Vagrant.configure("2") do |config|
  config.vm.provider "docker" do |d|
    d.build_dir       = "."
    d.has_ssh         = true
    d.remains_running = false
    d.create_args     = ['--tmpfs', '/tmp', '--tmpfs', '/run', '--tmpfs', '/run/lock', '-v', '/sys/fs/cgroup:/sys/fs/cgroup:ro', '-t']
  end
end

If you create that file, and run vagrant up, you’ll get a working Vagrant boot… But if you try and execute any shell scripts, they’ll fail to run, as they aren’t passed in with execute permissions… so I want to use Ansible to execute things, as these don’t require execute permissions on the /vagrant directory (also, as the thing I’m building in there requires Ansible… so it’s helpful either way 😁)

Executing Ansible scripts

Ansible still expects to find python at /usr/bin/python, but current systems don’t make that a symlink to /usr/bin/python3, as python was typically a symlink to /usr/bin/python2… and I also wanted to put the PPA for Ansible in the sources, which is what the Ansible team recommend in their documentation. I’ve done this as part of the Dockerfile, as again, I can’t run scripts from Vagrant. So, here’s the addition I made to the Dockerfile.

FROM debian_with_systemd AS debian_with_systemd_and_ansible
RUN apt install -y gnupg2 lsb-release software-properties-common
RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 93C4A3FD7BB9C367
RUN add-apt-repository "deb http://ppa.launchpad.net/ansible/ansible/ubuntu trusty main"
RUN apt install -y ansible
# Yes, I know. Trusty? On Debian Buster?? But, that's what the Ansible Docs say!

In the Vagrantfile, I’ve added this block:

config.vm.provision "ansible_local" do |ansible|
  ansible.playbook = "test.yml"
end

And I created a test.yml, which looks like this:

---
- hosts: all
  tasks:
  - debug:
      msg: "Hello from Docker"

Running it

So how does this look on Windows when I run it?

PS C:\Dev\VagrantDockerBuster> vagrant up
==> default: Creating and configuring docker networks...
==> default: Building the container from a Dockerfile...
<SNIP A LOAD OF DOCKER STUFF>
    default: #20 DONE 0.1s
    default:
    default: Image: 190ffdeaeed0b7ed206097e6c1d4b5cc796a428700c9bd3e27eedacce47fb63b
==> default: Creating the container...
    default:   Name: 2021-02-13DockerBusterWithSSH_default_1613469604
    default:  Image: 190ffdeaeed0b7ed206097e6c1d4b5cc796a428700c9bd3e27eedacce47fb63b
    default: Volume: C:/Users/SPRIGGSJ/OneDrive - FUJITSU/Documents/95 My Projects/2021-02-13 Docker Buster With SSH:/vagrant
    default:   Port: 127.0.0.1:2222:22
    default:
    default: Container created: b64ed264d8949b12
==> default: Enabling network interfaces...
==> default: Starting container...
==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address: 127.0.0.1:2222
    default: SSH username: vagrant
    default: SSH auth method: private key
    default:
    default: Vagrant insecure key detected. Vagrant will automatically replace
    default: this with a newly generated keypair for better security.
    default:
    default: Inserting generated public key within guest...
==> default: Machine booted and ready!
==> default: Running provisioner: ansible_local...
    default: Running ansible-playbook...

PLAY [all] *********************************************************************

TASK [Gathering Facts] *********************************************************
[WARNING]: Platform linux on host default is using the discovered Python
interpreter at /usr/bin/python, but future installation of another Python
interpreter could change this. See https://docs.ansible.com/ansible/2.9/referen
ce_appendices/interpreter_discovery.html for more information.
ok: [default]

TASK [debug] *******************************************************************
ok: [default] => {
    "msg": "Hello from Docker"
}

PLAY RECAP *********************************************************************
default                    : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

PS C:\Dev\VagrantDockerBuster>

And on Linux?

Bringing machine 'default' up with 'docker' provider...
==> default: Creating and configuring docker networks...
==> default: Building the container from a Dockerfile...
<SNIP A LOAD OF DOCKER STUFF>
    default: Removing intermediate container e56bed4f7be9
    default:  ---> cef749c205bf
    default: Successfully built cef749c205bf
    default:
    default: Image: cef749c205bf
==> default: Creating the container...
    default:   Name: 2021-02-13DockerBusterWithSSH_default_1613470091
    default:  Image: cef749c205bf
    default: Volume: /home/spriggsj/Projects/2021-02-13 Docker Buster With SSH:/vagrant
    default:   Port: 127.0.0.1:2222:22
    default:
    default: Container created: 3fe46b02d7ad10ab
==> default: Enabling network interfaces...
==> default: Starting container...
==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address: 127.0.0.1:2222
    default: SSH username: vagrant
    default: SSH auth method: private key
    default:
    default: Vagrant insecure key detected. Vagrant will automatically replace
    default: this with a newly generated keypair for better security.
    default:
    default: Inserting generated public key within guest...
    default: Removing insecure key from the guest if it's present...
    default: Key inserted! Disconnecting and reconnecting using new SSH key...
==> default: Machine booted and ready!
==> default: Running provisioner: ansible_local...
    default: Running ansible-playbook...

PLAY [all] *********************************************************************

TASK [Gathering Facts] *********************************************************
[WARNING]: Platform linux on host default is using the discovered Python
interpreter at /usr/bin/python, but future installation of another Python
interpreter could change this. See https://docs.ansible.com/ansible/2.9/referen
ce_appendices/interpreter_discovery.html for more information.
ok: [default]

TASK [debug] *******************************************************************
ok: [default] => {
    "msg": "Hello from Docker"
}

PLAY RECAP *********************************************************************
default                    : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

So, if you’re crazy and want to do Vagrant using Docker with Debian Buster and Ansible, this is how to do it. I don’t know how much I’m likely to be using this in the future, but if you use it, let me know what you’re doing with it! 😀

Featured image is “Family” by “Ivan” on Flickr and is released under a CC-BY license.