
How to capture stdout and stderr from a command in a shellscript without preventing piped processes from seeing them

I love the tee command – it captures stdout [1] and writes it to a file, while also passing that output along to stdout for the next process in a pipe to consume, for example:

$ ls -l | tee /tmp/output
total 1
-rwxrwxrwx 1 jonspriggs jonspriggs 0 Jul 27 11:16 build.sh
$ cat /tmp/output
total 1
-rwxrwxrwx 1 jonspriggs jonspriggs 0 Jul 27 11:16 build.sh

But wait, why is that useful? Well, in a script, you don’t always want to see the content scrolling past, but in the case of a problem, you might need to catch up with the logs afterwards. Alternatively, you might do something like this:

if some_process | tee /tmp/output | grep -q "some text"
then
  echo "Found 'some text' - full output:"
  cat /tmp/output
fi

This works great for stdout but what about stderr [2]? In this case you could just do:

some_process 2>&1 | tee /tmp/output

But that mashes all of stdout and stderr into the same blob.

In my case, I want to capture all the output (stdout and stderr) of a given process into a file. Only stdout is forwarded to the next process, but I still want the option to see stderr as well during processing. Enter process substitution.

TEMP_DATA_PATH="$(mktemp -d)"
capture_out() {
  base="${TEMP_DATA_PATH}/${1}"
  mkdir "${base}"
  shift
  "$@" 2> >(tee "${base}/stderr" >&2) 1> >(tee "${base}/stdout")
}

With this, I run capture_out step-1 do_a_thing, and the full outputs I need end up in /tmp/tmp.sometext/step-1/stdout and /tmp/tmp.sometext/step-1/stderr… but wait, I can also do:

$ capture_out step-1 do_a_thing | \
  capture_out step-2 process --the --thing && \
  capture_out step-3 echo "..." | capture_out step-4 profit
$ find /tmp/tmp.sometext -type f
/tmp/tmp.sometext/step-1/stdout
/tmp/tmp.sometext/step-1/stderr
/tmp/tmp.sometext/step-2/stdout
/tmp/tmp.sometext/step-2/stderr
/tmp/tmp.sometext/step-4/stdout
/tmp/tmp.sometext/step-4/stderr
/tmp/tmp.sometext/step-3/stderr
/tmp/tmp.sometext/step-3/stdout

Or

if capture_out has_an_error something-wrong | capture_out handler check_output
then
  echo "It all went great"
else
  echo "Process failure"
  echo "--Initial process"
  # Use wc -c to check the number of characters in the file
  if [ -e "${TEMP_DATA_PATH}/has_an_error/stdout" ] && [ 0 -ne "$(wc -c < "${TEMP_DATA_PATH}/has_an_error/stdout")" ]
  then
    echo "----stdout:"
    cat "${TEMP_DATA_PATH}/has_an_error/stdout"
  fi
  if [ -e "${TEMP_DATA_PATH}/has_an_error/stderr" ] && [ 0 -ne "$(wc -c < "${TEMP_DATA_PATH}/has_an_error/stderr")" ]
  then
    echo "----stderr:"
    cat "${TEMP_DATA_PATH}/has_an_error/stderr"
  fi
  echo "--Second stage"
  if [ -e "${TEMP_DATA_PATH}/handler/stdout" ] && [ 0 -ne "$(wc -c < "${TEMP_DATA_PATH}/handler/stdout")" ]
  then
    echo "----stdout:"
    cat "${TEMP_DATA_PATH}/handler/stdout"
  fi
  if [ -e "${TEMP_DATA_PATH}/handler/stderr" ] && [ 0 -ne "$(wc -c < "${TEMP_DATA_PATH}/handler/stderr")" ]
  then
    echo "----stderr:"
    cat "${TEMP_DATA_PATH}/handler/stderr"
  fi
fi

This has become part of my normal toolkit now for logging processes. Thanks bash!

Also, thanks to ChatGPT for helping me find this structure that I’d seen before, but couldn’t remember how to do it! (it almost got it right too! Remember kids, don’t *trust* what ChatGPT gives you, use it as a research starting point, test *that* against your own knowledge, test *that* against your environment and test *that* against expected error cases too! Copy & Paste is not the best idea with AI generated code!)

Footnotes

[1] stdout is the name of the normal output text we see in a shell, it’s also sometimes referred to as “file descriptor 1” or “fd1”. You can also output to &1 with >&1 which means “send to fd1”

[2] stderr is the name of the output in a shell when an error occurs. It isn’t caught by things like some_process > /dev/null which makes it useful when you don’t want to see output, just errors. Like stdout, it’s also referred to as “file descriptor 2” or “fd2” and you can output to &2 with >&2 if you want to send stdout to stderr.
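A minimal illustration of the two streams:

echo "this is normal output"    # goes to stdout (fd1)
echo "something went wrong" >&2 # goes to stderr (fd2)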

Featured image is “WordPress Error” by “tara hunt” on Flickr and is released under a CC-BY-SA license.


Deploying the latest build of a template (machine) image with #Xen #Orchestrator

In my current role we are using Packer to build images on a Xen Orchestrator environment, use a CI/CD system to turn that image into both a Xen Template and an AWS AMI, and then we use Terraform to deploy that image across our estate. The images we build with Packer have this stanza in them:

locals {
  timestamp = regex_replace(timestamp(), "[- TZ:]", "")
}
variable "artifact_name" {
  default = "SomeLinux-version.iso"
}
source "xenserver-iso" "this" {
  vm_name = "${var.artifact_name}-${local.timestamp}"
  # more config below
}

As a result, the built images include a timestamp.
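For example, a build kicked off at 13:37:00 UTC on 27 July 2023 (an illustrative timestamp) would end up with a template named something like:

SomeLinux-version.iso-20230727133700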

When we use the AMI in Terraform, we can locate it with this code:

variable "ami_name" {
  default = "SomeLinux-version.iso-"
}

data "aws_ami" "this" {
  most_recent = true

  filter {
    name   = "name"
    values = ["${var.ami_name}*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  owners = [var.owner]
}

But, because Xen doesn’t track when a template is created, I needed to do something different. Enter get_xoa_template.sh.

#!/bin/bash
trap cleanup SIGINT SIGTERM EXIT
exit_message=""
set_exit=0
fail() {
  [ -n "$1" ] && echo "$1" >&2
  [ "$2" -gt 0 ] && exit "$2"
}
cleanup() {
  trap - SIGINT SIGTERM EXIT
  [ "$UNREGISTER" -eq 1 ] && [ "$STATE" == "signed_in" ] && xo-cli --unregister 2>&1
  [ -n "$exit_message" ] && fail "$exit_message" $set_exit
}
log_debug() {
  [ -n "$DEBUG" ] && echo "$1" >> "$DEBUG"
}
parse_params() {
  UNREGISTER=1
  DEBUG=""
  while :; do
    case "${1-}" in
      -h | --help)
        echo "usage: get_xoa_template.sh --template SomeTemplatePrefix" >&2
        echo "" >&2
        echo "Options:" >&2
        echo " -t | --template MyTemplatePrefix = The template to look for (required)" >&2
        echo " -s | --server ws://192.0.2.1 = Sign into Xen Orchestrator on 192.0.2.1" >&2
        echo " [Default to using XOA_URL environment variable]" >&2
        echo " -u | --user username@example.org = Sign into Xen Orchestrator using this username" >&2
        echo " [Default to using XOA_USER environment variable]" >&2
        echo " -p | --password hunter2 = Sign into Xen Orchestrator using this password" >&2
        echo " [Default to using XOA_PASSWORD environment variable]" >&2
        echo " -l | --pool MyXenPool1 = Use this pool when looking for the template." >&2
        echo " [Omit to ignore]" >&2
        echo " -x | --no-unregister = Don't log out from the XOA server once connected." >&2
        echo " -d | --debug = Log output to /tmp/xocli_output" >&2
        echo " -d | --debug /path/to/debug = Log output to the specified path" >&2
        echo " --debug=/path/to/debug = Log output to the specified path" >&2
        exit 255
        ;;
      -s | --server)
        XOA_URL="${2-}"
        shift
        ;;
      -u | --user)
        XOA_USER="${2-}"
        shift
        ;;
      -p | --password)
        XOA_PASSWORD="${2-}"
        shift
        ;;
      -l | --pool)
        XOA_POOL="${2-}"
        shift
        ;;
      -t | --template)
        TEMPLATE="${2-}"
        shift
        ;;
      -x | --no-unregister)
        UNREGISTER=0
        ;;
      -d | --debug)
        DEBUG=/tmp/xocli_output
        [ -n "${2-}" ] && [ "$(echo "${2-}" | cut -c1)" != "" ] && DEBUG="${2-}" && shift
        ;;
      --debug=*)
        DEBUG="$(echo "$1" | sed -E -e 's/^[^=]+=//')"
        ;;
      *)
        break
        ;;
    esac
    shift
  done
}
sign_in() {
  [ -z "$XOA_URL" ] || [ -z "$XOA_USER" ] || [ -z "$XOA_PASSWORD" ] && fail "Missing sign-in details" 1
  log_debug "Logging in"
  if [ -n "$DEBUG" ]
  then
    xo-cli --register --au "$XOA_URL" "$XOA_USER" "$XOA_PASSWORD" 2>&1 | tee -a "$DEBUG" | grep -q 'Successfully' || fail "Login failed" 2
  else
    xo-cli --register --au "$XOA_URL" "$XOA_USER" "$XOA_PASSWORD" 2>&1 | grep -q 'Successfully' || fail "Login failed" 2
  fi
  STATE="signed_in"
}
get_pool() {
  [ -z "$XOA_POOL" ] && log_debug "No Pool" && return 0
  log_debug "Getting Pool ID"
  if [ -n "$DEBUG" ]
  then
    POOL_ID="\$pool=$(xo-cli --list-objects type=pool | jq -c -r ".[] | select(.name_label | match(\"${XOA_POOL}\")) | .uuid" | sort | tail -n 1 | tee -a "$DEBUG")"
  else
    POOL_ID="\$pool=$(xo-cli --list-objects type=pool | jq -c -r ".[] | select(.name_label | match(\"${XOA_POOL}\")) | .uuid" | sort | tail -n 1)"
  fi
  [ "$POOL_ID" == "\$pool=" ] && fail "Pool provided but no ID received" 3
}
get_template() {
  log_debug "Getting template"
  if [ -n "$DEBUG" ]
  then
    TEMPLATE_IS="$(xo-cli --list-objects type=VM-template "${POOL_ID-}" | jq -c ".[] | select(.name_label | match(\"${TEMPLATE}\")) | .name_label" | sort | tail -n 1 | tee -a "$DEBUG")"
  else
    TEMPLATE_IS="$(xo-cli --list-objects type=VM-template "${POOL_ID-}" | jq -c ".[] | select(.name_label | match(\"${TEMPLATE}\")) | .name_label" | sort | tail -n 1)"
  fi
  [ -z "$TEMPLATE_IS" ] && fail "Could not match this template" 4
  if [ -n "$DEBUG" ]
  then
    echo "{\"is\": ${TEMPLATE_IS}}" | tee -a "$DEBUG"
  else
    echo "{\"is\": ${TEMPLATE_IS}}"
  fi
}
[ -n "$(command -v xo-cli)" ] || fail "xo-cli is missing, and is a required dependency for this script. Please install it; \`sudo npm -g install xo-cli\`" 5
parse_params "$@"
if [ -n "$DEBUG" ]
then
  rm -f "$DEBUG"
  log_debug "Invoked: $(date)"
  log_debug "Template: $TEMPLATE"
  log_debug "Pool: $XOA_POOL"
fi
sign_in
get_pool
get_template
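
Run by hand (with XOA_URL, XOA_USER and XOA_PASSWORD already exported), the script prints the JSON blob that Terraform’s external data source expects – something like this, with an illustrative template name:

$ ./get_xoa_template.sh --template SomeLinux-version.iso- --pool MyPool
{"is": "SomeLinux-version.iso-20230727133700"}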

This script is invoked from your terraform like this:

variable "template_name" {
  default     = "SomeLinux-version.iso-"
  description = "A regex, partial or full string to match in the template name"
}

variable "poolname" {
  default = "MyPool"
}

data "external" "get_xoa_template" {
  program = [
    "/bin/bash", "${path.module}/get_xoa_template.sh",
    "--template", var.template_name,
    "--pool", var.poolname
  ]
}

data "xenorchestra_pool" "pool" {
  name_label = var.poolname
}

data "xenorchestra_template" "template" {
  name_label = data.external.get_xoa_template.result.is
  pool_id    = data.xenorchestra_pool.pool.id
}

And that’s how you do it. Oh, and if you need to pin to a specific version? Change the template_name value from the partial or regex version to the full version, like this:

variable "template_name" {
  # This assumes your image was minted at midnight on 1970-01-01
  default     = "SomeLinux-version.iso-19700101000000"
}

Featured image is “Barcelos and Braga-18” by “Graeme Churchard” on Flickr and is released under a CC-BY license.


Using direnv with terraform, terragrunt, saml2aws, SOPS and AWS KMS

In my current project I am often working with Infrastructure as Code (IaC) in the form of Terraform and Terragrunt files. Before I joined the team, a decision was made to use SOPS from Mozilla, and this is encrypted with an AWS KMS key. You can only access specific roles using the SAML2AWS credentials, and I won’t be explaining how to set that part up, as that is highly dependent on your SAML provider.

While much of our environment uses AWS, we do have a small presence hosted on-prem, using a hypervisor service. I’ll demonstrate this with Proxmox, as this is something that I also use personally :)

Firstly, make sure you have all of the above tools installed! For one stage, you’ll also require yq to be installed. Ensure you’ve got your shell hook setup for direnv as we’ll need this later too.

Late edit 2023-07-03: There was a bug in v0.22.0 of the Terraform provider which didn’t recognise the environment variables prefixed PROXMOX_VE_ – a workaround using TF_VAR_PROXMOX_VE and a variable "PROXMOX_VE_" {} block in the Terraform code was put in place for the initial publication of this post. The bug was fixed in 0.23.0, which this post now uses instead, so the TF_VAR_ prefixed variables have been removed too.

Set up AWS Vault

AWS KMS

AWS Key Management Service (KMS) is a service which generates and makes available encryption keys, backed by AWS. There are *lots* of ways to cut that particular cake, but let’s do it the quick and easy way… with Terraform:

variable "name" {
  default = "SOPS"
  type    = string
}
resource "aws_kms_key" "this" {
  tags                     = {
    Name : var.name,
    Owner : "Admins"
  }
  key_usage                = "ENCRYPT_DECRYPT"
  customer_master_key_spec = "SYMMETRIC_DEFAULT"
  deletion_window_in_days  = 30
  is_enabled               = true
  enable_key_rotation      = false
  policy                   = <<EOF
{
  "Version": "2012-10-17",
  "Id": "key-default-1",
  "Statement": [
    {
      "Sid": "Root Access",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::${get_aws_account_id()}:root"
      },
      "Action": "kms:*",
      "Resource": "*"
    },
    {
      "Sid": "Estate Admin Access",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::${get_aws_account_id()}:role/estateadmins"
      },
      "Action": [
        "kms:Describe*",
        "kms:List*",
        "kms:Get*",
        "kms:Encrypt*"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

resource "aws_kms_alias" "this" {
  target_key_id = aws_kms_key.this.key_id
  name          = "alias/${var.name}"
}

output "key" {
  value = aws_kms_alias.this.arn
}

After running this, let’s assume that we get an output for the “key” value of:

arn:aws:kms:us-east-1:123456789012:alias/main

Setup Sops

In your terragrunt tree, create a file called .sops.yaml, which contains:

---
creation_rules:
  - kms: arn:aws:kms:us-east-1:123456789012:alias/main

And a file called secrets.enc.yaml which contains:

---
PROXMOX_VE_USERNAME: root@pam
PROXMOX_VE_PASSWORD: deadb33f@2023

Test that your KMS works by assuming your IAM role via SAML2AWS like this:

$ saml2aws login --skip-prompt --quiet
$ saml2aws exec -- sops --verbose --encrypt --in-place secrets.enc.yaml
[AWSKMS]	 INFO[0000] Encryption succeeded                          arn="arn:aws:kms:us-east-1:123456789012:alias/main"
[CMD]		 INFO[0000] File written successfully
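
If you want to double-check the round trip, you can decrypt it straight back to stdout too (this prints the plaintext but doesn’t modify the file):

$ saml2aws exec -- sops --decrypt secrets.enc.yaml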

Setup direnv

Outside your tree, in ~/.config/direnv/lib create a file called use_sops.sh (does not need to be chmod +x or chmod 755!) containing this:

# Based on https://github.com/direnv/direnv/wiki/Sops
use_sops() {
    local path=${1:-$PWD/secrets.enc.yaml}
    if [ -e "$path" ]
    then
        if grep -q -E '^sops:' "$path"
        then
            eval "$(sops --decrypt --output-type dotenv "$path" 2>/dev/null | direnv dotenv bash /dev/stdin || false)"
        else
            if [ -n "$(command -v yq)" ]
            then
                eval "$(yq eval --output-format props "$path" | direnv dotenv bash /dev/stdin)"
                export SOPS_WARNING="unencrypted $path"
            fi
        fi
    fi
    watch_file "$path"
}

There are two key lines here, the first of which is:

eval "$(sops -d --output-type dotenv "$path" 2>/dev/null | direnv dotenv bash /dev/stdin || false)"

This line asks sops to decrypt the secrets file, using the “dotenv” output type. However, the dotenv format looks like this:

some_key = "some value"

So, as a result, we then pass that value to direnv and ask it to rewrite it in the format it expects, which looks like this:

export some_key="some value"

The second key line is this:

eval "$(yq eval --output-format props "$path" | direnv dotenv bash /dev/stdin)"

This asks yq to parse the secrets file, using the “props” formatter, which results in lines just like the dotenv output we saw above.
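
For the secrets.enc.yaml example above (before encryption), that props output looks something like this:

PROXMOX_VE_USERNAME = root@pam
PROXMOX_VE_PASSWORD = deadb33f@2023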

However, because we used yq to parse the file, we know this file isn’t encrypted, so we also add an extra export value:

export SOPS_WARNING="unencrypted $path"

This can be picked up as part of your shell prompt to put a warning in!
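For example, a minimal tweak in ~/.bashrc (a sketch – style it however you like) could be:

# Prefix the prompt with a warning whenever SOPS_WARNING is set
PS1='${SOPS_WARNING:+[unencrypted secrets] }'"$PS1"

Anyway… let’s move on.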

Now that you have your reusable library file, configure the direnv file, .envrc, for the root of your Proxmox cluster:

use sops

Oh, ok, that was simple. You can add several files here if you wish, like this:

use sops file1.enc.yaml
use sops file2.enc.yml
use sops ~/.core_sops

But, we don’t need that right now!

Open your shell in that window, and you’ll get this warning:

direnv: error /path/to/demo/.envrc is blocked. Run `direnv allow` to approve its content

So, let’s do that!

$ direnv allow
direnv: loading /path/to/demo/.envrc
direnv: using sops
direnv: export +PROXMOX_VE_USERNAME +PROXMOX_VE_PASSWORD
$

So far, so good… but wait, you’ve authenticated to your SAML access to AWS. Let’s close that shell, and go back in again.

$ cd /path/to/demo
direnv: loading /path/to/demo/.envrc
direnv: using sops
$

Ah, now we don’t have our values exported. That’s what we wanted!

What now?!

Configuring the details of the proxmox cluster

We have our .envrc file which provides our credentials (let’s pretend we’re using a shared set of credentials across all the boxes), but now we need to set up access to each of the boxes.

Let’s make our two cluster directories:

mkdir cluster_01
mkdir cluster_02

And in each of these clusters, we need to put an .envrc file with the right IP address in. This needs to check up the tree for any credentials we may have already loaded:

source_env "$(find_up ../.envrc)"
export PROXMOX_VE_ENDPOINT="https://192.0.2.1:8006" # Documentation IP address for the first cluster - change for the second cluster.

The first line works up the tree, looking for a parent .envrc file to inject, and then, with the second line, adds the Proxmox API endpoint to the end of that chain. When we run direnv allow (having logged back into our saml2aws session), we get this:

$ direnv allow
direnv: loading /path/to/demo/cluster_01/.envrc
direnv: loading /path/to/demo/.envrc
direnv: using sops
direnv: export +PROXMOX_VE_ENDPOINT +PROXMOX_VE_USERNAME +PROXMOX_VE_PASSWORD
$

Great, now we can set up the connection to the cluster in the terragrunt file!

Set up Terragrunt

In /path/to/demo/terragrunt.hcl put this:

remote_state {
  backend = "s3"
  config  = {
    encrypt                = true
    bucket                 = "example-inc-terraform-state"
    key                    = "${path_relative_to_include()}/terraform.tfstate"
    region                 = "us-east-1"
    dynamodb_table         = "example-inc-terraform-state-lock"
    skip_bucket_versioning = false
  }
}
generate "providers" {
  path      = "providers.tf"
  if_exists = "overwrite"
  contents  = <<EOF
terraform {
  required_providers {
    proxmox = {
      source = "bpg/proxmox"
      version = "0.23.0"
    }
  }
}

provider "proxmox" {
  insecure = true
}
EOF
}

Then in the cluster_01 directory, create a directory for the code you want to run (e.g. creating a VLAN might live in “VLANs/30/“) and put this terragrunt.hcl in it:

terraform {
  source = "${get_terragrunt_dir()}/../../../terraform-module-network//vlan"
  # source = "git@github.com:YourProject/terraform-module-network//vlan?ref=production"
}

include {
  path = find_in_parent_folders()
}

inputs = {
  vlan_tag    = 30
  description = "VLAN30"
}

This assumes you have a terraform directory called terraform-module-network/vlan in a particular place in your tree, or, even better, a module in your git repo, which uses the input values you’ve provided.

That double slash in the source line isn’t a typo either – it marks the point in that tree which Terragrunt will copy into its working directory to run terraform from.
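
To make that concrete, the layout that source line assumes looks roughly like this (the .tf filenames inside the module are illustrative):

/path/to/demo/
  .envrc
  terragrunt.hcl
  terraform-module-network/
    vlan/
      main.tf
      variables.tf
  cluster_01/
    .envrc
    VLANs/
      30/
        terragrunt.hcl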

A quick note about includes and provider blocks

The other key thing is that the “include” block loads the values from the first matching terragrunt.hcl file in the parent directories, which in this case is the one which defined the providers block. You can’t include multiple different parent files, and you can’t have multiple generate blocks either.

Running it all together!

Now we have all our dependencies in place, let’s run it!

user@host:~$ cd test
direnv: loading ~/test/.envrc
direnv: using sops
user@host:~/test$ saml2aws login --skip-prompt --quiet ; saml2aws exec -- bash
direnv: loading ~/test/.envrc
direnv: using sops
direnv: export +PROXMOX_VE_USERNAME +PROXMOX_VE_PASSWORD
user@host:~/test$ cd cluster_01/VLANs/30
direnv: loading ~/test/cluster_01/.envrc
direnv: loading ~/test/.envrc
direnv: using sops
direnv: export +PROXMOX_VE_ENDPOINT +PROXMOX_VE_USERNAME +PROXMOX_VE_PASSWORD
user@host:~/test/cluster_01/VLANs/30$ terragrunt apply
data.proxmox_virtual_environment_nodes.available_nodes: Reading...
data.proxmox_virtual_environment_nodes.available_nodes: Read complete after 0s [id=nodes]

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # proxmox_virtual_environment_network_linux_bridge.this[0] will be created
  + resource "proxmox_virtual_environment_network_linux_bridge" "this" {
      + autostart  = true
      + comment    = "VLAN30"
      + id         = (known after apply)
      + mtu        = (known after apply)
      + name       = "vmbr30"
      + node_name  = "proxmox01"
      + ports      = [
          + "enp3s0.30",
        ]
      + vlan_aware = (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes
proxmox_virtual_environment_network_linux_bridge.this[0]: Creating...
proxmox_virtual_environment_network_linux_bridge.this[0]: Creation complete after 2s [id=proxmox01:vmbr30]
user@host:~/test/cluster_01/VLANs/30$

Winning!!

Featured image is “2018/365/1 Home is Where The Key Fits” by “Alan Levine” on Flickr and is released under a CC-0 license.

Using Github Actions to create Debian (DEB) and RedHat (RPM) Packages and Repositories

Last week I created a post talking about the new project I’ve started on Github called “Terminate-Notice” (which in hindsight isn’t very accurate – at best it’s ‘spot-instance-responses’ and at worst it’s ‘instance-rebalance-and-actions-responder’, but neither works well)… Anyway, I mentioned how I was creating RPM and DEB packages for my bash scripts and that I hadn’t put them into a repo yet.

Well, now I have, so let’s wander through how I made this work.

TL;DR:

Please don’t hesitate to use the .github directory I’m using for terminate-notice, which is available in the -skeleton repo, and then, to turn your packages into a repository, you can reuse the .github directory in the terminate-notice.github.io repo to start your adventure.

Start with your source tree

I have the following files in my shell script project, which are:

  • /usr/sbin/terminate-notice (the actual script which will run)
  • /usr/lib/systemd/system/terminate-notice.service (the SystemD Unit file to start and stop the script)
  • /usr/share/doc/terminate-notice/LICENSE (the license under which the code is released)
  • /etc/terminate-notice.conf.d/service.conf (the file which tells the script how to run)

These live in the root directory of my repository.

I also have the .github directory (where the things that make this script work will live), a LICENSE file (so Github knows what license it’s released under) and a README.md file (so people visiting the repo can find out about it).

A bit about Github Actions

Github Actions is a CI/CD pipeline built into Github. It responds to triggers – in our case, pushes (or uploads, in old fashioned terms) to the repository, and then runs commands or actions. The actions which will run are stored in a simple YAML formatted file, referred to as a workflow which contains some setup fields and then the “jobs” (collections of actions) themselves. The structure is as follows:

# The pretty name rendered by Actions to refer to this workflow
name: Workflow Name

# Only run this workflow when the push is an annotated tag starting v
on:
  push:
    tags:
      - 'v*'

# The workflow contains a collection of jobs, each of which has
# some actions (or "steps") to run
jobs:
  # This is used to identify the output in other jobs
  Unique_Name_For_This_Job:
    # This is the pretty name rendered in the Github UI for this job
    name: Job Name
    # This is the OS that the job will run on - typically
    # one of: ubuntu-latest, windows-latest, macos-latest
    runs-on: runner-os
    # The actual actions to perform
    steps:
      # This is a YAML list, so note where the hyphens (-) are
        # The pretty name of this step
      - name: Checkout Code
        # The name of the public collection of actions to perform
        uses: actions/checkout@v3
        # Any variables to pass into this action module
        with:
          path: "REPO"

      # This action will run a shell command
      - name: Run a command
        run: echo "Hello World"

Build a DEB package

At its simplest, creating a DEB package is:

  1. Create the directory structure (as above) that will unpack from your package file and put the files in the right places.
  2. Create a DEBIAN/control file which provides enough details for your package manager to handle it.
  3. Run dpkg-deb --build ${PATH_TO_SOURCE} ${OUTPUT_FILENAME}

The DEBIAN/control file looks like this:

Package: PACKAGE_NAME
Version: VERSION_ID
Section: misc
Priority: optional
Architecture: all
Maintainer: YOUR_NAME <your_email@example.org>
Description: SOME_TEXT

Section, Priority and Architecture each have a specifically defined list of values you can choose from.

Assuming the DEBIAN/control file was static and also lived in the repo, and I were just releasing the DEB file, then I could make the above work with the following steps:

name: Create the DEB

permissions:
  contents: write

on:
  push:
    tags:
      - 'v*'

jobs:
  Create_Packages:
    name: Create Package
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
        with:
          path: "REPO"

      - name: Copy script files around to stop .github from being added to the package then build the package
        run: |
          mkdir PKG_SOURCE
          cp -Rf REPO/usr REPO/etc REPO/DEBIAN PKG_SOURCE
          dpkg-deb --build PKG_SOURCE package.deb

      - name: Release the Package
        uses: softprops/action-gh-release@v1
        with:
          files: package.deb

But no, I had to get complicated and ALSO build an RPM file… and put some dynamic stuff in there.

Build an RPM file

RPMs are a little more complex, but not by much. RPM takes a spec file, which starts off looking like the DEBIAN/control file, and adds some “install” instructions. Let’s take a look at that spec file:

Name: PACKAGE_NAME
Version: VERSION_ID
Release: 1
Summary: SOME_TEXT
BuildArch: noarch
Source0: %{name}
License: YOUR_LICENSE

%description
SOME_TEXT
MORE_DETAIL

%prep

%build

%install
install -D -m 600 -o root -g root %{SOURCE0}etc/config/file ${RPM_BUILD_ROOT}%{_sysconfdir}/config/file
install -D -m 755 -o root -g root %{SOURCE0}usr/sbin/script ${RPM_BUILD_ROOT}%{_sbindir}/script

%files
etc/config/file
usr/sbin/script

The “Name”, “Version”, “Release” and “BuildArch” values in the top of that file define what the resulting filename is (NAME-VERSION-RELEASE.BUILDARCH.rpm).

Notice that there are some “macros” which replace /etc with %{_sysconfdir}, /usr/sbin with %{_sbindir} and so on, which means that, theoretically, this RPM could be installed in an esoteric tree… but most people won’t bother.

The one quirk with this is that %{name} bit there – RPM files need to have all these sources in a directory named after the package name, which in turn is stored in a directory called SOURCES (so SOURCES/my-package for example), and then it copies the files to wherever they need to go. I’ve listed etc/config/file and usr/sbin/script but these could just as easily have been file and script for all that the spec file cares.
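
So, for the spec file above, the tree you run rpmbuild from ends up looking something like this (with my-package standing in for your package name):

$ find . -type f
./my-package.spec
./SOURCES/my-package/etc/config/file
./SOURCES/my-package/usr/sbin/script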

Once you have the spec file, you run sudo rpmbuild --define "_topdir $(pwd)" -bb file.spec to build the RPM.

So, again, how would that work from a workflow YAML file perspective, assuming a static spec and source tree as described above?

name: Create the RPM

permissions:
  contents: write

on:
  push:
    tags:
      - 'v*'

jobs:
  Create_Packages:
    name: Create Package
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
        with:
          path: "REPO"

      - name: Copy script files around to stop .github from being added to the package then build the package
        run: |
          mkdir -p SOURCES/my-package-name
          cp -Rf REPO/usr REPO/etc SOURCES/my-package-name
          sudo rpmbuild --define "_topdir $(pwd)" -bb my-package-name.spec

      - name: Release the Package
        uses: softprops/action-gh-release@v1
        with:
          files: RPMS/my-package-name_0.0.1-1.noarch.rpm

But again, I want to be fancy (and I want to make resulting packages as simple to repeat as possible)!

So, this is my release.yml as of today:

name: Run the Release

permissions:
  contents: write

on:
  push:
    tags:
      - 'v*'

jobs:
  Create_Packages:
    name: Create Packages
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
        with:
          path: "REPO"

      - name: Calculate some variables
        run: |
          (
            echo "GITHUB_REPO_NAME=$(echo "${GITHUB_REPOSITORY}" | cut -d/ -f2)"
            echo "VERSION=$(echo "${GITHUB_REF_NAME}" | sed -e 's/^v//')"
            echo "DESCRIPTION=A script which polls the AWS Metadata Service looking for an 'instance action', and triggers scripts in response to the termination notice."
            echo "DEB_ARCHITECTURE=${ARCHITECTURE:-all}"
            echo "RPM_ARCHITECTURE=${ARCHITECTURE:-noarch}"
            echo "RELEASE=1"
            cd REPO
            echo "FIRST_YEAR=$(git log $(git rev-list --max-parents=0 HEAD) --date="format:%Y" --format="format:%ad")"
            echo "THIS_COMMIT_YEAR=$(git log HEAD -n1 --date="format:%Y" --format="format:%ad")"
            echo "THIS_COMMIT_DATE=$(git log HEAD -n1 --format="format:%as")"
            if [ "$FIRST_YEAR" = "$THIS_COMMIT_YEAR" ]
            then
              echo "YEAR_RANGE=$FIRST_YEAR"
            else
              echo "YEAR_RANGE=${FIRST_YEAR}-${THIS_COMMIT_YEAR}"
            fi
            cd ..
          ) >> $GITHUB_ENV

      - name: Make Directory Structure
        run: mkdir -p "SOURCES/${GITHUB_REPO_NAME}" SPECS release

      - name: Copy script files into SOURCES
        run: |
          cp -Rf REPO/[a-z]* "SOURCES/${GITHUB_REPO_NAME}"
          cp REPO/LICENSE REPO/README.md "SOURCES/${GITHUB_REPO_NAME}/usr/share/doc/${GITHUB_REPO_NAME}/"
          if grep -lr '#TAG#' SOURCES
          then
            sed -i -e "s/#TAG#/${VERSION}/" $(grep -lr '#TAG#' SOURCES)
          fi
          if grep -lr '#TAG_DATE#' SOURCES
          then
            sed -i -e "s/#TAG_DATE#/${THIS_COMMIT_YEAR}/" $(grep -lr '#TAG_DATE#' SOURCES)
          fi
          if grep -lr '#DATE_RANGE#' SOURCES
          then
            sed -i -e "s/#DATE_RANGE#/${YEAR_RANGE}/" $(grep -lr '#DATE_RANGE#' SOURCES)
          fi
          if grep -lr '#MAINTAINER#' SOURCES
          then
            sed -i -e "s/#MAINTAINER#/${MAINTAINER:-Jon Spriggs <jon@sprig.gs>}/" $(grep -lr '#MAINTAINER#' SOURCES)
          fi

      - name: Create Control File
        # Fields from https://www.debian.org/doc/debian-policy/ch-controlfields.html#binary-package-control-files-debian-control
        run: |
          mkdir -p SOURCES/${GITHUB_REPO_NAME}/DEBIAN
          (
            echo "Package:      ${GITHUB_REPO_NAME}"
            echo "Version:      ${VERSION}"
            echo "Section:      ${SECTION:-misc}"
            echo "Priority:     ${PRIORITY:-optional}"
            echo "Architecture: ${DEB_ARCHITECTURE}"
            if [ -n "${DEPENDS}" ]
            then
              echo "Depends: ${DEPENDS}"
            fi
            echo "Maintainer: ${MAINTAINER:-Jon Spriggs <jon@sprig.gs>}"
            echo "Description: ${DESCRIPTION}"
            if [ -n "${HOMEPAGE}" ]
            then
              echo "Homepage: ${HOMEPAGE}"
            fi
          ) | tee SOURCES/${GITHUB_REPO_NAME}/DEBIAN/control
          (
            echo "Files:"
            echo " *"
            echo "Copyright: ${YEAR_RANGE} ${MAINTAINER:-Jon Spriggs <jon@sprig.gs>}"
            echo "License: MIT"
            echo ""
            echo "License: MIT"
            sed 's/^/ /' "SOURCES/${GITHUB_REPO_NAME}/usr/share/doc/${GITHUB_REPO_NAME}/LICENSE"
          ) | tee SOURCES/${GITHUB_REPO_NAME}/DEBIAN/copyright

      - name: Create Spec File
        run: PATH="REPO/.github/scripts:${PATH}" create_spec_file.sh

      - name: Build DEB Package
        run: dpkg-deb --build SOURCES/${GITHUB_REPO_NAME} "${{ env.GITHUB_REPO_NAME }}_${{ env.VERSION }}_${{ env.DEB_ARCHITECTURE }}.deb"

      - name: Build RPM Package
        run: sudo rpmbuild --define "_topdir $(pwd)" -bb SPECS/${GITHUB_REPO_NAME}.spec

      - name: Confirm builds complete
        run: sudo install -m 644 -o runner -g runner $(find . -type f -name "*.deb" && find . -type f -name "*.rpm") release/

      - name: Release
        uses: softprops/action-gh-release@v1
        with:
          files: release/*

So this means I can, within reason, drop this workflow (plus a couple of other scripts to generate the slightly more complex RPM file – see the other files in that directory structure) into another package to release it.

OH WAIT, I DID! (for the terminate-notice-slack repo, for example!) All I actually needed to do there was to change the description line, and off it went!

So, this is all well and good, but how can I distribute these? Enter Repositories.

Making a Repository

Honestly, I took most of the work here from two fantastic blog posts for creating an RPM repo and a DEB repo.

First you need to create a GPG key.

To do this, I created the following pgp-key.batch file outside my repositories tree:

%echo Generating an example PGP key
Key-Type: RSA
Key-Length: 4096
Name-Real: YOUR_ORG_NAME
Name-Email: your_org_name@users.noreply.github.com
Expire-Date: 0
%no-ask-passphrase
%no-protection
%commit

To make the key, I used this set of commands:

export GNUPGHOME="$(mktemp -d /tmp/pgpkeys-XXXXXX)"
gpg --no-tty --batch --gen-key pgp-key.batch
gpg --armor --export YOUR_ORG_NAME > public.asc
gpg --armor --export-secret-keys YOUR_ORG_NAME > private.asc
rm -Rf "$GNUPGHOME"

Store the public.asc file to one side (you’ll need it later) and keep the private.asc safe because we need to put that into Github.

Creating Github Pages

Create a new Git repository in your organisation called your-org.github.io. This marks the repository as being a Github Pages repository. Just to make that more explicit, in the settings for the repository, go to the Pages section. (Note that yes, the text around this may differ, but it is accurate as of 2023-03-28 in EN-GB localisation.)

Under “Source” select “GitHub Actions”.

Clone this repository to your local machine, and copy public.asc into the root of the tree with a sensible name, ending .asc.

In the Github settings, find “Secrets and variables” under “Security” and pick “Actions”.

Select “New repository secret”, call it “PRIVATE_KEY” and paste in the contents of your private.asc file.

Now you can use this to sign things (and you will sign *SO MUCH* stuff)

Building the HTML front to your repo (I’m using Jekyll)

I’ve elected to use Jekyll because I know it, and it’s quite easy, but you should pick what works for you. My workflow for deploying these repos into the website relies on Jekyll because Github built that integration, but you’ll likely find other tools for things like Eleventy or Hugo.

Put a file called _config.yml into the root directory, and fill it with relevant content:

title: your-org
email: email_address@example.org
description: >- 
  This project does stuff.
baseurl: ""
url: "https://your-org.github.io"
github_username: your-org

# Build settings
theme: minima
plugins:
  - jekyll-feed
exclude:
  - tools/
  - doc/

Naturally, make “your-org”, “email_address@example.org” and the descriptions more relevant to your environment.

Next, create an index.md file with whatever is relevant for your org, but it must start with something like:

---
layout: home
title: YOUR-ORG Website
---
Here is the content for the front page.

Building the repo behind your static content

We’re back to working with Github Actions workflow files, so let’s pop that open.

.github/workflows/repo.yml

name: Deploy Debian and RPM Repo plus Jekyll homepage

on:
  push:
    branches: ["main"]
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

permissions:
  contents: read
  pages: write
  id-token: write

concurrency:
  group: "pages"
  cancel-in-progress: false

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: "[REPO] Install required packages"
        run: |
          until sudo apt update
          do
            sleep 1
          done
          sudo apt install -y jq createrepo-c coreutils gnupg2 dpkg-dev

      - name: "[REPO] Insert environment variables"
        run: |
          echo GNUPGHOME="$(mktemp -d /tmp/pgpkeys-XXXXXX)" >> $GITHUB_ENV
          echo REPO_OWNER="$(echo "${GITHUB_REPOSITORY}" | cut -d/ -f1)" >> $GITHUB_ENV
          echo REPO_NAME="$(echo "${GITHUB_REPOSITORY}" | cut -d/ -f2)" >> $GITHUB_ENV

      - name: "[REPO] Import GPG key"
        id: import_gpg
        uses: crazy-max/ghaction-import-gpg@v5
        with:
          gpg_private_key: ${{ secrets.PRIVATE_KEY }}

      - name: "[JEKYLL] Setup Pages"
        uses: actions/configure-pages@v3

      - name: "[JEKYLL] Build with Jekyll"
        uses: actions/jekyll-build-pages@v1
        with:
          source: ./
          destination: ./_site

      - name: "[REPO] Set permissions on the _site directory"
        run: sudo chown -R runner:docker _site

      - name: "[REPO] Build DEB and RPM Repos"
        run: |
          export GPG_FINGERPRINT="${{ steps.import_gpg.outputs.fingerprint }}"
          export ORIGIN="${{ steps.import_gpg.outputs.name }}"
          .github/scripts/build_repos.sh

      - name: "[JEKYLL] Upload artifact"
        uses: actions/upload-pages-artifact@v1

  deploy:
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    needs: build
    steps:
      - name: "[JEKYLL] Deploy to GitHub Pages"
        id: deployment
        uses: actions/deploy-pages@v1

I’ve basically changed the “stock” Jekyll static site Github Actions file and added every step that starts [REPO] to make the repository stuff fit in around the steps that start [JEKYLL] which build and deploy the Jekyll based site.

The key part to all this though is the step Build DEB and RPM repos which calls a script that downloads all the RPM and DEB files from the various other repository build stages and does some actions to them. Now yes, I could have put all of this into the workflow.yml file, but I think it would have made it all a bit more confusing! So, let’s work through those steps!

Making an RPM Repo

To build an RPM repo, you gather and sign each of the RPM packages you want to offer. You do this with this command:

rpm --define "%_signature gpg" --define "%_gpg_name ${FINGERPRINT}" --addsign FILENAME.rpm

Once you have all your RPM files signed, you run a command called createrepo_c (available in Debian archives – Github Actions doesn’t have a RedHat based distro available at this time, so I didn’t look for the RPM equivalent). This creates the repository metadata, and finally you sign that file, like this:

gpg --detach-sign --armor repodata/repomd.xml
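
For reference, the createrepo_c step itself is just a single command, pointed at the directory containing the signed RPMs (the rpm/ directory name here is an assumption – use whatever your tree calls it), and it runs before the signing step above:

createrepo_c rpm/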

Making a DEB Repo

To build a DEB repo, you put each of the DEB packages you want to offer into a directory called pool/main (you can also call “main” something else – for example “contrib”, “extras” and so on).

Once you have all your files, you create another directory called dists/stable/main/binary-all into which we’ll run a command dpkg-scanpackages to create the list of the available packages. Yes, “main” could also be called “contrib”, “extras” and “stable” could be called “testing” or “preprod” or the name of your software release (like “jaunty”, “focal” or “warty”). The “all” after the word “binary” is the architecture in question.

dpkg-scanpackages creates an index of the packages in that directory including the version number, maintainer and the cryptographic hashes of the DEB files.

We zip (using gzip and bzip2) the Packages file it creates to improve the download speeds of these files, and then make a Release file. This in turn has the cryptographic hashes of each of the Packages and zipped Packages files, which in turn is then signed with GPG.
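
As a rough sketch of those DEB steps in shell (the deb/ directory name and the use of apt-ftparchive to build the Release file are assumptions – the build_repos.sh script called from the workflow does the real work here):

cd deb
dpkg-scanpackages --arch all pool/ > dists/stable/main/binary-all/Packages
gzip -9 -k -f dists/stable/main/binary-all/Packages
bzip2 -9 -k -f dists/stable/main/binary-all/Packages
apt-ftparchive release dists/stable > dists/stable/Release
gpg --detach-sign --armor --output dists/stable/Release.gpg dists/stable/Release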

Ugh, that was MESSY

Making the repository available to your distributions

RPM repos have it quite easy here – there’s a simple file that looks like this:

[org-name]
name=org-name Repository
baseurl=https://org-name.github.io/rpm
enabled=1
gpgcheck=1
gpgkey=https://org-name.github.io/public.asc

The distribution user simply downloads this file, puts it into /etc/yum.repos.d/org-name.repo and now all the packages are available for download. Woohoo!
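
If you publish that .repo file on the Pages site too, installing it is a one-liner (the exact filename and URL are assumptions – host it wherever suits you):

sudo curl -o /etc/yum.repos.d/org-name.repo https://org-name.github.io/org-name.repo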

DEB repos are a little harder.

First, download the public key – https://org-name.github.io/public.asc – and put it in /etc/apt/keyrings/org-name.asc. Next, create a file in /etc/apt/sources.list.d/org-name.list with this line in it:

deb [arch=all signed-by=/etc/apt/keyrings/org-name.asc] https://org-name.github.io/deb stable main

And now they can install whatever packages they want too!
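
Or, as a copy-and-paste-able set of commands (assuming curl is installed):

sudo mkdir -p /etc/apt/keyrings
sudo curl -o /etc/apt/keyrings/org-name.asc https://org-name.github.io/public.asc
echo "deb [arch=all signed-by=/etc/apt/keyrings/org-name.asc] https://org-name.github.io/deb stable main" | sudo tee /etc/apt/sources.list.d/org-name.list
sudo apt update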

Doing this the simple way

Of course, this is all well-and-good, but if you’ve got a simple script you want to package, please don’t hesitate to use the .github directory I’m using for terminate-notice, which is available in the -skeleton repo, and then, to turn your packages into a repository, you can reuse the .github directory in the terminate-notice.github.io repo to start your adventure.

Good luck, and let me know how it goes!

Featured image is “Some Math” by “Behdad Esfahbod” on Flickr and is released under a CC-BY license.

Responding to AWS Spot Instance “Instance Actions” (like terminate and stop)

During some debugging of an issue with our AWS Spot Instances at work, a colleague noticed that we weren’t responding to the Instance Actions that AWS sends when it’s due to shut down a spot instance.

We had a bit of a poke around, and found that no-one seems to have a service solution to respond to these events, to shut things down cleanly… so I wrote a set of shell scripts and a SystemD service to react to them.

On the journey, I discovered that there is a metadata mocking service that AWS provides, I learned how to create both RPM and DEB packages with Github actions (still not got them into a repo yet though!) and found that my new employer is really nice because they let me write this and release it as open source 😀

So, if this seems like something that might help you, or perhaps you’ve found a better way of doing this, let me know!


Project logo: Target icons created by Freepik – Flaticon

"Catch and Release" by "Trish Hamme" on Flickr

Releasing files for multiple operating systems with Github Actions in 2021

Hi! Long time, no see!

I’ve been working on my Decision Records open source project for a few months now, and I’ve finally settled on the cross-platform language Rust to create my script. As a result, I’ve got a build process which lets me build for Windows, Mac OS and Linux. I’m currently building a single, unsigned binary for each platform, and I wanted to make it so that Github Actions would build and release these three files for me. Most of the guidance which is currently out there points to some unmaintained actions, originally released by GitHub… but now they point to a 3rd party “release” action as their recommended alternative, so I thought I’d explain how I’m using it to release on several platforms at once.

Although I can go into detail about the release file I’m using for Rust-Decision-Records, I’m instead going to provide a much more simplistic view, based on my (finally working) initial test run.

GitHub Actions

GitHub have a built-in Continuous Integration, Continuous Deployment/Delivery (CI/CD) system, called GitHub Actions. You can have it perform several activities, and these are executed by way of instructions in .github/workflows/<somefile>.yml. I’ll be using .github/workflows/build.yml in this example. If you have multiple GitHub Action files you want to invoke (perhaps around issue management, unit testing and so on), these can be stored in separate .yml files.

The build.yml actions file will perform several tasks, split into two separate activities: a “Create Release” stage, and a “Build Release” stage. The Build stage will use a “Matrix” to execute builds on the three platforms at the same time – Linux AMD64, Windows and Mac OS.

The actual build steps? In this case, it’ll just be writing a single-line text file, stating the release it’s using.

So, let’s get started.

Create Release

A GitHub Release is typically linked to a specific “tagged” commit. Every time a commit is tagged with a string starting “v” (like v1.0.0), this will trigger the release process. So, let’s add those lines to the top of the file:

name: Create Release

on:
  push:
    tags:
      - 'v*'

You could just as easily use the filter pattern ‘v[0-9]+.[0-9]+.[0-9]+’ if you wanted to use proper Semantic Versioning, but this is a simple demo, right? 😉
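
For reference, the annotated tag that kicks all of this off is created and pushed like this (the tag name is just an example):

git tag -a v1.0.0 -m "Release v1.0.0"
git push origin v1.0.0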

Next we need the actual action we want to start with. This is at the same level as the “on” and “name” tags in that YML file, like this:

jobs:
  create_release:
    name: Create Release
    runs-on: ubuntu-latest
    steps:
      - name: Create Release
        id: create_release
        uses: softprops/action-gh-release@v1
        with:
          name: ${{ github.ref_name }}
          draft: false
          prerelease: false
          generate_release_notes: false

So, this is the actual “create release” job. I don’t think it matters what OS it runs on, but ubuntu-latest is the one I’ve seen used most often.

In this, you instruct it to create a simple release, using the text in the annotated tag you pushed as the release notes.

This is using a third-party release action, softprops/action-gh-release, which has not been vetted by me, but is explicitly linked from GitHub’s own action.

If you check the release at this point, (that is, without any other code working) you’d get just the source code as a zip and a .tgz file. BUT WE WANT MORE! So let’s build this mutha!

Build Release

Like with the create_release job, we have a few fields of instructions before we get to the actual actions it’ll take. Let’s have a look at them first. These instructions are at the same level as the jobs:\n create_release: line in the previous block, and I’ll have the entire file listed below.

  build_release:
    name: Build Release
    needs: create_release
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        include:
          - os: ubuntu-latest
            release_suffix: ubuntu
          - os: macos-latest
            release_suffix: mac
          - os: windows-latest
            release_suffix: windows
    runs-on: ${{ matrix.os }}

So this section gives this job an ID (build_release) and a name (Build Release), so far, so exactly the same as the previous block. Next we say “You need to have finished the previous action (create_release) before proceeding” with the needs: create_release line.

But the real sting here is the strategy:\n matrix: block. This says “run these activities with several runners” – in this case, an unspecified Ubuntu, Mac OS and Windows release (each just “latest”). The include block asks the runners to add some template variables to the tasks we’re about to run – specifically release_suffix.

The last line in this snippet asks the runner to interpret the templated value matrix.os as the OS to use for this run.

Let’s move on to the build steps.

    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Run Linux Build
        if: matrix.os == 'ubuntu-latest'
        run: echo "Ubuntu Latest" > release_ubuntu
      
      - name: Run Mac Build
        if: matrix.os == 'macos-latest'
        run: echo "MacOS Latest" > release_mac

      - name: Run Windows Build
        if: matrix.os == 'windows-latest'
        run: echo "Windows Latest" > release_windows

This checks out the source code on each runner, and then has a conditional build statement, based on the OS you’re using for each runner.

It should be fairly simple to see how you could build this out to be much more complex.
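
For instance, in a real Rust project the run: line of the Linux step might become a couple of shell commands like these instead of the echo (the binary name is a placeholder):

cargo build --release
cp target/release/your-binary release_ubuntu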

The final step in the matrix activity is to add the “built” file to the release. For this we use the softprops release action again.

      - name: Release
        uses: softprops/action-gh-release@v1
        with:
          tag_name: ${{ needs.create_release.outputs.tag-name }}
          files: release_${{ matrix.release_suffix }}

The finished file

So how does this all look when it’s done, this most simple CI/CD build script?

name: Create Release

on:
  push:
    tags:
      - 'v*'

jobs:
  create_release:
    name: Create Release
    runs-on: ubuntu-latest
    steps:
      - name: Create Release
        id: create_release
        uses: softprops/action-gh-release@v1
        with:
          name: ${{ github.ref_name }}
          draft: false
          prerelease: false
          generate_release_notes: false

  build_release:
    name: Build Release
    needs: create_release
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        include:
          - os: ubuntu-latest
            release_suffix: ubuntu
          - os: macos-latest
            release_suffix: mac
          - os: windows-latest
            release_suffix: windows
    runs-on: ${{ matrix.os }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Run Linux Build
        if: matrix.os == 'ubuntu-latest'
        run: echo "Ubuntu Latest" > release_ubuntu
      
      - name: Run Mac Build
        if: matrix.os == 'macos-latest'
        run: echo "MacOS Latest" > release_mac

      - name: Run Windows Build
        if: matrix.os == 'windows-latest'
        run: echo "Windows Latest" > release_windows

      - name: Release
        uses: softprops/action-gh-release@v1
        with:
          tag_name: ${{ needs.create_release.outputs.tag-name }}
          files: release_${{ matrix.release_suffix }}

I hope this helps you!

My Sources and Inspirations

Featured image is “Catch and Release” by “Trish Hamme” on Flickr and is released under a CC-BY license.

"Milestone, Otley" by "Tim Green" on Flickr

Changing the default routing metric with Netplan, NetworkManager and ifupdown

In the past few months I’ve been working on a project, and I’ve been doing the bulk of that work using Vagrant.

By default and convention, all Vagrant machines set up using Virtualbox have a “NAT” interface defined as the first network interface, but I like to configure a second interface as a “Bridged” interface. This gives the machine a “Real” IP address on the network, which means that any security appliances I have on my network can see what device is causing what traffic, and I can quickly identify which hosts are misbehaving.

By default, Virtualbox uses the network 10.0.2.0/24 for the NAT interface, and runs a DHCP server for that interface. In the past, I’ve removed the default route which uses 10.0.2.2 (the IP address of the NAT interface on the host device), but with Ubuntu 20.04, this route keeps being re-injected, so I had to come up with a solution.
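
You can check the state of play at any point by listing the default routes and their metrics – the route with the lowest metric wins:

ip route show default
# prints one line per default route, including its metric where one is set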

Fixing Netplan

Ubuntu, in at least 20.04, but (according to Wikipedia) probably since 17.10, has used Netplan to define network interfaces, superseding the earlier ifupdown package (which uses /etc/network/interfaces and /etc/network/interfaces.d/* files to define the network). Netplan is a kind of meta-configuration layer which instructs systemd-networkd or NetworkManager to reconfigure the network interfaces, and so making the configuration changes here seemed most sensible.

Vagrant configures the file /etc/netplan/50-cloud-init.yaml with a network configuration to support this DHCP interface, and then applies it. To fix it, we need to rewrite this file completely.

#!/bin/bash

# Find details about the interface
ifname="$(grep -A1 ethernets "/etc/netplan/50-cloud-init.yaml" | tail -n1 | sed -Ee 's/[ ]*//' | cut -d: -f1)"
match="$(grep macaddress "/etc/netplan/50-cloud-init.yaml" | sed -Ee 's/[ ]*//' | cut -d\  -f2)"

# Configure the netplan file
{
  echo "network:"
  echo "  ethernets:"
  echo "    ${ifname}:"
  echo "      dhcp4: true"
  echo "      dhcp4-overrides:"
  echo "        route-metric: 250"
  echo "      match:"
  echo "        macaddress: ${match}"
  echo "      set-name: ${ifname}"
  echo "  version: 2"
} >/etc/netplan/50-cloud-init.yaml

# Apply the config
netplan apply

When I then came to a box running Fedora, I had a similar issue, except now I didn’t have Netplan to work with. How do I resolve this one?!

Actually, this is a four line script!

#!/bin/bash

# Get the name of the interface which has the IP address 10.0.2.2
netname="$(ip route | grep 10.0.2.2 | head -n 1 | sed -Ee 's/^(.*dev )(.*)$/\2/;s/proto [A-Za-z0-9]+//;s/metric [0-9]+//;s/[ \t]+$//')"

# Ask NetworkManager for a list of all the active connections, look for the NAT interface name we found above and then just get the connection name.
nm="$(nmcli connection show --active | grep "${netname}" | sed -Ee 's/^(.*)([ \t][-0-9a-f]{36})(.*)$/\1/;s/[\t ]+$//g')"
# Set the network to have a metric of 250
nmcli connection modify "$nm" ipv4.route-metric 250
# And then re-apply the network config
nmcli connection up "$nm"

The last major interface management tool I’ve experienced on standard server Linux is “ifupdown” – /etc/network/interfaces. This is mostly used on Debian. How do we fix that one? Well, that’s a bit more tricky!

#!/bin/bash

# Get the name of the interface with the IP address 10.0.2.2
netname="$(ip route | grep 10.0.2.2 | head -n 1 | sed -Ee 's/^(.*dev )(.*)$/\2/;s/proto [A-Za-z0-9]+//;s/metric [0-9]+//;s/[ \t]+$//')"

# Create a new /etc/network/interfaces file which just looks in "interfaces.d"
echo "source /etc/network/interfaces.d/*" > /etc/network/interfaces

# Create the loopback interface file
{
  echo "auto lo"
  echo "iface lo inet loopback"
} > "/etc/network/interfaces.d/lo"
# Bounce the interface
ifdown lo ; ifup lo

# Create the first "real" interface file
{
  echo "allow-hotplug ${netname}"
  echo "iface ${netname} inet dhcp"
  echo "  metric 1000"
} > "/etc/network/interfaces.d/${netname}"
# Bounce the interface
ifdown "${netname}" ; ifup "${netname}"

# Loop through the rest of the interfaces
ip link | grep UP | grep -v lo | grep -v "${netname}" | cut -d: -f2 | sed -Ee 's/[ \t]+([A-Za-z0-9.]+)[ \t]*/\1/' | while IFS= read -r int
do
  # Create the interface file for this interface, assuming DHCP
  {
    echo "allow-hotplug ${int}"
    echo "iface ${int} inet dhcp"
  } > "/etc/network/interfaces.d/${int}"
  # Bounce the interface
  ifdown "${int}" ; ifup "${int}"
done

Looking for one consistent script which does this all?

#!/bin/bash
# This script ensures that the metric of the first "NAT" interface is set to 1000,
# while resetting the rest of the interfaces to "whatever" the DHCP server offers.

function netname() {
  ip route | grep 10.0.2.2 | head -n 1 | sed -Ee 's/^(.*dev )(.*)$/\2/;s/proto [A-Za-z0-9]+//;s/metric [0-9]+//;s/[ \t]+$//'
}

if command -v netplan
then
  ################################################
  # NETPLAN
  ################################################

  # Find details about the interface
  ifname="$(grep -A1 ethernets "/etc/netplan/50-cloud-init.yaml" | tail -n1 | sed -Ee 's/[ ]*//' | cut -d: -f1)"
  match="$(grep macaddress "/etc/netplan/50-cloud-init.yaml" | sed -Ee 's/[ ]*//' | cut -d\  -f2)"

  # Configure the netplan file
  {
    echo "network:"
    echo "  ethernets:"
    echo "    ${ifname}:"
    echo "      dhcp4: true"
    echo "      dhcp4-overrides:"
    echo "        route-metric: 1000"
    echo "      match:"
    echo "        macaddress: ${match}"
    echo "      set-name: ${ifname}"
    echo "  version: 2"
  } >/etc/netplan/50-cloud-init.yaml

  # Apply the config
  netplan apply
elif command -v nmcli
then
  ################################################
  # NETWORKMANAGER
  ################################################

  # Ask NetworkManager for a list of all the active connections, look for the interface name we just found and then just get the connection name.
  nm="$(nmcli connection show --active | grep "$(netname)" | sed -Ee 's/^(.*)([ \t][-0-9a-f]{36})(.*)$/\1/;s/[\t ]+$//g')"
  # Set the network to have a metric of 1000 (for both IPv4 and IPv6)
  nmcli connection modify "$nm" ipv4.route-metric 1000
  nmcli connection modify "$nm" ipv6.route-metric 1000
  # And then re-apply the network config
  nmcli connection up "$nm"
elif command -v ifup > /dev/null
then
  ################################################
  # IFUPDOWN
  ################################################

  # Get the name of the interface which routes via 10.0.2.2 (the NAT gateway)
  netname="$(netname)"
  # Create a new /etc/network/interfaces file which just looks in "interfaces.d"
  echo "source /etc/network/interfaces.d/*" > /etc/network/interfaces
  # Create the loopback interface file
  {
    echo "auto lo"
    echo "iface lo inet loopback"
  } > "/etc/network/interfaces.d/lo"
  # Bounce the interface
  ifdown lo ; ifup lo
  # Create the first "real" interface file
  {
    echo "allow-hotplug ${netname}"
    echo "iface ${netname} inet dhcp"
    echo "  metric 1000"
  } > "/etc/network/interfaces.d/${netname}"
  # Bounce the interface
  ifdown "${netname}" ; ifup "${netname}"
  # Loop through the rest of the interfaces
  ip link | grep UP | grep -v lo | grep -v "${netname}" | cut -d: -f2 | sed -Ee 's/[ \t]+([A-Za-z0-9.]+)[ \t]*/\1/' | while IFS= read -r int
  do
    # Create the interface file for this interface, assuming DHCP
    {
      echo "allow-hotplug ${int}"
      echo "iface ${int} inet dhcp"
    } > "/etc/network/interfaces.d/${int}"
    # Bounce the interface
    ifdown "${int}" ; ifup "${int}"
  done
fi
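
Whichever branch runs, a quick way to confirm the result is to look at the routing table afterwards (again, just an optional check rather than part of the script):

# The route via the NAT gateway (10.0.2.2) should now carry "metric 1000"
ip route | grep 10.0.2.2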

Featured image is “Milestone, Otley” by “Tim Green” on Flickr and is released under a CC-BY license.

"Platform" by "Brian Crawford" on Flickr

Cross Platform Decision Records/Architectural Decision Records – a HowTo Guide

Several months ago, I wrote a post talking about Architectural Decision Records with adr-tools, but since then I’ve moved on a bit with things, so I wanted to write about alternatives.

Late edit 2021-12-14: I released (v0.0.1) my own Rust-based application for creating Decision Records. Please feel free to make pull requests, raise issues, etc :)

I also wanted to comment a bit on why I use the term “Decision Records” (always “decision record”, never “DR” due to the overloading of that particular abbreviation) rather than “Architectural Decision Records” (ADR), but I’ll get to that towards the end of the post 😊

Using Decision Records the Manual Way

A decision record is basically a text file, using the “Markdown” format, which has several “standard” blocks of text in it. The “npryce” version, which most people use, has the following sections in it:

  1. Title (as a “level 1” heading) which also holds the date of the record.
  2. A (level 2 heading) status section, holding the status of this decision (and any links to documents which supersede or relate to this decision).
  3. The context of the decision.
  4. The decision.
  5. The consequences of that decision.

So, somewhat understandably, your organisation’s tooling should support you making these documents by hand, without needing those tools.

There are conventions about how the index-critical details will be stored:

  1. Your title block should follow the format # 1. Decision Title. The # symbol means it is the primary heading for the document; the number, which should probably be lower than 9999, is used as an index for linking to other records; and the text of the title should also be the name of the file you’ve created. In this case, it will likely be 0001-decision-title.md (see the sketch just after this list).
  2. The status will usually be one of: Accepted or Proposed. If a document is superseded, that status is replaced by the superseding link. Any other link type will live under the line showing the current status.
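
For example, generating that conventional filename from a record number and title only takes a couple of lines of shell (a quick sketch of the convention, not code from any of the tools discussed below):

number=1
title="Decision Title"
# Lower-case the title, swap anything that isn't a letter or digit for a hyphen,
# and zero-pad the record number to four digits
slug="$(echo "$title" | tr '[:upper:]' '[:lower:]' | sed -Ee 's/[^a-z0-9]+/-/g;s/^-+|-+$//g')"
printf -v filename '%04d-%s.md' "$number" "$slug"
echo "$filename"   # 0001-decision-title.md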

So, there’s no reason why you couldn’t just use this template for any files you create:

# NUMBER. TITLE

Date: yyyy-mm-dd

## Status

Accepted
Superseded by [2. Another Decision](0002-another-decision.md)

## Context

The context of the decision.

## Decision

The decision.

## Consequences

The consequences of that decision.

BUT, that’s not very automated, is it?

ADRs using Bash

Of course, most people making decision records use the Bash command line….. right? Oh, perhaps not. I’ll get back to you in a tic. If you’re using Bash, the “npryce” tooling I mentioned above is the same one I wrote about several months ago. So, read that, and then crack on with your ADRs.

ADRs using Powershell

So, if you’re using Windows, you might be tempted to find a decision record tool for Powershell. If so, I found “ajoberstar” on Github had produced just such a thing, and you “just”, as an administrator, run:

Install-Module -Name ArchitectureDecisionRecords
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned

Then edit the script you installed (in C:\Program Files\WindowsPowerShell\Modules\ArchitectureDecisionRecords\0.1.1\ArchitectureDecisionRecords.psm1) and search-and-replace UTF8NoBOM for UTF8 and then save it…

And then you can run commands like Initialize-Adr or New-Adr -Title 'Use a database'. However, this script was last touched on 2nd July 2018, and although I’ve raised a few issues, they don’t seem to have been resolved (see also replacing UTF8NoBOM above).

ADRs using VSCode

By far, so far, the best tooling I’ve seen in this space is the adr-tools extension for VSCode. It too, however, has its own caveats, but these are not disastrous. Essentially, you need to create a path in which you store the template to use. You can get this from the author’s own repo, here: https://github.com/vincent-ledu/adr-template.git and put it in .adr-templates in the root directory of your project. This is, however, customizable by going to the settings for your user or workspace, searching for ADR and adjusting the paths accordingly.

A settings pane showing the Adr paths in your project’s tree

To add a new decision record, press Ctrl+Shift+P or click the cog icon in the sidebar, and select “Command Palette…”

Opening the Command Palette in VS Code

Then start typing “adr” to select from “ADR New”, “ADR Init”, “ADR Change Status” or “ADR Link”.

The Command Palette showing your options for commands to run

All of these will walk you through some options at the top of the screen, either asking for some text input, or asking you to select between options.

You may be tempted to just run this up now, and select “ADR New”, and it’ll look like it’s working, but you first need to have obtained the template and created the directory structure. Selecting “ADR Init” will create the directory structure for your project and will try to perform a git clone of the repo mentioned above, but if you are already in a git repository, or you have some form of MITM proxy in the way, this will also break silently. The easiest thing to do is to either manually create the paths in your tree, according to what you have set or selected, or just run the ADR init and then obtain the template from the git repo.
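
If you take the manual route, grabbing the template into the default path is only a couple of commands (a rough sketch, assuming you’ve kept the default .adr-templates path mentioned above):

# Clone the template repository into the path the extension expects...
git clone https://github.com/vincent-ledu/adr-template.git .adr-templates
# ...and drop its .git directory so it doesn't behave like a nested repository
rm -rf .adr-templates/.git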

Talking of templates, in the previous scripts, the script would come with a template file built-in, and it would do a simple string replacement of the values “NUMBER”, “TITLE” and “STATUS”. This script instead uses its own template, which is stored in your project’s file tree, and uses parameter substitution, finding strings wrapped in pairs of curly braces (like {{ this }}). The downside to this is that you can’t just reuse the template I listed above… but no worries, get the file from the repo and stick it in your tree where it’s expecting it, or let the adr init function clone the template into your path – job done.
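
To illustrate the difference between the two styles (the file name and placeholder names here are hypothetical, purely for illustration):

# Style 1: simple string replacement over a built-in template, roughly how the
# earlier scripts behave (NUMBER, TITLE and STATUS are literal markers):
sed -e 's/NUMBER/2/' -e 's/TITLE/Use a database/' -e 's/STATUS/Accepted/' \
  template.md > 0002-use-a-database.md

# Style 2: substituting {{ token }} style placeholders from a template stored in
# your project tree (the token names here are made up for the example):
sed -e 's/{{ *number *}}/2/g' -e 's/{{ *title *}}/Use a database/g' \
  template.md > 0002-use-a-database.md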

What other options are there?

Well, actually, this comes down to why I’m using the term “decision records” rather than “architectural decision record”, because I’m writing my own tool, and all the “adr” namespaces on Github were taken, and I’d seen a fair number of posts suggesting that the “A” in “ADR” should stand for “Any”… and I figured, why should the “A” be there at all?

The tool I’ve written so far is written in Javascript, and is starting from a (somewhat loose) TDD development process. It’s here: https://github.com/DecisionRecords/javascript-decision-records

Why Javascript? Frankly, I needed to learn a modern programming language, and wanted to apply it to a domain I was interested in. It’s not complete yet: it creates the record path and a configuration file, and I’m currently writing the functions to create new records. Also, because it’s Javascript, in theory I can also use the internals to create a VSCode extension with this later… MUCH later!

Why re-implement this at all? Firstly, it looks like most of the development work on those projects halted around 3-4 years ago, with no further interest in updating them to resolve bugs and issues. I didn’t want to fork the projects as-is, as I think they were largely written to scratch a particular itch (which is fine!) but they all miss key things I want to provide, like proper unit testing (only the npryce project comes close to this), internationalisation (none of them have this) and the ability to use a company- or project-wide template (only the VSCode extension does this). I also saw requests to support alternative file formats (like reStructuredText) which were completely rejected, and realised that if you built the script in such a way that these alternate formats could be used, then there was no reason not to support that.

In summary

There are tools you can use, whatever platform you’re using. My preference is the VSCode extension, and eventually it will (hopefully!!) be the script I’m writing… but it’s not ready yet.

Featured image is “Platform” by “Brian Crawford” on Flickr and is released under a CC-BY license.

"Bat Keychain" by "Nishant Khurana" on Flickr

Unit Testing Bash scripts with BATS-Core

I’m taking a renewed look into Unit Testing the scripts I’m writing, because (amongst other reasons) it’s important to know what expected behaviours you break when you make a change to a script!

A quick detour – what is Unit Testing?

A unit test is where you take one component of your script, and prove that, given specific valid or invalid tests, it works in an expected way.

For example, if you normally run sum_two_digits 1 1 and expect to see 2 as the result, with a unit test, you might write the following tests:

  • sum_two_digits should fail (no arguments)
  • sum_two_digits 1 should fail (not enough arguments)
  • sum_two_digits 1 1 should pass!
  • sum_two_digits 1 1 1 may fail (too many arguments), may pass (only sum the first two digits)
  • sum_two_digits a b should fail (not numbers)

and so on… you might have seen this tweet, for example

Things you might unit test in a bar.
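
To make the sum_two_digits example concrete, here’s a minimal sketch of what such a function might look like (purely illustrative; it isn’t part of the script we build below):

function sum_two_digits() {
  # Fail unless we get exactly two arguments
  [ "$#" -eq 2 ] || { echo "Expected exactly two arguments" >&2; return 1; }
  # Fail if either argument contains anything other than digits
  case "$1$2" in
    *[!0-9]*) echo "Arguments must be whole numbers" >&2; return 1 ;;
  esac
  echo "$(( $1 + $2 ))"
}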

Preparing your environment

Everyone’s development methodology differs slightly, but I create my scripts in a git repository.

I start from a new repo, like this:

mkdir my_script
cd my_script
git init

echo '# `my_script`' > README.md
echo "" >> README.md
echo "This script does awesome things for awesome people. CC-0 licensed." >> README.md
git add README.md
git commit -m 'Added README'

echo '#!/bin/bash' > my_script.sh
chmod +x my_script.sh
git add my_script.sh
git commit -m 'Added initial commit of "my_script.sh"'

OK, so far, so awesome. Now let’s start adding BATS. (Yes, this is not necessarily the “best” way to create your “test_all.sh” script, but it works for my case!)

git submodule add https://github.com/bats-core/bats-core.git test/libs/bats
git commit -m 'Added BATS library'
echo '#!/bin/bash' > test/test_all.sh
echo 'cd "$(dirname "$0")" || true' >> test/test_all.sh
echo 'libs/bats/bin/bats $(find *.bats -maxdepth 0 | sort)' >> test/test_all.sh
chmod +x test/test_all.sh
git add test/test_all.sh
git commit -m 'Added test runner'

Now, let’s write two simple tests, one which fails and one which passes, so I can show you what this looks like. Create a file called test/prove_bats.bats

#!/usr/bin/env ./libs/bats/bin/bats

@test "This will fail" {
  run false
  [ "$status" -eq 0 ]
}

@test "This will pass" {
  run true
  [ "$status" -eq 0 ]
}

And now, when we run this with test/test_all.sh we get the following:

 ✗ This will fail
   (in test file prove_bats.bats, line 5)
     `[ "$status" -eq 0 ]' failed
 ✓ This will pass

2 tests, 1 failure

Excellent, now we know that our test library works, and we have a rough idea of what a test looks like. Let’s build something a bit more awesome. But first, let’s remove the prove_bats.bats file, with rm test/prove_bats.bats.

Starting to develop “real” tests

Let’s create a new file, test/path_checking.bats. Our amazing script needs to have a configuration file, but we’re not really sure where in the path it is! Let’s get building!

#!/usr/bin/env ./libs/bats/bin/bats

# This runs before each of the following tests are executed.
setup() {
  source "../my_script.sh"
  cd "$BATS_TEST_TMPDIR"
}

@test "No configuration file is found" {
  run find_config_file
  echo "Status received: $status"
  echo "Actual output:"
  echo "$output"
  [ "$output" == "No configuration file found." ]
  [ "$status" -eq 1 ]
}

When we run this test (using test/test_all.sh), we get this response:

 ✗ No configuration file is found
   (in test file path_checking.bats, line 14)
     `[ "$output" == "No configuration file found." ]' failed with status 127
   Status received: 127
   Actual output:
   /tmp/my_script/test/libs/bats/lib/bats-core/test_functions.bash: line 39: find_config_file: command not found

1 test, 1 failure

Uh oh! Well, I guess that’s because we don’t have a function called find_config_file yet in that script. Ah, yes, let’s quickly divert into making your script more testable, by making use of functions!

Bash script testing with functions

When many people write a Bash script, it tends to look something like this:

#!/bin/bash
echo "Validate 'uname -a' returns a string: "
read_some_value="$(uname -a)"
if [ -n "$read_some_value" ]
then
  echo "Yep"
fi

While this works, it’s not good for testing each of those bits in isolation (and also, as a sideline, if your script is edited while you’re running it, it’ll break, because Bash parses each line as it gets to it!)

A good way of making this “better” is to break this down into functions. At the very least, create a “main” function, and put everything into there, like this:

#!/bin/bash
function main() {
  echo "Validate 'uname -a' returns a string: "
  read_some_value="$(uname -a)"
  if [ -n "$read_some_value" ]
  then
    echo "Yep"
  fi
}

main

By splitting this into a “main” function, which is called when it runs, at the very least, a change to the script during operation won’t break it… but it’s still not very testable. Let’s break down some more of this functionality.

#!/bin/bash
function read_uname() {
  echo "$(uname -a)"
}
function test_response() {
  if [ -n "$1" ]
  then
    echo "Yep"
  fi
}
function main() {
  echo "Validate 'uname -a' returns a string: "
  read_some_value="$(read_uname)"
  test_response "$read_some_value"
}

main

So, what does this give us? Well, in theory we can test each part of this in isolation, but at the moment, bash will execute all those functions straight away, because they’re being called under “main”… so we need to abstract main out a bit further. Let’s replace that last line, main, with a quick check.

if [[ "${BASH_SOURCE[0]}" == "${0}" ]]
then
  main
fi

Stopping your code from running by default with some helper variables

The special value ${BASH_SOURCE[0]} will return the name of the file that’s being read at this point, while $0 is the name of the script that was executed. As a little example, I’ve created two files, source_file.sh and test_sourcing.sh. Here’s source_file.sh:

#!/bin/bash

echo "Source: ${BASH_SOURCE[0]}"
echo "File: ${0}"

And here’s test_sourcing.sh:

#!/bin/bash
source ./source_file.sh

What happens when we run the two of them?

user@host:/tmp/my_script$ ./source_file.sh
Source: ./source_file.sh
File: ./source_file.sh
user@host:/tmp/my_script$ ./test_sourcing.sh
Source: ./source_file.sh
File: ./test_sourcing.sh

So, this means if we source our script (which we’ll do with our testing framework), ${BASH_SOURCE[0]} will return a different value from $0, so it knows not to invoke the “main” function, and we can abstract that all into more test code.

Now we’ve addressed all that lot, we need to start writing code… where did we get to? Oh yes, find_config_file: command not found

Walking up a filesystem tree

The function we want needs to look in this path, and all the parent paths for a file called “.myscript-config“. To do this, we need two functions – one to get the directory name of the “real” directory, and the other to do the walking up the path.

function _absolute_directory() {
  # Change to the directory provided, or if we can't, return with error 1
  cd "$1" || return 1
  # Return the full pathname, resolving symbolic links to "real" paths
  pwd -P
}

function find_config_file() {
  # Get the "real" directory name for this path
  absolute_directory="$(_absolute_directory ".")"
  # As long as the directory name isn't "/" (the root directory), and the
  #  return value (config_path) isn't empty, check for the config file.
  while [ "$absolute_directory" != "/" ] && 
        [ -n "$absolute_directory" ] && 
        [ -z "$config_path" ]
  do
    # Is the file we're looking for here?
    if [ -f "$absolute_directory/.myscript-config" ]
    then
      # Store the value
      config_path="$absolute_directory/.myscript-config"
    else
      # Get the directory name for the parent directory, ready to loop.
      absolute_directory="$(_absolute_directory "$absolute_directory/..")"
    fi
  done
  # If we've exited the loop, but have no return value, exit with an error
  if [ -z "$config_path" ]
  then
    echo "No config found. Please create .myscript-config in your project's root directory."
    # Failure states return an exit code of anything greater than 0. Success is 0.
    exit 1
  else
    # Output the result
    echo "$config_path"
  fi
}

Let’s re-run our test!

 ✗ No configuration file is found
   (in test file path_checking.bats, line 14)
     `[ "$output" == "No configuration file found." ]' failed
   Status received: 1
   Actual output:
   No config found. Please create .myscript-config in your project's root directory.

1 test, 1 failure

Uh oh! Our actual output isn’t what the test expects. Fortunately, we’ve recorded the output it sent (“No config found. Please...“) so we can fix our test (or, find that output line in the script and fix that instead).

Let’s fix the test! (The BATS test file just shows the test we’re amending)

@test "No configuration file is found" {
  run find_config_file
  echo "Status received: $status"
  echo "Actual output:"
  echo "$output"
  [ "$output" == "No config found. Please create .myscript-config in your project's root directory." ]
  [ "$status" -eq 1 ]
}

Fab, and now when we run it, it’s all good!

user@host:/tmp/my_script$ test/test_all.sh
 ✓ No configuration file is found

1 test, 0 failures

So, how do we test what happens when the file is there? We make a new test! Add this to your test file, or create a new one, ending .bats in the test directory.

@test "Configuration file is found and is OK" {
  touch .myscript-config
  run find_config_file
  echo "Status received: $status"
  echo "Actual output:"
  echo "$output"
  [ "$output" == "$BATS_TEST_TMPDIR/.myscript-config" ]
  [ "$status" -eq 0 ]
}

And now, when you run your test, you’ll see this:

user@host:/tmp/my_script$ test/test_all.sh
 ✓ No configuration file is found
 ✓ Configuration file is found and is OK

2 tests, 0 failures

Extending BATS

There are some extra BATS libraries you can pull in – at the moment you’re doing manual checks of output and of success or failure, which aren’t very pretty. Let’s include the “assert” library for BATS.

Firstly, we need this library added as a submodule again.

# This module provides the formatting for the other non-core libraries
git submodule add https://github.com/bats-core/bats-support.git test/libs/bats-support
# This is the actual assertion tests library
git submodule add https://github.com/bats-core/bats-assert.git test/libs/bats-assert

And now we need to update our test. At the top of the file, under the #!/usr/bin/env line, add these:

load "libs/bats-support/load"
load "libs/bats-assert/load"

And then update your tests:

@test "No configuration file is found" {
  run find_config_file
  assert_output "No config found. Please create .myscript-config in your project's root directory."
  assert_failure
}

@test "Configuration file is found and is OK" {
  touch .myscript-config
  run find_config_file
  assert_output "$BATS_TEST_TMPDIR/.myscript-config"
  assert_success
}

Note that we removed the “echo” statements in this file. I’ve purposefully broken both types of tests (exit 1 became exit 0 and the file I’m looking for is $absolute_directory/.config instead of $absolute_directory/.myscript-config) in the source file, and now you can see what this looks like:

 ✗ No configuration file is found
   (from function `assert_failure' in file libs/bats-assert/src/assert_failure.bash, line 66,
    in test file path_checking.bats, line 15)
     `assert_failure' failed

   -- command succeeded, but it was expected to fail --
   output : No config found. Please create .myscript-config in your project's root directory.
   --

 ✗ Configuration file is found and is OK
   (from function `assert_output' in file libs/bats-assert/src/assert_output.bash, line 194,
    in test file path_checking.bats, line 21)
     `assert_output "$BATS_TEST_TMPDIR/.myscript-config"' failed

   -- output differs --
   expected : /tmp/bats-run-21332-1130Ph/suite-tmpdir-QMDmz6/file-tmpdir-path_checking.bats-nQf7jh/test-tmpdir--I3pJYk/.myscript-config
   actual   : No config found. Please create .myscript-config in your project's root directory.
   --

And so now you can see some of how to do unit testing with Bash and BATS. BATS also says you can unit test any command that can be run in a Bash environment, so have fun!
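
For example, a throwaway test against a regular system command might look like this (assuming the assert libraries above are loaded, and that you’re running the tests on a Linux box):

@test "uname reports a Linux kernel" {
  run uname -s
  assert_success
  assert_output "Linux"
}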

Featured image is “Bat Keychain” by “Nishant Khurana” on Flickr and is released under a CC-BY license.

"Picture in Picture" by "Mats" on Flickr

Hints and Tips when using Vagrant on Windows

I’ve been using HashiCorp’s Vagrant with Oracle’s VirtualBox for several years (probably since 2013, if my blog posts are anything to go by), and I’ve always been pretty comfortable with how it works.

This said, when using a Windows machine running Microsoft’s Hyper-V (built into Windows since Windows 8 on the desktop and Windows Server 2008 on servers), VirtualBox is unable (by default) to run 64 bit virtual machines (thanks to Hyper-V “stealing” the VT-x/AMD-V bit from the BIOS/EFI).

Around last year or maybe even the year before, Microsoft introduced a “Hypervisor Platform” add-on, which lets VirtualBox run 64 bit machines on a Hyper-V host (more on this later). HOWEVER, it is much slower than in native mode, and can often freeze on booting…

Meanwhile, Vagrant (a tool, driven by a configuration file, which acts as a wrapper around various hypervisors and uses VirtualBox by default) boots machines in a “headless” mode by default, so you can’t see the freezing.

I’m trying to use an Ubuntu 18.04 virtual machine for various builds I’m creating, and found that I’d get a few issues on boot, so let’s get these sorted out.

VirtualBox can’t start 64bit virtual machines when Hyper-V is installed.

You need to confirm that certain Windows features are enabled, including “Hyper-V” and “Windows Hypervisor Platform”. Confirm you’re running at least Windows 10 version 1803 which is the first release with the “Windows Hypervisor Platform”.

GUI mode

Run winver to bring up this box. Confirm the version number is greater than 1803. Mine is 1909.

A screenshot of the “winver” command, highlighting the version number, which in this case shows 1909, but needs to show at least 1803.

Right click on the start menu, and select “Apps and Features”. Click on “Programs and Features”.

The settings panel found by right clicking the “Start Menu” and selecting “Apps and Features”. Note the desired next step – “Programs and Features” is highlighted.

In the “Programs and Features” window, click on “Turn Windows Features on or off”. Note the shield icon here indicates that administrative access is required, and you may be required to authenticate to the machine to progress past this stage.

A fragment of the “Programs and Features” window, with the “Turn Windows features on or off” link highlighted.

Next, ensure that the following “Windows Features” are enabled; “Hyper-V”, “Virtual Machine Platform” and “Windows Hypervisor Platform”. Click on “OK” to install these features, if they’re not already installed.

A screen capture of the “Turn Windows features on or off” dialogue box, with certain features obscured and others highlighted.

Note that once you’ve pressed “OK”, you’ll likely need to reboot your machine, if any of these features were not already installed.

CLI mode

Right click on the start menu, and start an Administrative Powershell session.

Run the command Get-ComputerInfo | select WindowsVersion. You should get a response which looks like this:

WindowsVersion
--------------
1909

Note that the version number needs to be greater than 1803.

Next, find the names of the features you need to install. These features have region specific names, so outside EN-GB, these names may not match your requirements!

Run the command Get-WindowsOptionalFeature -online | select FeatureName,State and you’re looking for the following lines (this has been cropped to just what you need):

FeatureName                                     State
-----------                                     -----
HypervisorPlatform                            Enabled
VirtualMachinePlatform                        Enabled
Microsoft-Hyper-V-All                         Enabled

If any of these three features are not enabled, run Enable-WindowsOptionalFeature -online -FeatureName x, where “x” is the name of the feature you want to install, as listed in the text block above. For example: Enable-WindowsOptionalFeature -online -FeatureName HypervisorPlatform,VirtualMachinePlatform,Microsoft-Hyper-V-All. If you run this when they’re already enabled, it should return RestartNeeded : False, but otherwise you’re likely to need to reboot.

After the reboot

After you’ve rebooted, and you start a 64 bit virtual machine in VirtualBox, you’ll see this icon in the bottom corner.

A screen grab of the VirtualBox Status Bar, highlighting the “Slow Mode” icon representing the CPU

Booting the Virtual Machine with Vagrant fails because it takes too long to boot

This was historically a big issue with Vagrant and VirtualBox, particularly with Windows Vagrant boxes, but prior to the Hyper-V/VirtualBox solution, it’d been largely fixed (or at least, I wasn’t seeing it!) There is a “standard” timeout for booting a Virtual Machine, I think at approximately 5 minutes, but I might be wrong. To make this “issue” stop occurring, add this config.vm.boot_timeout = 0 line to your Vagrantfile, like this:

Vagrant.configure("2") do |config|
  config.vm.boot_timeout = 0
end

This says to Vagrant, don’t worry how long it takes to boot, just keep waiting until it does. Yes, it will be slower, but it should get there in the end!

Booting the Virtual Machine with Vagrant does not fail, but it never authenticates with your Private Key.

Your VM may sit at this block for quite a while:

==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address: 127.0.0.1:2222
    default: SSH username: vagrant
    default: SSH auth method: private key

If this occurs, you may find that your virtual machine has hung during the boot process… but weirdly, a simple work-around to this is to ensure that the VirtualBox GUI is open, and that you’ve got a block like this (config.vm.provider / vb.gui=true / end) in your Vagrantfile:

Vagrant.configure("2") do |config|
  config.vm.provider "virtualbox" do |vb|
    vb.gui = true
  end
end

This forces VirtualBox to open a window with your Virtual Machine’s console on it (much like having a monitor attached to real hardware). You don’t need to interact with it, but any random hangs or halts on your virtual machine may be solved just by bringing this window, or the VirtualBox Machines GUI, to the foreground.

Sometimes you may see, when this happens, a coredump or section of kernel debugging code on the console. Don’t worry about this!

Vagrant refuses to SSH to your virtual machine when using the vagrant ssh command.

Provisioning works like a treat, and you can SSH into the virtual machine from any other environment, but, when you run vagrant ssh, you get an error about keys not being permitted or usable. This is fixable by adding a single line, either to your system- or user-wide environment variables, or by adding a line to your Vagrantfile.

The environment variable is VAGRANT_PREFER_SYSTEM_BIN, and by setting this to 0, it will use bundled versions of ssh or rsync instead of using any versions provided by Windows.
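
If you take the environment variable route and you launch vagrant from a Bash-style shell such as Git Bash (an assumption about your setup; otherwise set it through the Windows environment variable settings), it might look like this:

# Tell Vagrant to prefer its own bundled ssh/rsync over anything Windows provides
export VAGRANT_PREFER_SYSTEM_BIN=0
vagrant ssh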

You can add a line like this ENV['VAGRANT_PREFER_SYSTEM_BIN']="0" to your Vagrantfile, outside of the Vagrant.configure … end block, like this:

ENV['VAGRANT_PREFER_SYSTEM_BIN']="0"
Vagrant.configure("2") do |config|
end


Featured image is “Picture in Picture” by “Mats” on Flickr and is released under a CC-BY-SA license.