"Platform" by "Brian Crawford" on Flickr

Cross Platform Decision Records/Architectural Decision Records – a HowTo Guide

Several months ago, I wrote a post talking about Architectural Decision Records with adr-tools, but since then I’ve moved on a bit with things, so I wanted to write about alternatives.

Late edit 2021-12-14: I released (v0.0.1) my own rust-based application for creating Decision Records. Please feel free to make pull requests, raise issues, etc :)

I also wanted to comment a bit on why I use the term “Decision Records” (always “decision record”, never “DR” due to the overloading of that particular abbreviation) rather than “Architectural Decision Records” (ADR), but I’ll get to that towards the end of the post 😊

Using Decision Records the Manual Way

A decision record is basically a text file, using the "Markdown" format, which has several "standard" blocks of text in it. The "npryce" version, which most people use, has the following sections:

  1. Title (as a “level 1” heading) which also holds the date of the record.
  2. A (level 2 heading) status section, holding the status of this decision (and any links to documents which supersede or relate to this decision).
  3. The context of the decision.
  4. The decision.
  5. The consequences of that decision.

So, somewhat understandably, your organisational tooling should support you creating these documents by hand, without needing any special tools.

There are conventions about how the index-critical details will be stored:

  1. Your title block should follow the format # 1. Decision Title. The # symbol marks the primary heading for the document; the number (which should probably stay below 9999) is used as an index for linking to other records; and the text of the title should also be the name of the file you've created. In this case, it will likely be 0001-decision-title.md (see the sketch after this list).
  2. The status will usually be one of: Accepted or Proposed. If a document is superseded, this status should be removed. Any other link type will live on a line under the current status.
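
As a quick illustration of those two conventions, here's a rough Bash sketch (my own, not part of any of the tools mentioned below) which works out the next record number and stamps out a correctly named file:

#!/bin/bash
# Usage: ./new_record.sh "Decision Title"
title="$1"
# Find the highest existing record number in this directory (0 if there are none yet)
last="$(find . -maxdepth 1 -name '[0-9][0-9][0-9][0-9]-*.md' | sort | tail -n 1 | sed 's/[^0-9]*\([0-9]*\)-.*/\1/')"
num="$(( 10#${last:-0} + 1 ))"
# Lower-case the title and swap spaces for hyphens to build the filename
slug="$(echo "$title" | tr '[:upper:]' '[:lower:]' | tr ' ' '-')"
# Write the heading and date; fill in the rest of the template by hand
printf '# %d. %s\n\nDate: %s\n' "$num" "$title" "$(date +%Y-%m-%d)" > "$(printf '%04d' "$num")-${slug}.md"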

So, there’s no reason why you couldn’t just use this template for any files you create:

# NUMBER. TITLE

Date: yyyy-mm-dd

## Status

Accepted
Superseded by [2. Another Decision](0002-another-decision.md)

## Context

The context of the decision.

## Decision

The decision.

## Consequences

The consequences of that decision.

BUT, that’s not very automated, is it?

ADRs using Bash

Of course, most people making decision records use the Bash command line… right? Oh, perhaps not. I'll get back to you in a tic. If you're using Bash, the "npryce" tooling I mentioned above is the same one I wrote about a few months ago. So, read that, and then crack on with your ADRs.

ADRs using Powershell

So, if you’re using Windows, you might be tempted to find a decision record tool for Powershell. If so, I found “ajoberstar” on Github had produced just such a thing, and you “just”, as an administrator, run:

Install-Module -Name ArchitectureDecisionRecords
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned

Then edit the script you installed (in C:\Program Files\WindowsPowerShell\Modules\ArchitectureDecisionRecords\0.1.1\ArchitectureDecisionRecords.psm1) and search-and-replace UTF8NoBOM for UTF8 and then save it…

And then you can run commands like Initialize-Adr or New-Adr -Title 'Use a database'. However, this script was last touched on 2nd July 2018, and although I’ve raised a few issues, they don’t seem to have been resolved (see also replacing UTF8NoBOM above).

ADRs using VSCode

By far, so far, the best tooling I've seen in this space is the adr-tools extension for VSCode. It too, however, has its own caveats, but these are not disastrous. Essentially, you need to create a path in which you store the template to use. You can get this from the extension author's repo, here: https://github.com/vincent-ledu/adr-template.git and put it in .adr-templates in the root directory of your project. This path is customizable, though: go to the settings for your user or workspace, search for ADR and adjust the paths accordingly.

A settings pane showing the Adr paths in your project’s tree

To add a new decision record, press Ctrl+Shift+P or click the cog icon in the sidebar, and select “Command Palette…”

Opening the Command Palette in VS Code

Then start typing “adr” to select from “ADR New”, “ADR Init”, “ADR Change Status” or “ADR Link”.

The Command Palette showing your options for commands to run

All of these will walk you through some options at the top of the screen, either asking for some text input, or asking you to select between options.

You may be tempted to just run this up now and select "ADR New", and it'll look like it's working, but you first need to have obtained the template and created the directory structure. Selecting "ADR Init" will create the directory structure for your project and will try to perform a git clone of the repo mentioned above, but if you are already in a git repository, or you have some form of MITM proxy in the way, this will break silently. The easiest thing to do is to either manually create the paths in your tree, according to what you have set or selected, or just run the ADR Init, and then obtain the template from the git repo.

Talking of templates: the previous scripts come with a template file built in, and do a simple string replacement of the values "NUMBER", "TITLE" and "STATUS". This extension instead uses its own template, which is stored in your project's file tree, and uses parameter substitution, finding strings wrapped in pairs of curly braces (like {{ this }}). The downside to this is that you can't just reuse the template I listed above… but no worries, get the file from the repo and put it in your tree where the extension expects it, or let the ADR Init function clone the template into your path, and the job's done.
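
As an aside, if you ever wanted to render one of those {{ placeholder }} templates outside the extension, a quick sed call would do it. The paths and placeholder names below are assumptions for illustration, not how the extension does it internally:

sed -e "s/{{ *number *}}/2/g" \
    -e "s/{{ *title *}}/Another Decision/g" \
    -e "s/{{ *status *}}/Proposed/g" \
    .adr-templates/template.md > 0002-another-decision.md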

What other options are there?

Well, actually, this comes down to why I'm using the term "decision records" rather than "architectural decision record": I'm writing my own tool, all the "adr" namespaces on Github were taken, and I'd seen a fair number of posts suggesting that the "A" in "ADR" should stand for "Any"… so I figured, why should it exist at all?

The tool I’ve written so far is written in Javascript, and is starting from a (somewhat loose) TDD development process. It’s here: https://github.com/DecisionRecords/javascript-decision-records

Why Javascript? Frankly, I needed to learn a modern programming language, and wanted to apply it to a domain I was interested in. It’s currently not complete, it creates the record path and a configuration file, and I’m currently writing the functions to create new records. Also, because it’s Javascript, in theory I can also use the internals to create a VSCode extension with this later… MUCH later!

Why re-implement this at all? Firstly, it looks like most of the development work on those projects halted around 3-4 years ago, with no further interest in updating them to resolve bugs and issues. I didn’t want to fork the projects as-is, as I think they were largely written to scratch a particular itch (which is fine!) but they all miss key things I want to provide, like proper unit testing (only the npryce project comes close to this), internationalisation (none of them have this) and the ability to use a company- or project-wide template (only the VSCode extension does this). I also saw requests to support alternative file formats (like Restructured Text, which was completely rejected) and realised that if you built the script in such a way that these alternate formats could be used, then there was no reason not to support that.

In summary

There are tools you can use, whatever platform you’re using. My preference is the VSCode extension, and eventually will (hopefully!!) be the script I’m writing… but it’s not ready, yet.

Featured image is "Platform" by "Brian Crawford" on Flickr and is released under a CC-BY license.

"Bat Keychain" by "Nishant Khurana" on Flickr

Unit Testing Bash scripts with BATS-Core

I’m taking a renewed look into Unit Testing the scripts I’m writing, because (amongst other reasons) it’s important to know what expected behaviours you break when you make a change to a script!

A quick detour – what is Unit Testing?

A unit test is where you take one component of your script, and prove that, given specific valid or invalid tests, it works in an expected way.

For example, if you normally run sum_two_digits 1 1 and expect to see 2 as the result, with a unit test, you might write the following tests:

  • sum_two_digits should fail (no arguments)
  • sum_two_digits 1 should fail (only one argument)
  • sum_two_digits 1 1 should pass!
  • sum_two_digits 1 1 1 may fail (too many arguments), may pass (only sum the first two digits)
  • sum_two_digits a b should fail (not numbers)

and so on… you might have seen this tweet, for example

https://twitter.com/sempf/status/514473420277694465
Things you might unit test in a bar.
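
To make that concrete, here's a sketch of a sum_two_digits function that would satisfy those expectations (purely illustrative; the rest of this post uses a different example):

#!/bin/bash
function sum_two_digits() {
  # Fail unless we get exactly two arguments
  if [ "$#" -ne 2 ]
  then
    echo "Expected exactly two arguments" >&2
    return 1
  fi
  # Fail unless both arguments are whole numbers
  if ! [[ "$1" =~ ^-?[0-9]+$ && "$2" =~ ^-?[0-9]+$ ]]
  then
    echo "Arguments must be numbers" >&2
    return 1
  fi
  echo "$(( $1 + $2 ))"
}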

Preparing your environment

Everyone’s development methodology differs slightly, but I create my scripts in a git repository.

I start from a new repo, like this:

mkdir my_script
cd my_script
git init

echo '# `my_script`' > README.md
echo "" >> README.md
echo "This script does awesome things for awesome people. CC-0 licensed." >> README.md
git add README.md
git commit -m 'Added README'

echo '#!/bin/bash' > my_script.sh
chmod +x my_script.sh
git add my_script.sh
git commit -m 'Added initial commit of "my_script.sh"'

OK, so far, so awesome. Now let’s start adding BATS. (Yes, this is not necessarily the “best” way to create your “test_all.sh” script, but it works for my case!)

git submodule add https://github.com/bats-core/bats-core.git test/libs/bats
git commit -m 'Added BATS library'
echo '#!/bin/bash' > test/test_all.sh
echo 'cd "$(dirname "$0")" || true' >> test/test_all.sh
echo 'libs/bats/bin/bats $(find *.bats -maxdepth 0 | sort)' >> test/test_all.sh
chmod +x test/test_all.sh
git add test/test_all.sh
git commit -m 'Added test runner'
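
For reference, the test/test_all.sh file that those echo lines produce looks like this:

#!/bin/bash
cd "$(dirname "$0")" || true
libs/bats/bin/bats $(find *.bats -maxdepth 0 | sort)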

Now, let’s write two simple tests, one which fails and one which passes, so I can show you what this looks like. Create a file called test/prove_bats.bats

#!/usr/bin/env ./libs/bats/bin/bats

@test "This will fail" {
  run false
  [ "$status" -eq 0 ]
}

@test "This will pass" {
  run true
  [ "$status" -eq 0 ]
}

And now, when we run this with test/test_all.sh we get the following:

 ✗ This will fail
   (in test file prove_bats.bats, line 5)
     `[ "$status" -eq 0 ]' failed
 ✓ This will pass

2 tests, 1 failure

Excellent, now we know that our test library works, and we have a rough idea of what a test looks like. Let's build something a bit more awesome. But first, let's remove the prove_bats.bats file, with rm test/prove_bats.bats.

Starting to develop “real” tests

Let’s create a new file, test/path_checking.bats. Our amazing script needs to have a configuration file, but we’re not really sure where in the path it is! Let’s get building!

#!/usr/bin/env ./libs/bats/bin/bats

# This runs before each of the following tests are executed.
setup() {
  source "../my_script.sh"
  cd "$BATS_TEST_TMPDIR"
}

@test "No configuration file is found" {
  run find_config_file
  echo "Status received: $status"
  echo "Actual output:"
  echo "$output"
  [ "$output" == "No configuration file found." ]
  [ "$status" -eq 1 ]
}

When we run this test (using test/test_all.sh), we get this response:

 ✗ No configuration file is found
   (in test file path_checking.bats, line 14)
     `[ "$output" == "No configuration file found." ]' failed with status 127
   Status received: 127
   Actual output:
   /tmp/my_script/test/libs/bats/lib/bats-core/test_functions.bash: line 39: find_config_file: command not found

1 test, 1 failure

Uh oh! Well, I guess that’s because we don’t have a function called find_config_file yet in that script. Ah, yes, let’s quickly divert into making your script more testable, by making use of functions!

Bash script testing with functions

When many people write a bash script, you’ll see something like this:

#!/bin/bash
echo "Validate 'uname -a' returns a string: "
read_some_value="$(uname -a)"
if [ -n "$read_some_value" ]
then
  echo "Yep"
fi

While this works, what it’s not good for is testing each of those bits (and also, as a sideline, if your script is edited while you’re running it, it’ll break, because Bash parses each line as it gets to it!)

A good way of making this “better” is to break this down into functions. At the very least, create a “main” function, and put everything into there, like this:

#!/bin/bash
function main() {
  echo "Validate 'uname -a' returns a string: "
  read_some_value="$(uname -a)"
  if [ -n "$read_some_value" ]
  then
    echo "Yep"
  fi
}

main

By splitting this into a “main” function, which is called when it runs, at the very least, a change to the script during operation won’t break it… but it’s still not very testable. Let’s break down some more of this functionality.

#!/bin/bash
function read_uname() {
  echo "$(uname -a)"
}
function test_response() {
  if [ -n "$1" ]
  then
    echo "Yep"
  fi
}
function main() {
  echo "Validate 'uname -a' returns a string: "
  read_some_value="$(read_uname)"
  test_response "$read_some_value"
}

main

So, what does this give us? Well, in theory we can test each part of this in isolation, but at the moment, bash will execute all those functions straight away, because they're being called under "main"… so we need to abstract main out a bit further. Let's replace that last line, main, with a quick check.

if [[ "${BASH_SOURCE[0]}" == "${0}" ]]
then
  main
fi

Stopping your code from running by default with some helper variables

The special value ${BASH_SOURCE[0]} will return the name of the file that's being read at this point, while $0 is the name of the script that was executed. As a little example, I've created two files, source_file.sh and test_sourcing.sh. Here's source_file.sh:

#!/bin/bash

echo "Source: ${BASH_SOURCE[0]}"
echo "File: ${0}"

And here’s test_sourcing.sh:

#!/bin/bash
source ./source_file.sh

What happens when we run the two of them?

user@host:/tmp/my_script$ ./source_file.sh
Source: ./source_file.sh
File: ./source_file.sh
user@host:/tmp/my_script$ ./test_sourcing.sh
Source: ./source_file.sh
File: ./test_sourcing.sh

So, this means that if we source our script (which we'll do with our testing framework), ${BASH_SOURCE[0]} will return a different value from $0, so the script knows not to invoke the "main" function, and we can abstract all of that into more test code.

Now we’ve addressed all that lot, we need to start writing code… where did we get to? Oh yes, find_config_file: command not found

Walking up a filesystem tree

The function we want needs to look in this path, and all the parent paths, for a file called ".myscript-config". To do this, we need two functions – one to get the directory name of the "real" directory, and the other to walk up the path.

function _absolute_directory() {
  # Change to the directory provided, or if we can't, return with error 1
  cd "$1" || return 1
  # Return the full pathname, resolving symbolic links to "real" paths
  pwd -P
}

function find_config_file() {
  # Get the "real" directory name for this path
  absolute_directory="$(_absolute_directory ".")"
  # As long as the directory name isn't "/" (the root directory), and the
  #  return value (config_path) isn't empty, check for the config file.
  while [ "$absolute_directory" != "/" ] && 
        [ -n "$absolute_directory" ] && 
        [ -z "$config_path" ]
  do
    # Is the file we're looking for here?
    if [ -f "$absolute_directory/.myscript-config" ]
    then
      # Store the value
      config_path="$absolute_directory/.myscript-config"
    else
      # Get the directory name for the parent directory, ready to loop.
      absolute_directory="$(_absolute_directory "$absolute_directory/..")"
    fi
  done
  # If we've exited the loop, but have no return value, exit with an error
  if [ -z "$config_path" ]
  then
    echo "No config found. Please create .myscript-config in your project's root directory."
    # Failure states return an exit code of anything greater than 0. Success is 0.
    exit 1
  else
    # Output the result
    echo "$config_path"
  fi
}

Let’s re-run our test!

 ✗ No configuration file is found
   (in test file path_checking.bats, line 14)
     `[ "$output" == "No configuration file found." ]' failed
   Status received: 1
   Actual output:
   No config found. Please create .myscript-config in your project's root directory.

1 test, 1 failure

Uh oh! Our output isn't what we told the test to expect. Fortunately, we've recorded the output it sent ("No config found. Please…") so we can fix our test (or, find that output line in the script and fix that instead).

Let’s fix the test! (The BATS test file just shows the test we’re amending)

@test "No configuration file is found" {
  run find_config_file
  echo "Status received: $status"
  echo "Actual output:"
  echo "$output"
  [ "$output" == "No config found. Please create .myscript-config in your project's root directory." ]
  [ "$status" -eq 1 ]
}

Fab, and now when we run it, it’s all good!

user@host:/tmp/my_script$ test/test_all.sh
 ✓ No configuration file is found

1 test, 0 failures

So, how do we test what happens when the file is there? We make a new test! Add this to your test file, or create a new one, ending .bats in the test directory.

@test "Configuration file is found and is OK" {
  touch .myscript-config
  run find_config_file
  echo "Status received: $status"
  echo "Actual output:"
  echo "$output"
  [ "$output" == "$BATS_TEST_TMPDIR/.myscript-config" ]
  [ "$status" -eq 0 ]
}

And now, when you run your test, you’ll see this:

user@host:/tmp/my_script$ test/test_all.sh
 ✓ No configuration file is found
 ✓ Configuration file is found and is OK

2 tests, 0 failures

Extending BATS

There are some extra BATS libraries you can use – at the moment you're doing manual checks of output, and of success or failure, which aren't very pretty. Let's include the "assert" library for BATS.

Firstly, we need these libraries added as submodules, again.

# This module provides the formatting for the other non-core libraries
git submodule add https://github.com/bats-core/bats-support.git test/libs/bats-support
# This is the actual assertion tests library
git submodule add https://github.com/bats-core/bats-assert.git test/libs/bats-assert

And now we need to update our test. At the top of the file, under the #!/usr/bin/env line, add these:

load "libs/bats-support/load"
load "libs/bats-assert/load"

And then update your tests:

@test "No configuration file is found" {
  run find_config_file
  assert_output "No config found. Please create .myscript-config in your project's root directory."
  assert_failure
}

@test "Configuration file is found and is OK" {
  touch .myscript-config
  run find_config_file
  assert_output "$BATS_TEST_TMPDIR/.myscript-config"
  assert_success
}

Note that we removed the “echo” statements in this file. I’ve purposefully broken both types of tests (exit 1 became exit 0 and the file I’m looking for is $absolute_directory/.config instead of $absolute_directory/.myscript-config) in the source file, and now you can see what this looks like:

 ✗ No configuration file is found
   (from function `assert_failure' in file libs/bats-assert/src/assert_failure.bash, line 66,
    in test file path_checking.bats, line 15)
     `assert_failure' failed

   -- command succeeded, but it was expected to fail --
   output : No config found. Please create .myscript-config in your project's root directory.
   --

 ✗ Configuration file is found and is OK
   (from function `assert_output' in file libs/bats-assert/src/assert_output.bash, line 194,
    in test file path_checking.bats, line 21)
     `assert_output "$BATS_TEST_TMPDIR/.myscript-config"' failed

   -- output differs --
   expected : /tmp/bats-run-21332-1130Ph/suite-tmpdir-QMDmz6/file-tmpdir-path_checking.bats-nQf7jh/test-tmpdir--I3pJYk/.myscript-config
   actual   : No config found. Please create .myscript-config in your project's root directory.
   --

And so now you can see some of how to do unit testing with Bash and BATS. BATS also says you can unit test any command that can be run in a Bash environment, so have fun!

Featured image is "Bat Keychain" by "Nishant Khurana" on Flickr and is released under a CC-BY license.

"Picture in Picture" by "Mats" on Flickr

Hints and Tips when using Vagrant on Windows

I’ve been using HashiCorp’s Vagrant with Oracle’s VirtualBox for several years (probably since 2013, if my blog posts are anything to go by), and I’ve always been pretty comfortable with how it works.

This said, when using a Windows machine running Microsoft's Hyper-V (built into Windows since Windows 8, and Windows Server since 2008), VirtualBox is unable (by default) to run 64 bit virtual machines, because Hyper-V "steals" the VT-x/AMD-V extensions from the BIOS/EFI.

Around last year or maybe even the year before, Microsoft introduced a "Hypervisor Platform" add-on, which lets VirtualBox run 64 bit machines on a Hyper-V host (more on this later). HOWEVER, it is much slower than in native mode, and can often freeze on booting…

Meanwhile, Vagrant (a tool, driven by a configuration file, that acts as a wrapper around various hypervisors, using VirtualBox by default) boots machines in a "headless" mode by default, so you can't see the freezing.

I’m trying to use an Ubuntu 18.04 virtual machine for various builds I’m creating, and found that I’d get a few issues on boot, so let’s get these sorted out.

VirtualBox can’t start 64bit virtual machines when Hyper-V is installed.

You need to confirm that certain Windows features are enabled, including “Hyper-V” and “Windows Hypervisor Platform”. Confirm you’re running at least Windows 10 version 1803 which is the first release with the “Windows Hypervisor Platform”.

GUI mode

Run winver to bring up this box. Confirm the version number is greater than 1803. Mine is 1909.

A screenshot of the “winver” command, highlighting the version number, which in this case shows 1909, but needs to show at least 1803.

Right click on the start menu, and select “Apps and Features”. Click on “Programs and Features”.

The settings panel found by right clicking the “Start Menu” and selecting “Apps and Features”. Note the desired next step – “Programs and Features” is highlighted.

In the “Programs and Features” window, click on “Turn Windows Features on or off”. Note the shield icon here indicates that administrative access is required, and you may be required to authenticate to the machine to progress past this stage.

A fragment of the “Programs and Features” window, with the “Turn Windows features on or off” link highlighted.

Next, ensure that the following “Windows Features” are enabled; “Hyper-V”, “Virtual Machine Platform” and “Windows Hypervisor Platform”. Click on “OK” to install these features, if they’re not already installed.

A screen capture of the “Turn Windows features on or off” dialogue box, with certain features obscured and others highlighted.

Note that once you’ve pressed “OK”, you’ll likely need to reboot your machine, if any of these features were not already installed.

CLI mode

Right click on the start menu, and start an Administrative Powershell session.

Run the command Get-ComputerInfo | select WindowsVersion. You should get a response which looks like this:

WindowsVersion
--------------
1909

Note that the version number needs to be greater than 1803.

Next, find the names of the features you need to install. These features have region specific names, so outside EN-GB, these names may not match your requirements!

Run the command Get-WindowsOptionalFeature -online | select FeatureName,State and you’re looking for the following lines (this has been cropped to just what you need):

FeatureName                                     State
-----------                                     -----
HypervisorPlatform                            Enabled
VirtualMachinePlatform                        Enabled
Microsoft-Hyper-V-All                         Enabled

If any of these three features are not enabled, run Enable-WindowsOptionalFeature -online -FeatureName x, where "x" is the name of the feature you want to install, as listed in the text block above. For example: Enable-WindowsOptionalFeature -online -FeatureName HypervisorPlatform,VirtualMachinePlatform,Microsoft-Hyper-V-All. If you run this when they're already enabled, it should return RestartNeeded : False, but otherwise you're likely to need to reboot.

After the reboot

After you’ve rebooted, and you start a 64 bit virtual machine in VirtualBox, you’ll see this icon in the bottom corner.

A screen grab of the VirtualBox Status Bar, highlighting the “Slow Mode” icon representing the CPU

Booting the Virtual Machine with Vagrant fails because it takes too long to boot

This was historically a big issue with Vagrant and VirtualBox, particularly with Windows Vagrant boxes, but prior to the Hyper-V/VirtualBox solution, it’d been largely fixed (or at least, I wasn’t seeing it!) There is a “standard” timeout for booting a Virtual Machine, I think at approximately 5 minutes, but I might be wrong. To make this “issue” stop occurring, add this config.vm.boot_timeout = 0 line to your Vagrantfile, like this:

Vagrant.configure("2") do |config|
  config.vm.boot_timeout = 0
end

This says to Vagrant, don’t worry how long it takes to boot, just keep waiting until it does. Yes, it will be slower, but it should get there in the end!

Booting the Virtual Machine with Vagrant does not fail, but it never authenticates with your Private Key.

Your VM may sit at this block for quite a while:

==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address: 127.0.0.1:2222
    default: SSH username: vagrant
    default: SSH auth method: private key

If this occurs, you may find that your virtual machine has hung during the boot process… but weirdly, a simple work-around to this is to ensure that the VirtualBox GUI is open, and that you’ve got a block like this (config.vm.provider / vb.gui=true / end) in your Vagrantfile:

Vagrant.configure("2") do |config|
  config.vm.provider "virtualbox" do |vb|
    vb.gui = true
  end
end

This forces VirtualBox to open a window with your Virtual Machine’s console on it (much like having a monitor attached to real hardware). You don’t need to interact with it, but any random hangs or halts on your virtual machine may be solved just by bringing this window, or the VirtualBox Machines GUI, to the foreground.

Sometimes you may see, when this happens, a coredump or section of kernel debugging code on the console. Don’t worry about this!

Vagrant refuses to SSH to your virtual machine when using the vagrant ssh command.

Provisioning works like a treat, and you can SSH into the virtual machine from any other environment, but when you run vagrant ssh, you get an error about keys not being permitted or usable. This is fixable by adding a single line, either to your system- or user-wide environment variables, or to your Vagrantfile.

The environment variable is VAGRANT_PREFER_SYSTEM_BIN, and by setting this to 0, it will use bundled versions of ssh or rsync instead of using any versions provided by Windows.
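
For example, to set it user-wide from a Command Prompt (it will only apply to terminals you open afterwards), you could run something like:

setx VAGRANT_PREFER_SYSTEM_BIN 0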

You can add a line like ENV['VAGRANT_PREFER_SYSTEM_BIN']="0" to your Vagrantfile, outside of the Vagrant.configure … end block, like this:

ENV['VAGRANT_PREFER_SYSTEM_BIN']="0"
Vagrant.configure("2") do |config|
end

Sources

Featured image is "Picture in Picture" by "Mats" on Flickr and is released under a CC-BY-SA license.

"2015_12_06_VisΓ©_135942" by "Norbert Schnitzler" on Flickr

Idea for Reusable “Custom Data” templates across multiple modules with Terraform

A few posts ago I wrote about building Windows virtual machines with Terraform, and a couple of days ago, “YoureInHell” on Twitter reached out and asked what advice I’d give about having several different terraform modules use the same basic build of custom data.

They’re trying to avoid putting the same template file into several repos (I suspect so that one team can manage the “custom-data”, “user-data” or “cloud-init” files, and another can manage the deployment terraform files), and asked if I had any suggestions.

I had three ideas.

Using a New Module

This was my initial thought; create a new module called something like "Standard Build File", where the module contains just the following terraform file, and a template file called "build.tmpl".

variable "someKey" {
  default = "someVar"
}

variable "hostName" {
  default = "hostName"
}

variable "unsetVar" {}

output "template" {
  value = templatefile("build.tmpl",
    {
      someKey  = var.someKey
      hostName = var.hostName
      unsetVar = var.unsetVar
    }
  )
}
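
A minimal, hypothetical build.tmpl to go with that (a cloud-init style sketch using those three variables; adjust to taste) might look like this:

#cloud-config
hostname: ${hostName}
runcmd:
  - echo "someKey=${someKey} unsetVar=${unsetVar}" >> /tmp/build.log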

Now, in your calling module, you can do:

module "buildTemplate" {
  source   = "git::https://git.example.net/buildTemplate.git?ref=latestLive"
  # See https://www.terraform.io/docs/language/modules/sources.html
  #   for more details on how to specify the source of this module
  unsetVar = "Set To This String"
}

output "RenderedTemplate" {
  value = module.buildTemplate.template
}

And that means that you can use module.buildTemplate.template anywhere you'd normally specify your templatefile call, and get a consistent, yet customizable template (and note, because I specified a particular tag, you can move to the "current latest" or "the version we released into live on YYYY-MM-DD" by using a tag, or a commit ref.)

Now, the downside to this is that you’ve now got a whole separate module for creating your instances that needs to be maintained. What are our other options?

Git Submodules for your template

I use Git Submodules a LOT for my code. It’s a bit easy to get into a state with them, particularly if you’re not great at keeping on top of them, but… if you are OK with them, you’d create a repo, again, let’s use “https://git.example.net/buildTemplate.git” as our git repo, and put your template in there. In your terraform git repo, you’d run this command: git submodule add https://git.example.net/buildTemplate.git and this would add a directory to your repo called “buildTemplate” that you can use your templatefile function in Terraform against (like this: templatefile("buildTemplate/build.tmpl", {someVar="var"})).

Now, this means that you've effectively got two git repos in one tree, and if any changes occur in your submodule repo, you'd need to run git checkout main ; git pull inside the submodule's directory to get the latest updates from its main branch, and when you check the parent repo out initially on another machine, you'll need to do git clone https://git.example.net/terraform --recurse-submodules to get the submodules populated at the same time.

A benefit to this is that because it's "inline" with the rest of your tree, if you need to make any changes to this template, it's clearly where it's supposed to be in your tree; you just need to remember about the submodule when it comes to making PRs and so forth.

How about that third idea?

Keep it simple, stupid 😁

Why bother with submodules, or modules from a git repo? Terraform can be quite easy to overcomplicate… so why not create all your terraform files in something like this structure:

project\build.tmpl
project\web_servers\main.tf
project\logic_servers\main.tf
project\database_servers\main.tf

And then in each of your terraform files (web_servers, logic_servers and database_servers) just reference the file in your project root, like this: templatefile("../build.tmpl", {someVar="var"})

The downside to this is that you can't as easily farm off the control of that build script to another team, and they'd be making (change|pull|merge) requests against the same repo as you… but then again, isn't that the idea for functional teams? 😃

Featured image is "2015_12_06_Visé_135942" by "Norbert Schnitzler" on Flickr and is released under a CC-BY-SA license.

"DeBugged!" by "Randy Heinitz" on Flickr

Debugging Bash Scripts

Yesterday I was struggling a bit with a bash script I was writing. I needed to stop it from running flat out through every loop, and I wanted to see what certain values were at key points in the script.

Yes, I know I could use “read” to pause the script and “echo” to print values, but that leaves a lot of mess that I need to clean up afterwards… so I went looking for something else I could try.

You can have extensive debug statements, which are enabled with a --debug flag or environment variable… but again, messy.

You can run bash -x ./myscript.sh – and, indeed, I do frequently do that… but that shows you the commands which were run at each point, not what the outcome is of each of those commands.
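
A related trick, if you only care about one section of the script, is to toggle that same tracing from inside the script itself:

# Only trace the bit we care about
set -x   # turn tracing on
do_the_interesting_thing   # stand-in for whatever you're investigating
set +x   # turn tracing off again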

If my problem had been a syntax one, I could have installed shellcheck, which is basically a linter for Bash and other shell scripting languages, but no, I needed more detail about what was happening during the processing.

Instead, I wanted something like xdebug (from PHP)… and I found Bash Debug for VSCode. This doesn't even need you to install any scripts or services on the target machine – it's interactive, and has a "watch" section, where you highlight and right-click a variable expression (like $somevar or ${somevar}) to see when it changes. You can see where in the "callstack" you are and see what values are registered by that script.

Shellcheck shows me problems in my code…
But Bash Debug helps me to find out what values are at specific points in the code.

All in all, a worthy addition to my toolbelt!

Featured image is "DeBugged!" by "Randy Heinitz" on Flickr and is released under a CC-BY license.

'Geocache "Goodies"' by 'sk' on Flickr

Caching online data sources in Ansible for later development or testing

My current Ansible project relies on me collecting a lot of data from AWS and then checking it again later, to see if something has changed.

This is great for one-off tests (e.g. terraform destroy ; terraform apply ; ansible-playbook run.yml) but isn't great for repetitive tests, especially if collecting that data means running actions that take many minutes, or if you have slow or unreliable internet in your development environment.

To get around this, I wrote a wrapper for caching this data.

At the top of my playbook, run.yml, I have these tasks:

- name: Set Online Status.
  # This stores the value of run_online, unless run_online
  # is not set, in which case, it defines it as "true".
  ansible.builtin.set_fact:
    run_online: |-
      {{- run_online | default(true) | bool -}}

- name: Create cache_data path.
  # This creates a "cached_data" directory in the same
  # path as the playbook.
  when: run_online | bool and cache_data | default(false) | bool
  delegate_to: localhost
  run_once: true
  file:
    path: "cached_data"
    state: directory
    mode: 0755

- name: Create cache_data for host.
  # This creates a directory under "cached_data" in the same
  # path as the playbook, with the name of each of the inventory
  # items.
  when: run_online | bool and cache_data | default(false) | bool
  delegate_to: localhost
  file:
    path: "cached_data/{{ inventory_hostname }}"
    state: directory
    mode: 0755

Running this sets up an expectation for the normal operation of the playbook, that it will be “online”, by default.

Then, every time I need to call something “online”, for example, collect EC2 Instance Data (using the community.aws.ec2_instance_info module), I call out to (something like) this set of tasks, instead of just calling the task by itself.

- name: List all EC2 instances in the regions of interest.
  when: run_online | bool
  community.aws.ec2_instance_info:
    region: "{{ item.region_name }}"
  loop: "{{ regions }}"
  loop_control:
    label: "{{ item.region_name }}"
  register: regional_ec2

- name: "NOTE: Set regional_ec2 data path"
  when: not run_online | bool or cache_data | default(false) | bool
  set_fact:
    regional_ec2_cached_data_file_loop: "{{ regional_ec2_cached_data_file_loop | default(0) | int + 1 }}"
    cached_data_filename: "cached_data/{{ inventory_hostname }}/{{ cached_data_file | default('regional_ec2') }}.{{ regional_ec2_cached_data_file_loop | default(0) | int + 1 }}.json"

- name: "NOTE: Cache/Get regional_ec2 data path"
  when: not run_online | bool or cache_data | default(false) | bool
  debug:
    msg: "File: {{ cached_data_filename }}"

- name: Cache all EC2 instances in the regions of interest.
  when: run_online | bool and cache_data | default(false) | bool
  delegate_to: localhost
  copy:
    dest: "{{ cached_data_filename }}"
    mode: "0644"
    content: "{{ regional_ec2 }}"

- name: "OFFLINE: Load all EC2 instances in the regions of interest."
  when: not run_online | bool
  set_fact:
    regional_ec2: "{% include( cached_data_filename ) %}"

The first task, if we're still set to being "online", will execute the module and register the result for later. If cache_data is configured, we generate a filename for the caching, record the filename to the log (via the debug task) and then store the result (using the copy task). So far, so online… but what happens when we can't, or don't want to, go online?

In that case, we use the set_fact module, triggered by running the playbook like this: ansible-playbook run.yml -e run_online=false. This reads the cached data out of that locally stored pool of data for later use.
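
So a typical workflow looks something like this (cache_data is the same flag the tasks above check for):

# First run: talk to AWS as normal, and also write the results into cached_data/
ansible-playbook run.yml -e cache_data=true
# Later runs: skip AWS entirely and read the cached results back in
ansible-playbook run.yml -e run_online=false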

Featured image is 'Geocache "Goodies"' by 'sk' on Flickr and is released under a CC-BY-ND license.

"Main console" by "Steve Parker" on Flickr

Running services (like SSH, nginx, etc) on Windows Subsystem for Linux (WSL1) on boot

I recently got a new laptop, and for various reasons, I’m going to be primarily running Windows on that laptop. However, I still like having a working SSH server, running in the context of my Windows Subsystem for Linux (WSL) environment.

Initially, trying to run service ssh start failed with an error, because you need to re-execute the ssh configuration steps which are missed in a WSL environment. To fix that, run sudo apt install --reinstall openssh-server.

Once you know your service runs OK, you start digging around to find out how to start it on boot, and you’ll see lots of people saying things like “Just run a shell script that starts your first service, and then another shell script for the next service.”

Well, the frustration for me is that Linux already has this capability – the currently popular version is called systemd, but a slightly older variant is still knocking around in modern Linux distributions, and it's called SystemV Init, often referred to as just "sysv" or "init.d".

The way that those services work is that you have an “init” file in /etc/init.d and then those files have a symbolic link into a “runlevel” directory, for example /etc/rc3.d. Each symbolic link is named S##service or K##service, where the ## represents the order in which it’s to be launched. The SSH Daemon, for example, that I want to run is created in there as /etc/rc3.d/S01ssh.
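
If a service you want doesn't already have a link in the runlevel directory, you can create one yourself, either by hand with ln, or (on Debian and Ubuntu) by letting update-rc.d place it for you; for example:

# Create a runlevel link for the nginx init script by hand (the number is arbitrary)...
sudo ln -s /etc/init.d/nginx /etc/rc3.d/S02nginx
# ...or let the distribution tooling work out where the links should go
sudo update-rc.d nginx defaults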

So, how do I make this work in the grander scheme of WSL? I can't use systemd, where I could just say systemctl enable --now ssh; instead I need to add a (yes, I know) shell script, which looks in my desired runlevel directory. Runlevel 3 is the level at which network services have started, hence using that one. If I was trying to set up a graphical desktop, I'd instead be looking to use Runlevel 5, but the X Window System isn't ported to Windows like that yet… Anyway.

Because the rc#.d directory already has this structure for ordering and naming services to load, I can just step over this directory looking for files which match or do not match the naming convention, and I do that with this script:

#! /bin/bash
function run_rc() {
  base="$(basename "$1")"
  if [[ ${base:0:1} == "S" ]]
  then
    "$1" start
  else
    "$1" stop
  fi
}

if [ "$1" != "" ] && [ -e "$1" ]
then
  run_rc "$1"
else
  rc=3
  if [ "$1" != "" ] && [ -e "/etc/rc${$1}.d/" ]
  then
    rc="$1"
  fi
  for digit1 in {0..9}
  do
    for digit2 in {0..9}
    do
      find "/etc/rc${rc}.d/" -name "[SK]${digit1}${digit2}*" -exec "$0" '{}' \; 2>/dev/null
    done
  done
fi

I’ve put this script in /opt/wsl_init.sh

This does a bit of trickery, but basically runs the bottom block first. It loops over the digits 0 to 9 twice (giving you 00, 01, 02 and so on up to 99) and looks in /etc/rc3.d for any file whose name starts with S or K followed by the two digits you've looped to by that point. Finally, it runs itself again, passing the name of the file it just found, and this is where the top block comes in.

In the top block we look at the “basename” – the part of the path supplied, without any prefixed directories attached, and then extract just the first character (that’s the ${base:0:1} part) to see whether it’s an “S” or anything else. If it’s an S (which everything there is likely to be), it executes the task like this: /etc/rc3.d/S01ssh start and this works because it’s how that script is designed! You can run one of the following instances of this command: service ssh start, /etc/init.d/ssh start or /etc/rc3.d/S01ssh start. There are other options, notably “stop” or “status”, but these aren’t really useful here.

Now, how do we make Windows execute this on boot? I’m using NSSM, the “Non-sucking service manager” to add a line to the Windows System services. I placed the NSSM executable in C:\Program Files\nssm\nssm.exe, and then from a command line, ran C:\Program Files\nssm\nssm.exe install WSL_Init.

I configured it with the Application Path: C:\Windows\System32\wsl.exe and the Arguments: -d ubuntu -e sudo /opt/wsl_init.sh. Note that this only works because I've also got sudo set up to execute this command without prompting for a password.
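
That sudo rule is a one-line entry in the sudoers configuration; a sketch of it (the username is a placeholder) looks like this:

# Added with "sudo visudo -f /etc/sudoers.d/wsl_init"
youruser ALL=(root) NOPASSWD: /opt/wsl_init.sh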

Here I invoke C:\Windows\System32\wsl.exe -d ubuntu -e sudo /opt/wsl_init.sh
I define the name of the service, as Services will see it, and also the description of the service.
I put in MY username and My Windows Password here, otherwise I’m not running WSL in my user context, but another one.

And then I rebooted. SSH was running as I needed it.

Featured image is "Main console" by "Steve Parker" on Flickr and is released under a CC-BY license.

"Stockholms Stadsbibliotek" by "dilettantiquity" on Flickr

Terraform templates with Maps

For a project I’m working on, I needed to define a list of ports, and set some properties on some of them. In the Ansible world, you’d use statements like:

{% if data.somekey is defined %}something {{ data.somekey }}{% endif %}

or

{{ data.somekey | default('') }}

In a pinch, you can also do this:

{{ (data | default({}) ).somekey | default('') }}

With Terraform, I was finding it much harder to work out how to find whether a value exists as part of a map (the Terraform term for a Dictionary in Ansible terms, or an Associative Array in PHP terms), until I stumbled over the lookup function. Here's how that looks in just a simple Terraform file:

output "test" {
    value = templatefile(
        "template.tmpl",
        {
            ports = {
                "eth0": {"ip": "192.168.1.1/24", "name": "public"}, 
                "eth1": {"ip": "172.16.1.1/24", "name": "protected"}, 
                "eth2": {"ip": "10.1.1.1/24", "name": "management", "management": true}, 
                "eth3": {}
            },
            management = "1"
        }
    )
}

And the template that goes with that?

%{ for port, data in ports ~}
Interface: ${port}%{ if lookup(data, "name", "") != ""}
Alias: ${ lookup(data, "name", "") }%{ endif }
Services: ping%{ if lookup(data, "management", false) == true } ssh https%{ endif }
IP: ${ lookup(data, "ip", "Not Defined") }

%{ endfor }

This results in the following output:

C:\tf>terraform.exe apply -auto-approve

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

test = Interface: eth0
Alias: public
Services: ping
IP: 192.168.1.1/24

Interface: eth1
Alias: protected
Services: ping
IP: 172.16.1.1/24

Interface: eth2
Alias: management
Services: ping ssh https
IP: 10.1.1.1/24

Interface: eth3
Services: ping
IP: Not Defined

Naturally, using this in your own user-data or Custom Data field will probably make more sense than just writing it to “output” 😁

Featured image is "Stockholms Stadsbibliotek" by "dilettantiquity" on Flickr and is released under a CC-BY-SA license.

"Sydney Observatory I" by "Newtown grafitti" on Flickr

Using Feature Flags in Terraform with Count Statements

In a project I'm working on in Terraform, I've got several feature flags in a module. These flags relate to whether this module should turn on a system in a cloud provider or not, and look like this:

variable "turn_on_feature_x" {
  description = "Setting this to 'yes' will enable Feature X. Any other value will disable it. (Default 'yes')"
  default = "yes"
}

variable "turn_on_feature_y" {
  description = "Setting this to 'yes' will enable Feature Y. Any other value will disable it. (Default 'no')"
  default = "no"
}

When I call the module, I then can either leave the feature with the default values, or selectively enable or disable them, like this:

module "region1" {
  source = "./my_module"
}

module "region2" {
  source = "./my_module"
  turn_on_feature_x = "no"
  turn_on_feature_y = "yes"
}

module "region3" {
  source = "./my_module"
  turn_on_feature_y = "yes"
}

module "region4" {
  source = "./my_module"
  turn_on_feature_x = "no"
}

# Result:
# region1 has X=yes, Y=no
# region2 has X=no, Y=yes
# region3 has X=yes, Y=yes
# region4 has X=no, Y=no

When I then want to use the feature, I have to remember a couple of key parts.

  1. Normally this feature check is done with a “count” statement, and the easiest way to use this is to use the ternary operator to check values and return a “1” or a “0” for if you want the value used.

    Ternary operators look like this: var.turn_on_feature_x == "yes" ? 1 : 0 which basically means, if the value of the variable turn_on_feature_x is set to “yes”, then return 1 otherwise return 0.

    This can get a bit complex, particularly if you want to check several flags a few times, like this: var.turn_on_feature_x == "yes" ? var.turn_on_feature_y == "yes" ? 1 : 0 : 0. I’ve found that wrapping them in brackets helps to understand what you’re getting, like this:

    (
      var.turn_on_feature_x == "yes" ?
      (
        var.turn_on_feature_y == "yes" ?
        1 :
        0
      ) :
      0
    )
  2. If you end up using a count statement, the resulting value must be treated as a 0-indexed array, like this: some_provider_service.my_name[0].result

    This is because, using the count value says “I want X number of resources”, so Terraform has to treat it as an array, in case you actually wanted 10 instead of 1 or 0.

Here’s an example of that in use:

resource "aws_guardduty_detector" "Region" {
  count = var.enable_guardduty == "yes" ? 1 : 0
  enable = true
}

resource "aws_cloudwatch_event_rule" "guardduty_finding" {
  count = (var.enable_guardduty == "yes" ? (var.send_guardduty_findings_to_sns == "yes" ? 1 : (var.send_guardduty_findings_to_sqs == "yes" ? 1 : 0)) : 0)
  name = "${data.aws_caller_identity.current.account_id}-${data.aws_region.current.name}-${var.sns_guardduty_finding_suffix}"
  event_pattern = <<PATTERN
{
  "source": [
    "aws.guardduty"
  ],
  "detail-type": [
    "GuardDuty Finding"
  ]
}
PATTERN
}

resource "aws_cloudwatch_event_target" "sns_guardduty_finding" {
  count = (var.enable_guardduty == "yes" ? (var.send_guardduty_findings_to_sns == "yes" ? 1 : 0) : 0)
  rule = aws_cloudwatch_event_rule.guardduty_finding[0].name
  target_id = aws_sns_topic.guardduty_finding[0].name
  arn = aws_sns_topic.guardduty_finding[0].arn
}

resource "aws_cloudwatch_event_target" "sqs_guardduty_finding" {
  count = (var.enable_guardduty == "yes" ? (var.send_guardduty_findings_to_sqs == "yes" ? 1 : 0) : 0)
  rule = aws_cloudwatch_event_rule.guardduty_finding[0].name
  target_id = "SQS"
  arn = aws_sqs_queue.guardduty_finding[0].arn
}

One thing that bit me rather painfully around this was that if you change from an uncounted resource, like this:

resource "some_tool" "this" {
  some_setting = 1
}

To a counted resource, like this:

resource "some_tool" "this" {
  count = var.some_tool == "yes" ? 1 : 0
  some_setting = 1
}

Then, Terraform will promptly destroy some_tool.this to replace it with some_tool.this[0], because they’re not the same referenced thing!
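
If you hit this, one way to avoid the destroy-and-recreate (assuming a reasonably recent Terraform) is to move the existing resource to its new, indexed address in the state before you apply:

terraform state mv 'some_tool.this' 'some_tool.this[0]'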

Fun, huh? 😊

Featured image is "Sydney Observatory I" by "Newtown grafitti" on Flickr and is released under a CC-BY license.

"Kelvin Test" by "Eelke" on Flickr

In Ansible, determine the type of a value, and casting those values to other types

TL;DR? It’s possible to work out what type of variable you’re working with in Ansible. The in-built filters don’t always do quite what you’re expecting. Jump to the “In Summary” heading for my suggestions.

LATE EDIT: 2021-05-23 After raising a question in #ansible on Freenode, flowerysong noticed that my truth table around mappings, iterables and strings was wrong. I’ve amended the table accordingly, and have added a further note below the table.

One of the things I end up doing quite a bit with Ansible is value manipulation. I know it’s not really normal, but… well, I like rewriting values from one type of a thing to the next type of a thing.

For example, I like taking a value that I don’t know if it’s a list or a string, and passing that to an argument that expects a list.

Doing it wrong, getting it better

Until recently, I’d do that like this:

- debug:
    msg: |-
      {
        {%- if value | type_debug == "string" or value | type_debug == "AnsibleUnicode" -%}
           "string": "{{ value }}"
        {%- elif value | type_debug == "dict" or value | type_debug == "ansible_mapping" -%}
          "dict": {{ value }}
        {%- elif value | type_debug == "list" -%}
          "list": {{ value }}
        {%- else -%}
          "other": "{{ value }}"
        {%- endif -%}
      }

But, following finding this gist, I now know I can do this:

- debug:
    msg: |-
      {
        {%- if value is string -%}
           "string": "{{ value }}"
        {%- elif value is mapping -%}
          "dict": {{ value }}
        {%- elif value is iterable -%}
          "list": {{ value }}
        {%- else -%}
          "other": "{{ value }}"
        {%- endif -%}
      }

So, how would I use this, given the context of what I was saying before?

- assert:
    that:
    - value is string
    - value is not mapping
    - value is iterable
- some_module:
    some_arg: |-
      {%- if value is string -%}
        ["{{ value }}"]
      {%- else -%}
        {{ value }}
      {%- endif -%}

More details on finding a type

Why in this order? Well, because of how values are stored in Ansible, the following states are true:

| ⬇️ Type \ ➡️ Check | is iterable | is mapping | is sequence | is string |
|---|---|---|---|---|
| a_dict (e.g. {}) | ✔️ | ✔️ | ✔️ | ❌ |
| a_list (e.g. []) | ✔️ | ❌ | ✔️ | ❌ |
| a_string (e.g. "") | ✔️ | ❌ | ✔️ | ✔️ |
A comparison between value types

So, if you were to check for is iterable first, you might match on a_list or a_dict instead of a_string, but string can only match on a_string. Once you know it can’t be a string, you can check whether something is mapping – again, because a mapping can only match a_dict, but it can’t match a_list or a_string. Once you know it’s not that, you can check for either is iterable or is sequence because both of these match a_string, a_dict and a_list.

LATE EDIT: 2021-05-23 Note that a prior revision of this table and its following paragraph showed "is_mapping" as true for a_string. This is not correct, and has been fixed, both in the table and the paragraph.

Likewise, if you wanted to check whether a_float and an_integer is number and not is string, you can check these:

| ⬇️ Type \ ➡️ Check | is float | is integer | is iterable | is mapping | is number | is sequence | is string |
|---|---|---|---|---|---|---|---|
| a_float | ✔️ | ❌ | ❌ | ❌ | ✔️ | ❌ | ❌ |
| an_integer | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ |
A comparison between types of numbers

So again, a_float and an_integer don’t match is string, is mapping or is iterable, but they both match is number and they each match their respective is float and is integer checks.

How about each of those (a_float and an_integer) wrapped in quotes, making them a string? What happens then?

| ⬇️ Type \ ➡️ Check | is float | is integer | is iterable | is mapping | is number | is sequence | is string |
|---|---|---|---|---|---|---|---|
| a_float_as_string | ❌ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ✔️ |
| an_integer_as_string | ❌ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ✔️ |
A comparison between types of numbers when held as a string

This is somewhat interesting, because they look like a number, but they’re actually “just” a string. So, now you need to do some comparisons to make them look like numbers again to check if they’re numbers.

Changing the type of a string

What happens if you cast the values? Casting means to convert from one type of value (e.g. string) into another (e.g. float) and to do that, Ansible has three filters we can use, float, int and string. You can’t cast to a dict or a list, but you can use dict2items and items2dict (more on those later). So let’s start with casting our group of a_ and an_ items from above. Here’s a list of values I want to use:

---
- hosts: localhost
  gather_facts: no
  vars:
    an_int: 1
    a_float: 1.1
    a_string: "string"
    an_int_as_string: "1"
    a_float_as_string: "1.1"
    a_list:
      - item1
    a_dict:
      key1: value1

For each of these values, I show the value as Ansible knows it, then what happens when you cast it as a float (with {{ value | float }}), as an integer (with {{ value | int }}), and as a string (with {{ value | string }}). Some of these results are interesting. Note that where you see u'some value', it means that Python converted that string to a Unicode string.

| ⬇️ Value \ ➡️ Cast | value | value when cast as float | value when cast as integer | value when cast as string |
|---|---|---|---|---|
| a_dict | {"key1": "value1"} | 0.0 | 0 | "{u'key1': u'value1'}" |
| a_float | 1.1 | 1.1 | 1 | "1.1" |
| a_float_as_string | "1.1" | 1.1 | 1 | "1.1" |
| a_list | ["item1"] | 0.0 | 0 | "[u'item1']" |
| a_string | "string" | 0.0 | 0 | "string" |
| an_int | 1 | 1 | 1 | "1" |
| an_int_as_string | "1" | 1 | 1 | "1" |
Casting between value types

So, what does this mean for us? Well, not a great deal, aside from to note that you can “force” a number to be a string, or a string which is “just” a number wrapped in quotes can be forced into being a number again.

Oh, and casting dicts to lists and back again? This one is actually pretty clearly documented in the current set of documentation (as at 2.9 at least!)
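
For completeness, a quick example of those two filters, using the a_dict value from above:

{{ a_dict | dict2items }}
# => [{"key": "key1", "value": "value1"}]
{{ a_dict | dict2items | items2dict }}
# => {"key1": "value1"}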

Checking for miscast values

How about if I want to know whether a value I think might be a float stored as a string, how can I check that?

{{ vars[var] | float | string == vars[var] | string }}

What is this? If I cast a value that I think might be a float, to a float, and then turn both the cast value and the original into a string, do they match? If I’ve got a string or an integer, then I’ll get a false, but if I have actually got a float, then I’ll get true. Likewise for casting an integer. Let’s see what that table looks like:

| ⬇️ Type \ ➡️ Check | is same when cast to float | is same when cast to integer | is same when cast to string |
|---|---|---|---|
| a_float | ✔️ | ❌ | ✔️ |
| a_float_as_string | ✔️ | ❌ | ✔️ |
| an_integer | ❌ | ✔️ | ✔️ |
| an_integer_as_string | ❌ | ✔️ | ✔️ |
A comparison between types of numbers when cast to a string

So this shows us the values we were after – even if you’ve got a float (or an integer) stored as a string, by doing some careful casting, you can confirm they’re of the type you wanted… and then you can pass them through the right filter to use them in your playbooks!

Booleans

Last thing to check – boolean values – "True" or "False". There's a bit of confusion here, as a "boolean" can be: true or false, yes or no, 1 or 0. However, are true, True and TRUE the same? How about false, False and FALSE? Let's take a look!

| ⬇️ Value \ ➡️ Check | type_debug | is boolean | is number | is iterable | is mapping | is string | value when cast as bool | value when cast as string | value when cast as integer |
|---|---|---|---|---|---|---|---|---|---|
| yes | bool | ✔️ | ✔️ | ❌ | ❌ | ❌ | True | True | 1 |
| Yes | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | False | Yes | 0 |
| YES | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | False | YES | 0 |
| "yes" | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | True | yes | 0 |
| "Yes" | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | True | Yes | 0 |
| "YES" | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | True | YES | 0 |
| true | bool | ✔️ | ✔️ | ❌ | ❌ | ❌ | True | True | 1 |
| True | bool | ✔️ | ✔️ | ❌ | ❌ | ❌ | True | True | 1 |
| TRUE | bool | ✔️ | ✔️ | ❌ | ❌ | ❌ | True | True | 1 |
| "true" | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | True | true | 0 |
| "True" | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | True | True | 0 |
| "TRUE" | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | True | TRUE | 0 |
| 1 | int | ❌ | ✔️ | ❌ | ❌ | ❌ | True | 1 | 1 |
| "1" | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | True | 1 | 1 |
| no | bool | ✔️ | ✔️ | ❌ | ❌ | ❌ | False | False | 0 |
| No | bool | ✔️ | ✔️ | ❌ | ❌ | ❌ | False | False | 0 |
| NO | bool | ✔️ | ✔️ | ❌ | ❌ | ❌ | False | False | 0 |
| "no" | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | False | no | 0 |
| "No" | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | False | No | 0 |
| "NO" | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | False | NO | 0 |
| false | bool | ✔️ | ✔️ | ❌ | ❌ | ❌ | False | False | 0 |
| False | bool | ✔️ | ✔️ | ❌ | ❌ | ❌ | False | False | 0 |
| FALSE | bool | ✔️ | ✔️ | ❌ | ❌ | ❌ | False | False | 0 |
| "false" | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | False | false | 0 |
| "False" | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | False | False | 0 |
| "FALSE" | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | False | FALSE | 0 |
| 0 | int | ❌ | ✔️ | ❌ | ❌ | ❌ | False | 0 | 0 |
| "0" | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | False | 0 | 0 |
Comparisons between various stylings of boolean representations

So, the stand-out thing for me here is this: all the string permutations of the boolean representations (those wrapped in quotes, like this: "yes") are treated as strings, and shouldn't be considered as "boolean" (unless you cast for it explicitly!); all the non-string versions of true, false, and no are considered to be boolean; but yes, Yes and YES are treated differently, depending on case. So, what would I do?

In summary

  • Consistently use no or yes, true or false in lower case to indicate a boolean value. Don’t use 1 or 0 unless you have to.
  • If you’re checking that you’re working with a string, a list or a dict, check in the order string (using is string), dict (using is mapping) and then list (using is sequence or is iterable)
  • Checking for numbers that are stored as strings? Cast your string through the type check for that number, like this: {% if value | float | string == value | string %}{{ value | float }}{% elif value | int | string == value | string %}{{ value | int }}{% else %}{{ value }}{% endif %}
  • Try not to use type_debug unless you really can’t find any other way. These values will change between versions, and this caused me a lot of issues with a large codebase I was working on a while ago!

Run these tests yourself!

Want to run these tests yourself? Here’s the code I ran (also available in a Gist on GitHub), using Ansible 2.9.10.

---
- hosts: localhost
  gather_facts: no
  vars:
    an_int: 1
    a_float: 1.1
    a_string: "string"
    an_int_as_string: "1"
    a_float_as_string: "1.1"
    a_list:
      - item1
    a_dict:
      key1: value1
  tasks:
    - debug:
        msg: |
          {
          {% for var in ["an_int", "an_int_as_string","a_float", "a_float_as_string","a_string","a_list","a_dict"] %}
            "{{ var }}": {
              "type_debug": "{{ vars[var] | type_debug }}",
              "value": "{{ vars[var] }}",
              "is float": "{{ vars[var] is float }}",
              "is integer": "{{ vars[var] is integer }}",
              "is iterable": "{{ vars[var] is iterable }}",
              "is mapping": "{{ vars[var] is mapping }}",
              "is number": "{{ vars[var] is number }}",
              "is sequence": "{{ vars[var] is sequence }}",
              "is string": "{{ vars[var] is string }}",
              "value cast as float": "{{ vars[var] | float }}",
              "value cast as integer": "{{ vars[var] | int }}",
              "value cast as string": "{{ vars[var] | string }}",
              "is same when cast to float": "{{ vars[var] | float | string == vars[var] | string }}",
              "is same when cast to integer": "{{ vars[var] | int | string == vars[var] | string }}",
              "is same when cast to string": "{{ vars[var] | string == vars[var] | string }}",
            },
          {% endfor %}
          }
---
- hosts: localhost
  gather_facts: false
  vars:
    # true, True, TRUE, "true", "True", "TRUE"
    a_true: true
    a_true_initial_caps: True
    a_true_caps: TRUE
    a_string_true: "true"
    a_string_true_initial_caps: "True"
    a_string_true_caps: "TRUE"
    # yes, Yes, YES, "yes", "Yes", "YES"
    a_yes: yes
    a_yes_initial_caps: Yes
    a_yes_caps: YES
    a_string_yes: "yes"
    a_string_yes_initial_caps: "Yes"
    a_string_yes_caps: "YES"
    # 1, "1"
    a_1: 1
    a_string_1: "1"
    # false, False, FALSE, "false", "False", "FALSE"
    a_false: false
    a_false_initial_caps: False
    a_false_caps: FALSE
    a_string_false: "false"
    a_string_false_initial_caps: "False"
    a_string_false_caps: "FALSE"
    # no, No, NO, "no", "No", "NO"
    a_no: no
    a_no_initial_caps: No
    a_no_caps: NO
    a_string_no: "no"
    a_string_no_initial_caps: "No"
    a_string_no_caps: "NO"
    # 0, "0"
    a_0: 0
    a_string_0: "0"
  tasks:
    - debug:
        msg: |
          {
          {% for var in ["a_true","a_true_initial_caps","a_true_caps","a_string_true","a_string_true_initial_caps","a_string_true_caps","a_yes","a_yes_initial_caps","a_yes_caps","a_string_yes","a_string_yes_initial_caps","a_string_yes_caps","a_1","a_string_1","a_false","a_false_initial_caps","a_false_caps","a_string_false","a_string_false_initial_caps","a_string_false_caps","a_no","a_no_initial_caps","a_no_caps","a_string_no","a_string_no_initial_caps","a_string_no_caps","a_0","a_string_0"] %}
            "{{ var }}": {
              "type_debug": "{{ vars[var] | type_debug }}",
              "value": "{{ vars[var] }}",
              "is float": "{{ vars[var] is float }}",
              "is integer": "{{ vars[var] is integer }}",
              "is iterable": "{{ vars[var] is iterable }}",
              "is mapping": "{{ vars[var] is mapping }}",
              "is number": "{{ vars[var] is number }}",
              "is sequence": "{{ vars[var] is sequence }}",
              "is string": "{{ vars[var] is string }}",
              "is bool": "{{ vars[var] is boolean }}",
              "value cast as float": "{{ vars[var] | float }}",
              "value cast as integer": "{{ vars[var] | int }}",
              "value cast as string": "{{ vars[var] | string }}",
              "value cast as bool": "{{ vars[var] | bool }}",
              "is same when cast to float": "{{ vars[var] | float | string == vars[var] | string }}",
              "is same when cast to integer": "{{ vars[var] | int | string == vars[var] | string }}",
              "is same when cast to string": "{{ vars[var] | string == vars[var] | string }}",
              "is same when cast to bool": "{{ vars[var] | bool | string == vars[var] | string }}",
            },
          {% endfor %}
          }

Featured image is "Kelvin Test" by "Eelke" on Flickr and is released under a CC-BY license.