"The Guitar Template" by "Neil Williamson" on Flickr

Testing (and failing inline) for data types in Ansible

I tend to write long and overly complicated set_fact statements in Ansible, ALL THE DAMN TIME. I write stuff like this:

rulebase: |
  {
    {% for var in vars | dict2items %}
      {% if var.key | regex_search(regex_rulebase_match) | type_debug != "NoneType"
        and (
          var.value | type_debug == "dict" 
          or var.value | type_debug == "AnsibleMapping"
        ) %}
        {% for item in var.value | dict2items %}
          {% if item.key | regex_search(regex_rulebase_match) | type_debug != "NoneType"
            and (
              item.value | type_debug == "dict" 
              or item.value | type_debug == "AnsibleMapping"
            ) %}
            "{{ var.key | regex_replace(regex_rulebase_match, '\2') }}{{ item.key | regex_replace(regex_rulebase_match, '\2') }}": {
              {# This block is used for rulegroup level options #}
              {% for key in ['log_from_start', 'log', 'status', 'nat', 'natpool', 'schedule', 'ips_enable', 'ssl_ssh_profile', 'ips_sensor'] %}
                {% if var.value[key] is defined and rule.value[key] is not defined %}
                  {% if var.value[key] | type_debug in ['string', 'AnsibleUnicode'] %}
                    "{{ key }}": "{{ var.value[key] }}",
                  {% else %}
                    "{{ key }}": {{ var.value[key] }},
                  {% endif %}
                {% endif %}
              {% endfor %}
              {% for rule in item.value | dict2items %}
                {% if rule.key in ['sources', 'destinations', 'services', 'src_internet_service', 'dst_internet_service'] and rule.value | type_debug not in ['list', 'AnsibleSequence'] %}
                  "{{ rule.key }}": ["{{ rule.value }}"],
                {% elif rule.value | type_debug in ['string', 'AnsibleUnicode'] %}
                  "{{ rule.key }}": "{{ rule.value }}",
                {% else %}
                  "{{ rule.key }}": {{ rule.value }},
                {% endif %}
              {% endfor %}
            },
          {% endif %}
        {% endfor %}
      {% endif %}
    {% endfor %}
  }

Now, if you’re writing set_fact or vars like this a lot, what you tend to end up with is the dreaded dict2items requires a dictionary, got instead. which basically means “Hah! You wrote a giant blob of what you thought was JSON, but didn’t render right, so we cast it to a string for you!”

The way I usually write my playbooks, I’ll do something with this set_fact at line, let’s say, 10, and then use it at line, let’s say, 500… So, I don’t know what the bloomin’ thing looks like then!

So, how to get around that? Well, you could do a type check. In fact, I wrote a bloomin’ big blog post explaining just how to do that!

However, that gets unwieldy really quickly, and what I actually wanted to do was to throw the breaks on as soon as I’d created an invalid data type. So, to do that, I created a collection of functions which helped me with my current project, and they look a bit like this one, called “is_a_string.yml“:

- name: Type Check - is_a_string
  assert:
    quiet: yes
    that:
    - vars[this_key] is not boolean
    - vars[this_key] is not number
    - vars[this_key] | int | string != vars[this_key] | string
    - vars[this_key] | float | string != vars[this_key] | string
    - vars[this_key] is string
    - vars[this_key] is not mapping
    - vars[this_key] is iterable
    success_msg: "{{ this_key }} is a string"
    fail_msg: |-
      {{ this_key }} should be a string, and is instead
      {%- if vars[this_key] is not defined %} undefined
      {%- else %} {{ vars[this_key] is boolean | ternary(
        'a boolean',
        (vars[this_key] | int | string == vars[this_key] | string) | ternary(
          'an integer',
          (vars[this_key] | float | string == vars[this_key] | string) | ternary(
            'a float',
            vars[this_key] is string | ternary(
              'a string',
              vars[this_key] is mapping | ternary(
                'a dict',
                vars[this_key] is iterable | ternary(
                  'a list',
                  'unknown (' ~ vars[this_key] | type_debug ~ ')'
                )
              )
            )
          )
        )
      )}}{% endif %} - {{ vars[this_key] | default('unset') }}

To trigger this, I do the following:

- hosts: localhost
  gather_facts: false
  vars:
    SomeString: abc123
    SomeDict: {'somekey': 'somevalue'}
    SomeList: ['somevalue']
    SomeInteger: 12
    SomeFloat: 12.0
    SomeBoolean: false
  tasks:
  - name: Type Check - SomeString
    vars:
      this_key: SomeString
    include_tasks: tasks/type_check/is_a_string.yml
  - name: Type Check - SomeDict
    vars:
      this_key: SomeDict
    include_tasks: tasks/type_check/is_a_dict.yml
  - name: Type Check - SomeList
    vars:
      this_key: SomeList
    include_tasks: tasks/type_check/is_a_list.yml
  - name: Type Check - SomeInteger
    vars:
      this_key: SomeInteger
    include_tasks: tasks/type_check/is_an_integer.yml
  - name: Type Check - SomeFloat
    vars:
      this_key: SomeFloat
    include_tasks: tasks/type_check/is_a_float.yml
  - name: Type Check - SomeBoolean
    vars:
      this_key: SomeBoolean
    include_tasks: tasks/type_check/is_a_boolean.yml

I hope this helps you, bold traveller with complex jinja2 templating requirements!

(Oh, and if you get “template error while templating string: no test named 'boolean'“, you’re probably running Ansible which you installed using apt from Ubuntu Universe, version 2.9.6+dfsg-1 [or, at least I was!] – to fix this, use pip to install a more recent version – preferably using virtualenv first!)

Featured image is “The Guitar Template” by “Neil Williamson” on Flickr and is released under a CC-BY-SA license.

'Geocache "Goodies"' by 'sk' on Flickr

Caching online data sources in Ansible for later development or testing

My current Ansible project relies on me collecting a lot of data from AWS and then checking it again later, to see if something has changed.

This is great for one-off tests (e.g. terraform destroy ; terraform apply ; ansible-playbook run.yml) but isn’t great for repetitive tests, especially if you have to collect data that may take many minutes to run all the actions, or if you have slow or unreliable internet in your development environment.

To get around this, I wrote a wrapper for caching this data.

At the top of my playbook, run.yml, I have these tasks:

- name: Set Online Status.
  # This stores the value of run_online, unless run_online
  # is not set, in which case, it defines it as "true".
  ansible.builtin.set_fact:
    run_online: |-
      {{- run_online | default(true) | bool -}}

- name: Create cache_data path.
  # This creates a "cached_data" directory in the same
  # path as the playbook.
  when: run_online | bool and cache_data | default(false) | bool
  delegate_to: localhost
  run_once: true
  file:
    path: "cached_data"
    state: directory
    mode: 0755

- name: Create cache_data for host.
  # This creates a directory under "cached_data" in the same
  # path as the playbook, with the name of each of the inventory
  # items.
  when: run_online | bool and cache_data | default(false) | bool
  delegate_to: localhost
  file:
    path: "cached_data/{{ inventory_hostname }}"
    state: directory
    mode: 0755

Running this sets up an expectation for the normal operation of the playbook, that it will be “online”, by default.

Then, every time I need to call something “online”, for example, collect EC2 Instance Data (using the community.aws.ec2_instance_info module), I call out to (something like) this set of tasks, instead of just calling the task by itself.

- name: List all EC2 instances in the regions of interest.
  when: run_online | bool
  community.aws.ec2_instance_info:
    region: "{{ item.region_name }}"
  loop: "{{ regions }}"
  loop_control:
    label: "{{ item.region_name }}"
  register: regional_ec2

- name: "NOTE: Set regional_ec2 data path"
  when: not run_online | bool or cache_data | default(false) | bool
  set_fact:
    regional_ec2_cached_data_file_loop: "{{ regional_ec2_cached_data_file_loop | default(0) | int + 1 }}"
    cached_data_filename: "cached_data/{{ inventory_hostname }}/{{ cached_data_file | default('regional_ec2') }}.{{ regional_ec2_cached_data_file_loop | default(0) | int + 1 }}.json"

- name: "NOTE: Cache/Get regional_ec2 data path"
  when: not run_online | bool or cache_data | default(false) | bool
  debug:
    msg: "File: {{ cached_data_filename }}"

- name: Cache all EC2 instances in the regions of interest.
  when: run_online | bool and cache_data | default(false) | bool
  delegate_to: localhost
  copy:
    dest: "{{ cached_data_filename }}"
    mode: "0644"
    content: "{{ regional_ec2 }}"

- name: "OFFLINE: Load all EC2 instances in the regions of interest."
  when: not run_online | bool
  set_fact:
    regional_ec2: "{% include( cached_data_filename ) %}"

The first task, if it’s still set to being “online” will execute the task, and registers the result for later. If cache_data is configured, we generate a filename for the caching, record the filename to the log (via the debug task) and then store it (using the copy task). So far, so online… but what happens when we don’t need the instance to be up and running?

In that case, we use the set_fact module, triggered by running the playbook like this: ansible-playbook run.yml -e run_online=false. This reads the cached data out of that locally stored pool of data for later use.

Featured image is ‘Geocache “Goodies”‘ by ‘sk‘ on Flickr and is released under a CC-BY-ND license.

"Main console" by "Steve Parker" on Flickr

Running services (like SSH, nginx, etc) on Windows Subsystem for Linux (WSL1) on boot

I recently got a new laptop, and for various reasons, I’m going to be primarily running Windows on that laptop. However, I still like having a working SSH server, running in the context of my Windows Subsystem for Linux (WSL) environment.

Initially, trying to run service ssh start failed with an error, because you need to re-execute the ssh configuration steps which are missed in a WSL environment. To fix that, run sudo apt install --reinstall openssh-server.

Once you know your service runs OK, you start digging around to find out how to start it on boot, and you’ll see lots of people saying things like “Just run a shell script that starts your first service, and then another shell script for the next service.”

Well, the frustration for me is that Linux already has this capability – the current popular version is called SystemD, but a slightly older variant is still knocking around in modern linux distributions, and it’s called SystemV Init, often referred to as just “sysv” or “init.d”.

The way that those services work is that you have an “init” file in /etc/init.d and then those files have a symbolic link into a “runlevel” directory, for example /etc/rc3.d. Each symbolic link is named S##service or K##service, where the ## represents the order in which it’s to be launched. The SSH Daemon, for example, that I want to run is created in there as /etc/rc3.d/S01ssh.

So, how do I make this work in the grander scheme of WSL? I can’t use SystemD, where I could say systemctl enable --now ssh, instead I need to add a (yes, I know) shell script, which looks in my desired runlevel directory. Runlevel 3 is the level at which network services have started, hence using that one. If I was trying to set up a graphical desktop, I’d instead be looking to use Runlevel 5, but the X Windows system isn’t ported to Windows like that yet… Anyway.

Because the rc#.d directory already has this structure for ordering and naming services to load, I can just step over this directory looking for files which match or do not match the naming convention, and I do that with this script:

#! /bin/bash
function run_rc() {
  base="$(basename "$1")"
  if [[ ${base:0:1} == "S" ]]
  then
    "$1" start
  else
    "$1" stop
  fi
}

if [ "$1" != "" ] && [ -e "$1" ]
then
  run_rc "$1"
else
  rc=3
  if [ "$1" != "" ] && [ -e "/etc/rc${$1}.d/" ]
  then
    rc="$1"
  fi
  for digit1 in {0..9}
  do
    for digit2 in {0..9}
    do
      find "/etc/rc${rc}.d/" -name "[SK]${digit1}${digit2}*" -exec "$0" '{}' \; 2>/dev/null
    done
  done
fi

I’ve put this script in /opt/wsl_init.sh

This does a bit of trickery, but basically runs the bottom block first. It loops over the digits 0 to 9 twice (giving you 00, 01, 02 and so on up to 99) and looks in /etc/rc3.d for any file containing the filename starting S or K and then with the two digits you’ve looped to by that point. Finally, it runs itself again, passing the name of the file it just found, and this is where the top block comes in.

In the top block we look at the “basename” – the part of the path supplied, without any prefixed directories attached, and then extract just the first character (that’s the ${base:0:1} part) to see whether it’s an “S” or anything else. If it’s an S (which everything there is likely to be), it executes the task like this: /etc/rc3.d/S01ssh start and this works because it’s how that script is designed! You can run one of the following instances of this command: service ssh start, /etc/init.d/ssh start or /etc/rc3.d/S01ssh start. There are other options, notably “stop” or “status”, but these aren’t really useful here.

Now, how do we make Windows execute this on boot? I’m using NSSM, the “Non-sucking service manager” to add a line to the Windows System services. I placed the NSSM executable in C:\Program Files\nssm\nssm.exe, and then from a command line, ran C:\Program Files\nssm\nssm.exe install WSL_Init.

I configured it with the Application Path: C:\Windows\System32\wsl.exe and the Arguments: -d ubuntu -e sudo /opt/wsl_init.sh. Note that this only works because I’ve also got Sudo setup to execute this command without prompting for a password.

Here I invoke C:\Windows\System32\wsl.exe -d ubuntu -e sudo /opt/wsl_init.sh
I define the name of the service, as Services will see it, and also the description of the service.
I put in MY username and My Windows Password here, otherwise I’m not running WSL in my user context, but another one.

And then I rebooted. SSH was running as I needed it.

Featured image is “Main console” by “Steve Parker” on Flickr and is released under a CC-BY license.

"inventory" by "Lee" on Flickr

Using a AWS Dynamic Inventory with Ansible 2.10

In Ansible 2.10, Ansible started bundling modules and plugins as “Collections”, basically meaning that Ansible didn’t need to make a release every time a vendor wanted to update the libraries it required, or API changes required new fields to be supplied to modules. As part of this split between “Collections” and “Core”, the AWS modules and plugins got moved into a collection.

Now, if you’re using Ansible 2.9 or earlier, this probably doesn’t impact you, but there are some nice features in Ansible 2.10 that I wanted to use, so… buckle up :)

Getting started with Ansible 2.10, using a virtual environment

If you currently are using Ansible 2.9, it’s probably worth creating a “python virtual environment”, or “virtualenv” to try out Ansible 2.10. I did this on my Ubuntu 20.04 machine by typing:

sudo apt install -y virtualenv
mkdir -p ~/bin
cd ~/bin
virtualenv -p python3 ansible_2.10

The above ensures that you have virtualenv installed, creates a directory called “bin” in your home directory, if it doesn’t already exist, and then places the virtual environment, using Python3, into a directory there called “ansible_2.10“.

Whenever we want to use this new environment you must activate it, using this command:

source ~/bin/ansible_2.10/bin/activate

Once you’ve executed this, any binary packages created in that virtual environment will be executed from there, in preference to the file system packages.

You can tell that you’ve “activated” this virtual environment, because your prompt changes from user@HOST:~$ to (ansible_2.10) user@HOST:~$ which helps 😀

Next, let’s create a requirements.txt file. This will let us install the environment in a repeatable manner (which is useful with Ansible). Here’s the content of this file.

ansible>=2.10
boto3
botocore

So, this isn’t just Ansible, it’s also the supporting libraries we’ll need to talk to AWS from Ansible.

We execute the following command:

pip install -r requirements.txt

Note, on Windows Subsystem for Linux version 1 (which I’m using) this will take a reasonable while, particularly if it’s crossing from the WSL environment into the Windows environment, depending on where you have specified the virtual environment to be placed.

If you get an error message about something to do with being unable to install ffi, then you’ll need to install the package libffi-dev with sudo apt install -y libffi-dev and then re-run the pip install command above.

Once the installation has completed, you can run ansible --version to see something like the following:

ansible 2.10.2
  config file = None
  configured module search path = ['/home/user/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/user/ansible_2.10/lib/python3.8/site-packages/ansible
  executable location = /home/user/ansible_2.10/bin/ansible
  python version = 3.8.2 (default, Jul 16 2020, 14:00:26) [GCC 9.3.0]

Configuring Ansible for local collections

Ansible relies on certain paths in the filesystem to store things like collections, roles and modules, but I like to circumvent these things – particularly if I’m developing something, or moving from one release to the next. Fortunately, Ansible makes this very easy, using a single file, ansible.cfg to tell the code that’s running in this path where to find things.

A quick note on File permissions with ansible.cfg

Note that the POSIX file permissions for the directory you’re in really matter! It must be set to 775 (-rwxrwxr-x) as a maximum – if it’s “world writable” (the last number) it won’t use this file! Other options include 770, 755. If you accidentally set this as world writable, or are using a directory from the “Windows” side of WSL, then you’ll get an error message like this:

[WARNING]: Ansible is being run in a world writable directory (/home/user/ansible_2.10_aws), ignoring it as an ansible.cfg source. For more information see
https://docs.ansible.com/ansible/devel/reference_appendices/config.html#cfg-in-world-writable-dir

That link is this one: https://docs.ansible.com/ansible/devel/reference_appendices/config.html#cfg-in-world-writable-dir and has some useful advice.

Back to configuring Ansible

In ansible.cfg, I have the following configured:

[defaults]
collections_paths = ./collections:~/.ansible/collections:/usr/share/ansible/collections

This file didn’t previously exist in this directory, so I created that file.

This block asks Ansible to check the following paths in order:

  • collections in this path (e.g. /home/user/ansible_2.10_aws/collections)
  • collections in the .ansible directory under the user’s home directory (e.g. /home/user/.ansible/collections)
  • and finally /usr/share/ansible/collections for system-wide collections.

If you don’t configure Ansible with the ansible.cfg file, the default is to store the collections in ~/.ansible/collections, but you can “only have one version of the collection”, so this means that if you’re relying on things not to change when testing, or if you’re running multiple versions of Ansible on your system, then it’s safest to store the collections in the same file tree as you’re working in!

Installing Collections

Now we have Ansible 2.10 installed, and our Ansible configuration file set up, let’s get our collection ready to install. We do this with a requirements.yml file, like this:

---
collections:
- name: amazon.aws
  version: ">=1.2.1"

What does this tell us? Firstly, that we want to install the Amazon AWS collection from Ansible Galaxy. Secondly that we want at least the most current version (which is currently version 1.2.1). If you leave the version line out, it’ll get “the latest” version. If you replace ">=1.2.1" with 1.2.1 it’ll install exactly that version from Galaxy.

If you want any other collections, you add them as subsequent lines (more details here), like this:

collections:
- name: amazon.aws
  version: ">=1.2.1"
- name: some.other
- name: git+https://example.com/someorg/somerepo.git
  version: 1.0.0
- name: git@example.com:someorg/someotherrepo.git

Once we’ve got this file, we run this command to install the content of the requirements.yml: ansible-galaxy collection install -r requirements.yml

In our case, this installs just the amazon.aws collection, which is what we want. Fab!

Getting our dynamic inventory

Right, so we’ve got all the pieces now that we need! Let’s tell Ansible that we want it to ask AWS for an inventory. There are three sections to this.

Configuring Ansible, again!

We need to open up our ansible.cfg file. Because we’re using the collection to get our Dynamic Inventory plugin, we need to tell Ansible to use that plugin. Edit ./ansible.cfg in your favourite editor, and add this block to the end:

[inventory]
enable_plugins = aws_ec2

If you previously created the ansible.cfg file when you were setting up to get the collection installed alongside, then your ansible.cfg file will look (something) like this:

[defaults]
collections_paths     = ./collections:~/.ansible/collections:/usr/share/ansible/collections

[inventory]
enable_plugins = amazon.aws.aws_ec2

Configure AWS

Your machine needs to have access tokens to interact with the AWS API. These are stored in ~/.aws/credentials (e.g. /home/user/.aws/credentials) and look a bit like this:

[default]
aws_access_key_id = A1B2C3D4E5F6G7H8I9J0
aws_secret_access_key = A1B2C3D4E5F6G7H8I9J0a1b2c3d4e5f6g7h8i9j0

Set up your inventory

In a bit of a change to how Ansible usually does the inventory, to have a plugin based dynamic inventory, you can’t specify a file any more, you have to specify a directory. So, create the file ./inventory/aws_ec2.yaml (having created the directory inventory first). The file contains the following:

---
plugin: amazon.aws.aws_ec2

Late edit 2020-12-01: Further to the comment by Giovanni, I’ve amended this file snippet from plugin: aws_ec2 to plugin: amazon.aws.aws_ec2.

By default, this just retrieves the hostnames of any running EC2 instance, as you can see by running ansible-inventory -i inventory --graph

@all:
  |--@aws_ec2:
  |  |--ec2-176-34-76-187.eu-west-1.compute.amazonaws.com
  |  |--ec2-54-170-131-24.eu-west-1.compute.amazonaws.com
  |  |--ec2-54-216-87-131.eu-west-1.compute.amazonaws.com
  |--@ungrouped:

I need a bit more detail than this – I like to use the tags I assign to AWS assets to decide what I’m going to target the machines with. I also know exactly which regions I’ve got my assets in, and what I want to use to get the names of the devices, so this is what I’ve put in my aws_ec2.yaml file:

---
plugin: amazon.aws.aws_ec2
keyed_groups:
- key: tags
  prefix: tag
- key: 'security_groups|json_query("[].group_name")'
  prefix: security_group
- key: placement.region
  prefix: aws_region
- key: tags.Role
  prefix: role
regions:
- eu-west-1
hostnames:
- tag:Name
- dns-name
- public-ip-address
- private-ip-address

Late edit 2020-12-01: Again, I’ve amended this file snippet from plugin: aws_ec2 to plugin: amazon.aws.aws_ec2.

Now, when I run ansible-inventory -i inventory --graph, I get this output:

@all:
  |--@aws_ec2:
  |  |--euwest1-firewall
  |  |--euwest1-demo
  |  |--euwest1-manager
  |--@aws_region_eu_west_1:
  |  |--euwest1-firewall
  |  |--euwest1-demo
  |  |--euwest1-manager
  |--@role_Firewall:
  |  |--euwest1-firewall
  |--@role_Firewall_Manager:
  |  |--euwest1-manager
  |--@role_VM:
  |  |--euwest1-demo
  |--@security_group_euwest1_allow_all:
  |  |--euwest1-firewall
  |  |--euwest1-demo
  |  |--euwest1-manager
  |--@tag_Name_euwest1_firewall:
  |  |--euwest1-firewall
  |--@tag_Name_euwest1_demo:
  |  |--euwest1-demo
  |--@tag_Name_euwest1_manager:
  |  |--euwest1-manager
  |--@tag_Role_Firewall:
  |  |--euwest1-firewall
  |--@tag_Role_Firewall_Manager:
  |  |--euwest1-manager
  |--@tag_Role_VM:
  |  |--euwest1-demo
  |--@ungrouped:

To finish

Now you have your dynamic inventory, you can target your playbook at any of the groups listed above (like role_Firewall, aws_ec2, aws_region_eu_west_1 or some other tag) like you would any other inventory assignment, like this:

---
- hosts: role_Firewall
  gather_facts: false
  tasks:
  - name: Show the name of this device
    debug:
      msg: "{{ inventory_hostname }}"

And there you have it. Hope this is useful!

Late edit: 2020-11-23: Following a conversation with Andy from Work, we’ve noticed that if you’re trying to do SSM connections, rather than username/password based ones, you might want to put this in your aws_ec2.yml file:

---
plugin: amazon.aws.aws_ec2
hostnames:
  - tag:Name
compose:
  ansible_host: instance_id
  ansible_connection: 'community.aws.aws_ssm'

Late edit 2020-12-01: One final instance, I’ve changed plugin: aws_ec2 to plugin: amazon.aws.aws_ec2.

This will keep your hostnames “pretty” (with whatever you’ve tagged it as), but will let you connect over SSM to the Instance ID. Good fun :)

Featured image is “inventory” by “Lee” on Flickr and is released under a CC-BY-SA license.

"centos login" by "fsse8info" on Flickr

Getting the default username and AMI for an OS with #Terraform

I have a collection of AWS AMIs I use for various builds at work. These come from two places – the AWS Marketplace and our internal Build process.

Essentially, our internal builds (for those who work for my employer – these are the OptiMISe builds) are taken from specific AWS Marketplace builds and hardened.

Because I don’t want to share the AMI details when I put stuff on GitHub, I have an override.tf file that handles the different AMI search strings. So, here’s the ami.tf file I have with the AWS Marketplace version:

data "aws_ami" "centos7" {
  most_recent = true

  filter {
    name   = "name"
    values = ["CentOS Linux 7 x86_64 HVM EBS ENA*"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }

  owners = ["679593333241"] # CentOS Project
}

And here’s an example of the override.tf file I have:

data "aws_ami" "centos7" {
  most_recent = true

  filter {
    name   = "name"
    values = ["SomeUniqueString Containing CentOS*"]
  }

  owners = ["123456789012"]
}

Next, I put these AMI images into a “null” data source, which is evaluated at runtime:

data "null_data_source" "os" {
  inputs = {
    centos7 = data.aws_ami.centos7.id
  }
}

I always forget which username goes with each image, so in the ami.tf file, I also have this:

variable "username" {
  type = map(string)
  default = {
    centos7 = "centos"
  }
}

And in the override.tf file, I have this:

variable "username" {
  type = map(string)
  default = {
    centos7 = "someuser"
  }
}

To get the right combination of username and AMI, I have this in the file where I create my “instance” (virtual machine):

variable "os" {
  default = "centos7"
}

resource "aws_instance" "vm01" {
  ami = data.null_data_source.os.outputs[var.os]
  # additional lines omitted for brevity
}

output "username" {
  value = var.username[var.os]
}

output "vm01" {
  value = aws_instance.vm01.public_ip
}

And that way, I get the VM’s default username and IP address on build. Nice.

Late edit – 2020-09-20: It’s worth noting that this is fine for short-lived builds, proof of concept, etc. But, for longer lived environments, you should be calling out exactly which AMI you’re using, right from the outset. That way, your builds will (or should) all start out from the same point, no ambiguity about exactly which point release they’re getting, etc.

Featured image is “centos login” by “fsse8info” on Flickr and is released under a CC-BY-SA license.

"Stockholms Stadsbibliotek" by "dilettantiquity" on Flickr

Terraform templates with Maps

For a project I’m working on, I needed to define a list of ports, and set some properties on some of them. In the Ansible world, you’d use statements like:

{% if data.somekey is defined %}something {{ data.somekey }}{% endif %}

or

{{ data.somekey | default('') }}

In a pinch, you can also do this:

{{ (data | default({}) ).somekey | default('') }}

With Terraform, I was finding it much harder to work out how to find whether a value as part of a map (the Terraform term for a Dictionary in Ansible terms, or an Associative Array in PHP terms), until I stumbled over the Lookup function. Here’s how that looks for just a simple Terraform file:

output "test" {
    value = templatefile(
        "template.tmpl",
        {
            ports = {
                "eth0": {"ip": "192.168.1.1/24", "name": "public"}, 
                "eth1": {"ip": "172.16.1.1/24", "name": "protected"}, 
                "eth2": {"ip": "10.1.1.1/24", "name": "management", "management": true}, 
                "eth3": {}
            },
            management = "1"
        }
    )
}

And the template that goes with that?

%{ for port, data in ports ~}
Interface: ${port}%{ if lookup(data, "name", "") != ""}
Alias: ${ lookup(data, "name", "") }%{ endif }
Services: ping%{ if lookup(data, "management", false) == true } ssh https%{ endif }
IP: ${ lookup(data, "ip", "Not Defined") }

%{ endfor }

This results in the following output:

C:\tf>terraform.exe apply -auto-approve

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

test = Interface: eth0
Alias: public
Services: ping
IP: 192.168.1.1/24

Interface: eth1
Alias: protected
Services: ping
IP: 172.16.1.1/24

Interface: eth2
Alias: management
Services: ping ssh https
IP: 10.1.1.1/24

Interface: eth3
Services: ping
IP: Not Defined

Naturally, using this in your own user-data or Custom Data field will probably make more sense than just writing it to “output” 😁

Featured image is “Stockholms Stadsbibliotek” by “dilettantiquity” on Flickr and is released under a CC-BY-SA license.

"Sydney Observatory I" by "Newtown grafitti" on Flickr

Using Feature Flags in Terraform with Count Statements

In a project I’m working on in Terraform, I’ve got several feature flags in a module. These flags relate to whether this module should turn on a system in a cloud provider, or not, and looks like this:

variable "turn_on_feature_x" {
  description = "Setting this to 'yes' will enable Feature X. Any other value will disable it. (Default 'yes')"
  value = "yes"
}

variable "turn_on_feature_y" {
  description = "Setting this to 'yes' will enable Feature Y. Any other value will disable it. (Default 'no')"
  value = "no"
}

When I call the module, I then can either leave the feature with the default values, or selectively enable or disable them, like this:

module "region1" {
  source = "./my_module"
}

module "region2" {
  source = "./my_module"
  turn_on_feature_x = "no"
  turn_on_feature_y = "yes"
}

module "region3" {
  source = "./my_module"
  turn_on_feature_y = "yes"
}

module "region4" {
  source = "./my_module"
  turn_on_feature_x = "no"
}

# Result:
# region1 has X=yes, Y=no
# region2 has X=no, Y=yes
# region3 has X=yes, Y=yes
# region4 has X=no, Y=no

When I then want to use the feature, I have to remember a couple of key parts.

  1. Normally this feature check is done with a “count” statement, and the easiest way to use this is to use the ternary operator to check values and return a “1” or a “0” for if you want the value used.

    Ternary operators look like this: var.turn_on_feature_x == "yes" ? 1 : 0 which basically means, if the value of the variable turn_on_feature_x is set to “yes”, then return 1 otherwise return 0.

    This can get a bit complex, particularly if you want to check several flags a few times, like this: var.turn_on_feature_x == "yes" ? var.turn_on_feature_y == "yes" ? 1 : 0 : 0. I’ve found that wrapping them in brackets helps to understand what you’re getting, like this:

    (
      var.turn_on_feature_x == "yes" ?
      (
        var.turn_on_feature_y == "yes" ?
        1 :
        0
      ) :
      0
    )
  2. If you end up using a count statement, the resulting value must be treated as an 0-indexed array, like this: some_provider_service.my_name[0].result

    This is because, using the count value says “I want X number of resources”, so Terraform has to treat it as an array, in case you actually wanted 10 instead of 1 or 0.

Here’s an example of that in use:

resource "aws_guardduty_detector" "Region" {
  count = var.enable_guardduty == "yes" ? 1 : 0
  enable = true
}

resource "aws_cloudwatch_event_rule" "guardduty_finding" {
  count = (var.enable_guardduty == "yes" ? (var.send_guardduty_findings_to_sns == "yes" ? 1 : (var.send_guardduty_findings_to_sqs == "yes" ? 1 : 0)) : 0)
  name = "${data.aws_caller_identity.current.account_id}-${data.aws_region.current.name}-${var.sns_guardduty_finding_suffix}"
  event_pattern = <<PATTERN
{
  "source": [
    "aws.guardduty"
  ],
  "detail-type": [
    "GuardDuty Finding"
  ]
}
PATTERN
}

resource "aws_cloudwatch_event_target" "sns_guardduty_finding" {
  count = (var.enable_guardduty == "yes" ? (var.send_guardduty_findings_to_sns == "yes" ? 1 : 0) : 0)
  rule = aws_cloudwatch_event_rule.guardduty_finding[0].name
  target_id = aws_sns_topic.guardduty_finding[0].name
  arn = aws_sns_topic.guardduty_finding[0].arn
}

resource "aws_cloudwatch_event_target" "sqs_guardduty_finding" {
  count = (var.enable_guardduty == "yes" ? (var.send_guardduty_findings_to_sqs == "yes" ? 1 : 0) : 0)
  rule = aws_cloudwatch_event_rule.guardduty_finding[0].name
  target_id = "SQS"
  arn = aws_sqs_queue.guardduty_finding[0].arn
}

One thing that bit me rather painfully around this was that if you change from an uncounted resource, like this:

resource "some_tool" "this" {
  some_setting = 1
}

To a counted resource, like this:

resource "some_tool" "this" {
  count = var.some_tool == "yes" ? 1 : 0
  some_setting = 1
}

Then, Terraform will promptly destroy some_tool.this to replace it with some_tool.this[0], because they’re not the same referenced thing!

Fun, huh? 😊

Featured image is “Sydney Observatory I” by “Newtown grafitti” on Flickr and is released under a CC-BY license.

"Tracking Methane Sources and Movement Around the Globe" by "NASA/Scientific Visualization Studio" on Nasa.gov

Flexibly loading files in Terraform to license a FortiGate firewall on AWS, Azure and other Cloud platforms

One of the things I’m currently playing with is a project to deploy some FortiGate Firewalls into cloud platforms. I have a couple of Evaluation Licenses I can use (as we’re a partner), but when it comes to automatically scaling, you need to use the PAYG license.

To try to keep my terraform files as reusable as possible, I came up with this work around. It’s likely to be useful in other places too. Enjoy!

This next block is stored in license.tf and basically says “by default, you have no license.”

variable "license_file" {
  default = ""
  description = "Path to the license file to load, or leave blank to use a PAYG license."
}

We can either override this with a command line switch terraform apply -var 'license_file=mylicense.lic', or (more likely) the above override file named license_override.tf (ignored in Git) which has this next block in it:

variable "license_file" {
  default = "mylicense.lic"
}

This next block is also stored in license.tf and says “If var.license is not empty, load that license file [var.license != "" ? var.license] but if it is empty, check whether /dev/null exists (*nix platforms) [fileexists("/dev/null")] in which case, use /dev/null, otherwise use the NUL: device (Windows platforms).”

data "local_file" "license" {
  filename = var.license_file != "" ? var.license_file : fileexists("/dev/null") ? "/dev/null" : "NUL:"
}

👉 Just as an aside, I’ve seen this “ternary” construct in a few languages. It basically looks like this: boolean_operation ? true_value : false_value

That check, logically, could have been written like this instead: "%{if boolean_operation}${true_value}%{else}${false_value}%{endif}"

By combining two of these together, while initially it looks far more messy and hard to parse, I’ve found that, especially in single-line statements, it’s much more compact and eventually easier to read than the alternative if/else/endif structure.

So, this means that we can now refer to data.local_file.license as our data source.

Next, I want to select either the PAYG (Pay As You Go) or BYOL (Bring Your Own License) licensed AMI in AWS (the same principle applies in Azure, GCP, etc), so in this block we provide a different value to the filter in the AMI Data Source, suggesting the string “FortiGate-VM64-AWS *x.y.z*” if we have a value provided license, or “FortiGate-VM64-AWSONDEMAND *x.y.z*” if we don’t.

data "aws_ami" "FortiGate" {
  most_recent = true

  filter {
    name   = "name"
    values = ["FortiGate-VM64-AWS%{if data.local_file.license.content == ""}ONDEMAND%{endif} *${var.release}*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  owners = ["679593333241"] # AWS
}

And the very last thing is to create the user-data template (known as customdata in Azure), using this block:

data "template_cloudinit_config" "config" {
  gzip          = false
  base64_encode = false

  part {
    filename     = "config"
    content_type = "multipart/mixed"
    content      = templatefile(
      "${path.module}/user_data.txt.tmpl",
      {
        hostname = "firewall"
      }
    )
  }

  part {
    filename     = "license"
    content_type = "text/plain"
    content      = data.local_file.license.content
  }
}

And so that is how I can elect to provide a license, or use a pre-licensed image from AWS, and these lessons can also be applied in an Azure or GCP environment too.

Featured image is “Tracking Methane Sources and Movement Around the Globe” by “NASA/Scientific Visualization Studio

"Kelvin Test" by "Eelke" on Flickr

In Ansible, determine the type of a value, and casting those values to other types

TL;DR? It’s possible to work out what type of variable you’re working with in Ansible. The in-built filters don’t always do quite what you’re expecting. Jump to the “In Summary” heading for my suggestions.

LATE EDIT: 2021-05-23 After raising a question in #ansible on Freenode, flowerysong noticed that my truth table around mappings, iterables and strings was wrong. I’ve amended the table accordingly, and have added a further note below the table.

One of the things I end up doing quite a bit with Ansible is value manipulation. I know it’s not really normal, but… well, I like rewriting values from one type of a thing to the next type of a thing.

For example, I like taking a value that I don’t know if it’s a list or a string, and passing that to an argument that expects a list.

Doing it wrong, getting it better

Until recently, I’d do that like this:

- debug:
    msg: |-
      {
        {%- if value | type_debug == "string" or value | type_debug == "AnsibleUnicode" -%}
           "string": "{{ value }}"
        {%- elif value | type_debug == "dict" or value | type_debug == "ansible_mapping" -%}
          "dict": {{ value }}
        {%- elif value | type_debug == "list" -%}
          "list": {{ value }}
        {%- else -%}
          "other": "{{ value }}"
        {%- endif -%}
      }

But, following finding this gist, I now know I can do this:

- debug:
    msg: |-
      {
        {%- if value is string -%}
           "string": "{{ value }}"
        {%- elif value is mapping -%}
          "dict": {{ value }}
        {%- elif value is iterable -%}
          "list": {{ value }}
        {%- else -%}
          "other": "{{ value }}"
        {%- endif -%}
      }

So, how would I use this, given the context of what I was saying before?

- assert:
    that:
    - value is string
    - value is not mapping
    - value is iterable
- some_module:
    some_arg: |-
      {%- if value is string -%}
        ["{{ value }}"]
      {%- else -%}
        {{ value }}
      {%- endif -%}

More details on finding a type

Why in this order? Well, because of how values are stored in Ansible, the following states are true:

⬇️Type \ ➡️Checkis iterableis mappingis sequenceis string
a_dict (e.g. {})✔️✔️✔️
a_list (e.g. [])✔️✔️
a_string (e.g. “”)✔️✔️✔️
A comparison between value types

So, if you were to check for is iterable first, you might match on a_list or a_dict instead of a_string, but string can only match on a_string. Once you know it can’t be a string, you can check whether something is mapping – again, because a mapping can only match a_dict, but it can’t match a_list or a_string. Once you know it’s not that, you can check for either is iterable or is sequence because both of these match a_string, a_dict and a_list.

LATE EDIT: 2021-05-23 Note that a prior revision of this table and it’s following paragraph showed “is_mapping” as true for a_string. This is not correct, and has been fixed, both in the table and the paragraph.

Likewise, if you wanted to check whether a_float and an_integer is number and not is string, you can check these:

⬇️Type \ ➡️Checkis floatis integeris iterableis mappingis numberis sequenceis string
a_float✔️✔️
an_integer✔️✔️
A comparison between types of numbers

So again, a_float and an_integer don’t match is string, is mapping or is iterable, but they both match is number and they each match their respective is float and is integer checks.

How about each of those (a_float and an_integer) wrapped in quotes, making them a string? What happens then?

⬇️Type \ ➡️Checkis floatis integeris iterableis mappingis numberis sequenceis string
a_float_as_string✔️✔️✔️
an_integer_as_string✔️✔️✔️
A comparison between types of numbers when held as a string

This is somewhat interesting, because they look like a number, but they’re actually “just” a string. So, now you need to do some comparisons to make them look like numbers again to check if they’re numbers.

Changing the type of a string

What happens if you cast the values? Casting means to convert from one type of value (e.g. string) into another (e.g. float) and to do that, Ansible has three filters we can use, float, int and string. You can’t cast to a dict or a list, but you can use dict2items and items2dict (more on those later). So let’s start with casting our group of a_ and an_ items from above. Here’s a list of values I want to use:

---
- hosts: localhost
  gather_facts: no
  vars:
    an_int: 1
    a_float: 1.1
    a_string: "string"
    an_int_as_string: "1"
    a_float_as_string: "1.1"
    a_list:
      - item1
    a_dict:
      key1: value1

With each of these values, I returned the value as Ansible knows it, what happens when you do {{ value | float }} to cast it as a float, as an integer by doing {{ value | int }} and as a string {{ value | string }}. Some of these results are interesting. Note that where you see u'some value' means that Python converted that string to a Unicode string.

⬇️Value \ ➡️Castvaluevalue when cast as floatvalue when cast as integervalue when cast as string
a_dict{“key1”: “value1”}0.00“{u’key1′: u’value1′}”
a_float1.11.11“1.1”
a_float_as_string“1.1”1.11“1.1”
a_list[“item1”]0.00“[u’item1′]”
a_string“string”0.00“string”
an_int111“1”
an_int_as_string“1”11“1”
Casting between value types

So, what does this mean for us? Well, not a great deal, aside from to note that you can “force” a number to be a string, or a string which is “just” a number wrapped in quotes can be forced into being a number again.

Oh, and casting dicts to lists and back again? This one is actually pretty clearly documented in the current set of documentation (as at 2.9 at least!)

Checking for miscast values

How about if I want to know whether a value I think might be a float stored as a string, how can I check that?

{{ vars[var] | float | string == vars[var] | string }}

What is this? If I cast a value that I think might be a float, to a float, and then turn both the cast value and the original into a string, do they match? If I’ve got a string or an integer, then I’ll get a false, but if I have actually got a float, then I’ll get true. Likewise for casting an integer. Let’s see what that table looks like:

⬇️Type \ ➡️Checkvalue when cast as floatvalue when cast as integervalue when cast as string
a_float✔️✔️
a_float_as_string✔️✔️
an_integer✔️✔️
an_integer_as_string✔️✔️
A comparison between types of numbers when cast to a string

So this shows us the values we were after – even if you’ve got a float (or an integer) stored as a string, by doing some careful casting, you can confirm they’re of the type you wanted… and then you can pass them through the right filter to use them in your playbooks!

Booleans

Last thing to check – boolean values – “True” or “False“. There’s a bit of confusion here, as a “boolean” can be: true or false, yes or no, 1 or 0, however, is true and True and TRUE the same? How about false, False and FALSE? Let’s take a look!

⬇️Value \ ➡️Checktype_debug is booleanis numberis iterableis mappingis stringvalue when cast as boolvalue when cast as stringvalue when cast as integer
yesbool✔️✔️TrueTrue1
YesAnsibleUnicode✔️✔️FalseYes0
YESAnsibleUnicode✔️✔️FalseYES0
“yes”AnsibleUnicode✔️✔️Trueyes0
“Yes”AnsibleUnicode✔️✔️TrueYes0
“YES”AnsibleUnicode✔️✔️TrueYES0
truebool✔️✔️TrueTrue1
Truebool✔️✔️TrueTrue1
TRUEbool✔️✔️TrueTrue1
“true”AnsibleUnicode✔️✔️Truetrue0
“True”AnsibleUnicode✔️✔️TrueTrue0
“TRUE”AnsibleUnicode✔️✔️TrueTRUE0
1int✔️True11
“1”AnsibleUnicode✔️✔️True11
nobool✔️✔️FalseFalse0
Nobool✔️✔️FalseFalse0
NObool✔️✔️FalseFalse0
“no”AnsibleUnicode✔️✔️Falseno0
“No”AnsibleUnicode✔️✔️FalseNo0
“NO”AnsibleUnicode✔️✔️FalseNO0
falsebool✔️✔️FalseFalse0
Falsebool✔️✔️FalseFalse0
FALSEbool✔️✔️FalseFalse0
“false”AnsibleUnicode✔️✔️Falsefalse0
“False”AnsibleUnicode✔️✔️FalseFalse0
“FALSE”AnsibleUnicode✔️✔️FalseFALSE0
0int✔️False00
“0”AnsibleUnicode✔️✔️False00
Comparisons between various stylings of boolean representations

So, the stand out thing for me here is that while all the permutations of string values of the boolean representations (those wrapped in quotes, like this: "yes") are treated as strings, and shouldn’t be considered as “boolean” (unless you cast for it explicitly!), and all non-string versions of true, false, and no are considered to be boolean, yes, Yes and YES are treated differently, depending on case. So, what would I do?

In summary

  • Consistently use no or yes, true or false in lower case to indicate a boolean value. Don’t use 1 or 0 unless you have to.
  • If you’re checking that you’re working with a string, a list or a dict, check in the order string (using is string), dict (using is mapping) and then list (using is sequence or is iterable)
  • Checking for numbers that are stored as strings? Cast your string through the type check for that number, like this: {% if value | float | string == value | string %}{{ value | float }}{% elif value | int | string == value | string %}{{ value | int }}{% else %}{{ value }}{% endif %}
  • Try not to use type_debug unless you really can’t find any other way. These values will change between versions, and this caused me a lot of issues with a large codebase I was working on a while ago!

Run these tests yourself!

Want to run these tests yourself? Here’s the code I ran (also available in a Gist on GitHub), using Ansible 2.9.10.

---
- hosts: localhost
  gather_facts: no
  vars:
    an_int: 1
    a_float: 1.1
    a_string: "string"
    an_int_as_string: "1"
    a_float_as_string: "1.1"
    a_list:
      - item1
    a_dict:
      key1: value1
  tasks:
    - debug:
        msg: |
          {
          {% for var in ["an_int", "an_int_as_string","a_float", "a_float_as_string","a_string","a_list","a_dict"] %}
            "{{ var }}": {
              "type_debug": "{{ vars[var] | type_debug }}",
              "value": "{{ vars[var] }}",
              "is float": "{{ vars[var] is float }}",
              "is integer": "{{ vars[var] is integer }}",
              "is iterable": "{{ vars[var] is iterable }}",
              "is mapping": "{{ vars[var] is mapping }}",
              "is number": "{{ vars[var] is number }}",
              "is sequence": "{{ vars[var] is sequence }}",
              "is string": "{{ vars[var] is string }}",
              "value cast as float": "{{ vars[var] | float }}",
              "value cast as integer": "{{ vars[var] | int }}",
              "value cast as string": "{{ vars[var] | string }}",
              "is same when cast to float": "{{ vars[var] | float | string == vars[var] | string }}",
              "is same when cast to integer": "{{ vars[var] | int | string == vars[var] | string }}",
              "is same when cast to string": "{{ vars[var] | string == vars[var] | string }}",
            },
          {% endfor %}
          }
---
- hosts: localhost
  gather_facts: false
  vars:
    # true, True, TRUE, "true", "True", "TRUE"
    a_true: true
    a_true_initial_caps: True
    a_true_caps: TRUE
    a_string_true: "true"
    a_string_true_initial_caps: "True"
    a_string_true_caps: "TRUE"
    # yes, Yes, YES, "yes", "Yes", "YES"
    a_yes: yes
    a_yes_initial_caps: Tes
    a_yes_caps: TES
    a_string_yes: "yes"
    a_string_yes_initial_caps: "Yes"
    a_string_yes_caps: "Yes"
    # 1, "1"
    a_1: 1
    a_string_1: "1"
    # false, False, FALSE, "false", "False", "FALSE"
    a_false: false
    a_false_initial_caps: False
    a_false_caps: FALSE
    a_string_false: "false"
    a_string_false_initial_caps: "False"
    a_string_false_caps: "FALSE"
    # no, No, NO, "no", "No", "NO"
    a_no: no
    a_no_initial_caps: No
    a_no_caps: NO
    a_string_no: "no"
    a_string_no_initial_caps: "No"
    a_string_no_caps: "NO"
    # 0, "0"
    a_0: 0
    a_string_0: "0"
  tasks:
    - debug:
        msg: |
          {
          {% for var in ["a_true","a_true_initial_caps","a_true_caps","a_string_true","a_string_true_initial_caps","a_string_true_caps","a_yes","a_yes_initial_caps","a_yes_caps","a_string_yes","a_string_yes_initial_caps","a_string_yes_caps","a_1","a_string_1","a_false","a_false_initial_caps","a_false_caps","a_string_false","a_string_false_initial_caps","a_string_false_caps","a_no","a_no_initial_caps","a_no_caps","a_string_no","a_string_no_initial_caps","a_string_no_caps","a_0","a_string_0"] %}
            "{{ var }}": {
              "type_debug": "{{ vars[var] | type_debug }}",
              "value": "{{ vars[var] }}",
              "is float": "{{ vars[var] is float }}",
              "is integer": "{{ vars[var] is integer }}",
              "is iterable": "{{ vars[var] is iterable }}",
              "is mapping": "{{ vars[var] is mapping }}",
              "is number": "{{ vars[var] is number }}",
              "is sequence": "{{ vars[var] is sequence }}",
              "is string": "{{ vars[var] is string }}",
              "is bool": "{{ vars[var] is boolean }}",
              "value cast as float": "{{ vars[var] | float }}",
              "value cast as integer": "{{ vars[var] | int }}",
              "value cast as string": "{{ vars[var] | string }}",
              "value cast as bool": "{{ vars[var] | bool }}",
              "is same when cast to float": "{{ vars[var] | float | string == vars[var] | string }}",
              "is same when cast to integer": "{{ vars[var] | int | string == vars[var] | string }}",
              "is same when cast to string": "{{ vars[var] | string == vars[var] | string }}",
              "is same when cast to bool": "{{ vars[var] | bool | string == vars[var] | string }}",
            },
          {% endfor %}
          }

Featured image is “Kelvin Test” by “Eelke” on Flickr and is released under a CC-BY license.

"Accept a New SSH Host Key" by "Linux Screenshots" on Flickr

Purposefully Reducing SSH Security when performing Builds of short-lived devices

I’ve recently been developing a few builds of things at home using throw-away sessions of virtual machines, and I found myself repeatedly having to accept and even having to remove SSH host keys for things I knew wouldn’t be around for long. It’s not a huge disaster, but it’s an annoyance.

This annoyance comes from the fact that SSH uses a thing called “Trust-On-First-Use” (Or TOFU) to protect yourself against a “Man-in-the-Middle” attack (or even where the host has been replaced with something malicious), which, for infrastructure that has a long lifetime (anything more than a couple of days) makes sense! You’re building something you want to trust hasn’t been compromised! That said, if you’re building new virtual machines, testing something and then rebuilding it to prove your script worked… well, that’s not so useful!

So, in this case, if you’ve got a designated build network, or if you trust, implicitly, your normal working network, this is a dead simple work-around.

In $HOME/.ssh/config or in $HOME/.ssh/config.d/local (if you’ve followed my previous advice to use separate ssh config files), add the following stanza:

# RFC1918
Host 10.* 172.16.* 172.17.* 172.18.* 172.19.* 172.20.* 172.21.* 172.22.* 172.23.* 172.24.* 172.25.* 172.26.* 172.27.* 172.28.* 172.29.* 172.30.* 172.31.* 192.168.*
        StrictHostKeyChecking no
        UserKnownHostsFile /dev/null

# RFC5373 and RFC2544
Host 192.0.2.* 198.51.100.* 203.0.113.* 198.18.* 198.19.*
        StrictHostKeyChecking no
        UserKnownHostsFile /dev/null

These stanzas let you disable host key checking for any IP address in the RFC1918 ranges (10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16), and for the RFC5373 ranges (192.0.2.0/24, 198.51.100.0/24 and 203.0.113.0/24) – which should be used for documentation, and for the RFC2544 range (198.18.0.0/15) which should be used for inter-network testing.

Alternatively, if you always use a DDNS provider for short-lived assignments (for example, I use davd/docker-ddns) then instead, you can use this stanza:

Host *.ddns.example.com
        StrictHostKeyChecking no
        UserKnownHostsFile /dev/null

(Assuming, of course, you use ddns.example.com as your DDNS address!)

Featured image is “Accept a New SSH Host Key” by “Linux Screenshots” on Flickr and is released under a CC-BY license.