"Blueprints" by "Cameron Degelia" on Flickr

Using Architectural Decision Records (ADR) with adr-tools

Introducing Architectural Decision Records

Over the last week, I discovered a new tool for my arsenal called Architectural Decision Records (ADR). They were first written about in 2011, in a post called “Documenting Architecture Decisions“, where the author, Michael Nygard, advocates for short documents explaining each decision that influences the architecture of an environment.

I found this via a Github repository, created by the team at gov.uk, which includes their ADR library, and references the tool they use to manage these documents – adr-tools.

Late edit 2021-01-25: I also found a post which suggests that Spotify uses ADR.

Installing adr-tools on Linux

Currently adr-tools are easier to install under OSX rather than Linux or Windows Subsystem for Linux (WSL) (I’m working on this – bear with me! πŸ˜ƒ ).

The current installation notes suggest for Linux (which would also work on WSL) is to download the latest release tar.gz or zip file and unpack it into your path somewhere. This isn’t exactly the best way to deploy anything on Linux, but… I guess it works right now?

For me, I downloaded the file, and unpacked the whole tar.gz file (as root) into /usr/local/bin/, giving me a directory of /usr/local/bin/adr-tools-3.0.0/. There’s a subdirectory in here, called src which contains a large number of files – mostly starting _adr or adr- and two additional files, init.md and template.md.

Rather than putting all of these files into /usr/local/bin directly, instead I leave them in the adr-tools-3.0.0 directory, and create a symbolic link (symlink) to the /usr/local/bin directory with this command:

cd /usr/local/bin
ln -s adr-tools-3.0.0/src/* .

This gives me all those files in one place, so I can refer to them later.

An aside – why link everything in that src directory? (Feel free to skip this block!)

Now, why, you might ask, do all of these unrelated files need to be in the same place? Well…. the author of the script has put this in at the top of almost all the files:

#!/bin/bash
set -e
eval "$($(dirname $0)/adr-config)"

And then in that script, it says:

#!/bin/bash
basedir=$(cd -L $(dirname $0) > /dev/null 2>&1 && pwd -L)
echo "adr_bin_dir=$basedir"
echo "adr_template_dir=$basedir"

There are, technically, good reasons for this! This is designed to be run in, what in the Windows world, you might call as a “Portable Script”. So, you bung adr-tools into some directory somewhere, and then just call adr somecommand and it knows that all the files are where they need to be. The (somewhat) down side to this is that if you just want to call adr somecommand rather than path/to/my/adr somecommand then all those files need to be there

I’m currently looking to see if I can improve this somewhat, so that it’s not quite so complex to install, but for now, that’s what you need.

Anyway…

Using adr-tools to document your decisions

I’ll start documenting a fictional hosted web service project, and note down some of the decisions which have been made.

Initializing your ADR directory

Start by running adr init. You may want to specify a directory where you want to put these records, so instead use: adr init path/to/adr, like this:

Initializing the ADR in “documentation/architecture-decisions” with adr init documentation/architecture-decisions

You’ll notice that when I run this command, it creates a new entry, called 0001-record-architecture-decisions.md. Let’s open this up, and see what’s in here.

The VSCode record for the choice to use ADR. It is a markdown file, with the standard types of data recorded.

In here we have the record ID (1.), the title of the record Record architecture decisions, the date the choice was made Date: 2021-01-19, a status of Accepted, the context on why we made this choice, the decision, and the consequences of making this decision. Make changes, if needed, and save it. Let’s move on.

Creating our first own record

This all is quite straightforward thus far. Let’s create our next record.

Issuing the command adr new <sometitle> you create the next ADR record.

Let’s open up that record.

The template for the ADR record for “Use AWS”.

Like the first record, we have a title, a status, a context, decision and consequences. Let’s define these.

A “finished” brief ADR record.

This document shouldn’t be very long! It just describes why a choice was made and what that entails.

Changing decisions – completely replacing (superseding) a decision

Of course, over time, decisions will be replaced due to various decisions elsewhere.

You can ask adr to supersede a previous record, using the “-s” flag, and the record number.

Let’s look at how that works on the second ADR record.

After the command adr new -s 2 Use Azure, the ADR record number 2 has a new status, “Superceded by” and the superseded linked document. Yes, “Superceded” is a typo. There is an open PR for it

So, under the “Status”, where is previously said “Approved”, it now says “Superceded by [3. Use Azure](0003-use-azure.md)“. This is a markdown statement which indicates where the superseded document is located. As I mentioned in the comment below the above image, there is an open Pull Request to fix this on the adr-tools, so hopefully that typo won’t last long!

We’ve got our new ADR too – let’s take a look at that one?

Our new ADR shows that it “supercedes” the previous record. Which is good! Typo aside :)

Other references

Of course, you don’t always completely overrule a decision. Sometimes your decision is influenced by, or has a dependency on something else, like this one.

We know which provider we’re using at long last, now let’s pick a region. Use the -l flag to “link” between the referenced and new ADR. The context for the -l flag is “<number>:<text for link to number>:<text for link in targetted document>”.

The command here is:

adr new -l '3:Dependency:Influences' Use Region UK South and UK West

I’m just going to crop from the “Status” block on both the referenced ADR (3) and the ADR which references it (4):

Status block in ADR 0003 which is referenced by ADR 0004
Status block in the new ADR 0004 which references ADR 0003

And of course, you can also use the same switch to mark documents as partially obsoleted, like this:

adr new -l '4:Partially obsoletes:Partially obsoleted by' Use West Europe region instead of UK West region
Status block in ADR 0004 indicating it’s partially obsoleted. Probably worth updating the status properly to show it’s not just “Accepted”.

If you forget to add the referencing in, you can also use the adr link command, like this:

adr link 3 Influences 5 Dependency

To be clear, that command adds a (complete) line to ADR 0003 saying “Influences [5. ADR Title](link)” and a separate (complete) line to ADR 0005 saying “Dependency [3. ADR Title](link)“.

What else can we do?

There are four other “things” that it’s worth doing at this point.

  1. Note that you can change the template per-ADR directory.

Create a directory called “templates” in the ADR directory, and put a file in there called “template.md“. Tweak this as you need. Ensure you have AT LEAST the line ## Status and # NUMBER. TITLE as these are required by the script.

A much abbreviated template file, containing just “Number”, “Title”, “Date”, “Status”, and a new dummy heading called “Stuff”.
And the result of running adr new Some Text once you’ve created that template.

As you can see, it’s possible to add all sorts of content in this template as a result. Bear in mind, before your template turns into something like this, that it’s supposed to be a short document explaining why each decision was made, not a funding proposal, or a complex epic of your user stories!

Be careful not to let your template run away with you!
  1. Note that you can automatically open an editor, by setting the EDITOR (where the process is expected to finish before returning control, like using nano, emacs or vim, for example) or VISUAL (where the process is expected to “fork”, like for example, gedit or vscode) environment variable, and then running adr new A Title, like this:
  1. We can create “Table of Contents” files, using the adr generate toc command, like this:
Generating the table of contents, for injecting into other files.

This can be included into your various other markdown files. There are switches, so you can set the link path, but your best bet is to find that using adr help generate toc.

  1. We can also generate graphviz files of the link maps between elements of the various ADRs, like this: adr generate graph | dot -Tjpg > graph.jpg

If you omit the “| dot -Tjpg > graph.jpg” part, then you’ll see the graphviz output, which looks like this: (I’ve removed the documents 6 and 7).

digraph {
  node [shape=plaintext];
  subgraph {
    _1 [label="1. Record architecture decisions"; URL="0001-record-architecture-decisions.html"];
    _2 [label="2. Use AWS"; URL="0002-use-aws.html"];
    _1 -> _2 [style="dotted", weight=1];
    _3 [label="3. Use Azure"; URL="0003-use-azure.html"];
    _2 -> _3 [style="dotted", weight=1];
    _4 [label="4. Use Region UK South and UK West"; URL="0004-use-region-uk-south-and-uk-west.html"];
    _3 -> _4 [style="dotted", weight=1];
    _5 [label="5. Use West Europe region instead of UK West region"; URL="0005-use-west-europe-region-instead-of-uk-west-region.html"];
    _4 -> _5 [style="dotted", weight=1];
  }
  _3 -> _2 [label="Supercedes", weight=0]
  _3 -> _5 [label="Influences", weight=0]
  _4 -> _3 [label="Dependency", weight=0]
  _5 -> _4 [label="Partially obsoletes", weight=0]
  _5 -> _3 [label="Dependency", weight=0]
}

To make the graphviz part work, you’ll need to install graphviz, which is just an apt get away.

Any caveats?

adr-tools is not actively maintained. I’ve contacted the author, about seeing if I can help out with the maintenance, but… we’ll see, and given some fairly high profile malware takeovers of projects like this sort of thing on Github, Docker, NPM, and more… I can see why there might be some reluctance to consider it! Also, I’m an unknown entity, I’ve just dropped in on the project and offered to help, with no previous exposure to the lead dev or the project… so, we’ll see. Worst case, I’ll fork it!

Working with this also requires an understanding of markdown files, and why these might be a useful document format for records like this. There was a PR submitted to support multiple file formats (like asciidoc and rst) but these were not approved by the author.

There is no current intention to support languages other than English. The tool is hard-coded to look for strings like “status” and “superceded” which is hard. Part of the reason I raised the PRs I did was to let me fix some of these sorts of issues. Again, we’ll see what happens.

Lastly, it can be overwhelming to see a lot of documents in one place, particularly if they’re as granular as the documents I produced in this demo. If the project supported categories, or could be broken down into components (like doc/adr/networking and doc/adr/server_builds and doc/adr/applications) then this might help, but it’s not on the roadmap right now!

Late edit 2021-01-25: If you don’t think these templates have enough context or content, there are lots of others listed on Joel Parker Henderson’s repo of examples and templates. If you want a python based viewer of ADR records, take a look at adr-viewer.

Featured image is β€œBlueprints” by β€œCameron Degelia” on Flickr and is released under a CC-BY license.

"inventory" by "Lee" on Flickr

Using a AWS Dynamic Inventory with Ansible 2.10

In Ansible 2.10, Ansible started bundling modules and plugins as “Collections”, basically meaning that Ansible didn’t need to make a release every time a vendor wanted to update the libraries it required, or API changes required new fields to be supplied to modules. As part of this split between “Collections” and “Core”, the AWS modules and plugins got moved into a collection.

Now, if you’re using Ansible 2.9 or earlier, this probably doesn’t impact you, but there are some nice features in Ansible 2.10 that I wanted to use, so… buckle up :)

Getting started with Ansible 2.10, using a virtual environment

If you currently are using Ansible 2.9, it’s probably worth creating a “python virtual environment”, or “virtualenv” to try out Ansible 2.10. I did this on my Ubuntu 20.04 machine by typing:

sudo apt install -y virtualenv
mkdir -p ~/bin
cd ~/bin
virtualenv -p python3 ansible_2.10

The above ensures that you have virtualenv installed, creates a directory called “bin” in your home directory, if it doesn’t already exist, and then places the virtual environment, using Python3, into a directory there called “ansible_2.10“.

Whenever we want to use this new environment you must activate it, using this command:

source ~/bin/ansible_2.10/bin/activate

Once you’ve executed this, any binary packages created in that virtual environment will be executed from there, in preference to the file system packages.

You can tell that you’ve “activated” this virtual environment, because your prompt changes from user@HOST:~$ to (ansible_2.10) user@HOST:~$ which helps πŸ˜€

Next, let’s create a requirements.txt file. This will let us install the environment in a repeatable manner (which is useful with Ansible). Here’s the content of this file.

ansible>=2.10
boto3
botocore

So, this isn’t just Ansible, it’s also the supporting libraries we’ll need to talk to AWS from Ansible.

We execute the following command:

pip install -r requirements.txt

Note, on Windows Subsystem for Linux version 1 (which I’m using) this will take a reasonable while, particularly if it’s crossing from the WSL environment into the Windows environment, depending on where you have specified the virtual environment to be placed.

If you get an error message about something to do with being unable to install ffi, then you’ll need to install the package libffi-dev with sudo apt install -y libffi-dev and then re-run the pip install command above.

Once the installation has completed, you can run ansible --version to see something like the following:

ansible 2.10.2
  config file = None
  configured module search path = ['/home/user/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/user/ansible_2.10/lib/python3.8/site-packages/ansible
  executable location = /home/user/ansible_2.10/bin/ansible
  python version = 3.8.2 (default, Jul 16 2020, 14:00:26) [GCC 9.3.0]

Configuring Ansible for local collections

Ansible relies on certain paths in the filesystem to store things like collections, roles and modules, but I like to circumvent these things – particularly if I’m developing something, or moving from one release to the next. Fortunately, Ansible makes this very easy, using a single file, ansible.cfg to tell the code that’s running in this path where to find things.

A quick note on File permissions with ansible.cfg

Note that the POSIX file permissions for the directory you’re in really matter! It must be set to 775 (-rwxrwxr-x) as a maximum – if it’s “world writable” (the last number) it won’t use this file! Other options include 770, 755. If you accidentally set this as world writable, or are using a directory from the “Windows” side of WSL, then you’ll get an error message like this:

[WARNING]: Ansible is being run in a world writable directory (/home/user/ansible_2.10_aws), ignoring it as an ansible.cfg source. For more information see
https://docs.ansible.com/ansible/devel/reference_appendices/config.html#cfg-in-world-writable-dir

That link is this one: https://docs.ansible.com/ansible/devel/reference_appendices/config.html#cfg-in-world-writable-dir and has some useful advice.

Back to configuring Ansible

In ansible.cfg, I have the following configured:

[defaults]
collections_paths = ./collections:~/.ansible/collections:/usr/share/ansible/collections

This file didn’t previously exist in this directory, so I created that file.

This block asks Ansible to check the following paths in order:

  • collections in this path (e.g. /home/user/ansible_2.10_aws/collections)
  • collections in the .ansible directory under the user’s home directory (e.g. /home/user/.ansible/collections)
  • and finally /usr/share/ansible/collections for system-wide collections.

If you don’t configure Ansible with the ansible.cfg file, the default is to store the collections in ~/.ansible/collections, but you can “only have one version of the collection”, so this means that if you’re relying on things not to change when testing, or if you’re running multiple versions of Ansible on your system, then it’s safest to store the collections in the same file tree as you’re working in!

Installing Collections

Now we have Ansible 2.10 installed, and our Ansible configuration file set up, let’s get our collection ready to install. We do this with a requirements.yml file, like this:

---
collections:
- name: amazon.aws
  version: ">=1.2.1"

What does this tell us? Firstly, that we want to install the Amazon AWS collection from Ansible Galaxy. Secondly that we want at least the most current version (which is currently version 1.2.1). If you leave the version line out, it’ll get “the latest” version. If you replace ">=1.2.1" with 1.2.1 it’ll install exactly that version from Galaxy.

If you want any other collections, you add them as subsequent lines (more details here), like this:

collections:
- name: amazon.aws
  version: ">=1.2.1"
- name: some.other
- name: git+https://example.com/someorg/somerepo.git
  version: 1.0.0
- name: git@example.com:someorg/someotherrepo.git

Once we’ve got this file, we run this command to install the content of the requirements.yml: ansible-galaxy collection install -r requirements.yml

In our case, this installs just the amazon.aws collection, which is what we want. Fab!

Getting our dynamic inventory

Right, so we’ve got all the pieces now that we need! Let’s tell Ansible that we want it to ask AWS for an inventory. There are three sections to this.

Configuring Ansible, again!

We need to open up our ansible.cfg file. Because we’re using the collection to get our Dynamic Inventory plugin, we need to tell Ansible to use that plugin. Edit ./ansible.cfg in your favourite editor, and add this block to the end:

[inventory]
enable_plugins = aws_ec2

If you previously created the ansible.cfg file when you were setting up to get the collection installed alongside, then your ansible.cfg file will look (something) like this:

[defaults]
collections_paths     = ./collections:~/.ansible/collections:/usr/share/ansible/collections

[inventory]
enable_plugins = amazon.aws.aws_ec2

Configure AWS

Your machine needs to have access tokens to interact with the AWS API. These are stored in ~/.aws/credentials (e.g. /home/user/.aws/credentials) and look a bit like this:

[default]
aws_access_key_id = A1B2C3D4E5F6G7H8I9J0
aws_secret_access_key = A1B2C3D4E5F6G7H8I9J0a1b2c3d4e5f6g7h8i9j0

Set up your inventory

In a bit of a change to how Ansible usually does the inventory, to have a plugin based dynamic inventory, you can’t specify a file any more, you have to specify a directory. So, create the file ./inventory/aws_ec2.yaml (having created the directory inventory first). The file contains the following:

---
plugin: amazon.aws.aws_ec2

Late edit 2020-12-01: Further to the comment by Giovanni, I’ve amended this file snippet from plugin: aws_ec2 to plugin: amazon.aws.aws_ec2.

By default, this just retrieves the hostnames of any running EC2 instance, as you can see by running ansible-inventory -i inventory --graph

@all:
  |--@aws_ec2:
  |  |--ec2-176-34-76-187.eu-west-1.compute.amazonaws.com
  |  |--ec2-54-170-131-24.eu-west-1.compute.amazonaws.com
  |  |--ec2-54-216-87-131.eu-west-1.compute.amazonaws.com
  |--@ungrouped:

I need a bit more detail than this – I like to use the tags I assign to AWS assets to decide what I’m going to target the machines with. I also know exactly which regions I’ve got my assets in, and what I want to use to get the names of the devices, so this is what I’ve put in my aws_ec2.yaml file:

---
plugin: amazon.aws.aws_ec2
keyed_groups:
- key: tags
  prefix: tag
- key: 'security_groups|json_query("[].group_name")'
  prefix: security_group
- key: placement.region
  prefix: aws_region
- key: tags.Role
  prefix: role
regions:
- eu-west-1
hostnames:
- tag:Name
- dns-name
- public-ip-address
- private-ip-address

Late edit 2020-12-01: Again, I’ve amended this file snippet from plugin: aws_ec2 to plugin: amazon.aws.aws_ec2.

Now, when I run ansible-inventory -i inventory --graph, I get this output:

@all:
  |--@aws_ec2:
  |  |--euwest1-firewall
  |  |--euwest1-demo
  |  |--euwest1-manager
  |--@aws_region_eu_west_1:
  |  |--euwest1-firewall
  |  |--euwest1-demo
  |  |--euwest1-manager
  |--@role_Firewall:
  |  |--euwest1-firewall
  |--@role_Firewall_Manager:
  |  |--euwest1-manager
  |--@role_VM:
  |  |--euwest1-demo
  |--@security_group_euwest1_allow_all:
  |  |--euwest1-firewall
  |  |--euwest1-demo
  |  |--euwest1-manager
  |--@tag_Name_euwest1_firewall:
  |  |--euwest1-firewall
  |--@tag_Name_euwest1_demo:
  |  |--euwest1-demo
  |--@tag_Name_euwest1_manager:
  |  |--euwest1-manager
  |--@tag_Role_Firewall:
  |  |--euwest1-firewall
  |--@tag_Role_Firewall_Manager:
  |  |--euwest1-manager
  |--@tag_Role_VM:
  |  |--euwest1-demo
  |--@ungrouped:

To finish

Now you have your dynamic inventory, you can target your playbook at any of the groups listed above (like role_Firewall, aws_ec2, aws_region_eu_west_1 or some other tag) like you would any other inventory assignment, like this:

---
- hosts: role_Firewall
  gather_facts: false
  tasks:
  - name: Show the name of this device
    debug:
      msg: "{{ inventory_hostname }}"

And there you have it. Hope this is useful!

Late edit: 2020-11-23: Following a conversation with Andy from Work, we’ve noticed that if you’re trying to do SSM connections, rather than username/password based ones, you might want to put this in your aws_ec2.yml file:

---
plugin: amazon.aws.aws_ec2
hostnames:
  - tag:Name
compose:
  ansible_host: instance_id
  ansible_connection: 'community.aws.aws_ssm'

Late edit 2020-12-01: One final instance, I’ve changed plugin: aws_ec2 to plugin: amazon.aws.aws_ec2.

This will keep your hostnames “pretty” (with whatever you’ve tagged it as), but will let you connect over SSM to the Instance ID. Good fun :)

Featured image is β€œinventory” by β€œLee” on Flickr and is released under a CC-BY-SA license.

"Salmon leaping" by "openpad" on Flickr

Using public #git sources in private projects

The last post I made was about using submodules to work with code that is being developed, either in isolation from other aspects of a project, or so components can be reused without requiring lots of copy-and-paste activities. It was inspired by a question from a colleague. After asking a few more questions, it turns out that may be what that colleague needed was to consume code from other repositories and store them in their own project.

In this case, I’ve created two repositories, both on GitHub (which will both be removed by the time this post is published) called JonTheNiceGuy/Git_Demo (the “upstream”, open source project) and JonTheNiceGuy-Inc/Git_Demo (the private project, referred to as “mine”).

Getting the “Open Source” project started

Here we have a simple repository, showing the README file for the project (which is likely, in the real world, to show what license that code has been released under, some explaination on what it’s for, etc.) and the actual data source. In this demo, the data source is a series of numbers, showing the decimal number in the first column, the binary representation of that number in the second column, and the hexedecimal representation in the third column.

Our “upstream” repository, showing the README.md file and the data source we want to use.
The data source itself. Note, I forgot to take a screen shot of this file, so I’ve had to “go back to a previous commit” to collect this particular image.

Elsewhere in the world, a private project has started! It’s going to use this data source as some element of this project, and to ensure that the code they’re relying on doesn’t go away, they create their own repository which this code will go into.

Preparing the private project

If both repositories are using GitHub, or if both repositories are using GitLab, then you should be able to “just” Fork the repository, using the “Fork” button in the top right corner:

The “Fork” button

And then select the organisation or account to place the forked repo into.

A list of potential targets to fork the repository into. Your view may differ if you are part of less organisations.

Gitlab has a similar workflow – they have a similar “fork” button, but the list of potential targets is different (but still works the same way).

Gitlab’s list of potential targets to fork the repository into.

Note that you can’t “easily” fork between different Version Control Services! To do something similar, you need to create a new repository in the target service, and then, run some commands to move the code over.

The screen you see immediately after you’ve created a new project – here I’ve created it in the “JonTheNiceGuy-Inc” Organisation. You can see the “quick setup” panel which has the URL to use for the repository.
Here we see the results of running five commands, which are: git clone <url> ; cd <target-dir> ; git remote rename origin upstream ; git remote add mine <url> ; git push –set-upstream mine main

If you’re using the command line method, here’s the commands you issue:

  • git clone http://service/user/repo – This command clones the repository from your service of choice to your local file system. It usually places it into the name of the repository you specified. In this case, “repo”, but in the above context (cloning from Git_Demo.git) it goes into “Git_Demo”. Note, HTTP(S) isn’t the only git transport, another common one is SSH, so if you prefer using SSH instead of HTTP, the URL in this case will be something like git@service:user/repo or service:user/repo. If you’re using submodules, however, I’d strongly recommend using HTTP(S) over SSH for at least the initial pull, as this is much easier for clients to navigate.
  • cd repo – Move into the directory where the cloned repository has been placed.
  • OPTIONAL: git remote rename origin upstream – Rename the remote source of the repository. By default, when you git clone or use git submodule add, the name of the remote resource is called “origin”. I prefer to give a descriptive name for my remote sources, so using “upstream” makes more sense to me. In later commands, I’ll use the remote name “upstream” again. If you don’t want to run this command, and leave the remote name as “origin”, you’ll just have to remember to change it back to “origin”.
  • git remote add mine http://new-service/user/repo – this adds a new remote source, to which you can push new commits, or pull code from your peers. Again, like in the git clone command above, you may use another URL format instead of HTTP(S). You may want to use a different name for the new remote, but again, I tend to prefer “mine” for anything I’m personally working on.
  • git push --set-upstream mine main – This sends the entire commit tree for the branch you’re currently on to your remote source.
Once we’ve run the git push, you can now see that the code has all been pushed to your private project.
Issuing a git log command, shows the current tip on the branch “main” in the “upstream” repository is equal to the current tip on the branch “main” in the “mine” repository, as well as the tip of the “main” repository locally.

Making your local changes

So, while you could just keep using just the upstream project’s code (and doing the above groundwork is good practice to keep you from putting yourself into the situation that the NPM world got into with “left-pad”). What’s more likely is that you want to make your own, local changes to this repository. I’ve done this in the past where I wanted to demonstrate a software build using a public machine image, but internally at work we used our own images. Using this method, I can consume the code I’ve created in public, and just update the assets we use at work.

In this example, let’s update that data file. I’ve added two new lines, “115” (and it’s binary/hex representations) and “132”. I can use the git diff command to confirm the changes I want to make – it’s all good!

Next, I stage the changes with git add, use git commit to write it to the branch, and git push to push it up to my repository. This is all fairly standard stuff in the Git world.

Here we make a change to the data source, confirm there is a difference, add and commit it, and then push it to our default branch (mine/main).

When I then check the git log, we see that there’s a divergence, between my local main branch and the upstream main branch. You could also use git log -p to see the exact code changes, if you wanted… but we know what’s changed already.

The git log, showing that we have a “local” change from the “upstream” source, and that we’ve pushed that local change to the “mine” source.

Bringing data from the upstream source

Oh joy! The upstream project (“JonTheNiceGuy” not “JonTheNiceGuy-Inc”) have updated their Git_Demo repository – they’ve had the audacity to add three new numbers – 9, 10 and 15 – to the data source.

The patch that was applied to this branch. We can check the difference here before we try to do anything with it! It’s something we want!

Well, actually we want to use that data, so let’s start bringing it in. We use the git pull command.

The git pull command, with the remote source (“upstream”) and the branch (“main”) to use.

Because this makes a change to a file that you’ve amended as part of your work, it can’t perform a “Fast forward” of these changes, so Git has to perform a merge commit. This means there’s a new commit in the log, so it’s clear that we’ve updated files because of this merge.

If there were a conflict in this file (which, fortunately, there isn’t!) you’d also be prompted to fix the merge conflicts too. This is a bit bigger than what I’m trying to explain, so instead, I’ll link to a tutorial by Atlassian on merge conflicts. You may also want to take a quick look at the rebasing page on the Git Project’s documentation site, and see whether this might have made your life easier in the case of a conflict!

Anyway, let’s use the default merge message.

The default message when performing a git pull where the change can’t be fast-forwarded.

Once the merge message is done, the merge completes. Yey!

We successfully merged our change, and it’s now part of our local tree

And to prove it, we can now see that we have all the changes from the upstream (commits starting 3b75eb, 8ad9ae, 8bdcae and the new one at a64de2) and our local changes (starting 02e40e).

Because we performed a merge, not a fast forward, our local branch is at a different commit than either of our remote sources – the commit starting 6f4db6 is on our local version, “upstream” is at a64de2 and “mine” is at 02e40e. So we need to fix at least our “mine/main” branch. We do this with a git push.

We do our git push here to get the code into our “mine/main” branch.

And now we can see the git log on our service.

The list of commits on Github for our “mine/main” branch.

And locally, we can see that the remote state has changed too. Let’s look at that git log again.

The result of the git log command on our local machine, showing the new position of the pointers for “upstream/main”, “mine/main” and the local “main” branches.

We can also look at the git blame on the service.

The git blame screen on GitHub, showing who made the various commits.

Or on our local machine.

git blame run locally, showing the commit reference, the author, the date and time of the commit, and the line number, followed by the line in question.

Featured image is β€œSalmon leaping” by β€œopenpad” on Flickr and is released under a CC-BY license.

"centos login" by "fsse8info" on Flickr

Getting the default username and AMI for an OS with #Terraform

I have a collection of AWS AMIs I use for various builds at work. These come from two places – the AWS Marketplace and our internal Build process.

Essentially, our internal builds (for those who work for my employer – these are the OptiMISe builds) are taken from specific AWS Marketplace builds and hardened.

Because I don’t want to share the AMI details when I put stuff on GitHub, I have an override.tf file that handles the different AMI search strings. So, here’s the ami.tf file I have with the AWS Marketplace version:

data "aws_ami" "centos7" {
  most_recent = true

  filter {
    name   = "name"
    values = ["CentOS Linux 7 x86_64 HVM EBS ENA*"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }

  owners = ["679593333241"] # CentOS Project
}

And here’s an example of the override.tf file I have:

data "aws_ami" "centos7" {
  most_recent = true

  filter {
    name   = "name"
    values = ["SomeUniqueString Containing CentOS*"]
  }

  owners = ["123456789012"]
}

Next, I put these AMI images into a “null” data source, which is evaluated at runtime:

data "null_data_source" "os" {
  inputs = {
    centos7 = data.aws_ami.centos7.id
  }
}

I always forget which username goes with each image, so in the ami.tf file, I also have this:

variable "username" {
  type = map(string)
  default = {
    centos7 = "centos"
  }
}

And in the override.tf file, I have this:

variable "username" {
  type = map(string)
  default = {
    centos7 = "someuser"
  }
}

To get the right combination of username and AMI, I have this in the file where I create my “instance” (virtual machine):

variable "os" {
  default = "centos7"
}

resource "aws_instance" "vm01" {
  ami = data.null_data_source.os.outputs[var.os]
  # additional lines omitted for brevity
}

output "username" {
  value = var.username[var.os]
}

output "vm01" {
  value = aws_instance.vm01.public_ip
}

And that way, I get the VM’s default username and IP address on build. Nice.

Late edit – 2020-09-20: It’s worth noting that this is fine for short-lived builds, proof of concept, etc. But, for longer lived environments, you should be calling out exactly which AMI you’re using, right from the outset. That way, your builds will (or should) all start out from the same point, no ambiguity about exactly which point release they’re getting, etc.

Featured image is β€œcentos login” by β€œfsse8info” on Flickr and is released under a CC-BY-SA license.

"Stockholms Stadsbibliotek" by "dilettantiquity" on Flickr

Terraform templates with Maps

For a project I’m working on, I needed to define a list of ports, and set some properties on some of them. In the Ansible world, you’d use statements like:

{% if data.somekey is defined %}something {{ data.somekey }}{% endif %}

or

{{ data.somekey | default('') }}

In a pinch, you can also do this:

{{ (data | default({}) ).somekey | default('') }}

With Terraform, I was finding it much harder to work out how to find whether a value as part of a map (the Terraform term for a Dictionary in Ansible terms, or an Associative Array in PHP terms), until I stumbled over the Lookup function. Here’s how that looks for just a simple Terraform file:

output "test" {
    value = templatefile(
        "template.tmpl",
        {
            ports = {
                "eth0": {"ip": "192.168.1.1/24", "name": "public"}, 
                "eth1": {"ip": "172.16.1.1/24", "name": "protected"}, 
                "eth2": {"ip": "10.1.1.1/24", "name": "management", "management": true}, 
                "eth3": {}
            },
            management = "1"
        }
    )
}

And the template that goes with that?

%{ for port, data in ports ~}
Interface: ${port}%{ if lookup(data, "name", "") != ""}
Alias: ${ lookup(data, "name", "") }%{ endif }
Services: ping%{ if lookup(data, "management", false) == true } ssh https%{ endif }
IP: ${ lookup(data, "ip", "Not Defined") }

%{ endfor }

This results in the following output:

C:\tf>terraform.exe apply -auto-approve

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

test = Interface: eth0
Alias: public
Services: ping
IP: 192.168.1.1/24

Interface: eth1
Alias: protected
Services: ping
IP: 172.16.1.1/24

Interface: eth2
Alias: management
Services: ping ssh https
IP: 10.1.1.1/24

Interface: eth3
Services: ping
IP: Not Defined

Naturally, using this in your own user-data or Custom Data field will probably make more sense than just writing it to “output” 😁

Featured image is β€œStockholms Stadsbibliotek” by β€œdilettantiquity” on Flickr and is released under a CC-BY-SA license.

"Sydney Observatory I" by "Newtown grafitti" on Flickr

Using Feature Flags in Terraform with Count Statements

In a project I’m working on in Terraform, I’ve got several feature flags in a module. These flags relate to whether this module should turn on a system in a cloud provider, or not, and looks like this:

variable "turn_on_feature_x" {
  description = "Setting this to 'yes' will enable Feature X. Any other value will disable it. (Default 'yes')"
  value = "yes"
}

variable "turn_on_feature_y" {
  description = "Setting this to 'yes' will enable Feature Y. Any other value will disable it. (Default 'no')"
  value = "no"
}

When I call the module, I then can either leave the feature with the default values, or selectively enable or disable them, like this:

module "region1" {
  source = "./my_module"
}

module "region2" {
  source = "./my_module"
  turn_on_feature_x = "no"
  turn_on_feature_y = "yes"
}

module "region3" {
  source = "./my_module"
  turn_on_feature_y = "yes"
}

module "region4" {
  source = "./my_module"
  turn_on_feature_x = "no"
}

# Result:
# region1 has X=yes, Y=no
# region2 has X=no, Y=yes
# region3 has X=yes, Y=yes
# region4 has X=no, Y=no

When I then want to use the feature, I have to remember a couple of key parts.

  1. Normally this feature check is done with a “count” statement, and the easiest way to use this is to use the ternary operator to check values and return a “1” or a “0” for if you want the value used.

    Ternary operators look like this: var.turn_on_feature_x == "yes" ? 1 : 0 which basically means, if the value of the variable turn_on_feature_x is set to “yes”, then return 1 otherwise return 0.

    This can get a bit complex, particularly if you want to check several flags a few times, like this: var.turn_on_feature_x == "yes" ? var.turn_on_feature_y == "yes" ? 1 : 0 : 0. I’ve found that wrapping them in brackets helps to understand what you’re getting, like this:

    (
      var.turn_on_feature_x == "yes" ?
      (
        var.turn_on_feature_y == "yes" ?
        1 :
        0
      ) :
      0
    )
  2. If you end up using a count statement, the resulting value must be treated as an 0-indexed array, like this: some_provider_service.my_name[0].result

    This is because, using the count value says “I want X number of resources”, so Terraform has to treat it as an array, in case you actually wanted 10 instead of 1 or 0.

Here’s an example of that in use:

resource "aws_guardduty_detector" "Region" {
  count = var.enable_guardduty == "yes" ? 1 : 0
  enable = true
}

resource "aws_cloudwatch_event_rule" "guardduty_finding" {
  count = (var.enable_guardduty == "yes" ? (var.send_guardduty_findings_to_sns == "yes" ? 1 : (var.send_guardduty_findings_to_sqs == "yes" ? 1 : 0)) : 0)
  name = "${data.aws_caller_identity.current.account_id}-${data.aws_region.current.name}-${var.sns_guardduty_finding_suffix}"
  event_pattern = <<PATTERN
{
  "source": [
    "aws.guardduty"
  ],
  "detail-type": [
    "GuardDuty Finding"
  ]
}
PATTERN
}

resource "aws_cloudwatch_event_target" "sns_guardduty_finding" {
  count = (var.enable_guardduty == "yes" ? (var.send_guardduty_findings_to_sns == "yes" ? 1 : 0) : 0)
  rule = aws_cloudwatch_event_rule.guardduty_finding[0].name
  target_id = aws_sns_topic.guardduty_finding[0].name
  arn = aws_sns_topic.guardduty_finding[0].arn
}

resource "aws_cloudwatch_event_target" "sqs_guardduty_finding" {
  count = (var.enable_guardduty == "yes" ? (var.send_guardduty_findings_to_sqs == "yes" ? 1 : 0) : 0)
  rule = aws_cloudwatch_event_rule.guardduty_finding[0].name
  target_id = "SQS"
  arn = aws_sqs_queue.guardduty_finding[0].arn
}

One thing that bit me rather painfully around this was that if you change from an uncounted resource, like this:

resource "some_tool" "this" {
  some_setting = 1
}

To a counted resource, like this:

resource "some_tool" "this" {
  count = var.some_tool == "yes" ? 1 : 0
  some_setting = 1
}

Then, Terraform will promptly destroy some_tool.this to replace it with some_tool.this[0], because they’re not the same referenced thing!

Fun, huh? 😊

Featured image is β€œSydney Observatory I” by β€œNewtown grafitti” on Flickr and is released under a CC-BY license.

"Tracking Methane Sources and Movement Around the Globe" by "NASA/Scientific Visualization Studio" on Nasa.gov

Flexibly loading files in Terraform to license a FortiGate firewall on AWS, Azure and other Cloud platforms

One of the things I’m currently playing with is a project to deploy some FortiGate Firewalls into cloud platforms. I have a couple of Evaluation Licenses I can use (as we’re a partner), but when it comes to automatically scaling, you need to use the PAYG license.

To try to keep my terraform files as reusable as possible, I came up with this work around. It’s likely to be useful in other places too. Enjoy!

This next block is stored in license.tf and basically says “by default, you have no license.”

variable "license_file" {
  default = ""
  description = "Path to the license file to load, or leave blank to use a PAYG license."
}

We can either override this with a command line switch terraform apply -var 'license_file=mylicense.lic', or (more likely) the above override file named license_override.tf (ignored in Git) which has this next block in it:

variable "license_file" {
  default = "mylicense.lic"
}

This next block is also stored in license.tf and says “If var.license is not empty, load that license file [var.license != "" ? var.license] but if it is empty, check whether /dev/null exists (*nix platforms) [fileexists("/dev/null")] in which case, use /dev/null, otherwise use the NUL: device (Windows platforms).”

data "local_file" "license" {
  filename = var.license_file != "" ? var.license_file : fileexists("/dev/null") ? "/dev/null" : "NUL:"
}

πŸ‘‰ Just as an aside, I’ve seen this “ternary” construct in a few languages. It basically looks like this: boolean_operation ? true_value : false_value

That check, logically, could have been written like this instead: "%{if boolean_operation}${true_value}%{else}${false_value}%{endif}"

By combining two of these together, while initially it looks far more messy and hard to parse, I’ve found that, especially in single-line statements, it’s much more compact and eventually easier to read than the alternative if/else/endif structure.

So, this means that we can now refer to data.local_file.license as our data source.

Next, I want to select either the PAYG (Pay As You Go) or BYOL (Bring Your Own License) licensed AMI in AWS (the same principle applies in Azure, GCP, etc), so in this block we provide a different value to the filter in the AMI Data Source, suggesting the string “FortiGate-VM64-AWS *x.y.z*” if we have a value provided license, or “FortiGate-VM64-AWSONDEMAND *x.y.z*” if we don’t.

data "aws_ami" "FortiGate" {
  most_recent = true

  filter {
    name   = "name"
    values = ["FortiGate-VM64-AWS%{if data.local_file.license.content == ""}ONDEMAND%{endif} *${var.release}*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  owners = ["679593333241"] # AWS
}

And the very last thing is to create the user-data template (known as customdata in Azure), using this block:

data "template_cloudinit_config" "config" {
  gzip          = false
  base64_encode = false

  part {
    filename     = "config"
    content_type = "multipart/mixed"
    content      = templatefile(
      "${path.module}/user_data.txt.tmpl",
      {
        hostname = "firewall"
      }
    )
  }

  part {
    filename     = "license"
    content_type = "text/plain"
    content      = data.local_file.license.content
  }
}

And so that is how I can elect to provide a license, or use a pre-licensed image from AWS, and these lessons can also be applied in an Azure or GCP environment too.

Featured image is β€œTracking Methane Sources and Movement Around the Globe” by β€œNASA/Scientific Visualization Studio”

"Accept a New SSH Host Key" by "Linux Screenshots" on Flickr

Purposefully Reducing SSH Security when performing Builds of short-lived devices

I’ve recently been developing a few builds of things at home using throw-away sessions of virtual machines, and I found myself repeatedly having to accept and even having to remove SSH host keys for things I knew wouldn’t be around for long. It’s not a huge disaster, but it’s an annoyance.

This annoyance comes from the fact that SSH uses a thing called “Trust-On-First-Use” (Or TOFU) to protect yourself against a “Man-in-the-Middle” attack (or even where the host has been replaced with something malicious), which, for infrastructure that has a long lifetime (anything more than a couple of days) makes sense! You’re building something you want to trust hasn’t been compromised! That said, if you’re building new virtual machines, testing something and then rebuilding it to prove your script worked… well, that’s not so useful!

So, in this case, if you’ve got a designated build network, or if you trust, implicitly, your normal working network, this is a dead simple work-around.

In $HOME/.ssh/config or in $HOME/.ssh/config.d/local (if you’ve followed my previous advice to use separate ssh config files), add the following stanza:

# RFC1918
Host 10.* 172.16.* 172.17.* 172.18.* 172.19.* 172.20.* 172.21.* 172.22.* 172.23.* 172.24.* 172.25.* 172.26.* 172.27.* 172.28.* 172.29.* 172.30.* 172.31.* 192.168.*
        StrictHostKeyChecking no
        UserKnownHostsFile /dev/null

# RFC5373 and RFC2544
Host 192.0.2.* 198.51.100.* 203.0.113.* 198.18.* 198.19.*
        StrictHostKeyChecking no
        UserKnownHostsFile /dev/null

These stanzas let you disable host key checking for any IP address in the RFC1918 ranges (10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16), and for the RFC5373 ranges (192.0.2.0/24, 198.51.100.0/24 and 203.0.113.0/24) – which should be used for documentation, and for the RFC2544 range (198.18.0.0/15) which should be used for inter-network testing.

Alternatively, if you always use a DDNS provider for short-lived assignments (for example, I use davd/docker-ddns) then instead, you can use this stanza:

Host *.ddns.example.com
        StrictHostKeyChecking no
        UserKnownHostsFile /dev/null

(Assuming, of course, you use ddns.example.com as your DDNS address!)

Featured image is β€œAccept a New SSH Host Key” by β€œLinux Screenshots” on Flickr and is released under a CC-BY license.

"Monitors" by "Jaysin Trevino" on Flickr

Using monit to monitor Docker Containers

As I only run a few machines with services that matter on them (notably, my home server and my web server), I don’t need a full-on monitoring service, so instead rely on a system called monit.

Monit is an open source piece of software, used to monitor (see, it’s easily named πŸ˜„) and, if possible remediate issues with things it sees wrong.

I use this for watching whether particular services are running (and if not, restart them), for whether the ink in my printer is empty, and to monitor the free space and SMART status on my disks.

Today I noticed that a Docker container had stopped, and I’d not noticed. It wasn’t a big thing, but it gnawed at me, so I had a bit of a look around to see what I can find about this.

I found this blog post, titled “Monitoring Docker Containers with Monit”, from 2014, which suggested monitoring the result from docker top… and would you believe it, that’s a valid trick πŸ™‚

So, here’s what I’m doing! Each container has it’s own file called/etc/monit/scripts/check_container_<container-name>.sh which has just this command in it:

#! /bin/bash
docker top "<container-name>"
exit $?

Note that you replace <container-name> in both the filename and the script itself with the name of the container – for example, the container hello-world would be monitored with the file check_container_hello-world.sh, and the line in that file would say docker top "hello-world".

I then have a file in /etc/monit/conf.d/ called check_container_<container-name> which has this content

CHECK PROGRAM <container-name> WITH PATH /etc/monit/scripts/check_container_<container-name>.sh
  START PROGRAM = "/usr/bin/docker start <container-name>"
  STOP PROGRAM = "/usr/bin/docker stop <container-name>"
  IF status != 0 FOR 3 CYCLES THEN RESTART
  IF 2 RESTARTS WITHIN 5 CYCLES THEN UNMONITOR

I then ensure that in /etc/monit/monitrc the line “include /etc/monit/conf.d/*” is included and not commented out, and then restart monit with systemctl restart monit.

Featured image is β€œMonitors” by β€œJaysin Trevino” on Flickr and is released under a CC-BY license.

β€œNew shoes” by β€œMorgaine” from Flickr

Making Windows Cloud-Init Scripts run after a reboot (Using Terraform)

I’m currently building a Proof Of Value (POV) environment for a product, and one of the things I needed in my environment was an Active Directory domain.

To do this in AWS, I had to do the following steps:

  1. Build my Domain Controller
    1. Install Windows
    2. Set the hostname (Reboot)
    3. Promote the machine to being a Domain Controller (Reboot)
    4. Create a domain user
  2. Build my Member Server
    1. Install Windows
    2. Set the hostname (Reboot)
    3. Set the DNS client to point to the Domain Controller
    4. Join the server to the domain (Reboot)

To make this work, I had to find a way to trigger build steps after each reboot. I was working with Windows 2012R2, Windows 2016 and Windows 2019, so the solution had to be cross-version. Fortunately I found this script online! That version was great for Windows 2012R2, but didn’t cover Windows 2016 or later… So let’s break down what I’ve done!

In your userdata field, you need to have two sets of XML strings, as follows:

<persist>true</persist>
<powershell>
$some = "powershell code"
</powershell>

The first block says to Windows 2016+ “keep trying to run this script on each boot” (note that you need to stop it from doing non-relevant stuff on each boot – we’ll get to that in a second!), and the second bit is the PowerShell commands you want it to run. The rest of this now will focus just on the PowerShell block.

  $path= 'HKLM:\Software\UserData'
  
  if(!(Get-Item $Path -ErrorAction SilentlyContinue)) {
    New-Item $Path
    New-ItemProperty -Path $Path -Name RunCount -Value 0 -PropertyType dword
  }
  
  $runCount = Get-ItemProperty -Path $path -Name Runcount -ErrorAction SilentlyContinue | Select-Object -ExpandProperty RunCount
  
  if($runCount -ge 0) {
    switch($runCount) {
      0 {
        $runCount = 1 + [int]$runCount
        Set-ItemProperty -Path $Path -Name RunCount -Value $runCount
        if ($ver -match 2012) {
          #Enable user data
          $EC2SettingsFile = "$env:ProgramFiles\Amazon\Ec2ConfigService\Settings\Config.xml"
          $xml = [xml](Get-Content $EC2SettingsFile)
          $xmlElement = $xml.get_DocumentElement()
          $xmlElementToModify = $xmlElement.Plugins
          
          foreach ($element in $xmlElementToModify.Plugin)
          {
            if ($element.name -eq "Ec2HandleUserData") {
              $element.State="Enabled"
            }
          }
          $xml.Save($EC2SettingsFile)
        }
        $some = "PowerShell Script"
      }
    }
  }

Whew, what a block! Well, again, we can split this up into a couple of bits.

In the first few lines, we build a pointer, a note which says “We got up to here on our previous boots”. We then read that into a variable and find that number and execute any steps in the block with that number. That’s this block:

  $path= 'HKLM:\Software\UserData'
  
  if(!(Get-Item $Path -ErrorAction SilentlyContinue)) {
    New-Item $Path
    New-ItemProperty -Path $Path -Name RunCount -Value 0 -PropertyType dword
  }
  
  $runCount = Get-ItemProperty -Path $path -Name Runcount -ErrorAction SilentlyContinue | Select-Object -ExpandProperty RunCount
  
  if($runCount -ge 0) {
    switch($runCount) {

    }
  }

The next part (and you’ll repeat it for each “number” of reboot steps you need to perform) says “increment the number” then “If this is Windows 2012, remind the userdata handler that the script needs to be run again next boot”. That’s this block:

      0 {
        $runCount = 1 + [int]$runCount
        Set-ItemProperty -Path $Path -Name RunCount -Value $runCount
        if ($ver -match 2012) {
          #Enable user data
          $EC2SettingsFile = "$env:ProgramFiles\Amazon\Ec2ConfigService\Settings\Config.xml"
          $xml = [xml](Get-Content $EC2SettingsFile)
          $xmlElement = $xml.get_DocumentElement()
          $xmlElementToModify = $xmlElement.Plugins
          
          foreach ($element in $xmlElementToModify.Plugin)
          {
            if ($element.name -eq "Ec2HandleUserData") {
              $element.State="Enabled"
            }
          }
          $xml.Save($EC2SettingsFile)
        }
        
      }

In fact, it’s fair to say that in my userdata script, this looks like this:

  $path= 'HKLM:\Software\UserData'
  
  if(!(Get-Item $Path -ErrorAction SilentlyContinue)) {
    New-Item $Path
    New-ItemProperty -Path $Path -Name RunCount -Value 0 -PropertyType dword
  }
  
  $runCount = Get-ItemProperty -Path $path -Name Runcount -ErrorAction SilentlyContinue | Select-Object -ExpandProperty RunCount
  
  if($runCount -ge 0) {
    switch($runCount) {
      0 {
        ${file("templates/step.tmpl")}

        ${templatefile(
          "templates/rename_windows.tmpl",
          {
            hostname = "SomeMachine"
          }
        )}
      }
      1 {
        ${file("templates/step.tmpl")}

        ${templatefile(
          "templates/join_ad.tmpl",
          {
            dns_ipv4 = "192.0.2.1",
            domain_suffix = "ad.mycorp",
            join_account = "ad\someuser",
            join_password = "SomePassw0rd!"
          }
        )}
      }
    }
  }

Then, after each reboot, you need a new block. I have a block to change the computer name, a block to join the machine to the domain, and a block to install an software that I need.

Featured image is β€œNew shoes” by β€œMorgaine” on Flickr and is released under a CC-BY-SA license.