"Blueprints" by "Cameron Degelia" on Flickr

Using Architectural Decision Records (ADR) with adr-tools

Introducing Architectural Decision Records

Over the last week, I discovered a new tool for my arsenal called Architectural Decision Records (ADR). They were first written about in 2011, in a post called “Documenting Architecture Decisions“, where the author, Michael Nygard, advocates for short documents explaining each decision that influences the architecture of an environment.

I found this via a Github repository, created by the team at gov.uk, which includes their ADR library, and references the tool they use to manage these documents – adr-tools.

Installing adr-tools on Linux

Currently adr-tools are easier to install under OSX rather than Linux or Windows Subsystem for Linux (WSL) (I’m working on this – bear with me! 😃 ).

The current installation notes suggest for Linux (which would also work on WSL) is to download the latest release tar.gz or zip file and unpack it into your path somewhere. This isn’t exactly the best way to deploy anything on Linux, but… I guess it works right now?

For me, I downloaded the file, and unpacked the whole tar.gz file (as root) into /usr/local/bin/, giving me a directory of /usr/local/bin/adr-tools-3.0.0/. There’s a subdirectory in here, called src which contains a large number of files – mostly starting _adr or adr- and two additional files, init.md and template.md.

Rather than putting all of these files into /usr/local/bin directly, instead I leave them in the adr-tools-3.0.0 directory, and create a symbolic link (symlink) to the /usr/local/bin directory with this command:

cd /usr/local/bin
ln -s adr-tools-3.0.0/src/* .

This gives me all those files in one place, so I can refer to them later.

An aside – why link everything in that src directory? (Feel free to skip this block!)

Now, why, you might ask, do all of these unrelated files need to be in the same place? Well…. the author of the script has put this in at the top of almost all the files:

#!/bin/bash
set -e
eval "$($(dirname $0)/adr-config)"

And then in that script, it says:

#!/bin/bash
basedir=$(cd -L $(dirname $0) > /dev/null 2>&1 && pwd -L)
echo "adr_bin_dir=$basedir"
echo "adr_template_dir=$basedir"

There are, technically, good reasons for this! This is designed to be run in, what in the Windows world, you might call as a “Portable Script”. So, you bung adr-tools into some directory somewhere, and then just call adr somecommand and it knows that all the files are where they need to be. The (somewhat) down side to this is that if you just want to call adr somecommand rather than path/to/my/adr somecommand then all those files need to be there

I’m currently looking to see if I can improve this somewhat, so that it’s not quite so complex to install, but for now, that’s what you need.

Anyway…

Using adr-tools to document your decisions

I’ll start documenting a fictional hosted web service project, and note down some of the decisions which have been made.

Initializing your ADR directory

Start by running adr init. You may want to specify a directory where you want to put these records, so instead use: adr init path/to/adr, like this:

Initializing the ADR in “documentation/architecture-decisions” with adr init documentation/architecture-decisions

You’ll notice that when I run this command, it creates a new entry, called 0001-record-architecture-decisions.md. Let’s open this up, and see what’s in here.

The VSCode record for the choice to use ADR. It is a markdown file, with the standard types of data recorded.

In here we have the record ID (1.), the title of the record Record architecture decisions, the date the choice was made Date: 2021-01-19, a status of Accepted, the context on why we made this choice, the decision, and the consequences of making this decision. Make changes, if needed, and save it. Let’s move on.

Creating our first own record

This all is quite straightforward thus far. Let’s create our next record.

Issuing the command adr new <sometitle> you create the next ADR record.

Let’s open up that record.

The template for the ADR record for “Use AWS”.

Like the first record, we have a title, a status, a context, decision and consequences. Let’s define these.

A “finished” brief ADR record.

This document shouldn’t be very long! It just describes why a choice was made and what that entails.

Changing decisions – completely replacing (superseding) a decision

Of course, over time, decisions will be replaced due to various decisions elsewhere.

You can ask adr to supersede a previous record, using the “-s” flag, and the record number.

Let’s look at how that works on the second ADR record.

After the command adr new -s 2 Use Azure, the ADR record number 2 has a new status, “Superceded by” and the superseded linked document. Yes, “Superceded” is a typo. There is an open PR for it

So, under the “Status”, where is previously said “Approved”, it now says “Superceded by [3. Use Azure](0003-use-azure.md)“. This is a markdown statement which indicates where the superseded document is located. As I mentioned in the comment below the above image, there is an open Pull Request to fix this on the adr-tools, so hopefully that typo won’t last long!

We’ve got our new ADR too – let’s take a look at that one?

Our new ADR shows that it “supercedes” the previous record. Which is good! Typo aside :)

Other references

Of course, you don’t always completely overrule a decision. Sometimes your decision is influenced by, or has a dependency on something else, like this one.

We know which provider we’re using at long last, now let’s pick a region. Use the -l flag to “link” between the referenced and new ADR. The context for the -l flag is “<number>:<text for link to number>:<text for link in targetted document>”.

The command here is:

adr new -l '3:Dependency:Influences' Use Region UK South and UK West

I’m just going to crop from the “Status” block on both the referenced ADR (3) and the ADR which references it (4):

Status block in ADR 0003 which is referenced by ADR 0004
Status block in the new ADR 0004 which references ADR 0003

And of course, you can also use the same switch to mark documents as partially obsoleted, like this:

adr new -l '4:Partially obsoletes:Partially obsoleted by' Use West Europe region instead of UK West region
Status block in ADR 0004 indicating it’s partially obsoleted. Probably worth updating the status properly to show it’s not just “Accepted”.

If you forget to add the referencing in, you can also use the adr link command, like this:

adr link 3 Influences 5 Dependency

To be clear, that command adds a (complete) line to ADR 0003 saying “Influences [5. ADR Title](link)” and a separate (complete) line to ADR 0005 saying “Dependency [3. ADR Title](link)“.

What else can we do?

There are four other “things” that it’s worth doing at this point.

  1. Note that you can change the template per-ADR directory.

Create a directory called “templates” in the ADR directory, and put a file in there called “template.md“. Tweak this as you need. Ensure you have AT LEAST the line ## Status and # NUMBER. TITLE as these are required by the script.

A much abbreviated template file, containing just “Number”, “Title”, “Date”, “Status”, and a new dummy heading called “Stuff”.
And the result of running adr new Some Text once you’ve created that template.

As you can see, it’s possible to add all sorts of content in this template as a result. Bear in mind, before your template turns into something like this, that it’s supposed to be a short document explaining why each decision was made, not a funding proposal, or a complex epic of your user stories!

Be careful not to let your template run away with you!
  1. Note that you can automatically open an editor, by setting the EDITOR (where the process is expected to finish before returning control, like using nano, emacs or vim, for example) or VISUAL (where the process is expected to “fork”, like for example, gedit or vscode) environment variable, and then running adr new A Title, like this:
  1. We can create “Table of Contents” files, using the adr generate toc command, like this:
Generating the table of contents, for injecting into other files.

This can be included into your various other markdown files. There are switches, so you can set the link path, but your best bet is to find that using adr help generate toc.

  1. We can also generate graphviz files of the link maps between elements of the various ADRs, like this: adr generate graph | dot -Tjpg > graph.jpg

If you omit the “| dot -Tjpg > graph.jpg” part, then you’ll see the graphviz output, which looks like this: (I’ve removed the documents 6 and 7).

digraph {
  node [shape=plaintext];
  subgraph {
    _1 [label="1. Record architecture decisions"; URL="0001-record-architecture-decisions.html"];
    _2 [label="2. Use AWS"; URL="0002-use-aws.html"];
    _1 -> _2 [style="dotted", weight=1];
    _3 [label="3. Use Azure"; URL="0003-use-azure.html"];
    _2 -> _3 [style="dotted", weight=1];
    _4 [label="4. Use Region UK South and UK West"; URL="0004-use-region-uk-south-and-uk-west.html"];
    _3 -> _4 [style="dotted", weight=1];
    _5 [label="5. Use West Europe region instead of UK West region"; URL="0005-use-west-europe-region-instead-of-uk-west-region.html"];
    _4 -> _5 [style="dotted", weight=1];
  }
  _3 -> _2 [label="Supercedes", weight=0]
  _3 -> _5 [label="Influences", weight=0]
  _4 -> _3 [label="Dependency", weight=0]
  _5 -> _4 [label="Partially obsoletes", weight=0]
  _5 -> _3 [label="Dependency", weight=0]
}

To make the graphviz part work, you’ll need to install graphviz, which is just an apt get away.

Any caveats?

adr-tools is not actively maintained. I’ve contacted the author, about seeing if I can help out with the maintenance, but… we’ll see, and given some fairly high profile malware takeovers of projects like this sort of thing on Github, Docker, NPM, and more… I can see why there might be some reluctance to consider it! Also, I’m an unknown entity, I’ve just dropped in on the project and offered to help, with no previous exposure to the lead dev or the project… so, we’ll see. Worst case, I’ll fork it!

Working with this also requires an understanding of markdown files, and why these might be a useful document format for records like this. There was a PR submitted to support multiple file formats (like asciidoc and rst) but these were not approved by the author.

There is no current intention to support languages other than English. The tool is hard-coded to look for strings like “status” and “superceded” which is hard. Part of the reason I raised the PRs I did was to let me fix some of these sorts of issues. Again, we’ll see what happens.

Lastly, it can be overwhelming to see a lot of documents in one place, particularly if they’re as granular as the documents I produced in this demo. If the project supported categories, or could be broken down into components (like doc/adr/networking and doc/adr/server_builds and doc/adr/applications) then this might help, but it’s not on the roadmap right now!

Featured image is “Blueprints” by “Cameron Degelia” on Flickr and is released under a CC-BY license.

"map" by "Jason Grote" on Flickr

Documenting my Career Path

For something internal at work, I decided to sketch out how I got to doing the job I do today. And, because there’s nothing hugely secretive in that document (or, at least, nothing you wouldn’t already find out on something like Linked In), I figured I’d also put this on my blog… and I think it might be interesting if you’ve written something similar, if you’d share your document too.

I intend to make that a “Living Document” (like I do with my “What am I doing now” and my “What do I use” pages) that I update every time I think about it, and think they need a tweak. So, as a result, I’ve put them over on my “Career Path” page, which is not a traditional “blog post” and is in my sidebar.

Featured image is “map” by “Jason Grote” on Flickr and is released under a CC-BY-SA license.

"DNA" by "MIKI Yoshihito" on Flickr

One to read: Reverse Engineering the source code of the BioNTech/Pfizer SARS-CoV-2 Vaccine – Articles

One to read: “Reverse Engineering the source code of the BioNTech/Pfizer SARS-CoV-2 Vaccine – Articles”

OMG this is amazing! It uses terminology that a coder or, for that matter, anyone who understands IT terminology will understand. It describes the headers for the RNA sequence as starting with [~]”the equivalent of the #! in shell scripts”

At the end of the post is a couple of links to other content I’m considering reviewing later… “DNA for programmers” and the “two hour [(!)] presentation on DNA […] aimed at computer people”. Maybe once I get some time to spare I’ll take a proper look at these.

This was automatically posted from my RSS Reader, and may be edited later to add commentary.

Featured image is “DNA” by “MIKI Yoshihito” on Flickr and is released under a CC-BY license.

"Raven" by "Jim Bahn" on Flickr

Sending SSH login notifications to Matrix via Huginn using Webhooks

On the Self Hosted Podcast’s Discord Server, someone posted a link to the following blog post, which I read and found really interesting…: https://blog.hay-kot.dev/ssh-notifications-in-home-assistant/

You see, the key part of that post wasn’t that they were posting to Home Assistant when they were logging in, but instead that they were triggering a webhook on login. And I can do stuff with Webhooks.

What’s a webhook?

A webhook is a callable URL, either with a “secret” embedded in the URL or some authentication header that lets you trigger an action of some sort. I first came across these with Github, but they’re pretty common now. Services will offer these as a way to get an action in one service to do something in another. A fairly common webhook for those getting started with these sorts of things is where creating a pull request (PR) on a Github repository will trigger a message on something like Slack to say the PR is there.

Well, that’s all well and good, but what does Matrix or Huginn have to do with things?

Matrix is a decentralized, end to end encrypted, eventually consistent database system, that just happens to be used extensively as a chat network. In particular, it’s used by Open Source projects, like KDE and Mozilla, and by Government bodies, like the whole French goverment (lead by DINSIC) the German Bundeswehr (Unified Armed Forces) department.

Matrix has a reference client, Element, that was previously called “Riot”, and in 2018 I produced a YouTube video showing how to bridge various alternative messaging systems into Matrix.

Huginn describes itself as:

Huginn is a system for building agents that perform automated tasks for you online. They can read the web, watch for events, and take actions on your behalf. Huginn’s Agents create and consume events, propagating them along a directed graph. Think of it as a hackable version of IFTTT or Zapier on your own server. You always know who has your data. You do.

Huginn Readme

With Huginn, I can create “agents”, including a “receive webhook agent” that will take the content I send, and tweak it to do something else. In the past I used IFTTT to do some fun things, like making this blog work, but now I use Huginn to post Tweets when I post to this blog.

So that I knew that Huginn was posting my twitter posts, I created a Matrix room called “Huginn Alerts” and used the Matrix account I created for the video I mentioned before to send me messages that it had made the posts I wanted. I followed the guidance from this page to do it: https://drwho.virtadpt.net/archive/2020-02-12/integrating-huginn-with-a-matrix-server/

Enough already. Just show me what you did.

In Element.io

  1. Get an access token for the Matrix account you want to post with.

Log into the web interface at https://app.element.io and go to your settings

Click where it says your handle, then click on where it says “All Settings”.

Then click on “Help & About” and scroll to the bottom of that page, where it says “Advanced”

Get to the “Advanced” part of the settings, under “Help & About” to get your access token.

Click where it says “Access Token: <click to reveal>” (strangely, I’m not posting that 😉)

  1. Click on the room, then click on it’s name at the top to open the settings, then click on “Advanced” to get the “Internal room ID”
Gettng the Room ID. Note, it starts with an exclamation mark (!) and ends :<servername>.

In Huginn

  1. Go to the “Credentials” tab, and click on “New Credential”. Give the credential a name (perhaps “Matrix Bot Access Token”), leave it as text and put your access token in here.
  1. Create a Credential for the Room ID. Like before, name it something sensible and put the ID you found earlier.
  1. Create a “Post Agent” by going to Agents and selecting “New agent”. This will show just the “Type” box. You can type in this box to put “Post Agent” and then find it. That will then provide you with the rest of these boxes. Provide a name, and tick the box marked “Propagate immediately”. I’ll cover the content of the “Options” box after this screenshot.

In the “Options” block is a button marked “Toggle View”. Select this which turns it from the above JSON pretty editor, into this text field (note your text is likely to be different):

My content of that box is as follows:

{
  "post_url": "https://matrix.org/_matrix/client/r0/rooms/{% credential Personal_Matrix_Notification_Channel %}/send/m.room.message?access_token={% credential Matrix_Bot_Access_Credential %}",
  "expected_receive_period_in_days": "365",
  "content_type": "json",
  "method": "post",
  "payload": {
    "msgtype": "m.text",
    "body": "{{ text }}"
  },
  "emit_events": "true",
  "no_merge": "false",
  "output_mode": "clean"
}

Note that the “post_url” value contains two “credential” values, like this:

{% credential Personal_Matrix_Notification_Channel %} (this is the Room ID we found earlier) and {% credential Matrix_Bot_Access_Credential %} (this is the Access Token we found earlier).

If you’ve used different names for these values (which are perfectly valid!) then just change these two. The part where it says “{{ text }}” leave there, because we’ll be using that in a later section. Click “Save” (the blue button at the bottom).

  1. Create a Webhook Agent. Go to Agents and then “New Agent”. Select “Webhook Agent” from the “Type” field. Give it a name, like “SSH Logged In Notification Agent”. Set “Keep Events” to a reasonable number of days, like 5. In “Receivers” find the Notification agent you created (“Send Matrix Message to Notification Room” was the name I used). Then, in the screenshot, I’ve pressed the “Toggle View” button on the “Options” section, as this is, to me a little clearer.

The content of the “options” box is:

{
  "secret": "supersecretstring",
  "expected_receive_period_in_days": 365,
  "payload_path": ".",
  "response": ""
}

Change the “secret” from “supersecretstring” to something a bit more useful and secure.

The “Expected Receive Period in Days” basically means, if you’ve not had an event cross this item in X number of days, does Huginn think this agent is broken? And the payload path of “.” basically means “pass everything to the next agent”.

Once you’ve completed this step, press “Save” which will take you back to your agents, and then go into the agent again. This will show you a page like this:

Copy that URL, because you’ll need it later…

On the server you are logging the SSH to

As root, create a file called /etc/ssh/sshrc. This file will be your script that will run every time someone logs in. It must have the file permissions 0644 (u+rw,g+r,o+r), which means that there is a slight risk that the Webhook secret is exposed.

The content of that file is as follows:

#!/bin/sh
ip="$(echo "$SSH_CONNECTION" | cut -d " " -f 1)"
curl --silent\
     --header "Content-Type: application/json"\
     --request POST\
     --data '{
       "At": "'"$(date -Is)"'",
       "Connection": "'"$SSH_CONNECTION"'",
       "User": "'"$USER"'",
       "Host": "'"$(hostname)"'",
       "Src": "'"$ip"'",
       "text": "'"$USER@$(hostname) logged in from $ip at $(date +%H:%M:%S)"'"
     }'\
     https://my.huginn.website/some/path/web_requests/taskid/secret

The heading line (#!/bin/sh) is more there for shellcheck, as, according to the SSH man page this is executed by /bin/sh either way.

The bulk of these values (At, Connection, User, Host or Src) are not actually used by Huginn, but might be useful for later… the key one is text, which if you recall from the “Send Matrix Message to Notification Room” Huginn agent, we put {{ text }} into the “options” block – that’s this block here!

So what happens when we log in over SSH?

SSH asks the shell in the user’s context to execute /etc/ssh/sshrc before it hands over to the user’s login session. This script calls curl and hands some POST data to the url.

Huginn receives this POST via the “SSH Logged In Notification Agent”, and files it.

Huginn then hands that off to the “Send Matrix Message to Notification Room”:

Huginn makes a POST to the Matrix.org server, and Matrix sends the finished message to all the attached clients.

Featured image is “Raven” by “Jim Bahn” on Flickr and is released under a CC-BY license.

"The Guitar Template" by "Neil Williamson" on Flickr

Testing (and failing inline) for data types in Ansible

I tend to write long and overly complicated set_fact statements in Ansible, ALL THE DAMN TIME. I write stuff like this:

rulebase: |
  {
    {% for var in vars | dict2items %}
      {% if var.key | regex_search(regex_rulebase_match) | type_debug != "NoneType"
        and (
          var.value | type_debug == "dict" 
          or var.value | type_debug == "AnsibleMapping"
        ) %}
        {% for item in var.value | dict2items %}
          {% if item.key | regex_search(regex_rulebase_match) | type_debug != "NoneType"
            and (
              item.value | type_debug == "dict" 
              or item.value | type_debug == "AnsibleMapping"
            ) %}
            "{{ var.key | regex_replace(regex_rulebase_match, '\2') }}{{ item.key | regex_replace(regex_rulebase_match, '\2') }}": {
              {# This block is used for rulegroup level options #}
              {% for key in ['log_from_start', 'log', 'status', 'nat', 'natpool', 'schedule', 'ips_enable', 'ssl_ssh_profile', 'ips_sensor'] %}
                {% if var.value[key] is defined and rule.value[key] is not defined %}
                  {% if var.value[key] | type_debug in ['string', 'AnsibleUnicode'] %}
                    "{{ key }}": "{{ var.value[key] }}",
                  {% else %}
                    "{{ key }}": {{ var.value[key] }},
                  {% endif %}
                {% endif %}
              {% endfor %}
              {% for rule in item.value | dict2items %}
                {% if rule.key in ['sources', 'destinations', 'services', 'src_internet_service', 'dst_internet_service'] and rule.value | type_debug not in ['list', 'AnsibleSequence'] %}
                  "{{ rule.key }}": ["{{ rule.value }}"],
                {% elif rule.value | type_debug in ['string', 'AnsibleUnicode'] %}
                  "{{ rule.key }}": "{{ rule.value }}",
                {% else %}
                  "{{ rule.key }}": {{ rule.value }},
                {% endif %}
              {% endfor %}
            },
          {% endif %}
        {% endfor %}
      {% endif %}
    {% endfor %}
  }

Now, if you’re writing set_fact or vars like this a lot, what you tend to end up with is the dreaded dict2items requires a dictionary, got instead. which basically means “Hah! You wrote a giant blob of what you thought was JSON, but didn’t render right, so we cast it to a string for you!”

The way I usually write my playbooks, I’ll do something with this set_fact at line, let’s say, 10, and then use it at line, let’s say, 500… So, I don’t know what the bloomin’ thing looks like then!

So, how to get around that? Well, you could do a type check. In fact, I wrote a bloomin’ big blog post explaining just how to do that!

However, that gets unwieldy really quickly, and what I actually wanted to do was to throw the breaks on as soon as I’d created an invalid data type. So, to do that, I created a collection of functions which helped me with my current project, and they look a bit like this one, called “is_a_string.yml“:

- name: Type Check - is_a_string
  assert:
    quiet: yes
    that:
    - vars[this_key] is not boolean
    - vars[this_key] is not number
    - vars[this_key] | int | string != vars[this_key] | string
    - vars[this_key] | float | string != vars[this_key] | string
    - vars[this_key] is string
    - vars[this_key] is not mapping
    - vars[this_key] is iterable
    success_msg: "{{ this_key }} is a string"
    fail_msg: |-
      {{ this_key }} should be a string, and is instead
      {%- if vars[this_key] is not defined %} undefined
      {%- else %} {{ vars[this_key] is boolean | ternary(
        'a boolean',
        (vars[this_key] | int | string == vars[this_key] | string) | ternary(
          'an integer',
          (vars[this_key] | float | string == vars[this_key] | string) | ternary(
            'a float',
            vars[this_key] is string | ternary(
              'a string',
              vars[this_key] is mapping | ternary(
                'a dict',
                vars[this_key] is iterable | ternary(
                  'a list',
                  'unknown (' ~ vars[this_key] | type_debug ~ ')'
                )
              )
            )
          )
        )
      )}}{% endif %} - {{ vars[this_key] | default('unset') }}

To trigger this, I do the following:

- hosts: localhost
  gather_facts: false
  vars:
    SomeString: abc123
    SomeDict: {'somekey': 'somevalue'}
    SomeList: ['somevalue']
    SomeInteger: 12
    SomeFloat: 12.0
    SomeBoolean: false
  tasks:
  - name: Type Check - SomeString
    vars:
      this_key: SomeString
    include_tasks: tasks/type_check/is_a_string.yml
  - name: Type Check - SomeDict
    vars:
      this_key: SomeDict
    include_tasks: tasks/type_check/is_a_dict.yml
  - name: Type Check - SomeList
    vars:
      this_key: SomeList
    include_tasks: tasks/type_check/is_a_list.yml
  - name: Type Check - SomeInteger
    vars:
      this_key: SomeInteger
    include_tasks: tasks/type_check/is_an_integer.yml
  - name: Type Check - SomeFloat
    vars:
      this_key: SomeFloat
    include_tasks: tasks/type_check/is_a_float.yml
  - name: Type Check - SomeBoolean
    vars:
      this_key: SomeBoolean
    include_tasks: tasks/type_check/is_a_boolean.yml

I hope this helps you, bold traveller with complex jinja2 templating requirements!

(Oh, and if you get “template error while templating string: no test named 'boolean'“, you’re probably running Ansible which you installed using apt from Ubuntu Universe, version 2.9.6+dfsg-1 [or, at least I was!] – to fix this, use pip to install a more recent version – preferably using virtualenv first!)

Featured image is “The Guitar Template” by “Neil Williamson” on Flickr and is released under a CC-BY-SA license.

'Geocache "Goodies"' by 'sk' on Flickr

Caching online data sources in Ansible for later development or testing

My current Ansible project relies on me collecting a lot of data from AWS and then checking it again later, to see if something has changed.

This is great for one-off tests (e.g. terraform destroy ; terraform apply ; ansible-playbook run.yml) but isn’t great for repetitive tests, especially if you have to collect data that may take many minutes to run all the actions, or if you have slow or unreliable internet in your development environment.

To get around this, I wrote a wrapper for caching this data.

At the top of my playbook, run.yml, I have these tasks:

- name: Set Online Status.
  # This stores the value of run_online, unless run_online
  # is not set, in which case, it defines it as "true".
  ansible.builtin.set_fact:
    run_online: |-
      {{- run_online | default(true) | bool -}}

- name: Create cache_data path.
  # This creates a "cached_data" directory in the same
  # path as the playbook.
  when: run_online | bool and cache_data | default(false) | bool
  delegate_to: localhost
  run_once: true
  file:
    path: "cached_data"
    state: directory
    mode: 0755

- name: Create cache_data for host.
  # This creates a directory under "cached_data" in the same
  # path as the playbook, with the name of each of the inventory
  # items.
  when: run_online | bool and cache_data | default(false) | bool
  delegate_to: localhost
  file:
    path: "cached_data/{{ inventory_hostname }}"
    state: directory
    mode: 0755

Running this sets up an expectation for the normal operation of the playbook, that it will be “online”, by default.

Then, every time I need to call something “online”, for example, collect EC2 Instance Data (using the community.aws.ec2_instance_info module), I call out to (something like) this set of tasks, instead of just calling the task by itself.

- name: List all EC2 instances in the regions of interest.
  when: run_online | bool
  community.aws.ec2_instance_info:
    region: "{{ item.region_name }}"
  loop: "{{ regions }}"
  loop_control:
    label: "{{ item.region_name }}"
  register: regional_ec2

- name: "NOTE: Set regional_ec2 data path"
  when: not run_online | bool or cache_data | default(false) | bool
  set_fact:
    regional_ec2_cached_data_file_loop: "{{ regional_ec2_cached_data_file_loop | default(0) | int + 1 }}"
    cached_data_filename: "cached_data/{{ inventory_hostname }}/{{ cached_data_file | default('regional_ec2') }}.{{ regional_ec2_cached_data_file_loop | default(0) | int + 1 }}.json"

- name: "NOTE: Cache/Get regional_ec2 data path"
  when: not run_online | bool or cache_data | default(false) | bool
  debug:
    msg: "File: {{ cached_data_filename }}"

- name: Cache all EC2 instances in the regions of interest.
  when: run_online | bool and cache_data | default(false) | bool
  delegate_to: localhost
  copy:
    dest: "{{ cached_data_filename }}"
    mode: "0644"
    content: "{{ regional_ec2 }}"

- name: "OFFLINE: Load all EC2 instances in the regions of interest."
  when: not run_online | bool
  set_fact:
    regional_ec2: "{% include( cached_data_filename ) %}"

The first task, if it’s still set to being “online” will execute the task, and registers the result for later. If cache_data is configured, we generate a filename for the caching, record the filename to the log (via the debug task) and then store it (using the copy task). So far, so online… but what happens when we don’t need the instance to be up and running?

In that case, we use the set_fact module, triggered by running the playbook like this: ansible-playbook run.yml -e run_online=false. This reads the cached data out of that locally stored pool of data for later use.

Featured image is ‘Geocache “Goodies”‘ by ‘sk‘ on Flickr and is released under a CC-BY-ND license.

My Fujitsu Stylistic V727

Review of my Fujitsu Stylistic V727 Laptop/Tablet.

TL;DR: Linux is usually awesome, but it doesn’t work for my niche case.

Why was I in the market for a new computer?

October 2019 my beloved (but 7 year old) Acer V5-171, “Minilith” (so named because it was smaller than it’s predecessor, a 17″ monster of a black slab that was named “Monolith”) started exhibiting signs of having a dead battery. I replaced the battery with an 3rd party replacement, and while it charged OK for a few runs, it stopped charging all together (I could get a maximum of 5% charge), so I put the old battery in, and it started working better. Huzzah. All was going well until around 6 months ago when the hard drive failed, so I replaced it with an SSD, and that gave it a new lease of life… and this month, well, it just wouldn’t boot reliably. I finally decided that it was time to let it go and play with Timmy the dog at the farm, and replace it with something newer.

The back of "Minilith", my 7 year old Laptop.
The back of “Minilith”, my 7 year old Laptop.
Minilith's Keyboard and Screen
Minilith’s Keyboard and Screen

Fortunately, this co-incided with a small win on the company social Prize Draw of a reasonable sized pay out, enough to consider looking at the Ex-Demo staff sales list made available to me by dint of my employer.

Making my choice.

There weren’t a lot of options, to be fair, but one item stood out to me. A Stylistic Tablet Computer. I’d previously had an Asus Transformer TF300T – a tablet-come-computer which had a detachable keyboard. I’d loved that, even though it didn’t really do what I wanted it for (and, I think I’d paid quite a bit over what it was worth, really)… but what I really wanted to do was have a tablet I could use for computing… Hence, the Stylistic.

Image of the TF300T, a tablet model I’d previously owned, from the Asus Transformer Marketing Pages
Der obere Teil des Stylistic V727 ist ein Tablet.
Image of the tablet view of the Stylistic V727 computer from a German blog.

Fujitsu are in a bit of an odd place, at least in the UK (I’ve not looked elsewhere) for personal computers – we sell quite well (apparently) to business, but we moved out of the “selling to the public” market probably around 2010, and so it was pretty hard to gauge how well this laptop performed. Oh, and of course, being a “Linux Enthusiast”, I wanted to be able to run Ubuntu, Fedora or others on it.

Because it was an internal sale, and I wanted to test Ubuntu on it before I bought it, I was able to get the sales team to let me evaluate it before I bought it.

It arrives!

It arrived as the tablet and keyboard, with a dock for setting it on your desk. I tested it with Windows, where the dock worked well, but the keyboard by itself didn’t so much. You see, the keyboard is an optional accessory, and had been sold with the laptop, all good thus far. Except what you also need to get, when you get the keyboard, is the case. The case gives you the sturdy back to give the “laptop” a frame. It’s basically the hinge that the top-heavy screen needs to keep itself upright.

A screen capture of the Fujitsu Stylistic V727 from the datasheet. Note this image shows the optional keyboard and the optional case.

The sales team were very understanding, and found a case to ship to me as well, but it wouldn’t come for a few days, so I was left to try out the rest of the hardware.

What do you get for your money?

The processor is an Intel i5-7Y57 dual core CPU with four threads, running at 1.2GHz.

It has 8GB of RAM and a 256GB m2 SATA drive.

The 12.3″ touch-or-pen (included) enabled screen has a maximum resolution of 1920×1280 pixels. The surface of the tablet is considered a WACOM tablet, and the pen can be sensed from a reasonable distance away. There are two buttons on the side of the pen, which turns a tap on the screen from a “normal” left click to a right click, or a center click.

On the rear is a fingerprint sensor.

From a network perspective, the WIFI supports 802.11ac, Bluetooth 4.2, and under the battery there is a LTE module onboard (although, I’ve not tested that).

On the side is a USB3.1 A connector, and a USB C connector (the specs sheet I linked to above suggests there is a single USB 3 and a USB 2 interface, but I doubt the USB C is USB 2).

There is also a MicroSD slot, which is detected by the booted OS as an MMC device, but it is not detected as a bootable device.

There’s a combined 3.5mm audio in and out jack, which I’ve not tried and a power socket.

There are two cameras, a 5 megapixel front-facing camera and an 8 megapixel rear facing camera with a flash.

The detachable keyboard has an integrated touchpad. It’s all good, and compared to my poor Acer V5, it’s a massive step up ❤

When you add in the desk dock (where, to be fair, it’s spent most of it’s time since I got it), the connections also then include Gigabit Ethernet, a Display Port interface, a VGA port and three USB 3 A interfaces, and a power socket.

The OS Comparison starts

Windows first

I booted it in Windows, and found it really rather responsive, especially once I’d reinstalled Windows without all the customizations the demo team had put on…

My previous install of Windows on Minilith had been the Home edition, and I’d found the semi-constant nagging to install games and the like rather annoying. I’ve had a couple of Windows 10 Professional installs at work, and, while those builds came with their own fair share of corporately mandated bloat (after all, their threat models are somewhat different to mine) they usually felt more slim than the Windows 10 home install I’d had, so when I saw this had Windows 10 Professional, I was looking forward to seeing something a bit leaner… and I wasn’t let down. All the hardware worked fine, I had the fingerprint reader working, no worries and the dock was great.

Docking and undocking is relatively seamless, although the first try was a bit tricky, I’ve got used to it. I had two screens attached via my work-supplied Fujitsu PR08 DisplayLink adaptor, plugged through the dock, and again, that all worked fine.

I could use the pen in the tablet mode really well. It makes selecting items on the screen easy, and if you don’t want to use the virtual keyboard, in some cases, it pops up a handwriting recognition box, although the time I showed this to my wife (where I’d been using it successfully for some time), it didn’t recognise half the words I wrote… but I’m sure that’s just my dreadful scrawl, and not the tablet’s fault!

Even using the tablet without the pen worked really well. Tapping the screen is a left click, and a long press on an area is a right click, similar to how Android handles left-and-right clicks in RDP and VNC sessions. The keyboard has several “modes” – a reduced character set, a thumb typing set or a full keyboard. The reduced set has a control key and an escape key, but no alt, windows or arrow keys. I didn’t try the thumb typing set (this thing is 12″ across!) but the full keyboard is an “ISO layout” 75% keyboard (I discovered by matching the image to this website!) which means I still get my Control, Alt, and Windows keys.

Next, Ubuntu

I booted from a USB stick that had the Ubuntu 20.04 installer on it. Ubuntu booted fine, allowed me to repartition the Windows partition into approximately half the drive, and install away. During the install, I was asked to provide a password to setup the SecureBoot keys, and instructed that it’d prompt me for it on the next reboot. Most of the hardware worked fine. Dock, keyboard, Wifi, Bluetooth… all good. The fingerprint sensor wasn’t detected, and still isn’t, but I’m OK with that, it was always just a nice-to-have. The install worked fine, and yes, on reboot, I got a blue screen asking me to set up my “MOK” (which, I guessed eventually, was the SecureBoot setup). I realised that the SecureBoot install stage of the Linux install copies a private key to the UEFI space, and on the next boot, it spots there’s a key there and asks you to unlock that private key, so it can install it into the boot keys. All good!

I was working away on it with the tablet in the dock. I tried using it with the detachable keyboard, but it was a bit tricky to use without the rigid back, so I kept it in the dock. The pen works a treat too.

The problem came when I tried to use it as a tablet.

You see, where Windows has a selection of keyboard layouts for their “On Screen Keyboard” (OSK), the Gnome one only lets you use this layout:

GNOME 3.28 OSK
Screenshot taken from an article at OMG Ubuntu.

While this is passable for tapping stuff into a URL bar in your browser, entering a password for logging in, or typing simple statements into dialogue boxes, there are some key things it’s missing. The first (for me) is a Control, Alt or Super (Windows) key. This means I can’t do any programming, of any sort, in Tablet mode. Note, this just works on Windows, and is possible on Android with an extended keyboard called “Hacker Keyboard”. There are also no cursor keys, which seems like it’s less important, but it makes editing things you’ve typed (or mistyped) MUCH harder.

“Well, OK then, let’s have a look around and see what our options are?”

I’d heard good things about “OnBoard”, a predecessor to Gnome OSK, but because OSK is registered as “The” on screen keyboard, and runs as a system process, and OnBoard is a user process, Gnome OSK pops up any time you want to do on screen keyboard things, even if you’ve got OnBoard loaded. Ahah! I found an extension which blocks Gnome OSK… except that stops it from being able to be used for logging in.

You see, that whole “system” versus “user” process thing I mentioned before. The Gnome lock screen is considered a system process, not a user one, which means that if you’ve disabled Gnome OSK, then you can’t put your password in, but equally, if you’re typing in a box with OnBoard, change focus and change back again, up pops Gnome OSK.

Breaking down and turning it around.

I should confess, I didn’t spend a lot of time wondering about this. I booted a Kubuntu environment instead, and found that this really didn’t work for me either (although I now don’t remember what stopped me from liking it – I might have to revisit this!) By this point, I’d spent several hours “messing” around with this, and I just wanted to give something a try. So I booted back into Windows.

I gave the on screen keyboard another try. It worked great. I tried doing some sketches in Paint 3D (the replacement for Windows Paint) with the pen, and it was very easy (so much so, I need to work out how to use it for my next design call with work!)

All the familiar tools I use in my work or personal environment are there.

  • VSCode. Check.
  • A usable shell (via Windows Subsystem for Linux). Check.
  • File synchronization (via Syncthing). Check.
  • Web browser (Firefox). Check.
  • Audio recording software (Audacity). Check.
  • Image editor (GIMP). Check.
  • Voice chat for the podcast software (Mumble). Check.
  • Screencasting software (OBS). Check.
  • Virtual Machine software (VirtualBox, Vagrant, Terraform). Check.

And the fingerprint reader works… so I stuck with Windows 10.

The only last catch, whether it was Windows or Linux? There’s no HDMI or VGA out without the dock… so I need to start looking into “cheap” display adaptors that I can use for presenting things, whenever we get back to “normal” and I can start attending and speaking at conferences again.

What about the case?

Oh yehr, so a few days after I get the computer, the case turns up. It attaches to the back of the computer with tape, and feels like leather (although, I’m sure it’s not leather). It definitely makes it feel like a “quality” product 😀. It’s a little bit more tricky to drop into the dock, but it makes it feel like a Laptop when you’re using it like one. The detachable keyboard is interesting. I’ve used it in the car, waiting for children to finish activities, and it’s fine, because it goes flat. I’ve detached the keyboard from the screen to just do tablet-y things with it, and that’s fine too.

So in summary

I think if I didn’t want it to be a tablet as much as a computer, I’d have been fine.

If you want a Windows Tablet that turns into a Laptop, it’s fine. If you want a low-profile desktop computer (in a dock) that can become a laptop, it’s fine.

But until Gnome or one of the other flavours gets a handle on how to do a reasonable on-screen keyboard… I don’t think I’ll be using Linux on here (because it’s also a tablet) for the next few months… and I think that’s going to be OK.

All of that said, if you use any Linux distributions with a tablet style mode, and you’ve got a working OSK, please contact me (via one of the links at the top of the site) to let me know what and how you did it, and I’ll give it a try too!

"Main console" by "Steve Parker" on Flickr

Running services (like SSH, nginx, etc) on Windows Subsystem for Linux (WSL1) on boot

I recently got a new laptop, and for various reasons, I’m going to be primarily running Windows on that laptop. However, I still like having a working SSH server, running in the context of my Windows Subsystem for Linux (WSL) environment.

Initially, trying to run service ssh start failed with an error, because you need to re-execute the ssh configuration steps which are missed in a WSL environment. To fix that, run sudo apt install --reinstall openssh-server.

Once you know your service runs OK, you start digging around to find out how to start it on boot, and you’ll see lots of people saying things like “Just run a shell script that starts your first service, and then another shell script for the next service.”

Well, the frustration for me is that Linux already has this capability – the current popular version is called SystemD, but a slightly older variant is still knocking around in modern linux distributions, and it’s called SystemV Init, often referred to as just “sysv” or “init.d”.

The way that those services work is that you have an “init” file in /etc/init.d and then those files have a symbolic link into a “runlevel” directory, for example /etc/rc3.d. Each symbolic link is named S##service or K##service, where the ## represents the order in which it’s to be launched. The SSH Daemon, for example, that I want to run is created in there as /etc/rc3.d/S01ssh.

So, how do I make this work in the grander scheme of WSL? I can’t use SystemD, where I could say systemctl enable --now ssh, instead I need to add a (yes, I know) shell script, which looks in my desired runlevel directory. Runlevel 3 is the level at which network services have started, hence using that one. If I was trying to set up a graphical desktop, I’d instead be looking to use Runlevel 5, but the X Windows system isn’t ported to Windows like that yet… Anyway.

Because the rc#.d directory already has this structure for ordering and naming services to load, I can just step over this directory looking for files which match or do not match the naming convention, and I do that with this script:

#! /bin/bash
function run_rc() {
  base="$(basename "$1")"
  if [[ ${base:0:1} == "S" ]]
  then
    "$1" start
  else
    "$1" stop
  fi
}

if [ "$1" != "" ] && [ -e "$1" ]
then
  run_rc "$1"
else
  rc=3
  if [ "$1" != "" ] && [ -e "/etc/rc${$1}.d/" ]
  then
    rc="$1"
  fi
  for digit1 in {0..9}
  do
    for digit2 in {0..9}
    do
      find "/etc/rc${rc}.d/" -name "[SK]${digit1}${digit2}*" -exec "$0" '{}' \; 2>/dev/null
    done
  done
fi

I’ve put this script in /opt/wsl_init.sh

This does a bit of trickery, but basically runs the bottom block first. It loops over the digits 0 to 9 twice (giving you 00, 01, 02 and so on up to 99) and looks in /etc/rc3.d for any file containing the filename starting S or K and then with the two digits you’ve looped to by that point. Finally, it runs itself again, passing the name of the file it just found, and this is where the top block comes in.

In the top block we look at the “basename” – the part of the path supplied, without any prefixed directories attached, and then extract just the first character (that’s the ${base:0:1} part) to see whether it’s an “S” or anything else. If it’s an S (which everything there is likely to be), it executes the task like this: /etc/rc3.d/S01ssh start and this works because it’s how that script is designed! You can run one of the following instances of this command: service ssh start, /etc/init.d/ssh start or /etc/rc3.d/S01ssh start. There are other options, notably “stop” or “status”, but these aren’t really useful here.

Now, how do we make Windows execute this on boot? I’m using NSSM, the “Non-sucking service manager” to add a line to the Windows System services. I placed the NSSM executable in C:\Program Files\nssm\nssm.exe, and then from a command line, ran C:\Program Files\nssm\nssm.exe install WSL_Init.

I configured it with the Application Path: C:\Windows\System32\wsl.exe and the Arguments: -d ubuntu -e sudo /opt/wsl_init.sh. Note that this only works because I’ve also got Sudo setup to execute this command without prompting for a password.

Here I invoke C:\Windows\System32\wsl.exe -d ubuntu -e sudo /opt/wsl_init.sh
I define the name of the service, as Services will see it, and also the description of the service.
I put in MY username and My Windows Password here, otherwise I’m not running WSL in my user context, but another one.

And then I rebooted. SSH was running as I needed it.

Featured image is “Main console” by “Steve Parker” on Flickr and is released under a CC-BY license.

"2009.01.17 - UNKNOWN, Unknown" by "Adrian Clark" on Flickr

Creating tagged AWS EC2 resources (like Elastic IPs) with Ansible

This is a quick note, having stumbled over this one today.

Mostly these days, I’m used to using Terraform to create Elastic IP (EIP) items in AWS, and I can assign tags to them during creation. For various reasons in $Project I’m having to create my EIPs in Ansible.

To make this work, you can’t just create an EIP with tags (like you would in Terraform), instead what you need to do is to create the EIP and then tag it, like this:

  - name: Allocate a new elastic IP
    community.aws.ec2_eip:
      state: present
      in_vpc: true
      region: eu-west-1
    register: eip

  - name: Tag that resource
    amazon.aws.ec2_tag:
      region: eu-west-1
      resource: "{{ eip.allocation_id }}"
      state: present
      tags:
        Name: MyTag
    register: tag

Notice that we create a VPC associated EIP, and assign the allocation_id from the result of that module to the resource we want to tag.

How about if you’re trying to be a bit more complex?

Here I have a list of EIPs I want to create, and then I pass this into the ec2_eip module, like this:

- name: Create list of EIPs
  set_fact:
    region: eu-west-1
    eip_list:
    - demo-eip-1
    - demo-eip-2
    - demo-eip-3

  - name: Allocate new elastic IPs
    community.aws.ec2_eip:
      state: present
      in_vpc: true
      region: "{{ region }}"
    register: eip
    loop: "{{ eip_list | dict2items }}"
    loop_control:
      label: "{{ item.key }}"

  - name: Tag the EIPs
    amazon.aws.ec2_tag:
      region: "{{ item.invocation.module_args.region }}"
      resource: "{{ item.allocation_id }}"
      state: present
      tags:
        Name: "{{ item.item.key }}"
    register: tag
    loop: "{{ eip.results }}"
    loop_control:
      label: "{{ item.item.key }}"

So, in this instance we pass the list of EIP names we want to create as a list with the loop instruction. Now, at the point we create them, we don’t actually know what they’ll be called, but we’re naming them there because when we tag them, we get the “item” (from the loop) that was used to create the EIP. When we then tag the EIP, we can use some of the data that was returned from the ec2_eip module (region, EIP allocation ID and the name we used as the loop key). I’ve trimmed out the debug statements I created while writing this, but here’s what you get back from ec2_eip:

"eip": {
        "changed": true,
        "msg": "All items completed",
        "results": [
            {
                "allocation_id": "eipalloc-decafbaddeadbeef1",
                "ansible_loop_var": "item",
                "changed": true,
                "failed": false,
                "invocation": {
                    "module_args": {
                        "allow_reassociation": false,
                        "aws_access_key": null,
                        "aws_ca_bundle": null,
                        "aws_config": null,
                        "aws_secret_key": null,
                        "debug_botocore_endpoint_logs": false,
                        "device_id": null,
                        "ec2_url": null,
                        "in_vpc": true,
                        "private_ip_address": null,
                        "profile": null,
                        "public_ip": null,
                        "public_ipv4_pool": null,
                        "region": "eu-west-1",
                        "release_on_disassociation": false,
                        "reuse_existing_ip_allowed": false,
                        "security_token": null,
                        "state": "present",
                        "tag_name": null,
                        "tag_value": null,
                        "validate_certs": true,
                        "wait_timeout": null
                    }
                },
                "item": {
                    "key": "demo-eip-1",
                    "value": {}
                },
                "public_ip": "192.0.2.1"
            }
     ]
}

So, that’s what I’m doing next!

Featured image is “2009.01.17 – UNKNOWN, Unknown” by “Adrian Clark” on Flickr and is released under a CC-BY-ND license.

"pharmacy" by "Tim Evanson" on Flickr

AWX – The Gateway Drug to Ansible Tower

A love letter to Ansible Tower

I love Ansible… I mean, I really love Ansible. You can ask anyone, and they’ll tell you my first love is my wife, then my children… and then it’s Ansible.

OK, maybe it’s Open Source and then Ansible, but either way, Ansible is REALLY high up there.

But, while I love Ansible, I love what Ansible Tower brings to an environment. See, while you get to easily and quickly manage a fleet of machines with Ansible, Ansible Tower gives you the fine grained control over what you need to expose to your developers, your ops team, or even, in a fit of “what-did-you-just-do”-ness, your manager. (I should probably mention that Ansible Tower is actually part of a much larger portfolio of products, called Ansible Automation Platform, and there’s some hosted SaaS stuff that goes with it… but the bit I really want to talk about is Tower, so I’ll be talking about Tower and not Ansible Automation Platform. Sorry!)

Ansible Tower has a scheduling engine, so you can have a “Go” button, for deploying the latest software to your fleet, or just for the 11PM patching cycle. It has a credential store, so your teams can’t just quickly go and perform an undocumented quick fix on that “flaky” box – they need to do their changes via Ansible. And lastly, it has an inventory, so you can see that the last 5 jobs failed to deploy on that host, so maybe you’ve got a problem with it.

One thing that people don’t so much love to do, is to get a license to deploy Tower, particularly if they just want to quickly spin up a demonstration for some colleagues to show how much THEY love Ansible. And for those people, I present AWX.

The first hit is free

One of the glorious and beautiful things that RedHat did, when they bought Ansible, was to make the same assertion about the Ansible products that they make to the rest of their product line, which is… while they may sell a commercial product, underneath it will be an Open Source version of that product, and you can be part of developing and improving that version, to help improve the commercial product. Thus was released AWX.

Now, I hear the nay-sayers commenting, “but what if you have an issue with AWX at 2AM, how do you get support on that”… and to those people, I reply: “If you need support at 2AM for your box, AWX is not the tool for you – what you need is Tower.”… Um, I mean Ansible Automation Platform. However, Tower takes a bit more setting up than what I’d want to do for a quick demo, and it has a few more pre-requisites. ANYWAY, enough about dealing with the nay-sayers.

AWX is an application inside Docker containers. It’s split into three parts, the AWX Web container, which has the REST API. There’s also a PostgreSQL database inside there too, and one “Engine”, which is the separate container which gets playbooks from your version control system, asks for any dynamic inventories, and then runs those playbooks on your inventories.

I like running demos of Tower, using AWX, because it’s reasonably easy to get stood up, and it’s reasonably close to what Tower looks and behaves like (except for the logos)… and, well, it’s a good gateway to getting people interested in what Tower can do for them, without them having to pay (or spend time signing up for evaluation licenses) for the environment in the first place.

And what’s more, it can all be automated

Yes, folks, because AWX is just a set of docker containers (and an install script), and Ansible knows how to start Docker containers (and run an install script), I can add an Ansible playbook to my cloud-init script, Vagrantfile or, let’s face it, when things go really wrong, put it in a bash script for some poor keyboard jockey to install for you.

If you’re running a demo, and you don’t want to get a POC (proof of concept) or evaluation license for Ansible Tower, then the chances are you’re probably not running this on RedHat Enterprise Linux (RHEL) either. That’s OK, once you’ve sold the room on using Tower (by using AWX), you can sell them on using RHEL too. So, I’ll be focusing on using CentOS 8 instead. Partially because there’s a Vagrant box for CentOS 8, but also because I can also use CentOS 8 on AWS, where I can prove that the Ansible Script I’m putting into my Vagrantfile will also deploy nicely via Cloud-Init too. With a very small number of changes, this is likely to work on anything that runs Docker, so everything from Arch to Ubuntu… probably 😁

“OK then. How can you work this magic, eh?” I hear from the back of the room. OK, pipe down, nay-sayers.

First, install Ansible on your host. You just need to run dnf install -y ansible.

Next, you need to install Docker. This is a marked difference between AWX and Ansible Tower, as AWX is based on Docker, but Ansible Tower uses other magic to make it work. When you’re selling the benefits of Tower, note that it’s not a 1-for-1 match at this point, but it’s not a big issue. Fortunately, CentOS can install Docker Community edition quite easily. At this point, I’m swapping to using Ansible playbooks. At the end, I’ll drop a link to where you can get all this in one big blob… In fact, we’re likely to use it with our Cloud-Init deployment.

Aw yehr, here’s the good stuff

tasks:
- name: Update all packages
  dnf:
    name: "*"
    state: latest

- name: Add dependency for "yum config-manager"
  dnf:
    name: yum-utils
    state: present

- name: Add the Docker Repo
  shell: yum config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
  args:
    creates: /etc/yum.repos.d/docker-ce.repo
    warn: false

- name: Install Docker
  dnf:
    name:
    - docker-ce
    - docker-ce-cli
    - containerd.io
    state: present
  notify: Start Docker

That first stanza – update all packages? Well, that’s because containerd.io relies on a newer version of libseccomp, which hasn’t been built in the CentOS 8 Vagrantbox I’m using.

The next one? That ensures I can run yum config-manager to add a repo. I could use the copy module in Ansible to create the repo files so yum and/or dnf could use that instead, but… meh, this is a single line shell command.

And then we install the repo, and the docker-ce packages we require. We use the “notify” statement to trigger a handler call to start Docker, like this:

handlers:
- name: Start Docker
  systemd:
    name: docker
    state: started

Fab. We’ve got Docker. Now, let’s clone the AWX repo to our machine. Again, we’re doing this with Ansible, naturally :)

tasks:
- name: Clone AWX repo to local path
  git:
    repo: https://github.com/ansible/awx.git
    dest: /opt/awx

- name: Get latest AWX tag
  shell: |
    if [ $(git status -s | wc -l) -gt 0 ]
    then
      git stash >/dev/null 2>&1
    fi
    git fetch --tags && git describe --tags $(git rev-list --tags --max-count=1)
    if [ $(git stash list | wc -l) -gt 0 ]
    then
      git stash pop >/dev/null 2>&1
    fi
  args:
    chdir: /opt/awx
  register: latest_tag
  changed_when: false

- name: Use latest released version of AWX
  git:
    repo: https://github.com/ansible/awx.git
    dest: /opt/awx
    version: "{{ latest_tag.stdout }}"

OK, there’s a fair bit to get from this, but essentially, we clone the repo from Github, then ask (using a collection of git commands) for the latest released version (yes, I’ve been bitten by just using the head of “devel” before), and then we check out that released version.

Fab, now we can configure it.

tasks:
- name: Set or Read admin password
  set_fact:
    admin_password_was_generated: "{{ (admin_password is defined or lookup('env', 'admin_password') != '') | ternary(false, true) }}"
    admin_password: "{{ admin_password | default (lookup('env', 'admin_password') | default(lookup('password', 'pw.admin_password chars=ascii_letters,digits length=20'), true) ) }}"

- name: Configure AWX installer
  lineinfile:
    path: /opt/awx/installer/inventory
    regexp: "^#?{{ item.key }}="
    line: "{{ item.key }}={{ item.value }}"
  loop:
  - key: "awx_web_hostname"
    value: "{{ ansible_fqdn }}"
  - key: "pg_password"
    value: "{{ lookup('password', 'pw.pg_password chars=ascii_letters,digits length=20') }}"
  - key: "rabbitmq_password"
    value: "{{ lookup('password', 'pw.rabbitmq_password chars=ascii_letters,digits length=20') }}"
  - key: "rabbitmq_erlang_cookie"
    value: "{{ lookup('password', 'pw.rabbitmq_erlang_cookie chars=ascii_letters,digits length=20') }}"
  - key: "admin_password"
    value: "{{ admin_password }}"
  - key: "secret_key"
    value: "{{ lookup('password', 'pw.secret_key chars=ascii_letters,digits length=64') }}"
  - key: "create_preload_data"
    value: "False"
  loop_control:
    label: "{{ item.key }}"

If we don’t already have a password defined, then create one. We register the fact we’ve had to create one, as we’ll need to tell ourselves it once the build is finished.

After that, we set a collection of values into the installer – the hostname, passwords, secret keys and so on. It loops over a key/value pair, and passes these to a regular expression rewrite command, so at the end, we have the settings we want, without having to change this script between releases.

When this is all done, we execute the installer. I’ve seen this done two ways. In an ideal world, you’d throw this into an Ansible shell module, and get it to execute the install, but the problem with that is that the AWX install takes quite a while, so I’d much rather actually be able to see what’s going on… and so, instead, we exit our prepare script at this point, and drop back to the shell to run the installer. Let’s look at both options, and you can decide which one you want to do. In my script, I’m doing the first, but just because it’s a bit neater to have everything in one place.

- name: Run the AWX install.
  shell: ansible-playbook -i inventory install.yml
  args:
    chdir: /opt/awx/installer
cd /opt/awx/installer
ansible-playbook -i inventory install.yml

When this is done, you get a prepared environment, ready to access using the username admin and the password of … well, whatever you set admin_password to.

AWX takes a little while to stand up, so you might want to run this next Ansible stanza to see when it’s ready to go.

- name: Test access to AWX
  tower_user:
    tower_host: "http://{{ ansible_fqdn }}"
    tower_username: admin
    tower_password: "{{ admin_password }}"
    email: "admin@{{ ansible_fqdn }}"
    first_name: "admin"
    last_name: ""
    password: "{{ admin_password }}"
    username: admin
    superuser: yes
    auditor: no
  register: _result
  until: _result.failed == false
  retries: 240 # retry 240 times
  delay: 5 # pause for 5 sec between each try

The upshot to using that command there is that it sets the email address of the admin account to “admin@your.awx.example.org“, if the fully qualified domain name (FQDN) of your machine is your.awx.example.org.

Moving from the Theoretical to the Practical

Now we’ve got our playbook, let’s wrap this up in both a Vagrant Vagrantfile and a Terraform script, this means you can deploy it locally, to test something internally, and in “the cloud”.

To simplify things, and because the version of Ansible deployed on the Vagrant box isn’t the one I want to use, I am using a single “user-data.sh” script for both Vagrant and Terraform. Here that is:

#!/bin/bash
if [ -e "$(which yum)" ]
then
  yum install git python3-pip -y
  pip3 install ansible docker docker-compose
else
  echo "This script only supports CentOS right now."
  exit 1
fi

git clone https://gist.github.com/JonTheNiceGuy/024d72f970d6a1c6160a6e9c3e642e07 /tmp/Install_AWX
cd /tmp/Install_AWX
/usr/local/bin/ansible-playbook Install_AWX.yml

While they both have their differences, they both can execute a script once the machine has finished booting. Let’s start with Vagrant.

Vagrant.configure("2") do |config|
  config.vm.box = "centos/8"

  config.vm.provider :virtualbox do |v|
    v.memory = 4096
  end

  config.vm.provision "shell", path: "user-data.sh"

  config.vm.network "forwarded_port", guest: 80, host: 8080, auto_correct: true
end

To boot this up, once you’ve got Vagrant and Virtualbox installed, run vagrant up and it’ll tell you that it’s set up a port forward from the HTTP port (TCP/80) to a “high” port – TCP/8080. If there’s a collision (because you’re running something else on TCP/8080), it’ll tell you what port it’s forwarded the HTTP port to instead. Once you’ve finished, run vagrant destroy to shut it down. There are lots more tricks you can play with Vagrant, but this is a relatively quick and easy one. Be aware that you’re not using HTTPS, so traffic to the AWX instance can be inspected, but if you’re running this on your local machine, it’s probably not a big issue.

How about running this on a cloud provider, like AWS? We can use the exact same scripts – both the Ansible script, and the user-data.sh script, using Terraform, however, this is a little more complex, as we need to create a VPC, Internet Gateway, Subnet, Security Group and Elastic IP before we can create the virtual machine. What’s more, the Free Tier (that “first hit is free” thing that Amazon Web Services provide to you) does not have enough horsepower to run AWX, so, if you want to look at how to run up AWX in EC2 (or to tweak it to run on Azure, GCP, Digital Ocean or one of the fine offerings from IBM or RedHat), then click through to the gist I’ve put all my code from this post into. The critical lines in there are to select a “CentOS 8” image, open HTTP and SSH into the machine, and to specify the user-data.sh file to provision the machine. Everything else is cruft to make the virtual machine talk to, and be seen by, hosts on the Internet.

To run this one, you need to run terraform init to load the AWS plugin, then terraform apply. Note that this relies on having an AWS access token defined, so if you don’t have them set up, you’ll need to get that sorted out first. Once you’ve finished with your demo, you should run terraform destroy to remove all the assets created by this terraform script. Again, when you’re running that demo, note that you ONLY have HTTP access set up, not HTTPS, so don’t use important credentials on there!

Once you’ve got your AWX environment running, you’ve got just enough AWX there to demo what Ansible Tower looks like, what it can bring to your organisation… and maybe even convince them that it’s worth investing in a license, rather than running AWX in production. Just in case you have that 2AM call-out that we all dread.

Featured image is “pharmacy” by “Tim Evanson” on Flickr and is released under a CC-BY-SA license.