"Main console" by "Steve Parker" on Flickr

Running services (like SSH, nginx, etc) on Windows Subsystem for Linux (WSL1) on boot

I recently got a new laptop, and for various reasons, I’m going to be primarily running Windows on that laptop. However, I still like having a working SSH server, running in the context of my Windows Subsystem for Linux (WSL) environment.

Initially, trying to run service ssh start failed with an error, because you need to re-execute the ssh configuration steps which are missed in a WSL environment. To fix that, run sudo apt install --reinstall openssh-server.

Once you know your service runs OK, you start digging around to find out how to start it on boot, and you’ll see lots of people saying things like “Just run a shell script that starts your first service, and then another shell script for the next service.”

Well, the frustration for me is that Linux already has this capability – the current popular version is called SystemD, but a slightly older variant is still knocking around in modern linux distributions, and it’s called SystemV Init, often referred to as just “sysv” or “init.d”.

The way that those services work is that you have an “init” file in /etc/init.d and then those files have a symbolic link into a “runlevel” directory, for example /etc/rc3.d. Each symbolic link is named S##service or K##service, where the ## represents the order in which it’s to be launched. The SSH Daemon, for example, that I want to run is created in there as /etc/rc3.d/S01ssh.

So, how do I make this work in the grander scheme of WSL? I can’t use SystemD, where I could say systemctl enable --now ssh, instead I need to add a (yes, I know) shell script, which looks in my desired runlevel directory. Runlevel 3 is the level at which network services have started, hence using that one. If I was trying to set up a graphical desktop, I’d instead be looking to use Runlevel 5, but the X Windows system isn’t ported to Windows like that yet… Anyway.

Because the rc#.d directory already has this structure for ordering and naming services to load, I can just step over this directory looking for files which match or do not match the naming convention, and I do that with this script:

#! /bin/bash
function run_rc() {
  base="$(basename "$1")"
  if [[ ${base:0:1} == "S" ]]
  then
    "$1" start
  else
    "$1" stop
  fi
}

if [ "$1" != "" ] && [ -e "$1" ]
then
  run_rc "$1"
else
  rc=3
  if [ "$1" != "" ] && [ -e "/etc/rc${$1}.d/" ]
  then
    rc="$1"
  fi
  for digit1 in {0..9}
  do
    for digit2 in {0..9}
    do
      find "/etc/rc${rc}.d/" -name "[SK]${digit1}${digit2}*" -exec "$0" '{}' \; 2>/dev/null
    done
  done
fi

I’ve put this script in /opt/wsl_init.sh

This does a bit of trickery, but basically runs the bottom block first. It loops over the digits 0 to 9 twice (giving you 00, 01, 02 and so on up to 99) and looks in /etc/rc3.d for any file containing the filename starting S or K and then with the two digits you’ve looped to by that point. Finally, it runs itself again, passing the name of the file it just found, and this is where the top block comes in.

In the top block we look at the “basename” – the part of the path supplied, without any prefixed directories attached, and then extract just the first character (that’s the ${base:0:1} part) to see whether it’s an “S” or anything else. If it’s an S (which everything there is likely to be), it executes the task like this: /etc/rc3.d/S01ssh start and this works because it’s how that script is designed! You can run one of the following instances of this command: service ssh start, /etc/init.d/ssh start or /etc/rc3.d/S01ssh start. There are other options, notably “stop” or “status”, but these aren’t really useful here.

Now, how do we make Windows execute this on boot? I’m using NSSM, the “Non-sucking service manager” to add a line to the Windows System services. I placed the NSSM executable in C:\Program Files\nssm\nssm.exe, and then from a command line, ran C:\Program Files\nssm\nssm.exe install WSL_Init.

I configured it with the Application Path: C:\Windows\System32\wsl.exe and the Arguments: -d ubuntu -e sudo /opt/wsl_init.sh. Note that this only works because I’ve also got Sudo setup to execute this command without prompting for a password.

Here I invoke C:\Windows\System32\wsl.exe -d ubuntu -e sudo /opt/wsl_init.sh
I define the name of the service, as Services will see it, and also the description of the service.
I put in MY username and My Windows Password here, otherwise I’m not running WSL in my user context, but another one.

And then I rebooted. SSH was running as I needed it.

Featured image is “Main console” by “Steve Parker” on Flickr and is released under a CC-BY license.

"pharmacy" by "Tim Evanson" on Flickr

AWX – The Gateway Drug to Ansible Tower

A love letter to Ansible Tower

I love Ansible… I mean, I really love Ansible. You can ask anyone, and they’ll tell you my first love is my wife, then my children… and then it’s Ansible.

OK, maybe it’s Open Source and then Ansible, but either way, Ansible is REALLY high up there.

But, while I love Ansible, I love what Ansible Tower brings to an environment. See, while you get to easily and quickly manage a fleet of machines with Ansible, Ansible Tower gives you the fine grained control over what you need to expose to your developers, your ops team, or even, in a fit of “what-did-you-just-do”-ness, your manager. (I should probably mention that Ansible Tower is actually part of a much larger portfolio of products, called Ansible Automation Platform, and there’s some hosted SaaS stuff that goes with it… but the bit I really want to talk about is Tower, so I’ll be talking about Tower and not Ansible Automation Platform. Sorry!)

Ansible Tower has a scheduling engine, so you can have a “Go” button, for deploying the latest software to your fleet, or just for the 11PM patching cycle. It has a credential store, so your teams can’t just quickly go and perform an undocumented quick fix on that “flaky” box – they need to do their changes via Ansible. And lastly, it has an inventory, so you can see that the last 5 jobs failed to deploy on that host, so maybe you’ve got a problem with it.

One thing that people don’t so much love to do, is to get a license to deploy Tower, particularly if they just want to quickly spin up a demonstration for some colleagues to show how much THEY love Ansible. And for those people, I present AWX.

The first hit is free

One of the glorious and beautiful things that RedHat did, when they bought Ansible, was to make the same assertion about the Ansible products that they make to the rest of their product line, which is… while they may sell a commercial product, underneath it will be an Open Source version of that product, and you can be part of developing and improving that version, to help improve the commercial product. Thus was released AWX.

Now, I hear the nay-sayers commenting, “but what if you have an issue with AWX at 2AM, how do you get support on that”… and to those people, I reply: “If you need support at 2AM for your box, AWX is not the tool for you – what you need is Tower.”… Um, I mean Ansible Automation Platform. However, Tower takes a bit more setting up than what I’d want to do for a quick demo, and it has a few more pre-requisites. ANYWAY, enough about dealing with the nay-sayers.

AWX is an application inside Docker containers. It’s split into three parts, the AWX Web container, which has the REST API. There’s also a PostgreSQL database inside there too, and one “Engine”, which is the separate container which gets playbooks from your version control system, asks for any dynamic inventories, and then runs those playbooks on your inventories.

I like running demos of Tower, using AWX, because it’s reasonably easy to get stood up, and it’s reasonably close to what Tower looks and behaves like (except for the logos)… and, well, it’s a good gateway to getting people interested in what Tower can do for them, without them having to pay (or spend time signing up for evaluation licenses) for the environment in the first place.

And what’s more, it can all be automated

Yes, folks, because AWX is just a set of docker containers (and an install script), and Ansible knows how to start Docker containers (and run an install script), I can add an Ansible playbook to my cloud-init script, Vagrantfile or, let’s face it, when things go really wrong, put it in a bash script for some poor keyboard jockey to install for you.

If you’re running a demo, and you don’t want to get a POC (proof of concept) or evaluation license for Ansible Tower, then the chances are you’re probably not running this on RedHat Enterprise Linux (RHEL) either. That’s OK, once you’ve sold the room on using Tower (by using AWX), you can sell them on using RHEL too. So, I’ll be focusing on using CentOS 8 instead. Partially because there’s a Vagrant box for CentOS 8, but also because I can also use CentOS 8 on AWS, where I can prove that the Ansible Script I’m putting into my Vagrantfile will also deploy nicely via Cloud-Init too. With a very small number of changes, this is likely to work on anything that runs Docker, so everything from Arch to Ubuntu… probably 😁

“OK then. How can you work this magic, eh?” I hear from the back of the room. OK, pipe down, nay-sayers.

First, install Ansible on your host. You just need to run dnf install -y ansible.

Next, you need to install Docker. This is a marked difference between AWX and Ansible Tower, as AWX is based on Docker, but Ansible Tower uses other magic to make it work. When you’re selling the benefits of Tower, note that it’s not a 1-for-1 match at this point, but it’s not a big issue. Fortunately, CentOS can install Docker Community edition quite easily. At this point, I’m swapping to using Ansible playbooks. At the end, I’ll drop a link to where you can get all this in one big blob… In fact, we’re likely to use it with our Cloud-Init deployment.

Aw yehr, here’s the good stuff

tasks:
- name: Update all packages
  dnf:
    name: "*"
    state: latest

- name: Add dependency for "yum config-manager"
  dnf:
    name: yum-utils
    state: present

- name: Add the Docker Repo
  shell: yum config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
  args:
    creates: /etc/yum.repos.d/docker-ce.repo
    warn: false

- name: Install Docker
  dnf:
    name:
    - docker-ce
    - docker-ce-cli
    - containerd.io
    state: present
  notify: Start Docker

That first stanza – update all packages? Well, that’s because containerd.io relies on a newer version of libseccomp, which hasn’t been built in the CentOS 8 Vagrantbox I’m using.

The next one? That ensures I can run yum config-manager to add a repo. I could use the copy module in Ansible to create the repo files so yum and/or dnf could use that instead, but… meh, this is a single line shell command.

And then we install the repo, and the docker-ce packages we require. We use the “notify” statement to trigger a handler call to start Docker, like this:

handlers:
- name: Start Docker
  systemd:
    name: docker
    state: started

Fab. We’ve got Docker. Now, let’s clone the AWX repo to our machine. Again, we’re doing this with Ansible, naturally :)

tasks:
- name: Clone AWX repo to local path
  git:
    repo: https://github.com/ansible/awx.git
    dest: /opt/awx

- name: Get latest AWX tag
  shell: |
    if [ $(git status -s | wc -l) -gt 0 ]
    then
      git stash >/dev/null 2>&1
    fi
    git fetch --tags && git describe --tags $(git rev-list --tags --max-count=1)
    if [ $(git stash list | wc -l) -gt 0 ]
    then
      git stash pop >/dev/null 2>&1
    fi
  args:
    chdir: /opt/awx
  register: latest_tag
  changed_when: false

- name: Use latest released version of AWX
  git:
    repo: https://github.com/ansible/awx.git
    dest: /opt/awx
    version: "{{ latest_tag.stdout }}"

OK, there’s a fair bit to get from this, but essentially, we clone the repo from Github, then ask (using a collection of git commands) for the latest released version (yes, I’ve been bitten by just using the head of “devel” before), and then we check out that released version.

Fab, now we can configure it.

tasks:
- name: Set or Read admin password
  set_fact:
    admin_password_was_generated: "{{ (admin_password is defined or lookup('env', 'admin_password') != '') | ternary(false, true) }}"
    admin_password: "{{ admin_password | default (lookup('env', 'admin_password') | default(lookup('password', 'pw.admin_password chars=ascii_letters,digits length=20'), true) ) }}"

- name: Configure AWX installer
  lineinfile:
    path: /opt/awx/installer/inventory
    regexp: "^#?{{ item.key }}="
    line: "{{ item.key }}={{ item.value }}"
  loop:
  - key: "awx_web_hostname"
    value: "{{ ansible_fqdn }}"
  - key: "pg_password"
    value: "{{ lookup('password', 'pw.pg_password chars=ascii_letters,digits length=20') }}"
  - key: "rabbitmq_password"
    value: "{{ lookup('password', 'pw.rabbitmq_password chars=ascii_letters,digits length=20') }}"
  - key: "rabbitmq_erlang_cookie"
    value: "{{ lookup('password', 'pw.rabbitmq_erlang_cookie chars=ascii_letters,digits length=20') }}"
  - key: "admin_password"
    value: "{{ admin_password }}"
  - key: "secret_key"
    value: "{{ lookup('password', 'pw.secret_key chars=ascii_letters,digits length=64') }}"
  - key: "create_preload_data"
    value: "False"
  loop_control:
    label: "{{ item.key }}"

If we don’t already have a password defined, then create one. We register the fact we’ve had to create one, as we’ll need to tell ourselves it once the build is finished.

After that, we set a collection of values into the installer – the hostname, passwords, secret keys and so on. It loops over a key/value pair, and passes these to a regular expression rewrite command, so at the end, we have the settings we want, without having to change this script between releases.

When this is all done, we execute the installer. I’ve seen this done two ways. In an ideal world, you’d throw this into an Ansible shell module, and get it to execute the install, but the problem with that is that the AWX install takes quite a while, so I’d much rather actually be able to see what’s going on… and so, instead, we exit our prepare script at this point, and drop back to the shell to run the installer. Let’s look at both options, and you can decide which one you want to do. In my script, I’m doing the first, but just because it’s a bit neater to have everything in one place.

- name: Run the AWX install.
  shell: ansible-playbook -i inventory install.yml
  args:
    chdir: /opt/awx/installer
cd /opt/awx/installer
ansible-playbook -i inventory install.yml

When this is done, you get a prepared environment, ready to access using the username admin and the password of … well, whatever you set admin_password to.

AWX takes a little while to stand up, so you might want to run this next Ansible stanza to see when it’s ready to go.

- name: Test access to AWX
  tower_user:
    tower_host: "http://{{ ansible_fqdn }}"
    tower_username: admin
    tower_password: "{{ admin_password }}"
    email: "admin@{{ ansible_fqdn }}"
    first_name: "admin"
    last_name: ""
    password: "{{ admin_password }}"
    username: admin
    superuser: yes
    auditor: no
  register: _result
  until: _result.failed == false
  retries: 240 # retry 240 times
  delay: 5 # pause for 5 sec between each try

The upshot to using that command there is that it sets the email address of the admin account to “admin@your.awx.example.org“, if the fully qualified domain name (FQDN) of your machine is your.awx.example.org.

Moving from the Theoretical to the Practical

Now we’ve got our playbook, let’s wrap this up in both a Vagrant Vagrantfile and a Terraform script, this means you can deploy it locally, to test something internally, and in “the cloud”.

To simplify things, and because the version of Ansible deployed on the Vagrant box isn’t the one I want to use, I am using a single “user-data.sh” script for both Vagrant and Terraform. Here that is:

#!/bin/bash
if [ -e "$(which yum)" ]
then
  yum install git python3-pip -y
  pip3 install ansible docker docker-compose
else
  echo "This script only supports CentOS right now."
  exit 1
fi

git clone https://gist.github.com/JonTheNiceGuy/024d72f970d6a1c6160a6e9c3e642e07 /tmp/Install_AWX
cd /tmp/Install_AWX
/usr/local/bin/ansible-playbook Install_AWX.yml

While they both have their differences, they both can execute a script once the machine has finished booting. Let’s start with Vagrant.

Vagrant.configure("2") do |config|
  config.vm.box = "centos/8"

  config.vm.provider :virtualbox do |v|
    v.memory = 4096
  end

  config.vm.provision "shell", path: "user-data.sh"

  config.vm.network "forwarded_port", guest: 80, host: 8080, auto_correct: true
end

To boot this up, once you’ve got Vagrant and Virtualbox installed, run vagrant up and it’ll tell you that it’s set up a port forward from the HTTP port (TCP/80) to a “high” port – TCP/8080. If there’s a collision (because you’re running something else on TCP/8080), it’ll tell you what port it’s forwarded the HTTP port to instead. Once you’ve finished, run vagrant destroy to shut it down. There are lots more tricks you can play with Vagrant, but this is a relatively quick and easy one. Be aware that you’re not using HTTPS, so traffic to the AWX instance can be inspected, but if you’re running this on your local machine, it’s probably not a big issue.

How about running this on a cloud provider, like AWS? We can use the exact same scripts – both the Ansible script, and the user-data.sh script, using Terraform, however, this is a little more complex, as we need to create a VPC, Internet Gateway, Subnet, Security Group and Elastic IP before we can create the virtual machine. What’s more, the Free Tier (that “first hit is free” thing that Amazon Web Services provide to you) does not have enough horsepower to run AWX, so, if you want to look at how to run up AWX in EC2 (or to tweak it to run on Azure, GCP, Digital Ocean or one of the fine offerings from IBM or RedHat), then click through to the gist I’ve put all my code from this post into. The critical lines in there are to select a “CentOS 8” image, open HTTP and SSH into the machine, and to specify the user-data.sh file to provision the machine. Everything else is cruft to make the virtual machine talk to, and be seen by, hosts on the Internet.

To run this one, you need to run terraform init to load the AWS plugin, then terraform apply. Note that this relies on having an AWS access token defined, so if you don’t have them set up, you’ll need to get that sorted out first. Once you’ve finished with your demo, you should run terraform destroy to remove all the assets created by this terraform script. Again, when you’re running that demo, note that you ONLY have HTTP access set up, not HTTPS, so don’t use important credentials on there!

Once you’ve got your AWX environment running, you’ve got just enough AWX there to demo what Ansible Tower looks like, what it can bring to your organisation… and maybe even convince them that it’s worth investing in a license, rather than running AWX in production. Just in case you have that 2AM call-out that we all dread.

Featured image is “pharmacy” by “Tim Evanson” on Flickr and is released under a CC-BY-SA license.

"Status" by "Doug Letterman" on Flickr

Adding your Git Status to your Bash prompt

I was watching Lorna Mitchell‘s Open Source Hour twitch stream this morning, and noticed that she had a line in her prompt showing what her git status was.

A snip from Lorna’s screen during the Open Source Hour stream.

Git, for those of you who aren’t aware, is the version control software which has dominated software development and documentation for over 10 years now. It’s used for almost everything now, supplanting it’s competitors like Subversion, Visual Source Safe, Mercurial and Bazaar. While many people are only aware of Git using GitHub, before there was GitHub, there was the Git command line. I’m using the git command in a Bash shell all the time because I find it easier to use that for the sorts of things I do, than it is to use the GUI tools.

However, the thing that often stumbles me is what state I’m in with the project, and this line showed me just how potentially powerful this command can be.

During the video, I started researching how I could get this prompt set up on my machine, and finally realised that actually, git prompt was installed as part of the git package on my Ubuntu 20.04 install. To use it, I just had to add this string $(__git_ps1) into my prompt. This showed me which branch I was on, but I wanted more detail than that!

So, then I started looking into how to configure this prompt. I found this article from 2014, called “Git Prompt Variables” which showed me how to configure which features I wanted to enable:

GIT_PS1_DESCRIBE_STYLE='contains'
GIT_PS1_SHOWCOLORHINTS='y'
GIT_PS1_SHOWDIRTYSTATE='y'
GIT_PS1_SHOWSTASHSTATE='y'
GIT_PS1_SHOWUNTRACKEDFILES='y'
GIT_PS1_SHOWUPSTREAM='auto'

To turn this on, I edited ~/.bashrc (again, this is Ubuntu 20.04, I’ve not tested this on CentOS, Fedora, Slackware or any other distro). Here’s the lines I’m looking for:

The lines in the middle, between the two red lines are the lines in question – the lines above and below are for context in the standard .bashrc file shipped with Ubuntu 20.04

I edited each line starting PS1=, to add this: $(__git_ps1), so this now looks like this:

The content of those two highlighted lines in .bashrc

I’m aware that line is pretty hard to read in many cases, so here’s just the text for each PS1 line:

PS1='${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w$(__git_ps1)\[\033[00m\]\$ '
PS1='${debian_chroot:+($debian_chroot)}\u@\h:\w$(__git_ps1)\$ '

The first of those is the version that is triggered when if [ "$color_prompt" = yes ] is true, the second is when it isn’t.

What does this look like?

Let’s run through a “standard” work-flow of “conditions”. Yes, this is a really trivial example, and quite (I would imagine) different from how most people approach things like this… but it gives you a series of conditions so you can see what it looks like.

Note, as I’m still running a slightly older version of git, and I’ve not adjusted my defaults, the “initial” branch created is still called “master”, not “main”. For the purposes of this demonstration, it’s fine, but I should really have fixed this from the outset. My apologies.

First, we create and git init a directory, called git_test in /tmp.

Following a git init, the prompt ends (master #). Following the git init of the master branch, we are in a state where there is “No HEAD to compare against”, so the git prompt fragment ends #.

Next, we create a file in here. It’s unstaged.

Following the creation of an empty file, using the touch command, the prompt ends (master #%). We’re on the master branch, with no HEAD to compare against (#), and we have an untracked file (%).

And then we add that to the staging area.

Following a git add, the prompt ends (master +). We’re on the master branch, with a staged change (+).

Next we commit the file to the repository.

Following a git commit, the prompt ends (master). We’re on the master branch with a clean staging and unstaged area.

We add some content to the README file.

Following the change of a tracked file, by echoing content into the file, the prompt ends (master *). We’re on the master branch with an unstaged change (*).

We realise that we can’t use this change right now, let’s stash it for later.

Following the git stash of a tracked file, the prompt ends (master $). We’re on the master branch with stashed files ($).

We check out a new branch, so we can use that stash in there.

Following the creation of a new branch with git checkout -b, the prompt ends (My-New-Feature $). We’re on the My-New-Feature branch with stashed files ($).

And then pop the stashed file out.

Following the restoration of a stashed file, using git stash pop, the prompt ends (My-New-Feature *). We’re on the My-New-Feature branch with stashed files (*).

We then add the file and commit it.

Following a git add and git commit of the previously stashed file, the prompt ends (My-New-Feature). We’re on the My-New-Feature branch with a clean staged and unstaged area.

How about working with remote sources? Let’s change to back to the /tmp directory and fork git_test to git_local_fork.

Following the clone of the repository in a new path using git clone, and then changing into that directory with the cd command, the prompt ends (My-New-Feature=). We’re on the My-New-Feature branch, which is in an identical state to it’s default remote tracked branch (=).

We’ve checked it out at the feature branch instead of master, let’s check out master instead.

Subsequent to checking out the master branch in the repository using git checkout master, the prompt ends (master=). We’re on the master branch, which is in an identical state to it’s default remote tracked branch (=).

In the meantime, upstream, someone merged “My-New-Feature” into “master” on our original git_test repo.

Following the merge of a feature branch, using git merge, the prompt ends (master). We’re on the master branch with a clean staged and unstaged area.

On our local branch again, let’s fetch the state of our “upstream” into git_local_fork.

After we fetch the state of our default upstream repository, the prompt ends (master<). We’re on the master branch with a clean staged and unstaged area, but we’re behind the default remote tracked branch (<).

And then pull, to make sure we’re in-line with upstream’s master branch.

Once we perform a git pull to bring this branch up-to-date with the upstream repository, the prompt ends (master=). We’re on the master branch, which is back in an identical state to it’s default remote tracked branch (=).

We should probably make some local changes to this repository.

The prompt changes from (master=) to (master *=) to (master +=) and then (master>) as we create an unstaged change (*), stage it (+) and then bring the branch ahead of the default remote tracked branch (>).

Meanwhile, someone made some changes to the upstream repository.

The prompt changes from (master) to (master *) to (master +) and then (master) as we create an unstaged change (*) with echo, stage it (+) with git add and then end up with a clear staged and unstaged area following a git commit.

So, before we try and push, let’s quickly fetch their tree to see what’s going on.

After a git fetch to pull the latest state from the remote repository, the prompt ends (master <>). We’re on the master branch, but our branch has diverged from the default remote and won’t merge cleanly (<>).

Oh no, we’ve got a divergence. We need to fix this! Let’s pull the upstream master branch.

We do git pull and end up with the prompt ending (master *+|MERGING<>). We have unstaged (*) and staged (+) changes, and we’re in a “merging” state (MERGING) to try to resolve our diverged branches (<>).

Let’s fix the failed merge.

We resolve the merge conflict with nano, and confirm it has worked with cat, and then stage the merge resolution change using git add. The prompt ends (master +|MERGING<>). We have staged (+) changes, and we’re in a “merging” state (MERGING) to try to resolve our diverged branches (<>).

I think we’re ready to go with the merge.

We perform a git commit and the prompt ends as (master>). We have resolved our diverged master branches, have exited the “merging” state and are simply ahead of the default remote branch (>).

If the remote were a system like github, at this point we’d just do a git push. But… it’s not, so we’d need to do a git pull /tmp/git_local_fork in /tmp/git_test and then a git fetch in /tmp/git_local_fork… but that’s an implementation detail 😉

Featured image is “Status” by “Doug Letterman” on Flickr and is released under a CC-BY license.

"inventory" by "Lee" on Flickr

Using a AWS Dynamic Inventory with Ansible 2.10

In Ansible 2.10, Ansible started bundling modules and plugins as “Collections”, basically meaning that Ansible didn’t need to make a release every time a vendor wanted to update the libraries it required, or API changes required new fields to be supplied to modules. As part of this split between “Collections” and “Core”, the AWS modules and plugins got moved into a collection.

Now, if you’re using Ansible 2.9 or earlier, this probably doesn’t impact you, but there are some nice features in Ansible 2.10 that I wanted to use, so… buckle up :)

Getting started with Ansible 2.10, using a virtual environment

If you currently are using Ansible 2.9, it’s probably worth creating a “python virtual environment”, or “virtualenv” to try out Ansible 2.10. I did this on my Ubuntu 20.04 machine by typing:

sudo apt install -y virtualenv
mkdir -p ~/bin
cd ~/bin
virtualenv -p python3 ansible_2.10

The above ensures that you have virtualenv installed, creates a directory called “bin” in your home directory, if it doesn’t already exist, and then places the virtual environment, using Python3, into a directory there called “ansible_2.10“.

Whenever we want to use this new environment you must activate it, using this command:

source ~/bin/ansible_2.10/bin/activate

Once you’ve executed this, any binary packages created in that virtual environment will be executed from there, in preference to the file system packages.

You can tell that you’ve “activated” this virtual environment, because your prompt changes from user@HOST:~$ to (ansible_2.10) user@HOST:~$ which helps 😀

Next, let’s create a requirements.txt file. This will let us install the environment in a repeatable manner (which is useful with Ansible). Here’s the content of this file.

ansible>=2.10
boto3
botocore

So, this isn’t just Ansible, it’s also the supporting libraries we’ll need to talk to AWS from Ansible.

We execute the following command:

pip install -r requirements.txt

Note, on Windows Subsystem for Linux version 1 (which I’m using) this will take a reasonable while, particularly if it’s crossing from the WSL environment into the Windows environment, depending on where you have specified the virtual environment to be placed.

If you get an error message about something to do with being unable to install ffi, then you’ll need to install the package libffi-dev with sudo apt install -y libffi-dev and then re-run the pip install command above.

Once the installation has completed, you can run ansible --version to see something like the following:

ansible 2.10.2
  config file = None
  configured module search path = ['/home/user/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/user/ansible_2.10/lib/python3.8/site-packages/ansible
  executable location = /home/user/ansible_2.10/bin/ansible
  python version = 3.8.2 (default, Jul 16 2020, 14:00:26) [GCC 9.3.0]

Configuring Ansible for local collections

Ansible relies on certain paths in the filesystem to store things like collections, roles and modules, but I like to circumvent these things – particularly if I’m developing something, or moving from one release to the next. Fortunately, Ansible makes this very easy, using a single file, ansible.cfg to tell the code that’s running in this path where to find things.

A quick note on File permissions with ansible.cfg

Note that the POSIX file permissions for the directory you’re in really matter! It must be set to 775 (-rwxrwxr-x) as a maximum – if it’s “world writable” (the last number) it won’t use this file! Other options include 770, 755. If you accidentally set this as world writable, or are using a directory from the “Windows” side of WSL, then you’ll get an error message like this:

[WARNING]: Ansible is being run in a world writable directory (/home/user/ansible_2.10_aws), ignoring it as an ansible.cfg source. For more information see
https://docs.ansible.com/ansible/devel/reference_appendices/config.html#cfg-in-world-writable-dir

That link is this one: https://docs.ansible.com/ansible/devel/reference_appendices/config.html#cfg-in-world-writable-dir and has some useful advice.

Back to configuring Ansible

In ansible.cfg, I have the following configured:

[defaults]
collections_paths = ./collections:~/.ansible/collections:/usr/share/ansible/collections

This file didn’t previously exist in this directory, so I created that file.

This block asks Ansible to check the following paths in order:

  • collections in this path (e.g. /home/user/ansible_2.10_aws/collections)
  • collections in the .ansible directory under the user’s home directory (e.g. /home/user/.ansible/collections)
  • and finally /usr/share/ansible/collections for system-wide collections.

If you don’t configure Ansible with the ansible.cfg file, the default is to store the collections in ~/.ansible/collections, but you can “only have one version of the collection”, so this means that if you’re relying on things not to change when testing, or if you’re running multiple versions of Ansible on your system, then it’s safest to store the collections in the same file tree as you’re working in!

Installing Collections

Now we have Ansible 2.10 installed, and our Ansible configuration file set up, let’s get our collection ready to install. We do this with a requirements.yml file, like this:

---
collections:
- name: amazon.aws
  version: ">=1.2.1"

What does this tell us? Firstly, that we want to install the Amazon AWS collection from Ansible Galaxy. Secondly that we want at least the most current version (which is currently version 1.2.1). If you leave the version line out, it’ll get “the latest” version. If you replace ">=1.2.1" with 1.2.1 it’ll install exactly that version from Galaxy.

If you want any other collections, you add them as subsequent lines (more details here), like this:

collections:
- name: amazon.aws
  version: ">=1.2.1"
- name: some.other
- name: git+https://example.com/someorg/somerepo.git
  version: 1.0.0
- name: git@example.com:someorg/someotherrepo.git

Once we’ve got this file, we run this command to install the content of the requirements.yml: ansible-galaxy collection install -r requirements.yml

In our case, this installs just the amazon.aws collection, which is what we want. Fab!

Getting our dynamic inventory

Right, so we’ve got all the pieces now that we need! Let’s tell Ansible that we want it to ask AWS for an inventory. There are three sections to this.

Configuring Ansible, again!

We need to open up our ansible.cfg file. Because we’re using the collection to get our Dynamic Inventory plugin, we need to tell Ansible to use that plugin. Edit ./ansible.cfg in your favourite editor, and add this block to the end:

[inventory]
enable_plugins = aws_ec2

If you previously created the ansible.cfg file when you were setting up to get the collection installed alongside, then your ansible.cfg file will look (something) like this:

[defaults]
collections_paths     = ./collections:~/.ansible/collections:/usr/share/ansible/collections

[inventory]
enable_plugins = amazon.aws.aws_ec2

Configure AWS

Your machine needs to have access tokens to interact with the AWS API. These are stored in ~/.aws/credentials (e.g. /home/user/.aws/credentials) and look a bit like this:

[default]
aws_access_key_id = A1B2C3D4E5F6G7H8I9J0
aws_secret_access_key = A1B2C3D4E5F6G7H8I9J0a1b2c3d4e5f6g7h8i9j0

Set up your inventory

In a bit of a change to how Ansible usually does the inventory, to have a plugin based dynamic inventory, you can’t specify a file any more, you have to specify a directory. So, create the file ./inventory/aws_ec2.yaml (having created the directory inventory first). The file contains the following:

---
plugin: amazon.aws.aws_ec2

Late edit 2020-12-01: Further to the comment by Giovanni, I’ve amended this file snippet from plugin: aws_ec2 to plugin: amazon.aws.aws_ec2.

By default, this just retrieves the hostnames of any running EC2 instance, as you can see by running ansible-inventory -i inventory --graph

@all:
  |--@aws_ec2:
  |  |--ec2-176-34-76-187.eu-west-1.compute.amazonaws.com
  |  |--ec2-54-170-131-24.eu-west-1.compute.amazonaws.com
  |  |--ec2-54-216-87-131.eu-west-1.compute.amazonaws.com
  |--@ungrouped:

I need a bit more detail than this – I like to use the tags I assign to AWS assets to decide what I’m going to target the machines with. I also know exactly which regions I’ve got my assets in, and what I want to use to get the names of the devices, so this is what I’ve put in my aws_ec2.yaml file:

---
plugin: amazon.aws.aws_ec2
keyed_groups:
- key: tags
  prefix: tag
- key: 'security_groups|json_query("[].group_name")'
  prefix: security_group
- key: placement.region
  prefix: aws_region
- key: tags.Role
  prefix: role
regions:
- eu-west-1
hostnames:
- tag:Name
- dns-name
- public-ip-address
- private-ip-address

Late edit 2020-12-01: Again, I’ve amended this file snippet from plugin: aws_ec2 to plugin: amazon.aws.aws_ec2.

Now, when I run ansible-inventory -i inventory --graph, I get this output:

@all:
  |--@aws_ec2:
  |  |--euwest1-firewall
  |  |--euwest1-demo
  |  |--euwest1-manager
  |--@aws_region_eu_west_1:
  |  |--euwest1-firewall
  |  |--euwest1-demo
  |  |--euwest1-manager
  |--@role_Firewall:
  |  |--euwest1-firewall
  |--@role_Firewall_Manager:
  |  |--euwest1-manager
  |--@role_VM:
  |  |--euwest1-demo
  |--@security_group_euwest1_allow_all:
  |  |--euwest1-firewall
  |  |--euwest1-demo
  |  |--euwest1-manager
  |--@tag_Name_euwest1_firewall:
  |  |--euwest1-firewall
  |--@tag_Name_euwest1_demo:
  |  |--euwest1-demo
  |--@tag_Name_euwest1_manager:
  |  |--euwest1-manager
  |--@tag_Role_Firewall:
  |  |--euwest1-firewall
  |--@tag_Role_Firewall_Manager:
  |  |--euwest1-manager
  |--@tag_Role_VM:
  |  |--euwest1-demo
  |--@ungrouped:

To finish

Now you have your dynamic inventory, you can target your playbook at any of the groups listed above (like role_Firewall, aws_ec2, aws_region_eu_west_1 or some other tag) like you would any other inventory assignment, like this:

---
- hosts: role_Firewall
  gather_facts: false
  tasks:
  - name: Show the name of this device
    debug:
      msg: "{{ inventory_hostname }}"

And there you have it. Hope this is useful!

Late edit: 2020-11-23: Following a conversation with Andy from Work, we’ve noticed that if you’re trying to do SSM connections, rather than username/password based ones, you might want to put this in your aws_ec2.yml file:

---
plugin: amazon.aws.aws_ec2
hostnames:
  - tag:Name
compose:
  ansible_host: instance_id
  ansible_connection: 'community.aws.aws_ssm'

Late edit 2020-12-01: One final instance, I’ve changed plugin: aws_ec2 to plugin: amazon.aws.aws_ec2.

This will keep your hostnames “pretty” (with whatever you’ve tagged it as), but will let you connect over SSM to the Instance ID. Good fun :)

Featured image is “inventory” by “Lee” on Flickr and is released under a CC-BY-SA license.

"Submarine" by "NH53" on Flickr

Recursive Git Submodules

One of my colleagues asked today about using recursive git submodules. First, let’s quickly drill into what a Submodule is.

Git Submodules

A submodule is a separate git repository, attached to the git repository you’re working on via two “touch points” – a file in the root directory called .gitmodules, and, when checked out, the HEAD file in the .git directory.

When you clone a repository with a submodule attached, it creates the directory the submodule will be cloned into, but leave it empty, unless you either do git submodule update --init --recursive or, when you clone the repository initially, you can ask it to pull any recursive submodules, like this git clone https://your.vcs.example.org/someorg/somerepo.git --recursive.

Git stores the commit reference of the submodule (via a file in .git/modules/$SUBMODULE_NAME/HEAD which contains the commit reference). If you change a file in that submodule, it marks the path of the submodule as “dirty” (because you have an uncommitted change), and if you either commit that change, or pull an updated commit from the source repository, then it will mark the path of the submodule as having changed.

In other words, you can track two separate but linked parts of your code in the same tree, working on each in turn, and without impacting each other code base.

I’ve used this, mostly with Ansible playbooks, where I’ve consumed someone else’s role, like this:

My_Project
|
+- Roles
|  |
|  +- <SUBMODULE> someorg.some_role
|  +- <SUBMODULE> anotherorg.another_role
+- inventory
+- playbook.yml
+- .git
|  |
|  +- HEAD
|  +- modules
|  +- etc
+- .gitmodules

In .gitmodules the file looks like this:

[submodule "module1"]
 path = module1
 url = https://your.vcs.example.org/someorg/module1.git

Once you’ve checked out this submodule, you can do any normal operations in this submodule, like pulls, pushes, commits, tags, etc.

So, what happens when you want to nest this stuff?

Nesting Submodule Recursion

So, my colleague wanted to have files in three layers of directories. In this instance, I’ve simulated this by creating three directories, root, module1 and module2. Typically these would be pulled from their respective Git Service paths, like GitHub or GitLab, but here I’m just using everything on my local file system. Where, in the following screen shot, you see /tmp/ you could easily replace that with https://your.vcs.example.org/someorg/.

The output of running mkdir {root,module1,module2} ; cd root ; git init ; cd ../module1 ; git init ; cd ../module2 ; git init ; touch README.md ; git add README.md ; git commit -m 'Added README.md' ; cd ../module1 ; git submodule add /tmp/module2 module2 ; git commit -m 'Added module2' ; cd ../root ; git submodule add /tmp/module1 module1 ; git submodule update --init --recursive ; tree showing the resulting tree of submodules under the root directory.
The output of running mkdir {root,module1,module2} ; cd root ; git init ; cd ../module1 ; git init ; cd ../module2 ; git init ; touch README.md ; git add README.md ; git commit -m ‘Added README.md’ ; cd ../module1 ; git submodule add /tmp/module2 module2 ; git commit -m ‘Added module2’ ; cd ../root ; git submodule add /tmp/module1 module1 ; git submodule update –init –recursive ; tree showing the resulting tree of submodules under the root directory.

So, here, we’ve created these three paths (basically to initiate the repositories), added a basic commit to the furthest submodule (module2), then done a submodule add into the next furthest submodule (module1) and finally added that into the root tree.

Note, however, when you perform the submodule add it doesn’t automatically clone any submodules, and if you were to, from another machine, perform git clone https://your.vcs.example.org/someorg/root.git you wouldn’t get any of the submodules (neither module1 nor module2) without adding either --recursive to the clone command (like this: git clone --recursive https://your.vcs.example.org/someorg/root.git), or by running the follow-up command git submodule update --init --recursive.

Oh, and if any of these submodules are updated? You need to go in and pull those updates, and then commit that change, like this!

The workflow of pulling updates for each of the submodules, with git add, git commit, and git pull, also noting that when a module has been changed, it shows as having “new commits”.
And here we have the finish of the workflow, updating the other submodules. Note that some of these steps (probably the ones in the earlier image) are likely to have been performed by some other developer on another system, so having all the updates on one machine is pretty rare!

The only thing which isn’t in these submodules is if you’ve done a git clone of the root repo (using the terms from the above screen images), the submodules won’t be using the “master” branch (or a particular “tag” or “branch hame”, for that matter), but will instead be using the commit reference. If you wanted to switch to a specific branch or tag, then you’d need to issue the command git checkout some_remote/some_branch or git checkout master instead of (in the above screen captures) git pull.

If you have any questions or issues with this post, please either add a comment, or contact me via one of the methods at the top or side of this page!

Featured image is “Submarine” by “NH53” on Flickr and is released under a CC-BY license.

"Accept a New SSH Host Key" by "Linux Screenshots" on Flickr

Purposefully Reducing SSH Security when performing Builds of short-lived devices

I’ve recently been developing a few builds of things at home using throw-away sessions of virtual machines, and I found myself repeatedly having to accept and even having to remove SSH host keys for things I knew wouldn’t be around for long. It’s not a huge disaster, but it’s an annoyance.

This annoyance comes from the fact that SSH uses a thing called “Trust-On-First-Use” (Or TOFU) to protect yourself against a “Man-in-the-Middle” attack (or even where the host has been replaced with something malicious), which, for infrastructure that has a long lifetime (anything more than a couple of days) makes sense! You’re building something you want to trust hasn’t been compromised! That said, if you’re building new virtual machines, testing something and then rebuilding it to prove your script worked… well, that’s not so useful!

So, in this case, if you’ve got a designated build network, or if you trust, implicitly, your normal working network, this is a dead simple work-around.

In $HOME/.ssh/config or in $HOME/.ssh/config.d/local (if you’ve followed my previous advice to use separate ssh config files), add the following stanza:

# RFC1918
Host 10.* 172.16.* 172.17.* 172.18.* 172.19.* 172.20.* 172.21.* 172.22.* 172.23.* 172.24.* 172.25.* 172.26.* 172.27.* 172.28.* 172.29.* 172.30.* 172.31.* 192.168.*
        StrictHostKeyChecking no
        UserKnownHostsFile /dev/null

# RFC5373 and RFC2544
Host 192.0.2.* 198.51.100.* 203.0.113.* 198.18.* 198.19.*
        StrictHostKeyChecking no
        UserKnownHostsFile /dev/null

These stanzas let you disable host key checking for any IP address in the RFC1918 ranges (10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16), and for the RFC5373 ranges (192.0.2.0/24, 198.51.100.0/24 and 203.0.113.0/24) – which should be used for documentation, and for the RFC2544 range (198.18.0.0/15) which should be used for inter-network testing.

Alternatively, if you always use a DDNS provider for short-lived assignments (for example, I use davd/docker-ddns) then instead, you can use this stanza:

Host *.ddns.example.com
        StrictHostKeyChecking no
        UserKnownHostsFile /dev/null

(Assuming, of course, you use ddns.example.com as your DDNS address!)

Featured image is “Accept a New SSH Host Key” by “Linux Screenshots” on Flickr and is released under a CC-BY license.

"Monitors" by "Jaysin Trevino" on Flickr

Using monit to monitor Docker Containers

As I only run a few machines with services that matter on them (notably, my home server and my web server), I don’t need a full-on monitoring service, so instead rely on a system called monit.

Monit is an open source piece of software, used to monitor (see, it’s easily named 😄) and, if possible remediate issues with things it sees wrong.

I use this for watching whether particular services are running (and if not, restart them), for whether the ink in my printer is empty, and to monitor the free space and SMART status on my disks.

Today I noticed that a Docker container had stopped, and I’d not noticed. It wasn’t a big thing, but it gnawed at me, so I had a bit of a look around to see what I can find about this.

I found this blog post, titled “Monitoring Docker Containers with Monit”, from 2014, which suggested monitoring the result from docker top… and would you believe it, that’s a valid trick 🙂

So, here’s what I’m doing! Each container has it’s own file called/etc/monit/scripts/check_container_<container-name>.sh which has just this command in it:

#! /bin/bash
docker top "<container-name>"
exit $?

Note that you replace <container-name> in both the filename and the script itself with the name of the container – for example, the container hello-world would be monitored with the file check_container_hello-world.sh, and the line in that file would say docker top "hello-world".

I then have a file in /etc/monit/conf.d/ called check_container_<container-name> which has this content

CHECK PROGRAM <container-name> WITH PATH /etc/monit/scripts/check_container_<container-name>.sh
  START PROGRAM = "/usr/bin/docker start <container-name>"
  STOP PROGRAM = "/usr/bin/docker stop <container-name>"
  IF status != 0 FOR 3 CYCLES THEN RESTART
  IF 2 RESTARTS WITHIN 5 CYCLES THEN UNMONITOR

I then ensure that in /etc/monit/monitrc the line “include /etc/monit/conf.d/*” is included and not commented out, and then restart monit with systemctl restart monit.

Featured image is “Monitors” by “Jaysin Trevino” on Flickr and is released under a CC-BY license.

"Field Notes - Sweet Tooth" by "The Marmot" on Flickr

Multi-OS builds in AWS with Terraform – some notes from the field!

Late edit: 2020-05-22 – Updated with better search criteria from colleague conversations

I’m building a proof of concept for … well, a product that needs testing on several different Linux and Windows variants on AWS and Azure. I’m building this environment with Terraform, and it’s thrown me a few curve balls, so I thought I’d document the issues I’ve had!

The versions of distributions I have tested are the latest releases of each of these images at-or-near the time of writing. The major version listed is the earliest I have tested, so no assumption is made about previous versions, and later versions, after the time of this post should not assume any of this data is also accurate!

(Fujitsu Staff – please contact me on my work email address for details on how to get the internal AMIs of our builds of these images 😄)

Linux Distributions

On the whole, I tend to be much more confident and knowledgable about Linux distributions. I’ve also done far more installs of each of these!

Almost all of these installs are Free of Charge, with the exception of Red Hat Enterprise Linux, which requires a subscription fee, and this can be “Pay As You Go” or “Bring Your Own License”. These sorts of things are arranged for me, so I don’t know how easy or hard it is to organise these licenses!

These builds all use cloud-init, via either a cloud-init yaml script, or some shell scripting language (usually accepted to be bash). If this script fails to execute, you will find your user-data file in /var/lib/cloud/instance/scripts/part-001. If this is a shell script then you will be able to execute it by running that script as your root user.

Amazon Linux 2 or Amzn2

Amazon Linux2 is the “preferred” distribution for Amazon Web Services (AWS) (surprisingly enough). It is based on Red Hat Enterprise Linux (RHEL), and many of the instructions you’ll want to run to install software will use RHEL based instructions. This platform is not available outside the AWS ecosystem, as far as I can tell, although you might be able to run it on-prem.

Software packages are limited in this distribution, so any “extra” features require the installation of the “EPEL” repository, by executing the command sudo amazon-linux-extras install epel and then using the yum command to install further packages. I needed nginx for part of my build, and this was only in EPEL.

Amzn2 AMI Lookup

data "aws_ami" "amzn2" {
  most_recent = true

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-2.0.*-gp2"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }

  filter {
    name   = "state"
    values = ["available"]
  }

  owners = ["amazon"] # Canonical
}

Amzn2 User Account

Amazon Linux 2 images under AWS have a default “ec2-user” user account. sudo will allow escalation to Root with no password prompt.

Amzn2 AWS Interface Configuration

The primary interface is called eth0. Network Manager is not installed. To manage the interface, you need to edit /etc/sysconfig/network-scripts/ifcfg-eth0 and apply changes with ifdown eth0 ; ifup eth0.

Amzn2 user-data / Cloud-Init Troubleshooting

I’ve found the output from user-data scripts appearing in /var/log/cloud-init-output.log.

CentOS 7

For starters, AWS doesn’t have an official CentOS8 image, so I’m a bit stymied there! In fact, as far as I can make out, CentOS is only releasing ISOs for builds now, and not any cloud images. There’s an open issue on their bug tracker which seems to suggest that it’s not going to get any priority any time soon! Blimey.

This image may require you to “subscribe” to the image (particularly if you have a “private marketplace”), but this will be requested of you (via a URL provided on screen) when you provision your first machine with this AMI.

Like with Amzn2, CentOS7 does not have nginx installed, and like Amzn2, installation of the EPEL library is not a difficult task. CentOS7 bundles a file to install the EPEL, installed by running yum install epel-release. After this is installed, you have the “full” range of software in EPEL available to you.

CentOS AMI Lookup

data "aws_ami" "centos7" {
  most_recent = true

  filter {
    name   = "name"
    values = ["CentOS Linux 7*"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }

  filter {
    name   = "state"
    values = ["available"]
  }

  owners = ["aws-marketplace"]
}

CentOS User Account

CentOS7 images under AWS have a default “centos” user account. sudo will allow escalation to Root with no password prompt.

CentOS AWS Interface Configuration

The primary interface is called eth0. Network Manager is not installed. To manage the interface, you need to edit /etc/sysconfig/network-scripts/ifcfg-eth0 and apply changes with ifdown eth0 ; ifup eth0.

CentOS Cloud-Init Troubleshooting

I’ve run several different user-data located bash scripts against this system, and the logs from these scripts are appearing in the default syslog file (/var/log/syslog) or by running journalctl -xefu cloud-init. They do not appear in /var/log/cloud-init-output.log.

Red Hat Enterprise Linux (RHEL) 7 and 8

Red Hat has both RHEL7 and RHEL8 images in the AWS market place. The Proof Of Value (POV) I was building was only looking at RHEL7, so I didn’t extensively test RHEL8.

Like Amzn2 and CentOS7, RHEL7 needs EPEL installing to have additional packages installed. Unlike Amzn2 and CentOS7, you need to obtain the EPEL package from the Fedora Project. Do this by executing these two commands:

wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum install epel-release-latest-7.noarch.rpm

After this is installed, you’ll have access to the broader range of software that you’re likely to require. Again, I needed nginx, and this was not available to me with the stock install.

RHEL7 AMI Lookup

data "aws_ami" "rhel7" {
  most_recent = true

  filter {
    name   = "name"
    values = ["RHEL-7*GA*Hourly*"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }

  filter {
    name   = "state"
    values = ["available"]
  }

  owners = ["309956199498"] # Red Hat
}

RHEL8 AMI Lookup

data "aws_ami" "rhel8" {
  most_recent = true

  filter {
    name   = "name"
    values = ["RHEL-8*HVM-*Hourly*"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }

  filter {
    name   = "state"
    values = ["available"]
  }

  owners = ["309956199498"] # Red Hat
}

RHEL User Accounts

RHEL7 and RHEL8 images under AWS have a default “ec2-user” user account. sudo will allow escalation to Root with no password prompt.

RHEL AWS Interface Configuration

The primary interface is called eth0. Network Manager is installed, and the eth0 interface has a profile called “System eth0” associated to it.

RHEL Cloud-Init Troubleshooting

In RHEL7, as per CentOS7, logs from user-data scripts are appear in the general syslog file (in this case, /var/log/messages) or by running journalctl -xefu cloud-init. They do not appear in /var/log/cloud-init-output.log.

In RHEL8, logs from user-data scrips now appear in /var/log/cloud-init-output.log.

Ubuntu 18.04

At the time of writing this, the vendor, who’s product I was testing, categorically stated that the newest Ubuntu LTS, Ubuntu 20.04 (Focal Fossa) would not be supported until some time after our testing was complete. As such, I spent no time at all researching or planning to use this image.

Ubuntu is the only non-RPM based distribution in this test, instead being based on the Debian project’s DEB packages. As such, it’s range of packages is much wider. That said, for the project I was working on, I required a later version of nginx than was available in the Ubuntu Repositories, so I had to use the nginx Personal Package Archive (PPA). To do this, I found the official PPA for the nginx project, and followed the instructions there. Generally speaking, this would potentially risk any support from the distribution vendor, as it’s not certified or supported by the project… but I needed that version, so I had to do it!

Ubuntu 18.04 AMI Lookup

data "aws_ami" "ubuntu1804" {
  most_recent = true

  filter {
    name   = "name"
    values = ["*ubuntu*18.04*"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }

  filter {
    name   = "state"
    values = ["available"]
  }

  owners = ["099720109477"] # Canonical
}

Ubuntu 18.04 User Accounts

Ubuntu 18.04 images under AWS have a default “ubuntu” user account. sudo will allow escalation to Root with no password prompt.

Ubuntu 18.04 AWS Interface Configuration

The primary interface is called eth0. Network Manager is not installed, and instead Ubuntu uses Netplan to manage interfaces. The file to manage the interface defaults is /etc/netplan/50-cloud-init.yaml. If you struggle with this method, you may wish to install ifupdown and define your configuration in /etc/network/interfaces.

Ubuntu 18.04 Cloud-Init Troubleshooting

In Ubuntu 18.04, logs from user-data scrips appear in /var/log/cloud-init-output.log.

Windows

This section is far more likely to have it’s data consolidated here!

Windows has a common “standard” username – Administrator, and a common way of creating a password (this is generated on-boot, and the password is transferred to the AWS Metadata Service, which it is retrieved and decrypted with the SSH key you’ve used to build the “authentication” to the box) which Terraform handles quite nicely.

The network device is referred to as “AWS PV Network Device #0”. It can be managed with powershell, netsh (although apparently Microsoft are rumbling about demising this script), or from the GUI.

Windows 2012R2

This version is very old now, and should be compared to Windows 7 in terms of age. It is only supported by Microsoft with an extended maintenance package!

Windows 2012R2 AMI Lookup

data "aws_ami" "w2012r2" {
  most_recent = true

  filter {
    name = "name"
    values = ["Windows_Server-2012-R2_RTM-English-64Bit-Base*"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }

  filter {
    name   = "state"
    values = ["available"]
  }

  owners = ["801119661308"] # AWS
}

Windows 2012R2 Cloud-Init Troubleshooting

Logs from the Metadata Service can be found in C:\Program Files\Amazon\Ec2ConfigService\Logs\Ec2ConfigLog.txt. You can also find the userdata script in C:\Program Files\Amazon\Ec2ConfigService\Scripts\UserScript.ps1. This can be launched and debugged using PowerShell ISE, which is in the “Start” menu.

Windows 2016

This version is reasonably old now, and should be compared to Windows 8 in terms of age. It is supported until 2022 in “mainline” support.

Windows 2016 AMI Lookup

data "aws_ami" "w2016" {
  most_recent = true

  filter {
    name = "name"
    values = ["Windows_Server-2016-English-Full-Base*"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }

  filter {
    name   = "state"
    values = ["available"]
  }

  owners = ["801119661308"] # AWS
}

Windows 2016 Cloud-Init Troubleshooting

The metadata service has moved from Windows 2016 and onwards. Logs are stored in a partially hidden directory tree, so you may need to click in the “Address” bar of the Explorer window and type in part of this path. The path to these files is: C:\ProgramData\Amazon\EC2-Windows\Launch\Log. I say “files” as there are two parts to this file – an “Ec2Launch.log” file which reports on the boot process, and “UserdataExecution.log” which shows the output from the userdata script.

Unlike with the Windows 2012R2 version, you can’t get hold of the actual userdata script on the filesystem, you need to browse to a special path in the metadata service (actually, technically, you can do this with any of the metadata services – OpenStack, Azure, and so on) which is: http://169.254.169.254/latest/user-data/

This will contain userdata between a <powershell> and </powershell> pair of tags. This would need to be copied out of this URL and pasted into a new file on your local machine to determine why issues are occurring. Again, I would recommend using PowerShell ISE from the Start Menu to debug your code.

Windows 2019

This version is the most recent released version of Windows Server, and should be compared to Windows 10 in terms of age.

Windows 2019 AMI Lookup

data "aws_ami" "w2019" {
  most_recent = true

  filter {
    name = "name"
    values = ["Windows_Server-2019-English-Full-Base*"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }

  filter {
    name   = "state"
    values = ["available"]
  }

  owners = ["801119661308"] # AWS
}

Windows 2019 Cloud-Init Troubleshooting

Functionally, the same as Windows 2016, but to recap, the metadata service has moved from Windows 2016 and onwards. Logs are stored in a partially hidden directory tree, so you may need to click in the “Address” bar of the Explorer window and type in part of this path. The path to these files is: C:\ProgramData\Amazon\EC2-Windows\Launch\Log. I say “files” as there are two parts to this file – an “Ec2Launch.log” file which reports on the boot process, and “UserdataExecution.log” which shows the output from the userdata script.

Unlike with the Windows 2012R2 version, you can’t get hold of the actual userdata script on the filesystem, you need to browse to a special path in the metadata service (actually, technically, you can do this with any of the metadata services – OpenStack, Azure, and so on) which is: http://169.254.169.254/latest/user-data/

This will contain userdata between a <powershell> and </powershell> pair of tags. This would need to be copied out of this URL and pasted into a new file on your local machine to determine why issues are occurring. Again, I would recommend using PowerShell ISE from the Start Menu to debug your code.

Featured image is “Field Notes – Sweet Tooth” by “The Marmot” on Flickr and is released under a CC-BY license.

“New shoes” by “Morgaine” from Flickr

Making Windows Cloud-Init Scripts run after a reboot (Using Terraform)

I’m currently building a Proof Of Value (POV) environment for a product, and one of the things I needed in my environment was an Active Directory domain.

To do this in AWS, I had to do the following steps:

  1. Build my Domain Controller
    1. Install Windows
    2. Set the hostname (Reboot)
    3. Promote the machine to being a Domain Controller (Reboot)
    4. Create a domain user
  2. Build my Member Server
    1. Install Windows
    2. Set the hostname (Reboot)
    3. Set the DNS client to point to the Domain Controller
    4. Join the server to the domain (Reboot)

To make this work, I had to find a way to trigger build steps after each reboot. I was working with Windows 2012R2, Windows 2016 and Windows 2019, so the solution had to be cross-version. Fortunately I found this script online! That version was great for Windows 2012R2, but didn’t cover Windows 2016 or later… So let’s break down what I’ve done!

In your userdata field, you need to have two sets of XML strings, as follows:

<persist>true</persist>
<powershell>
$some = "powershell code"
</powershell>

The first block says to Windows 2016+ “keep trying to run this script on each boot” (note that you need to stop it from doing non-relevant stuff on each boot – we’ll get to that in a second!), and the second bit is the PowerShell commands you want it to run. The rest of this now will focus just on the PowerShell block.

  $path= 'HKLM:\Software\UserData'
  
  if(!(Get-Item $Path -ErrorAction SilentlyContinue)) {
    New-Item $Path
    New-ItemProperty -Path $Path -Name RunCount -Value 0 -PropertyType dword
  }
  
  $runCount = Get-ItemProperty -Path $path -Name Runcount -ErrorAction SilentlyContinue | Select-Object -ExpandProperty RunCount
  
  if($runCount -ge 0) {
    switch($runCount) {
      0 {
        $runCount = 1 + [int]$runCount
        Set-ItemProperty -Path $Path -Name RunCount -Value $runCount
        if ($ver -match 2012) {
          #Enable user data
          $EC2SettingsFile = "$env:ProgramFiles\Amazon\Ec2ConfigService\Settings\Config.xml"
          $xml = [xml](Get-Content $EC2SettingsFile)
          $xmlElement = $xml.get_DocumentElement()
          $xmlElementToModify = $xmlElement.Plugins
          
          foreach ($element in $xmlElementToModify.Plugin)
          {
            if ($element.name -eq "Ec2HandleUserData") {
              $element.State="Enabled"
            }
          }
          $xml.Save($EC2SettingsFile)
        }
        $some = "PowerShell Script"
      }
    }
  }

Whew, what a block! Well, again, we can split this up into a couple of bits.

In the first few lines, we build a pointer, a note which says “We got up to here on our previous boots”. We then read that into a variable and find that number and execute any steps in the block with that number. That’s this block:

  $path= 'HKLM:\Software\UserData'
  
  if(!(Get-Item $Path -ErrorAction SilentlyContinue)) {
    New-Item $Path
    New-ItemProperty -Path $Path -Name RunCount -Value 0 -PropertyType dword
  }
  
  $runCount = Get-ItemProperty -Path $path -Name Runcount -ErrorAction SilentlyContinue | Select-Object -ExpandProperty RunCount
  
  if($runCount -ge 0) {
    switch($runCount) {

    }
  }

The next part (and you’ll repeat it for each “number” of reboot steps you need to perform) says “increment the number” then “If this is Windows 2012, remind the userdata handler that the script needs to be run again next boot”. That’s this block:

      0 {
        $runCount = 1 + [int]$runCount
        Set-ItemProperty -Path $Path -Name RunCount -Value $runCount
        if ($ver -match 2012) {
          #Enable user data
          $EC2SettingsFile = "$env:ProgramFiles\Amazon\Ec2ConfigService\Settings\Config.xml"
          $xml = [xml](Get-Content $EC2SettingsFile)
          $xmlElement = $xml.get_DocumentElement()
          $xmlElementToModify = $xmlElement.Plugins
          
          foreach ($element in $xmlElementToModify.Plugin)
          {
            if ($element.name -eq "Ec2HandleUserData") {
              $element.State="Enabled"
            }
          }
          $xml.Save($EC2SettingsFile)
        }
        
      }

In fact, it’s fair to say that in my userdata script, this looks like this:

  $path= 'HKLM:\Software\UserData'
  
  if(!(Get-Item $Path -ErrorAction SilentlyContinue)) {
    New-Item $Path
    New-ItemProperty -Path $Path -Name RunCount -Value 0 -PropertyType dword
  }
  
  $runCount = Get-ItemProperty -Path $path -Name Runcount -ErrorAction SilentlyContinue | Select-Object -ExpandProperty RunCount
  
  if($runCount -ge 0) {
    switch($runCount) {
      0 {
        ${file("templates/step.tmpl")}

        ${templatefile(
          "templates/rename_windows.tmpl",
          {
            hostname = "SomeMachine"
          }
        )}
      }
      1 {
        ${file("templates/step.tmpl")}

        ${templatefile(
          "templates/join_ad.tmpl",
          {
            dns_ipv4 = "192.0.2.1",
            domain_suffix = "ad.mycorp",
            join_account = "ad\someuser",
            join_password = "SomePassw0rd!"
          }
        )}
      }
    }
  }

Then, after each reboot, you need a new block. I have a block to change the computer name, a block to join the machine to the domain, and a block to install an software that I need.

Featured image is “New shoes” by “Morgaine” on Flickr and is released under a CC-BY-SA license.

"Root" by "llee_wu" on Flickr

A quick note on using Firefox in Windows in a Corporate or Enterprise environment

I’ve been using Firefox as my “browser of choice” for around 15 years. I tend to prefer to use it for all sorts of reasons, but the main thing I expect is support for extensions. Not many of them, but … well, there’s a few!

There are two stumbling blocks for using Firefox in a corporate or enterprise setting. These are:

  1. NTLM or Kerberos Authentication for resources like Sharepoint and ADFS.
  2. Enterprise TLS certificates (usually deployed via GPO as part of the domain)

These are both trivially fixed in the about:config screen, but first you need to get past a scary looking warning page!

In the address bar, where it probably currently says jon.sprig.gs, click in there and type about:config.

Getting to about:config

This brings you to a scary page!

Proceed with caution! (of course!!)

Click the “Accept the Risk and Continue” (note, this is with Firefox 76. Wording with later or earlier versions may differ).

As if it wasn’t obvious enough from the previous screen, this “may impact performance and security”…

And then you get a search box.

In the “Search preference name” type in ntlm and find the line that says network.automatic-ntlm-auth.trusted-uris.

The “NTLM Options page”

Type in there the suffixes of any TRUSTED domains. For example, if your company uses the domain names of bigcompany.com, bigco.local and big.company then you’d type in:

bigcompany.com,bigco.local,big.company

Any pages that you browse to, where they request NTLM authentication, will receive an NTLM set of credentials if prompted (same as IE, Edge, and Chrome already do!) NTLM is effectively a way to pass a trusted Kerberos ticket (a bit like your domain credentials) into a web page.

Next up, let’s get those pesky certificate errors removed!

This assumes that you have a centrally managed TLS Root Certificate, and the admins in your network haven’t just been dumping self signed certificates everywhere (nothing gets around that… just sayin’).

Still in about:config, clear the search box and type enterprise, like this!

Enterprise Roots are here!

Find the line security.enterprise_roots.enabled and make sure it says true. If it doesn’t double click it, so it does.

Now you can close your preferences page, and you should be fine to visit your internal source code repository, time sheeting system or sharepoint site, with almost no interruptions!

If you’ve been tasked for turning this stuff on in your estate of managed desktop environment machines, then you might find this article (on Autoconfiguration of Firefox) of use (but I’ve not tried it!)

Featured image is “Root” by “llee_wu” on Flickr and is released under a CC-BY-ND license.