"pharmacy" by "Tim Evanson" on Flickr

AWX – The Gateway Drug to Ansible Tower

A love letter to Ansible Tower

I love Ansible… I mean, I really love Ansible. You can ask anyone, and they’ll tell you my first love is my wife, then my children… and then it’s Ansible.

OK, maybe it’s Open Source and then Ansible, but either way, Ansible is REALLY high up there.

But, while I love Ansible, I love what Ansible Tower brings to an environment. See, while you get to easily and quickly manage a fleet of machines with Ansible, Ansible Tower gives you the fine grained control over what you need to expose to your developers, your ops team, or even, in a fit of “what-did-you-just-do”-ness, your manager. (I should probably mention that Ansible Tower is actually part of a much larger portfolio of products, called Ansible Automation Platform, and there’s some hosted SaaS stuff that goes with it… but the bit I really want to talk about is Tower, so I’ll be talking about Tower and not Ansible Automation Platform. Sorry!)

Ansible Tower has a scheduling engine, so you can have a “Go” button, for deploying the latest software to your fleet, or just for the 11PM patching cycle. It has a credential store, so your teams can’t just quickly go and perform an undocumented quick fix on that “flaky” box – they need to do their changes via Ansible. And lastly, it has an inventory, so you can see that the last 5 jobs failed to deploy on that host, so maybe you’ve got a problem with it.

One thing that people don’t so much love to do, is to get a license to deploy Tower, particularly if they just want to quickly spin up a demonstration for some colleagues to show how much THEY love Ansible. And for those people, I present AWX.

The first hit is free

One of the glorious and beautiful things that RedHat did, when they bought Ansible, was to make the same assertion about the Ansible products that they make to the rest of their product line, which is… while they may sell a commercial product, underneath it will be an Open Source version of that product, and you can be part of developing and improving that version, to help improve the commercial product. Thus was released AWX.

Now, I hear the nay-sayers commenting, “but what if you have an issue with AWX at 2AM, how do you get support on that”… and to those people, I reply: “If you need support at 2AM for your box, AWX is not the tool for you – what you need is Tower.”… Um, I mean Ansible Automation Platform. However, Tower takes a bit more setting up than what I’d want to do for a quick demo, and it has a few more pre-requisites. ANYWAY, enough about dealing with the nay-sayers.

AWX is an application inside Docker containers. It’s split into three parts, the AWX Web container, which has the REST API. There’s also a PostgreSQL database inside there too, and one “Engine”, which is the separate container which gets playbooks from your version control system, asks for any dynamic inventories, and then runs those playbooks on your inventories.

I like running demos of Tower, using AWX, because it’s reasonably easy to get stood up, and it’s reasonably close to what Tower looks and behaves like (except for the logos)… and, well, it’s a good gateway to getting people interested in what Tower can do for them, without them having to pay (or spend time signing up for evaluation licenses) for the environment in the first place.

And what’s more, it can all be automated

Yes, folks, because AWX is just a set of docker containers (and an install script), and Ansible knows how to start Docker containers (and run an install script), I can add an Ansible playbook to my cloud-init script, Vagrantfile or, let’s face it, when things go really wrong, put it in a bash script for some poor keyboard jockey to install for you.

If you’re running a demo, and you don’t want to get a POC (proof of concept) or evaluation license for Ansible Tower, then the chances are you’re probably not running this on RedHat Enterprise Linux (RHEL) either. That’s OK, once you’ve sold the room on using Tower (by using AWX), you can sell them on using RHEL too. So, I’ll be focusing on using CentOS 8 instead. Partially because there’s a Vagrant box for CentOS 8, but also because I can also use CentOS 8 on AWS, where I can prove that the Ansible Script I’m putting into my Vagrantfile will also deploy nicely via Cloud-Init too. With a very small number of changes, this is likely to work on anything that runs Docker, so everything from Arch to Ubuntu… probably 😁

“OK then. How can you work this magic, eh?” I hear from the back of the room. OK, pipe down, nay-sayers.

First, install Ansible on your host. You just need to run dnf install -y ansible.

Next, you need to install Docker. This is a marked difference between AWX and Ansible Tower, as AWX is based on Docker, but Ansible Tower uses other magic to make it work. When you’re selling the benefits of Tower, note that it’s not a 1-for-1 match at this point, but it’s not a big issue. Fortunately, CentOS can install Docker Community edition quite easily. At this point, I’m swapping to using Ansible playbooks. At the end, I’ll drop a link to where you can get all this in one big blob… In fact, we’re likely to use it with our Cloud-Init deployment.

Aw yehr, here’s the good stuff

tasks:
- name: Update all packages
  dnf:
    name: "*"
    state: latest

- name: Add dependency for "yum config-manager"
  dnf:
    name: yum-utils
    state: present

- name: Add the Docker Repo
  shell: yum config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
  args:
    creates: /etc/yum.repos.d/docker-ce.repo
    warn: false

- name: Install Docker
  dnf:
    name:
    - docker-ce
    - docker-ce-cli
    - containerd.io
    state: present
  notify: Start Docker

That first stanza – update all packages? Well, that’s because containerd.io relies on a newer version of libseccomp, which hasn’t been built in the CentOS 8 Vagrantbox I’m using.

The next one? That ensures I can run yum config-manager to add a repo. I could use the copy module in Ansible to create the repo files so yum and/or dnf could use that instead, but… meh, this is a single line shell command.

And then we install the repo, and the docker-ce packages we require. We use the “notify” statement to trigger a handler call to start Docker, like this:

handlers:
- name: Start Docker
  systemd:
    name: docker
    state: started

Fab. We’ve got Docker. Now, let’s clone the AWX repo to our machine. Again, we’re doing this with Ansible, naturally :)

tasks:
- name: Clone AWX repo to local path
  git:
    repo: https://github.com/ansible/awx.git
    dest: /opt/awx

- name: Get latest AWX tag
  shell: |
    if [ $(git status -s | wc -l) -gt 0 ]
    then
      git stash >/dev/null 2>&1
    fi
    git fetch --tags && git describe --tags $(git rev-list --tags --max-count=1)
    if [ $(git stash list | wc -l) -gt 0 ]
    then
      git stash pop >/dev/null 2>&1
    fi
  args:
    chdir: /opt/awx
  register: latest_tag
  changed_when: false

- name: Use latest released version of AWX
  git:
    repo: https://github.com/ansible/awx.git
    dest: /opt/awx
    version: "{{ latest_tag.stdout }}"

OK, there’s a fair bit to get from this, but essentially, we clone the repo from Github, then ask (using a collection of git commands) for the latest released version (yes, I’ve been bitten by just using the head of “devel” before), and then we check out that released version.

Fab, now we can configure it.

tasks:
- name: Set or Read admin password
  set_fact:
    admin_password_was_generated: "{{ (admin_password is defined or lookup('env', 'admin_password') != '') | ternary(false, true) }}"
    admin_password: "{{ admin_password | default (lookup('env', 'admin_password') | default(lookup('password', 'pw.admin_password chars=ascii_letters,digits length=20'), true) ) }}"

- name: Configure AWX installer
  lineinfile:
    path: /opt/awx/installer/inventory
    regexp: "^#?{{ item.key }}="
    line: "{{ item.key }}={{ item.value }}"
  loop:
  - key: "awx_web_hostname"
    value: "{{ ansible_fqdn }}"
  - key: "pg_password"
    value: "{{ lookup('password', 'pw.pg_password chars=ascii_letters,digits length=20') }}"
  - key: "rabbitmq_password"
    value: "{{ lookup('password', 'pw.rabbitmq_password chars=ascii_letters,digits length=20') }}"
  - key: "rabbitmq_erlang_cookie"
    value: "{{ lookup('password', 'pw.rabbitmq_erlang_cookie chars=ascii_letters,digits length=20') }}"
  - key: "admin_password"
    value: "{{ admin_password }}"
  - key: "secret_key"
    value: "{{ lookup('password', 'pw.secret_key chars=ascii_letters,digits length=64') }}"
  - key: "create_preload_data"
    value: "False"
  loop_control:
    label: "{{ item.key }}"

If we don’t already have a password defined, then create one. We register the fact we’ve had to create one, as we’ll need to tell ourselves it once the build is finished.

After that, we set a collection of values into the installer – the hostname, passwords, secret keys and so on. It loops over a key/value pair, and passes these to a regular expression rewrite command, so at the end, we have the settings we want, without having to change this script between releases.

When this is all done, we execute the installer. I’ve seen this done two ways. In an ideal world, you’d throw this into an Ansible shell module, and get it to execute the install, but the problem with that is that the AWX install takes quite a while, so I’d much rather actually be able to see what’s going on… and so, instead, we exit our prepare script at this point, and drop back to the shell to run the installer. Let’s look at both options, and you can decide which one you want to do. In my script, I’m doing the first, but just because it’s a bit neater to have everything in one place.

- name: Run the AWX install.
  shell: ansible-playbook -i inventory install.yml
  args:
    chdir: /opt/awx/installer
cd /opt/awx/installer
ansible-playbook -i inventory install.yml

When this is done, you get a prepared environment, ready to access using the username admin and the password of … well, whatever you set admin_password to.

AWX takes a little while to stand up, so you might want to run this next Ansible stanza to see when it’s ready to go.

- name: Test access to AWX
  tower_user:
    tower_host: "http://{{ ansible_fqdn }}"
    tower_username: admin
    tower_password: "{{ admin_password }}"
    email: "admin@{{ ansible_fqdn }}"
    first_name: "admin"
    last_name: ""
    password: "{{ admin_password }}"
    username: admin
    superuser: yes
    auditor: no
  register: _result
  until: _result.failed == false
  retries: 240 # retry 240 times
  delay: 5 # pause for 5 sec between each try

The upshot to using that command there is that it sets the email address of the admin account to “admin@your.awx.example.org“, if the fully qualified domain name (FQDN) of your machine is your.awx.example.org.

Moving from the Theoretical to the Practical

Now we’ve got our playbook, let’s wrap this up in both a Vagrant Vagrantfile and a Terraform script, this means you can deploy it locally, to test something internally, and in “the cloud”.

To simplify things, and because the version of Ansible deployed on the Vagrant box isn’t the one I want to use, I am using a single “user-data.sh” script for both Vagrant and Terraform. Here that is:

#!/bin/bash
if [ -e "$(which yum)" ]
then
  yum install git python3-pip -y
  pip3 install ansible docker docker-compose
else
  echo "This script only supports CentOS right now."
  exit 1
fi

git clone https://gist.github.com/JonTheNiceGuy/024d72f970d6a1c6160a6e9c3e642e07 /tmp/Install_AWX
cd /tmp/Install_AWX
/usr/local/bin/ansible-playbook Install_AWX.yml

While they both have their differences, they both can execute a script once the machine has finished booting. Let’s start with Vagrant.

Vagrant.configure("2") do |config|
  config.vm.box = "centos/8"

  config.vm.provider :virtualbox do |v|
    v.memory = 4096
  end

  config.vm.provision "shell", path: "user-data.sh"

  config.vm.network "forwarded_port", guest: 80, host: 8080, auto_correct: true
end

To boot this up, once you’ve got Vagrant and Virtualbox installed, run vagrant up and it’ll tell you that it’s set up a port forward from the HTTP port (TCP/80) to a “high” port – TCP/8080. If there’s a collision (because you’re running something else on TCP/8080), it’ll tell you what port it’s forwarded the HTTP port to instead. Once you’ve finished, run vagrant destroy to shut it down. There are lots more tricks you can play with Vagrant, but this is a relatively quick and easy one. Be aware that you’re not using HTTPS, so traffic to the AWX instance can be inspected, but if you’re running this on your local machine, it’s probably not a big issue.

How about running this on a cloud provider, like AWS? We can use the exact same scripts – both the Ansible script, and the user-data.sh script, using Terraform, however, this is a little more complex, as we need to create a VPC, Internet Gateway, Subnet, Security Group and Elastic IP before we can create the virtual machine. What’s more, the Free Tier (that “first hit is free” thing that Amazon Web Services provide to you) does not have enough horsepower to run AWX, so, if you want to look at how to run up AWX in EC2 (or to tweak it to run on Azure, GCP, Digital Ocean or one of the fine offerings from IBM or RedHat), then click through to the gist I’ve put all my code from this post into. The critical lines in there are to select a “CentOS 8” image, open HTTP and SSH into the machine, and to specify the user-data.sh file to provision the machine. Everything else is cruft to make the virtual machine talk to, and be seen by, hosts on the Internet.

To run this one, you need to run terraform init to load the AWS plugin, then terraform apply. Note that this relies on having an AWS access token defined, so if you don’t have them set up, you’ll need to get that sorted out first. Once you’ve finished with your demo, you should run terraform destroy to remove all the assets created by this terraform script. Again, when you’re running that demo, note that you ONLY have HTTP access set up, not HTTPS, so don’t use important credentials on there!

Once you’ve got your AWX environment running, you’ve got just enough AWX there to demo what Ansible Tower looks like, what it can bring to your organisation… and maybe even convince them that it’s worth investing in a license, rather than running AWX in production. Just in case you have that 2AM call-out that we all dread.

Featured image is “pharmacy” by “Tim Evanson” on Flickr and is released under a CC-BY-SA license.

"centos login" by "fsse8info" on Flickr

Getting the default username and AMI for an OS with #Terraform

I have a collection of AWS AMIs I use for various builds at work. These come from two places – the AWS Marketplace and our internal Build process.

Essentially, our internal builds (for those who work for my employer – these are the OptiMISe builds) are taken from specific AWS Marketplace builds and hardened.

Because I don’t want to share the AMI details when I put stuff on GitHub, I have an override.tf file that handles the different AMI search strings. So, here’s the ami.tf file I have with the AWS Marketplace version:

data "aws_ami" "centos7" {
  most_recent = true

  filter {
    name   = "name"
    values = ["CentOS Linux 7 x86_64 HVM EBS ENA*"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }

  owners = ["679593333241"] # CentOS Project
}

And here’s an example of the override.tf file I have:

data "aws_ami" "centos7" {
  most_recent = true

  filter {
    name   = "name"
    values = ["SomeUniqueString Containing CentOS*"]
  }

  owners = ["123456789012"]
}

Next, I put these AMI images into a “null” data source, which is evaluated at runtime:

data "null_data_source" "os" {
  inputs = {
    centos7 = data.aws_ami.centos7.id
  }
}

I always forget which username goes with each image, so in the ami.tf file, I also have this:

variable "username" {
  type = map(string)
  default = {
    centos7 = "centos"
  }
}

And in the override.tf file, I have this:

variable "username" {
  type = map(string)
  default = {
    centos7 = "someuser"
  }
}

To get the right combination of username and AMI, I have this in the file where I create my “instance” (virtual machine):

variable "os" {
  default = "centos7"
}

resource "aws_instance" "vm01" {
  ami = data.null_data_source.os.outputs[var.os]
  # additional lines omitted for brevity
}

output "username" {
  value = var.username[var.os]
}

output "vm01" {
  value = aws_instance.vm01.public_ip
}

And that way, I get the VM’s default username and IP address on build. Nice.

Late edit – 2020-09-20: It’s worth noting that this is fine for short-lived builds, proof of concept, etc. But, for longer lived environments, you should be calling out exactly which AMI you’re using, right from the outset. That way, your builds will (or should) all start out from the same point, no ambiguity about exactly which point release they’re getting, etc.

Featured image is “centos login” by “fsse8info” on Flickr and is released under a CC-BY-SA license.

"Stockholms Stadsbibliotek" by "dilettantiquity" on Flickr

Terraform templates with Maps

For a project I’m working on, I needed to define a list of ports, and set some properties on some of them. In the Ansible world, you’d use statements like:

{% if data.somekey is defined %}something {{ data.somekey }}{% endif %}

or

{{ data.somekey | default('') }}

In a pinch, you can also do this:

{{ (data | default({}) ).somekey | default('') }}

With Terraform, I was finding it much harder to work out how to find whether a value as part of a map (the Terraform term for a Dictionary in Ansible terms, or an Associative Array in PHP terms), until I stumbled over the Lookup function. Here’s how that looks for just a simple Terraform file:

output "test" {
    value = templatefile(
        "template.tmpl",
        {
            ports = {
                "eth0": {"ip": "192.168.1.1/24", "name": "public"}, 
                "eth1": {"ip": "172.16.1.1/24", "name": "protected"}, 
                "eth2": {"ip": "10.1.1.1/24", "name": "management", "management": true}, 
                "eth3": {}
            },
            management = "1"
        }
    )
}

And the template that goes with that?

%{ for port, data in ports ~}
Interface: ${port}%{ if lookup(data, "name", "") != ""}
Alias: ${ lookup(data, "name", "") }%{ endif }
Services: ping%{ if lookup(data, "management", false) == true } ssh https%{ endif }
IP: ${ lookup(data, "ip", "Not Defined") }

%{ endfor }

This results in the following output:

C:\tf>terraform.exe apply -auto-approve

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

test = Interface: eth0
Alias: public
Services: ping
IP: 192.168.1.1/24

Interface: eth1
Alias: protected
Services: ping
IP: 172.16.1.1/24

Interface: eth2
Alias: management
Services: ping ssh https
IP: 10.1.1.1/24

Interface: eth3
Services: ping
IP: Not Defined

Naturally, using this in your own user-data or Custom Data field will probably make more sense than just writing it to “output” 😁

Featured image is “Stockholms Stadsbibliotek” by “dilettantiquity” on Flickr and is released under a CC-BY-SA license.

"Sydney Observatory I" by "Newtown grafitti" on Flickr

Using Feature Flags in Terraform with Count Statements

In a project I’m working on in Terraform, I’ve got several feature flags in a module. These flags relate to whether this module should turn on a system in a cloud provider, or not, and looks like this:

variable "turn_on_feature_x" {
  description = "Setting this to 'yes' will enable Feature X. Any other value will disable it. (Default 'yes')"
  value = "yes"
}

variable "turn_on_feature_y" {
  description = "Setting this to 'yes' will enable Feature Y. Any other value will disable it. (Default 'no')"
  value = "no"
}

When I call the module, I then can either leave the feature with the default values, or selectively enable or disable them, like this:

module "region1" {
  source = "./my_module"
}

module "region2" {
  source = "./my_module"
  turn_on_feature_x = "no"
  turn_on_feature_y = "yes"
}

module "region3" {
  source = "./my_module"
  turn_on_feature_y = "yes"
}

module "region4" {
  source = "./my_module"
  turn_on_feature_x = "no"
}

# Result:
# region1 has X=yes, Y=no
# region2 has X=no, Y=yes
# region3 has X=yes, Y=yes
# region4 has X=no, Y=no

When I then want to use the feature, I have to remember a couple of key parts.

  1. Normally this feature check is done with a “count” statement, and the easiest way to use this is to use the ternary operator to check values and return a “1” or a “0” for if you want the value used.

    Ternary operators look like this: var.turn_on_feature_x == "yes" ? 1 : 0 which basically means, if the value of the variable turn_on_feature_x is set to “yes”, then return 1 otherwise return 0.

    This can get a bit complex, particularly if you want to check several flags a few times, like this: var.turn_on_feature_x == "yes" ? var.turn_on_feature_y == "yes" ? 1 : 0 : 0. I’ve found that wrapping them in brackets helps to understand what you’re getting, like this:

    (
      var.turn_on_feature_x == "yes" ?
      (
        var.turn_on_feature_y == "yes" ?
        1 :
        0
      ) :
      0
    )
  2. If you end up using a count statement, the resulting value must be treated as an 0-indexed array, like this: some_provider_service.my_name[0].result

    This is because, using the count value says “I want X number of resources”, so Terraform has to treat it as an array, in case you actually wanted 10 instead of 1 or 0.

Here’s an example of that in use:

resource "aws_guardduty_detector" "Region" {
  count = var.enable_guardduty == "yes" ? 1 : 0
  enable = true
}

resource "aws_cloudwatch_event_rule" "guardduty_finding" {
  count = (var.enable_guardduty == "yes" ? (var.send_guardduty_findings_to_sns == "yes" ? 1 : (var.send_guardduty_findings_to_sqs == "yes" ? 1 : 0)) : 0)
  name = "${data.aws_caller_identity.current.account_id}-${data.aws_region.current.name}-${var.sns_guardduty_finding_suffix}"
  event_pattern = <<PATTERN
{
  "source": [
    "aws.guardduty"
  ],
  "detail-type": [
    "GuardDuty Finding"
  ]
}
PATTERN
}

resource "aws_cloudwatch_event_target" "sns_guardduty_finding" {
  count = (var.enable_guardduty == "yes" ? (var.send_guardduty_findings_to_sns == "yes" ? 1 : 0) : 0)
  rule = aws_cloudwatch_event_rule.guardduty_finding[0].name
  target_id = aws_sns_topic.guardduty_finding[0].name
  arn = aws_sns_topic.guardduty_finding[0].arn
}

resource "aws_cloudwatch_event_target" "sqs_guardduty_finding" {
  count = (var.enable_guardduty == "yes" ? (var.send_guardduty_findings_to_sqs == "yes" ? 1 : 0) : 0)
  rule = aws_cloudwatch_event_rule.guardduty_finding[0].name
  target_id = "SQS"
  arn = aws_sqs_queue.guardduty_finding[0].arn
}

One thing that bit me rather painfully around this was that if you change from an uncounted resource, like this:

resource "some_tool" "this" {
  some_setting = 1
}

To a counted resource, like this:

resource "some_tool" "this" {
  count = var.some_tool == "yes" ? 1 : 0
  some_setting = 1
}

Then, Terraform will promptly destroy some_tool.this to replace it with some_tool.this[0], because they’re not the same referenced thing!

Fun, huh? 😊

Featured image is “Sydney Observatory I” by “Newtown grafitti” on Flickr and is released under a CC-BY license.

"Field Notes - Sweet Tooth" by "The Marmot" on Flickr

Multi-OS builds in AWS with Terraform – some notes from the field!

Late edit: 2020-05-22 – Updated with better search criteria from colleague conversations

I’m building a proof of concept for … well, a product that needs testing on several different Linux and Windows variants on AWS and Azure. I’m building this environment with Terraform, and it’s thrown me a few curve balls, so I thought I’d document the issues I’ve had!

The versions of distributions I have tested are the latest releases of each of these images at-or-near the time of writing. The major version listed is the earliest I have tested, so no assumption is made about previous versions, and later versions, after the time of this post should not assume any of this data is also accurate!

(Fujitsu Staff – please contact me on my work email address for details on how to get the internal AMIs of our builds of these images 😄)

Linux Distributions

On the whole, I tend to be much more confident and knowledgable about Linux distributions. I’ve also done far more installs of each of these!

Almost all of these installs are Free of Charge, with the exception of Red Hat Enterprise Linux, which requires a subscription fee, and this can be “Pay As You Go” or “Bring Your Own License”. These sorts of things are arranged for me, so I don’t know how easy or hard it is to organise these licenses!

These builds all use cloud-init, via either a cloud-init yaml script, or some shell scripting language (usually accepted to be bash). If this script fails to execute, you will find your user-data file in /var/lib/cloud/instance/scripts/part-001. If this is a shell script then you will be able to execute it by running that script as your root user.

Amazon Linux 2 or Amzn2

Amazon Linux2 is the “preferred” distribution for Amazon Web Services (AWS) (surprisingly enough). It is based on Red Hat Enterprise Linux (RHEL), and many of the instructions you’ll want to run to install software will use RHEL based instructions. This platform is not available outside the AWS ecosystem, as far as I can tell, although you might be able to run it on-prem.

Software packages are limited in this distribution, so any “extra” features require the installation of the “EPEL” repository, by executing the command sudo amazon-linux-extras install epel and then using the yum command to install further packages. I needed nginx for part of my build, and this was only in EPEL.

Amzn2 AMI Lookup

data "aws_ami" "amzn2" {
  most_recent = true

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-2.0.*-gp2"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }

  filter {
    name   = "state"
    values = ["available"]
  }

  owners = ["amazon"] # Canonical
}

Amzn2 User Account

Amazon Linux 2 images under AWS have a default “ec2-user” user account. sudo will allow escalation to Root with no password prompt.

Amzn2 AWS Interface Configuration

The primary interface is called eth0. Network Manager is not installed. To manage the interface, you need to edit /etc/sysconfig/network-scripts/ifcfg-eth0 and apply changes with ifdown eth0 ; ifup eth0.

Amzn2 user-data / Cloud-Init Troubleshooting

I’ve found the output from user-data scripts appearing in /var/log/cloud-init-output.log.

CentOS 7

For starters, AWS doesn’t have an official CentOS8 image, so I’m a bit stymied there! In fact, as far as I can make out, CentOS is only releasing ISOs for builds now, and not any cloud images. There’s an open issue on their bug tracker which seems to suggest that it’s not going to get any priority any time soon! Blimey.

This image may require you to “subscribe” to the image (particularly if you have a “private marketplace”), but this will be requested of you (via a URL provided on screen) when you provision your first machine with this AMI.

Like with Amzn2, CentOS7 does not have nginx installed, and like Amzn2, installation of the EPEL library is not a difficult task. CentOS7 bundles a file to install the EPEL, installed by running yum install epel-release. After this is installed, you have the “full” range of software in EPEL available to you.

CentOS AMI Lookup

data "aws_ami" "centos7" {
  most_recent = true

  filter {
    name   = "name"
    values = ["CentOS Linux 7*"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }

  filter {
    name   = "state"
    values = ["available"]
  }

  owners = ["aws-marketplace"]
}

CentOS User Account

CentOS7 images under AWS have a default “centos” user account. sudo will allow escalation to Root with no password prompt.

CentOS AWS Interface Configuration

The primary interface is called eth0. Network Manager is not installed. To manage the interface, you need to edit /etc/sysconfig/network-scripts/ifcfg-eth0 and apply changes with ifdown eth0 ; ifup eth0.

CentOS Cloud-Init Troubleshooting

I’ve run several different user-data located bash scripts against this system, and the logs from these scripts are appearing in the default syslog file (/var/log/syslog) or by running journalctl -xefu cloud-init. They do not appear in /var/log/cloud-init-output.log.

Red Hat Enterprise Linux (RHEL) 7 and 8

Red Hat has both RHEL7 and RHEL8 images in the AWS market place. The Proof Of Value (POV) I was building was only looking at RHEL7, so I didn’t extensively test RHEL8.

Like Amzn2 and CentOS7, RHEL7 needs EPEL installing to have additional packages installed. Unlike Amzn2 and CentOS7, you need to obtain the EPEL package from the Fedora Project. Do this by executing these two commands:

wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum install epel-release-latest-7.noarch.rpm

After this is installed, you’ll have access to the broader range of software that you’re likely to require. Again, I needed nginx, and this was not available to me with the stock install.

RHEL7 AMI Lookup

data "aws_ami" "rhel7" {
  most_recent = true

  filter {
    name   = "name"
    values = ["RHEL-7*GA*Hourly*"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }

  filter {
    name   = "state"
    values = ["available"]
  }

  owners = ["309956199498"] # Red Hat
}

RHEL8 AMI Lookup

data "aws_ami" "rhel8" {
  most_recent = true

  filter {
    name   = "name"
    values = ["RHEL-8*HVM-*Hourly*"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }

  filter {
    name   = "state"
    values = ["available"]
  }

  owners = ["309956199498"] # Red Hat
}

RHEL User Accounts

RHEL7 and RHEL8 images under AWS have a default “ec2-user” user account. sudo will allow escalation to Root with no password prompt.

RHEL AWS Interface Configuration

The primary interface is called eth0. Network Manager is installed, and the eth0 interface has a profile called “System eth0” associated to it.

RHEL Cloud-Init Troubleshooting

In RHEL7, as per CentOS7, logs from user-data scripts are appear in the general syslog file (in this case, /var/log/messages) or by running journalctl -xefu cloud-init. They do not appear in /var/log/cloud-init-output.log.

In RHEL8, logs from user-data scrips now appear in /var/log/cloud-init-output.log.

Ubuntu 18.04

At the time of writing this, the vendor, who’s product I was testing, categorically stated that the newest Ubuntu LTS, Ubuntu 20.04 (Focal Fossa) would not be supported until some time after our testing was complete. As such, I spent no time at all researching or planning to use this image.

Ubuntu is the only non-RPM based distribution in this test, instead being based on the Debian project’s DEB packages. As such, it’s range of packages is much wider. That said, for the project I was working on, I required a later version of nginx than was available in the Ubuntu Repositories, so I had to use the nginx Personal Package Archive (PPA). To do this, I found the official PPA for the nginx project, and followed the instructions there. Generally speaking, this would potentially risk any support from the distribution vendor, as it’s not certified or supported by the project… but I needed that version, so I had to do it!

Ubuntu 18.04 AMI Lookup

data "aws_ami" "ubuntu1804" {
  most_recent = true

  filter {
    name   = "name"
    values = ["*ubuntu*18.04*"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }

  filter {
    name   = "state"
    values = ["available"]
  }

  owners = ["099720109477"] # Canonical
}

Ubuntu 18.04 User Accounts

Ubuntu 18.04 images under AWS have a default “ubuntu” user account. sudo will allow escalation to Root with no password prompt.

Ubuntu 18.04 AWS Interface Configuration

The primary interface is called eth0. Network Manager is not installed, and instead Ubuntu uses Netplan to manage interfaces. The file to manage the interface defaults is /etc/netplan/50-cloud-init.yaml. If you struggle with this method, you may wish to install ifupdown and define your configuration in /etc/network/interfaces.

Ubuntu 18.04 Cloud-Init Troubleshooting

In Ubuntu 18.04, logs from user-data scrips appear in /var/log/cloud-init-output.log.

Windows

This section is far more likely to have it’s data consolidated here!

Windows has a common “standard” username – Administrator, and a common way of creating a password (this is generated on-boot, and the password is transferred to the AWS Metadata Service, which it is retrieved and decrypted with the SSH key you’ve used to build the “authentication” to the box) which Terraform handles quite nicely.

The network device is referred to as “AWS PV Network Device #0”. It can be managed with powershell, netsh (although apparently Microsoft are rumbling about demising this script), or from the GUI.

Windows 2012R2

This version is very old now, and should be compared to Windows 7 in terms of age. It is only supported by Microsoft with an extended maintenance package!

Windows 2012R2 AMI Lookup

data "aws_ami" "w2012r2" {
  most_recent = true

  filter {
    name = "name"
    values = ["Windows_Server-2012-R2_RTM-English-64Bit-Base*"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }

  filter {
    name   = "state"
    values = ["available"]
  }

  owners = ["801119661308"] # AWS
}

Windows 2012R2 Cloud-Init Troubleshooting

Logs from the Metadata Service can be found in C:\Program Files\Amazon\Ec2ConfigService\Logs\Ec2ConfigLog.txt. You can also find the userdata script in C:\Program Files\Amazon\Ec2ConfigService\Scripts\UserScript.ps1. This can be launched and debugged using PowerShell ISE, which is in the “Start” menu.

Windows 2016

This version is reasonably old now, and should be compared to Windows 8 in terms of age. It is supported until 2022 in “mainline” support.

Windows 2016 AMI Lookup

data "aws_ami" "w2016" {
  most_recent = true

  filter {
    name = "name"
    values = ["Windows_Server-2016-English-Full-Base*"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }

  filter {
    name   = "state"
    values = ["available"]
  }

  owners = ["801119661308"] # AWS
}

Windows 2016 Cloud-Init Troubleshooting

The metadata service has moved from Windows 2016 and onwards. Logs are stored in a partially hidden directory tree, so you may need to click in the “Address” bar of the Explorer window and type in part of this path. The path to these files is: C:\ProgramData\Amazon\EC2-Windows\Launch\Log. I say “files” as there are two parts to this file – an “Ec2Launch.log” file which reports on the boot process, and “UserdataExecution.log” which shows the output from the userdata script.

Unlike with the Windows 2012R2 version, you can’t get hold of the actual userdata script on the filesystem, you need to browse to a special path in the metadata service (actually, technically, you can do this with any of the metadata services – OpenStack, Azure, and so on) which is: http://169.254.169.254/latest/user-data/

This will contain userdata between a <powershell> and </powershell> pair of tags. This would need to be copied out of this URL and pasted into a new file on your local machine to determine why issues are occurring. Again, I would recommend using PowerShell ISE from the Start Menu to debug your code.

Windows 2019

This version is the most recent released version of Windows Server, and should be compared to Windows 10 in terms of age.

Windows 2019 AMI Lookup

data "aws_ami" "w2019" {
  most_recent = true

  filter {
    name = "name"
    values = ["Windows_Server-2019-English-Full-Base*"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }

  filter {
    name   = "state"
    values = ["available"]
  }

  owners = ["801119661308"] # AWS
}

Windows 2019 Cloud-Init Troubleshooting

Functionally, the same as Windows 2016, but to recap, the metadata service has moved from Windows 2016 and onwards. Logs are stored in a partially hidden directory tree, so you may need to click in the “Address” bar of the Explorer window and type in part of this path. The path to these files is: C:\ProgramData\Amazon\EC2-Windows\Launch\Log. I say “files” as there are two parts to this file – an “Ec2Launch.log” file which reports on the boot process, and “UserdataExecution.log” which shows the output from the userdata script.

Unlike with the Windows 2012R2 version, you can’t get hold of the actual userdata script on the filesystem, you need to browse to a special path in the metadata service (actually, technically, you can do this with any of the metadata services – OpenStack, Azure, and so on) which is: http://169.254.169.254/latest/user-data/

This will contain userdata between a <powershell> and </powershell> pair of tags. This would need to be copied out of this URL and pasted into a new file on your local machine to determine why issues are occurring. Again, I would recommend using PowerShell ISE from the Start Menu to debug your code.

Featured image is “Field Notes – Sweet Tooth” by “The Marmot” on Flickr and is released under a CC-BY license.

“New shoes” by “Morgaine” from Flickr

Making Windows Cloud-Init Scripts run after a reboot (Using Terraform)

I’m currently building a Proof Of Value (POV) environment for a product, and one of the things I needed in my environment was an Active Directory domain.

To do this in AWS, I had to do the following steps:

  1. Build my Domain Controller
    1. Install Windows
    2. Set the hostname (Reboot)
    3. Promote the machine to being a Domain Controller (Reboot)
    4. Create a domain user
  2. Build my Member Server
    1. Install Windows
    2. Set the hostname (Reboot)
    3. Set the DNS client to point to the Domain Controller
    4. Join the server to the domain (Reboot)

To make this work, I had to find a way to trigger build steps after each reboot. I was working with Windows 2012R2, Windows 2016 and Windows 2019, so the solution had to be cross-version. Fortunately I found this script online! That version was great for Windows 2012R2, but didn’t cover Windows 2016 or later… So let’s break down what I’ve done!

In your userdata field, you need to have two sets of XML strings, as follows:

<persist>true</persist>
<powershell>
$some = "powershell code"
</powershell>

The first block says to Windows 2016+ “keep trying to run this script on each boot” (note that you need to stop it from doing non-relevant stuff on each boot – we’ll get to that in a second!), and the second bit is the PowerShell commands you want it to run. The rest of this now will focus just on the PowerShell block.

  $path= 'HKLM:\Software\UserData'
  
  if(!(Get-Item $Path -ErrorAction SilentlyContinue)) {
    New-Item $Path
    New-ItemProperty -Path $Path -Name RunCount -Value 0 -PropertyType dword
  }
  
  $runCount = Get-ItemProperty -Path $path -Name Runcount -ErrorAction SilentlyContinue | Select-Object -ExpandProperty RunCount
  
  if($runCount -ge 0) {
    switch($runCount) {
      0 {
        $runCount = 1 + [int]$runCount
        Set-ItemProperty -Path $Path -Name RunCount -Value $runCount
        if ($ver -match 2012) {
          #Enable user data
          $EC2SettingsFile = "$env:ProgramFiles\Amazon\Ec2ConfigService\Settings\Config.xml"
          $xml = [xml](Get-Content $EC2SettingsFile)
          $xmlElement = $xml.get_DocumentElement()
          $xmlElementToModify = $xmlElement.Plugins
          
          foreach ($element in $xmlElementToModify.Plugin)
          {
            if ($element.name -eq "Ec2HandleUserData") {
              $element.State="Enabled"
            }
          }
          $xml.Save($EC2SettingsFile)
        }
        $some = "PowerShell Script"
      }
    }
  }

Whew, what a block! Well, again, we can split this up into a couple of bits.

In the first few lines, we build a pointer, a note which says “We got up to here on our previous boots”. We then read that into a variable and find that number and execute any steps in the block with that number. That’s this block:

  $path= 'HKLM:\Software\UserData'
  
  if(!(Get-Item $Path -ErrorAction SilentlyContinue)) {
    New-Item $Path
    New-ItemProperty -Path $Path -Name RunCount -Value 0 -PropertyType dword
  }
  
  $runCount = Get-ItemProperty -Path $path -Name Runcount -ErrorAction SilentlyContinue | Select-Object -ExpandProperty RunCount
  
  if($runCount -ge 0) {
    switch($runCount) {

    }
  }

The next part (and you’ll repeat it for each “number” of reboot steps you need to perform) says “increment the number” then “If this is Windows 2012, remind the userdata handler that the script needs to be run again next boot”. That’s this block:

      0 {
        $runCount = 1 + [int]$runCount
        Set-ItemProperty -Path $Path -Name RunCount -Value $runCount
        if ($ver -match 2012) {
          #Enable user data
          $EC2SettingsFile = "$env:ProgramFiles\Amazon\Ec2ConfigService\Settings\Config.xml"
          $xml = [xml](Get-Content $EC2SettingsFile)
          $xmlElement = $xml.get_DocumentElement()
          $xmlElementToModify = $xmlElement.Plugins
          
          foreach ($element in $xmlElementToModify.Plugin)
          {
            if ($element.name -eq "Ec2HandleUserData") {
              $element.State="Enabled"
            }
          }
          $xml.Save($EC2SettingsFile)
        }
        
      }

In fact, it’s fair to say that in my userdata script, this looks like this:

  $path= 'HKLM:\Software\UserData'
  
  if(!(Get-Item $Path -ErrorAction SilentlyContinue)) {
    New-Item $Path
    New-ItemProperty -Path $Path -Name RunCount -Value 0 -PropertyType dword
  }
  
  $runCount = Get-ItemProperty -Path $path -Name Runcount -ErrorAction SilentlyContinue | Select-Object -ExpandProperty RunCount
  
  if($runCount -ge 0) {
    switch($runCount) {
      0 {
        ${file("templates/step.tmpl")}

        ${templatefile(
          "templates/rename_windows.tmpl",
          {
            hostname = "SomeMachine"
          }
        )}
      }
      1 {
        ${file("templates/step.tmpl")}

        ${templatefile(
          "templates/join_ad.tmpl",
          {
            dns_ipv4 = "192.0.2.1",
            domain_suffix = "ad.mycorp",
            join_account = "ad\someuser",
            join_password = "SomePassw0rd!"
          }
        )}
      }
    }
  }

Then, after each reboot, you need a new block. I have a block to change the computer name, a block to join the machine to the domain, and a block to install an software that I need.

Featured image is “New shoes” by “Morgaine” on Flickr and is released under a CC-BY-SA license.

"the home automation system designed by loren amelang himself" by "Nicolás Boullosa" on Flickr

One to read: Ansible for Networking – Part 3: Cisco IOS

One to read: “Ansible for Networking – Part 3: Cisco IOS”

One of the guest hosts and stalwart member of the Admin Admin Telegram group has been documenting how he has built his Ansible Networking lab.

Stuart has done three posts so far, but this is the first one actually dealing with the technology. It’s a mammoth read, so I’d recommend doing it on a computer, and not on a tablet or phone!

Posts one and two were about what the series would cover and how the lab has been constructed.

Featured image is “the home automation system designed by loren amelang himself” by “Nicolás Boullosa” on Flickr and is released under a CC-BY license.

Opening to my video: Screencast 003 - Gitlab

Screencast 003: Gitlab

I’ve done a new mentoring style video, talking about how to use a self-hosted version of Gitlab for basic group projects and individual projects.

Screencast 003: Gitlab

Also available on Archive.org and LBRY.

Late edit 2020-03-25: To build the Gitlab environment I created, take a look at this git repository, which uses Terraform, some cloud init scripts and an ansible playbook. In particular, look at the following files:

If you just want to build the Gitlab environment, then it’s worth removing or renaming (to anything that isn’t .tf – I use .tf_unload) the files load_aws_module.tf, load_awx_module.tf, load_azure_module.tf

Opening to my video: Screencast 002 - A quick walk through Git

Screencast 002: A quick walk through Git (a mentoring style video)

I have done a follow-up Mentoring style video to support my last one. This video shows how to fix some of the issues in Git I came across in my last mentoring video!

Screencast 002: A quick walk through Git

I took some advice from a colleague who noticed that I skipped past a couple of issues with my Git setup, so I re-did them :) I hope this makes sense, and at 35 minutes, is a bit more understandable than the last 1h15 video!

Also on LBRY and Archive.org