"Stockholms Stadsbibliotek" by "dilettantiquity" on Flickr

Terraform templates with Maps

For a project I’m working on, I needed to define a list of ports, and set some properties on some of them. In the Ansible world, you’d use statements like:

{% if data.somekey is defined %}something {{ data.somekey }}{% endif %}

or

{{ data.somekey | default('') }}

In a pinch, you can also do this:

{{ (data | default({}) ).somekey | default('') }}

With Terraform, I was finding it much harder to work out how to check whether a value was part of a map (the Terraform term for a dictionary in Ansible terms, or an associative array in PHP terms), until I stumbled over the lookup function. Here’s how that looks in a simple Terraform file:

output "test" {
    value = templatefile(
        "template.tmpl",
        {
            ports = {
                "eth0": {"ip": "192.168.1.1/24", "name": "public"}, 
                "eth1": {"ip": "172.16.1.1/24", "name": "protected"}, 
                "eth2": {"ip": "10.1.1.1/24", "name": "management", "management": true}, 
                "eth3": {}
            },
            management = "1"
        }
    )
}

And the template that goes with that?

%{ for port, data in ports ~}
Interface: ${port}%{ if lookup(data, "name", "") != ""}
Alias: ${ lookup(data, "name", "") }%{ endif }
Services: ping%{ if lookup(data, "management", false) == true } ssh https%{ endif }
IP: ${ lookup(data, "ip", "Not Defined") }

%{ endfor }

This results in the following output:

C:\tf>terraform.exe apply -auto-approve

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

test = Interface: eth0
Alias: public
Services: ping
IP: 192.168.1.1/24

Interface: eth1
Alias: protected
Services: ping
IP: 172.16.1.1/24

Interface: eth2
Alias: management
Services: ping ssh https
IP: 10.1.1.1/24

Interface: eth3
Services: ping
IP: Not Defined

Naturally, using this in your own user-data or Custom Data field will probably make more sense than just writing it to “output” 😁
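
For illustration, here’s a hedged sketch of how that might look for an EC2 instance’s user-data (the AMI and instance type are placeholders, not from a real build):

resource "aws_instance" "router" {
  ami           = "ami-12345678" # placeholder
  instance_type = "t3.micro"     # placeholder

  user_data = templatefile(
    "template.tmpl",
    {
      ports = {
        "eth0" : { "ip" : "192.168.1.1/24", "name" : "public" },
        "eth3" : {}
      },
      management = "1"
    }
  )
}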

Featured image is “Stockholms Stadsbibliotek” by “dilettantiquity” on Flickr and is released under a CC-BY-SA license.

"Sydney Observatory I" by "Newtown grafitti" on Flickr

Using Feature Flags in Terraform with Count Statements

In a project I’m working on in Terraform, I’ve got several feature flags in a module. These flags relate to whether this module should turn on a system in a cloud provider or not, and look like this:

variable "turn_on_feature_x" {
  description = "Setting this to 'yes' will enable Feature X. Any other value will disable it. (Default 'yes')"
  value = "yes"
}

variable "turn_on_feature_y" {
  description = "Setting this to 'yes' will enable Feature Y. Any other value will disable it. (Default 'no')"
  value = "no"
}

When I call the module, I then can either leave the feature with the default values, or selectively enable or disable them, like this:

module "region1" {
  source = "./my_module"
}

module "region2" {
  source = "./my_module"
  turn_on_feature_x = "no"
  turn_on_feature_y = "yes"
}

module "region3" {
  source = "./my_module"
  turn_on_feature_y = "yes"
}

module "region4" {
  source = "./my_module"
  turn_on_feature_x = "no"
}

# Result:
# region1 has X=yes, Y=no
# region2 has X=no, Y=yes
# region3 has X=yes, Y=yes
# region4 has X=no, Y=no

When I then want to use the feature, I have to remember a couple of key parts.

  1. Normally this feature check is done with a “count” statement, and the easiest way to use this is with a ternary operator that checks the value and returns a “1” or a “0”, depending on whether you want the resource created.

    Ternary operators look like this: var.turn_on_feature_x == "yes" ? 1 : 0, which basically means: if the value of the variable turn_on_feature_x is set to “yes”, then return 1, otherwise return 0.

    This can get a bit complex, particularly if you want to check several flags a few times, like this: var.turn_on_feature_x == "yes" ? var.turn_on_feature_y == "yes" ? 1 : 0 : 0. I’ve found that wrapping them in brackets helps to understand what you’re getting, like this:

    (
      var.turn_on_feature_x == "yes" ?
      (
        var.turn_on_feature_y == "yes" ?
        1 :
        0
      ) :
      0
    )
  2. If you end up using a count statement, the resulting value must be treated as a 0-indexed array, like this: some_provider_service.my_name[0].result

    This is because setting the count value says “I want X number of resources”, so Terraform has to treat it as an array, in case you actually wanted 10 instead of 1 or 0.

Here’s an example of that in use:

resource "aws_guardduty_detector" "Region" {
  count = var.enable_guardduty == "yes" ? 1 : 0
  enable = true
}

resource "aws_cloudwatch_event_rule" "guardduty_finding" {
  count = (var.enable_guardduty == "yes" ? (var.send_guardduty_findings_to_sns == "yes" ? 1 : (var.send_guardduty_findings_to_sqs == "yes" ? 1 : 0)) : 0)
  name = "${data.aws_caller_identity.current.account_id}-${data.aws_region.current.name}-${var.sns_guardduty_finding_suffix}"
  event_pattern = <<PATTERN
{
  "source": [
    "aws.guardduty"
  ],
  "detail-type": [
    "GuardDuty Finding"
  ]
}
PATTERN
}

resource "aws_cloudwatch_event_target" "sns_guardduty_finding" {
  count = (var.enable_guardduty == "yes" ? (var.send_guardduty_findings_to_sns == "yes" ? 1 : 0) : 0)
  rule = aws_cloudwatch_event_rule.guardduty_finding[0].name
  target_id = aws_sns_topic.guardduty_finding[0].name
  arn = aws_sns_topic.guardduty_finding[0].arn
}

resource "aws_cloudwatch_event_target" "sqs_guardduty_finding" {
  count = (var.enable_guardduty == "yes" ? (var.send_guardduty_findings_to_sqs == "yes" ? 1 : 0) : 0)
  rule = aws_cloudwatch_event_rule.guardduty_finding[0].name
  target_id = "SQS"
  arn = aws_sqs_queue.guardduty_finding[0].arn
}

One thing that bit me rather painfully around this was that if you change from an uncounted resource, like this:

resource "some_tool" "this" {
  some_setting = 1
}

To a counted resource, like this:

resource "some_tool" "this" {
  count = var.some_tool == "yes" ? 1 : 0
  some_setting = 1
}

Then, Terraform will promptly destroy some_tool.this to replace it with some_tool.this[0], because they’re not the same referenced thing!
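
If you hit this, one way out (rather than letting Terraform destroy and recreate the resource) is to move the old address to the new one in the state file. A sketch, assuming a Terraform version recent enough to support indexed addresses in terraform state mv:

terraform state mv 'some_tool.this' 'some_tool.this[0]'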

Fun, huh? 😊

Featured image is “Sydney Observatory I” by “Newtown grafitti” on Flickr and is released under a CC-BY license.

"Kelvin Test" by "Eelke" on Flickr

In Ansible, determine the type of a value, and casting those values to other types

TL;DR? It’s possible to work out what type of variable you’re working with in Ansible. The in-built filters don’t always do quite what you’re expecting. Jump to the “In Summary” heading for my suggestions.

One of the things I end up doing quite a bit with Ansible is value manipulation. I know it’s not really normal, but… well, I like rewriting values from one type of a thing to the next type of a thing.

For example, I like taking a value that I don’t know if it’s a list or a string, and passing that to an argument that expects a list.

Doing it wrong, getting it better

Until recently, I’d do that like this:

- debug:
    msg: |-
      {
        {%- if value | type_debug == "string" or value | type_debug == "AnsibleUnicode" -%}
           "string": "{{ value }}"
        {%- elif value | type_debug == "dict" or value | type_debug == "ansible_mapping" -%}
          "dict": {{ value }}
        {%- elif value | type_debug == "list" -%}
          "list": {{ value }}
        {%- else -%}
          "other": "{{ value }}"
        {%- endif -%}
      }

But, after finding this gist, I now know I can do this:

- debug:
    msg: |-
      {
        {%- if value is string -%}
           "string": "{{ value }}"
        {%- elif value is mapping -%}
          "dict": {{ value }}
        {%- elif value is iterable -%}
          "list": {{ value }}
        {%- else -%}
          "other": "{{ value }}"
        {%- endif -%}
      }

So, how would I use this, given the context of what I was saying before?

- assert:
    that:
    - value is string
    - value is not mapping
    - value is iterable
- some_module:
    some_arg: |-
      {%- if value is string -%}
        ["{{ value }}"]
      {%- else -%}
        {{ value }}
      {%- endif -%}

More details on finding a type

Why in this order? Well, because of how values are stored in Ansible, the following states are true:

⬇️Type \ ➡️Check   | is iterable | is mapping | is sequence | is string
a_dict (e.g. {})   | ✔️ | ✔️ | ✔️ | ❌
a_list (e.g. [])   | ✔️ | ❌ | ✔️ | ❌
a_string (e.g. "") | ✔️ | ✔️ | ✔️ | ✔️
A comparison between value types

So, if you were to check for is iterable first, you might match on a_list or a_dict instead of a_string, but string can only match on a_string. Once you know it can’t be a string, you can check whether something is mapping – again, because a mapping can match either a_string or a_dict, but it can’t match a_list. Once you know it’s not that, you can check for either is iterable or is sequence because both of these match a_string, a_dict and a_list.

Likewise, if you wanted to check whether a_float and an_integer match is number and not is string, you can check these:

⬇️Type \ ➡️Check | is float | is integer | is iterable | is mapping | is number | is sequence | is string
a_float          | ✔️ | ❌ | ❌ | ❌ | ✔️ | ❌ | ❌
an_integer       | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌
A comparison between types of numbers

So again, a_float and an_integer don’t match is string, is mapping or is iterable, but they both match is number and they each match their respective is float and is integer checks.

How about each of those (a_float and an_integer) wrapped in quotes, making them a string? What happens then?

⬇️Type \ ➡️Check     | is float | is integer | is iterable | is mapping | is number | is sequence | is string
a_float_as_string    | ❌ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ✔️
an_integer_as_string | ❌ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ✔️
A comparison between types of numbers when held as a string

This is somewhat interesting, because they look like numbers, but they’re actually “just” strings. So, now you need to do some comparisons to make them look like numbers again to check if they’re numbers.

Changing the type of a string

What happens if you cast the values? Casting means to convert from one type of value (e.g. string) into another (e.g. float), and to do that, Ansible has three filters we can use: float, int and string. You can’t cast to a dict or a list, but you can use dict2items and items2dict (more on those later). So let’s start with casting our group of a_ and an_ items from above. Here’s the list of values I want to use:

---
- hosts: localhost
  gather_facts: no
  vars:
    an_int: 1
    a_float: 1.1
    a_string: "string"
    an_int_as_string: "1"
    a_float_as_string: "1.1"
    a_list:
      - item1
    a_dict:
      key1: value1

For each of these values, I returned the value as Ansible knows it, then what happens when you do {{ value | float }} to cast it as a float, {{ value | int }} to cast it as an integer, and {{ value | string }} to cast it as a string. Some of these results are interesting. Note that where you see u'some value', it means that Python converted that string to a Unicode string.

⬇️Value \ ➡️Cast  | value              | value when cast as float | value when cast as integer | value when cast as string
a_dict            | {"key1": "value1"} | 0.0 | 0 | "{u'key1': u'value1'}"
a_float           | 1.1                | 1.1 | 1 | "1.1"
a_float_as_string | "1.1"              | 1.1 | 1 | "1.1"
a_list            | ["item1"]          | 0.0 | 0 | "[u'item1']"
a_string          | "string"           | 0.0 | 0 | "string"
an_int            | 1                  | 1   | 1 | "1"
an_int_as_string  | "1"                | 1   | 1 | "1"
Casting between value types

So, what does this mean for us? Well, not a great deal, aside from noting that you can “force” a number to be a string, or that a string which is “just” a number wrapped in quotes can be forced back into being a number again.

Oh, and casting dicts to lists and back again? This one is actually pretty clearly documented in the current set of documentation (as at 2.9 at least!)
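
As a quick, minimal sketch of those two filters (the keys here are just examples):

- debug:
    msg: "{{ {'key1': 'value1'} | dict2items }}"
    # returns [{"key": "key1", "value": "value1"}]
- debug:
    msg: "{{ [{'key': 'key1', 'value': 'value1'}] | items2dict }}"
    # returns {"key1": "value1"}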

Checking for miscast values

What if I want to know whether a value I think might be a float is stored as a string? How can I check that?

{{ vars[var] | float | string == vars[var] | string }}

What is this? If I cast a value that I think might be a float to a float, and then turn both the cast value and the original into strings, do they match? If I’ve got a string or an integer, then I’ll get false, but if I’ve actually got a float, then I’ll get true. Likewise for casting to an integer. Let’s see what that table looks like:

⬇️Type \ ➡️Check     | value when cast as float | value when cast as integer | value when cast as string
a_float              | ✔️ | ❌ | ✔️
a_float_as_string    | ✔️ | ❌ | ✔️
an_integer           | ❌ | ✔️ | ✔️
an_integer_as_string | ❌ | ✔️ | ✔️
A comparison between types of numbers when cast to a string

So this shows us the values we were after – even if you’ve got a float (or an integer) stored as a string, by doing some careful casting, you can confirm they’re of the type you wanted… and then you can pass them through the right filter to use them in your playbooks!

Booleans

Last thing to check – boolean values: “True” or “False”. There’s a bit of confusion here, as a “boolean” can be true or false, yes or no, or 1 or 0. However, are true, True and TRUE the same? How about false, False and FALSE? Let’s take a look!

⬇️Value \ ➡️Check | type_debug | is boolean | is number | is iterable | is mapping | is string | value when cast as bool | value when cast as string | value when cast as integer
yes     | bool           | ✔️ | ✔️ | ❌ | ❌ | ❌ | True  | True  | 1
Yes     | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | False | Yes   | 0
YES     | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | False | YES   | 0
"yes"   | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | True  | yes   | 0
"Yes"   | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | True  | Yes   | 0
"YES"   | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | True  | YES   | 0
true    | bool           | ✔️ | ✔️ | ❌ | ❌ | ❌ | True  | True  | 1
True    | bool           | ✔️ | ✔️ | ❌ | ❌ | ❌ | True  | True  | 1
TRUE    | bool           | ✔️ | ✔️ | ❌ | ❌ | ❌ | True  | True  | 1
"true"  | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | True  | true  | 0
"True"  | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | True  | True  | 0
"TRUE"  | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | True  | TRUE  | 0
1       | int            | ❌ | ✔️ | ❌ | ❌ | ❌ | True  | 1     | 1
"1"     | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | True  | 1     | 1
no      | bool           | ✔️ | ✔️ | ❌ | ❌ | ❌ | False | False | 0
No      | bool           | ✔️ | ✔️ | ❌ | ❌ | ❌ | False | False | 0
NO      | bool           | ✔️ | ✔️ | ❌ | ❌ | ❌ | False | False | 0
"no"    | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | False | no    | 0
"No"    | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | False | No    | 0
"NO"    | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | False | NO    | 0
false   | bool           | ✔️ | ✔️ | ❌ | ❌ | ❌ | False | False | 0
False   | bool           | ✔️ | ✔️ | ❌ | ❌ | ❌ | False | False | 0
FALSE   | bool           | ✔️ | ✔️ | ❌ | ❌ | ❌ | False | False | 0
"false" | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | False | false | 0
"False" | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | False | False | 0
"FALSE" | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | False | FALSE | 0
0       | int            | ❌ | ✔️ | ❌ | ❌ | ❌ | False | 0     | 0
"0"     | AnsibleUnicode | ❌ | ❌ | ✔️ | ❌ | ✔️ | False | 0     | 0
Comparisons between various stylings of boolean representations

So, the standout thing for me here is that all the string permutations of the boolean representations (those wrapped in quotes, like this: "yes") are treated as strings, and shouldn’t be considered “boolean” (unless you cast for it explicitly!); all the non-string versions of true, false and no are considered to be boolean; but yes, Yes and YES are treated differently, depending on case. So, what would I do?

In summary

  • Consistently use no or yes, true or false in lower case to indicate a boolean value. Don’t use 1 or 0 unless you have to.
  • If you’re checking that you’re working with a string, a list or a dict, check in the order string (using is string), dict (using is mapping) and then list (using is sequence or is iterable)
  • Checking for numbers that are stored as strings? Cast your string through the type check for that number, like this: {% if value | float | string == value | string %}{{ value | float }}{% elif value | int | string == value | string %}{{ value | int }}{% else %}{{ value }}{% endif %}
  • Try not to use type_debug unless you really can’t find any other way. These values will change between versions, and this caused me a lot of issues with a large codebase I was working on a while ago!

Run these tests yourself!

Want to run these tests yourself? Here’s the code I ran (also available in a Gist on GitHub), using Ansible 2.9.10.

---
- hosts: localhost
  gather_facts: no
  vars:
    an_int: 1
    a_float: 1.1
    a_string: "string"
    an_int_as_string: "1"
    a_float_as_string: "1.1"
    a_list:
      - item1
    a_dict:
      key1: value1
  tasks:
    - debug:
        msg: |
          {
          {% for var in ["an_int", "an_int_as_string","a_float", "a_float_as_string","a_string","a_list","a_dict"] %}
            "{{ var }}": {
              "type_debug": "{{ vars[var] | type_debug }}",
              "value": "{{ vars[var] }}",
              "is float": "{{ vars[var] is float }}",
              "is integer": "{{ vars[var] is integer }}",
              "is iterable": "{{ vars[var] is iterable }}",
              "is mapping": "{{ vars[var] is mapping }}",
              "is number": "{{ vars[var] is number }}",
              "is sequence": "{{ vars[var] is sequence }}",
              "is string": "{{ vars[var] is string }}",
              "value cast as float": "{{ vars[var] | float }}",
              "value cast as integer": "{{ vars[var] | int }}",
              "value cast as string": "{{ vars[var] | string }}",
              "is same when cast to float": "{{ vars[var] | float | string == vars[var] | string }}",
              "is same when cast to integer": "{{ vars[var] | int | string == vars[var] | string }}",
              "is same when cast to string": "{{ vars[var] | string == vars[var] | string }}",
            },
          {% endfor %}
          }
---
- hosts: localhost
  gather_facts: false
  vars:
    # true, True, TRUE, "true", "True", "TRUE"
    a_true: true
    a_true_initial_caps: True
    a_true_caps: TRUE
    a_string_true: "true"
    a_string_true_initial_caps: "True"
    a_string_true_caps: "TRUE"
    # yes, Yes, YES, "yes", "Yes", "YES"
    a_yes: yes
    a_yes_initial_caps: Tes
    a_yes_caps: TES
    a_string_yes: "yes"
    a_string_yes_initial_caps: "Yes"
    a_string_yes_caps: "Yes"
    # 1, "1"
    a_1: 1
    a_string_1: "1"
    # false, False, FALSE, "false", "False", "FALSE"
    a_false: false
    a_false_initial_caps: False
    a_false_caps: FALSE
    a_string_false: "false"
    a_string_false_initial_caps: "False"
    a_string_false_caps: "FALSE"
    # no, No, NO, "no", "No", "NO"
    a_no: no
    a_no_initial_caps: No
    a_no_caps: NO
    a_string_no: "no"
    a_string_no_initial_caps: "No"
    a_string_no_caps: "NO"
    # 0, "0"
    a_0: 0
    a_string_0: "0"
  tasks:
    - debug:
        msg: |
          {
          {% for var in ["a_true","a_true_initial_caps","a_true_caps","a_string_true","a_string_true_initial_caps","a_string_true_caps","a_yes","a_yes_initial_caps","a_yes_caps","a_string_yes","a_string_yes_initial_caps","a_string_yes_caps","a_1","a_string_1","a_false","a_false_initial_caps","a_false_caps","a_string_false","a_string_false_initial_caps","a_string_false_caps","a_no","a_no_initial_caps","a_no_caps","a_string_no","a_string_no_initial_caps","a_string_no_caps","a_0","a_string_0"] %}
            "{{ var }}": {
              "type_debug": "{{ vars[var] | type_debug }}",
              "value": "{{ vars[var] }}",
              "is float": "{{ vars[var] is float }}",
              "is integer": "{{ vars[var] is integer }}",
              "is iterable": "{{ vars[var] is iterable }}",
              "is mapping": "{{ vars[var] is mapping }}",
              "is number": "{{ vars[var] is number }}",
              "is sequence": "{{ vars[var] is sequence }}",
              "is string": "{{ vars[var] is string }}",
              "is bool": "{{ vars[var] is boolean }}",
              "value cast as float": "{{ vars[var] | float }}",
              "value cast as integer": "{{ vars[var] | int }}",
              "value cast as string": "{{ vars[var] | string }}",
              "value cast as bool": "{{ vars[var] | bool }}",
              "is same when cast to float": "{{ vars[var] | float | string == vars[var] | string }}",
              "is same when cast to integer": "{{ vars[var] | int | string == vars[var] | string }}",
              "is same when cast to string": "{{ vars[var] | string == vars[var] | string }}",
              "is same when cast to bool": "{{ vars[var] | bool | string == vars[var] | string }}",
            },
          {% endfor %}
          }

Featured image is “Kelvin Test” by “Eelke” on Flickr and is released under a CC-BY license.

"$bash" by "Andrew Mager" on Flickr

One to read: Put your bash code in functions

I’ve got a few mildly ropey bash scripts which I could do with making a bit more resilient, and perhaps even operating faster ;)

As such, I found this page really interesting: https://ricardoanderegg.com/posts/bash_wrap_functions/

In it, Ricardo introduces me to two things which are interesting.

  1. The wait command literally waits for all the backgrounded tasks to finish (there’s a combined sketch of both tricks just after this list).
  2. Running bash commands like this: function1 & function2 & function3 should run all three processes in parallel (strictly speaking, the first two are backgrounded and the third runs in the foreground of the current shell). To be honest, I’d always usually do it like this:
    function1 &
    function2 &
    function3 &
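
Putting those two together, a minimal sketch (the function bodies are just examples):

#!/bin/bash

function task1() { sleep 2; echo "task1 done"; }
function task2() { sleep 1; echo "task2 done"; }
function task3() { sleep 3; echo "task3 done"; }

task1 &
task2 &
task3 &

# wait blocks here until all three backgrounded tasks have finished
wait
echo "all tasks finished"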

The other thing which Ricardo links to is a page suggesting that if you’re downloading a bash script and executing it (which, you know, probably isn’t a good idea at the best of times), you should wrap it in a function, like this:

#!/bin/bash

function main() {
  echo "Some function"
}

main

Doing this means that the bash scripting engine needs to download and parse all the functions before it can run any of the script. As a result, you’re less likely to get a broken run of your script if the download is cut short. Imagine it only got as far as:

#!/bin/bash
echo "Some fun

Then it wouldn’t have terminated the echo command (as an example)…

Anyway, some great tricks here! Love it!

Featured image is “$bash” by “Andrew Mager” on Flickr and is released under a CC-BY-SA license.


Making Windows Cloud-Init Scripts run after a reboot (Using Terraform)

I’m currently building a Proof Of Value (POV) environment for a product, and one of the things I needed in my environment was an Active Directory domain.

To do this in AWS, I had to do the following steps:

  1. Build my Domain Controller
    1. Install Windows
    2. Set the hostname (Reboot)
    3. Promote the machine to being a Domain Controller (Reboot)
    4. Create a domain user
  2. Build my Member Server
    1. Install Windows
    2. Set the hostname (Reboot)
    3. Set the DNS client to point to the Domain Controller
    4. Join the server to the domain (Reboot)

To make this work, I had to find a way to trigger build steps after each reboot. I was working with Windows 2012R2, Windows 2016 and Windows 2019, so the solution had to be cross-version. Fortunately I found this script online! That version was great for Windows 2012R2, but didn’t cover Windows 2016 or later… So let’s break down what I’ve done!

In your userdata field, you need to have two sets of XML strings, as follows:

<persist>true</persist>
<powershell>
$some = "powershell code"
</powershell>

The first block says to Windows 2016+ “keep trying to run this script on each boot” (note that you need to stop it from doing non-relevant stuff on each boot – we’ll get to that in a second!), and the second bit is the PowerShell commands you want it to run. The rest of this now will focus just on the PowerShell block.

  $path= 'HKLM:\Software\UserData'
  
  if(!(Get-Item $Path -ErrorAction SilentlyContinue)) {
    New-Item $Path
    New-ItemProperty -Path $Path -Name RunCount -Value 0 -PropertyType dword
  }
  
  $runCount = Get-ItemProperty -Path $path -Name Runcount -ErrorAction SilentlyContinue | Select-Object -ExpandProperty RunCount
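
  # Note: $ver (used below to spot Windows 2012) isn't defined in this snippet;
  # something like this would populate it (an assumption, not from the original script):
  # $ver = (Get-CimInstance Win32_OperatingSystem).Caption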
  
  if($runCount -ge 0) {
    switch($runCount) {
      0 {
        $runCount = 1 + [int]$runCount
        Set-ItemProperty -Path $Path -Name RunCount -Value $runCount
        if ($ver -match 2012) {
          #Enable user data
          $EC2SettingsFile = "$env:ProgramFiles\Amazon\Ec2ConfigService\Settings\Config.xml"
          $xml = [xml](Get-Content $EC2SettingsFile)
          $xmlElement = $xml.get_DocumentElement()
          $xmlElementToModify = $xmlElement.Plugins
          
          foreach ($element in $xmlElementToModify.Plugin)
          {
            if ($element.name -eq "Ec2HandleUserData") {
              $element.State="Enabled"
            }
          }
          $xml.Save($EC2SettingsFile)
        }
        $some = "PowerShell Script"
      }
    }
  }

Whew, what a block! Well, again, we can split this up into a couple of bits.

In the first few lines, we build a pointer, a note which says “We got up to here on our previous boots”. We then read that into a variable, find that number, and execute any steps in the block with that number. That’s this block:

  $path= 'HKLM:\Software\UserData'
  
  if(!(Get-Item $Path -ErrorAction SilentlyContinue)) {
    New-Item $Path
    New-ItemProperty -Path $Path -Name RunCount -Value 0 -PropertyType dword
  }
  
  $runCount = Get-ItemProperty -Path $path -Name Runcount -ErrorAction SilentlyContinue | Select-Object -ExpandProperty RunCount
  
  if($runCount -ge 0) {
    switch($runCount) {

    }
  }

The next part (and you’ll repeat it for each “number” of reboot steps you need to perform) says “increment the number” then “If this is Windows 2012, remind the userdata handler that the script needs to be run again next boot”. That’s this block:

      0 {
        $runCount = 1 + [int]$runCount
        Set-ItemProperty -Path $Path -Name RunCount -Value $runCount
        if ($ver -match 2012) {
          #Enable user data
          $EC2SettingsFile = "$env:ProgramFiles\Amazon\Ec2ConfigService\Settings\Config.xml"
          $xml = [xml](Get-Content $EC2SettingsFile)
          $xmlElement = $xml.get_DocumentElement()
          $xmlElementToModify = $xmlElement.Plugins
          
          foreach ($element in $xmlElementToModify.Plugin)
          {
            if ($element.name -eq "Ec2HandleUserData") {
              $element.State="Enabled"
            }
          }
          $xml.Save($EC2SettingsFile)
        }
        
      }

In fact, it’s fair to say that in my userdata script (which is itself rendered by Terraform), this looks like this:

  $path= 'HKLM:\Software\UserData'
  
  if(!(Get-Item $Path -ErrorAction SilentlyContinue)) {
    New-Item $Path
    New-ItemProperty -Path $Path -Name RunCount -Value 0 -PropertyType dword
  }
  
  $runCount = Get-ItemProperty -Path $path -Name Runcount -ErrorAction SilentlyContinue | Select-Object -ExpandProperty RunCount
  
  if($runCount -ge 0) {
    switch($runCount) {
      0 {
        ${file("templates/step.tmpl")}

        ${templatefile(
          "templates/rename_windows.tmpl",
          {
            hostname = "SomeMachine"
          }
        )}
      }
      1 {
        ${file("templates/step.tmpl")}

        ${templatefile(
          "templates/join_ad.tmpl",
          {
            dns_ipv4 = "192.0.2.1",
            domain_suffix = "ad.mycorp",
            join_account = "ad\someuser",
            join_password = "SomePassw0rd!"
          }
        )}
      }
    }
  }

Then, after each reboot, you need a new block. I have a block to change the computer name, a block to join the machine to the domain, and a block to install any software that I need.
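
For completeness, a hedged sketch of how that userdata can be wired into Terraform (the AMI, instance type and template path here are placeholders, not lifted from my real code):

resource "aws_instance" "domain_controller" {
  ami           = "ami-12345678" # placeholder: a Windows Server AMI
  instance_type = "t3.medium"    # placeholder

  user_data = <<-EOF
    <persist>true</persist>
    <powershell>
    ${templatefile("templates/userdata.tmpl", { hostname = "SomeMachine" })}
    </powershell>
  EOF
}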

Featured image is “New shoes” by “Morgaine” on Flickr and is released under a CC-BY-SA license.

"Fishing line and bobbin stuck on tree at Douthat State Park" by "Virginia State Parks" on Flickr

Note to self: Linux shell scripts don’t cope well with combined CRLF + LF files… Especially in User-Data / Custom Data / Cloud-Init scripts

This one is more a nudge to myself. On several occasions when building Infrastructure As Code (IAC), I’ve split out code sections into one or more files, for readability and reusability purposes. What I tended to forget – and this was more apparent with the Linux builds than the Windows builds – was to set the line terminator from CRLF to LF.

While this doesn’t really impact Windows builds too much (they’re kinda designed to support people being idiots with line endings now), Linux still really struggles with CRLF endings, and you’ll only notice you’ve broken things when the user-data script completely fails to run.

How do you determine this is your problem? Well, actually it’s a bit tricky, as none of cat, less, more or nano spots this issue. The only two things I found that identified it were file and vi.

The first part of the combined file with mixed line endings. This part has LF termination.
The second part of the combined file with mixed line endings. This part has CRLF termination.
What happens when we cat these two parts into one file? A file with CRLF, LF line terminators obviously!
What the combined file looks like in Vi. Note the blue ^M at the ends of the lines.

So, how to fix this? Assuming you’re using Visual Studio Code;

A failed line-ending clue in Visual Studio Code

You’ll notice this line showing “CRLF” in the status bar at the bottom of Code. Click on that, which brings up a small selection box near the top, as follows:

Oh no, it’s set to “CRLF”. That’s not what we want!

Selecting LF in that box changes the line endings to LF for this file, but it’s not yet saved. Make sure you save this file before you re-run your terraform script!

Notice, we’re now using LF endings, but the file isn’t saved.

Fantastic! It’s all worked!

In Nano, I’ve opened the part with the invalid line endings.

Oh no! We have a “DOS Format” file. Quick, let’s fix it!

To fix this, we need to write the file out. Hit Ctrl+O. This tells us that we’re in DOS Format, and also gives us the keyboard combination to toggle “DOS Format” off – it’s Alt+D (In Unix/Linux world, the Alt key is referred to as the Meta key – hence M not A).

This is how we fix things

So, after hitting Alt+D, the “File Name to write” line changes, see below:

Yey, no pesky “DOS Format” warning here!
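
If you’d rather spot and fix this from the shell instead, a minimal sketch (file is one of the two tools that identifies the problem, and sed or dos2unix will strip the carriage returns):

# "file" reports "ASCII text, with CRLF line terminators" for the broken part
file part1.sh part2.sh

# strip the carriage returns in place (dos2unix part2.sh would also work)
sed -i 's/\r$//' part2.sh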

Using either editor (or any others, if you know how to solve line ending issues in other editors), you still need to combine your script back together before you can run it, so… do that, and your file will be fine to run! Good luck!

Featured image is “Fishing line and bobbin stuck on tree at Douthat State Park” by “Virginia State Parks” on Flickr and is released under a CC-BY license.

"Unnatural Love" by "Keith Garner" on Flickr

Configuring a Remote Desktop (Gnome Shell) for Ubuntu

I started thinking a couple of weeks ago, when my coding laptop broke, that it would be really useful to have a development machine somewhere else that I could use.

It wouldn’t need a lot of power (after all, I’m mostly developing web apps and not compiling stuff), but it does need to be a desktop OS, as I rather like being able to open code editors and suchlike, while I’ve got a web browser open.

I have an Android tablet which, while it’s great as a tablet, isn’t much use as a desktop, and … yes, I’ve got a work laptop, but I don’t really want to install software on that (and I don’t think my admin team would be happy if I did).

Also, I quite like Linux.

Some time ago, I spotted that AWS has a “Virtual Desktop” environment, and I think that’s kinda what I’m after. Something I can spin up, run for a bit and then shut down, so I thought I’d build something like that… but not pesky Windows, after all… who likes Windows, eh? ;)

So, I built a Virtual Desktop Environment (VDE) in AWS, using Terraform and a bit of shell script!

I start from an Ubuntu 18.04 server image, and, after the install is complete, I run this user-data script inside it. Yes, I know I could be doing this with Ansible, but… eh, I wanted it to be a quick deployment ;)

Oh, and there are a couple of Terraform managed variables in here: ${aws_eip.vde.public_ip} is the AWS public IP address assigned to this host. ${var.firstuser} is the username we want to rename “ubuntu” (the stock server username) to. ${var.firstgecos} is the user’s “real name”, which the machine identifies the user as (like “Log out Jon Spriggs” and so on). ${var.userpw} is either the password you want it to use, OR (by default) pwgen 12, which generates a 12 character long password. ${var.desktopenv} is the name of the desktop environment I want to install (Ubuntu by default) and … well, ${var.var_start} is a bit of a fudge, because I couldn’t, in a hurry, work out how to tell Terraform not to mangle the bash variable allocation of ${somevar}, which is the format that Terraform also uses. D’oh.
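
(As it happens, Terraform’s template syntax does have an escape hatch for this: writing $${somevar} in the template emits a literal ${somevar} for bash to expand. A tiny sketch of the difference, using lines from the script below:)

# Interpolated by Terraform when the template is rendered:
hostnamectl set-hostname vde.${aws_eip.vde.public_ip}.nip.io

# Escaped, so bash receives a literal ${RENEWED_LINEAGE}:
echo 'cat $${RENEWED_LINEAGE}/privkey.pem'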

#! /bin/bash
#################
# Set Hostname
#################
hostnamectl set-hostname vde.${aws_eip.vde.public_ip}.nip.io
#################
# Change User
#################
user=${var.firstuser}
if [ ! "$user" == 'ubuntu' ]
then
  until usermod -c "${var.firstgecos}" -l $user ubuntu ; do sleep 5 ; done
  until groupmod -n $user ubuntu ; do sleep 5 ; done
  until usermod  -d /home/$user -m $user ; do sleep 5 ; done
  if [ -f /etc/sudoers.d/90-cloudimg-ubuntu ]; then
    mv /etc/sudoers.d/90-cloudimg-ubuntu /etc/sudoers.d/90-cloud-init-users
  fi
  perl -pi -e "s/ubuntu/$user/g;" /etc/sudoers.d/90-cloud-init-users
fi
if [ '${var.userpw}' == '$(pwgen 12)' ]
then 
  apt update && apt install -y pwgen
fi
newpw="${var.userpw}"
echo "$newpw" > /var/log/userpw
fullpw="$newpw"
fullpw+="\n"
fullpw+="$newpw"
echo -e "$fullpw" | passwd $user
##########################
# Install Desktop and RDP
##########################
apt-get update
export DEBIAN_FRONTEND=noninteractive
apt-get full-upgrade -yq
apt-get autoremove -y
apt-get autoclean -y
apt-get install -y ${var.desktopenv}-desktop xrdp certbot
##########################
# Configure Certbot
##########################
echo "#!/bin/sh" > /etc/letsencrypt/merge_cert.sh
echo 'cat ${var.var_start}{RENEWED_LINEAGE}/privkey.pem ${var.var_start}{RENEWED_LINEAGE}/fullchain.pem > ${var.var_start}{RENEWED_LINEAGE}/merged.pem' >> /etc/letsencrypt/merge_cert.sh
echo 'chmod 640 ${var.var_start}{RENEWED_LINEAGE}/merged.pem' >> /etc/letsencrypt/merge_cert.sh
chmod 750 /etc/letsencrypt/merge_cert.sh
certbot certonly --standalone --deploy-hook /etc/letsencrypt/merge_cert.sh -n -d vde.${aws_eip.vde.public_ip}.nip.io -d ${aws_eip.vde.public_ip}.nip.io --register-unsafely-without-email --agree-tos
# Based on https://www.snel.com/support/xrdp-with-lets-encrypt-on-ubuntu-18-04/
sed -i 's~^certificate=$~certificate=/etc/letsencrypt/live/vde.${aws_eip.vde.public_ip}.nip.io/fullchain.pem~; s~^key_file=$~key_file=/etc/letsencrypt/live/vde.${aws_eip.vde.public_ip}.nip.io/privkey.pem~' /etc/xrdp/xrdp.ini
##############################
# Fix colord remote user issue
##############################
# Derived from http://c-nergy.be/blog/?p=12043
echo "[Allow Colord all Users]
Identity=unix-user:*
Action=org.freedesktop.color-manager.create-device;org.freedesktop.color-manager.create-profile;org.freedesktop.color-manager.delete-device;org.freedesktop.color-manager.delete-profile;org.freedesktop.color-manager.modify-device;org.freedesktop.color-manager.modify-profile
ResultAny=no
ResultInactive=no
ResultActive=yes" > /etc/polkit-1/localauthority/50-local.d/45-allow.colord.pkla
##############################
# Configure Desktop
##############################
if [ '${var.desktopenv}' == 'ubuntu' ]
then 
  echo "#!/bin/bash" > /tmp/desktop_settings
  echo "gsettings set org.gnome.desktop.input-sources sources \"[('xkb', 'gb')]\"" >> /tmp/desktop_settings
  echo "gsettings set org.gnome.desktop.app-folders folder-children \"['Utilities', 'Sundry', 'YaST']\"" >> /tmp/desktop_settings
  echo "gsettings set org.gnome.desktop.privacy report-technical-problems false" >> /tmp/desktop_settings
  echo "gsettings set org.gnome.desktop.screensaver lock-enabled false" >> /tmp/desktop_settings
  echo "gsettings set org.gnome.desktop.session idle-delay 0" >> /tmp/desktop_settings
  echo "echo yes > /home/${var.firstuser}/.config/gnome-initial-setup-done" >> /tmp/desktop_settings
  sudo -H -u ${var.firstuser} dbus-launch --exit-with-session bash /tmp/desktop_settings
  rm -f /tmp/desktop_settings
fi
##########################
# Install VSCode
##########################
wget https://vscode-update.azurewebsites.net/latest/linux-deb-x64/stable -O /tmp/vscode.deb
apt install -y /tmp/vscode.deb
rm /var/crash/*
shutdown -r now

Ubuntu 18.04 has a “first login” wizard that lets you pre-set things like which language you’ll be using. I bypassed this with the gsettings commands towards the end of the script, and by writing the string “yes” to ~/.config/gnome-initial-setup-done.

Also, I wanted to be able to RDP to it. I’m a bit concerned by the use of VNC, especially where RDP is more than capable. It’s just an apt-install away, so… that’s what I do. But, because I’m RDP’ing into this box, I wanted to prevent the RDP session from locking, so I provide two commands to the session: gsettings set org.gnome.desktop.screensaver lock-enabled false which removes the screensaver’s ability to lock the screen, and gsettings set org.gnome.desktop.session idle-delay 0 which stops the screensaver from even starting in the first place.

Now all I need to do is to figure out where I’m going to store my code between boots ;)

So, in summary, I now have a Virtual Machine, which runs Ubuntu 18.04 Desktop, in AWS, with an RDP connection (powered by xRDP), and a disabled screensaver. Job done, I think!

Oh, and if I’m doing it “wrong”, let me know in the comments? :)

Featured image is “Unnatural Love” by “Keith Garner” on Flickr and is released under a CC-BY-SA license.

"Shipping Containers" by "asgw" on Flickr

Creating my first Docker containerized LEMP (Linux, nginx, MariaDB, PHP) application

Want to see what I built without reading the why’s and wherefore’s? The git repository with all the docker-compose goodness is here!

Late edit 2020-01-16: The fantastic Jerry Steel, my co-host on The Admin Admin podcast looked at what I wrote, and made a few suggestions. I’ve updated the code in the git repo, and I’ll try to annotate below when I’ve changed something. If I miss it, it’s right in the Git repo!

One of the challenges I set myself this Christmas was to learn enough about Docker to containerize an arbitrary PHP application – one that I would previously have (mis)used Vagrant to contain.

Just before I started down this rabbit hole, I spoke to my Aunt about some family tree research my father had left behind after he died, and how I wished I could easily share the old tree with her (I organised getting her a Chromebook a couple of years ago, after fighting with doing remote support for years on Linux and Windows laptops). In the end, I found a web application for genealogical research called HuMo-gen, that is a perfect match for both projects I wanted to look at.

HuMo-gen was first created in 1999, with a PHP version being released in 2005. It used MySQL or MariaDB as the Database engine. I was reasonably confident that I could have created a Vagrantfile to deliver this on my home server, but I wanted to try something new. I wanted to use the “standard” building blocks of Docker and Docker-Compose, and some common containers to make my way around learning Docker.

I started by looking for some resources on how to build a Docker container. Much of the guidance I’d found was to use Docker-Compose, as this allows you to stand several components up at the same time!

In contrast to how Vagrant works (which is basically a CLI wrapper to many virtual machine services), Docker isolates resources for a single process that runs on a machine. Where in Vagrant, you might run several processes on one machine (perhaps, in this instance, nginx, PHP-FPM and MariaDB), with Docker, you’re encouraged to run each “service” as their own containers, and link them together with an overlay network. It’s possible to also do the same with Vagrant, but you’ll end up with an awful lot of VM overhead to separate out each piece.

So, I first needed to select my services. My initial line-up was:

  • MariaDB
  • PHP-FPM
  • Apache’s httpd2 (replaced by nginx)

I was able to find official Docker images for PHP, MariaDB and httpd, but after extensive tweaking, I couldn’t make the httpd image talk the way I wanted it to with the PHP image. Bowing to what now seems to be conventional wisdom, I swapped out the httpd service for nginx.

One of the stumbling blocks for me, particularly early on, was how to build several different Dockerfiles (these are basically the instructions for the container you’re constructing). Here is the basic outline of how to do this:

version: '3'
services:
  yourservice:
    build:
      context: .
      dockerfile: relative/path/to/Dockerfile

In this docker-compose.yml file, I tell it that to create the yourservice service, it needs to build the docker container, using the file in ./relative/path/to/Dockerfile. This file in turn contains an instruction to import an image.

Each service stacks on top of each other in that docker-compose.yml file, like this:

version: '3'
services:
  service1:
    build:
      context: .
      dockerfile: service1/Dockerfile
    image: localhost:32000/service1
  service2:
    build:
      context: .
      dockerfile: service2/Dockerfile
    image: localhost:32000/service2

Late edit 2020-01-16: This previously listed Dockerfile/service1, however, much of the documentation suggested that Docker gets quite opinionated about the file being called Dockerfile. While docker-compose can work around this, it’s better to stick to tradition :) The docker-compose.yml files below have also been adjusted accordingly. I’ve also added an image: somehost:1234/image_name line to help with tagging the images for later use. It’s not critical to what’s going on here, but I found it useful with some later projects.

To allow containers to see ports between themselves, you add the expose: command in your docker-compose.yml, and to allow that port to be visible from the “outside” (i.e. to the host and upwards), use the ports: command listing the “host” port (the one on the host OS), then a colon and then the “target” port (the one in the container), like these:

version: '3'
services:
  service1:
    build:
      context: .
      dockerfile: service1/Dockerfile
    image: localhost:32000/service1
    expose:
    - 12345
  service2:
    build:
      context: .
      dockerfile: service2/Dockerfile
    image: localhost:32000/service2
    ports:
    - 8000:80

Now, let’s take a quick look into the Dockerfiles. Each “statement” in a Dockerfile adds a new “layer” to the image. For local operations, this probably isn’t a problem, but when you’re storing these images on a hosted provider, you want to keep these images as small as possible.

I built a Database Dockerfile, which is about as small as you can make it!

FROM mariadb:10.4.10

Yep, one line. How cool is that? In the docker-compose.yml file, I invoke this, like this:

version: '3'
services:
  db:
    build:
      context: .
      dockerfile: mariadb/Dockerfile
    image: localhost:32000/db
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: a_root_pw
      MYSQL_USER: a_user
      MYSQL_PASSWORD: a_password
      MYSQL_DATABASE: a_db
    expose:
      - 3306

OK, so this one is a bit more complex! I wanted it to build my Dockerfile, which is “mariadb/Dockerfile“. I wanted it to restart the container whenever it failed (which hopefully isn’t that often!), and I wanted to inject some specific environment variables into the file – the root and user passwords, a user account and a database name. Initially I was having some issues where it wasn’t building the database with these credentials, but I think that’s because I wasn’t “building” the new database, I was just using it. I also expose the MariaDB (MySQL) port, 3306 to the other containers in the docker-compose.yml file.

Let’s take a look at the next part! PHP-FPM. Here’s the Dockerfile:

FROM php:7.4-fpm
RUN docker-php-ext-install pdo pdo_mysql
ADD --chown=www-data:www-data public /var/www/html

There’s a bit more to this, but not loads. We build our image from a named version of PHP, and install two extensions to PHP, pdo and pdo_mysql. Lastly, we copy the content of the “public” directory into the /var/www/html path, and make sure it “belongs” to the right user (www-data).

I’d previously tried to do a lot more complicated things with this Dockerfile, but it wasn’t working, so instead I slimmed it right down to just this, and the docker-compose.yml is a lot simpler too.

  phpfpm:
    build:
      context: .
      dockerfile: phpfpm/Dockerfile
    image: localhost:32000/phpfpm

See! Loads simpler! Now we need the complicated bit! :) This is the Dockerfile for nginx.

FROM nginx:1.17.7
COPY nginx/default.conf /etc/nginx/conf.d/default.conf

COPY public /var/www/html

Weirdly, even though I’ve added version numbers for MariaDB and PHP, I’ve not done the same for nginx, perhaps I should! Late edit 2020-01-16: I’ve put a version number on there now, previously where it said nginx:1.17.7 it actually said nginx:latest.

I’d originally created the configuration block for nginx in a single “RUN” line. Late edit 2020-01-16: following Jerry’s advice, this Dockerfile no longer has a giant echo 'stuff' > file block, and I’m using COPY instead of ADD too. I’ll show that config file below. There’s a couple of high points for me here!

server {
  index index.php index.html;
  server_name _;
  error_log /proc/self/fd/2;
  access_log /proc/self/fd/1;
  root /var/www/html;
  location ~ \.php$ {
    try_files $uri =404;
    fastcgi_split_path_info ^(.+\.php)(/.+)$;
    fastcgi_pass phpfpm:9000;
    fastcgi_index index.php;
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_param PATH_INFO $fastcgi_path_info;
  }
}
  • server_name _; means “use this block for all unnamed requests”.
  • access_log /proc/self/fd/1; and error_log /proc/self/fd/2; These are links to the “stdout” and “stderr” file descriptors (or pointers to other parts of the filesystem), and basically mean that when you do docker-compose logs, you’ll see the HTTP logs for the server! These two files are guaranteed to be there, while /dev/stderr isn’t!

Because nginx is “just” serving the web content here, and I know the content doesn’t need to be written to from nginx, I knew I didn’t need to do the chown action, like I did with the PHP-FPM block.

Lastly, I need to configure the docker-compose.yml file for nginx:

  nginx:
    build:
      context: .
      dockerfile: nginx/Dockerfile
    image: localhost:32000/nginx
    ports:
      - 127.0.0.1:1980:80

I’ve gone for a slightly unusual ports configuration when I deployed this to my web server… you see, I already have the HTTP port (TCP/80) configured for use on my home server – for running the rest of my web services. During development on my home machine, the ports line simply showed “1980:80”. On the server, instead, I’m running this application bound to “localhost” (127.0.0.1) on a different port number (1980, selected because it could, conceivably, be a birthday of someone on this system), and then in my local web server configuration, I’m proxying connections to this service, with HTTPS encryption as well. That’s all outside the scope of this article (as I probably should be using something like Traefik, anyway) but it shows you how you could bind to a separate port too.
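
For reference, once the docker-compose.yml and Dockerfiles are in place, driving the stack looks something like this (a sketch, run from the directory holding docker-compose.yml):

docker-compose build          # build all the images from their Dockerfiles
docker-compose up -d          # start the whole stack in the background
docker-compose logs -f nginx  # follow the HTTP logs mentioned above
docker-compose down           # stop and remove the containers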

Anyway, that was my Docker journey over Christmas, and I look forward to using it more, going forward!

Featured image is “Shipping Containers” by “asgw” on Flickr and is released under a CC-BY license.

"Mesh Facade" by "Pedro Ângelo" on Flickr

Looking at the Nebula Overlay Meshed VPN Network from Slack

Around 2-3 years ago, Slack (the company who produces Slack the IM client) started working on a meshed overlay network product, called Nebula, to manage their environment. After two years of running their production network on the back of it, they decided to open source it. I found out about Nebula via a Medium post that was mentioned in the HangOps Slack group. I got interested in it, asked a few questions about Nebula in the Slack, and then in the GitHub issues for it, and then recently raised a pull request to add more complete documentation than their single (heavily) commented config file.

So, let’s go into some details on why this is interesting to me.

1. Nebula uses a flat IPv4 network to identify all hosts in the network, no matter where in the network those hosts reside.

This means that I can address any host in my (self-allocated) 198.18.0.0/16 network, and I don’t need to worry about routing tables, production/DR sites, network tromboning and so on… it’s just… Flat.

[Note: Yes, that IP address “looks” like a public subnet – but it’s actually a testing network, allocated by IANA for network testing!]

2. Nebula has host-based firewalling built into the configuration file.

This means that once I know how my network should operate (yes, I know that’s a big ask), I can restrict my servers from being able to reach my laptops, or I can stop my web server from being able to talk to my database server, except for on the database ports. Lateral movement becomes a LOT harder.

This firewalling also looks a lot like (Network) Security Groups (for the AWS and Azure familiar), so you have a default “Deny” rule, and then layer “Allow” rules on top. You also have Inbound and Outbound rules, so if you want to stop your laptops from talking anything but DNS, SSH, HTTPS and ICMP over Nebula…. well, yep, you can do that :)
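
To give a flavour, the firewall section of a node’s config file looks roughly like this (a sketch based on their example config, not a complete file):

firewall:
  outbound:
    # let this node talk out to anything
    - port: any
      proto: any
      host: any
  inbound:
    # allow ICMP from everyone (everything else inbound stays denied)...
    - port: any
      proto: icmp
      host: any
    # ...and HTTPS only from nodes holding the "laptop" group
    - port: 443
      proto: tcp
      groups:
        - laptop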

3. Nebula uses a PKI environment. Where you can have multiple Certificate Authorities.

This means that I have a central server running my certificate authority (CA), with a “backup” CA, stored offline – in case of dire disaster with my primary CA, and push both CA’s to all my nodes. If I suddenly need to replace all the certificates that my current CA signed, I can do that with minimal interruption to my nodes. Good stuff.

Nebula also created their own PKI attributes to identify the roles of each device in the Nebula environment. Signing those attributes into the certificate on each node means your CA asserts that the role that certificate holds is valid for that node in the network.

Creating a node’s certificate is a simple command:

nebula-cert sign -name jon-laptop -ip 198.18.0.1/16 -groups admin,laptop,support

This certificate has the IP address of the node baked in (it’s 198.18.0.1) and the groups it’s part of (admin, laptop and support), as well as the host name (jon-laptop). I can use any of these three items in my firewall rules I mentioned above.

4. It creates a peer-to-peer, meshed VPN.

While it’s possible to create a peer-to-peer meshed VPN with commercial products, I’ve not seen any which are as light-weight to deploy as this. Each node finds all the other nodes in the network by using a collection of “Lighthouses” (similar to Torrent Seeds or Skype Super Nodes) which tells all the connecting nodes where all the other machines in the network are located. These then initiate UDP connections to the other nodes they want to talk to. If they are struggling (because of NAT or Double NAT), then there’s a NAT Punching process (called, humourously, “punchy”) which lets you signal via the Lighthouse that you’re trying to reach another node that can’t see your connection, and asks it to also connect out to you over UDP… thereby fixing the connection issue. All good.

5. Nebula has clients for Windows, Mac and Linux. Apparently there are clients for iOS in the works (meh, I’m not on Apple… but I know some are) and I’ve heard nothing about Android as yet, but as it’s on Linux, I’m sure some enterprising soul can take a look at it (the client is written in Go).

If you want to have a look at Nebula for your own testing, I’ve created a Terraform based environment on AWS and Azure to show how you’d manage it all using Ansible Tower, which builds:

  • 2 VPCs (AWS) and 1 VNet (Azure)
  • 6 subnets (3 public, 3 private)
  • 1 public AWX (the upstream project from Ansible Tower) Server
  • 1 private Nebula Certificate Authority
  • 2 public Web Servers (one in AWS, one in Azure)
  • 2 private Database Servers (one in AWS, one in Azure)
  • 2 public Bastion Servers (one in AWS, one in Azure) – these let AWX reach into the Private sections of the network, without exposing SSH from all the hosts.

If you don’t want to provision the Azure side, just remove load_web2_module.tf from the Terraform directory in that repo… Job’s a good’n!

I have plans to look at a couple of alternatives, like Nebula’s closest rival, ZeroTier, and to look at using SaltStack instead of Ansible, to reduce the need for the extra Bastion servers.

Featured image is “Mesh Facade” by “Pedro Ângelo” on Flickr and is released under a CC-BY-SA license.


Getting Started with Terraform on AWS

I recently wrote a blog post about Getting Started with Terraform on Azure. You might have read it (I know I did!).

Having got a VM stood up in Azure, I wanted to build a VM in AWS, after all, it’s more-or-less the same steps. Note, this is a work-in-progress, and shouldn’t be considered “Final” – this is just something to use as *your* starting block.

What do you need?

You need an AWS account for this. If you’ve not got one, signing up for one is easy, but bear in mind that while there are free resources on AWS (only for the first year!), it’s also quite easy to suddenly enable a load of features that cost you money.

Best practice suggests (or rather, INSISTS) you shouldn’t use your “root” account for AWS. It’s literally just there to let you define the rest of your admin accounts. Turn on MFA (Multi-Factor Authentication) on that account, give it an exceedingly complex password, write that on a sheet of paper, and lock it in a box. You should NEVER use it!

Create your admin account, log in to that account. Turn on MFA on *that* account too. Then, create an “Access Token” for your account. This is in IAM (Identity and Access Management). These are what we’ll use to let Terraform perform actions in AWS, without you needing to actually “log in”.

On my machine, I’ve put the credentials for this in /home/<MYUSER>/.aws/credentials and it looks like this:

[default]
aws_access_key_id = ABC123DEF456
aws_secret_access_key = AaBbCcDd1234EeFf56

This file should be chmod 600 and make sure it’s only your account that can access this file. With this token, Terraform can perform *ANY ACTION* as you, including anything that charges you money, or creating servers that can mine a “cryptocurrency” for someone malicious.

I’m using Windows Subsystem for Linux (WSL). I’m using the Ubuntu 18.04 distribution obtained from the Store. This post won’t explain how to get *that*. Also, you might want to run Terraform on Mac, in Windows or on Linux natively… so, yehr.

Next, we need to actually install Terraform. Excuse the long, unwrapped code block, but it gets what you need quickly (assuming the terraform webpage doesn’t change any time soon!)

mkdir -p ~/bin
cd ~/bin
sudo apt update && sudo apt install unzip
curl -sLO $(curl https://www.terraform.io/downloads.html | grep "linux_amd64.zip" | cut -d\" -f 2) && unzip terraform*.zip && rm terraform*.zip && chmod 755 terraform

Starting coding your infrastructure

Before you can build your first virtual machine on AWS, you need to stand up the supporting infrastructure. These are:

  • An SSH Keypair (no password logins here!)
  • A VPC (“Virtual Private Cloud”, roughly the same as a VNet on Azure, or somewhat like an L3 switch in the Physical Realm).
  • An Internet Gateway (if your VPC isn’t classed as “the default one”)
  • A Subnet.
  • A Security Group.

Once we’ve got these, we can build our Virtual Machine on EC2 (“Elastic Cloud Compute”), and associate a “Public IP” to it.

To quote my previous post:

One quirk with Terraform, versus other tools like Ansible, is that when you run one of the terraform commands (like terraform init, terraform plan or terraform apply), it reads the entire content of any file suffixed “tf” in that directory, so if you don’t want a file to be loaded, you need to either move it out of the directory, comment it out, or rename it so it doesn’t end .tf. By convention, you normally have three “standard” files in a terraform directory – main.tf, variables.tf and output.tf, but logically speaking, you could have everything in a single file, or each instruction in its own file.

Getting Started with Terraform on Azure – Building the file structure

For the sake of editing and annotating the files for this post, these code blocks are all separated, but on my machine, they’re all currently one big file called “main.tf“.

In that file, I start by telling it that I’m working with the Terraform AWS provider, and that it should target my nearest region.

If you want to risk financial ruin, you can put things like your access tokens in here, but I really wouldn’t chance this!
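
The code block for that stage isn’t reproduced in this copy of the post, but it would have looked something like this (eu-west-2 being my nearest region):

provider "aws" {
  region = "eu-west-2"
}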

Next, we create our network infrastructure – VPC, Internet Gateway and Subnet. We also change the routing table.

I suspect, if I’d created the VPC as “The Default” VPC, then I wouldn’t have needed to amend the routing table, nor added an Internet Gateway. To help us make the routing table change, there’s a “data” block in this section of code. A data block is an instruction to Terraform to go and ask a resource for *something*, in this case, we need AWS to tell Terraform what the routing table is that it created for the VPC. Once we have that we can ask for the routing table change.

AWS doesn’t actually give “proper” names to any of its assets. To provide something with a “real” name, you need to tag that thing with the “Name” tag. These can be practically anything, but I’ve given semi-sensible names to everything. You might want to name everything “main” (like I nearly did)!
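
A sketch of that network block (the CIDR ranges and Name tags here are illustrative, not my actual values):

resource "aws_vpc" "MyVPC" {
  cidr_block = "10.0.0.0/16"
  tags = {
    Name = "MyVPC"
  }
}

resource "aws_internet_gateway" "MyIGW" {
  vpc_id = aws_vpc.MyVPC.id
  tags = {
    Name = "MyIGW"
  }
}

resource "aws_subnet" "MySubnet" {
  vpc_id     = aws_vpc.MyVPC.id
  cidr_block = "10.0.0.0/24"
  tags = {
    Name = "MySubnet"
  }
}

# The "data" block mentioned above: ask AWS for the routing table it created
data "aws_route_table" "MyRouteTable" {
  vpc_id = aws_vpc.MyVPC.id
  filter {
    name   = "association.main"
    values = ["true"]
  }
}

resource "aws_route" "internet_access" {
  route_table_id         = data.aws_route_table.MyRouteTable.id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = aws_internet_gateway.MyIGW.id
}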

We’re getting close to being able to create the VM now. First of all, we’ll create the Security Groups. I want to separate out my “Allow Egress Traffic” rule from my “Inbound SSH” rule. This means that I can clearly see what hosts allow inbound SSH access. Like with my Azure post, I’m using a “data provider” to get my public IP address, but in a normal “live” network, you’d specify a collection of valid source address ranges.
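
A sketch of those two groups, using the http data provider to discover my public IP (the URL is one of several services that work for this):

data "http" "my_ip" {
  url = "https://ipv4.icanhazip.com"
}

resource "aws_security_group" "Egress" {
  name   = "AllowEgress"
  vpc_id = aws_vpc.MyVPC.id
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "InboundSSH" {
  name   = "InboundSSH"
  vpc_id = aws_vpc.MyVPC.id
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["${chomp(data.http.my_ip.body)}/32"]
  }
}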

Last steps before we create the Virtual Machine. We need to upload our SSH key, and we need to find the “AMI” (Amazon Machine Image) of the image we’ll be using. To create the key, in this directory, alongside the .tf files, I’ve put my SSH public key (called id_rsa.pub), and we load that key when we create the “my_key” resource. To find the AMI, we need to make another data call, this time asking the AMI index to find the image with a name containing ubuntu-bionic-18.04 and some other stuff. AMIs are region specific, so the image I’m using in eu-west-2 will not be the same AMI in eu-west-1 or us-east-1 and so on. This filtering means that, as long as the image exists in that region, we can use “the right one”. So let’s take a look at this file.
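
A sketch of the key upload and the AMI lookup (099720109477 is Canonical’s publisher ID; the name filter narrows it to Bionic images):

resource "aws_key_pair" "MyKey" {
  key_name   = "MyKey"
  public_key = file("id_rsa.pub")
}

data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical
  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-*"]
  }
  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}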

So, now we have everything we need to create our VM. Let’s do that!

In here, we specify a “user_data” file to upload, in this case, the contents of a file – CloudDev.sh, but you can load anything you want in here. My CloudDev.sh is shown below, so you can see what I’m doing with this file :)
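
Pulling all of the above together, a sketch of the VM itself (the instance size and tag are illustrative; CloudDev.sh is whatever you want to run on first boot):

resource "aws_instance" "MyVM" {
  ami                         = data.aws_ami.ubuntu.id
  instance_type               = "t2.micro"
  key_name                    = aws_key_pair.MyKey.key_name
  subnet_id                   = aws_subnet.MySubnet.id
  associate_public_ip_address = true
  vpc_security_group_ids = [
    aws_security_group.Egress.id,
    aws_security_group.InboundSSH.id,
  ]
  user_data = file("CloudDev.sh")
  tags = {
    Name = "MyVM"
  }
}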

So, having created all this lot, you need to execute the terraform workload. Initially you do terraform init. This downloads all the providers it needs and puts them into the same tree as these .tf files are stored in. It also initialises the state, Terraform’s record of what it has discovered or created.

Next, you do terraform plan -out tfout. Technically, the tfout part can be any filename, but having something like tfout marks it as clearly part of Terraform. This creates the tfout file, which records whatever needs to change in your environment on Terraform’s next run. Typically, if you don’t use a tfout file within about 20 minutes, it’s probably worth removing it and planning again.

Finally, once you’ve run your plan stage, you need to apply it. In this case you execute terraform apply tfout. This tfout is the same filename you specified in terraform plan. If you don’t include -out tfout on your plan (or don’t run a plan at all!), you can just run terraform apply on its own – it will work out its own plan and prompt you before applying it.

Once you’re done with your environment, use terraform destroy to shut it all down… and enjoy :)

Featured image is “code crunching” by “Ruben Molina” on Flickr and is released under a CC-ND license.