A stack of Jenga bricks falling over

A Quick Fix for “Backend initialization required” from Terragrunt

Today I ran terragrunt apply against a IaC directory, and got this response:

╷
│ Error: Backend initialization required: please run "terraform init"
│ 
│ Reason: Backend configuration block has changed
│ 
│ The "backend" is the interface that Terraform uses to store state,
│ perform operations, etc. If this message is showing up, it means that the
│ Terraform configuration you're using is using a custom configuration for
│ the Terraform backend.
│ 
│ Changes to backend configurations require reinitialization. This allows
│ Terraform to set up the new configuration, copy existing state, etc. Please
│ run
│ "terraform init" with either the "-reconfigure" or "-migrate-state" flags
│ to
│ use the current configuration.
│ 
│ If the change reason above is incorrect, please verify your configuration
│ hasn't changed and try again. At this point, no changes to your existing
│ configuration or state have been made.
╵
ERRO[0000] Hit multiple errors:
Hit multiple errors:
exit status 1 

But wait, I hear you say, Terragrunt runs terraform init for you… so what gives?

Well, in this case, the terragrunt.hcl has a dependency block, and one of those dependencies has not run properly, so… let’s fix it

Read the content of your terragrunt.hcl

terraform {
  source = "git@github.com:example/example-terraform-modules.git//module"
}

include {
  path = find_in_parent_folders()
}

dependency "dependency_1" {
  config_path = "${get_terragrunt_dir()}/../dependency"

  mock_outputs_allowed_terraform_commands = ["destroy", "force-unlock"]
  mock_outputs = {
    output_1 = []
    output_2 = ""
  }
}

inputs = {
  name      = "some_module"
  some_key  = dependency.dependency_1.outputs.output_1
  other_key = dependency.dependency_1.outputs.output_2
}

Right, so for some reason the dependency won’t run. Change into that directory, and run terragrunt apply --terragrunt-source-update. Hopefully, you’ll get something like this:

Initializing the backend...

Successfully configured the backend "example"! Terraform will automatically
use this backend unless the backend configuration changes.

Initializing provider plugins...
- Reusing previous version of example/example from the dependency lock file
- Installing example/example v1.0.0...
- Installed example/example v1.0.0 (signed by Example)

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
example_module.this: Refreshing state... [id=an-example]

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration
and found no differences, so no changes are needed.

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:
output_1 = {"some_key": "some_value"}
output_2 = "some_string"

You may find yourself having to traverse several different dependencies until you get to the one which is missing… and then it should work :)

Featured image is “Jenga” by “Mara Tr.” on Flickr and is released under a CC-BY license.

A medieval helm in gold on a bench at a museum

Fixing 403 errors from ghcr.io with #helm pull

At work, I’m using skaffold to deploy a helm chart which references a ghcr.io repository. Here’s the stanza I’m looking at:

apiVersion: skaffold/v3
kind: Config
deploy:
  helm:
    releases:
      - name: {package}
        remoteChart: oci://ghcr.io/{owner}/{package}

This is the first time I’ve tried to deploy this chart, and I kept getting this message:

No tags generated
Starting test...
Starting deploy...
Helm release {package} not installed. Installing...
Error: INSTALLATION FAILED: failed to authorize: failed to fetch anonymous token: unexpected status from GET request to https://ghcr.io/token?scope=repository%3A{owner}%2F{package}%3Apull&scope=repository%3Auser%2Fimage%3Apull&service=ghcr.io: 403 Forbidden
deploying "{package}": install: exit status 1

I thought this might have been an issue with the skaffold file, so I tried running this directly with helm:

$ helm pull oci://ghcr.io/{owner}/{package}
Error: failed to authorize: failed to fetch anonymous token: unexpected status from GET request to https://ghcr.io/token?scope=repository%3A{owner}%2F{package}%3Apull&scope=repository%3Auser%2Fimage%3Apull&service=ghcr.io: 403 Forbidden

Huh, that looks a bit familiar. I spent a little while checking to see whether this was something at the Kubernetes cluster, or if it was just me, and ended up finding this nugget (thanks to a steer from this post)

$ gh auth token | helm registry login ghcr.io -u {my_github_user} --password-stdin
Login Succeeded

And now it works!

elm pull oci://ghcr.io/{owner}/{package}
Pulled: ghcr.io/{owner}/{package}:1.2.3
Digest: sha256:decafbad1234567890aabbccddeeffdeadbeefbadbadbadbad12345678901234

Featured image is “helm” by “23 dingen voor musea” on Flickr and is released under a CC-BY-SA license.

A text dialogue from a web page showing "Uh oh. Something really just went wrong. Good thing we know about it and have our crack team of squirrels getting their nuts out of the system!"

How to capture stdout and stderr from a command in a shellscript without preventing piped processes from seeing them

I love the tee command – it captures stdout [1] and puts it in a file, while then returning that output to stdout for the next process in a pipe to consume, for example:

$ ls -l | tee /tmp/output
total 1
xrwxrwxrw 1 jonspriggs jonspriggs 0 Jul 27 11:16 build.sh
$ cat /tmp/output
total 1
xrwxrwxrw 1 jonspriggs jonspriggs 0 Jul 27 11:16 build.sh

But wait, why is that useful? Well, in a script, you don’t always want to see the content scrolling past, but in the case of a problem, you might need to catch up with the logs afterwards. Alternatively, you might do something like this:

if some_process | tee /tmp/output | grep -q "some text"
then
  echo "Found 'some text' - full output:"
  cat /tmp/output
fi

This works great for stdout but what about stderr [2]? In this case you could just do:

some_process 2>&1 | tee /tmp/output

But that mashes all of stdout and stderr into the same blob.

In my case, I want to capture all the output (stdout and stderr) of a given process into a file. Only stdout is forwarded to the next process, but I still wanted to have the option to see stderr as well during processing. Enter process substitution.

TEMP_DATA_PATH="$(mktemp -d)"
capture_out() {
  base="${TEMP_DATA_PATH}/${1}"
  mkdir "${base}"
  shift
  "$@" 2> >(tee "${base}/stderr" >&2) 1> >(tee "${base}/stdout")
}

With this, I run capture_out step-1 do_a_thing and then in /tmp/tmp.sometext/step-1/stdout and /tmp/tmp.sometext/step-1/stderr are the full outputs I need… but wait, I can also do:

$ capture_out step-1 do_a_thing | \
  capture_out step-2 process --the --thing && \
  capture_out step-3 echo "..." | capture_out step-4 profit
$ find /tmp/tmp.sometext -type f
/tmp/tmp.sometext/step-1/stdout
/tmp/tmp.sometext/step-1/stderr
/tmp/tmp.sometext/step-2/stdout
/tmp/tmp.sometext/step-2/stderr
/tmp/tmp.sometext/step-4/stdout
/tmp/tmp.sometext/step-4/stderr
/tmp/tmp.sometext/step-3/stderr
/tmp/tmp.sometext/step-3/stdout

Or

if capture_out has_an_error something-wrong | capture_out handler check_output
then
  echo "It all went great"
else
  echo "Process failure"
  echo "--Initial process"
  # Use wc -c to check the number of characters in the file
  if [ -e "${TEMP_DATA_PATH}/has_an_error/stdout"] && [ 0 -ne "$(wc -c "${TEMP_DATA_PATH}/has_an_error/stdout")" ]
  then
    echo "----stdout:"
    cat "${TEMP_DATA_PATH}/has_an_error/stdout"
  fi
  if [ -e "${TEMP_DATA_PATH}/has_an_error/stderr"] && [ 0 -ne "$(wc -c "${TEMP_DATA_PATH}/has_an_error/stderr")" ]
  then
    echo "----stderr:"
    cat "${TEMP_DATA_PATH}/has_an_error/stderr"
  fi
  echo "--Second stage"
  if [ -e "${TEMP_DATA_PATH}/handler/stdout"] && [ 0 -ne "$(wc -c "${TEMP_DATA_PATH}/handler/stdout")" ]
  then
    echo "----stdout:"
    cat "${TEMP_DATA_PATH}/handler/stdout"
  fi
  if [ -e "${TEMP_DATA_PATH}/handler/stderr"] && [ 0 -ne "$(wc -c "${TEMP_DATA_PATH}/handler/stderr")" ]
  then
    echo "----stderr:"
    cat "${TEMP_DATA_PATH}/handler/stderr"
  fi
fi

This has become part of my normal toolkit now for logging processes. Thanks bash!

Also, thanks to ChatGPT for helping me find this structure that I’d seen before, but couldn’t remember how to do it! (it almost got it right too! Remember kids, don’t *trust* what ChatGPT gives you, use it as a research starting point, test *that* against your own knowledge, test *that* against your environment and test *that* against expected error cases too! Copy & Paste is not the best idea with AI generated code!)

Footnotes

[1] stdout is the name of the normal output text we see in a shell, it’s also sometimes referred to as “file descriptor 1” or “fd1”. You can also output to &1 with >&1 which means “send to fd1”

[2] stderr is the name of the output in a shell when an error occurs. It isn’t caught by things like some_process > /dev/null which makes it useful when you don’t want to see output, just errors. Like stdout, it’s also referred to as “file descriptor 2” or “fd2” and you can output to &2 with >&2 if you want to send stdout to stderr.

Featured image is “WordPress Error” by “tara hunt” on Flickr and is released under a CC-BY-SA license.

A series of gold blocks, each crossed, and one of the lower blocks has engraved "Eduardo Nery 1995-1998 Aleluia - Secla"

Deploying the latest build of a template (machine) image with #Xen #Orchestrator

In my current role we are using Packer to build images on a Xen Orchestrator environment, use a CI/CD system to install that image into both a Xen Template and an AWS AMI, and then we use Terraform to use that image across our estate. The images we build with Packer have this stanza in it:

locals {
  timestamp = regex_replace(timestamp(), "[- TZ:]", "")
}
variable "artifact_name" {
  default = "SomeLinux-version.iso"
}
source "xenserver-iso" "this" {
  vm_name = "${var.artifact_name}-${local.timestamp}"
  # more config below
}

As a result, the built images include a timestamp.

When we use the AMI in Terraform, we can locate it with this code:

variable "ami_name" {
  default = "SomeLinux-version.iso-"
}

data "aws_ami" "this" {
  most_recent = true

  filter {
    name   = "name"
    values = [var.ami_name]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  owners = [var.owner]
}

But, because Xen doesn’t track when a template is created, instead I needed to do something different. Enter get_xoa_template.sh.

#!/bin/bash
trap cleanup SIGINT SIGTERM EXIT
exit_message=""
set_exit=0
fail() {
[ -n "$1" ] && echo "$1" >&2
[ "$2" -gt 0 ] && exit $2
}
cleanup() {
trap – SIGINT SIGTERM EXIT
[ "$UNREGISTER" -eq 1 ] && [ "$STATE" == "signed_in" ] && xo-cli –unregister 2>&1
[ -n "$exit_message" ] && fail "$exit_message" $set_exit
}
log_debug() {
[ -n "$DEBUG" ] && echo "$1" >> "$DEBUG"
}
parse_params() {
UNREGISTER=1
DEBUG=""
while :; do
case "${1-}" in
-h | –help)
echo "usage: get_xoa_template.sh –template SomeTemplatePrefix" >&2
echo "" >&2
echo "Options:" >&2
echo " -t | –template MyTemplatePrefix = The template to look for (required)" >&2
echo " -s | –server ws://192.0.2.1 = Sign into Xen Orchestrator on 192.0.2.1" >&2
echo " [Default to using XOA_URL environment variable]" >&2
echo " -u | –user username@example.org = Sign into Xen Orchestrator using this username" >&2
echo " [Default to using XOA_USER environment variable]" >&2
echo " -p | –password hunter2 = Sign into Xen Orchestrator using this password" >&2
echo " [Default to using XOA_PASSWORD environment variable]" >&2
echo " -l | –pool MyXenPool1 = Use this pool when looking for the template." >&2
echo " [Omit to ignore]" >&2
echo " -x | –no-unregister = Don't log out from the XOA server once connected." >&2
echo " -d | –debug = Log output to /tmp/xocli_output" >&2
echo " -d | –debug /path/to/debug = Log output to the specified path" >&2
echo " –debug=/path/to/debug = Log output to the specified path" >&2
exit 255
;;
-s | –server)
XOA_URL="${2-}"
shift
;;
-u | –user)
XOA_USER="${2-}"
shift
;;
-p | –password)
XOA_PASSWORD="${2-}"
shift
;;
-l | –pool)
XOA_POOL="${2-}"
shift
;;
-t | –template)
TEMPLATE="${2-}"
shift
;;
-x | –no-unregister)
UNREGISTER=0
;;
-d | –debug)
DEBUG=/tmp/xocli_output
[ -n "${2-}" ] && [ "$(echo "${2-}" | cut -c1)" != "-" ] && DEBUG="${2-}" && shift
;;
–debug=*)
DEBUG="$(echo $1 | sed -E -e 's/^[^=]+=//')"
;;
*)
break
;;
esac
shift
done
}
sign_in() {
[ -z "$XOA_URL" ] || [ -z "$XOA_USER" ] || [ -z "$XOA_PASSWORD" ] && fail "Missing sign-in details" 1
log_debug "Logging in"
if [ -n "$DEBUG" ]
then
xo-cli –register –au "$XOA_URL" "$XOA_USER" "$XOA_PASSWORD" 2>&1 | tee -a "$DEBUG" | grep -q 'Successfully' || fail "Login failed" 2
else
xo-cli –register –au "$XOA_URL" "$XOA_USER" "$XOA_PASSWORD" 2>&1 | grep -q 'Successfully' || fail "Login failed" 2
fi
STATE="signed_in"
}
get_pool() {
[ -z "$XOA_POOL" ] && log_debug "No Pool" && return 0
log_debug "Getting Pool ID"
if [ -n "$DEBUG" ]
then
POOL_ID="\$pool=$(xo-cli –list-objects type=pool | jq -c -r ".[] | select(.name_label | match(\"${XOA_POOL}\")) | .uuid" | sort | tail -n 1 | tee -a "$DEBUG")"
else
POOL_ID="\$pool=$(xo-cli –list-objects type=pool | jq -c -r ".[] | select(.name_label | match(\"${XOA_POOL}\")) | .uuid" | sort | tail -n 1)"
fi
[ "$POOL_ID" == "\$pool=" ] && fail "Pool provided but no ID received" 3
}
get_template() {
log_debug "Getting template"
if [ -n "$DEBUG" ]
then
TEMPLATE_IS="$(xo-cli –list-objects type=VM-template "${POOL_ID-}" | jq -c ".[] | select(.name_label | match(\"${TEMPLATE}\")) | .name_label" | sort | tail -n 1 | tee -a "$DEBUG")"
else
TEMPLATE_IS="$(xo-cli –list-objects type=VM-template "${POOL_ID-}" | jq -c ".[] | select(.name_label | match(\"${TEMPLATE}\")) | .name_label" | sort | tail -n 1)"
fi
[ -z "$TEMPLATE_IS" ] && fail "Could not match this template" 4
if [ -n "$DEBUG" ]
then
echo "{\"is\": ${TEMPLATE_IS}}" | tee -a "$DEBUG"
else
echo "{\"is\": ${TEMPLATE_IS}}"
fi
}
[ -n "$(command -v xo-cli)" ] || fail "xo-cli is missing, and is a required dependency for this script. Please install it; \`sudo npm -g install xo-cli\`" 5
parse_params "$@"
if [ -n "$DEBUG" ]
then
rm -f "$DEBUG"
log_debug "Invoked: $(date)"
log_debug "Template: $TEMPLATE"
log_debug "Pool: $XOA_POOL"
fi
sign_in
get_pool
get_template

This script is invoked from your terraform like this:

variable "template_name" {
  default     = "SomeLinux-version.iso-"
  description = "A regex, partial or full string to match in the template name"
}

variable "poolname" {
  default = "MyPool"
}

data "external" "get_xoa_template" {
  program = [
    "/bin/bash", "${path.module}/get_xoa_template.sh",
    "--template", var.template_name,
    "--pool", var.poolname
  ]
}

data "xenorchestra_pool" "pool" {
  name_label = var.poolname
}

data "xenorchestra_template" "template" {
  name_label = data.external.get_xoa_template.result.is
  pool_id    = data.xenorchestra_pool.pool.id
}

And that’s how you do it. Oh, and if you need to pin to a specific version? Change the template_name value from the partial or regex version to the full version, like this:

variable "template_name" {
  # This assumes your image was minted at midnight on 1970-01-01
  default     = "SomeLinux-version.iso-19700101000000"
}

Featured image is “Barcelos and Braga-18” by “Graeme Churchard” on Flickr and is released under a CC-BY license.

A photo of a door with the focus on the handle which has a lock in the centre of the knob. The lock has a key in it, with a bunch of keys dangling from the central ring.

Using direnv with terraform, terragrunt, saml2aws, SOPS and AWS KMS

In my current project I am often working with Infrastructure as Code (IoC) in the form of Terraform and Terragrunt files. Before I joined the team a decision was made to use SOPS from Mozilla, and this is encrypted with an AWS KMS key. You can only access specific roles using the SAML2AWS credentials, and I won’t be explaining how to set that part up, as that is highly dependant on your SAML provider.

While much of our environment uses AWS, we do have a small presence hosted on-prem, using a hypervisor service. I’ll demonstrate this with Proxmox, as this is something that I also use personally :)

Firstly, make sure you have all of the above tools installed! For one stage, you’ll also require yq to be installed. Ensure you’ve got your shell hook setup for direnv as we’ll need this later too.

Late edit 2023-07-03: There was a bug in v0.22.0 of the terraform which didn’t recognise the environment variables prefixed PROXMOX_VE_ – a workaround by using TF_VAR_PROXMOX_VE and a variable "PROXMOX_VE_" {} block in the Terraform code was put in place for the inital publication of this post. The bug was fixed in 0.23.0 which this post now uses instead, and so as a result the use of TF_VAR_ prefixed variables was removed too.

Set up AWS Vault

AWS KMS

AWS Key Management Service (KMS) is a service which generates and makes available encryption keys, backed by the AWS service. There are *lots* of ways to cut that particular cake, but let’s do this a quick and easy way… terraform

variable "name" {
  default = "SOPS"
  type    = string
}
resource "aws_kms_key" "this" {
  tags                     = {
    Name : var.name,
    Owner : "Admins"
  }
  key_usage                = "ENCRYPT_DECRYPT"
  customer_master_key_spec = "SYMMETRIC_DEFAULT"
  deletion_window_in_days  = 30
  is_enabled               = true
  enable_key_rotation      = false
  policy                   = <<EOF
{
  "Version": "2012-10-17",
  "Id": "key-default-1",
  "Statement": [
    {
      "Sid": "Root Access",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::${get_aws_account_id()}:root"
      },
      "Action": "kms:*",
      "Resource": "*"
    },
    {
      "Sid": "Estate Admin Access",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::${get_aws_account_id()}:role/estateadmins"
      },
      "Action": [
        "kms:Describe*",
        "kms:List*",
        "kms:Get*",
        "kms:Encrypt*"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

resource "aws_kms_alias" "this" {
  target_key_id = aws_kms_key.this.key_id
  name          = "alias/${var.name}"
}

output "key" {
  value = aws_kms_alias.this.arn
}

After running this, let’s assume that we get an output for the “key” value of:

arn:aws:kms:us-east-1:123456789012:alias/main

Setup Sops

In your terragrunt tree, create a file called .sops.yaml, which contains:

---
creation_rules:
  - kms: arn:aws:kms:us-east-1:123456789012:alias/main

And a file called secrets.enc.yaml which contains:

---
PROXMOX_VE_USERNAME: root@pam
PROXMOX_VE_PASSWORD: deadb33f@2023

Test that your KMS works by assuming your IAM role via SAML2AWS like this:

$ saml2aws login --skip-prompt --quiet
$ saml2aws exec -- sops --verbose --encrypt --in-place secrets.enc.yaml
[AWSKMS]	 INFO[0000] Encryption succeeded                          arn="arn:aws:kms:us-east-1:123456789012:alias/main"
[CMD]		 INFO[0000] File written successfully

Setup direnv

Outside your tree, in ~/.config/direnv/lib create a file called use_sops.sh (does not need to be chmod +x or chmod 755!) containing this:

# Based on https://github.com/direnv/direnv/wiki/Sops
use_sops() {
    local path=${1:-$PWD/secrets.enc.yaml}
    if [ -e "$path" ]
    then
        if grep -q -E '^sops:' "$path"
        then
            eval "$(sops --decrypt --output-type dotenv "$path" 2>/dev/null | direnv dotenv bash /dev/stdin || false)"
        else
            if [ -n "$(command -v yq)" ]
            then
                eval "$(yq eval --output-format props "$path" | direnv dotenv bash /dev/stdin)"
                export SOPS_WARNING="unencrypted $path"
            fi
        fi
    fi
    watch_file "$path"
}

There are two key lines here, the first of which is:

eval "$(sops -d --output-type dotenv "$path" 2>/dev/null | direnv dotenv bash /dev/stdin || false)"

This line asks sops to decrypt the secrets file, using the “dotenv” output type, however, the dotenv format looks like this:

some_key = "some value"

So, as a result, we then pass that value to direnv and ask it to rewrite it in the format it expects, which looks like this:

export some_key="some value"

The second key line is this:

eval "$(yq eval --output-format props "$path" | direnv dotenv bash /dev/stdin)"

This asks yq to parse the secrets file, using the “props” formatter, which results in lines just like the dotenv output we saw above.

However, because we used yq to parse the file, it means that we know this file isn’t encrypted, so we also add an extra export value:

export SOPS_WARNING="unencrypted $path"

This can be picked up as part of your shell prompt to put a warning in! Anyway… let’s move on.

Now that you have your reusable library file, we now configure the direnv file, .envrc for the root of your proxmox cluster:

use sops

Oh, ok, that was simple. You can add several files here if you wish, like this:

use sops file1.enc.yaml
use sops file2.enc.yml
use sops ~/.core_sops

But, we don’t need that right now!

Open your shell in that window, and you’ll get this warning:

direnv: error /path/to/demo/.envrc is blocked. Run `direnv allow` to approve its content

So, let’s do that!

$ direnv allow
direnv: loading /path/to/demo/.envrc
direnv: using sops
direnv: export +PROXMOX_VE_USERNAME +PROXMOX_VE_PASSWORD
$

So far, so good… but wait, you’ve authenticated to your SAML access to AWS. Let’s close that shell, and go back in again

$ cd /path/to/demo
direnv: loading /path/to/demo/.envrc
direnv: using sops
$

Ah, now we don’t have our values exported. That’s what we wanted!

What now?!

Configuring the details of the proxmox cluster

We have our .envrc file which provides our credentials (let’s pretend we’re using a shared set of credentials across all the boxes), but now we need to setup access to each of the boxes.

Let’s make our two cluster directories;

mkdir cluster_01
mkdir cluster_02

And in each of these clusters, we need to put an .envrc file with the right IP address in. This needs to check up the tree for any credentials we may have already loaded:

source_env "$(find_up ../.envrc)"
export PROXMOX_VE_ENDPOINT="https://192.0.2.1:8006" # Documentation IP address for the first cluster - change for the second cluster.

The first line works up the tree, looking for a parent .envrc file to inject, and then, with the second line, adds the Proxmox API endpoint to the end of that chain. When we run direnv allow (having logged back into our saml2aws session), we get this:

$ direnv allow
direnv: loading /path/to/demo/cluster_01/.envrc
direnv: loading /path/to/demo/.envrc
direnv: using sops
direnv: export +PROXMOX_VE_ENDPOINT +PROXMOX_VE_USERNAME +PROXMOX_VE_PASSWORD
$

Great, now we can setup the connection to the cluster in the terragrunt file!

Set up Terragrunt

In /path/to/demo/terragrunt.hcl put this:

remote_state {
  backend = "s3"
  config  = {
    encrypt                = true
    bucket                 = "example-inc-terraform-state"
    key                    = "${path_relative_to_include()}/terraform.tfstate"
    region                 = "us-east-1"
    dynamodb_table         = "example-inc-terraform-state-lock"
    skip_bucket_versioning = false
  }
}
generate "providers" {
  path      = "providers.tf"
  if_exists = "overwrite"
  contents  = <<EOF
terraform {
  required_providers {
    proxmox = {
      source = "bpg/proxmox"
      version = "0.23.0"
    }
  }
}

provider "proxmox" {
  insecure = true
}
EOF
}

Then in the cluster_01 directory, create a directory for the code you want to run (e.g. create a VLAN might be called “VLANs/30/“) and put in it this terragrunt.hcl

terraform {
  source = "${get_terragrunt_dir()}/../../../terraform-module-network//vlan"
  # source = "git@github.com:YourProject/terraform-module-network//vlan?ref=production"
}

include {
  path = find_in_parent_folders()
}

inputs = {
  vlan_tag    = 30
  description = "VLAN30"
}

This assumes you have a terraform directory called terraform-module-network/vlan in a particular place in your tree or even better, a module in your git repo, which uses the input values you’ve provided.

That double slash in the source line isn’t a typo either – this is the point in that tree that Terragrunt will copy into the directory to run terraform from too.

A quick note about includes and provider blocks

The other key thing is that the “include” block loads the values from the first matching terragrunt.hcl file in the parent directories, which in this case is the one which defined the providers block. You can’t include multiple different parent files, and you can’t have multiple generate blocks either.

Running it all together!

Now we have all our depending files, let’s run it!

user@host:~$ cd test
direnv: loading ~/test/.envrc
direnv: using sops
user@host:~/test$ saml2aws login --skip-prompt --quiet ; saml2aws exec -- bash
direnv: loading ~/test/.envrc
direnv: using sops
direnv: export +PROXMOX_VE_USERNAME +PROXMOX_VE_PASSWORD
user@host:~/test$ cd cluster_01/VLANs/30
direnv: loading ~/test/cluster_01/.envrc
direnv: loading ~/test/.envrc
direnv: using sops
direnv: export +PROXMOX_VE_ENDPOINT +PROXMOX_VE_USERNAME +PROXMOX_VE_PASSWORD
user@host:~/test/cluster_01/VLANs/30$ terragrunt apply
data.proxmox_virtual_environment_nodes.available_nodes: Reading...
data.proxmox_virtual_environment_nodes.available_nodes: Read complete after 0s [id=nodes]

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # proxmox_virtual_environment_network_linux_bridge.this[0] will be created
  + resource "proxmox_virtual_environment_network_linux_bridge" "this" {
      + autostart  = true
      + comment    = "VLAN30"
      + id         = (known after apply)
      + mtu        = (known after apply)
      + name       = "vmbr30"
      + node_name  = "proxmox01"
      + ports      = [
          + "enp3s0.30",
        ]
      + vlan_aware = (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes
proxmox_virtual_environment_network_linux_bridge.this[0]: Creating...
proxmox_virtual_environment_network_linux_bridge.this[0]: Creation complete after 2s [id=proxmox01:vmbr30]
user@host:~/test/cluster_01/VLANs/30$

Winning!!

Featured image is “2018/365/1 Home is Where The Key Fits” by “Alan Levine” on Flickr and is released under a CC-0 license.

"Tickets" by "Becky Snyder" on Flickr

IP Address Management using PHPIPAM integrated with Keycloak for SAML2 Authentication

I’ve recently been working with a network estate that was a bit hard to get a handle on. It had grown organically, and was a bit tricky to allocate new network segments in. To fix this, I deployed PHPIPAM, which was super easy to setup and configure (I used the docker-compose file on the project’s docker hub page, and put it behind an NGINX server which was pre-configured with a LetsEncrypt TLS/HTTPS certificate).

PHPIPAM is a IP Address Management tool which is self-hostable. I started by setting up the “Sections” (which was the hosting environments the estate is using), and then setup the supernets and subnets in the “Subnets” section.

Already, it was much easier to understand the network topology, but now I needed to get others in to take a look at the outcome. The team I’m working with uses a slightly dated version of Keycloak to provide Single Sign-On. PHPIPAM will use SAML for authentication, which is one of the protocols that Keycloak offers. The documentation failed me a bit at this point, but fortunately a well placed ticket helped me move it along.

Setting up Keycloak

Here’s my walk through

  1. Go to “Realm Settings” in the sidebar and find the “SAML Identity Provider Metadata” (on my system it’s on the “General” tab but it might have changed position on your system). This will be an XML file, and (probably) the largest block of continuous text will be in a section marked “ds:X509Certificate” – copy this text, and you’ll need to use this as the “IDP X.509 public cert” in PHPIPAM.
  2. Go to “Clients” in the sidebar and click “Create”. If you want Keycloak to offer access to PHPIPAM as a link, the client ID needs to start “urn:” If you just want to use the PHPIPAM login option, give the client ID whatever you want it to be (I’ve seen some people putting in the URL of the server at this point). Either way, this needs to be unique. The client protocol is “saml” and the client SAML endpoint is the URL that you will be signing into on PHPIPAM – in my case https://phpipam.example.org/saml2/. It should look like this:

    Click Save to take you to the settings for this client.
  3. If you want Keycloak to offer a sign-in button, set the name of the button and description.

    Further down the page is “Root URL” set that to the SAML Endpoint (the one ending /saml2/ from before). Also set the “Valid Redirect URIs” to that too.

    Where it says “IDP Initiated SSO URL Name” put a string that will identify the client – I put phpipam, but it can be anything you want. This will populate a URL like this: https://keycloak.example.org/auth/realms/yourrealm/protocol/saml/clients/phpipam, which you’ll need as the “IDP Issuer”, “IDP Login URL” and “IDP Logout URL”. Put everything after the /auth/ in the box marked “Base URL”. It should look like this:

    Hit Save.
  4. Go to the “SAML Keys” tab. Copy the private key and certificate, these are needed as the “Authn X.509 signing” cert and cert key in PHPIPAM.
  5. Go to the “Mappers” tab. Create each of the following mappers;
    • A Role List mapper, with the name of “role list”, Role Attribute Name of “Role”, no friendly name, the SAML Attribute NameFormat set to “Basic” and Single Role Attribute set to on.
    • A User Attribute mapper, with the name, User Attribute, Friendly Name and SAML Attribute Name set to “email”, the SAML Attribute NameFormat set to “Basic” and Aggregate Attribute Values set to “off”.
    • A Javascript Mapper, with the name, Friendly Name and SAML Attribute Name set to “display_name” and the SAML Attribute NameFormat set to “Basic”. The Script should be set to this single line: user.getFirstName() + ' ' + user.getLastName().
    • A Javascript Mapper, with the name, Friendly Name and SAML Attribute Name set to “is_admin” and the SAML Attribute NameFormat set to “Basic”.

      The script should be as follows:
is_admin = false;
var GroupSet = user.getGroups();
for each (var group in GroupSet) {
    use_group = ""
    switch (group.getName()) {
        case "phpipamadmins":
            is_admin = true;
            break;
    }
}
is_admin
  • Create one more mapper item:
    • A Javascript Mapper, with the name, Friendly Name and SAML Attribute Name set to “groups” and the SAML Attribute NameFormat set to “Basic”.

      The script should be as follows:
everyone_who_can_access_gets_read_only_access = false;
send_groups = "";
var GroupSet = user.getGroups();
for each (var group in GroupSet) {
    use_group = ""
    switch (group.getName()) {
        case "LDAP_GROUP_1":
            use_group = "IPAM_GROUP_1";
            break;
        case "LDAP_GROUP_2":
            use_group = "IPAM_GROUP_2";
            break;
    }
    if (use_group !== "") {
        if (send_groups !== "") {
          send_groups = send_groups + ","
        }
        send_groups = send_groups + use_group;
    }    
}
if (send_groups === "" && everyone_who_can_access_gets_read_only_access) {
    "Guests"
} else {
    send_groups
}

For context, the groups listed there, LDAP_GROUP_1 might be “Customer 1 Support Staff” or “ITSupport” or “Networks”, and the IPAM_GROUP_1 might be “Customer 1” or “WAN Links” or “DC Patching” – depending on the roles and functions of the teams. In my case they relate to other roles assigned to the staff member and the name of the role those people will perform in PHP IPAM. Likewise in the is_admin mapper, I’ve mentioned a group called “phpipamadmins” but this could be any relevant role that might grant someone admin access to PHPIPAM.

Late Update (2023-06-07): I’ve figured out how to enable modules now too. Create a Javascript mapper as per above, but named “modules” and have this script in it:

// Current modules as at 2023-06-07
// Some default values are set here.
noaccess       =  0;
readonly       =  1;
readwrite      =  2;
readwriteadmin =  3;
unsetperm      = -1;

var modules = {
    "*":       readonly,  "vlan":      unsetperm, "l2dom":    unsetperm,
    "devices": unsetperm, "racks":     unsetperm, "circuits": unsetperm,
    "nat":     unsetperm, "locations":  noaccess, "routing":  unsetperm,
    "pdns":    unsetperm, "customers": unsetperm
}

function updateModules(modules, new_value, list_of_modules) {
    for (var module in list_of_modules) {
        modules[module] = new_value;
    }
    return modules;
}

var GroupSet = user.getGroups();
for (var group in GroupSet) {
    switch (group.getName()) {
        case "LDAP_ROLE_3":
            modules = updateModules(modules, readwriteadmin, [
                'racks', 'devices', 'nat', 'routing'
            ]);
            break;
    }
}

var moduleList = '';

for (var key in modules) {
    if (modules.hasOwnProperty(key) && modules[key] !==-1) {
        if (moduleList !== '') {
            moduleList += ',';
        }
        moduleList += key + ':' + modules[key];
    }
}

moduleList;

OK, that’s Keycloak sorted. Let’s move on to PHPIPAM.

Setting up PHPIPAM

In the administration menu, select “Authentication Methods” and then “Create New” and select “Create new SAML2 authentication”.

In the description field, give it a relevant name, I chose SSO, but you could call it any SSO system name. Set “Enable JIT” to “on”, leave “Use advanced settings” as “off”. In Client ID put the Client ID you defined in Keycloak, probably starting urn: or https://. Leave “Strict mode” off. Next is the IDP Issuer, IDP Login URL and IDP Logout URL, which should all be set to the same URL – the “IDP Initiated SSO URL Name” from step 4 of the Keycloak side (that was set to something like https://keycloak.example.org/auth/realms/yourrealm/protocol/saml/clients/phpipam).

After that is the certificate section – first the IDP X.509 public cert that we got in step 1, then the “Sign Authn requests” should be set to “On” and the Authn X.509 signing cert and cert key are the private key and certificate we retrieved in step 5 above. Leave “SAML username attribute” and “SAML mapped user” blank and “Debugging” set to “Off”. It should look like this:

Hit save.

Next, any groups you specified in the groups mapper need to be defined. This is in Administration -> Groups. Create the group name and set a description.

Lastly, you need to configure the sections to define whigh groups have access. Each defined group gets given four radio buttons; “na” (no access), “ro” (read only), “rw” (read write) and “rwa” (read, write and administrate).

Try logging in. It should just work!

Debugging

If it doesn’t, and checking all of the above doesn’t help, I’ve tried adding some code into the PHP file in app/saml2/index.php, currently on line 149, above where it says:

if (empty($auth->getAttribute("display_name")[0])) {                                                                                    
    $Result->show("danger", _("Mandatory SAML JIT attribute missing")." : display_name (string)", true);
}

This code is:

$outfile = fopen("/tmp/log.out", "w");                                                           
fwrite($outfile, var_export($auth, true));                                                                     
fclose($outfile);

**REMEMBER THIS IS JUST FOR TESTING PURPOSES AND SHOULD BE REMOVED ASAP**

In here is an array called _attributes which will show you what has been returned from the Keycloak server when someone tries to log in. In my case, I got this:

   '_attributes' =>
  array (
    'groups' => 
    array (
      0 => 'PHPIPAM_GROUP_1',
    ),
    'is_admin' => 
    array (
      0 => 'false',
    ),
    'display_name' => 
    array (
      0 => 'Jon Spriggs',
    ),
    'email' => 
    array (
      0 => 'spriggsj@example.org',
    ),
  )

If you get something back here that isn’t what you expected, now at least you have a fighting chance of finding where that issue was! Good luck!!

Featured image is “Tickets” by “Becky Snyder” on Flickr and is released under a CC-BY license.

Quick tip: How to stop package installations from auto-starting server services with Debian based distributions (like Ubuntu)

I’m working on another toy project to understand a piece of software a little better, and to make it work, I needed to install dnsmasq inside an Ubuntu-based virtual machine. The problem with this is that Ubuntu already runs systemd-resolved to perform DNS lookups, and Debian likes to start server services as soon as it’s installed them. So how do we work around this? Well, actually, it’s pretty simple.

Thanks to this blog post from 2013, I found out that if you create an executable script called /usr/sbin/policy-rc.d with the content:

exit 101

This will stop all services in the dpkg/apt process from running on install, so I was able to do this:

echo 'exit 101' >> /usr/sbin/policy-rc.d
chmod +x /usr/sbin/policy-rc.d
apt update
apt install dnsmasq -y
systemctl disable --now systemd-resolved
# Futz with dnsmasq config
systemctl enable --now dnsmasq
dig example.com

Brilliant

(Don’t do this!) Note to self; Is your shutdown command causing an error for your packer build? Try this.

Don’t do this! Turns out I was doing this wrong. The below code is only needed if you’ve got things wrong, and you should instead be using keep_vm = "on_success". The more you know, eh?

If you’ve got a command in your packer script that looks like this:

provisioner "shell" {
  inline = ["shutdown -h now"]
}

Try running this instead:

provisioner "shell" {
  inline = [
    "echo Provisioning complete. Shutting down.",
    "(sleep 5 ; shutdown -h now) &"
  ]
}

This will force packer to execute a command which is pushed into the background, returning a return code (RC) of 0, which the system will interpret as a successful result. 5 seconds later the machine will shut itself down by itself.

"Repository Road" by "David" on Flickr

Using Packer to create an AlmaLinux 9 machine image? Getting an error message about “No enabled repositories” when running dnf or yum? This might be why…

You’re probably in the install image which hasn’t been chrooted into.

You see, when AlmaLinux 9 does it’s install from ISO, it formats the disk and mounts it to /mnt/sysroot and then copies files to it. Once it’s done, the rest of the packer scripts can be run… but commands are run in the install environment, not the chroot container, so, to transfer files in, or to execute commands that will have actions in the target environment, format them like this:

provisioner "file" {
  source = "my_command"
  destination = "/mnt/sysroot/tmp/my_command"
}

provisioner "shell" {
  inline = [
    "chmod +x /mnt/sysroot/tmp/my_command",
    "chroot /mnt/sysroot /tmp/my_command",
    "chroot /mnt/sysroot systemctl enable some.service"
  ]
}

In this example, we copy a file called “my_command” to the /tmp directory, mark it as executable and run it from inside the chroot environment.

Next we run systemctl to enable a service that is already present on the system (perhaps it was something that my_command did?)

I hope this helps you!

Featured image is “Repository Road” by “David” on Flickr and is released under a CC-BY-SA license.

Tools I use when writing documentation

I’ve just finished what felt like the worlds longest work instruction, and it made me think… I’ve got these great tools at my disposal for when I’m writing documentation – perhaps not everyone knows about them!?

So here are the main ones!

Open Broadcaster Software (OBS) Studio

OBS Studio is a screen recording application, most commonly used by eSports players and gamers to stream content to streaming platforms like Twitch. While it’s really easy to do stuff like “lower third” bars (like this)

It’s also really *really* easy to just record a screen, which makes it easy to just “set and forget” and then run through the tasks. In the past, I’ve been on highly-technical in-person training courses and run OBS while I’ve been in a room, just to make sure I didn’t miss any details when I went back over my notes later!

Clicking “Start Recording” in the bottom third of that screen starts writing the content of that screen into (by default) a MKV file in your home directory (on Linux), or into your Videos directory on Windows. I don’t recall where it went on OS X!

This means after the task is done, I can play it back with…

VideoLan Client (VLC)

VLC is (or, at least, was) the defacto video player for anyone who consumes media on their computer. In my case, I only really use it now for playing back files I’ve recorded with OBS, but it’s a great player.

I only really make one key change to VLC, which is; adding the playback speed buttons. Different platforms do it differently, but on Linux and Windows at least, go to Tools -> Customise Interface and you’re presented with a screen like this:

I drag the “Slower” and “Faster” buttons to either side of the stop button, and will typically play videos back at 3x or even 8x speed to get to the areas I want, and then slow it down, sometimes, to 0.5x speed to get specific frames I’m looking for.

Once I get that frame, I’ll use a…

Screen Snipping Tool

On Windows this is the built-in snipping tool, found by pressing Windows+Shift+S. On Ubuntu, I’ve installed the package gnome-screenshot and the Gnome Extension Screenshot Tool.

With Snipping Tool you can draw circles around things and make annotations with a pen, but in extreme cases, I rely on the …

GNU Image Manipulation Program (GIMP).

GIMP, aside from being a badly named project (notably because most people over a certain age know about that scene from Pulp Fiction) is a FANTASTIC image editor. I use it for all sorts of editing, cropping and resizing, but I particularly love the layers menu on the side.

A couple of years ago my son wanted to give all his friends “membership cards” for a club he’d made. These were created in GIMP and just required him to clone a layer from the block on the right, hide the previous layer, and change the text in the clone.

This also means that if you want to easily highlight specific areas of the screen it’s only a few clicks to do it. Now, that said, it’s not as straightforward as Paint, and for anything beyond easy tweaks it will take a bit of experience, but it works.

This is all well and good, but sometimes you just need details from a terminal, where I’d use…

Asciinema

I use Asciinema (just a pip, apt or rpm away) when I’m working on something on a command line that I want to play back as a video, and the main reason is that I can adjust my output really quickly and easily!

Consider this screen of data:

I “recorded” that using asciinema rec ./demo.rec and ended it with ctrl+d. That created a file called demo.rec, which looks like this:

Each line is prefixed with a timestamp, it then has a stream action (mostly o for “output”, but you can add m for “marker” or i for “input” in there if you wanted to be clever ;) ) and then the text which will be rendered when it’s played back (more about that in a tic). This means that if you make a drastic typo, you can fix it in this file, or if you want to redact a password that you never meant to type, you can do!

But, and here’s the next awesome thing, you can also ask it to play back ignoring any gaps longer than X amount of fractions of a second, so if you went off to make a coffee in the middle of doing a recording, playback won’t show it.

And then, playing it back… well, you can just run asciinema play demo.rec but that’s a bit boring… but you can link to it, like this demo I created a while ago or even embed it in your own site

Rendering it all

And the last thing I use? Whatever writing tool I have to hand! I typically write on my blog if it’s for public consumption (like this), but I’ve used private Github and gitlab repos, Visual Studio Code to create markdown files, Google Docs, Microsoft Word, LibreOffice Writer and even slide decks if it’s the right tool for the job.

But the most important thing is that you get your documentation OUT THERE! It’s so frustrating only having half a story… like this one:

This is an XKCD Comic titled "Wisdom of the Ancients" which has the alt-text:
"All long help threads should have a sticky globally-editable post at the top saying 'DEAR PEOPLE FROM THE FUTURE: Here's what we've figured out so far ...'"

It depicts a stick figure shaking his screen, and a series of lines of text which say:
"Never have I felt so close to another soul and yet so helplessly alone as when I Google an erorr and there's one result a thread by someone with the same problem and no answer last posted to in 2003"

Above the stick figure is the text "Who were you DenverCoder9? What did you see?!"

Featured image is “Tools” by “Paul Downey” on Flickr and is released under a CC-BY license.