How I deploy Vaultwarden to provide a Bitwarden compatible service in Kubernetes with Monitoring and Backups

2025-04-06 JonTheNiceGuy

This initially was going to be a mammoth blog post going through all of the lines of code in how I’ve built a Vaultwarden service in Kubernetes rather than just writing what I’ve done. You can just look at the git repo and see what’s there! Ask for comments on that if you need more details! 😀

So, instead, let me link you to the helm chart and docker containers I created, and I’ll pull out some notes on some of the specific details in there.

https://github.com/JonTheNiceGuy/vaultwarden-helm-chart

This helm chart comprises the 4 services I feel you need:

vaultwarden <- The actual password safe service
vwmetrics <- Prometheus Metrics for the service
vaultwarden-sync <- A packaged deployment of the directory synchronization tool from Bitwarden
vaultwarden-backup <- A tool to backup the data directory and the database from Bitwarden.

In addition, the chart allows you to provision dynamically allocated Persistent Volumes through a StorageClass, and flexibility to set all of the variables in the Vaultwarden settings file.

The biggest “weird-ish” thing I’ve done is to create a configuration file as a secret, and mount that configuration file into the vaultwarden container. This prevents compromised hosts from being able to extract admin tokens and database credentials from process environment variables. That said, it would be better to somehow make this a Read-Once value, which I believe is possible with something like HashiCorp Vault, or SOPS. If you’ve got any advice on how to do this, I’d be very grateful to hear it!

I’m not exactly overjoyed with the vwmetrics, as it doesn’t expose any internal metrics, just a count of the number of each type of asset in the database, but the project are clear they don’t want to add any additional tracing to the application, so this is the best we can do.

vaultwarden-backup is a script I wrote which reads the vaultwarden environment file to get the database credentials and data path, and then backs up both database and non-database files (following the official guidance). In this invocation, the only fields required from the environment file are the path to the data directory and the database credentials, so the config secret stores those as a separate key. It also means that this can be just a Read-Only database credential too.

I wrote this script because no-one had released a containerised script that performed the database backup in something other than sqlite that I’d seen.

vaultwarden-sync is a wrapper I wrote to get the Bitwarden Directory Connector, and set up the configuration files to support performing LDAP sync. The other directories have not been tested, but are configured according to the changes made to the configuration file when you configure them in the Bitwarden Directory Connector GUI.

I wrote this script because I couldn’t see any way to run the Directory Connector as part of an all-in-one set of containers for my cluster.

Both the backup and sync tools use the livenessProbe feature of Kubernetes to execute themselves, and use the termination log as their output method. This is a method one of my colleagues found when we were setting up some inter-cluster communication tests a while ago, and it works really well where you need to see the status of a long running loop.

I should stress, this is not a “fully-packaged” helm chart. It’s a learning aid, both for someone who hasn’t written many helm charts, and for me, to get feedback from people who *do* write lots of helm charts, and are prepared to tell me how I can do better!

Featured image is “Riggs Bank Vault in Washington D.C.” by “Steve Jurvetson” on Flickr and is released under a CC-BY license.

Building a Linux Firewall with AlmaLinux 9, NetworkManager, BGP, DHCP and NFTables with Puppet

2025-02-17 JonTheNiceGuy

I’m in the process of building a Network Firewall for a work environment. This blog post is based on that work, but with all the identifying marks stripped off.

For this particular project, we standardised on AlmaLinux 9 as the OS base. We’ve done some testing and proved that the Red Hat default firewalling product, Firewalld, is not appropriate for this platform, but that NFTables, or NetFilter Tables (the successor to IPTables), is.

I’ll warn you, I’m pretty prone to long and waffling posts, but there’s a LOT of technical content in this one. There is also a Git repository with the final code. I hope that you find something of use in here.

This document explains how I use Vagrant with VirtualBox to build a test environment, how I install a Puppet Server, and how that server works out what settings it will push to its clients. With that Puppet server, I show how to build and configure a firewall using Linux tools and services, setting up an NFTables policy and routing between firewalls using FRR to provide BGP, and then I show how to deploy a DHCP server.

Let’s go!

The scenario

A network diagram, showing a WAN network attached to the top of firewall devices and out via the Host machine, a transit network linking the bottom of the firewall devices, and attached to the side, networks identified as "Prod", "Dev" and "DHCP" each with IP allocations indicated.

To prove the concept, I have built two Firewall machines (A and B), plus six hosts, one attached to each of the A and B side subnets called “Prod”, “Dev” and “Shared”.

Any host on any of the “Prod” networks should be able to speak to any host on any of the other “Prod” networks, or back to the “Shared” networks. Any host on any of the “Dev” networks should be able to speak to any host on the other “Dev” networks, or back to the “Shared” networks.

Any host in Prod, Dev or Shared should be able to reach the internet, and Shared can reach any of the other networks.

"Apoptosis Network (alternate)" by "Simon Cockell" on Flickr

A few weird issues in the networking on our custom AWS EKS Workers, and how we worked around them

2024-12-03 JonTheNiceGuy

For “reasons”, at work we run AWS Elastic Kubernetes Service (EKS) with our own custom-built workers. These workers are based on Alma Linux 9, instead of AWS’ preferred Amazon Linux 2023. We manage the deployment of these workers using AWS Auto-Scaling Groups.

Our unusual configuration of these nodes means that we sometimes trip over configurations which are tricky to get support on from AWS (no criticism of their support team: if I was in their position, I wouldn’t want to try to provide support for a customer’s configuration that was so far outside the recommended configuration either!)

Over the past year, we’ve upgraded EKS1.23 to EKS1.27 and then on to EKS1.31, and we’ve stumbled over a few issues on the way. Here are a couple of notes on the subject, in case they help anyone else in their journey.

All three of the issues below were addressed by running an additional service on the worker nodes in a Systemd timed service which triggers every minute.

Incorrect routing for the 12th IP address onwards

Something the team found really early on (around EKS 1.18 or somewhere around there) was that the AWS VPC-CNI wasn’t managing the routing tables on the node properly. We raised an issue on the AWS VPC CNI (we were on CentOS 7 at the time) and although AWS said they’d fixed the issue, we currently need to patch the routing tables every minute on our nodes.

What happens?

When you get past the number of IP addresses that a single ENI can have (typically ~12), the AWS VPC-CNI will attach a second interface to the worker, and start adding new IP addresses to that. The VPC-CNI should set up routing for that second interface, but for some reason, in our case, it doesn’t. You can see this with a tcpdump: the traffic comes in on the second ENI, eth1, but the reply then tries to exit the node on the first ENI, eth0, like this:

[root@test-i-01234567890abcdef ~]# tcpdump -i any host 192.0.2.123
tcpdump: data link type LINUX_SLL2
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
09:38:07.331619 eth1  In  IP ip-192-168-1-100.eu-west-1.compute.internal.41856 > ip-192-0-2-123.eu-west-1.compute.internal.irdmi: Flags [S], seq 1128657991, win 64240, options [mss 1359,sackOK,TS val 2780916192 ecr 0,nop,wscale 7], length 0
09:38:07.331676 eni989c4ec4a56 Out IP ip-192-168-1-100.eu-west-1.compute.internal.41856 > ip-192-0-2-123.eu-west-1.compute.internal.irdmi: Flags [S], seq 1128657991, win 64240, options [mss 1359,sackOK,TS val 2780916192 ecr 0,nop,wscale 7], length 0
09:38:07.331696 eni989c4ec4a56 In  IP ip-192-0-2-123.eu-west-1.compute.internal.irdmi > ip-192-168-1-100.eu-west-1.compute.internal.41856: Flags [S.], seq 3367907264, ack 1128657992, win 26847, options [mss 8961,sackOK,TS val 1259768406 ecr 2780916192,nop,wscale 7], length 0
09:38:07.331702 eth0  Out IP ip-192-0-2-123.eu-west-1.compute.internal.irdmi > ip-192-168-1-100.eu-west-1.compute.internal.41856: Flags [S.], seq 3367907264, ack 1128657992, win 26847, options [mss 8961,sackOK,TS val 1259768406 ecr 2780916192,nop,wscale 7], length 0

The critical line here is the last one: the traffic came in on eth1, but the reply is going out of eth0. Another test here is to look at ip rule:

[root@test-i-01234567890abcdef ~]# ip rule
0:	from all lookup local
512:	from all to 192.0.2.111 lookup main
512:	from all to 192.0.2.143 lookup main
512:	from all to 192.0.2.66 lookup main
512:	from all to 192.0.2.113 lookup main
512:	from all to 192.0.2.145 lookup main
512:	from all to 192.0.2.123 lookup main
512:	from all to 192.0.2.5 lookup main
512:	from all to 192.0.2.158 lookup main
512:	from all to 192.0.2.100 lookup main
512:	from all to 192.0.2.69 lookup main
512:	from all to 192.0.2.129 lookup main
1024:	from all fwmark 0x80/0x80 lookup main
1536:	from 192.0.2.123 lookup 2
32766:	from all lookup main
32767:	from all lookup default

Notice here that we have two entries from all to 192.0.2.123 lookup main and from 192.0.2.123 lookup 2. Let’s take a look at what lookup 2 gives us, in the routing table

[root@test-i-01234567890abcdef ~]# ip route show table 2
192.0.2.1 dev eth1 scope link

Fix the issue

This is pretty easy – we need to add a default route if one doesn’t already exist. Long before I got here, my boss created a script which first runs ip route show table main | grep default to get the gateway for that interface, then runs ip rule list, looks for each lookup <number> and finally runs ip route add to put the default route on that table, the same as on the main table.

ip route add default via "${GW}" dev "${INTERFACE}" table "${TABLE}"
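The original script isn't shown here, but the steps described above could be sketched in Python something like this. It's only an illustration: the function names are made up, the sample text is modelled on the output earlier in this post, and picking the interface from the route already present in the table is an assumption on my part. It builds the commands rather than running them.

```python
import re

def default_gateway(main_routes):
    """Return the gateway from the 'default via <gw> ...' line, if there is one."""
    match = re.search(r'^default via (\S+)', main_routes, re.MULTILINE)
    return match.group(1) if match else None

def numbered_tables(rule_output):
    """Return the numeric table IDs that follow 'lookup' in `ip rule list` output."""
    found = re.findall(r'lookup (\d+)\s*$', rule_output, re.MULTILINE)
    return sorted(set(found), key=int)

def table_interface(table_routes):
    """ASSUMPTION: use the interface named on the first route already in the table."""
    match = re.search(r'\bdev (\S+)', table_routes)
    return match.group(1) if match else None

def has_default(table_routes):
    """True if the table already contains a default route."""
    return any(line.startswith('default') for line in table_routes.splitlines())

def build_fix(gateway, interface, table):
    """Build (but don't run) the `ip route add` command for one table."""
    return ['ip', 'route', 'add', 'default', 'via', gateway,
            'dev', interface, 'table', table]

# Sample text standing in for the real command output (illustrative only)
main_routes = "default via 192.0.2.1 dev eth0\n"
rules = ("0:\tfrom all lookup local\n"
         "1536:\tfrom 192.0.2.123 lookup 2\n"
         "32766:\tfrom all lookup main\n")
tables = {'2': "192.0.2.1 dev eth1 scope link\n"}

gateway = default_gateway(main_routes)
for table in numbered_tables(rules):
    routes = tables.get(table, '')
    interface = table_interface(routes)
    if gateway and interface and not has_default(routes):
        print(' '.join(build_fix(gateway, interface, table)))
```

In a real script the three sample strings would come from running ip route show table main, ip rule list and ip route show table <number>, and the resulting command would be executed rather than printed.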

Is this still needed?

I know when we upgraded our cluster from EKS1.23 to EKS1.27, this script was still needed. When I’ve just checked a worker running EKS1.31, after around 12 hours of running, and a second interface being up, it’s not been needed… so perhaps we can deprecate this script?

Dropping packets to the containers due to Martians

When we upgraded our cluster from EKS1.23 to EKS1.27 we also changed a lot of the infrastructure under the surface (AlmaLinux 9 from CentOS7, Containerd and Runc from Docker, CGroups v2 from CGroups v1, and so on). We also moved from using an AWS Elastic Load Balancer (ELB) or “Classic Load Balancer” to AWS Network Load Balancer (NLB).

Following the upgrade, we started seeing packets not arriving at our containers and the system logs on the node were showing a lot of martian source messages, particularly after we configured our NLB to forward original IP source addresses to the nodes.

What happens

One thing we noticed was that each time we added a new pod to the cluster, it added a new eni[0-9a-f]{11} interface, but the sysctl value for net.ipv4.conf.<interface>.rp_filter (reverse path filtering: basically, should we expect traffic from that source to be arriving at this interface?) was set to 1 or “Strict mode”, where the source MUST be reachable via the best return path for the interface it arrived on. The AWS VPC-CNI is supposed to set this to 2 or “Loose mode”, where the source must be reachable from any interface.

You can tell this is happening because you’ll see lines like this in your system journal (assuming you’ve got net.ipv4.conf.all.log_martians=1 configured):

Dec 03 10:01:19 test-i-01234567890abcdef kernel: IPv4: martian source 192.168.1.100 from 192.0.2.123, on dev eth1

The net result is that packets would be dropped by the host at this point, and they’d never be received by the containers in the pods.

Fix the issue

This one is also pretty easy. We run sysctl -a and loop through any entries which match net.ipv4.conf.([^\.]+).rp_filter = (0|1) and then, if we find any, we run sysctl -w net.ipv4.conf.\1.rp_filter=2 (with no spaces around the equals sign) to set it to the correct value.
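As a rough sketch of that loop (not the script we actually run), here it is in Python. The function name and the sample sysctl output are invented for the example, and it only builds the sysctl -w commands rather than executing them.

```python
import re

# Matches per-interface rp_filter entries that are not yet in loose mode (2)
RP_FILTER = re.compile(r'^net\.ipv4\.conf\.([^.]+)\.rp_filter = ([01])$')

def rp_filter_fixes(sysctl_output):
    """Return one `sysctl -w` command for each interface set to 0 or 1."""
    commands = []
    for line in sysctl_output.splitlines():
        match = RP_FILTER.match(line.strip())
        if match:
            interface = match.group(1)
            commands.append(
                ['sysctl', '-w', f'net.ipv4.conf.{interface}.rp_filter=2']
            )
    return commands

# Sample text standing in for the output of `sysctl -a` (illustrative only)
sample = """net.ipv4.conf.all.rp_filter = 2
net.ipv4.conf.eni989c4ec4a56.rp_filter = 1
net.ipv4.conf.eth0.rp_filter = 2
net.ipv4.conf.eth1.rp_filter = 0
"""

for command in rp_filter_fixes(sample):
    print(' '.join(command))
```

With the sample above, only the eni989c4ec4a56 and eth1 entries produce a command; the entries already set to 2 are left alone.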

Is this still needed?

Yep, absolutely. As of our latest upgrade to EKS1.31, if this value isn’t set, then it will drop packets. VPC-CNI should be fixing this, but for some reason it doesn’t. And setting net.ipv4.conf.all.rp_filter to 2 doesn’t seem to make a difference, which is contrary to the relevant Kernel documentation.

After 12 IP addresses are assigned to a node, Kubernetes services stop working for some pods

This was pretty weird. When we upgraded to EKS1.31 on our smallest cluster we initially thought we had an issue with CoreDNS, in that it sometimes wouldn’t resolve IP addresses for services (DNS names for services inside the cluster are resolved by <servicename>.<namespace>.svc.cluster.local to an internal IP address for the cluster – in our case, in the range 172.20.0.0/16). We upgraded CoreDNS to the EKS1.31 recommended version, v1.11.3-eksbuild.2 and that seemed to fix things… until we upgraded our next largest cluster, and things REALLY went wrong, but only when we had increased to over 12 IP addresses assigned to the node.

You might see this as frequent restarts of a container, particularly if you’re reliant on another service to fulfil an init container or the liveness/readiness check.

What happens

EKS1.31 moves KubeProxy from iptables or ipvs mode to nftables – a shift we had to make internally as AlmaLinux 9 no longer supports iptables mode, and ipvs is often quite flaky, especially when you have a lot of pod movements.

With a single interface and up to 11 IP addresses assigned to that interface, everything runs fine, but the moment we move to that second interface, much like in the first case above, we start seeing those pods attached to the second+ interface being unable to resolve service addresses. On further investigation, doing a dig from a container inside that pod to the service address of the CoreDNS service 172.20.0.10 would time out, but a dig against the actual pod address 192.0.2.53 would return a valid response.

Under the surface, on each worker, KubeProxy adds a rule to nftables to say “if you try and reach 172.20.0.10, please instead direct it to 192.0.2.53”. As the containers fluctuate inside the cluster, KubeProxy is constantly re-writing these rules. For whatever reason though, KubeProxy currently seems unable to determine that a second or subsequent interface has been added, and so these rules are not applied to the pods attached to that interface…. or at least, that’s what it looks like!

Fix the issue

In this case, we wrote a separate script which is also triggered every minute. This script runs ip link and looks for any interfaces called eth[0-9]+ to see whether the set of interfaces has changed. If it has, it runs crictl pods (which lists all the running pods in Containerd), looks for the Pod ID of KubeProxy, and then runs crictl stopp <podID> [1] and crictl rmp <podID> [1] to stop and remove the pod, forcing kubelet to restart the KubeProxy on the node.

[1] Yes, they aren’t typos, stopp means “stop the pod” and rmp means “remove the pod”, and these are different to stop and rm which relate to the container.
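To make the shape of that check a bit more concrete, here's a minimal Python sketch. It isn't the script we run: the function names, the sample ip link text and the placeholder pod ID are all made up, and how the previous list of interfaces is remembered between runs (and how the KubeProxy pod ID is looked up) is left to the caller.

```python
import re

def eth_interfaces(ip_link_output):
    """Return the set of ethN interface names found in `ip link` output."""
    return set(re.findall(r'^\d+: (eth\d+)[@:]', ip_link_output, re.MULTILINE))

def restart_commands(pod_id):
    """Commands to stop and remove a pod, so that kubelet recreates it."""
    return [['crictl', 'stopp', pod_id], ['crictl', 'rmp', pod_id]]

def plan(previous, ip_link_output, pod_id):
    """Return the commands to run, or an empty list if nothing has changed."""
    current = eth_interfaces(ip_link_output)
    return restart_commands(pod_id) if current != previous else []

# Sample text standing in for the output of `ip link` (illustrative only)
sample = """1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP
    link/ether 02:00:00:00:00:01 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP
    link/ether 02:00:00:00:00:02 brd ff:ff:ff:ff:ff:ff
"""

# Last time we looked there was only eth0, so a restart is planned
for command in plan({'eth0'}, sample, 'PLACEHOLDER_POD_ID'):
    print(' '.join(command))
```

Keeping the parsing separate from the command execution means the logic can be checked against captured output without touching a real node.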

Is this still needed?

As this was what I was working on all-day yesterday, yep, I’d say so 😊 – in all seriousness though, if this hadn’t been a high-priority issue on the cluster, I might have tried to upgrade the AWS VPC-CNI and KubeProxy add-ons to a later version, to see if the issue was resolved, but at this time, we haven’t done that, so maybe I’ll issue a retraction later 😂

Featured image is “Apoptosis Network (alternate)” by “Simon Cockell” on Flickr and is released under a CC-BY license.

I just want to note that Will Jessop noticed a significant typo in this post within an hour of my posting. The post was updated accordingly. Will is awesome and super lovely. Thanks Will!

Two pages from an old notebook with slightly yellowing paper, and black ink cursive writing and occasional doodles filling the pages

This little #bash script will make capturing #output from lots of #scripts a lot easier

2024-10-21 JonTheNiceGuy

A while ago, I was asked to capture a LOT of data for a support case, where they wanted lots of commands to be run, like “kubectl get namespace” and then for each namespace, get all the pods with “kubectl get pods -n $namespace” and then describe each pod with “kubectl describe pod -n $namespace $podname”. Then do the same with all the services, deployments, ingresses and endpoints.

I wrote this function, and a supporting script to execute the actual checks, and just found it while clearing up!

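#!/bin/bash

# Build the log filename: spaces, slashes and backslashes become underscores
filename="$(echo "$*" | sed -E -e 's~[ /\\]~_~g').log"
# Write the command line itself, replacing any previous version of the file
echo "\$ $*" | tee "${filename}"
# Run the command, appending both stdout and stderr to the log
"$@" 2>&1 | tee -a "${filename}"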

This script is quite simple, it does three things

Take the command you’re about to run, strip all the non-acceptable-filename characters out and replace them with underscores, and turn that into the output filename.
Write the command into the output file, replacing any prior versions of that file
Execute the command, and append the log to the output file.

So, how do you use this? Simple

log_result my-command --with --all --the options

This will produce a file called my-command_--with_--all_--the_options.log that contains this content:

$ my-command --with --all --the options
Congratulations, you ran my-command and turned on the options "--with --all --the options". Nice one!

… oh, and the command I ran to capture the data for the support case?

log_result kubectl get namespace
for TYPE in pod ingress service deployment endpoints
do
  for ns in $(kubectl get namespace | grep -v NAME | awk '{print $1}' )
  do
    echo $ns
    for item in $(kubectl get $TYPE -n $ns | grep -v NAME | awk '{print $1}')
    do
      log_result kubectl get $TYPE -n $ns $item -o yaml
      log_result kubectl describe $TYPE -n $ns $item
    done
  done
done

Featured image is “Travel log texture” by “Mary Vican” on Flickr and is released under a CC-BY license.

A photo of a conch shell in front of a blurry photo frame.

Why (and how) I’ve started writing my Shell Scripts in Python

2024-09-15 JonTheNiceGuy

I’ve been using Desktop Linux for probably 15 years, and Server Linux for more like 25 in one form or another. One of the things you learn to write pretty early on in Linux System Administration is Bash Scripting. Here’s a great example

#!/bin/bash

i=0
until [ $i -eq 10 ]
do
  echo "Jon is the best!"
  (( i += 1 ))
done

Bash scripts are pretty easy to come up with, you just write the things you’d type into the interactive shell, and it does those same things for you! Yep, it’s pretty hard not to love Bash for a shell script. Oh, and it’s portable too! You can write the same Bash script for one flavour of Linux (like Ubuntu), and it’s probably going to work on another flavour of Linux (like RedHat Enterprise Linux, or Arch, or OpenWRT).

But. There comes a point where a Bash script needs to be more than just a few commands strung together.

At work, I started writing a “simple” installer for a Kubernetes cluster – it provisions the cloud components with Terraform, and then once they’re done, it then starts talking to the Kubernetes API (all using the same CLI tools I use day-to-day) to install other components and services.

When the basic stuff works, it’s great. When it doesn’t work, it’s a bit of a nightmare, so I wrote some functions to put logs in a common directory, and another function to gracefully stop the script running when something fails, and then write those log files out to the screen, so I know what went wrong. And then I gave it to a colleague, and he ran it, and things broke in a way that didn’t make sense for either of us, so I wrote some more functions to trap that type of error, and try to recover from them.

And each time, the way I tested where it was working (or not working) was to just… run the shell script, and see what it told me. There had to be a better way.

Enter Python

Python earns my vote for a couple of reasons (and they might not be right for you!)

I’ve been aware of the language for some time, and in fact, had patched a few code libraries in the past to use Ansible features I wanted.
My preferred IDE (Integrated Development Environment), Visual Studio Code, has a step-by-step debugger I can use to work out what’s going on during my programming
It’s still portable! In fact, if anything, it’s probably more portable than Bash, because the version of Bash on the Mac operating system (OS X) is really old, so lots of “modern” features I’d expect to be in bash and associated tooling aren’t there! Python is Python everywhere.
There’s an argument parsing tool built into the core library, so if I want to handle things like ./myscript.py --some-long-feature "option-A" --some-long-feature "option-B" -a -s -h -o -r -t --argument I can do, without having to remember how to write that in Bash (which is a bit esoteric!)
And lastly, for now at least!, is that Python allows you to raise errors that can be surfaced up to other parts of your program

Given all this, my personal preference is to write my shell scripts now in Python.

If you’ve not written python before, variables are written without any prefix (like you might have seen $ in PHP) and any flow control (like if, while and for) as well as any functions and classes use white-space indentation to show where that block finishes, like this:

some_variable = 1

def do_something():
  pass

def and_something_else():
  pass

if some_variable == 1:
  do_something()
  and_something_else()
  while some_variable < 2:
    some_variable = some_variable * 2

Starting with Boilerplate

I start from a “standard” script I use. This has a lot of those functions I wrote previously for bash, but with cleaner code, and in a way that’s a bit more understandable. I’ll break down the pieces I use regularly.

Starting the script up

Here’s the first bit of code I always write, this goes at the top of everything

#!/usr/bin/env python3
import logging
logger = logging

This makes sure this code is portable, but is always using Python3 and not Python2. It also starts the logging engine.

At the bottom I create a block which the “main” code will go into, and then run it.

def main():
  logger.basicConfig(level=logging.DEBUG)
  logger.debug('Started main')

if __name__ == "__main__":
    main()

Adding argument parsing

There’s a standard library which takes command line arguments and uses them in your script, it’s called argparse and it looks like this:

#!/usr/bin/env python3
# It's convention to put all the imports at the top of your files
import argparse
import logging
logger = logging

def process_args():
  parser=argparse.ArgumentParser(
    description="A script to say hello world"
  )

  parser.add_argument(
    '--verbose', # The stored variable can be found by getting args.verbose
    '-v',
    action="store_true",
    help="Be more verbose in logging [default: off]"
  )

  parser.add_argument(
    'who', # This is a non-optional, positional argument called args.who
    help="The target of this script"
  )
  args = parser.parse_args()

  if args.verbose:
      logger.basicConfig(level=logging.DEBUG)
      logger.debug('Setting verbose mode on')
  else:
      logger.basicConfig(level=logging.INFO)

  return args

def main():
  args=process_args()

  print(f'Hello {args.who}')
  # Using f'' means you can include variables in the string
  # You could instead do print('Hello %s' % args.who)
  # but I always struggle to remember in what order I wrote things!

if __name__ == "__main__":
    main()

The order you put things in makes a lot of difference. You need to have the if __name__ == "__main__": line after you’ve defined everything else, but then you can put the def main(): wherever you want in that file (as long as it’s before the if __name__). But by having everything in one file, it feels more like those bash scripts I was talking about before. You can have imports (a bit like calling out to other shell scripts) and use those functions and classes in your code, but for the “simple” shell scripts, this makes most sense.
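The repeated --some-long-feature option mentioned earlier isn't covered by the example above. One way to handle it (a small sketch, with option names borrowed from that earlier command line purely for illustration) is argparse's "append" action, which collects each use of the option into a list:

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Repeatable option sketch")
    parser.add_argument(
        '--some-long-feature',
        action='append',  # Each use of the option adds another item to a list
        default=[],
        help="May be given more than once"
    )
    parser.add_argument(
        '-a',
        action='store_true',
        help="A simple on/off flag"
    )
    return parser

# Passing a list to parse_args() stands in for the real command line here
args = build_parser().parse_args(
    ['--some-long-feature', 'option-A', '--some-long-feature', 'option-B', '-a']
)
print(args.some_long_feature)
print(args.a)
```

This prints ['option-A', 'option-B'] followed by True. Note that argparse turns the dashes in --some-long-feature into underscores for the attribute name.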

So what else do we do in Shell scripts?

Running commands

This is a class in its own right. You can pass a class around in a variable, but it has functions and properties of its own. It’s a bit chunky, but it handles one of the biggest issues I have with bash scripts – capturing both the “normal” output (stdout) and the “error” output (stderr) without needing to put that into an external file you can read later to work out what you saw, as well as storing the return, exit or error code.

# Add these extra imports
import os
import subprocess

class RunCommand:
    command = ''
    cwd = ''
    running_env = {}
    stdout = []
    stderr = []
    exit_code = 999

    def __init__(
      self,
      command: list = [], 
      cwd: str = None,
      env: dict = None,
      raise_on_error: bool = True
    ):
        self.command = command
        self.cwd = cwd
        
        self.running_env = os.environ.copy()

        if env is not None and len(env) > 0:
            for env_item in env.keys():
                self.running_env[env_item] = env[env_item]

        logger.debug(f'exec: {" ".join(command)}')

        try:
            result = subprocess.run(
                command,
                cwd=cwd,
                capture_output=True,
                text=True,
                check=True,
                env=self.running_env
            )
            # Store the result because it worked just fine!
            self.exit_code = 0
            self.stdout = result.stdout.splitlines()
            self.stderr = result.stderr.splitlines()
        except subprocess.CalledProcessError as e:
            # Or store the result from the exception(!)
            self.exit_code = e.returncode
            self.stdout = e.stdout.splitlines()
            self.stderr = e.stderr.splitlines()

        # If verbose mode is on, output the results and errors from the command execution
        if len(self.stdout) > 0:
            logger.debug(f'stdout: {self.list_to_newline_string(self.stdout)}')
        if len(self.stderr) > 0:
            logger.debug(f'stderr: {self.list_to_newline_string(self.stderr)}')

        # If it failed and we want to raise an exception on failure, record the command and args
        # then Raise Away!
        if raise_on_error and self.exit_code > 0:
            command_string = None
            args = []
            for element in command:
                if not command_string:
                    command_string = element
                else:
                    args.append(element)

            raise Exception(
                f'Error ({self.exit_code}) running command {command_string} with arguments {args}\nstderr: {self.stderr}\nstdout: {self.stdout}')

    def __repr__(self) -> str: # Return a string representation of this class
        return "\n".join(
            [
               f"Command: {self.command}",
               f"Directory: {self.cwd if self.cwd is not None else 'current directory'}",
               f"Env: {self.running_env}",
               f"Exit Code: {self.exit_code}",
               f"stdout: {self.list_to_newline_string(self.stdout)}",
               f"stderr: {self.list_to_newline_string(self.stderr)}"
            ]
        )

    def list_to_newline_string(self, list_of_messages: list):
        return "\n".join(list_of_messages)

So, how do we use this?

Well… you can do this: prog = RunCommand(['ls', '/tmp', '-l']) with which we’ll get back the prog object. If you literally then do print(prog) it will print the result of the __repr__() function:

Command: ['ls', '/tmp', '-l']
Directory: current directory
Env: <... a collection of things from your environment ...>
Exit Code: 0
stdout: total 1
drwx------ 1 root  root  0 Jan 1 01:01 somedir
stderr:

But you can also do things like:

for line in prog.stdout:
  print(line)

or:

try:
  prog = RunCommand(['false'], raise_on_error=True)
except Exception as e:
  logger.error(e)
  # The plain Exception raised by RunCommand doesn't carry the exit code
  exit(1)
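A plain Exception has no exit_code attribute, so if you want the failed command's exit code to travel with the error, one option is a small exception subclass. This is a self-contained sketch rather than what the template does: CommandError and run() are names invented for the example.

```python
import subprocess
import sys

class CommandError(Exception):
    """Hypothetical exception that remembers the failed command's exit code."""
    def __init__(self, command, exit_code, stderr):
        super().__init__(f'Error ({exit_code}) running {command}: {stderr}')
        self.exit_code = exit_code

def run(command):
    """Run a command, returning its stdout lines or raising CommandError."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise CommandError(command, result.returncode, result.stderr.strip())
    return result.stdout.splitlines()

# sys.executable is used so the sketch doesn't depend on any particular CLI tool
failing = [sys.executable, '-c', 'import sys; sys.exit(3)']

try:
    run(failing)
except CommandError as e:
    print(f'caught exit code {e.exit_code}')
```

In a real script the except block could then finish with sys.exit(e.exit_code), so the script exits with the same code as the command that failed.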

Putting it together

So, I wrote all this up into a git repo, that you’re more than welcome to take your own inspiration from! It’s licensed under an exceptionally permissive licence, so you can take it and use it without credit, but if you want to credit me in some way, feel free to point to this blog post, or the git repo, which would be lovely of you.

Github: JonTheNiceGuy/python_shell_script_template

Featured image is “The Conch” by “Kurtis Garbutt” on Flickr and is released under a CC-BY license.

Using #NetworkFirewall and #Route53 #DNS #Firewall to protect a private subnet’s egress traffic in #AWS

2024-01-14 JonTheNiceGuy

I wrote this post in January 2023, and it’s been languishing in my Drafts folder since then. I’ve had a look through it, and I can’t see any glaring reasons why I didn’t publish it so… it’s published… Enjoy 😁

If you’ve ever built a private subnet in AWS, you know it can be a bit tricky to get updates from the Internet – you end up having a NAT gateway or a self-managed proxy, and you can never be 100% certain that the egress traffic isn’t going somewhere you don’t want it to.

In this case, I wanted to ensure that outbound HTTPS traffic was being blocked if the SNI didn’t explicitly show the DNS name I wanted to permit through, and also, I only wanted specific DNS names to resolve. To do this, I used AWS Network Firewall and Route 53 DNS Firewall.

Alongside this blog post, I’ve created a set of Terraform files to represent the steps I’ve taken.

The Setup

Let’s start this story from a simple VPC with three private subnets for my compute resources, and three private subnets for the VPC Endpoints for Systems Manager (SSM).

Here’s our network diagram, with the three subnets containing the VPC Endpoints at the top, and the three instances at the bottom.

I’ve created a tag in my Github repo at this “pre-changes” state, called step 1.

At this point, none of those instances can reach anything outside the network, with the exception of the SSM environment. So, we can’t install any packages, we can’t get data from outside the network or anything similar.

Getting Protected Internet Access

In order to get internet access, we need to add a few things:

An internet gateway
A NAT gateway in each AZ, which needs:
  Three new subnets
  Three Elastic IP addresses
Route tables in all the subnets

To clarify, a NAT gateway acts like a DSL router. It hides the source IP address of outbound traffic behind a single, public IP address (using an Elastic IP from AWS), and routes any return traffic back to wherever that traffic came from. To reduce inter-AZ data transfer rates, I’m putting one in each AZ, but if there’s not a lot of outbound traffic or the outbound traffic isn’t critical enough to require resiliency, this could all be centralised to a single NAT gateway. To put a NAT gateway in each AZ, you need a subnet in each AZ, and to get out to the internet (by whatever means you have), you need an internet gateway and route tables for how to reach the NAT and internet gateways.

We also should probably add, at this point, four additional things.

The Network Firewall
Subnets for the Firewall interfaces
Stateless Policy
Stateful Policy

The Network Firewall acts like a single appliance, and uses a Gateway Load Balancer to present an interface into each of the availability zones. It has a stateless policy (which is very fast, but needs to address both inbound and outbound traffic flows) to do IP and Port based filtering (referred to as “Layer 3” and “Layer 4” filtering) and then specific traffic can be passed into a stateful policy (which is slower) to do packet and flow inspection.

In this case, I only want outbound HTTPS traffic to be passed, so my stateless rule group is quite simple;

VPC range on any port → Internet on TCP/443; pass to Stateful rule groups
Internet on TCP/443 → VPC range on any port; pass to Stateful rule groups

I have two stateful rule groups, one is defined to just allow access out to example.com and any relevant subdomains, using the “Domain List” stateful policy item. The other allows access to example.org and any relevant subdomains, using a Suricata stateful policy item, to show the more flexible alternative route. (Suricata has lots more filters than just the SNI value, you can check for specific SSH versions, Kerberos CNAMEs, SNMP versions, etc. You can also add per-rule logging this way, which you can’t with the Domain List route).

These are added to the firewall policy, which also defines that if a packet doesn’t match a stateless rule group, or an established flow doesn’t match a stateful rule group, then it should be dropped.
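To make the two-stage evaluation a little more concrete, here is a minimal Python sketch that models it. It is an illustration only, not how AWS Network Firewall is implemented, and it only models the outbound direction. The 198.18.0.0/16 range is an assumption (only the 198.18.0.x addressing appears in this post), and the function names are hypothetical.

```python
from ipaddress import ip_address, ip_network

# Assumption: the VPC CIDR. Only the 198.18.0.x addressing comes from the post.
VPC_RANGE = ip_network("198.18.0.0/16")
# The domains permitted by the two stateful rule groups.
ALLOWED_DOMAINS = ("example.com", "example.org")


def stateless_forwards(src_ip, dst_port, protocol):
    """Stateless stage: only outbound TCP/443 from the VPC reaches the stateful engine."""
    return (
        protocol == "tcp"
        and dst_port == 443
        and ip_address(src_ip) in VPC_RANGE
    )


def sni_permitted(sni):
    """Stateful stage: the SNI must be a permitted domain or one of its subdomains."""
    name = sni.lower().rstrip(".")
    return any(name == d or name.endswith("." + d) for d in ALLOWED_DOMAINS)


def verdict(src_ip, dst_port, protocol, sni):
    """Anything not explicitly passed by both stages is dropped."""
    if not stateless_forwards(src_ip, dst_port, protocol):
        return "drop"
    return "pass" if sni_permitted(sni) else "drop"


print(verdict("198.18.1.10", 443, "tcp", "www.example.com"))  # pass
print(verdict("198.18.1.10", 443, "tcp", "notexample.com"))   # drop
print(verdict("198.18.1.10", 80, "tcp", "example.org"))       # drop
```

Note that the subdomain check compares against "." plus the domain, so a look-alike name such as notexample.com is not treated as a subdomain of example.com.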

New network diagram with more subnets and objects, but essentially, as described in the paragraphs above. Traffic flows from the instances either down towards the internet, or up towards the VPCe.

I’ve created a tag in my Github repo at this state, with the firewall, NAT Gateway and Internet Gateway, called step 2.

So far, so good… but why let our users even try to resolve the DNS name of a host they’re not permitted to reach? Let’s turn on DNS Firewalling too.

Turning on Route 53 DNS Firewall

You’ll notice that in the AWS Network Firewall, I didn’t let DNS out of the network. This is because, by default, AWS enables Route 53 as its local resolver. This lives on the “.2” address of the VPC, so in my example environment, this would be 198.18.0.2. Because it’s a local resolver, it won’t cross the Firewall exiting to the internet. You can also make Route 53 use your own DNS servers for specific DNS resolution (for example, if you’re running an Active Directory service inside your network).

Any Network Security Response team members you have working with you would appreciate it if you’d turn on DNS Logging at this point, so I’ll do it too!

In March 2021, AWS announced “Route 53 DNS Firewall”, which allows this DNS resolver to rewrite responses, or even to completely deny the existence of a DNS record. With this in mind, I’m going to add some custom DNS rules.

The first thing I want to do is to only permit traffic to my specific list of DNS names – example.org, example.com and their subdomains. DNS quite likes to terminate DNS names with a dot, signifying it shouldn’t try to resolve any higher up the chain, so I’m going to make a “permitted domains” DNS list;

example.com.
example.org.
*.example.com.
*.example.org.

Nice and simple! Except, this also stops me from being able to access the instances over SSM, so I’ll create a separate “VPCe” DNS list:

ssm.ex-ample-1.amazonaws.com.
*.ssm.ex-ample-1.amazonaws.com.
ssmmessages.ex-ample-1.amazonaws.com.
*.ssmmessages.ex-ample-1.amazonaws.com.
ec2messages.ex-ample-1.amazonaws.com.
*.ec2messages.ex-ample-1.amazonaws.com.

Next I create a “default deny” DNS list:

*.

And then build a DNS Firewall Policy which allows access to the “permitted domains” and “VPCe” lists, but blocks resolution of any “default deny” entries.
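The ordering of those three lists matters, because the “default deny” list matches everything. As a rough sketch of the idea (a model of first-match evaluation in Python, not the resolver’s actual implementation), the policy could be thought of like this. The priority numbers and function names are assumptions for illustration.

```python
# Each rule is (priority, action, domain list). Lower priority numbers are checked first.
RULES = [
    (100, "ALLOW", ["example.com.", "example.org.", "*.example.com.", "*.example.org."]),
    (200, "ALLOW", [
        "ssm.ex-ample-1.amazonaws.com.", "*.ssm.ex-ample-1.amazonaws.com.",
        "ssmmessages.ex-ample-1.amazonaws.com.", "*.ssmmessages.ex-ample-1.amazonaws.com.",
        "ec2messages.ex-ample-1.amazonaws.com.", "*.ec2messages.ex-ample-1.amazonaws.com.",
    ]),
    (300, "BLOCK", ["*."]),
]


def matches(pattern, fqdn):
    """A leading "*" stands for any prefix; otherwise the name must match exactly."""
    if pattern.startswith("*"):
        return fqdn.endswith(pattern[1:])
    return fqdn == pattern


def resolve_action(name):
    """Return the action of the first rule, in priority order, whose list matches."""
    # Normalise to a lower-case name with exactly one trailing dot.
    fqdn = name.lower().rstrip(".") + "."
    for _priority, action, domains in sorted(RULES, key=lambda rule: rule[0]):
        if any(matches(pattern, fqdn) for pattern in domains):
            return action
    return None  # unreachable here, because "*." matches every name


print(resolve_action("www.example.com"))  # ALLOW
print(resolve_action("www.example.net"))  # BLOCK
```

If the “default deny” rule were evaluated first, nothing would ever resolve, which is why it sits at the end of the policy.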

I’ve created a tag in my Github repo at this state, with the Route 53 DNS Firewall configured, called step 3.

In conclusion…

So there we have it. While the network is not “secure” (there are still a few gaps here) it’s certainly MUCH more secure than it was, and it would certainly take a lot more work for anyone with malicious intent to get your content out.

Feel free to have a poke around, and leave comments below if this has helped or is of interest!

"Fishing fleet" by "Nomad Tales" on Flickr

Using Terraform to select multiple Instance Types for an Autoscaling Group in AWS

2024-01-04 JonTheNiceGuy

Tale as old as time, the compute instance type you want to use in AWS is highly contested (or worse yet, not as available in every availability zone in your region)! You plead with your TAM or AM “Please let us have more of that instance type” only to be told “well, we can put in a request, but… haven’t you thought about using a range of instance types”?

And yes, I’ve been on both sides of that conversation, sadly.

The commented terraform

# This is your legacy instance_type variable. Ideally we'd have
# a warning we could raise at this point, telling you not to use
# this variable, but... it's not ready yet.
variable "instance_type" {
  description = "The legacy single-instance size, e.g. t3.nano. Please migrate to instance_types ASAP. If you specify instance_types, this value will be ignored."
  type        = string
  default     = null
}

# This is your new instance_types value. If you don't already have
# some sort of legacy use of the instance_type variable, then don't
# bother with that variable or the locals block below!
variable "instance_types" {
  description = "A list of instance sizes, e.g. [t2.nano, t3.nano] and so on."
  type        = list(string)
  default     = null
}

# Use only this locals block (and the value further down) if you
# have some legacy autoscaling groups which might use individual
# instance_type sizes.
locals {
  # This means if var.instance_types is defined, then use it,
  # otherwise create a new list with the single instance_type
  # value in it!
  instance_types = var.instance_types != null ? var.instance_types : [ var.instance_type ]
}

resource "aws_launch_template" "this" {
  # The prefix for the launch template name
  # default "my_autoscaling_group"
  name_prefix = var.name

  # The AMI to use. Calculated outside this process.
  image_id = data.aws_ami.this.id

  # This block ensures that any new instances are created
  # before deleting old ones.
  lifecycle {
    create_before_destroy = true
  }

  # This block defines the disk size of the root disk in GB
  block_device_mappings {
    device_name = data.aws_ami.this.root_device_name
    ebs {
      volume_size = var.disksize # default "10"
      volume_type = var.disktype # default "gp2"
    }
  }

  # Security Groups to assign to the instance. Alternatively
  # create a network_interfaces{} block with your
  # security_groups = [ var.security_group ] in it.
  vpc_security_group_ids = [ var.security_group ]

  # Any on-boot customizations to make.
  user_data = var.userdata
}

resource "aws_autoscaling_group" "this" {
  # The name of the Autoscaling Group in the Web UI
  # default "my_autoscaling_group"
  name = var.name

  # The list of subnets into which the ASG should be deployed.
  vpc_zone_identifier = var.private_subnets
  # The smallest and largest number of instances the ASG should scale between
  min_size            = var.min_rep
  max_size            = var.max_rep

  mixed_instances_policy {
    launch_template {
      # Use this template to launch all the instances
      launch_template_specification {
        launch_template_id = aws_launch_template.this.id
        version            = "$Latest"
      }

      # This loop can either use the calculated value "local.instance_types"
      # or, if you have no legacy use of this module, remove the locals{}
      # and the variable "instance_type" {} block above, and replace the
      # for_each value (defined as "local.instance_types") with
      # "var.instance_types".
      #
      # Loop through the whole list of instance types and create a
      # set of "override" values (the values are defined in the content{}
      # block).
      dynamic "override" {
        for_each = local.instance_types
        content {
          instance_type = override.value
        }
      }
    }

    instances_distribution {
      # If we "enable spot", then make it 100% spot.
      on_demand_percentage_above_base_capacity = var.enable_spot ? 0 : 100
      spot_allocation_strategy                 = var.spot_allocation_strategy
      spot_max_price                           = "" # Empty string is "on-demand price"
    }
  }
}

So what is all this then?

This is two Terraform resources; an aws_launch_template and an aws_autoscaling_group. These two resources define what should be launched by the autoscaling group, and then the settings for the autoscaling group.

You will need to work out what instance types you want to use (e.g. “must have 16 cores and 32 GB RAM, have an x86_64 architecture and allow up to 15 Gigabit/second throughput”)

When might you use this pattern?

If you have been seeing messages like “There is no Spot capacity available that matches your request.” or “We currently do not have sufficient <size> capacity in the Availability Zone you requested.” then you need to consider diversifying the fleet that you’re requesting for your autoscaling group. To do that, you need to specify more instance types. To achieve this, I’d use the above code to replace (something like) one of the code samples below.

If you previously have had something like this:

resource "aws_launch_configuration" "this" {
  iam_instance_profile        = var.instance_profile_name
  image_id                    = data.aws_ami.this.id
  instance_type               = var.instance_type
  name_prefix                 = var.name
  security_groups             = [ var.security_group ]
  user_data_base64            = var.userdata
  spot_price                  = var.spot_price

  root_block_device {
    volume_size = var.disksize
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "this" {
  capacity_rebalance   = false
  launch_configuration = aws_launch_configuration.this.id
  max_size             = var.max_rep
  min_size             = var.min_rep
  name                 = var.name
  vpc_zone_identifier  = var.private_subnets
}

Or this:

resource "aws_launch_template" "this" {
  lifecycle {
    create_before_destroy = true
  }

  block_device_mappings {
    device_name = data.aws_ami.this.root_device_name
    ebs {
      volume_size = var.disksize
    }
  }

  iam_instance_profile {
    name = var.instance_profile_name
  }

  network_interfaces {
    associate_public_ip_address = true
    security_groups             = local.node_security_groups
  }

  image_id      = data.aws_ami.this.id
  name_prefix   = var.name
  instance_type = var.instance_type
  user_data     = var.userdata

  instance_market_options {
    market_type = "spot"
    spot_options {
      spot_instance_type = "one-time"
    }
  }

  metadata_options {
    http_tokens                 = var.imds == 1 ? "optional" : "required"
    http_endpoint               = "enabled"
    http_put_response_hop_limit = 1
  }
}

resource "aws_autoscaling_group" "this" {
  name                = var.name
  vpc_zone_identifier = var.private_subnets
  min_size            = var.min_rep
  max_size            = var.max_rep

  launch_template {
    id      = aws_launch_template.this.id
    version = "$Latest"
  }
}

Then this new method is a much better idea :) Even more so if you had two launch templates to support spot and non-spot instance types!

Hat-tip to former colleague Paul Moran who opened my eyes to defining your fleet of variable instance types, as well as to my former customer (deliberately unnamed) and my current employer who both stumbled into the same documentation issue. Without Paul’s advice with my prior customer’s issue I’d never have known what I was looking for this time around!

Featured image is “Fishing fleet” by “Nomad Tales” on Flickr and is released under a CC-BY-SA license.

An open padlock with a key inserted into it, on a printed circuit board

Pulling container images from private registries (including Docker Hub) with a Kubernetes Kubelet Credential Provider

2023-11-06 JonTheNiceGuy

At work last week, I finally solved an issue by writing some code, and I wanted to explain why I wrote it.

At its core, Kubernetes is an orchestrator which runs “Container Images”, which are structured filesystem snapshots, taken after running individual commands against a base system. These container images are stored in a container registry, and the most well known of these is the Docker registry, known as Docker Hub.

A registry can be public, meaning you don’t need credentials to get any images from it, or private. Some also offer a mixed-mode where you can make a certain number of requests without requiring authentication, but if you need more than that amount of requests, you need to provide credentials.

During the build-out of a new cluster, I discovered that the ECR (Elastic Container Registry) from AWS requires a new type of authentication – the Kubelet Credential Provider, which required the following changes:

In /etc/sysconfig/kubelet you provide these two switches;
--image-credential-provider-bin-dir /usr/local/bin/image-credential-provider and --image-credential-provider-config /etc/kubernetes/image-credential-provider-config.json.
In /etc/kubernetes/image-credential-provider-config.json you provide a list of registries and the credential provider to use, which looks like this:

{
  "apiVersion": "kubelet.config.k8s.io/v1",
  "kind": "CredentialProviderConfig",
  "providers": [
    {
      "name": "binary-credential-provider-name",
      "matchImages": [
        "example.org",
        "registry.*.example.org",
        "*.registry.*.example.org"
      ],
      "defaultCacheDuration": "12h",
      "apiVersion": "credentialprovider.kubelet.k8s.io/v1"
    }
  ]
}

Downloading and placing the credential provider binary into the /usr/local/bin/image-credential-provider path.

The ECR Credential Provider has its own GitHub repository, and it made me think. We’ve been using the “old” method of storing credentials in the containerd configuration file, which is now marked as deprecated. That method means that any change to these credentials requires a restart of the containerd service (which apparently used to have a big impact on the platform), whereas the new ECR provider doesn’t need one.

I decided to write my own Credential Provider, following the documentation for the Kubelet Credential Provider API and I wrote it in Python – a language I’m trying to get better in! (Pull requests, feature requests, etc. are all welcome!)

I will confess I made heavy use of ChatGPT to get a steer on certain aspects of how to write the code, but all the code is generic and there’s nothing proprietary in this code.

Using the Generic Credential Provider

Follow the steps above – change your Kubernetes environment to ensure you have the kubelet configuration changes and the JSON credential provider configuration put in the relevant parts of your tree. Set the “matchImages” values to include the registry in question – for dockerhub, I’d probably use ["docker.io", "*.docker.io"]
Download the generic-credential-provider script from Github, put it in the right path in your worker node’s filesystem (if you followed my notes above it’ll be in /usr/local/bin/image-credential-provider/generic-credential-provider but this is *your* system we’re talking about, not mine! You know your build better than I do!)
Create the /etc/kubernetes/registries directory – this can be changed by editing the script to use a new path, and for testing purposes there is a flag --credroot /some/new/path but that doesn’t work for the kubelet configuration file.
Create a credential file, for example, /etc/kubernetes/registries/example.org.json which contains this string: {"username":"token_username","password":"token_password"}. [Yes, it’s a plaintext credential. Make sure it’s scoped for only image downloads. No, this still isn’t very good. But how else would you do this?! (Pull requests are welcomed!)] You can add a duration value into that JSON dictionary, to change the default timeout from 5 minutes. Technically, the default is actually set in /etc/kubernetes/image-credential-provider-config.json but I wanted to have my own per-credential value. As these values come from the filesystem, and therefore have very little performance liability, I didn’t want to have a large delay in the cache.
Test your credential! This code is what I used:

echo '{
  "apiVersion": "credentialprovider.kubelet.k8s.io/v1",
  "kind": "CredentialProviderRequest",
  "image": "your.registry.example.org/org/image:version"
}' | /usr/local/bin/image-credential-provider/generic-credential-provider

which should return:

'{"kind": "CredentialProviderResponse", "apiVersion": "credentialprovider.kubelet.k8s.io/v1", "cacheKeyType": "Registry", "cacheDuration": "0h5m0s", "auth": {"your.registry.example.org": {"username": "token_username", "password": "token_password"}}}'

You should also see an entry in your syslog service showing a line that says “Credential request fulfilled for your.registry.example.org” and if you pass it a request that it can’t fulfil, it should say “Failed to fulfill credential request for failure.example.org”.
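If you’re curious about the shape of the exchange between the kubelet and a credential provider, here is a minimal Python sketch of it. This is not the code from my generic-credential-provider script; it’s a simplified illustration which assumes one JSON file per registry, named exactly after the registry host, and the function names registry_of and handle are hypothetical. It skips logging and error handling entirely.

```python
import json
from pathlib import Path

API_VERSION = "credentialprovider.kubelet.k8s.io/v1"


def registry_of(image):
    """Take everything before the first "/" as the registry host."""
    return image.split("/", 1)[0]


def handle(request_json, credroot="/etc/kubernetes/registries"):
    """Turn a CredentialProviderRequest (JSON text) into a CredentialProviderResponse."""
    request = json.loads(request_json)
    registry = registry_of(request["image"])
    # Assumption: one JSON file per registry, named after the registry host.
    credential = json.loads((Path(credroot) / f"{registry}.json").read_text())
    response = {
        "kind": "CredentialProviderResponse",
        "apiVersion": API_VERSION,
        "cacheKeyType": "Registry",
        # Optional per-credential override of the five minute default.
        "cacheDuration": credential.get("duration", "0h5m0s"),
        "auth": {
            registry: {
                "username": credential["username"],
                "password": credential["password"],
            }
        },
    }
    return json.dumps(response)
```

The kubelet writes the request to the provider’s standard input and reads the response from its standard output, so a real entry point would wrap this in something like print(handle(sys.stdin.read())).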

If this helped you, please consider buying me a drink to say thanks!

Featured image is “Padlock on computer parts” by “Marco Verch Professional Photographer” on Flickr and is released under a CC-BY license.

A green notice board in a country setting. It has leaflets and cards on it, although they are not readable in this image.

Create yourself a “Work Profile” to let others know how (and when) to contact you!

2023-10-17 JonTheNiceGuy

I recently got talking to a colleague about how people prefer to work and how they prefer to be contacted. It’s obvious in an office – if Bob isn’t there, then he’s not around, but when some of the team is remote, some are hybrid working, then it’s a lot harder.

There are three things I’ve found are really useful to know when trying to reach someone, and I’ve written this up in a simple page stored on our internal wiki;

What’s your baseline – where do you live and when are you usually in the office.
What are your usual working hours – how accurate is your calendar for non-meetings? Do you have fixed meetings that happen every week, or a school run that you typically do? Do you need to be away from your desk at certain times for religious reasons?
What’s the best way to contact you – if you’ve got a choice of tools (like Slack Huddles or Google Meet) which would you rather use, and why? Is it best to drop in a 15 minute appointment, or just call you?

Once you’ve got these three items, in something everyone can access, add it to your directory profile, bio on slack, your email signature (for internal emails) and so on.

From here to the end of the post is a mildly sanitised version of my internally posted profile. I hope it’s useful to you!

Baseline

I am based in the UK, using the Europe/London time zone. I am remote based with very infrequent visits to the London office.

Typical Working Hours Patterns

I work from Monday to Friday, normally starting at X and finishing at X. During school term times, I will be out of the office between 3:00PM and 3:45PM to do school drop-off and pick ups. On Monday to Thursday, I am in a stand-up from X until Y. I will typically take my lunch break between X and Y. On Friday I have a weekly one-to-one which starts at X and finishes at Y. I will then take lunch until 1:00PM.

During school holidays, the start and end times will need to be a bit more flexible, and drop-off and pick-up slots will vary based on day-to-day activities.

I will keep my calendar up-to-date accordingly.

Contact Preference

I prefer being contacted by Slack mention or DM, however, I will often follow-up with a request for a DM chat or call, especially if I have been typing a lot during the day, or am trying to resolve an issue which I expect will require a lot of interaction.

I am happy to use Google Meet, Slack Huddles, Microsoft Teams or Amazon Chime, all of which I have tested and work on my computer. I personally prefer Microsoft Teams, because the presenter can allow participants to interact with the presenter’s screen, or Slack Huddles, because that allows participants to draw on the presenter’s screen and because I can see more of your screen by default.

Featured image is “Notice board / Bulletin Board” by “Matthew Paul Argall” on Flickr and is released under a CC-BY license.

A Quick Fix for “Backend initialization required” from Terragrunt

2023-10-17 JonTheNiceGuy

Today I ran terragrunt apply against an IaC directory, and got this response:

╷
│ Error: Backend initialization required: please run "terraform init"
│ 
│ Reason: Backend configuration block has changed
│ 
│ The "backend" is the interface that Terraform uses to store state,
│ perform operations, etc. If this message is showing up, it means that the
│ Terraform configuration you're using is using a custom configuration for
│ the Terraform backend.
│ 
│ Changes to backend configurations require reinitialization. This allows
│ Terraform to set up the new configuration, copy existing state, etc. Please
│ run
│ "terraform init" with either the "-reconfigure" or "-migrate-state" flags
│ to
│ use the current configuration.
│ 
│ If the change reason above is incorrect, please verify your configuration
│ hasn't changed and try again. At this point, no changes to your existing
│ configuration or state have been made.
╵
ERRO[0000] Hit multiple errors:
Hit multiple errors:
exit status 1

But wait, I hear you say, Terragrunt runs terraform init for you… so what gives?

Well, in this case, the terragrunt.hcl has a dependency block, and one of those dependencies has not run properly, so… let’s fix it.

Read the content of your terragrunt.hcl

terraform {
  source = "git@github.com:example/example-terraform-modules.git//module"
}

include {
  path = find_in_parent_folders()
}

dependency "dependency_1" {
  config_path = "${get_terragrunt_dir()}/../dependency"

  mock_outputs_allowed_terraform_commands = ["destroy", "force-unlock"]
  mock_outputs = {
    output_1 = []
    output_2 = ""
  }
}

inputs = {
  name      = "some_module"
  some_key  = dependency.dependency_1.outputs.output_1
  other_key = dependency.dependency_1.outputs.output_2
}

Right, so for some reason the dependency won’t run. Change into that directory, and run terragrunt apply --terragrunt-source-update. Hopefully, you’ll get something like this:

Initializing the backend...

Successfully configured the backend "example"! Terraform will automatically
use this backend unless the backend configuration changes.

Initializing provider plugins...
- Reusing previous version of example/example from the dependency lock file
- Installing example/example v1.0.0...
- Installed example/example v1.0.0 (signed by Example)

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
example_module.this: Refreshing state... [id=an-example]

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration
and found no differences, so no changes are needed.

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:
output_1 = {"some_key": "some_value"}
output_2 = "some_string"

You may find yourself having to traverse several different dependencies until you get to the one which is missing… and then it should work :)
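If there are a lot of dependencies to walk through, something like the following Python sketch could list the directories to visit, deepest dependency first. This is a hypothetical helper rather than anything Terragrunt ships, and it makes some loud assumptions: it only finds config_path values written as literal strings, and the only interpolation it understands is ${get_terragrunt_dir()}.

```python
import re
from pathlib import Path

# Matches lines such as: config_path = "${get_terragrunt_dir()}/../dependency"
CONFIG_PATH = re.compile(r'config_path\s*=\s*"([^"]+)"')


def dependency_dirs(unit_dir, order=None, visited=None):
    """Return the dependency directories of unit_dir, deepest dependencies first."""
    order = [] if order is None else order
    visited = set() if visited is None else visited
    unit_dir = Path(unit_dir).resolve()
    visited.add(unit_dir)
    hcl = unit_dir / "terragrunt.hcl"
    if hcl.is_file():
        for raw in CONFIG_PATH.findall(hcl.read_text()):
            # Relative paths are taken relative to the directory holding terragrunt.hcl.
            expanded = raw.replace("${get_terragrunt_dir()}", str(unit_dir))
            dep = (unit_dir / expanded).resolve()
            if dep not in visited:
                # Recurse first so that a dependency's own dependencies come earlier.
                dependency_dirs(dep, order, visited)
                order.append(dep)
    return order
```

Running terragrunt apply in each returned directory, in order, should then reach the broken dependency before the ones that rely on it.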

Featured image is “Jenga” by “Mara Tr.” on Flickr and is released under a CC-BY license.
