How I deploy Vaultwarden to provide a Bitwarden compatible service in Kubernetes with Monitoring and Backups

2025-04-062025-04-06 JonTheNiceGuy Leave a comment

This initially was going to be a mammoth blog post going through all of the lines of code in how I’ve built a Vaultwarden service in Kubernetes rather than just writing what I’ve done. You can just look at the git repo and see what’s there! Ask for comments on that if you need more details! 😀

So, instead, let me link you to the helm chart and docker containers I created, and I’ll pull out some notes on some of the specific details in there.

https://github.com/JonTheNiceGuy/vaultwarden-helm-chart

This helm chart comprises of the 4 services I feel you need:

vaultwarden <- The actual password safe service
vwmetrics <- Prometheus Metrics for the service
vaultwarden-sync <- A packaged deployment of the directory synchronization tool from Bitwarden
vaultwarden-backup <- A tool to backup the data directory and the database from Bitwarden.

In addition, the chart allows you to provision dynamically allocated Persistent Volumes through a StorageClass, and flexibility to set all of the variables in the Vaultwarden settings file.

The biggest “weird-ish” thing I’ve done is to create a configuration file as a secret, and mounted that configuration file into the vaultwarden container. This prevents compromised hosts from being able to extract admin tokens and database credentials from process environment variables. That said, it would be better to somehow make this a Read-Once value, which I believe is possible with something like Hashicorp Vault, or SOPS. If you’ve got any advice on how to do this, I’d be very grateful for your advice!

I’m not exactly overjoyed with the vwmetrics, as it doesn’t expose any internal metrics, just a count of the number of each type of asset in the database, but the project are clear they don’t want to add any additional tracing to the application, so this is the best we can do.

vaultwarden-backup is a script I wrote which reads the vaultwarden environment file to get the database credentials and data path, and then backs up both database and non-database files (following the official guidance). In this invocation, the only fields required from the environment file are the path to the data directory and the database credentials are required, so the config secret stores those as a separate key. It also means that this can be just a Read-Only database credential too.

I wrote this script because no-one had released a containerised script that performed the database backup in something other than sqlite that I’d seen.

vaultwarden-sync is a wrapper I wrote to get the Bitwarden Directory Connector, and setup the configuration files to support performing LDAP sync. The other directories have not been tested, but are configured according to the changes to the configuration file when you configure them in the Bitwarden Directory Connector GUI.

I wrote this script because I couldn’t see any way to run the Directory Connector as part of an all-in-one set of containers for my cluster.

Both the backup and sync tools use the livenessProbe feature of Kubernetes to execute themselves, and use the termination log as their output method. This is a method one of my colleagues found when we were setting up some inter-cluster communication tests a while ago, and it works really well where you need to see the status of a long running loop.

I should stress, this is not a “fully-packaged” helm chart. It’s a learning aid, both for someone who hasn’t written many helm charts, and for me, to get feedback from people who *do* write lots of helm charts, and are prepared to tell me how I can do better!

Featured image is “Riggs Bank Vault in Washington D.C.” by “Steve Jurvetson” on Flickr and is released under a CC-BY license.

Building a Linux Firewall with AlmaLinux 9, NetworkManager, BGP, DHCP and NFTables with Puppet

2025-02-172025-02-17 JonTheNiceGuy Leave a comment

I’m in the process of building a Network Firewall for a work environment. This blog post is based on that work, but with all the identifying marks stripped off.

For this particular project, we standardised on Alma Linux 9 as the OS Base, and we’ve done some testing and proved that the RedHat default firewalling product, Firewalld, is not appropriate for this platform, but did determine that NFTables, or NetFilter Tables (the successor to IPTables) is.

I’ll warn you, I’m pretty prone to long and waffling posts, but there’s a LOT of technical content in this one. There is also a Git repository with the final code. I hope that you find something of use in here.

This document explains how it is using Vagrant with Virtualbox to build a test environment, how it installs a Puppet Server and works out how to calculate what settings it will push to it’s clients. With that puppet server, I show how to build and configure a firewall using Linux tools and services, setting up an NFTables policy and routing between firewalls using FRR to provide BGP, and then I will show how to deploy a DHCP server.

Let’s go!

The scenario

A network diagram, showing a WAN network attached to the top of firewall devices and out via the Host machine, a transit network linking the bottom of the firewall devices, and attached to the side, networks identified as "Prod", "Dev" and "DHCP" each with IP allocations indicated.

To prove the concept, I have built two Firewall machines (A and B), plus six hosts, one attached to each of the A and B side subnets called “Prod”, “Dev” and “Shared”.

Any host on any of the “Prod” networks should be able to speak to any host on any of the other “Prod” networks, or back to the “Shared” networks. Any host on any of the “Dev” networks should be able to speak to any host on the other “Dev” networks, or back to the “Shared” networks.

Any host in Prod, Dev or Shared should be able to reach the internet, and shared can reach any of the other networks.

"Apoptosis Network (alternate)" by "Simon Cockell" on Flickr

A few weird issues in the networking on our custom AWS EKS Workers, and how we worked around them

2024-12-032024-12-03 JonTheNiceGuy Leave a comment

For “reasons”, at work we run AWS Elastic Kubernetes Service (EKS) with our own custom-built workers. These workers are based on Alma Linux 9, instead of AWS’ preferred Amazon Linux 2023. We manage the deployment of these workers using AWS Auto-Scaling Groups.

Our unusal configuration of these nodes mean that we sometimes trip over configurations which are tricky to get support on from AWS (no criticism of their support team, if I was in their position, I wouldn’t want to try to provide support for a customer’s configuration that was so far outside the recommended configuration either!)

Over the past year, we’ve upgraded EKS1.23 to EKS1.27 and then on to EKS1.31, and we’ve stumbled over a few issues on the way. Here are a couple of notes on the subject, in case they help anyone else in their journey.

All three of the issues below were addressed by running an additional service on the worker nodes in a Systemd timed service which triggers every minute.

Incorrect routing for the 12th IP address onwards

Something the team found really early on (around EKS 1.18 or somewhere around there) was that the AWS VPC-CNI wasn’t managing the routing tables on the node properly. We raised an issue on the AWS VPC CNI (we were on CentOS 7 at the time) and although AWS said they’d fixed the issue, we currently need to patch the routing tables every minute on our nodes.

What happens?

When you get past the number of IP addresses that a single ENI can have (typically ~12), the AWS VPC-CNI will attach a second interface to the worker, and start adding new IP addresses to that. The VPC-CNI should setup routing for that second interface, but for some reason, in our case, it doesn’t. You can see this happens because the traffic will come in on the second ENI, eth1, but then try to exit the node on the first ENI, eth0, with a tcpdump, like this:

[root@test-i-01234567890abcdef ~]# tcpdump -i any host 192.0.2.123
tcpdump: data link type LINUX_SLL2
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
09:38:07.331619 eth1  In  IP ip-192-168-1-100.eu-west-1.compute.internal.41856 > ip-192-0-2-123.eu-west-1.compute.internal.irdmi: Flags [S], seq 1128657991, win 64240, options [mss 1359,sackOK,TS val 2780916192 ecr 0,nop,wscale 7], length 0
09:38:07.331676 eni989c4ec4a56 Out IP ip-192-168-1-100.eu-west-1.compute.internal.41856 > ip-192-0-2-123.eu-west-1.compute.internal.irdmi: Flags [S], seq 1128657991, win 64240, options [mss 1359,sackOK,TS val 2780916192 ecr 0,nop,wscale 7], length 0
09:38:07.331696 eni989c4ec4a56 In  IP ip-192-0-2-123.eu-west-1.compute.internal.irdmi > ip-192-168-1-100.eu-west-1.compute.internal.41856: Flags [S.], seq 3367907264, ack 1128657992, win 26847, options [mss 8961,sackOK,TS val 1259768406 ecr 2780916192,nop,wscale 7], length 0
09:38:07.331702 eth0  Out IP ip-192-0-2-123.eu-west-1.compute.internal.irdmi > ip-192-168-1-100.eu-west-1.compute.internal.41856: Flags [S.], seq 3367907264, ack 1128657992, win 26847, options [mss 8961,sackOK,TS val 1259768406 ecr 2780916192,nop,wscale 7], length 0

The critical line here is the last one – it’s come in on eth1 and it’s going out of eth0. Another test here is to look at ip rule

[root@test-i-01234567890abcdef ~]# ip rule
0:	from all lookup local
512:	from all to 192.0.2.111 lookup main
512:	from all to 192.0.2.143 lookup main
512:	from all to 192.0.2.66 lookup main
512:	from all to 192.0.2.113 lookup main
512:	from all to 192.0.2.145 lookup main
512:	from all to 192.0.2.123 lookup main
512:	from all to 192.0.2.5 lookup main
512:	from all to 192.0.2.158 lookup main
512:	from all to 192.0.2.100 lookup main
512:	from all to 192.0.2.69 lookup main
512:	from all to 192.0.2.129 lookup main
1024:	from all fwmark 0x80/0x80 lookup main
1536:	from 192.0.2.123 lookup 2
32766:	from all lookup main
32767:	from all lookup default

Notice here that we have two entries from all to 192.0.2.123 lookup main and from 192.0.2.123 lookup 2. Let’s take a look at what lookup 2 gives us, in the routing table

[root@test-i-01234567890abcdef ~]# ip route show table 2
192.0.2.1 dev eth1 scope link

Fix the issue

This is pretty easy – we need to add a default route if one doesn’t already exist. Long before I got here, my boss created a script which first runs ip route show table main | grep default to get the gateway for that interface, then runs ip rule list, looks for each lookup <number> and finally runs ip route add to put the default route on that table, the same as on the main table.

ip route add default via "${GW}" dev "${INTERFACE}" table "${TABLE}"

Is this still needed?

I know when we upgraded our cluster from EKS1.23 to EKS1.27, this script was still needed. When I’ve just checked a worker running EKS1.31, after around 12 hours of running, and a second interface being up, it’s not been needed… so perhaps we can deprecate this script?

Dropping packets to the containers due to Martians

When we upgraded our cluster from EKS1.23 to EKS1.27 we also changed a lot of the infrastructure under the surface (AlmaLinux 9 from CentOS7, Containerd and Runc from Docker, CGroups v2 from CGroups v1, and so on). We also moved from using an AWS Elastic Load Balancer (ELB) or “Classic Load Balancer” to AWS Network Load Balancer (NLB).

Following the upgrade, we started seeing packets not arriving at our containers and the system logs on the node were showing a lot of martian source messages, particularly after we configured our NLB to forward original IP source addresses to the nodes.

What happens

One thing we noticed was that each time we added a new pod to the cluster, it added a new eni[0-9a-f]{11} interface, but the sysctl value for net.ipv4.conf.<interface>.rp_filter (return path filtering – basically, should we expect the traffic to be arriving at this interface for that source?) in sysctl was set to 1 or “Strict mode” where the source MUST be the coming from the best return path for the interface it arrived on. The AWS VPC-CNI is supposed to set this to 2 or “Loose mode” where the source must be reachable from any interface.

In this case you’d tell this because you’d see this in your system journal (assuming you’ve got net.ipv4.conf.all.log_martians=1 configured):

Dec 03 10:01:19 test-i-01234567890abcdef kernel: IPv4: martian source 192.168.1.100 from 192.0.2.123, on dev eth1

The net result is that packets would be dropped by the host at this point, and they’d never be received by the containers in the pods.

Fix the issue

This one is also pretty easy. We run sysctl -a and loop through any entries which match net.ipv4.conf.([^\.]+).rp_filter = (0|1) and then, if we find any, we run sysctl -w net.ipv4.conf.\1.rp_filter = 2 to set it to the correct value.

Is this still needed?

Yep, absolutely. As of our latest upgrade to EKS1.31, if this value isn’t set, then it will drop packets. VPC-CNI should be fixing this, but for some reason it doesn’t. And setting the conf.ipv4.all.rp_filter to 2 doesn’t seem to make a difference, which is contrary to the documentation in the relevant Kernel documentation.

After 12 IP addresses are assigned to a node, Kubernetes services stop working for some pods

This was pretty weird. When we upgraded to EKS1.31 on our smallest cluster we initially thought we had an issue with CoreDNS, in that it sometimes wouldn’t resolve IP addresses for services (DNS names for services inside the cluster are resolved by <servicename>.<namespace>.svc.cluster.local to an internal IP address for the cluster – in our case, in the range 172.20.0.0/16). We upgraded CoreDNS to the EKS1.31 recommended version, v1.11.3-eksbuild.2 and that seemed to fix things… until we upgraded our next largest cluster, and things REALLY went wrong, but only when we had increased to over 12 IP addresses assigned to the node.

You might see this as frequent restarts of a container, particularly if you’re reliant on another service to fulfil an init container or the liveness/readyness check.

What happens

EKS1.31 moves KubeProxy from iptables or ipvs mode to nftables – a shift we had to make internally as AlmaLinux 9 no longer supports iptables mode, and ipvs is often quite flaky, especially when you have a lot of pod movements.

With a single interface and up to 11 IP addresses assigned to that interface, everything runs fine, but the moment we move to that second interface, much like in the first case above, we start seeing those pods attached to the second+ interface being unable to resolve service addresses. On further investigation, doing a dig from a container inside that pod to the service address of the CoreDNS service 172.20.0.10 would timeout, but a dig against the actual pod address 192.0.2.53 would return a valid response.

Under the surface, on each worker, KubeProxy adds a rule to nftables to say “if you try and reach 172.20.0.10, please instead direct it to 192.0.2.53”. As the containers fluctuate inside the cluster, KubeProxy is constantly re-writing these rules. For whatever reason though, KubeProxy currently seems unable to determine that a second or subsequent interface has been added, and so these rules are not applied to the pods attached to that interface…. or at least, that’s what it looks like!

Fix the issue

In this case, we wrote a separate script which was also triggered every minute. This script looks to see if the interfaces have changed by running ip link and looking for any interfaces called eth[0-9]+ which have changed, and then if it has, it runs crictl pods (which lists all the running pods in Containerd), looks for the Pod ID of KubeProxy, and then runs crictl stopp <podID> [1] and crictl rmp <podID> [1] to stop and remove the pod, forcing kubelet to restart the KubeProxy on the node.

[1] Yes, they aren’t typos, stopp means “stop the pod” and rmp means “remove the pod”, and these are different to stop and rm which relate to the container.

Is this still needed?

As this was what I was working on all-day yesterday, yep, I’d say so 😊 – in all seriousness though, if this hadn’t been a high-priority issue on the cluster, I might have tried to upgrade the AWS VPC-CNI and KubeProxy add-ons to a later version, to see if the issue was resolved, but at this time, we haven’t done that, so maybe I’ll issue a retraction later 😂

Featured image is “Apoptosis Network (alternate)” by “Simon Cockell” on Flickr and is released under a CC-BY license.

I just want to note that Will Jessop noticed a significant typo in this post within an hour of my posting. The post was updated accordingly. Will is awesome and super lovely. Thanks Will!

Two pages from an old notebook with slightly yellowing paper, and black ink cursive writing and occasional doodles filling the pages

This little #bash script will make capturing #output from lots of #scripts a lot easier

2024-10-212024-10-21 JonTheNiceGuy Leave a comment

A while ago, I was asked to capture a LOT of data for a support case, where they wanted lots of commands to be run, like “kubectl get namespace” and then for each namespace, get all the pods with “kubectl get pods -n $namespace” and then describe each pod with “kubectl get pod -n namespace $podname”. Then do the same with all the services, deployments, ingresses and endpoints.

I wrote this function, and a supporting script to execute the actual checks, and just found it while clearing up!

#!/bin/bash

filename="$(echo $* | sed -E -e 's~[ -/\\]~_~g').log"
echo "\$ $@" | tee "${filename}"
$@ 2>&1 | tee -a "${filename}"

This script is quite simple, it does three things

Take the command you’re about to run, strip all the non-acceptable-filename characters out and replace them with underscores, and turn that into the output filename.
Write the command into the output file, replacing any prior versions of that file
Execute the command, and append the log to the output file.

So, how do you use this? Simple

log_result my-command --with --all --the options

This will produce a file called my-command_--with_--all_--the_options.log that contains this content:

$ my-command --with --all --the options
Congratulations, you ran my-command and turned on the options "--with --all --the options". Nice one!

… oh, and the command I ran to capture the data for the support case?

log_result kubectl get namespace
for TYPE in pod ingress service deployment endpoints
do
  for ns in $(kubectl get namespace | grep -v NAME | awk '{print $1}' )
  do
    echo $ns
    for item in $(kubectl get $TYPE -n $ns | grep -v NAME | awk '{print $1}')
    do
      log_result kubectl get $TYPE -n $ns $item -o yaml
      log_result kubectl describe $TYPE -n $ns $item
    done
  done
done

Featured image is “Travel log texture” by “Mary Vican” on Flickr and is released under a CC-BY license.

A scuffed painting on what appears to be a bin. The painting is of an orangutan holding up a sign saying "Don't Panic".

Mounting a damaged #ZFS Pool disk to recover data

2024-06-292024-06-29 JonTheNiceGuy Leave a comment

TL;DR? zpool import -d /dev/sdb1 -o readonly=on -R /recovery/poolname poolname

I have a pair of Proxmox servers, each with a single ZFS drive attached, with GlusterFS over the top to provide storage to the VMs.

Last week I had a power outage which took both nodes offline. When the power came back on, one node’s system drive had failed entirely and during recovery the second machine refused to restart some of the VMs.

Rather than try to fix things properly, I decided to “Nuke-and-Pave”, a decision I’m now regretting a little!

I re-installed one of the nodes OK, set up the new ZFS drive, set up Gluster and then started transferring the content from the old machine to the new one.

During the file transfer, I saw a couple of messages about failed blocks, and finally got a message from the cluster about how the pool was considered degraded, but as this was largely performed while I was asleep, I didn’t notice until I woke up… when the new node was offline.

I connected a Keyboard and Monitor to the box and saw a kernel panic. I rebooted the node, and during the boot sequence, just after the Systemd service that scanned the ZFS pool, it panicked again.

Unplugging the data drive from the machine and rebooting it, the node came up just fine.

I plugged the drive into my laptop and ran zpool import -d /dev/sdb1 -R /recovery/poolname poolname and my laptop crashed (although, I was running this in GUI mode, so I don’t know if it was a kernel panic or “just” a crash.)

Finally, I ran zpool import -d /dev/sdb1 -o read-only=on -R /recovery/poolname poolname and the drive came up in /recovery/poolname, so I could transfer files off to another drive until I figure out what’s going on!

Once I was done, I ran zfs unmount poolname and was able to detach the disk from the device.

Featured image is “don’t panic orangutan” by “Esperluette” on Flickr and is released under a CC-BY license.

A colour photograph of a series of cogs and gears interlinked to create a machine

Making .bashrc more manageable

2024-06-062024-06-06 JonTheNiceGuy Leave a comment

How many times have you seen an instruction in a setup script which says “Now add source <(somescript completion bash) to your ~/.bashrc file” or “Add export SOMEVAR=abc123 to your .bashrc file”?

This is great when it’s one or two lines, but for a big chunk of them? Whew!

Instead, I created this block in mine:

if [ -d ~/.bash_extensions.d ]; then
    for extension in ~/.bash_extensions.d/[a-zA-Z0-9]*
    do
        . "$extension"
    done
fi

This dynamically loads all the files in ~/.bash_extensions.d/ which start with a letter or a digit, so it means I can manage when things get loaded in, or removed from my bash shell.

For example, I recently installed the pre-release of Atuin, so my ~/.bash_extensions.d/atuin file looks like this:

source $HOME/.atuin/bin/env
eval "$(atuin init bash --disable-up-arrow)"

And when I installed direnv, I created ~/.bash_extensions.d/direnv which has this in it:

eval "$(direnv hook bash)"

This is dead simple, and now I know that if I stop using direnv, I just need to remove that file, rather than hunting for a line in .bashrc.

Featured image is “Gears gears cogs bits n pieces” by “Les Chatfield” on Flickr and is released under a CC-BY license.

A note to myself; resetting error status on proxmox HA workloads after a crash

2024-03-222024-03-22 JonTheNiceGuy Leave a comment

I’ve had a couple of issues with brown-outs recently which have interrupted my Proxmox server, and stopped my connected disks from coming back up cleanly (yes, I’m working on that separately!) but it’s left me in a state where several of my containers and virtual machines on the cluster are down.

It’s possible to point-and-click your way around this, but far easier to script it!

A failed state may look like this:

root@proxmox1:~# ha-manager status
quorum OK
master proxmox2 (active, Fri Mar 22 10:40:49 2024)
lrm proxmox1 (active, Fri Mar 22 10:40:52 2024)
lrm proxmox2 (active, Fri Mar 22 10:40:54 2024)
service ct:101 (proxmox1, error)
service ct:102 (proxmox2, error)
service ct:103 (proxmox2, error)
service ct:104 (proxmox1, error)
service ct:105 (proxmox1, error)
service ct:106 (proxmox2, error)
service ct:107 (proxmox2, error)
service ct:108 (proxmox1, error)
service ct:109 (proxmox2, error)
service vm:100 (proxmox2, error)

Once you’ve fixed your issue, you can do this on each node:

for worker in $(ha-manager status | grep "($(hostnamectl hostname), error)" | cut -d\  -f2)
do
  echo "Disabling $worker"
  ha-manager set $worker --state disabled
  until ha-manager status | grep "$worker" | grep -q disabled ; do sleep 1 ; done
  echo "Restarting $worker"
  ha-manager set $worker --state started
  until ha-manager status | grep "$worker" | grep -q started ; do sleep 1 ; done
done

Note that this hasn’t been tested, but a scan over it with those nodes working suggests it should. I guess I’ll be updating this the next time I get a brown-out!

A Quick Guide to setting up Traefik on a single Docker node inside your home network

2023-12-302023-12-30 JonTheNiceGuy Leave a comment

I have a small server running Docker for services at home. There are several services which will want to use HTTP, but I can’t have them all sharing the same port without a reverse proxy to manage how to route the traffic to the containers!

This is my guide to how I got Traefik set up to serve HTTP and HTTPS traffic.

The existing setup for one service

Currently, I have phpIPAM which has the following docker-compose.yml file:

version: '3'

services:
  web:
    image: phpipam/phpipam-www:latest
    ports:
      - "80:80"
    cap_add:
      - NET_ADMIN
      - NET_RAW
    environment:
      - TZ=Europe/London
      - IPAM_DATABASE_HOST=db
      - IPAM_DATABASE_USER=someuser
      - IPAM_DATABASE_PASS=somepassword
      - IPAM_DATABASE_WEBHOST=%
    restart: unless-stopped
    volumes:
      - phpipam-logo:/phpipam/css/images/logo
      - phpipam-ca:/usr/local/share/ca-certificates:ro
    depends_on:
      - db

  cron:
    image: phpipam/phpipam-cron:latest
    cap_add:
      - NET_ADMIN
      - NET_RAW
    environment:
      - TZ=Europe/London
      - IPAM_DATABASE_HOST=db
      - IPAM_DATABASE_USER=someuser
      - IPAM_DATABASE_PASS=somepassword
      - SCAN_INTERVAL=1h
    restart: unless-stopped
    volumes:
      - phpipam-ca:/usr/local/share/ca-certificates:ro
    depends_on:
      - db

  db:
    image: mariadb:latest
    environment:
      - MYSQL_USER=someuser
      - MYSQL_PASSWORD=somepassword
      - MYSQL_RANDOM_ROOT_PASSWORD=yes
      - MYSQL_DATABASE=phpipam
    restart: unless-stopped
    volumes:
      - phpipam-db-data:/var/lib/mysql

volumes:
  phpipam-db-data:
  phpipam-logo:
  phpipam-ca:

The moment I want to bind another service to TCP/80, I get an error because we’ve already used TCP/80 for phpIPAM. Enter Traefik. Let’s stop the docker container with docker compose down and build our Traefik setup.

Traefik Setup

I always store my docker compose files in /opt/docker/<servicename>, so let’s create a directory for traefik; sudo mkdir -p /opt/docker/traefik

The (“dynamic”) configuration file

Next we need to create a configuration file called traefik.yaml

# Ensure all logs are sent to stdout for `docker compose logs`
accessLog: {}
log: {}

# Enable docker provider but don't switch it on by default
providers:
  docker:
    exposedByDefault: false
    # Select this as the docker network to connect from traefik to containers
    # This is defined in the docker-compose.yaml file
    network: web

# Enable the API and Dashboard on TCP/8080
api:
  dashboard: true
  insecure: true
  debug: true

# Listen on both HTTP and HTTPS
entryPoints:
  http:
    address: ":80"
    http: {}
  https:
    address: ":443"
    http:
      tls: {}

With the configuration file like this, we’ll serve HTTPS traffic with a self-signed TLS certificate on TCP/443 and plain HTTP on TCP/80. We have a dashboard on TCP/8080 served over HTTP, so make sure you don’t expose *that* to the public internet!

The Docker-Compose File

Next we need the docker-compose file for Traefik, so let’s create docker-compose.yaml

version: '3'

networks:
  web:
    name: web
    attachable: true

services:
  traefik:
    image: traefik:latest
    ports:
      - "8080:8080"
      - "443:443"
      - "80:80"
    networks:
      - web
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./traefik.yaml:/etc/traefik/traefik.yaml
    restart: always

There are a few parts here which aren’t spelled out on the Traefik quickstart! Firstly, if you don’t define a network, it’ll create one using the docker-compose file path, so probably traefik_traefik or traefik_default, which is not what we want! So, we’ll create one called “web” (but you can call it whatever you want. On other deployments, I’ve used the name “traefik” but I found it tedious to remember how to spell that each time). This network needs to be “attachable” so that other containers can use it later.

You then attach that network to the traefik service, and expose the ports we need (80, 443 and 8080).

And then start the container with docker compose up -d

alpine-docker:/opt/docker/traefik# docker compose up -d
[+] Running 2/2
 ✔ Network web                  Created   0.2s 
 ✔ Container traefik-traefik-1  Started   1.7s 
alpine-docker:/opt/docker/traefik#

Adding Traefik to phpIPAM

Going back to phpIPAM, So that Traefik can reach the containers, and so that the container can reach it’s database, we need two network statements now; the first is the “external” network for the traefik connection which we called “web“. The second is the inter-container network so that the “web” service can reach the “db” service, and so that the “cron” service can reach the “db” service. So we need to add that to the start of /opt/docker/phpipam/docker-compose.yaml, like this;

networks:
  web:
    name: web
    external: true
    attachable: true
  ipam:
    name: ipam

We then need to add both networks that to the “web” container, like this:

services:
  web:
    image: phpipam/phpipam-www:latest
    networks:
      - ipam
      - web
# ...... and the rest of the config

Remove the “ports” block and replace it with an expose block like this:

services:
  web:
# ...... The rest of the config for this service
    ## Don't bind to port 80 - we use traefik now
    # ports:
    #   - "80:80"
    ## Do expose port 80 for Traefik to use 
    expose:
      - 80
# ...... and the rest of the config

And just the inter-container network to the “cron” and “db” containers, like this:

  cron:
    image: phpipam/phpipam-cron:latest
    networks:
      - ipam
# ...... and the rest of the config

  db:
    image: mariadb:latest
    networks:
      - ipam
# ...... and the rest of the config

There’s one other set of changes we need to make in the “web” service, which are to enable Traefik to know that this is a container to look at, and to work out what traffic to send to it, and that’s to add labels, like this:

services:
  web:
# ...... The rest of the config for this service
    labels:
      - traefik.enable=true
      - traefik.http.routers.phpipam.rule=Host(`phpipam.homenet`)
# ...... and the rest of the config

Right, now we run docker compose up -d

alpine-docker:/opt/docker/phpipam# docker compose up -d
[+] Running 4/4
 ✔ Network ipam              Created   0.4s 
 ✔ Container phpipam-db-1    Started   1.4s 
 ✔ Container phpipam-cron-1  Started   2.1s 
 ✔ Container phpipam-web-1   Started   2.6s 
alpine-docker:/opt/docker/phpipam#

If you notice, this doesn’t show to the web network being created (because it was already created by Traefik) but does bring up the container.

Checking to make sure it’s working

A screenshot of the traefik dashboard showing the phpipam service added.

If we head to the Traefik dashboard (http://your-docker-server:8080) you’ll see the phpipam service identified there… yey!

Better TLS with Lets Encrypt

So, at home I actually have a DNS suffix that is a real DNS name. For the sake of the rest of this documentation, assume it’s homenet.sprig.gs (but it isn’t 😁).

This DNS space is hosted by Digital Ocean, so I can use a DNS Challenge with Lets Encrypt to provide hostnames which are not publically accessible. If you’re hosting with someone else, then that’s probably also available – check the Traefik documentation for your specific variables. The table on that page (as of 2023-12-30) shows the environment variables you need to pass to Traefik to get LetsEncrypt working.

A screen capture of the table on the Traefik website, showing the environment variables needed to use the Lets Encrypt DNS challenge with Digital Ocean

As you can see here, I just need to add the value DO_AUTH_TOKEN, which is an API key. I went to the Digital Ocean console, and navigated to the API panel, and added a new “Personal Access Token”, like this:

Screen capture of part of the Digital Ocean console showing the personal access token, showing I needed "read" and "write" capabilities.

Notice that the API key needed to provide both “Read” and “Write” capabilities, and has been given a name so I can clearly see it’s purpose.

Changing the traefik `docker-compose.yaml` file

In /opt/docker/traefik/docker-compose.yaml we need to add that new environment variable; DO_AUTH_TOKEN, like this:

services:
  traefik:
# ...... The rest of the config for this service
    environment:
      DO_AUTH_TOKEN: dop_v1_decafbad1234567890abcdef....1234567890
# ...... and the rest of the config

Changing the `traefik.yaml` file

In /opt/docker/traefik/traefik.yaml we need to tell it to use Let’s Encrypt. Add this block to the end of the file:

certificatesResolvers:
  letsencrypt:
    acme:
      email: yourname@example.org
      storage: acme.json
      dnsChallenge:
        provider: digitalocean
        delayBeforeCheck: 1 # Minutes
        resolvers:
          - "1.1.1.1:53"
          - "8.8.8.8:53"

Obviously change the email address to a valid one for you! I hit a few issues with the value specified in the documentation for delayBeforeCheck, as their value of “0” wasn’t long enough for the DNS value to be propogated around the network – 1 minute is enough though!

I also had to add the resolvers, as my local network has a caching DNS server, so I’d never have seen the updates! You may be able to remove both those values from your files.

Now you’ve made all the changes to the Traefik service, restart it with docker compose down ; docker compose up -d

Changing the services to use Lets Encrypt

We need to add one final label to the /opt/docker/phpipam/docker-compose.yaml file, which is this one:

services:
  web:
# ...... The rest of the config for this service
    labels:
      - traefik.http.routers.phpipam.tls.certresolver=letsencrypt
# ...... and the rest of the config

Also, update your .rule=Host(`hostname`) to use the actual DNS name you want to be able to use, then restart the docker container.

phpIPAM doesn’t like trusting proxies, unless explicitly told to, so I also had add an environment variable IPAM_TRUST_X_FORWARDED=true to the /opt/docker/phpipam/docker-compose.yaml file too, because phpIPAM tried to write the HTTP scheme for any links which came up, based on what protocol it thought it was running – not what the proxy was telling it it was being accessed as!

Debugging any issues

If you have it all setup as per the above, and it isn’t working, go into /opt/docker/traefik/traefik.yaml and change the stanza which says log: {} to:

log:
  level: DEBUG

Be aware though, this adds a LOT to your logs! (But you won’t see why your ACME requests have failed without it). Change it back to log: {} once you have it working again.

Adding your next service

I now want to add that second service to my home network – WordPress. Here’s /opt/docker/wordpress/docker-compose.yaml for that service;

version: '3.7'

networks:
  web:
    name: web 
    external: true
    attachable: true
  wordpress:
    name: wordpress

services:
  php:
    image: wordpress:latest
    expose:
      - 80
    environment:
      - WORDPRESS_DB_HOST=mariadb
      - WORDPRESS_DB_USER=db_user
      - WORDPRESS_DB_PASSWORD=db_pass
      - WORDPRESS_DB_NAME=wordpress
    volumes:
      - wordpress:/var/www/html
    labels:
      - traefik.enable=true
      - traefik.http.routers.wordpress.rule=Host(`wp.homenet.sprig.gs`)
      - traefik.http.routers.wordpress.tls.certresolver=letsencrypt
    depends_on:
      - mariadb
    networks:
      - wordpress
      - web 

  mariadb:
    image: mariadb:10.3
    environment:
      MYSQL_ROOT_PASSWORD: True
      MYSQL_USER: db_user
      MYSQL_PASSWORD: db_pass
      MYSQL_DATABASE: wordpress
    volumes:
      - db:/var/lib/mysql
    networks:
      - wordpress

volumes:
  wordpress:
  db:

And then we start it up;

alpine-docker:/opt/docker/wordpress# docker compose up -d
[+] Running 3/3
 ✔ Network wordpress              Created   0.2s 
 ✔ Container wordpress-mariadb-1  Started   3.0s 
 ✔ Container wordpress-php-1      Started   3.8s 
alpine-docker:/opt/docker/wordpress#

Tada!

One final comment – I never did work out how to make connections forceably upgrade from HTTP to HTTPS, so instead, I shut down port 80 in Traefik, and instead run this container.

Featured image is “Traffic” by “Make Lemons” on Flickr and is released under a CC-BY-SA license.

A text dialogue from a web page showing "Uh oh. Something really just went wrong. Good thing we know about it and have our crack team of squirrels getting their nuts out of the system!"

How to capture stdout and stderr from a command in a shellscript without preventing piped processes from seeing them

2023-07-272023-07-27 JonTheNiceGuy Leave a comment

I love the tee command – it captures stdout [1] and puts it in a file, while then returning that output to stdout for the next process in a pipe to consume, for example:

$ ls -l | tee /tmp/output
total 1
xrwxrwxrw 1 jonspriggs jonspriggs 0 Jul 27 11:16 build.sh
$ cat /tmp/output
total 1
xrwxrwxrw 1 jonspriggs jonspriggs 0 Jul 27 11:16 build.sh

But wait, why is that useful? Well, in a script, you don’t always want to see the content scrolling past, but in the case of a problem, you might need to catch up with the logs afterwards. Alternatively, you might do something like this:

if some_process | tee /tmp/output | grep -q "some text"
then
  echo "Found 'some text' - full output:"
  cat /tmp/output
fi

This works great for stdout but what about stderr [2]? In this case you could just do:

some_process 2>&1 | tee /tmp/output

But that mashes all of stdout and stderr into the same blob.

In my case, I want to capture all the output (stdout and stderr) of a given process into a file. Only stdout is forwarded to the next process, but I still wanted to have the option to see stderr as well during processing. Enter process substitution.

TEMP_DATA_PATH="$(mktemp -d)"
capture_out() {
  base="${TEMP_DATA_PATH}/${1}"
  mkdir "${base}"
  shift
  "$@" 2> >(tee "${base}/stderr" >&2) 1> >(tee "${base}/stdout")
}

With this, I run capture_out step-1 do_a_thing and then in /tmp/tmp.sometext/step-1/stdout and /tmp/tmp.sometext/step-1/stderr are the full outputs I need… but wait, I can also do:

$ capture_out step-1 do_a_thing | \
  capture_out step-2 process --the --thing && \
  capture_out step-3 echo "..." | capture_out step-4 profit
$ find /tmp/tmp.sometext -type f
/tmp/tmp.sometext/step-1/stdout
/tmp/tmp.sometext/step-1/stderr
/tmp/tmp.sometext/step-2/stdout
/tmp/tmp.sometext/step-2/stderr
/tmp/tmp.sometext/step-4/stdout
/tmp/tmp.sometext/step-4/stderr
/tmp/tmp.sometext/step-3/stderr
/tmp/tmp.sometext/step-3/stdout

if capture_out has_an_error something-wrong | capture_out handler check_output
then
  echo "It all went great"
else
  echo "Process failure"
  echo "--Initial process"
  # Use wc -c to check the number of characters in the file
  if [ -e "${TEMP_DATA_PATH}/has_an_error/stdout"] && [ 0 -ne "$(wc -c "${TEMP_DATA_PATH}/has_an_error/stdout")" ]
  then
    echo "----stdout:"
    cat "${TEMP_DATA_PATH}/has_an_error/stdout"
  fi
  if [ -e "${TEMP_DATA_PATH}/has_an_error/stderr"] && [ 0 -ne "$(wc -c "${TEMP_DATA_PATH}/has_an_error/stderr")" ]
  then
    echo "----stderr:"
    cat "${TEMP_DATA_PATH}/has_an_error/stderr"
  fi
  echo "--Second stage"
  if [ -e "${TEMP_DATA_PATH}/handler/stdout"] && [ 0 -ne "$(wc -c "${TEMP_DATA_PATH}/handler/stdout")" ]
  then
    echo "----stdout:"
    cat "${TEMP_DATA_PATH}/handler/stdout"
  fi
  if [ -e "${TEMP_DATA_PATH}/handler/stderr"] && [ 0 -ne "$(wc -c "${TEMP_DATA_PATH}/handler/stderr")" ]
  then
    echo "----stderr:"
    cat "${TEMP_DATA_PATH}/handler/stderr"
  fi
fi

This has become part of my normal toolkit now for logging processes. Thanks bash!

Also, thanks to ChatGPT for helping me find this structure that I’d seen before, but couldn’t remember how to do it! (it almost got it right too! Remember kids, don’t *trust* what ChatGPT gives you, use it as a research starting point, test *that* against your own knowledge, test *that* against your environment and test *that* against expected error cases too! Copy & Paste is not the best idea with AI generated code!)

Footnotes

[1] stdout is the name of the normal output text we see in a shell, it’s also sometimes referred to as “file descriptor 1” or “fd1”. You can also output to &1 with >&1 which means “send to fd1”

[2] stderr is the name of the output in a shell when an error occurs. It isn’t caught by things like some_process > /dev/null which makes it useful when you don’t want to see output, just errors. Like stdout, it’s also referred to as “file descriptor 2” or “fd2” and you can output to &2 with >&2 if you want to send stdout to stderr.

Featured image is “WordPress Error” by “tara hunt” on Flickr and is released under a CC-BY-SA license.

A photo of a door with the focus on the handle which has a lock in the centre of the knob. The lock has a key in it, with a bunch of keys dangling from the central ring.

Using direnv with terraform, terragrunt, saml2aws, SOPS and AWS KMS

2023-06-262023-07-03 JonTheNiceGuy Leave a comment

In my current project I am often working with Infrastructure as Code (IoC) in the form of Terraform and Terragrunt files. Before I joined the team a decision was made to use SOPS from Mozilla, and this is encrypted with an AWS KMS key. You can only access specific roles using the SAML2AWS credentials, and I won’t be explaining how to set that part up, as that is highly dependant on your SAML provider.

While much of our environment uses AWS, we do have a small presence hosted on-prem, using a hypervisor service. I’ll demonstrate this with Proxmox, as this is something that I also use personally :)

Firstly, make sure you have all of the above tools installed! For one stage, you’ll also require yq to be installed. Ensure you’ve got your shell hook setup for direnv as we’ll need this later too.

Late edit 2023-07-03: There was a bug in v0.22.0 of the terraform which didn’t recognise the environment variables prefixed PROXMOX_VE_ – a workaround by using TF_VAR_PROXMOX_VE and a variable "PROXMOX_VE_" {} block in the Terraform code was put in place for the inital publication of this post. The bug was fixed in 0.23.0 which this post now uses instead, and so as a result the use of TF_VAR_ prefixed variables was removed too.

Set up AWS Vault

AWS KMS

AWS Key Management Service (KMS) is a service which generates and makes available encryption keys, backed by the AWS service. There are *lots* of ways to cut that particular cake, but let’s do this a quick and easy way… terraform

variable "name" {
  default = "SOPS"
  type    = string
}
resource "aws_kms_key" "this" {
  tags                     = {
    Name : var.name,
    Owner : "Admins"
  }
  key_usage                = "ENCRYPT_DECRYPT"
  customer_master_key_spec = "SYMMETRIC_DEFAULT"
  deletion_window_in_days  = 30
  is_enabled               = true
  enable_key_rotation      = false
  policy                   = <<EOF
{
  "Version": "2012-10-17",
  "Id": "key-default-1",
  "Statement": [
    {
      "Sid": "Root Access",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::${get_aws_account_id()}:root"
      },
      "Action": "kms:*",
      "Resource": "*"
    },
    {
      "Sid": "Estate Admin Access",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::${get_aws_account_id()}:role/estateadmins"
      },
      "Action": [
        "kms:Describe*",
        "kms:List*",
        "kms:Get*",
        "kms:Encrypt*"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

resource "aws_kms_alias" "this" {
  target_key_id = aws_kms_key.this.key_id
  name          = "alias/${var.name}"
}

output "key" {
  value = aws_kms_alias.this.arn
}

After running this, let’s assume that we get an output for the “key” value of:

arn:aws:kms:us-east-1:123456789012:alias/main

Setup Sops

In your terragrunt tree, create a file called .sops.yaml, which contains:

---
creation_rules:
  - kms: arn:aws:kms:us-east-1:123456789012:alias/main

And a file called secrets.enc.yaml which contains:

---
PROXMOX_VE_USERNAME: root@pam
PROXMOX_VE_PASSWORD: deadb33f@2023

Test that your KMS works by assuming your IAM role via SAML2AWS like this:

$ saml2aws login --skip-prompt --quiet
$ saml2aws exec -- sops --verbose --encrypt --in-place secrets.enc.yaml
[AWSKMS]	 INFO[0000] Encryption succeeded                          arn="arn:aws:kms:us-east-1:123456789012:alias/main"
[CMD]		 INFO[0000] File written successfully

Setup direnv

Outside your tree, in ~/.config/direnv/lib create a file called use_sops.sh (does not need to be chmod +x or chmod 755!) containing this:

# Based on https://github.com/direnv/direnv/wiki/Sops
use_sops() {
    local path=${1:-$PWD/secrets.enc.yaml}
    if [ -e "$path" ]
    then
        if grep -q -E '^sops:' "$path"
        then
            eval "$(sops --decrypt --output-type dotenv "$path" 2>/dev/null | direnv dotenv bash /dev/stdin || false)"
        else
            if [ -n "$(command -v yq)" ]
            then
                eval "$(yq eval --output-format props "$path" | direnv dotenv bash /dev/stdin)"
                export SOPS_WARNING="unencrypted $path"
            fi
        fi
    fi
    watch_file "$path"
}

There are two key lines here, the first of which is:

eval "$(sops -d --output-type dotenv "$path" 2>/dev/null | direnv dotenv bash /dev/stdin || false)"

This line asks sops to decrypt the secrets file, using the “dotenv” output type, however, the dotenv format looks like this:

some_key = "some value"

So, as a result, we then pass that value to direnv and ask it to rewrite it in the format it expects, which looks like this:

export some_key="some value"

The second key line is this:

eval "$(yq eval --output-format props "$path" | direnv dotenv bash /dev/stdin)"

This asks yq to parse the secrets file, using the “props” formatter, which results in lines just like the dotenv output we saw above.

However, because we used yq to parse the file, it means that we know this file isn’t encrypted, so we also add an extra export value:

export SOPS_WARNING="unencrypted $path"

This can be picked up as part of your shell prompt to put a warning in! Anyway… let’s move on.

Now that you have your reusable library file, we now configure the direnv file, .envrc for the root of your proxmox cluster:

use sops

Oh, ok, that was simple. You can add several files here if you wish, like this:

use sops file1.enc.yaml
use sops file2.enc.yml
use sops ~/.core_sops

But, we don’t need that right now!

Open your shell in that window, and you’ll get this warning:

direnv: error /path/to/demo/.envrc is blocked. Run `direnv allow` to approve its content

So, let’s do that!

$ direnv allow
direnv: loading /path/to/demo/.envrc
direnv: using sops
direnv: export +PROXMOX_VE_USERNAME +PROXMOX_VE_PASSWORD
$

So far, so good… but wait, you’ve authenticated to your SAML access to AWS. Let’s close that shell, and go back in again

$ cd /path/to/demo
direnv: loading /path/to/demo/.envrc
direnv: using sops
$

Ah, now we don’t have our values exported. That’s what we wanted!

What now?!

Configuring the details of the proxmox cluster

We have our .envrc file which provides our credentials (let’s pretend we’re using a shared set of credentials across all the boxes), but now we need to setup access to each of the boxes.

Let’s make our two cluster directories;

mkdir cluster_01
mkdir cluster_02

And in each of these clusters, we need to put an .envrc file with the right IP address in. This needs to check up the tree for any credentials we may have already loaded:

source_env "$(find_up ../.envrc)"
export PROXMOX_VE_ENDPOINT="https://192.0.2.1:8006" # Documentation IP address for the first cluster - change for the second cluster.

The first line works up the tree, looking for a parent .envrc file to inject, and then, with the second line, adds the Proxmox API endpoint to the end of that chain. When we run direnv allow (having logged back into our saml2aws session), we get this:

$ direnv allow
direnv: loading /path/to/demo/cluster_01/.envrc
direnv: loading /path/to/demo/.envrc
direnv: using sops
direnv: export +PROXMOX_VE_ENDPOINT +PROXMOX_VE_USERNAME +PROXMOX_VE_PASSWORD
$

Great, now we can setup the connection to the cluster in the terragrunt file!

Set up Terragrunt

In /path/to/demo/terragrunt.hcl put this:

remote_state {
  backend = "s3"
  config  = {
    encrypt                = true
    bucket                 = "example-inc-terraform-state"
    key                    = "${path_relative_to_include()}/terraform.tfstate"
    region                 = "us-east-1"
    dynamodb_table         = "example-inc-terraform-state-lock"
    skip_bucket_versioning = false
  }
}
generate "providers" {
  path      = "providers.tf"
  if_exists = "overwrite"
  contents  = <<EOF
terraform {
  required_providers {
    proxmox = {
      source = "bpg/proxmox"
      version = "0.23.0"
    }
  }
}

provider "proxmox" {
  insecure = true
}
EOF
}

Then in the cluster_01 directory, create a directory for the code you want to run (e.g. create a VLAN might be called “VLANs/30/“) and put in it this terragrunt.hcl

terraform {
  source = "${get_terragrunt_dir()}/../../../terraform-module-network//vlan"
  # source = "git@github.com:YourProject/terraform-module-network//vlan?ref=production"
}

include {
  path = find_in_parent_folders()
}

inputs = {
  vlan_tag    = 30
  description = "VLAN30"
}

This assumes you have a terraform directory called terraform-module-network/vlan in a particular place in your tree or even better, a module in your git repo, which uses the input values you’ve provided.

That double slash in the source line isn’t a typo either – this is the point in that tree that Terragrunt will copy into the directory to run terraform from too.

A quick note about includes and provider blocks

The other key thing is that the “include” block loads the values from the first matching terragrunt.hcl file in the parent directories, which in this case is the one which defined the providers block. You can’t include multiple different parent files, and you can’t have multiple generate blocks either.

Running it all together!

Now we have all our depending files, let’s run it!

user@host:~$ cd test
direnv: loading ~/test/.envrc
direnv: using sops
user@host:~/test$ saml2aws login --skip-prompt --quiet ; saml2aws exec -- bash
direnv: loading ~/test/.envrc
direnv: using sops
direnv: export +PROXMOX_VE_USERNAME +PROXMOX_VE_PASSWORD
user@host:~/test$ cd cluster_01/VLANs/30
direnv: loading ~/test/cluster_01/.envrc
direnv: loading ~/test/.envrc
direnv: using sops
direnv: export +PROXMOX_VE_ENDPOINT +PROXMOX_VE_USERNAME +PROXMOX_VE_PASSWORD
user@host:~/test/cluster_01/VLANs/30$ terragrunt apply
data.proxmox_virtual_environment_nodes.available_nodes: Reading...
data.proxmox_virtual_environment_nodes.available_nodes: Read complete after 0s [id=nodes]

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # proxmox_virtual_environment_network_linux_bridge.this[0] will be created
  + resource "proxmox_virtual_environment_network_linux_bridge" "this" {
      + autostart  = true
      + comment    = "VLAN30"
      + id         = (known after apply)
      + mtu        = (known after apply)
      + name       = "vmbr30"
      + node_name  = "proxmox01"
      + ports      = [
          + "enp3s0.30",
        ]
      + vlan_aware = (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes
proxmox_virtual_environment_network_linux_bridge.this[0]: Creating...
proxmox_virtual_environment_network_linux_bridge.this[0]: Creation complete after 2s [id=proxmox01:vmbr30]
user@host:~/test/cluster_01/VLANs/30$

Winning!!

Featured image is “2018/365/1 Home is Where The Key Fits” by “Alan Levine” on Flickr and is released under a CC-0 license.

Share this:

The scenario

Share this:

Incorrect routing for the 12th IP address onwards

What happens?

Fix the issue

Is this still needed?

Dropping packets to the containers due to Martians

What happens

Fix the issue

Is this still needed?

After 12 IP addresses are assigned to a node, Kubernetes services stop working for some pods

What happens

Fix the issue

Is this still needed?

Share this:

Share this:

Share this:

Share this:

Share this:

The existing setup for one service

Traefik Setup

The (“dynamic”) configuration file

The Docker-Compose File

Adding Traefik to phpIPAM

Checking to make sure it’s working

Better TLS with Lets Encrypt

Changing the traefik docker-compose.yaml file

Changing the traefik.yaml file

Changing the services to use Lets Encrypt

Debugging any issues

Adding your next service

Share this:

Footnotes

Share this:

Set up AWS Vault

AWS KMS

Setup Sops

Setup direnv

Configuring the details of the proxmox cluster

Set up Terragrunt

A quick note about includes and provider blocks

Running it all together!

Share this:

Changing the traefik `docker-compose.yaml` file

Changing the `traefik.yaml` file