"Apoptosis Network (alternate)" by "Simon Cockell" on Flickr

A few weird issues in the networking on our custom AWS EKS Workers, and how we worked around them

For “reasons”, at work we run AWS Elastic Kubernetes Service (EKS) with our own custom-built workers. These workers are based on Alma Linux 9, instead of AWS’ preferred Amazon Linux 2023. We manage the deployment of these workers using AWS Auto-Scaling Groups.

Our unusal configuration of these nodes mean that we sometimes trip over configurations which are tricky to get support on from AWS (no criticism of their support team, if I was in their position, I wouldn’t want to try to provide support for a customer’s configuration that was so far outside the recommended configuration either!)

Over the past year, we’ve upgraded EKS1.23 to EKS1.27 and then on to EKS1.31, and we’ve stumbled over a few issues on the way. Here are a couple of notes on the subject, in case they help anyone else in their journey.

All three of the issues below were addressed by running an additional service on the worker nodes in a Systemd timed service which triggers every minute.

Incorrect routing for the 12th IP address onwards

Something the team found really early on (around EKS 1.18 or somewhere around there) was that the AWS VPC-CNI wasn’t managing the routing tables on the node properly. We raised an issue on the AWS VPC CNI (we were on CentOS 7 at the time) and although AWS said they’d fixed the issue, we currently need to patch the routing tables every minute on our nodes.

What happens?

When you get past the number of IP addresses that a single ENI can have (typically ~12), the AWS VPC-CNI will attach a second interface to the worker, and start adding new IP addresses to that. The VPC-CNI should setup routing for that second interface, but for some reason, in our case, it doesn’t. You can see this happens because the traffic will come in on the second ENI, eth1, but then try to exit the node on the first ENI, eth0, with a tcpdump, like this:

[root@test-i-01234567890abcdef ~]# tcpdump -i any host 192.0.2.123
tcpdump: data link type LINUX_SLL2
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
09:38:07.331619 eth1  In  IP ip-192-168-1-100.eu-west-1.compute.internal.41856 > ip-192-0-2-123.eu-west-1.compute.internal.irdmi: Flags [S], seq 1128657991, win 64240, options [mss 1359,sackOK,TS val 2780916192 ecr 0,nop,wscale 7], length 0
09:38:07.331676 eni989c4ec4a56 Out IP ip-192-168-1-100.eu-west-1.compute.internal.41856 > ip-192-0-2-123.eu-west-1.compute.internal.irdmi: Flags [S], seq 1128657991, win 64240, options [mss 1359,sackOK,TS val 2780916192 ecr 0,nop,wscale 7], length 0
09:38:07.331696 eni989c4ec4a56 In  IP ip-192-0-2-123.eu-west-1.compute.internal.irdmi > ip-192-168-1-100.eu-west-1.compute.internal.41856: Flags [S.], seq 3367907264, ack 1128657992, win 26847, options [mss 8961,sackOK,TS val 1259768406 ecr 2780916192,nop,wscale 7], length 0
09:38:07.331702 eth0  Out IP ip-192-0-2-123.eu-west-1.compute.internal.irdmi > ip-192-168-1-100.eu-west-1.compute.internal.41856: Flags [S.], seq 3367907264, ack 1128657992, win 26847, options [mss 8961,sackOK,TS val 1259768406 ecr 2780916192,nop,wscale 7], length 0

The critical line here is the last one – it’s come in on eth1 and it’s going out of eth0. Another test here is to look at ip rule

[root@test-i-01234567890abcdef ~]# ip rule
0:	from all lookup local
512:	from all to 192.0.2.111 lookup main
512:	from all to 192.0.2.143 lookup main
512:	from all to 192.0.2.66 lookup main
512:	from all to 192.0.2.113 lookup main
512:	from all to 192.0.2.145 lookup main
512:	from all to 192.0.2.123 lookup main
512:	from all to 192.0.2.5 lookup main
512:	from all to 192.0.2.158 lookup main
512:	from all to 192.0.2.100 lookup main
512:	from all to 192.0.2.69 lookup main
512:	from all to 192.0.2.129 lookup main
1024:	from all fwmark 0x80/0x80 lookup main
1536:	from 192.0.2.123 lookup 2
32766:	from all lookup main
32767:	from all lookup default

Notice here that we have two entries from all to 192.0.2.123 lookup main and from 192.0.2.123 lookup 2. Let’s take a look at what lookup 2 gives us, in the routing table

[root@test-i-01234567890abcdef ~]# ip route show table 2
192.0.2.1 dev eth1 scope link

Fix the issue

This is pretty easy – we need to add a default route if one doesn’t already exist. Long before I got here, my boss created a script which first runs ip route show table main | grep default to get the gateway for that interface, then runs ip rule list, looks for each lookup <number> and finally runs ip route add to put the default route on that table, the same as on the main table.

ip route add default via "${GW}" dev "${INTERFACE}" table "${TABLE}"

Is this still needed?

I know when we upgraded our cluster from EKS1.23 to EKS1.27, this script was still needed. When I’ve just checked a worker running EKS1.31, after around 12 hours of running, and a second interface being up, it’s not been needed… so perhaps we can deprecate this script?

Dropping packets to the containers due to Martians

When we upgraded our cluster from EKS1.23 to EKS1.27 we also changed a lot of the infrastructure under the surface (AlmaLinux 9 from CentOS7, Containerd and Runc from Docker, CGroups v2 from CGroups v1, and so on). We also moved from using an AWS Elastic Load Balancer (ELB) or “Classic Load Balancer” to AWS Network Load Balancer (NLB).

Following the upgrade, we started seeing packets not arriving at our containers and the system logs on the node were showing a lot of martian source messages, particularly after we configured our NLB to forward original IP source addresses to the nodes.

What happens

One thing we noticed was that each time we added a new pod to the cluster, it added a new eni[0-9a-f]{11} interface, but the sysctl value for net.ipv4.conf.<interface>.rp_filter (return path filtering – basically, should we expect the traffic to be arriving at this interface for that source?) in sysctl was set to 1 or “Strict mode” where the source MUST be the coming from the best return path for the interface it arrived on. The AWS VPC-CNI is supposed to set this to 2 or “Loose mode” where the source must be reachable from any interface.

In this case you’d tell this because you’d see this in your system journal (assuming you’ve got net.ipv4.conf.all.log_martians=1 configured):

Dec 03 10:01:19 test-i-01234567890abcdef kernel: IPv4: martian source 192.168.1.100 from 192.0.2.123, on dev eth1

The net result is that packets would be dropped by the host at this point, and they’d never be received by the containers in the pods.

Fix the issue

This one is also pretty easy. We run sysctl -a and loop through any entries which match net.ipv4.conf.([^\.]+).rp_filter = (0|1) and then, if we find any, we run sysctl -w net.ipv4.conf.\1.rp_filter = 2 to set it to the correct value.

Is this still needed?

Yep, absolutely. As of our latest upgrade to EKS1.31, if this value isn’t set, then it will drop packets. VPC-CNI should be fixing this, but for some reason it doesn’t. And setting the conf.ipv4.all.rp_filter to 2 doesn’t seem to make a difference, which is contrary to the documentation in the relevant Kernel documentation.

After 12 IP addresses are assigned to a node, Kubernetes services stop working for some pods

This was pretty weird. When we upgraded to EKS1.31 on our smallest cluster we initially thought we had an issue with CoreDNS, in that it sometimes wouldn’t resolve IP addresses for services (DNS names for services inside the cluster are resolved by <servicename>.<namespace>.svc.cluster.local to an internal IP address for the cluster – in our case, in the range 172.20.0.0/16). We upgraded CoreDNS to the EKS1.31 recommended version, v1.11.3-eksbuild.2 and that seemed to fix things… until we upgraded our next largest cluster, and things REALLY went wrong, but only when we had increased to over 12 IP addresses assigned to the node.

You might see this as frequent restarts of a container, particularly if you’re reliant on another service to fulfil an init container or the liveness/readyness check.

What happens

EKS1.31 moves KubeProxy from iptables or ipvs mode to nftables – a shift we had to make internally as AlmaLinux 9 no longer supports iptables mode, and ipvs is often quite flaky, especially when you have a lot of pod movements.

With a single interface and up to 11 IP addresses assigned to that interface, everything runs fine, but the moment we move to that second interface, much like in the first case above, we start seeing those pods attached to the second+ interface being unable to resolve service addresses. On further investigation, doing a dig from a container inside that pod to the service address of the CoreDNS service 172.20.0.10 would timeout, but a dig against the actual pod address 192.0.2.53 would return a valid response.

Under the surface, on each worker, KubeProxy adds a rule to nftables to say “if you try and reach 172.20.0.10, please instead direct it to 192.0.2.53”. As the containers fluctuate inside the cluster, KubeProxy is constantly re-writing these rules. For whatever reason though, KubeProxy currently seems unable to determine that a second or subsequent interface has been added, and so these rules are not applied to the pods attached to that interface…. or at least, that’s what it looks like!

Fix the issue

In this case, we wrote a separate script which was also triggered every minute. This script looks to see if the interfaces have changed by running ip link and looking for any interfaces called eth[0-9]+ which have changed, and then if it has, it runs crictl pods (which lists all the running pods in Containerd), looks for the Pod ID of KubeProxy, and then runs crictl stopp <podID> [1] and crictl rmp <podID> [1] to stop and remove the pod, forcing kubelet to restart the KubeProxy on the node.

[1] Yes, they aren’t typos, stopp means “stop the pod” and rmp means “remove the pod”, and these are different to stop and rm which relate to the container.

Is this still needed?

As this was what I was working on all-day yesterday, yep, I’d say so 😊 – in all seriousness though, if this hadn’t been a high-priority issue on the cluster, I might have tried to upgrade the AWS VPC-CNI and KubeProxy add-ons to a later version, to see if the issue was resolved, but at this time, we haven’t done that, so maybe I’ll issue a retraction later 😂

Featured image is “Apoptosis Network (alternate)” by “Simon Cockell” on Flickr and is released under a CC-BY license.

"Milestone, Otley" by "Tim Green" on Flickr

Changing the default routing metric with Netplan, NetworkManager and ifupdown

In the past few months I’ve been working on a project, and I’ve been doing the bulk of that work using Vagrant.

By default and convention, all Vagrant machines, set up using Virtualbox have a “NAT” interface defined as the first network interface, but I like to configure a second interface as a “Bridged” interface which gives the host a “Real” IP address on the network as this means that any security appliances I have on my network can see what device is causing what traffic, and I can quickly identify which hosts are misbehaving.

By default, Virtualbox uses the network 10.0.2.0/24 for the NAT interface, and runs a DHCP server for that interface. In the past, I’ve removed the default route which uses 10.0.2.2 (the IP address of the NAT interface on the host device), but with Ubuntu 20.04, this route keeps being re-injected, so I had to come up with a solution.

Fixing Netplan

Ubuntu, in at least 20.04, but (according to Wikipedia) probably since 17.10, has used Netplan to define network interfaces, superseding the earlier ifupdown package (which uses /etc/network/interfaces and /etc/network/interface.d/* files to define the network). Netplan is a kind of meta-script which, instructs systemd or NetworkManager to reconfigure the network interfaces, and so making the configuration changes here seemed most sensible.

Vagrant configures the file /etc/netplan/50-cloud-init.yml with a network configuration to support this DHCP interface, and then applies it. To fix it, we need to rewrite this file completely.

#!/bin/bash

# Find details about the interface
ifname="$(grep -A1 ethernets "/etc/netplan/50-cloud-init.yaml" | tail -n1 | sed -Ee 's/[ ]*//' | cut -d: -f1)"
match="$(grep macaddress "/etc/netplan/50-cloud-init.yaml" | sed -Ee 's/[ ]*//' | cut -d\  -f2)"

# Configure the netplan file
{
  echo "network:"
  echo "  ethernets:"
  echo "    ${ifname}:"
  echo "      dhcp4: true"
  echo "      dhcp4-overrides:"
  echo "        route-metric: 250"
  echo "      match:"
  echo "        macaddress: ${match}"
  echo "      set-name: ${ifname}"
  echo "  version: 2"
} >/etc/netplan/50-cloud-init.yaml

# Apply the config
netplan apply

When I then came to a box running Fedora, I had a similar issue, except now I don’t have NetPlan to work with? How do I resolve this one?!

Actually, this is a four line script!

#!/bin/bash

# Get the name of the interface which has the IP address 10.0.2.2
netname="$(ip route | grep 10.0.2.2 | head -n 1 | sed -Ee 's/^(.*dev )(.*)$/\2/;s/proto [A-Za-z0-9]+//;s/metric [0-9]+//;s/[ \t]+$//')"

# Ask NetworkManager for a list of all the active connections, look for the string "eth0" and then just get the connection name.
nm="$(nmcli connection show --active | grep "${netname}" | sed -Ee 's/^(.*)([ \t][-0-9a-f]{36})(.*)$/\1/;s/[\t ]+$//g')"
# Set the network to have a metric of 250
nmcli connection modify "$nm" ipv4.route-metric 250
# And then re-apply the network config
nmcli connection up "$nm"

The last major interface management tool I’ve experienced on standard server Linux is “ifupdown” – /etc/network/interfaces. This is mostly used on Debian. How do we fix that one? Well, that’s a bit more tricky!

#!/bin/bash

# Get the name of the interface with the IP address 10.0.2.2
netname="$(ip route | grep 10.0.2.2 | head -n 1 | sed -Ee 's/^(.*dev )(.*)$/\2/;s/proto [A-Za-z0-9]+//;s/metric [0-9]+//;s/[ \t]+$//')"

# Create a new /etc/network/interfaces file which just looks in "interfaces.d"
echo "source /etc/network/interfaces.d/*" > /etc/network/interfaces

# Create the loopback interface file
{
  echo "auto lo"
  echo "iface lo inet loopback"
} > "/etc/network/interfaces.d/lo"
# Bounce the interface
ifdown lo ; ifup lo

# Create the first "real" interface file
{
  echo "allow-hotplug ${netname}"
  echo "iface ${netname} inet dhcp"
  echo "  metric 1000"
} > "/etc/network/interfaces.d/${netname}"
# Bounce the interface
ifdown "${netname}" ; ifup "${netname}"

# Loop through the rest of the interfaces
ip link | grep UP | grep -v lo | grep -v "${netname}" | cut -d: -f2 | sed -Ee 's/[ \t]+([A-Za-z0-9.]+)[ \t]*/\1/' | while IFS= read -r int
do
  # Create the interface file for this interface, assuming DHCP
  {
    echo "allow-hotplug ${int}"
    echo "iface ${int} inet dhcp"
  } > "/etc/network/interfaces.d/${int}"
  # Bounce the interface
  ifdown "${int}" ; ifup "${int}"
done

Looking for one consistent script which does this all?

#!/bin/bash
# This script ensures that the metric of the first "NAT" interface is set to 1000,
# while resetting the rest of the interfaces to "whatever" the DHCP server offers.

function netname() {
  ip route | grep 10.0.2.2 | head -n 1 | sed -Ee 's/^(.*dev )(.*)$/\2/;s/proto [A-Za-z0-9]+//;s/metric [0-9]+//;s/[ \t]+$//'
}

if command -v netplan
then
  ################################################
  # NETPLAN
  ################################################

  # Find details about the interface
  ifname="$(grep -A1 ethernets "/etc/netplan/50-cloud-init.yaml" | tail -n1 | sed -Ee 's/[ ]*//' | cut -d: -f1)"
  match="$(grep macaddress "/etc/netplan/50-cloud-init.yaml" | sed -Ee 's/[ ]*//' | cut -d\  -f2)"

  # Configure the netplan file
  {
    echo "network:"
    echo "  ethernets:"
    echo "    ${ifname}:"
    echo "      dhcp4: true"
    echo "      dhcp4-overrides:"
    echo "        route-metric: 1000"
    echo "      match:"
    echo "        macaddress: ${match}"
    echo "      set-name: ${ifname}"
    echo "  version: 2"
  } >/etc/netplan/50-cloud-init.yaml

  # Apply the config
  netplan apply
elif command -v nmcli
then
  ################################################
  # NETWORKMANAGER
  ################################################

  # Ask NetworkManager for a list of all the active connections, look for the string "eth0" and then just get the connection name.
  nm="$(nmcli connection show --active | grep "$(netname)" | sed -Ee 's/^(.*)([ \t][-0-9a-f]{36})(.*)$/\1/;s/[\t ]+$//g')"
  # Set the network to have a metric of 250
  nmcli connection modify "$nm" ipv4.route-metric 1000
  nmcli connection modify "$nm" ipv6.route-metric 1000
  # And then re-apply the network config
  nmcli connection up "$nm"
elif command -v ifup
then
  ################################################
  # IFUPDOWN
  ################################################

  # Get the name of the interface with the IP address 10.0.2.2
  netname="$(netname)"
  # Create a new /etc/network/interfaces file which just looks in "interfaces.d"
  echo "source /etc/network/interfaces.d/*" > /etc/network/interfaces
  # Create the loopback interface file
  {
    echo "auto lo"
    echo "iface lo inet loopback"
  } > "/etc/network/interfaces.d/lo"
  # Bounce the interface
  ifdown lo ; ifup lo
  # Create the first "real" interface file
  {
    echo "allow-hotplug ${netname}"
    echo "iface ${netname} inet dhcp"
    echo "  metric 1000"
  } > "/etc/network/interfaces.d/${netname}"
  # Bounce the interface
  ifdown "${netname}" ; ifup "${netname}"
  # Loop through the rest of the interfaces
  ip link | grep UP | grep -v lo | grep -v "${netname}" | cut -d: -f2 | sed -Ee 's/[ \t]+([A-Za-z0-9.]+)[ \t]*/\1/' | while IFS= read -r int
  do
    # Create the interface file for this interface, assuming DHCP
    {
      echo "allow-hotplug ${int}"
      echo "iface ${int} inet dhcp"
    } > "/etc/network/interfaces.d/${int}"
    # Bounce the interface
    ifdown "${int}" ; ifup "${int}"
  done
fi

Featured image is “Milestone, Otley” by “Tim Green” on Flickr and is released under a CC-BY license.

"Router" by "Ryan Hodnett" on Flickr

Post-Config of a RaspberryPi Zero W as an OTG-USB Gadget that routes

In my last post in this series I mentioned that I’d got my Raspberry Pi Zero W to act as a USB Ethernet adaptor via libComposite, and that I was using DNSMasq to provide a DHCP service to the host computer (the one you plug the Pi into). In this part, I’m going to extend what local services I could provide on this device, and start to use this as a router.

Here’s what you missed last time… When you plug the RPi in (to receive power on the data line), it powers up the RPi Zero, and uses a kernel module called “libComposite” to turn the USB interface into an Ethernet adaptor. Because of how Windows and non-Windows devices handle network interfaces, we use two features of libComposite to create an ECM/CDC interface and a RNDIS interface, called usb0 and usb1, and whichever one of these two is natively supported in the OS, that’s which interface comes up. As a result, we can then use DNSMasq to “advertise” a DHCP address for each interface, and use that to advertise services on, like an SSH server.

By making this device into a router, we can use it to access the network, without using the in-built network adaptor (which might be useful if your in-built WiFi adaptors isn’t detected under Linux or Windows without a driver), or to protect your computer from malware (by adding a second firewall that doesn’t share the same network stack as it’s host), or perhaps to ensure that your traffic is sent over a VPN tunnel.

Read More
"raspberry pie" by "stu_spivack" on Flickr

Post-Config of a RaspberryPi Zero W as an OTG-USB Gadget for off-device computing

History

A few months ago, I was working on a personal project that needed a separate, offline linux environment. I tried various different schemes to run what I was doing in the confines of my laptop and I couldn’t make what I was working on actually achieve my goals. So… I bought a Raspberry Pi Zero W and a “Solderless Zero Dongle“, with the intention of running Docker containers on it… unfortunately, while Docker runs on a Pi Zero, it’s really hard to find base images for the ARMv6/armhf platform that the Pi Zero W… so I put it back in the drawer, and left it there.

Roll forwards a month or so, and I was doing some experiments with Nebula, and only had an old Chromebook to test it on… except, I couldn’t install the Nebula client for Linux on there, and the Android client wouldn’t give me some features I wanted… so I broke out that old Pi Zero W again…

Now, while the tests with Nebula I was working towards will be documented later, I found that a lot of the documentation about using a Raspberry Pi Zero as a USB gadget were rough and unexplained. So, this post breaks down much of the content of what I found, what I tried, and what did and didn’t work.

Late Edit 2021-06-04: I spotted some typos around providing specific DHCP options for interfaces, based on work I’m doing elsewhere with this script. I’ve updated these values accordingly. I’ve also created a specific branch for this revision.

Late Edit 2021-06-06: I’ve noticed this document doesn’t cover IPv6 at all right now. I started to perform some tweaks to cover IPv6, but as my ISP has decided not to bother with IPv6, and won’t support Hurricane Electric‘s Tunnelbroker system, I can’t test any of it, without building out an IPv6 test environment… maybe soon, eh?

Read More
"the home automation system designed by loren amelang himself" by "Nicolás Boullosa" on Flickr

One to read: Ansible for Networking – Part 3: Cisco IOS

One to read: “Ansible for Networking – Part 3: Cisco IOS”

One of the guest hosts and stalwart member of the Admin Admin Telegram group has been documenting how he has built his Ansible Networking lab.

Stuart has done three posts so far, but this is the first one actually dealing with the technology. It’s a mammoth read, so I’d recommend doing it on a computer, and not on a tablet or phone!

Posts one and two were about what the series would cover and how the lab has been constructed.

Featured image is “the home automation system designed by loren amelang himself” by “Nicolás Boullosa” on Flickr and is released under a CC-BY license.

"vieux port Marseille" by "Jeanne Menjoulet" on Flickr

Networking tricks with Multipass in Virtualbox on Windows (Bridged interfaces and Port Forwards)

TL;DR? Want to “just” bridge one or more interfaces to a Multipass instance when you’re using Virtualbox? See the Bridging Summary below. Want to do a port forward? See the Port Forward section below. You will need the psexec command and to execute this as an administrator. The use of these two may be considered a security incident on your computing environment, depending on how your security processes and infrastructure are defined and configured.

Ah Multipass. This is a tool created by Canonical to create a “A mini-cloud on your Mac or Windows workstation.” (from their website)…

I’ve often seen this endorsed as the tool of choice from Canonical employees to do “stuff” like run Kubernetes, develop tools for UBPorts (previously Ubuntu Touch) devices, and so on.

So far, it seems interesting. It’s a little bit like Vagrant with an in-built cloud-init Provisioner, and as I want to test out the cloud-init files I’m creating for AWS and Azure, that’d be so much easier than actually building the AWS or Azure machines, or finding a viable cloud-init plugin for Vagrant to test it out.

BUT… Multipass is really designed for Linux systems (running LibVirt), OS X (running HyperKit) and Windows (running Hyper-V). Even if I were using Windows 10 Pro on this machine, I use Virtualbox for “things” on my Windows Machine, and Hyper-V steals the VT-X bit, which means that VirtualBox can’t run x64 code…. Soooo I can’t use the Hyper-V mode.

Now, there is a “fix” for this. You can put Multipass into Virtualbox mode, which lets you run Multipass on Windows or OS X without using their designed-for hypervisor, but this has a downside, you see, VirtualBox doesn’t give MultiPass the same interface to route networking connections to the VM, and there’s currently no CLI or GUI options to say “bridge my network” or “forward a port” (in part because it needs to be portable to the native hypervisor options, apparently). So, I needed to fudge some things so I can get my beloved bridged connections.

I got to the point where I could do this, thanks to the responses to a few issues I raised on the Multipass Github issues, mostly #1333.

The first thing you need to install in Windows is PsExec, because Multipass runs it’s Virtual Machines as the SYSTEM account, and talking to SYSTEM account processes is nominally hard. Get PsExec from the SysInternals website. Some IT Security professionals will note the addition of PsExec as a potential security incident, but then again, they might also see the running of a virtual machine as a security incident too, as these aren’t controlled with a central image. Anyway… Just bear it in mind, and don’t shout at me if you get frogmarched in front of your CISO.

I’m guessing if you’re here, you’ve already installed Multipass, (but if not, and it seems interesting – it’s over at https://multipass.run. Get it and install it, then carry on…) and you’ve probably enabled the VirtualBox mode (if not – open a command prompt as administrator, and run “multipass set local.driver=virtualbox“). Now, you can start sorting out your bridges.

Sorting out bridges

First things first, you need to launch a virtual machine. I did, and it generated a name for my image.

C:\Users\JON>multipass launch
Launched: witty-kelpie

Fab! We have a running virtual machine, and you should be able to get a shell in there by running multipass shell "witty-kelpie" (the name of the machine it launched before). But, uh-oh. We have the “default” NAT interface of this device mapped, not a bridged interface.

C:\Users\JON>multipass shell "witty-kelpie"
Welcome to Ubuntu 18.04.3 LTS (GNU/Linux 4.15.0-76-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Thu Feb  6 10:56:38 GMT 2020

  System load:  0.3               Processes:             82
  Usage of /:   20.9% of 4.67GB   Users logged in:       0
  Memory usage: 11%               IP address for enp0s3: 10.0.2.15
  Swap usage:   0%


0 packages can be updated.
0 updates are security updates.


To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

ubuntu@witty-kelpie:~$

So, exit the machine, and issue a multipass stop "witty-kelpie" command to ask Virtualbox to shut it down.

So, this is where the fun[1] part begins.
[1] The “Fun” part here depends on how you view this specific set of circumstances 😉

We need to get the descriptions of all the interfaces we might want to bridge to this device. I have three interfaces on my machine – a WiFi interface, a Ethernet interface on my laptop, and an Ethernet interface on my USB3 dock. At some point in the past, I renamed these interfaces, so I’d recognise them in the list of interfaces, so they’re not just called “Connection #1”, “Connection #2” and so on… but you should recognise your interfaces.

To get this list of interfaces, open PowerShell (as a “user”), and run this command:

PS C:\Users\JON> Get-NetAdapter -Physical | format-list -property "Name","DriverDescription"

Name              : On-Board Network Connection
DriverDescription : Intel(R) Ethernet Connection I219-LM

Name              : Wi-Fi
DriverDescription : Intel(R) Dual Band Wireless-AC 8260

Name              : Dock Network Connection
DriverDescription : DisplayLink Network Adapter NCM

For reasons best known to the Oracle team, they use the “Driver Description” to identify the interfaces, not the name assigned to the device by the user, so, before we get started, find your interface, and note down the description for later. If you want to bridge “all” of them, make a note of all the interfaces in question, and in the order you want to attach them. Note that Virtualbox doesn’t really like exposing more than 8 NICs without changing the Chipset to ICH9 (but really… 9+ NICs? really??) and the first one is already consumed with the NAT interface you’re using to connect to it… so that gives you 7 bridgeable interfaces. Whee!

So, now you know what interfaces you want to bridge, let’s configure the Virtualbox side. Like I said before you need psexec. I’ve got psexec stored in my Downloads folder. You can only run psexec as administrator, so open up an Administrator command prompt or powershell session, and run your command.

Just for clarity, your commands are likely to have some different paths, so remember that wherever “your” PsExec64.exe command is located, mine is in C:\Users\JON\Downloads\sysinternals\PsExec64.exe, and wherever your vboxmanage.exe is located, mine is in C:\Program Files\Oracle\VirtualBox\vboxmanage.exe.

Here, I’m going to attach my dock port (“DisplayLink Network Adapter NCM”) to the second VirtualBox interface, the Wifi adaptor to the third interface and my locally connected interface to the fourth interface. Your interfaces WILL have different descriptions, and you’re likely not to need quite so many of them!

C:\WINDOWS\system32>C:\Users\JON\Downloads\sysinternals\PsExec64.exe -s "c:\program files\oracle\virtualbox\vboxmanage" modifyvm "witty-kelpie" --nic2 bridged --bridgeadapter2 "DisplayLink Network Adapter NCM" --nic3 bridged --bridgeadapter3 "Intel(R) Dual Band Wireless-AC 8260" --nic4 bridged --bridgeadapter4 "Intel(R) Ethernet Connection I219-LM"

PsExec v2.2 - Execute processes remotely
Copyright (C) 2001-2016 Mark Russinovich
Sysinternals - www.sysinternals.com

c:\program files\oracle\virtualbox\vboxmanage exited on MINILITH with error code 0.

An error code of 0 means that it completed successfuly and with no issues.

If you wanted to use a “Host Only” network (if you’re used to using Vagrant, you might know it as “Private” Networking), then change the NIC you’re interested in from --nicX bridged --bridgeadapterX "Some Description" to --nicX hostonly --hostonlyadapterX "VirtualBox Host-Only Ethernet Adapter" (where X is replaced with the NIC number you want to swap, ranged between 2 and 8, as 1 is the NAT interface you use to SSH into the virtual machine.)

Now we need to check to make sure the machine has it’s requisite number of interfaces. We use the showvminfo flag to the vboxmanage command. It produces a LOT of content, so I’ve manually filtered the lines I want, but you should spot it reasonably quickly.

C:\WINDOWS\system32>C:\Users\JON\Downloads\sysinternals\PsExec64.exe -s "c:\program files\oracle\virtualbox\vboxmanage" showvminfo "witty-kelpie"

PsExec v2.2 - Execute processes remotely
Copyright (C) 2001-2016 Mark Russinovich
Sysinternals - www.sysinternals.com


Name:                        witty-kelpie
Groups:                      /Multipass
Guest OS:                    Ubuntu (64-bit)
<SNIP SOME CONTENT>
NIC 1:                       MAC: 0800273CCED0, Attachment: NAT, Cable connected: on, Trace: off (file: none), Type: 82540EM, Reported speed: 0 Mbps, Boot priority: 0, Promisc Policy: deny, Bandwidth group: none
NIC 1 Settings:  MTU: 0, Socket (send: 64, receive: 64), TCP Window (send:64, receive: 64)
NIC 1 Rule(0):   name = ssh, protocol = tcp, host ip = , host port = 53507, guest ip = , guest port = 22
NIC 2:                       MAC: 080027303758, Attachment: Bridged Interface 'DisplayLink Network Adapter NCM', Cable connected: on, Trace: off (file: none), Type: 82540EM, Reported speed: 0 Mbps, Boot priority: 0, Promisc Policy: deny, Bandwidth group: none
NIC 3:                       MAC: 0800276EA174, Attachment: Bridged Interface 'Intel(R) Dual Band Wireless-AC 8260', Cable connected: on, Trace: off (file: none), Type: 82540EM, Reported speed: 0 Mbps, Boot priority: 0, Promisc Policy: deny, Bandwidth group: none
NIC 4:                       MAC: 080027042135, Attachment: Bridged Interface 'Intel(R) Ethernet Connection I219-LM', Cable connected: on, Trace: off (file: none), Type: 82540EM, Reported speed: 0 Mbps, Boot priority: 0, Promisc Policy: deny, Bandwidth group: none
NIC 5:                       disabled
NIC 6:                       disabled
NIC 7:                       disabled
NIC 8:                       disabled
<SNIP SOME CONTENT>

Configured memory balloon size: 0MB

c:\program files\oracle\virtualbox\vboxmanage exited on MINILITH with error code 0.

Fab! We now have working interfaces… But wait, let’s start that VM back up and see what happens.

C:\Users\JON>multipass shell "witty-kelpie"
Welcome to Ubuntu 18.04.3 LTS (GNU/Linux 4.15.0-76-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Thu Feb  6 11:31:08 GMT 2020

  System load:  0.1               Processes:             84
  Usage of /:   21.1% of 4.67GB   Users logged in:       0
  Memory usage: 11%               IP address for enp0s3: 10.0.2.15
  Swap usage:   0%


0 packages can be updated.
0 updates are security updates.


Last login: Thu Feb  6 10:56:45 2020 from 10.0.2.2
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

ubuntu@witty-kelpie:~$

Wait, what….. We’ve still only got the one interface up with an IP address… OK, let’s fix this!

As of Ubuntu 18.04, interfaces are managed using Netplan, and, well, when the VM was built, it didn’t know about any interface past the first one, so we need to get Netplan to get them enabled. Let’s check they’re detected by the VM, and see what they’re all called:

ubuntu@witty-kelpie:~$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 08:00:27:3c:ce:d0 brd ff:ff:ff:ff:ff:ff
3: enp0s8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 08:00:27:30:37:58 brd ff:ff:ff:ff:ff:ff
4: enp0s9: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 08:00:27:6e:a1:74 brd ff:ff:ff:ff:ff:ff
5: enp0s10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 08:00:27:04:21:35 brd ff:ff:ff:ff:ff:ff
ubuntu@witty-kelpie:~$ 

If you compare the link/ether lines to the output from showvminfo we executed before, you’ll see that the MAC address against enp0s3 matches the NAT interface, while enp0s8 matches the DisplayLink adapter, and so on… So we basically want to ask NetPlan to do a DHCP lookup for all the new interfaces we’ve added to it. If you’ve got 1 NAT and 7 physical interfaces (why oh why…) then you’d have enp0s8, 9, 10, 16, 17, 18 and 19 (I’ll come back to the random numbering in a tic)… so we now need to ask Netplan to do DHCP on all of those interfaces (assuming we’ll be asking for them all to come up!)

If we want to push that in, then we need to add a new file in /etc/netplan called something like 60-extra-interfaces.yaml, that should contain:

network:
  ethernets:
    enp0s8:
      optional: yes
      dhcp4: yes
      dhcp4-overrides:
        route-metric: 10
    enp0s9:
      optional: yes
      dhcp4: yes
      dhcp4-overrides:
        route-metric: 11
    enp0s10:
      optional: yes
      dhcp4: yes
      dhcp4-overrides:
        route-metric: 12
    enp0s16:
      optional: yes
      dhcp4: yes
      dhcp4-overrides:
        route-metric: 13
    enp0s17:
      optional: yes
      dhcp4: yes
      dhcp4-overrides:
        route-metric: 14
    enp0s18:
      optional: yes
      dhcp4: yes
      dhcp4-overrides:
        route-metric: 15
    enp0s19:
      optional: yes
      dhcp4: yes
      dhcp4-overrides:
        route-metric: 16

Going through this, we basically ask netplan not to assume the interfaces are attached. This stops the boot process for waiting for a timeout to configure each of the interfaces before proceeding, so it means your boot should be reasonably fast, particularly if you don’t always attach a network cable or join a Wifi network on all your interfaces!

We also say to assume we want IPv4 DHCP on each of those interfaces. I’ve done IPv4 only, as most people don’t use IPv6 at home, but if you are doing IPv6 as well, then you’d also need the same lines that start dhcp4 copied to show dhcp6 (like dhcp6: yes and dhcp6-overrides: route-metric: 10)

The eagle eyed of you might notice that the route metric increases for each extra interface. This is because realistically, if you have two interfaces connected (perhaps if you’ve got wifi enabled, and plug a network cable in), then you’re more likely to want to prioritize traffic going over the lower numbered interfaces than the higher number interfaces.

Once you’ve created this file, you need to run netplan apply or reboot your machine.

So, yehr, that gets you sorted on the interface front.

Bridging Summary

To review, you launch your machine with multipass launch, and immediately stop it with multipass stop "vm-name", then, as an admin, run psexec vboxmanage modifyvm "vm-name" --nic2 bridged --bridgedadapter2 "NIC description", and then start the machine with multipass start "vm-name". Lastly, ask the interface to do DHCP by manipulating your Netplan configuration.

Interface Names in VirtualBox

Just a quick note on the fact that the interface names aren’t called things like eth0 any more. A few years back, Ubuntu (amongst pretty much all of the Linux distribution vendors) changed from using eth0 style naming to what they call “Predictable Network Interface Names”. This derives the names from things like, what the BIOS provides for on-board interfaces, slot index numbers for PCI Express ports, and for this case, the “geographic location of the connector”. In Virtualbox, these interfaces are provided as the “Geographically” attached to “port 0” (so enp0 are all on port 0), but for some reason, they broadcast themselves as being attached to the port 0 at “slots” 3, 8, 9, 10, 16, 17, 18 and 19… hence enp0s3 and so on. shrug It just means that if you don’t have the interfaces coming up on the interfaces you’re expecting, you need to run ip link to confirm the MAC addresses match.

Port Forwarding

Unlike with the Bridging, we don’t need to power down the VM to add the extra interfaces, we just need to use psexec (as an admin again) to execute a vboxmanage command – in this case, it’s:

C:\WINDOWS\system32>C:\Users\JON\Downloads\sysinternals\PsExec64.exe -s "c:\program files\oracle\virtualbox\vboxmanage" controlvm "witty-kelpie" --natpf1 "myport,tcp,,1234,,2345"

OK, that’s a bit more obscure. Basically it says “Create a NAT rule on NIC 1 called ‘myport’ to forward TCP connections from port 1234 attached to any IP associated to the host OS to port 2345 attached to the DHCP supplied IP on the guest OS”.

If we wanted to run a DNS server in our VM, we could run multiple NAT rules in the same command, like this:

C:\WINDOWS\system32>C:\Users\JON\Downloads\sysinternals\PsExec64.exe -s "c:\program files\oracle\virtualbox\vboxmanage" controlvm "witty-kelpie" --natpf1 "TCP DNS,tcp,127.0.0.1,53,,53" --natpf1 "UDP DNS,udp,127.0.0.1,53,,53"

If we then decide we don’t need those NAT rules any more, we just (with psexec and appropriate paths) issue: vboxmanage controlvm "vm-name" --natpf1 delete "TCP DNS"

Using ifupdown instead of netplan

Late Edit 2020-04-01: On Github, someone asked me how they could use the same type of config with netplan, but instead on a 16.04 system. Ubuntu 16.04 doesn’t use netplan, but instead uses ifupdown instead. Here’s how to configure the file for ifupdown:

You can either add the following stanzas to /etc/network/interfaces, or create a separate file for each interface in /etc/network/interfaces.d/<number>-<interface>.cfg (e.g. /etc/network/interfaces.d/10-enp0s8.cfg)

allow-hotplug enp0s8
iface enp0s8 inet dhcp
  metric 10

To re-iterate, in the above netplan file, the interfaces we identified were: enp0s8, enp0s9, enp0s10, enp0s16, enp0s17, enp0s18 and enp0s19. Each interface was incrementally assigned a route metric, starting at 10 and ending at 16, so enp0s8 has a metric of 10, while enp0s16 has a metric of 13, and so on. To build these files, I’ve created this brief shell script you could use:

export metric=10
for int in 8 9 10 16 17 18 19
do
  echo -e "allow-hotplug enp0s${int}\niface enp0s${int} inet dhcp\n  metric $metric" > /etc/network/interfaces.d/enp0s${int}.cfg
  ((metric++))
done

As before, you could reboot to make the changes to the interfaces. Bear in mind, however, that unlike Netplan, these interfaces will try and DHCP on boot with this configuration, so boot time will take longer if every interface attached isn’t connected to a network.

Using NAT Network instead of NAT Interface

Late update 2020-05-26: Ruzsinsky contacted me by email to ask how I’d use a “NAT Network” instead of a “NAT interface”. Essentially, it’s the same as the Bridged interface above, with one other tweak first, we need to create the Net Network, with this command (as an Admin)

C:\WINDOWS\system32>C:\Users\JON\Downloads\sysinternals\PsExec64.exe -s "c:\program files\oracle\virtualbox\vboxmanage" natnetwork add --netname MyNet --network 192.0.2.0/24

Next, stop your multipass virtual machine with multipass stop "witty-kelpie", and configure your second interface, like this:

C:\WINDOWS\system32>C:\Users\JON\Downloads\sysinternals\PsExec64.exe -s "c:\program files\oracle\virtualbox\vboxmanage" modifyvm "witty-kelpie" --nic2 natnetwork --nat-network2 "MyNet"

PsExec v2.2 - Execute processes remotely
Copyright (C) 2001-2016 Mark Russinovich
Sysinternals - www.sysinternals.com

c:\program files\oracle\virtualbox\vboxmanage exited on MINILITH with error code 0.

Start the vm with multipass start "witty-kelpie", open a shell with it multipass shell "witty-kelpie", become root sudo -i and then configure the interface in /etc/netplan/60-extra-interfaces.yaml like we did before:

network:
  ethernets:
    enp0s8:
      optional: yes
      dhcp4: yes
      dhcp4-overrides:
        route-metric: 10

And then run netplan apply or reboot.

What I would say, however, is that the first interface seems to be expected to be a NAT interface, at which point, having a NAT network as well seems a bit pointless. You might be better off using a “Host Only” (or “Private”) network for any inter-host communications between nodes at a network level… But you know your environments and requirements better than I do :)

Featured image is “vieux port Marseille” by “Jeanne Menjoulet” on Flickr and is released under a CC-BY-ND license.

"Wifi Here on a Blackboard" by "Jem Stone" on Flickr

Free Wi-Fi does not need to be password-less!

Recently a friend of mine forwarded an email to me about a Wi-fi service he wanted to use from a firm, but he raised some technical questions with them which they seemed to completely misunderstand!

So, let’s talk about the misconceptions of Wi-fi passwords.

Many people assume that when you log into a system, it means that system is secure. For example, logging into a website makes sure that your data is secure and protected, right? Not necessarily – the password you entered could be on a web page that is not secured by TLS, or perhaps the web server doesn’t properly transfer it’s contents to a database. Maybe the website was badly written, and means it’s vulnerable to one of a handful of common attacks (with fun names like “Cross Site Scripting” or “SQL Injection Attacks”)…

People also assume the same thing about Wi-fi. You reached a log in page, so it must be secure, right? It depends. If you didn’t put in a password to access the Wi-fi in the first place (like in the image of the Windows 10 screen, or on my KDE Desktop) then you’re probably using Unsecured Wi-fi.

An example of a secured Wi-fi sign-in box on Windows 10
The same Wi-fi sign in box on KDE Neon

People like to compare network traffic to “sending things through the post”, notablycomparing E-Mail to “sending a postcard”, versus PGP encrypted E-Mail being compared to “sending a sealed letter”. Unencrypted Wi-fi is like using CB. Anyone who can hear your signal can understand what you are saying… but if you visit a website which uses HTTPS, then it’s like listening to someone saying random numbers over the radio.

And, if you’re using Unencrypted Wi-fi, it’s also possible for an attacker to see what website you visited, because the request for the address to reach on the Internet (e.g. “Google.com” = 172.217.23.14) is sent in the clear. Also because of the way that DNS works (that name to address matching thing) means that if someone knows you’re visiting a “site of interest” (like, perhaps a bank website), they can reply *before* the real DNS server, and tell you that the server on their machine is actually your bank’s website.

So, many of these things can be protected against by using a simple method, that many people who provide Wi-fi don’t do.

Turn on WPA2 (the authentication bit). Even if *everyone* uses the same password (which they’d have to for WPA2), the fact you’re logging into the Access Point means it creates a unique shared secret for your session.

“But hang on”, I hear the guy at the back cry, “you used the same password – how does that work?”

OK, so this is where the fun stuff starts. The password is just part of how you negotiate to get on to the network. There’s a complex beast of a method that explains how get a shared unique secret when you’re passing stuff around “in the clear”, and so as a result, when you first connect to that Wi-fi access point, and you hand over your password, it “Authorises” you on to the network, but then hands you over to the encryption part, where you generate a key and then use that to talk to each other. The encryption is the bit like “HTTPS”, where you make it so that people can’t see what you’re looking at.

“I got told that if everyone used the same password” said a hipster in the front row, “I wouldn’t be able to tell them apart.” Aha, not true. You can have a separate passphrase to access the Wi-fi from the Login page, after all, you’ve got to make sure that people aren’t breaking the rules (which they *TOTALLY* read, before clicking “I agree, just get me on the damn Wi-fi already”) by using your network.

“OK”, says the lady over on the right, “but when I connected to the Wi-fi, they asked me to log in using Facebook – that’s secure, right?”

Um, no. Well, maybe. See, if they gave you a WPA2 password to log into the Wi-fi, and then the first thing you got to was that login screen, then yep, it’s all good! {*} You can browse with (relative) impunity. But if they didn’t… well, not only are they asking you to shout your secrets on the radio, but if you’re really unlucky, the page asking you to log into Facebook might *also* not actually be Facebook, but another website that just looks like Facebook… after all, I’m sure that page you went to complained that it wasn’t Google or Facebook when you tried to open it…

{*} Except for the fact they’re asking you to tell them not only who you are, but who you’re also friends with, where you went to school, what your hobbies are, what groups you’re in, your date of birth and so on.

But anyway. I understand why those login screens are there. They’re asserting that not only do you understand that you mustn’t use their network for bad things, but that if the police come and ask them who used their network to do something naughty, they can say “He said his name was ‘Bob Smith’ and his email address was ‘bob@example.com’, Officer”…

It also means that the “free” service they provide to you, usually at some great expense (*eye roll*) can get them some return on investment (like, they just got your totally-real-and-not-at-all-made-up-email-address… honest, and they also know what websites you visited while you were there, which they can sell on).

So… What to do the next time you “need” Wi-fi, and there’s a free service there? Always use a VPN when you’re not using a network you trust. If the Wi-fi isn’t using WPA2 encryption (even something as simple as “Buy a drink first” is a great passphrase to use!) point them to this page, and tell them it’s virtually pain free (as long as the passphrase is easy to remember, easy to type and doesn’t have too many weird symbols in) and makes their service more safe and secure for their customers…

Featured image is “Wifi Here on a Blackboard” by “Jem Stone” on Flickr and is released under a CC-BY license.

"Juniper NetScreen 25 Firewall front" by "jackthegag" on Flickr

Standard Firewall Rules

One of the things I like to do is to explain how I set things up, but a firewall is one of those things that’s a bit complicated, because it depends on your situation, and what you’re trying to do in your environment. That said, there’s a template that you can probably get away with deploying, and see if it works for your content, and then you’ll see where to add the extra stuff from there. Firewall policies typically work from the top down.

This document will assume you have a simple boundary firewall. This simple firewall has two interfaces, the first being an “Outside” interface, connected to your ISP, with an IPv4 address of 192.0.2.2/24 and a default gateway of 192.0.2.1, it also has a IPv6 address of 2001:db8:123c:abd::2/64 and a default gateway address of 2001:db8:123c:abd::1. The second “Inside” interface, where your protected network is attached, has an IPv4 address of 198.51.100.1/24 and an IPv6 address of 2001:db8:123d:abc::1/64. On this inside interface, the firewall is the default gateway for the inside network.

I’ll be using simple text rules to describe firewall policies, following this format:

Source Interface: <outside | inside>
Source IP Address: <x.x.x.x/x | "any">
NAT Source IP Address: <x.x.x.x/x | no>
Destination Interface: <outside | inside>
Destination IP Address: <x.x.x.x/x | "any">
NAT Destination IP Address: <x.x.x.x/x | no>
Destination Port: <tcp | udp | icmp | ip>/<x>
Action: <allow | deny | reject>
Log: <yes | no>
Notes: <some commentary if required>

In this model, if you want to describe HTTP access to a web server, you might write the following policy:

Source Interface: outside
Source IP Address: 0.0.0.0/0 (Any IP)
NAT Source IP Address: no
Destination Interface: inside
Destination IP Address: 192.0.2.2 (External IP)
NAT Destination IP Address: 198.51.100.2 (Internal IP)
Destination Port: tcp/80
Action: allow
Log: yes

So, without further waffling, let’s build a policy. By default all traffic will be logged. In high-traffic environments, you may wish to prevent certain traffic from being logged, but on the whole, I think you shouldn’t really lose firewall logs unless you need to!

Allowing established, related and same-host traffic

This rule is only really needed on iptables based firewalls, as all the commercial vendors (as far as I can tell, at least) already cover this as “standard”. If you’re using UFW (a wrapper to iptables), this rule is covered off already, but essentially it goes a bit like this:

Source Interface: lo (short for "local", where the traffic never leaves the device)
Source IP Address: any
NAT Source IP Address: no
Destination Interface: lo
Destination IP Address: any
NAT Destination IP Address: no
Destination Port: any
Action: allow
Log: no
Notes: This above rule permits traffic between localhost addresses (127.0.0.0/8) or between public addresses on the same host, for example, between two processes without being blocked.
flags: Established OR Related
Action: allow
Log: no
Notes: This above rule is somewhat special, as it looks for specific flags on the packet, that says "If we've already got a session open, let it carry on talking".

Dropping Noisy Traffic

In a network, some proportion of the traffic is going to be “noisy”. Whether it’s broadcast traffic from your application that uses mDNS, or the Windows File Share trying to find like-minded hosts to exchange data… these can fill up your logs, so lets drop the broadcast and multicast IPv4 traffic, and not log them.

Source Interface: any
Source IP Address: 0.0.0.0/0
NAT Source IP Address: no
Destination Interface: any
Destination IP Address: 255.255.255.255 (global broadcast), 192.0.2.255 ("outside" broadcast), 198.51.100.255 ("inside" broadcast) and 224.0.0.0/4 (multicast)
NAT Destination IP Address: no
Destination Port: any
Action: deny
Log: no
Notes: The global and local broadcast addresses are used to "find" other hosts in a network, whether that's a DHCP server or something like mDNS. Dropping this prevents the traffic from appearing in your logs later.

Permitting Management Traffic

Typically you want to trust certain machines to access or be accessed by this host – whether it’s your SYSLOG collector, or the box that can manage the firewall policy, so here we’ll create a policy that lets these in.

Source Interface: inside
Source IP Address: 198.51.100.2 and 2001:db8:123d:abc::2 (Management IP)
NAT Source IP Address: no
Destination Interface: inside
Destination IP Address: 198.51.100.1 and 2001:db8:123d:abc::1 (Firewall IP)
NAT Destination IP Address: no
Destination Port: SSH (tcp/22)
Action: permit
Log: yes
Notes: Allow inbound SSH access. You're unlikely to need more inbound ports, but if you do - customise them here.
Source Interface: inside
Source IP Address: 198.51.100.1 and 2001:db8:123d:abc::1 (Firewall IP)
NAT Source IP Address: no
Destination Interface: inside
Destination IP Address: 198.51.100.2 and 2001:db8:123d:abc::2 (Management IP)
NAT Destination IP Address: no
Destination Port: SYSLOG (udp/514)
Action: permit
Log: yes
Notes: Allow outbound SYSLOG access. Tailor this to outbound ports you need.

Allowing Control Traffic

ICMP is a protocol that is fundamental to IPv4 and IPv6. Commonly used for Traceroute and Ping, but also used to perform REJECT responses and that sort of thing. We’re only going to let it be initiated *out* not in. Some people won’t allow this rule, or tailor it to more specific destinations.

Source Interface: inside
Source IP Address: any
NAT Source IP Address: 192.0.2.2 (The firewall IP address which may be replaced with 0.0.0.0 indicating "whatever IP address is bound to the outbound interface")
Destination Interface: outside
Destination IP Address: any
NAT Destination IP Address: no
Destination Port: icmp
Action: allow
Log: yes
Notes: ICMPv4 and ICMPv6 are different things. This is just the ICMPv4 version. IPv4 does require NAT, hence the difference from the IPv6 version below.
Source Interface: inside
Source IP Address: any
NAT Source IP Address: no
Destination Interface: outside
Destination IP Address: any
NAT Destination IP Address: no
Destination Port: icmpv6
Action: allow
Log: yes
Notes: ICMPv4 and ICMPv6 may be treated as different things. This is just the ICMPv6 version. IPv6 does not require NAT.

Protect the Firewall

There should be no other traffic going to the Firewall, so let’s drop everything. There are two types of “Deny” message – a “Reject” and a “Drop”. A Reject sends a message back from the host which is refusing the connection – usually the end server to say that the service didn’t want to reply to you, but if there’s a box in the middle – like a firewall – this reject (actually an ICMP packet) comes from the firewall instead. In this case it’s identifying that the firewall was refusing the connection for the node, so it advertises the fact the end server is protected by a security box. Instead, firewall administrators tend to use Drop, which just silently discards the initial request, leaving the initiating end to “Time Out”. You’re free to either “Reject” or “Drop” whenever we show “Deny” in the below policies, but bear it in mind that it’s less secure to use Reject than it is to Drop.

Source Interface: any
Source IP Address: any
NAT Source IP Address: no
Destination Interface: any
Destination IP Address: 192.0.2.2, 2001:db8:123c:abd::2, 198.51.100.1 and 2001:db8:123d:abc::1 (may also be represented as :: or 0.0.0.0 depending on the platform)
NAT Destination IP Address: no
Destination Port: any
Action: deny
Log: no
Notes: Drop everything targetted at the firewall IPs. If you have more NICs or additional IP addresses on the firewall, these will also need blocking.

“Normal” Inbound Traffic

After you’ve got your firewall protected, now you can sort out your “normal” traffic flows. I’m going to add a single inbound policy to represent the sort of traffic you might want to configure (in this case a simple web server), but bear in mind some environments don’t have any “inbound” rules (for example, most homes would be in this case), and some might need lots and lots of inbound rules. This is just to give you a flavour on what you might see here.

Source Interface: outside
Source IP Address: any
NAT Source IP Address: no
Destination Interface: inside
Destination IP Address: 192.0.2.2 (External IP)
NAT Destination IP Address: 198.51.100.2 (Internal IP)
Destination Port: tcp/80 (HTTP), tcp/443 (HTTPS)
Action: allow
Log: yes
Notes: This is the IPv4-only rule. Note a NAT MUST be applied here.
Source Interface: outside
Source IP Address: any
NAT Source IP Address: no
Destination Interface: inside
Destination IP Address: 2001:db8:123d:abc::2
NAT Destination IP Address: no
Destination Port: tcp/80 (HTTP), tcp/443 (HTTPS)
Action: allow
Log: yes
Notes: This is the IPv6-only rule. Note that NO NAT is required (but, you may wish to perform NAT, depending on your environment).

“Normal” Outbound Traffic

If you’re used to a DSL router, that basically just allows all outbound traffic. We’re going to implement that here. If you want to be more specific about things, you’d define your outbound rules like the inbound rules in the block above… but if you’re not that worried, then this rule below is generally going to be all OK :)

Source Interface: inside
Source IP Address: any
NAT Source IP Address: 192.0.2.2 (The firewall IP address which may be replaced with 0.0.0.0 indicating "whatever IP address is bound to the outbound interface")
Destination Interface: outside
Destination IP Address: any
NAT Destination IP Address: no
Destination Port: any
Action: allow
Log: yes
Notes: This is just the IPv4 version. IPv4 does require NAT, hence the difference from the IPv6 version below.
Source Interface: inside
Source IP Address: any
NAT Source IP Address: no
Destination Interface: outside
Destination IP Address: any
NAT Destination IP Address: no
Destination Port: any
Action: allow
Log: yes
Notes: This is just the IPv6 version. IPv6 does not require NAT.

Drop Rule

Following your permit rules above, you now need to drop everything else. Fortunately, by now, you’ve “white-listed” all the permitted traffic, so now we can just drop “everything”. So, let’s do that!

Source Interface: any
Source IP Address: any
NAT Source IP Address: no
Destination Interface: any
Destination IP Address: any
NAT Destination IP Address: no
Destination Port: any
Action: deny
Log: yes

And so that is a basic firewall policy… or at least, it’s the template I tend to stick to! :)

"www.GetIPv6.info decal" from Phil Wolff on Flickr

Hurricane Electric IPv6 Gateway on Raspbian for Raspberry Pi

NOTE: This article was replaced on 2019-03-12 by a github repository where I now use Vagrant instead of a Raspberry Pi, because I was having some power issues with my Raspberry Pi. Also, using this method means I can easily use an Ansible Playbook. The following config will still work(!) however I prefer this Vagrant/Ansible workflow for this, so won’t update this blog post any further.

Following an off-hand remark from a colleague at work, I decided I wanted to set up a Raspberry Pi as a Hurricane Electric IPv6 6in4 tunnel router. Most of the advice around (in particular, this post about setting up IPv6 on the Raspberry Pi Forums) related to earlier version of Raspbian, so I thought I’d bring it up-to-date.

I installed the latest available version of Raspbian Stretch Lite (2018-11-13) and transferred it to a MicroSD card. I added the file ssh to the boot volume and unmounted it. I then fitted it into my Raspberry Pi, and booted it. While it was booting, I set a static IPv4 address on my router (192.168.1.252) for the Raspberry Pi, so I knew what IP address it would be on my network.

I logged into my Hurricane Electric (HE) account at tunnelbroker.net and created a new tunnel, specifying my public IP address, and selecting my closest HE endpoint. When the new tunnel was created, I went to the “Example Configurations” tab, and selected “Debian/Ubuntu” from the list of available OS options. I copied this configuration into my clipboard.

I SSH’d into the Pi, and gave it a basic config (changed the password, expanded the disk, turned off “predictable network names”, etc) and then rebooted it.

After this was done, I created a file in /etc/network/interfaces.d/he-ipv6 and pasted in the config from the HE website. I had to change the “local” line from the public IP I’d provided HE with, to the real IP address of this box. Note that any public IPs (that is, not 192.168.x.x addresses) in the config files and settings I’ve noted refer to documentation addressing (TEST-NET-2 and the IPv6 documentation address ranges)

auto he-ipv6
iface he-ipv6 inet6 v4tunnel
        address 2001:db8:123c:abd::2
        netmask 64
        endpoint 198.51.100.100
        local 192.168.1.252
        ttl 255
        gateway 2001:db8:123c:abd::1

Next, I created a file in /etc/network/interfaces.d/eth0 and put the following configuration in, using the first IPv6 address in the “routed /64” range listed on the HE site:

auto eth0
iface eth0 inet static
    address 192.168.1.252
    gateway 192.168.1.254
    netmask 24
    dns-nameserver 8.8.8.8
    dns-nameserver 8.8.4.4

iface eth0 inet6 static
    address 2001:db8:123d:abc::1
    netmask 64

Next, I disabled the DHCPd service by issuing systemctl stop dhcpcd.service Late edit (2019-01-22): Note, a colleague mentioned that this should have actually been systemctl stop dhcpcd.service && systemctl disable dhcpcd.service – good spot! Thanks!! This ensures that if, for some crazy reason, the router stops offering the right DHCP address to me, I can still access this box on this IP. Huzzah!

I accessed another host which had IPv6 access, and performed both a ping and an SSH attempt. Both worked. Fab. However, this now needs to be blocked, as we shouldn’t permit anything to be visible downstream from this gateway.

I’m using the Uncomplicated Firewall (ufw) which is a simple wrapper around IPTables. Let’s create our policy.

# First install the software
sudo apt update && sudo apt install ufw -y

# Permits inbound IPv4 SSH to this host - which should be internal only. 
# These rules allow tailored access in to our managed services
ufw allow in on eth0 app DNS
ufw allow in on eth0 app OpenSSH

# These rules accept all broadcast and multicast traffic
ufw allow in on eth0 to 224.0.0.0/4 # Multicast addresses
ufw allow in on eth0 to 255.255.255.255 # Global broadcast
ufw allow in on eth0 to 192.168.1.255 # Local broadcast

# Alternatively, accept everything coming in on eth0
# If you do this one, you don't need the lines above
ufw allow in on eth0

# Setup the default rules - deny inbound and routed, permit outbound
ufw default deny incoming 
ufw default deny routed
ufw default allow outgoing

# Prevent inbound IPv6 to the network
# Also, log any drops so we can spot them if we have an issue
ufw route deny log from ::/0 to 2001:db8:123d:abc::/64

# Permit outbound IPv6 from the network
ufw route allow from 2001:db8:123d:abc::/64

# Start the firewall!
ufw enable

# Check the policy
ufw status verbose
ufw status numbered

Most of the documentation I found suggested running radvd for IPv6 address allocation. This basically just allocates on a random basis, and, as far as I can make out, each renewal gives the host a new IPv6 address. To make that work, I performed apt-get update && apt-get install radvd -y and then created this file as /etc/radvd.conf. If all you want is a floating IP address with no static assignment – this will do it…

interface eth0
{
    AdvSendAdvert on;
    MinRtrAdvInterval 3;
    MaxRtrAdvInterval 10;
    prefix 2001:db8:123d:abc::/64
    {
        AdvOnLink on;
        AdvAutonomous on;
    };
   route ::/0 {
   };
};

However, this doesn’t give me the ability to statically assign IPv6 addresses to hosts. I found that a different IPv6 allocation method will do static addressing, based on your MAC address called SLAAC (note there are some privacy issues with this, but I’m OK with them for now…) In this mode assuming the prefix as before – 2001:db8:123d:abc:: and a MAC address of de:ad:be:ef:01:23, your IPv6 address will be something like: 2001:db8:123d:abc:dead:beff:feef:0123and this will be repeatably so – because you’re unlikely to change your MAC address (hopefully!!).

This SLAAC allocation mode is available in DNSMasq, which I’ve consumed before (in a Pi-Hole). To use this, I installed DNSMasq with apt-get update && apt-get install dnsmasq -y and then configured it as follows:

interface=eth0
listen-address=127.0.0.1
# DHCPv6 - Hurricane Electric Resolver and Google's
dhcp-option=option6:dns-server,[2001:470:20::2],[2001:4860:4860::8888]
# IPv6 DHCP scope
dhcp-range=2001:db8:123d:abc::, slaac

I decided to move from using my router as a DHCP server, to using this same host, so expanded that config as follows, based on several posts, but mostly centred around the MAN page (I’m happy to have this DNSMasq config improved if you’ve got any suggestions ;) )

# Stuff for DNS resolution
domain-needed
bogus-priv
no-resolv
filterwin2k
expand-hosts
domain=localnet
local=/localnet/
log-queries

# Global options
interface=eth0
listen-address=127.0.0.1

# Set these hosts as the DNS server for your network
# Hurricane Electric and Google
dhcp-option=option6:dns-server,[2001:470:20::2],2001:4860:4860::8888]

# My DNS servers are:
server=1.1.1.1                # Cloudflare's DNS server
server=8.8.8.8                # Google's DNS server

# IPv4 DHCP scope
dhcp-range=192.168.1.10,192.168.1.210,12h
# IPv6 DHCP scope
dhcp-range=2001:db8:123d:abc::, slaac

# Record the DHCP leases here
dhcp-leasefile=/run/dnsmasq/dhcp-lease

# DHCPv4 Router
dhcp-option=3,192.168.1.254

So, that’s what I’m doing now! Hope it helps you!

Late edit (2019-01-22): In issue 129 of the “Awesome Self Hosted Newsletter“, I found a post called “My New Years Resolution: Learn IPv6“… which uses a pfSense box and a Hurricane Electric tunnel too. Fab!

Header image is “www.GetIPv6.info decal” by “Phil Wolff” on Flickr and is released under a CC-BY-SA license. Used with thanks!

One to read/watch: IPsec and IKE Tutorial

Ever been told that IPsec is hard? Maybe you’ve seen it yourself? Well, Paul Wouters and Sowmini Varadhan recently co-delivered a talk at the NetDev conference, and it’s really good.

Sowmini’s and Paul’s slides are available here: https://www.files.netdevconf.org/d/a18e61e734714da59571/

A complete recording of the tutorial is here. Sowmini’s part of the tutorial (which starts first in the video) is quite technically complex, looking at specifically the way that Linux handles the packets through the kernel. I’ve focused more on Paul’s part of the tutorial (starting at 26m23s)… but my interest was piqued from 40m40s when he starts to actually show how “easy” configuration is. There are two quick run throughs of typical host-to-host IPsec and subnet-to-subnet IPsec tunnels.

A key message for me, which previously hadn’t been at all clear in IPsec using {free,libre,open}swan is that they refer to Left and Right as being one party and the other… but the node itself works out if it’s “left” or “right” so the *SAME CONFIG* can be used on both machines. GENIUS.

Also, when you’re looking at the config files, anything prefixed with an @ symbol is something that doesn’t need resolving to something else.

It’s well worth a check-out, and it’s inspired me to take another look at IPsec for my personal VPNs :)

I should note that towards the end, Paul tried to run a selection of demonstrations in Opportunistic Encryption (which basically is a way to enable encryption between two nodes, even if you don’t have a pre-established VPN with them). Because of issues with the conference wifi, plus the fact that what he’s demoing isn’t exactly production-grade yet, it doesn’t really work right, and much of the rest of the video (from around 1h10m) is him trying to show that working while attendees are running through the lab, and having conversations about those labs with the attendees.