Around 2-3 years ago, Slack– the company who produces Slack the IM client, started working on a meshed overlay network product, called Nebula, to manage their environment. After two years of running their production network on the back of it, they decided to open source it. I found out about Nebula via a Medium Post that was mentioned in the HangOps Slack Group. I got interested in it, asked a few questions about Nebula in the Slack, and then in the Github Issues for it, and then recently raised a Pull Request to add more complete documentation than their single (heavily) commented config file.
So, let’s go into some details on why this is interesting to me.
1. Nebula uses a flat IPv4 network to identify all hosts in the network, no matter where in the network those hosts reside.
This means that I can address any host in my (self allocated) 198.18.0.0/16 network, and I don’t need to worry about routing tables, production/DR sites, network tromboneing and so on… it’s just… Flat.
[Note: Yes, that IP address “looks” like a public subnet – but it’s actually a testing network, allocated by IANA for network testing!]
2. Nebula has host-based firewalling built into the configuration file.
This means that once I know how my network should operate (yes, I know that’s a big ask), I can restrict my servers from being able to reach my laptops, or I can stop my web server from being able to talk to my database server, except for on the database ports. Lateral movement becomes a LOT harder.
This firewalling also looks a lot like (Network) Security Groups (for the AWS and Azure familiar), so you have a default “Deny” rule, and then layer “Allow” rules on top. You also have Inbound and Outbound rules, so if you want to stop your laptops from talking anything but DNS, SSH, HTTPS and ICMP over Nebula…. well, yep, you can do that :)
3. Nebula uses a PKI environment. Where you can have multiple Certificate Authorities.
This means that I have a central server running my certificate authority (CA), with a “backup” CA, stored offline – in case of dire disaster with my primary CA, and push both CA’s to all my nodes. If I suddenly need to replace all the certificates that my current CA signed, I can do that with minimal interruption to my nodes. Good stuff.
Nebula also created their own PKI attributes to identify the roles of each device in the Nebula environment. By signing that as part of the certificate on each node too, means your CA asserts that the role that certificate holds is valid for that node in the network.
Creating a node’s certificate is a simple command:
nebula-cert sign -name jon-laptop -ip 198.18.0.1/16 -groups admin,laptop,support
This certificate has the IP address of the node baked in (it’s 198.18.0.1) and the groups it’s part of (admin, laptop and support), as well as the host name (jon-laptop). I can use any of these three items in my firewall rules I mentioned above.
4. It creates a peer-to-peer, meshed VPN.
While it’s possible to create a peer-to-peer meshed VPN with commercial products, I’ve not seen any which are as light-weight to deploy as this. Each node finds all the other nodes in the network by using a collection of “Lighthouses” (similar to Torrent Seeds or Skype Super Nodes) which tells all the connecting nodes where all the other machines in the network are located. These then initiate UDP connections to the other nodes they want to talk to. If they are struggling (because of NAT or Double NAT), then there’s a NAT Punching process (called, humourously, “punchy”) which lets you signal via the Lighthouse that you’re trying to reach another node that can’t see your connection, and asks it to also connect out to you over UDP… thereby fixing the connection issue. All good.
5. Nebula has clients for Windows, Mac and Linux. Apparently there are clients for iOS in the works (meh, I’m not on Apple… but I know some are) and I’ve heard nothing about Android as yet, but as it’s on Linux, I’m sure some enterprising soul can take a look at it (the client is written in Go).
If you want to have a look at Nebula for your own testing, I’ve created a Terraform based environment on AWS and Azure to show how you’d manage it all using Ansible Tower, which builds:
2 VPCs (AWS) and 1 VNet (Azure)
6 subnets (3 public, 3 private)
1 public AWX (the upstream project from Ansible Tower) Server
1 private Nebula Certificate Authority
2 public Web Servers (one in AWS, one in Azure)
2 private Database Servers (one in AWS, one in Azure)
2 public Bastion Servers (one in AWS, one in Azure) – that lets AWX reach into the Private sections of the network, without exposing SSH from all the hosts.
If you don’t want to provision the Azure side, just remove
load_web2_module.tf from the
Terraform directory in that repo… Job’s a good’n!
I have plans to look at a couple of variables, like Nebula’s closest rival, ZeroTier, and to look at using SaltStack instead of Ansible, to reduce the need for the extra Bastion servers.