"So many coats..." by "Scott Griggs" on Flickr

Migrating from docker-compose to Kubernetes Files

Just so you know. This is a long article to explain my wandering path through understanding Kubernetes (K8S). It’s not an article to explain to you how to use K8S with your project. I hit a lot of blockers, due to the stack I’m using and I document them all. This means there’s a lot of noise and not a whole lot of Signal.

In a previous blog post I created a docker-compose.yml file for a PHP based web application. Now that I have a working Kubernetes environment, I wanted to port the configuration files into Kubernetes.

Initially, I was pointed at Kompose, a tool for converting docker-compose files to Kubernetes YAML formatted files, and, in fact, this gives me 99% of what I need… except, the current version uses beta API version flags for some of it’s outputted files, and this isn’t valid for the version of Kubernetes I’m running. So, let’s wind things back a bit, and find out what you need to do to use kompose first and then we can tweak the output file next.

Note: I’m running all these commands as root. There’s a bit of weirdness going on because I’m using the snap of Docker and I had a few issues with running these commands as a user… While I could have tried to get to the bottom of this with sudo and watching logs, I just wanted to push on… Anyway.

Here’s our “simple” docker-compose file.

version: '3'
services:
  db:
    build:
      context: .
      dockerfile: mariadb/Dockerfile
    image: localhost:32000/db
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: a_root_pw
      MYSQL_USER: a_user
      MYSQL_PASSWORD: a_password
      MYSQL_DATABASE: a_db
    expose:
      - 3306
  nginx:
    build:
      context: .
      dockerfile: nginx/Dockerfile
    image: localhost:32000/nginx
    ports:
      - 1980:80
  phpfpm:
    build:
      context: .
      dockerfile: phpfpm/Dockerfile
    image: localhost:32000/phpfpm

This has three components – the MariaDB database, the nginx web server and the PHP-FPM CGI service that nginx consumes. The database service exposes a port (3306) to other containers, with a set of hard-coded credentials (yep, that’s not great… working on that one!), while the nginx service opens port 1980 to port 80 in the container. So far, so … well, kinda sensible :)

If we run kompose convert against this docker-compose file, we get five files created; db-deployment.yaml, nginx-deployment.yaml, phpfpm-deployment.yaml, db-service.yaml and nginx-service.yaml. If we were to run kompose up on these, we get an error message…

Well, actually, first, we get a whole load of “INFO” and “WARN” lines up while kompose builds and pushes the containers into the MicroK8S local registry (a registry is a like a package repository, for containers), which is served by localhost:32000 (hence all the image: localhost:3200/someimage lines in the docker-compose.yml file), but at the end, we get (today) this error:

INFO We are going to create Kubernetes Deployments, Services and PersistentVolumeClaims for your Dockerized application. If you need different kind of resources, use the 'kompose convert' and 'kubectl create -f' commands instead.

FATA Error while deploying application: Get http://localhost:8080/api: dial tcp 127.0.0.1:8080: connect: connection refused

Uh oh! Well, this is a known issue at least! Kubernetes used to use, by default, http on port 8080 for it’s service, but now it uses https on port 6443. Well, that’s what I thought! In this issue on the MicroK8S repo, it says that it uses a different port, and you should use microk8s.kubectl cluster-info to find the port… and yep… Kubernetes master is running at https://127.0.0.1:16443. Bah.

root@microk8s-a:~/glowing-adventure# microk8s.kubectl cluster-info
Kubernetes master is running at https://127.0.0.1:16443
Heapster is running at https://127.0.0.1:16443/api/v1/namespaces/kube-system/services/heapster/proxy
CoreDNS is running at https://127.0.0.1:16443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Grafana is running at https://127.0.0.1:16443/api/v1/namespaces/kube-system/services/monitoring-grafana/proxy
InfluxDB is running at https://127.0.0.1:16443/api/v1/namespaces/kube-system/services/monitoring-influxdb:http/proxy

So, we export the KUBERNETES_MASTER environment variable, which was explained in that known issue I mentioned before, and now we get a credential prompt:

Please enter Username:

Oh no, again! I don’t have credentials!! Fortunately the MicroK8S issue also tells us how to find those! You run microk8s.config and it tells you the username!

roo@microk8s-a:~/glowing-adventure# microk8s.config
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: <base64-data>
    server: https://10.0.2.15:16443
  name: microk8s-cluster
contexts:
- context:
    cluster: microk8s-cluster
    user: admin
  name: microk8s
current-context: microk8s
kind: Config
preferences: {}
users:
- name: admin
  user:
    username: admin
    password: QXdUVmN3c3AvWlJ3bnRmZVJmdFhpNkJ3cDdkR3dGaVdxREhuWWo0MmUvTT0K

So, our username is “admin” and our password is … well, in this case a string starting QX and ending 0K but yours will be different!

We run kompose up again, and put in the credentials… ARGH!

FATA Error while deploying application: Get https://127.0.0.1:16443/api: x509: certificate signed by unknown authority

Well, now, that’s no good! Fortunately, a quick Google later, and up pops this Stack Overflow suggestion (mildly amended for my circumstances):

openssl s_client -showcerts -connect 127.0.0.1:16443 < /dev/null | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' | sudo tee /usr/local/share/ca-certificates/k8s.crt
update-ca-certificates
systemctl restart snap.docker.dockerd

Right then. Let’s run that kompose up statement again…

INFO We are going to create Kubernetes Deployments, Services and PersistentVolumeClaims for your Dockerized application. If you need different kind of resources, use the 'kompose convert' and 'kubectl create -f' commands instead.

Please enter Username: 
Please enter Password: 
INFO Deploying application in "default" namespace
INFO Successfully created Service: nginx
FATA Error while deploying application: the server could not find the requested resource

Bah! What resource do I need? Well, actually, there’s a bug in 1.20.0 of Kompose, and it should be fixed in 1.21.0. The “resource” it’s talking about is, I think, that one of the APIs refuses to process the converted YAML files. As a result, the “resource” is the service that won’t start. So, instead, let’s convert the file into the output YAML files, and then take a peak at what’s going wrong.

root@microk8s-a:~/glowing-adventure# kompose convert
INFO Kubernetes file "nginx-service.yaml" created
INFO Kubernetes file "db-deployment.yaml" created
INFO Kubernetes file "nginx-deployment.yaml" created
INFO Kubernetes file "phpfpm-deployment.yaml" created

So far, so good! Now let’s run kubectl apply with each of these files.

root@microk8s-a:~/glowing-adventure# kubectl apply -f nginx-service.yaml
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
service/nginx configured
root@microk8s-a:~# kubectl apply -f nginx-deployment.yaml
error: unable to recognize "nginx-deployment.yaml": no matches for kind "Deployment" in version "extensions/v1beta1"

Apparently the service files are all OK, the problem is in the deployment files. Hmm OK, let’s have a look at what could be wrong. Here’s the output file:

root@microk8s-a:~/glowing-adventure# cat nginx-deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    kompose.cmd: kompose convert
    kompose.version: 1.20.0 (f3d54d784)
  creationTimestamp: null
  labels:
    io.kompose.service: nginx
  name: nginx
spec:
  replicas: 1
  strategy: {}
  template:
    metadata:
      annotations:
        kompose.cmd: kompose convert
        kompose.version: 1.20.0 (f3d54d784)
      creationTimestamp: null
      labels:
        io.kompose.service: nginx
    spec:
      containers:
      - image: localhost:32000/nginx
        name: nginx
        ports:
        - containerPort: 80
        resources: {}
      restartPolicy: Always
status: {}

Well, the extensions/v1beta1 API version doesn’t seem to support “Deployment” options any more, so let’s edit it to change that to what the official documentation example shows today. We need to switch to using the apiVersion: apps/v1 value. Let’s see what happens when we make that change!

root@microk8s-a:~/glowing-adventure# kubectl apply -f nginx-deployment.yaml
error: error validating "nginx-deployment.yaml": error validating data: ValidationError(Deployment.spec): missing required field "selector" in io.k8s.api.apps.v1.DeploymentSpec; if you choose to ignore these errors, turn validation off with --validate=false

Hmm this seems to be a fairly critical issue. A selector basically tells the orchestration engine which images we want to be deployed. Let’s go back to the official example. So, we need to add the “selector” value in the “spec” block, at the same level as “template”, and it needs to match the labels we’ve specified. It also looks like we don’t need most of the metadata that kompose has given us. So, let’s adjust the deployment to look a bit more like that example.

root@microk8s-a:~/glowing-adventure# cat nginx-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: localhost:32000/nginx
        name: nginx
        ports:
        - containerPort: 80
        resources: {}
      restartPolicy: Always

Fab. And what happens when we run it?

root@microk8s-a:~/glowing-adventure# kubectl apply -f nginx-deployment.yaml
deployment.apps/nginx created

Woohoo! Let’s apply all of these now.

root@microk8s-a:~/glowing-adventure# for i in db-deployment.yaml nginx-deployment.yaml nginx-service.yaml phpfpm-deployment.yaml; do kubectl apply -f $i ; done
deployment.apps/db created
deployment.apps/nginx unchanged
service/nginx unchanged
deployment.apps/phpfpm created

Oh, hang on a second, that service (service/nginx) is unchanged, but we changed the label from io.kompose.service: nginx to app: nginx, so we need to fix that. Let’s open it up and edit it!

apiVersion: v1
kind: Service
metadata:
  annotations:
    kompose.cmd: kompose convert
    kompose.version: 1.20.0 (f3d54d784)
  creationTimestamp: null
  labels:
    io.kompose.service: nginx
  name: nginx
spec:
  ports:
  - name: "1980"
    port: 1980
    targetPort: 80
  selector:
    io.kompose.service: nginx
status:
  loadBalancer: {}

Ah, so this has the “annotations” field too, in the metadata, and, as suspected, it’s got the io.kompose.service label as the selector. Hmm OK, let’s fix that.

root@microk8s-a:~/glowing-adventure# cat nginx-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  ports:
  - name: "1980"
    port: 1980
    targetPort: 80
  selector:
    app: nginx
status:
  loadBalancer: {}

Much better. And let’s apply it…

root@microk8s-a:~/glowing-adventure# kubectl apply -f nginx-service.yaml
service/nginx configured

Fab! So, let’s review the state of the deployments, the services, the pods and the replication sets.

root@microk8s-a:~/glowing-adventure# kubectl get deploy
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
db       1/1     1            1           8m54s
nginx    0/1     1            0           8m53s
phpfpm   1/1     1            1           8m48s

Hmm. That doesn’t look right.

root@microk8s-a:~/glowing-adventure# kubectl get pod
NAME                      READY   STATUS             RESTARTS   AGE
db-f78f9f69b-grqfz        1/1     Running            0          9m9s
nginx-7774fcb84c-cxk4v    0/1     CrashLoopBackOff   6          9m8s
phpfpm-66945b7767-vb8km   1/1     Running            0          9m3s
root@microk8s-a:~# kubectl get rs
NAME                DESIRED   CURRENT   READY   AGE
db-f78f9f69b        1         1         1       9m18s
nginx-7774fcb84c    1         1         0       9m17s
phpfpm-66945b7767   1         1         1       9m12s

Yep. What does “CrashLoopBackOff” even mean?! Let’s check the logs. We need to ask the pod itself, not the deployment, so let’s use the kubectl logs command to ask.

root@microk8s-a:~/glowing-adventure# kubectl logs nginx-7774fcb84c-cxk4v
2020/01/17 08:08:50 [emerg] 1#1: host not found in upstream "phpfpm" in /etc/nginx/conf.d/default.conf:10
nginx: [emerg] host not found in upstream "phpfpm" in /etc/nginx/conf.d/default.conf:10

Hmm. That’s not good. We were using the fact that Docker just named everything for us in the docker-compose file, but now in Kubernetes, we need to do something different. At this point I ran out of ideas. I asked on the McrTech slack for advice. I was asked to run this command, and would you look at that, there’s nothing for nginx to connect to.

root@microk8s-a:~/glowing-adventure# kubectl get service
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kubernetes   ClusterIP   10.152.183.1    <none>        443/TCP    24h
nginx        ClusterIP   10.152.183.62   <none>        1980/TCP   9m1s

It turns out that I need to create a service for each of the deployments. So, now I have a separate service for each one. I copied the nginx-service.yaml file into db-service.yaml and phpfpm-service.yaml, edited the files and now… tada!

root@microk8s-a:~/glowing-adventure# kubectl get service
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
db           ClusterIP   10.152.183.61   <none>        3306/TCP   5m37s
kubernetes   ClusterIP   10.152.183.1    <none>        443/TCP    30h
nginx        ClusterIP   10.152.183.62   <none>        1980/TCP   5h54m
phpfpm       ClusterIP   10.152.183.69   <none>        9000/TCP   5m41s

But wait… How do I actually address nginx now? Huh. No external-ip (not even “pending”, which is what I ended up with), no ports to talk to. Uh oh. Now I need to understand how to hook this service up to the public IP of this node. Ahh, see up there it says “ClusterIP”? That means “this service is only available INSIDE the cluster”. If I change this to “NodePort” or “LoadBalancer”, it’ll attach that port to the external interface.

What’s the difference between “NodePort” and “LoadBalancer”? Well, according to this page, if you are using a managed Public Cloud service that supports an external load balancer, then putting this to “LoadBalancer” should attach your “NodePort” to the provider’s Load Balancer automatically. Otherwise, you need to define the “NodePort” value in your config (which must be a value between 30000 and 32767, although that is configurable for the node). Once you’ve done that, you can hook your load balancer up to that port, for example Client -> Load Balancer IP (TCP/80) -> K8S Cluster IP (e.g. TCP/31234)

So, how does this actually look. I’m going to use the “LoadBalancer” option, because if I ever deploy this to “live”, I want it to integrate with the load balancer, but for now, I can cope with addressing a “high port”. Right, well, let me open back up that nginx-service.yaml, and make the changes.

root@microk8s-a:~/glowing-adventure# cat nginx-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  type: LoadBalancer
  ports:
  - name: nginx
    nodePort: 30000
    port: 1980
    targetPort: 80
  selector:
    app: nginx
status:
  loadBalancer: {}

The key parts here are the lines type: LoadBalancer and nodePort: 30000 under spec: and ports: respectively. Note that I can use, at this point type: LoadBalancer and type: NodePort interchangably, but, as I said, if you were using this in something like AWS or Azure, you might want to do it differently!

So, now I can curl http://192.0.2.100:30000 (where 192.0.2.100 is the address of my “bridged interface” of K8S environment) and get a response from my PHP application, behind nginx, and I know (from poking at it a bit) that it works with my Database.

OK, one last thing. I don’t really want lots of little files which have got config items in. I quite liked the docker-compose file as it was, because it had all the services in as one block, and I could run “docker-compose up”, but the kompose script split it out into lots of pieces. In Kubernetes, if the YAML file it loads has got a divider in it (a line like this: ---) then it stops parsing it at that point, and starts reading the file after that as a new file. Like this I could have the following layout:

apiVersion: apps/v1
kind: Deployment
more: stuff
---
apiVersion: v1
kind: Service
more: stuff
---
apiVersion: apps/v1
kind: Deployment
more: stuff
---
apiVersion: v1
kind: Service
more: stuff

But, thinking about it, I quite like having each piece logically together, so I really want db.yaml, nginx.yaml and phpfpm.yaml, where each of those files contains both the deployment and the service. So, let’s do that. I’ll do one file, so it makes more sense, and then show you the output.

root@microk8s-a:~/glowing-adventure# mkdir -p k8s
root@microk8s-a:~/glowing-adventure# mv db-deployment.yaml k8s/db.yaml
root@microk8s-a:~/glowing-adventure# echo "---" >> k8s/db.yaml
root@microk8s-a:~/glowing-adventure# cat db-service.yaml >> k8s/db.yaml
root@microk8s-a:~/glowing-adventure# rm db-service.yaml
root@microk8s-a:~/glowing-adventure# cat k8s/db.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: db
  name: db
spec:
  replicas: 1
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - env:
        - name: MYSQL_DATABASE
          value: a_db
        - name: MYSQL_PASSWORD
          value: a_password
        - name: MYSQL_ROOT_PASSWORD
          value: a_root_pw
        - name: MYSQL_USER
          value: a_user
        image: localhost:32000/db
        name: db
        resources: {}
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: db
  name: db
spec:
  ports:
  - name: mariadb
    port: 3306
    targetPort: 3306
  selector:
    app: db
status:
  loadBalancer: {}

So, now, if I do kubectl apply -f k8s/db.yaml I’ll get this output:

root@microk8s-a:~/glowing-adventure# kubectl apply -f k8s/db.yaml
deployment.apps/db unchanged
service/db unchanged

You can see the final files in the git repo for this set of tests.

Next episode, I’ll start looking at making my application scale (as that’s the thing that Kubernetes is known for) and having more than one K8S node to house my K8S pods!

Featured image is “So many coats…” by “Scott Griggs” on Flickr and is released under a CC-BY-ND license.

"Shipping Containers" by "asgw" on Flickr

Creating my first Docker containerized LEMP (Linux, nginx, MariaDB, PHP) application

Want to see what I built without reading the why’s and wherefore’s? The git repository with all the docker-compose goodness is here!

Late edit 2020-01-16: The fantastic Jerry Steel, my co-host on The Admin Admin podcast looked at what I wrote, and made a few suggestions. I’ve updated the code in the git repo, and I’ll try to annotate below when I’ve changed something. If I miss it, it’s right in the Git repo!

One of the challenges I set myself this Christmas was to learn enough about Docker to put an arbitrary PHP application, that I would previously have misused Vagrant to contain.

Just before I started down this rabbit hole, I spoke to my Aunt about some family tree research my father had left behind after he died, and how I wished I could easily share the old tree with her (I organised getting her a Chromebook a couple of years ago, after fighting with doing remote support for years on Linux and Windows laptops). In the end, I found a web application for genealogical research called HuMo-gen, that is a perfect match for both projects I wanted to look at.

HuMo-gen was first created in 1999, with a PHP version being released in 2005. It used MySQL or MariaDB as the Database engine. I was reasonably confident that I could have created a Vagrantfile to deliver this on my home server, but I wanted to try something new. I wanted to use the “standard” building blocks of Docker and Docker-Compose, and some common containers to make my way around learning Docker.

I started by looking for some resources on how to build a Docker container. Much of the guidance I’d found was to use Docker-Compose, as this allows you to stand several components up at the same time!

In contrast to how Vagrant works (which is basically a CLI wrapper to many virtual machine services), Docker isolates resources for a single process that runs on a machine. Where in Vagrant, you might run several processes on one machine (perhaps, in this instance, nginx, PHP-FPM and MariaDB), with Docker, you’re encouraged to run each “service” as their own containers, and link them together with an overlay network. It’s possible to also do the same with Vagrant, but you’ll end up with an awful lot of VM overhead to separate out each piece.

So, I first needed to select my services. My initial line-up was:

  • MariaDB
  • PHP-FPM
  • Apache’s httpd2 (replaced by nginx)

I was able to find official Docker images for PHP, MariaDB and httpd, but after extensive tweaking, I couldn’t make the httpd image talk the way I wanted it to with the PHP image. Bowing to what now seems to be conventional wisdom, I swapped out the httpd service for nginx.

One of the stumbling blocks for me, particularly early on, was how to build several different Dockerfiles (these are basically the instructions for the container you’re constructing). Here is the basic outline of how to do this:

version: '3'
services:
  yourservice:
    build:
      context: .
      dockerfile: relative/path/to/Dockerfile

In this docker-compose.yml file, I tell it that to create the yourservice service, it needs to build the docker container, using the file in ./relative/path/to/Dockerfile. This file in turn contains an instruction to import an image.

Each service stacks on top of each other in that docker-compose.yml file, like this:

version: '3'
services:
  service1:
    build:
      context: .
      dockerfile: service1/Dockerfile
    image: localhost:32000/service1
  service2:
    build:
      context: .
      dockerfile: service2/Dockerfile
    image: localhost:32000/service2

Late edit 2020-01-16: This previously listed Dockerfile/service1, however, much of the documentation suggested that Docker gets quite opinionated about the file being called Dockerfile. While docker-compose can work around this, it’s better to stick to tradition :) The docker-compose.yml files below have also been adjusted accordingly. I’ve also added an image: somehost:1234/image_name line to help with tagging the images for later use. It’s not critical to what’s going on here, but I found it useful with some later projects.

To allow containers to see ports between themselves, you add the expose: command in your docker-compose.yml, and to allow that port to be visible from the “outside” (i.e. to the host and upwards), use the ports: command listing the “host” port (the one on the host OS), then a colon and then the “target” port (the one in the container), like these:

version: '3'
services:
  service1:
    build:
      context: .
      dockerfile: service1/Dockerfile
    image: localhost:32000/service1
    expose:
    - 12345
  service2:
    build:
      context: .
      dockerfile: service2/Dockerfile
    image: localhost:32000/service2
    ports:
    - 8000:80

Now, let’s take a quick look into the Dockerfiles. Each “statement” in a Dockerfile adds a new “layer” to the image. For local operations, this probably isn’t a problem, but when you’re storing these images on a hosted provider, you want to keep these images as small as possible.

I built a Database Dockerfile, which is about as small as you can make it!

FROM mariadb:10.4.10

Yep, one line. How cool is that? In the docker-compose.yml file, I invoke this, like this:

version: '3'
services:
  db:
    build:
      context: .
      dockerfile: mariadb/Dockerfile
    image: localhost:32000/db
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: a_root_pw
      MYSQL_USER: a_user
      MYSQL_PASSWORD: a_password
      MYSQL_DATABASE: a_db
    expose:
      - 3306

OK, so this one is a bit more complex! I wanted it to build my Dockerfile, which is “mariadb/Dockerfile“. I wanted it to restart the container whenever it failed (which hopefully isn’t that often!), and I wanted to inject some specific environment variables into the file – the root and user passwords, a user account and a database name. Initially I was having some issues where it wasn’t building the database with these credentials, but I think that’s because I wasn’t “building” the new database, I was just using it. I also expose the MariaDB (MySQL) port, 3306 to the other containers in the docker-compose.yml file.

Let’s take a look at the next part! PHP-FPM. Here’s the Dockerfile:

FROM php:7.4-fpm
RUN docker-php-ext-install pdo pdo_mysql
ADD --chown=www-data:www-data public /var/www/html

There’s a bit more to this, but not loads. We build our image from a named version of PHP, and install two extensions to PHP, pdo and pdo_mysql. Lastly, we copy the content of the “public” directory into the /var/www/html path, and make sure it “belongs” to the right user (www-data).

I’d previously tried to do a lot more complicated things with this Dockerfile, but it wasn’t working, so instead I slimmed it right down to just this, and the docker-compose.yml is a lot simpler too.

  phpfpm:
    build:
      context: .
      dockerfile: phpfpm/Dockerfile
    image: localhost:32000/phpfpm

See! Loads simpler! Now we need the complicated bit! :) This is the Dockerfile for nginx.

FROM nginx:1.17.7
COPY nginx/default.conf /etc/nginx/conf.d/default.conf

COPY public /var/www/html

Weirdly, even though I’ve added version numbers for MariaDB and PHP, I’ve not done the same for nginx, perhaps I should! Late edit 2020-01-16: I’ve put a version number on there now, previously where it said nginx:1.17.7 it actually said nginx:latest.

I’ve created the configuration block for nginx in a single “RUN” line. Late edit 2020-01-16: This Dockerfile now doesn’t have a giant echo 'stuff' > file block either, following Jerry’s advice, and I’m using COPY instead of ADD on his advice too. I’ll show that config file below. There’s a couple of high points for me here!

server {
  index index.php index.html;
  server_name _;
  error_log /proc/self/fd/2;
  access_log /proc/self/fd/1;
  root /var/www/html;
  location ~ \.php$ {
    try_files $uri =404;
    fastcgi_split_path_info ^(.+\.php)(/.+)$;
    fastcgi_pass phpfpm:9000;
    fastcgi_index index.php;
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_param PATH_INFO $fastcgi_path_info;
  }
}
  • server_name _; means “use this block for all unnamed requests”.
  • access_log /proc/self/fd/1; and error_log /proc/self/fd/2;These are links to the “stdout” and “stderr” file descriptors (or pointers to other parts of the filesystem), and basically means that when you do docker-compose logs, you’ll see the HTTP logs for the server! These two files are guaranteed to be there, while /dev/stderr isn’t!

Because nginx is “just” caching the web content, and I know the content doesn’t need to be written to from nginx, I knew I didn’t need to do the chown action, like I did with the PHP-FPM block.

Lastly, I need to configure the docker-compose.yml file for nginx:

  nginx:
    build:
      context: .
      dockerfile: Dockerfile/nginx
    image: localhost:32000/nginx
    ports:
      - 127.0.0.1:1980:80

I’ve gone for a slightly unusual ports configuration when I deployed this to my web server… you see, I already have the HTTP port (TCP/80) configured for use on my home server – for running the rest of my web services. During development, on my home machine, the ports line instead showed “1980:80” because I was running this on Instead, I’m running this application bound to “localhost” (127.0.0.1) on a different port number (1980 selected because it could, conceivably, be a birthday of someone on this system), and then in my local web server configuration, I’m proxying connections to this service, with HTTPS encryption as well. That’s all outside the scope of this article (as I probably should be using something like Traefik, anyway) but it shows you how you could bind to a separate port too.

Anyway, that was my Docker journey over Christmas, and I look forward to using it more, going forward!

Featured image is “Shipping Containers” by “asgw” on Flickr and is released under a CC-BY license.

“code crunching” by “Ruben Molina” on Flickr

Getting Started with Terraform on AWS

I recently wrote a blog post about Getting Started with Terraform on Azure. You might have read it (I know I did!).

Having got a VM stood up in Azure, I wanted to build a VM in AWS, after all, it’s more-or-less the same steps. Note, this is a work-in-progress, and shouldn’t be considered “Final” – this is just something to use as *your* starting block.

What do you need?

You need an AWS account for this. If you’ve not got one, signing up for one is easy, but bear in mind that while there are free resource on AWS (only for the first year!), it’s also quite easy to suddenly enable a load of features that cost you money.

Best practice suggests (or rather, INSISTS) you shouldn’t use your “root” account for AWS. It’s literally just there to let you define the rest of your admin accounts. Turn on MFA (Multi-Factor Authentication) on that account, give it an exceedingly complex password, write that on a sheet of paper, and lock it in a box. You should NEVER use it!

Create your admin account, log in to that account. Turn on MFA on *that* account too. Then, create an “Access Token” for your account. This is in IAM (Identity and Access Management). These are what we’ll use to let Terraform perform actions in AWS, without you needing to actually “log in”.

On my machine, I’ve put the credentials for this in /home/<MYUSER>/.aws/credentials and it looks like this:

[default]
aws_access_key_id = ABC123DEF456
aws_secret_access_key = AaBbCcDd1234EeFf56

This file should be chmod 600 and make sure it’s only your account that can access this file. With this token, Terraform can perform *ANY ACTION* as you, including anything that charges you money, or creating servers that can mine a “cryptocurrency” for someone malicious.

I’m using Windows Subsystem for Linux (WSL). I’m using the Ubuntu 18.04 distribution obtained from the Store. This post won’t explain how to get *that*. Also, you might want to run Terraform on Mac, in Windows or on Linux natively… so, yehr.

Next, we need to actually install Terraform. Excuse the long, unwrapped code block, but it gets what you need quickly (assuming the terraform webpage doesn’t change any time soon!)

mkdir -p ~/bin
cd ~/bin
sudo apt update && sudo apt install unzip
curl -sLO $(curl https://www.terraform.io/downloads.html | grep "linux_amd64.zip" | cut -d\" -f 2) && unzip terraform*.zip && rm terraform*.zip && chmod 755 terraform

Starting coding your infrastructure

Before you can build your first virtual machine on AWS, you need to stand up the supporting infrastructure. These are:

  • An SSH Keypair (no password logins here!)
  • A VPC (“Virtual Private Cloud”, roughly the same as a VNet on Azure, or somewhat like a L3 switch in the Physical Realm).
  • An Internet Gateway (if your VPC isn’t classed as “the default one”)
  • A Subnet.
  • A Security Group.

Once we’ve got these, we can build our Virtual Machine on EC2 (“Elastic Cloud Compute”), and associate a “Public IP” to it.

To quote my previous post:

One quirk with Terraform, versus other tools like Ansible, is that when you run one of the terraform commands (like terraform init, terraform plan or terraform apply), it reads the entire content of any file suffixed “tf” in that directory, so if you don’t want a file to be loaded, you need to either move it out of the directory, comment it out, or rename it so it doesn’t end .tf. By convention, you normally have three “standard” files in a terraform directory – main.tf, variables.tf and output.tf, but logically speaking, you could have everything in a single file, or each instruction in it’s own file.

Getting Started with Terraform on Azure – Building the file structure

For the sake of editing and annotating the files for this post, these code blocks are all separated, but on my machine, they’re all currently one big file called “main.tf“.

In that file, I start by telling it that I’m working with the Terraform AWS provider, and that it should target my nearest region.

If you want to risk financial ruin, you can put things like your access tokens in here, but I really wouldn’t chance this!

Next, we create our network infrastructure – VPC, Internet Gateway and Subnet. We also change the routing table.

I suspect, if I’d created the VPC as “The Default” VPC, then I wouldn’t have needed to amend the routing table, nor added an Internet Gateway. To help us make the routing table change, there’s a “data” block in this section of code. A data block is an instruction to Terraform to go and ask a resource for *something*, in this case, we need AWS to tell Terraform what the routing table is that it created for the VPC. Once we have that we can ask for the routing table change.

AWS doesn’t actually give “proper” names to any of it’s assets. To provide something with a “real” name, you need to tag that thing with the “Name” tag. These can be practically anything, but I’ve given semi-sensible names to everything. You might want to name everything “main” (like I nearly did)!

We’re getting close to being able to create the VM now. First of all, we’ll create the Security Groups. I want to separate out my “Allow Egress Traffic” rule from my “Inbound SSH” rule. This means that I can clearly see what hosts allow inbound SSH access. Like with my Azure post, I’m using a “data provider” to get my public IP address, but in a normal “live” network, you’d specify a collection of valid source address ranges.

Last steps before we create the Virtual Machine. We need to upload our SSH key, and we need to find the “AMI” (AWS Machine ID) of the image we’ll be using. To create the key, in this directory, along side the .tf files, I’ve put my SSH public key (called id_rsa.pub), and we load that key when we create the “my_key” resource. To find the AMI, we need to make another data call, this time asking the AMI index to find the VM with the name containing ubuntu-bionic-18.04 and some other stuff. AMIs are region specific, so the image I’m using in eu-west-2 will not be the same AMI in eu-west-1 or us-east-1 and so on. This filtering means that, as long as the image exists in that region, we can use “the right one”. So let’s take a look at this file.

So, now we have everything we need to create our VM. Let’s do that!

In here, we specify a “user_data” file to upload, in this case, the contents of a file – CloudDev.sh, but you can load anything you want in here. My CloudDev.sh is shown below, so you can see what I’m doing with this file :)

So, having created all this lot, you need to execute the terraform workload. Initially you do terraform init. This downloads all the provisioners and puts them into the same tree as these .tf files are stored in. It also resets the state of the terraform discovered or created datastore.

Next, you do terraform plan -out tfout. Technically, the tfout part can be any filename, but having something like tfout marks it as clearly part of Terraform. This creates the tfout file with the current state, and whatever needs to change in the Terraform state file on it’s next run. Typically, if you don’t use a tfout file within about 20 minutes, it’s probably worth removing it.

Finally, once you’ve run your plan stage, now you need to apply it. In this case you execute terraform apply tfout. This tfout is the same filename you specified in terraform plan. If you don’t include -out tfout on your plan (or even run a plan!) and tfout in your apply, then you can skip the terraform plan stage entirely.

Once you’re done with your environment, use terraform destroy to shut it all down… and enjoy :)

Featured image is “code crunching” by “Ruben Molina” on Flickr and is released under a CC-ND license.

"Key mess" by "Alper Çuğun" on Flickr

Making Ansible Keys more useful in complex and nested data structures

One of the things I miss about Jekyll when I’m working with Ansible is the ability to fragment my data across multiple files, but still have it as a structured *whole* at the end.

For example, given the following directory structure in Jekyll:

+ _data
|
+---+ members
|   +--- member1.yml
|   +--- member2.yml
|
+---+ groups
    +--- group1.yml
    +--- group2.yml

The content of member1.yml and member2.yml will be rendered into site.data.members.member1 and site.data.members.member2 and likewise, group1 and group2 are loaded into their respective variables.

This kind of structure isn’t possible in Ansible, because all the data files are compressed into one vars value that we can read. To work around this on a few different projects I’ve worked on, I’ve ended up doing the following:

- set_fact:
    my_members: |-
      [
        {%- for var in vars | dict2items -%}
          {%- if var.key | regex_search(my_regex) is not none -%}
            "{{ var.key | regex_replace(my_regex, '') }}": 
              {%- if var.value | string %}"{% endif -%}
              {{ var.value }}
              {%- if var.value | string %}"{% endif %},
          {%- endif -%}
        {%- endfor -%}
      ]
  vars:
    my_regex: '^member_'

So, what this does is to step over all the variables defined (for example, in host_vars\*, group_vars\*, from the gathered facts and from the role you’re in – following Ansible’s loading precedence), and then checks to see whether the key of that variable name (e.g. “member_i_am_a_member” or “member_1”) matches the regular expression (click here for more examples). If it does, the key (minus the regular expression matching piece [using regex_replace]) is added to a dictionary, and the value attached. If the value is actually a string, then it wraps it in quotes.

So, while this doesn’t give me my expressive data structure that Jekyll does (no site.data.members.member1.somevalue for me), I do at least get to have my_members.member1.somevalue if I put the right headers in! :)

I’ll leave extending this model for doing other sorts of building variables out (for example, something like if var.value['variable_place'] | default('') == 'my_members.member' + current_position) to the reader to work out how they could use something like this in their workflows!

Featured image is “Key mess” by “Alper ÇuÄźun” on Flickr and is released under a CC-BY license.

"Seca" by "Olearys" on Flickr

Getting Started with Terraform on Azure

I’m strongly in the “Ansible is my tool, what needs fixing” camp, when it comes to Infrastructure as Code (IaC) but, I know there are other tools out there which are equally as good. I’ve been strongly advised to take a look at Terraform from HashiCorp. I’m most familiar at the moment with Azure, so this is going to be based around resources available on Azure.


Late edit: I want to credit my colleague, Pete, for his help getting started with this. While many of the code samples have been changed from what he provided me with, if it hadn’t been for these code samples in the first place, I’d never have got started!

Late edit 2: This post was initially based on Terraform 0.11, and I was prompted by another colleague, Jon, that the available documentation still follows the 0.11 layout. 0.12 was released in May, and changes how variables are reused in the code. This post now *should* follow the 0.12 conventions, but if you spot something where it doesn’t, check out this post from the Terraform team.


As with most things, there’s a learning curve, and I struggled to find a “simple” getting started guide for Terraform. I’m sure this is a failing on my part, but I thought it wouldn’t hurt to put something out there for others to pick up and see if it helps someone else (and, if that “someone else” is you, please let me know in the comments!)

Pre-requisites

You need an Azure account for this. This part is very far outside my spectrum of influence, but I’m assuming you’ve got one. If not, look at something like Digital Ocean, AWS or VMWare :) For my “controller”, I’m using Windows Subsystem for Linux (WSL), and wrote the following notes about getting my pre-requisites.

Building the file structure

One quirk with Terraform, versus other tools like Ansible, is that when you run one of the terraform commands (like terraform init, terraform plan or terraform apply), it reads the entire content of any file suffixed “tf” in that directory, so if you don’t want a file to be loaded, you need to either move it out of the directory, comment it out, or rename it so it doesn’t end .tf. By convention, you normally have three “standard” files in a terraform directory – main.tf, variables.tf and output.tf, but logically speaking, you could have everything in a single file, or each instruction in it’s own file. Because this is a relatively simple script, I’ll use this standard layout.

The actions I’ll be performing are the “standard” steps you’d perform in Azure to build a single Infrastructure as a Service (IAAS) server service:

  • Create your Resource Group (RG)
  • Create a Virtual Network (VNET)
  • Create a Subnet
  • Create a Security Group (SG) and rules
  • Create a Public IP address (PubIP) with a DNS name associated to that IP.
  • Create a Network Interface (NIC)
  • Create a Virtual Machine (VM), supplying a username and password, the size of disks and VM instance, and any post-provisioning instructions (yep, I’m using Ansible for that :) ).

I’m using Visual Studio Code, but almost any IDE will have integrations for Terraform. The main thing I’m using it for is auto-completion of resource, data and output types, also the fact that control+clicking resource types opens your browser to the documentation page on terraform.io.

So, creating my main.tf, I start by telling it that I’m working with the Terraform AzureRM Provider (the bit of code that can talk Azure API).

This simple statement is enough to get Terraform to load the AzureRM, but it still doesn’t tell Terraform how to get access to the Azure account. Use az login from a WSL shell session to authenticate.

Next, we create our basic resource, vnet and subnet resources.

But wait, I hear you cry, what are those var.something bits in there? I mentioned before that in the “standard” set of files is a “variables.tf” file. In here, you specify values for later consumption. I have recorded variables for the resource group name and location, as well as the VNet name and subnet name. Let’s add those into variables.tf.

When you’ve specified a resource, you can capture any of the results from that resource to use later – either in the main.tf or in the output.tf files. By creating the resource group (called “rg” here, but you can call it anything from “demo” to “myfirstresourcegroup”), we can consume the name or location with azurerm_resource_group.rg.name and azurerm_resource_group.rg.location, and so on. In the above code, we use the VNet name in the subnet, and so on.

After the subnet is created, we can start adding the VM specific parts – a security group (with rules), a public IP (with DNS name) and a network interface. I’ll create the VM itself later. So, let’s do this.

BUT WAIT, what’s that ${trimspace(data.http.icanhazip.body)}/32 bit there?? Any resources we want to load from the terraform state, but that we’ve not directly defined ourselves needs to come from somewhere. These items are classed as “data” – that is, we want to know what their values are, but we aren’t *changing* the service to get it. You can also use this to import other resource items, perhaps a virtual network that is created by another team, or perhaps your account doesn’t have the rights to create a resource group. I’ll include a commented out data block in the overall main.tf file for review that specifies a VNet if you want to see how that works.

In this case, I want to put the public IP address I’m coming from into the NSG Rule, so I can get access to the VM, without opening it up to *everyone*. I’m not that sure that my IP address won’t change between one run and the next, so I’m using the icanhazip.com service to determine my IP address. But I’ve not defined how to get that resource yet. Let’s add it to the main.tf for now.

So, we’re now ready to create our virtual machine. It’s quite a long block, but I’ll pull certain elements apart once I’ve pasted this block in.

So, this is broken into four main pieces.

  • Virtual Machine Details. This part is relatively sensible. Name RG, location, NIC, Size and what happens to the disks when the machine powers on. OK.
name                             = "iaas-vm"
location                         = azurerm_resource_group.rg.location
resource_group_name              = azurerm_resource_group.rg.name
network_interface_ids            = [azurerm_network_interface.iaasnic.id]
vm_size                          = "Standard_DS1_v2"
delete_os_disk_on_termination    = true
delete_data_disks_on_termination = true
  • Disk details.
storage_image_reference {
  publisher = "Canonical"
  offer     = "UbuntuServer"
  sku       = "18.04-LTS"
  version   = "latest"
}
storage_os_disk {
  name              = "iaas-os-disk"
  caching           = "ReadWrite"
  create_option     = "FromImage"
  managed_disk_type = "Standard_LRS"
}
  • OS basics: VM Hostname, username of the first user, and it’s password. Note, if you want to use an SSH key, this must be stored for Terraform to use without passphrase. If you mention an SSH key here, as well as a password, this can cause all sorts of connection issues, so pick one or the other.
os_profile {
  computer_name  = "iaas"
  admin_username = var.ssh_user
  admin_password = var.ssh_password
}
os_profile_linux_config {
  disable_password_authentication = false
}
  • And lastly, provisioning. I want to use Ansible for my provisioning. In this example, I have a basic playbook stored locally on my Terraform host, which I transfer to the VM, install Ansible via pip, and then execute ansible-playbook against the file I uploaded. This could just as easily be a git repo to clone or a shell script to copy in, but this is a “simple” example.
provisioner "remote-exec" {
  inline = ["mkdir /tmp/ansible"]

  connection {
    type     = "ssh"
    host     = azurerm_public_ip.iaaspubip.fqdn
    user     = var.ssh_user
    password = var.ssh_password
  }
}

provisioner "file" {
  source = "ansible/"
  destination = "/tmp/ansible"

  connection {
    type     = "ssh"
    host     = azurerm_public_ip.iaaspubip.fqdn
    user     = var.ssh_user
    password = var.ssh_password
  }
}

provisioner "remote-exec" {
  inline = [
    "sudo apt update > /tmp/apt_update || cat /tmp/apt_update",
    "sudo apt install -y python3-pip > /tmp/apt_install_python3_pip || cat /tmp/apt_install_python3_pip",
    "sudo -H pip3 install ansible > /tmp/pip_install_ansible || cat /tmp/pip_install_ansible",
    "ansible-playbook /tmp/ansible/main.yml"
  ]

  connection {
    type     = "ssh"
    host     = azurerm_public_ip.iaaspubip.fqdn
    user     = var.ssh_user
    password = var.ssh_password
  }
}

This part of code is done in three parts – create upload path, copy the files in, and then execute it. If you don’t create the upload path, it’ll upload just the first file it comes to into the path specified.

Each remote-exec and file provisioner statement must include the hostname, username and either the password, or SSH private key. In this example, I provide just the password.

So, having created all this lot, you need to execute the terraform workload. Initially you do terraform init. This downloads all the provisioners and puts them into the same tree as these .tf files are stored in. It also resets the state of the terraform discovered or created datastore.

Next, you do terraform plan -out tfout. Technically, the tfout part can be any filename, but having something like tfout marks it as clearly part of Terraform. This creates the tfout file with the current state, and whatever needs to change in the Terraform state file on it’s next run. Typically, if you don’t use a tfout file within about 20 minutes, it’s probably worth removing it.

Finally, once you’ve run your plan stage, now you need to apply it. In this case you execute terraform apply tfout. This tfout is the same filename you specified in terraform plan. If you don’t include -out tfout on your plan (or even run a plan!) and tfout in your apply, then you can skip the terraform plan stage entirely.

When I ran this, with a handful of changes to the variable files, I got this result:

Once you’re done with your environment, use terraform destroy to shut it all down… and enjoy :)

The full source is available in the associated Gist. Pull requests and constructive criticism are very welcome!

Featured image is “Seca” by “Olearys” on Flickr and is released under a CC-BY license.

"Tower" by " Yijun Chen" on Flickr

Building a Gitlab and Ansible Tower (AWX) Demo in Vagrant with Ansible

TL;DR – I created a repository on GitHub‌ containing a Vagrantfile and an Ansible Playbook to build a VM running Docker. That VM hosts AWX (Ansible Tower’s upstream open-source project) and Gitlab.

A couple of years ago, a colleague created (and I enhanced) a Vagrant and Ansible playbook called “Project X” which would run an AWX instance in a Virtual Machine. It’s a bit heavy, and did a lot of things to do with persistence that I really didn’t need, so I parked my changes and kept an eye on his playbook…

Fast-forward to a week-or-so ago. I needed to explain what a Git/Ansible Workflow would look like, and so I went back to look at ProjectX. Oh my, it looks very complex and consumed a lot of roles that, historically, I’ve not been that impressed with… I just needed the basics to run AWX. Oh, and I also needed a Gitlab environment.

I knew that Gitlab had a docker-based install, and so does AWX, so I trundled off to find some install guides. These are listed in the playbook I eventually created (hence not listing them here). Not all the choices I made were inspired by those guides – I wanted to make quite a bit of this stuff “build itself”… this meant I wanted users, groups and projects to be created in Gitlab, and users, projects, organisations, inventories and credentials to be created in AWX.

I knew that you can create Docker Containers in Ansible, so after I’d got my pre-requisites built (full upgrade, docker installed, pip libraries installed), I add the gitlab-ce:latest docker image, and expose some ports. Even now, I’m not getting the SSH port mapped that I was expecting, but … it’s no disaster.

I did notice that the Gitlab service takes ages to start once the container is marked as running, so I did some more digging, and found that the uri module can be used to poll a URL. It wasn’t documented well how you can make it keep polling until you get the response you want, so … I added a PR on the Ansible project’s github repo for that one (and I also wrote a blog post about that earlier too).

Once I had a working Gitlab service, I needed to customize it. There are a bunch of Gitlab modules in Ansible but since a few releases back of Gitlab, these don’t work any more, so I had to find a different way. That different way was to run an internal command called “gitlab-rails”. It’s not perfect (so it doesn’t create repos in your projects) but it’s pretty good at giving you just enough to build your demo environment. So that’s getting Gitlab up…

Now I need to build AWX. There’s lots of build guides for this, but actually I had most luck using the README in their repository (I know, who’d have thought it!??!) There are some “Secrets” that should be changed in production that I’m changing in my script, but on the whole, it’s pretty much a vanilla install.

Unlike the Gitlab modules, the Ansible Tower modules all work, so I use these to create the users, credentials and so-on. Like the gitlab-rails commands, however, the documentation for using the tower modules is pretty ropey, and I still don’t have things like “getting your users to have access to your organisation” working from the get-go, but for the bulk of the administration, it does “just work”.

Like all my playbooks, I use group_vars to define the stuff I don’t want to keep repeating. In this demo, I’ve set all the passwords to “Passw0rd”, and I’ve created 3 users in both AWX and Gitlab – csa, ops and release – indicative of the sorts of people this demo I ran was aimed at – Architects, Operations and Release Managers.

Maybe, one day, I’ll even be able to release the presentation that went with the demo ;)

On a more productive note, if you’re doing things with the tower_ modules and want to tell me what I need to fix up, or if you’re doing awesome things with the gitlab-rails tool, please visit the repo with this automation code in, and take a look at some of my “todo” items! Thanks!!

Featured image is “Tower” by “Yijun Chen” on Flickr and is released under a CC-BY-SA license.

"funfair action" by "Jon Bunting" on Flickr

Improving the speed of Azure deployments in Ansible with Async

Recently I was building a few environments in Azure using Ansible, and found this stanza which helped me to speed things up.

  - name: "Schedule UDR Creation"
    azure_rm_routetable:
      resource_group: "{{ resource_group }}"
      name: "{{ item.key }}_udr"
    loop: "{{ routetables | dict2items }}"
    loop_control:
        label: "{{ item.key }}_udr"
    async: 1000
    poll: 0
    changed_when: False
    register: sleeper

  - name: "Check UDRs Created"
    async_status:
      jid: "{{ item.ansible_job_id }}"
    register: sleeper_status
    until: sleeper_status.finished
    retries: 500
    delay: 4
    loop: "{{ sleeper.results|flatten(levels=1) }}"
    when: item.ansible_job_id is defined
    loop_control:
      label: "{{ item._ansible_item_label }}"

What we do here is to start an action with an “async” time (to give the Schedule an opportunity to register itself) and a “poll” time of 0 (to prevent the Schedule from waiting to be finished). We then tell it that it’s “never changed” (changed_when: False) because otherwise it always shows as changed, and to register the scheduled item itself as a “sleeper”.

After all the async jobs get queued, we then check the status of all the scheduled items with the async_status module, passing it the registered job ID. This lets me spin up a lot more items in parallel, and then “just” confirm afterwards that they’ve been run properly.

It’s not perfect, and it can make for rather messy code. But, it does work, and it’s well worth giving it the once over, particularly if you’ve got some slow-to-run tasks in your playbook!

Featured image is “funfair action” by “Jon Bunting” on Flickr and is released under a CC-BY license.

A web browser with the example.com web page loaded

Working around the fact that Ansible’s URI module doesn’t honour the no_proxy variable…

An Ansible project I’ve been working on has tripped me up this week. I’m working with some HTTP APIs and I need to check early whether I can reach the host. To do this, I used a simple Ansible Core Module which lets you call an HTTP URI.

- uri:
    follow_redirects: none
    validate_certs: False
    timeout: 5
    url: "http{% if ansible_https | default(True) %}s{% endif %}://{{ ansible_host }}/login"
  register: uri_data
  failed_when: False
  changed_when: False

This all seems pretty simple. One of the environments I’m working in uses the following values in their environment:

http_proxy="http://192.0.2.1:8080"
https_proxy="http://192.0.2.1:8080"
no_proxy="10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 192.0.2.0/24, 198.51.100.0/24, 203.0.113.0/24"

And this breaks the uri module, because it tries to punt everything through the proxy if the “no_proxy” contains CIDR values (like 192.0.2.0/24) (there’s a bug raised for this)… So here’s my fix!

- set_fact:
    no_proxy_match: |
      {
        {% for no_proxy in (lookup('env', 'no_proxy') | replace(',', '') ).split() %}
          {% if no_proxy| ipaddr | type_debug != 'NoneType' %}
            {% if ansible_host | ipaddr(no_proxy) | type_debug != 'NoneType' %}
              "match": "True"
            {% endif %}
          {% endif %}
        {% endfor %}
      }

- uri:
    follow_redirects: none
    validate_certs: False
    timeout: 5
    url: "http{% if ansible_https | default(True) %}s{% endif %}://{{ ansible_host }}/login"
  register: uri_data
  failed_when: False
  changed_when: False
  environment: "{ {% if no_proxy_match.match | default(False) %}'no_proxy': '{{ ansible_host }}'{% endif %} }"

So, let’s break this down.

The key part to this script is that we need to override the no_proxy environment variable with the IP address that we’re trying to address (so that we’re not putting 16M addresses for 10.0.0.0/8 into no_proxy, for example). To do that, we use the exact same URI block, except for the environment line at the end.

In turn, the set_fact block steps through the no_proxy values, looking for IP Addresses to check ({% if no_proxy | ipaddr ... %}‌ says “if the no_proxy value is an IP Address, return it, but if it isn’t, return a ‘None’ value”) and if it’s an IP address or subnet mask, it checks to see whether the IP address of the host you’re trying to reach falls inside that IP Address or Subnet Mask ({% if ansible_host | ipaddr(no_proxy) ... %} says “if the ansible_host address falls inside the no_proxy range, then return it, otherwise return a ‘None’ value”). Both of these checks say “If this previous check returns anything other than a ‘None’ value, do the next thing”, and on the last check, the “next” thing is to set the flag ‘match’ to ‘true’. When we get to the environment variable, we say “if match is not true, it’s false, so don’t put a value in there”.

So that’s that! Yes, I could merge the set_fact block into the environment variable, but I do end up using that a fair amount. And really, if it was merged, that would be even MORE complicated to pick through.

I have raised a pull request on the Ansible project to update the documentation, so we’ll see whether we end up with people over here looking for ways around this issue. If so, let me know in the comments below! Thanks!!

"Wifi Here on a Blackboard" by "Jem Stone" on Flickr

Free Wi-Fi does not need to be password-less!

Recently a friend of mine forwarded an email to me about a Wi-fi service he wanted to use from a firm, but he raised some technical questions with them which they seemed to completely misunderstand!

So, let’s talk about the misconceptions of Wi-fi passwords.

Many people assume that when you log into a system, it means that system is secure. For example, logging into a website makes sure that your data is secure and protected, right? Not necessarily – the password you entered could be on a web page that is not secured by TLS, or perhaps the web server doesn’t properly transfer it’s contents to a database. Maybe the website was badly written, and means it’s vulnerable to one of a handful of common attacks (with fun names like “Cross Site Scripting” or “SQL Injection Attacks”)…

People also assume the same thing about Wi-fi. You reached a log in page, so it must be secure, right? It depends. If you didn’t put in a password to access the Wi-fi in the first place (like in the image of the Windows 10 screen, or on my KDE Desktop) then you’re probably using Unsecured Wi-fi.

An example of a secured Wi-fi sign-in box on Windows 10
The same Wi-fi sign in box on KDE Neon

People like to compare network traffic to “sending things through the post”, notablycomparing E-Mail to “sending a postcard”, versus PGP encrypted E-Mail being compared to “sending a sealed letter”. Unencrypted Wi-fi is like using CB. Anyone who can hear your signal can understand what you are saying… but if you visit a website which uses HTTPS, then it’s like listening to someone saying random numbers over the radio.

And, if you’re using Unencrypted Wi-fi, it’s also possible for an attacker to see what website you visited, because the request for the address to reach on the Internet (e.g. “Google.com” = 172.217.23.14) is sent in the clear. Also because of the way that DNS works (that name to address matching thing) means that if someone knows you’re visiting a “site of interest” (like, perhaps a bank website), they can reply *before* the real DNS server, and tell you that the server on their machine is actually your bank’s website.

So, many of these things can be protected against by using a simple method, that many people who provide Wi-fi don’t do.

Turn on WPA2 (the authentication bit). Even if *everyone* uses the same password (which they’d have to for WPA2), the fact you’re logging into the Access Point means it creates a unique shared secret for your session.

“But hang on”, I hear the guy at the back cry, “you used the same password – how does that work?”

OK, so this is where the fun stuff starts. The password is just part of how you negotiate to get on to the network. There’s a complex beast of a method that explains how get a shared unique secret when you’re passing stuff around “in the clear”, and so as a result, when you first connect to that Wi-fi access point, and you hand over your password, it “Authorises” you on to the network, but then hands you over to the encryption part, where you generate a key and then use that to talk to each other. The encryption is the bit like “HTTPS”, where you make it so that people can’t see what you’re looking at.

“I got told that if everyone used the same password” said a hipster in the front row, “I wouldn’t be able to tell them apart.” Aha, not true. You can have a separate passphrase to access the Wi-fi from the Login page, after all, you’ve got to make sure that people aren’t breaking the rules (which they *TOTALLY* read, before clicking “I agree, just get me on the damn Wi-fi already”) by using your network.

“OK”, says the lady over on the right, “but when I connected to the Wi-fi, they asked me to log in using Facebook – that’s secure, right?”

Um, no. Well, maybe. See, if they gave you a WPA2 password to log into the Wi-fi, and then the first thing you got to was that login screen, then yep, it’s all good! {*} You can browse with (relative) impunity. But if they didn’t… well, not only are they asking you to shout your secrets on the radio, but if you’re really unlucky, the page asking you to log into Facebook might *also* not actually be Facebook, but another website that just looks like Facebook… after all, I’m sure that page you went to complained that it wasn’t Google or Facebook when you tried to open it…

{*} Except for the fact they’re asking you to tell them not only who you are, but who you’re also friends with, where you went to school, what your hobbies are, what groups you’re in, your date of birth and so on.

But anyway. I understand why those login screens are there. They’re asserting that not only do you understand that you mustn’t use their network for bad things, but that if the police come and ask them who used their network to do something naughty, they can say “He said his name was ‘Bob Smith’ and his email address was ‘bob@example.com’, Officer”…

It also means that the “free” service they provide to you, usually at some great expense (*eye roll*) can get them some return on investment (like, they just got your totally-real-and-not-at-all-made-up-email-address… honest, and they also know what websites you visited while you were there, which they can sell on).

So… What to do the next time you “need” Wi-fi, and there’s a free service there? Always use a VPN when you’re not using a network you trust. If the Wi-fi isn’t using WPA2 encryption (even something as simple as “Buy a drink first” is a great passphrase to use!) point them to this page, and tell them it’s virtually pain free (as long as the passphrase is easy to remember, easy to type and doesn’t have too many weird symbols in) and makes their service more safe and secure for their customers…

Featured image is “Wifi Here on a Blackboard” by “Jem Stone” on Flickr and is released under a CC-BY license.

"www.GetIPv6.info decal" from Phil Wolff on Flickr

Hurricane Electric IPv6 Gateway on Raspbian for Raspberry Pi

NOTE: This article was replaced on 2019-03-12 by a github repository where I now use Vagrant instead of a Raspberry Pi, because I was having some power issues with my Raspberry Pi. Also, using this method means I can easily use an Ansible Playbook. The following config will still work(!) however I prefer this Vagrant/Ansible workflow for this, so won’t update this blog post any further.

Following an off-hand remark from a colleague at work, I decided I wanted to set up a Raspberry Pi as a Hurricane Electric IPv6 6in4 tunnel router. Most of the advice around (in particular, this post about setting up IPv6 on the Raspberry Pi Forums) related to earlier version of Raspbian, so I thought I’d bring it up-to-date.

I installed the latest available version of Raspbian Stretch Lite (2018-11-13) and transferred it to a MicroSD card. I added the file ssh to the boot volume and unmounted it. I then fitted it into my Raspberry Pi, and booted it. While it was booting, I set a static IPv4 address on my router (192.168.1.252) for the Raspberry Pi, so I knew what IP address it would be on my network.

I logged into my Hurricane Electric (HE) account at tunnelbroker.net and created a new tunnel, specifying my public IP address, and selecting my closest HE endpoint. When the new tunnel was created, I went to the “Example Configurations” tab, and selected “Debian/Ubuntu” from the list of available OS options. I copied this configuration into my clipboard.

I SSH’d into the Pi, and gave it a basic config (changed the password, expanded the disk, turned off “predictable network names”, etc) and then rebooted it.

After this was done, I created a file in /etc/network/interfaces.d/he-ipv6 and pasted in the config from the HE website. I had to change the “local” line from the public IP I’d provided HE with, to the real IP address of this box. Note that any public IPs (that is, not 192.168.x.x addresses) in the config files and settings I’ve noted refer to documentation addressing (TEST-NET-2 and the IPv6 documentation address ranges)

auto he-ipv6
iface he-ipv6 inet6 v4tunnel
        address 2001:db8:123c:abd::2
        netmask 64
        endpoint 198.51.100.100
        local 192.168.1.252
        ttl 255
        gateway 2001:db8:123c:abd::1

Next, I created a file in /etc/network/interfaces.d/eth0 and put the following configuration in, using the first IPv6 address in the “routed /64” range listed on the HE site:

auto eth0
iface eth0 inet static
    address 192.168.1.252
    gateway 192.168.1.254
    netmask 24
    dns-nameserver 8.8.8.8
    dns-nameserver 8.8.4.4

iface eth0 inet6 static
    address 2001:db8:123d:abc::1
    netmask 64

Next, I disabled the DHCPd service by issuing systemctl stop dhcpcd.service Late edit (2019-01-22): Note, a colleague mentioned that this should have actually been systemctl stop dhcpcd.service && systemctl disable dhcpcd.service – good spot! Thanks!! This ensures that if, for some crazy reason, the router stops offering the right DHCP address to me, I can still access this box on this IP. Huzzah!

I accessed another host which had IPv6 access, and performed both a ping and an SSH attempt. Both worked. Fab. However, this now needs to be blocked, as we shouldn’t permit anything to be visible downstream from this gateway.

I’m using the Uncomplicated Firewall (ufw) which is a simple wrapper around IPTables. Let’s create our policy.

# First install the software
sudo apt update && sudo apt install ufw -y

# Permits inbound IPv4 SSH to this host - which should be internal only. 
# These rules allow tailored access in to our managed services
ufw allow in on eth0 app DNS
ufw allow in on eth0 app OpenSSH

# These rules accept all broadcast and multicast traffic
ufw allow in on eth0 to 224.0.0.0/4 # Multicast addresses
ufw allow in on eth0 to 255.255.255.255 # Global broadcast
ufw allow in on eth0 to 192.168.1.255 # Local broadcast

# Alternatively, accept everything coming in on eth0
# If you do this one, you don't need the lines above
ufw allow in on eth0

# Setup the default rules - deny inbound and routed, permit outbound
ufw default deny incoming 
ufw default deny routed
ufw default allow outgoing

# Prevent inbound IPv6 to the network
# Also, log any drops so we can spot them if we have an issue
ufw route deny log from ::/0 to 2001:db8:123d:abc::/64

# Permit outbound IPv6 from the network
ufw route allow from 2001:db8:123d:abc::/64

# Start the firewall!
ufw enable

# Check the policy
ufw status verbose
ufw status numbered

Most of the documentation I found suggested running radvd for IPv6 address allocation. This basically just allocates on a random basis, and, as far as I can make out, each renewal gives the host a new IPv6 address. To make that work, I performed apt-get update && apt-get install radvd -y and then created this file as /etc/radvd.conf. If all you want is a floating IP address with no static assignment – this will do it…

interface eth0
{
    AdvSendAdvert on;
    MinRtrAdvInterval 3;
    MaxRtrAdvInterval 10;
    prefix 2001:db8:123d:abc::/64
    {
        AdvOnLink on;
        AdvAutonomous on;
    };
   route ::/0 {
   };
};

However, this doesn’t give me the ability to statically assign IPv6 addresses to hosts. I found that a different IPv6 allocation method will do static addressing, based on your MAC address called SLAAC (note there are some privacy issues with this, but I’m OK with them for now…) In this mode assuming the prefix as before – 2001:db8:123d:abc:: and a MAC address of de:ad:be:ef:01:23, your IPv6 address will be something like: 2001:db8:123d:abc:dead:beff:feef:0123and this will be repeatably so – because you’re unlikely to change your MAC address (hopefully!!).

This SLAAC allocation mode is available in DNSMasq, which I’ve consumed before (in a Pi-Hole). To use this, I installed DNSMasq with apt-get update && apt-get install dnsmasq -y and then configured it as follows:

interface=eth0
listen-address=127.0.0.1
# DHCPv6 - Hurricane Electric Resolver and Google's
dhcp-option=option6:dns-server,[2001:470:20::2],[2001:4860:4860::8888]
# IPv6 DHCP scope
dhcp-range=2001:db8:123d:abc::, slaac

I decided to move from using my router as a DHCP server, to using this same host, so expanded that config as follows, based on several posts, but mostly centred around the MAN page (I’m happy to have this DNSMasq config improved if you’ve got any suggestions ;) )

# Stuff for DNS resolution
domain-needed
bogus-priv
no-resolv
filterwin2k
expand-hosts
domain=localnet
local=/localnet/
log-queries

# Global options
interface=eth0
listen-address=127.0.0.1

# Set these hosts as the DNS server for your network
# Hurricane Electric and Google
dhcp-option=option6:dns-server,[2001:470:20::2],2001:4860:4860::8888]

# My DNS servers are:
server=1.1.1.1                # Cloudflare's DNS server
server=8.8.8.8                # Google's DNS server

# IPv4 DHCP scope
dhcp-range=192.168.1.10,192.168.1.210,12h
# IPv6 DHCP scope
dhcp-range=2001:db8:123d:abc::, slaac

# Record the DHCP leases here
dhcp-leasefile=/run/dnsmasq/dhcp-lease

# DHCPv4 Router
dhcp-option=3,192.168.1.254

So, that’s what I’m doing now! Hope it helps you!

Late edit (2019-01-22): In issue 129 of the “Awesome Self Hosted Newsletter“, I found a post called “My New Years Resolution: Learn IPv6“… which uses a pfSense box and a Hurricane Electric tunnel too. Fab!

Header image is “www.GetIPv6.info decal” by “Phil Wolff” on Flickr and is released under a CC-BY-SA license. Used with thanks!