Work Stuff – Page 6 – A nice guy's view on life

A web browser with the example.com web page loaded

Working around the fact that Ansible’s URI module doesn’t honour the no_proxy variable…

2019-05-142019-05-14 JonTheNiceGuy Leave a comment

An Ansible project I’ve been working on has tripped me up this week. I’m working with some HTTP APIs and I need to check early whether I can reach the host. To do this, I used a simple Ansible Core Module which lets you call an HTTP URI.

- uri:
    follow_redirects: none
    validate_certs: False
    timeout: 5
    url: "http{% if ansible_https | default(True) %}s{% endif %}://{{ ansible_host }}/login"
  register: uri_data
  failed_when: False
  changed_when: False

This all seems pretty simple. One of the environments I’m working in uses the following values in their environment:

http_proxy="http://192.0.2.1:8080"
https_proxy="http://192.0.2.1:8080"
no_proxy="10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 192.0.2.0/24, 198.51.100.0/24, 203.0.113.0/24"

And this breaks the uri module, because it tries to punt everything through the proxy if the “no_proxy” contains CIDR values (like 192.0.2.0/24) (there’s a bug raised for this)… So here’s my fix!

- set_fact:
    no_proxy_match: |
      {
        {% for no_proxy in (lookup('env', 'no_proxy') | replace(',', '') ).split() %}
          {% if no_proxy| ipaddr | type_debug != 'NoneType' %}
            {% if ansible_host | ipaddr(no_proxy) | type_debug != 'NoneType' %}
              "match": "True"
            {% endif %}
          {% endif %}
        {% endfor %}
      }

- uri:
    follow_redirects: none
    validate_certs: False
    timeout: 5
    url: "http{% if ansible_https | default(True) %}s{% endif %}://{{ ansible_host }}/login"
  register: uri_data
  failed_when: False
  changed_when: False
  environment: "{ {% if no_proxy_match.match | default(False) %}'no_proxy': '{{ ansible_host }}'{% endif %} }"

So, let’s break this down.

The key part to this script is that we need to override the no_proxy environment variable with the IP address that we’re trying to address (so that we’re not putting 16M addresses for 10.0.0.0/8 into no_proxy, for example). To do that, we use the exact same URI block, except for the environment line at the end.

In turn, the set_fact block steps through the no_proxy values, looking for IP Addresses to check ({% if no_proxy | ipaddr ... %}‌ says “if the no_proxy value is an IP Address, return it, but if it isn’t, return a ‘None’ value”) and if it’s an IP address or subnet mask, it checks to see whether the IP address of the host you’re trying to reach falls inside that IP Address or Subnet Mask ({% if ansible_host | ipaddr(no_proxy) ... %} says “if the ansible_host address falls inside the no_proxy range, then return it, otherwise return a ‘None’ value”). Both of these checks say “If this previous check returns anything other than a ‘None’ value, do the next thing”, and on the last check, the “next” thing is to set the flag ‘match’ to ‘true’. When we get to the environment variable, we say “if match is not true, it’s false, so don’t put a value in there”.

So that’s that! Yes, I could merge the set_fact block into the environment variable, but I do end up using that a fair amount. And really, if it was merged, that would be even MORE complicated to pick through.

I have raised a pull request on the Ansible project to update the documentation, so we’ll see whether we end up with people over here looking for ways around this issue. If so, let me know in the comments below! Thanks!!

"Juniper NetScreen 25 Firewall front" by "jackthegag" on Flickr

Standard Firewall Rules

2019-02-012019-02-01 JonTheNiceGuy Leave a comment

One of the things I like to do is to explain how I set things up, but a firewall is one of those things that’s a bit complicated, because it depends on your situation, and what you’re trying to do in your environment. That said, there’s a template that you can probably get away with deploying, and see if it works for your content, and then you’ll see where to add the extra stuff from there. Firewall policies typically work from the top down.

This document will assume you have a simple boundary firewall. This simple firewall has two interfaces, the first being an “Outside” interface, connected to your ISP, with an IPv4 address of 192.0.2.2/24 and a default gateway of 192.0.2.1, it also has a IPv6 address of 2001:db8:123c:abd::2/64 and a default gateway address of 2001:db8:123c:abd::1. The second “Inside” interface, where your protected network is attached, has an IPv4 address of 198.51.100.1/24 and an IPv6 address of 2001:db8:123d:abc::1/64. On this inside interface, the firewall is the default gateway for the inside network.

I’ll be using simple text rules to describe firewall policies, following this format:

Source Interface: <outside | inside>
Source IP Address: <x.x.x.x/x | "any">
NAT Source IP Address: <x.x.x.x/x | no>
Destination Interface: <outside | inside>
Destination IP Address: <x.x.x.x/x | "any">
NAT Destination IP Address: <x.x.x.x/x | no>
Destination Port: <tcp | udp | icmp | ip>/<x>
Action: <allow | deny | reject>
Log: <yes | no>
Notes: <some commentary if required>

In this model, if you want to describe HTTP access to a web server, you might write the following policy:

Source Interface: outside
Source IP Address: 0.0.0.0/0 (Any IP)
NAT Source IP Address: no
Destination Interface: inside
Destination IP Address: 192.0.2.2 (External IP)
NAT Destination IP Address: 198.51.100.2 (Internal IP)
Destination Port: tcp/80
Action: allow
Log: yes

So, without further waffling, let’s build a policy. By default all traffic will be logged. In high-traffic environments, you may wish to prevent certain traffic from being logged, but on the whole, I think you shouldn’t really lose firewall logs unless you need to!

Allowing established, related and same-host traffic

This rule is only really needed on iptables based firewalls, as all the commercial vendors (as far as I can tell, at least) already cover this as “standard”. If you’re using UFW (a wrapper to iptables), this rule is covered off already, but essentially it goes a bit like this:

Source Interface: lo (short for "local", where the traffic never leaves the device)
Source IP Address: any
NAT Source IP Address: no
Destination Interface: lo
Destination IP Address: any
NAT Destination IP Address: no
Destination Port: any
Action: allow
Log: no
Notes: This above rule permits traffic between localhost addresses (127.0.0.0/8) or between public addresses on the same host, for example, between two processes without being blocked.

flags: Established OR Related
Action: allow
Log: no
Notes: This above rule is somewhat special, as it looks for specific flags on the packet, that says "If we've already got a session open, let it carry on talking".

Dropping Noisy Traffic

In a network, some proportion of the traffic is going to be “noisy”. Whether it’s broadcast traffic from your application that uses mDNS, or the Windows File Share trying to find like-minded hosts to exchange data… these can fill up your logs, so lets drop the broadcast and multicast IPv4 traffic, and not log them.

Source Interface: any
Source IP Address: 0.0.0.0/0
NAT Source IP Address: no
Destination Interface: any
Destination IP Address: 255.255.255.255 (global broadcast), 192.0.2.255 ("outside" broadcast), 198.51.100.255 ("inside" broadcast) and 224.0.0.0/4 (multicast)
NAT Destination IP Address: no
Destination Port: any
Action: deny
Log: no
Notes: The global and local broadcast addresses are used to "find" other hosts in a network, whether that's a DHCP server or something like mDNS. Dropping this prevents the traffic from appearing in your logs later.

Permitting Management Traffic

Typically you want to trust certain machines to access or be accessed by this host – whether it’s your SYSLOG collector, or the box that can manage the firewall policy, so here we’ll create a policy that lets these in.

Source Interface: inside
Source IP Address: 198.51.100.2 and 2001:db8:123d:abc::2 (Management IP)
NAT Source IP Address: no
Destination Interface: inside
Destination IP Address: 198.51.100.1 and 2001:db8:123d:abc::1 (Firewall IP)
NAT Destination IP Address: no
Destination Port: SSH (tcp/22)
Action: permit
Log: yes
Notes: Allow inbound SSH access. You're unlikely to need more inbound ports, but if you do - customise them here.

Source Interface: inside
Source IP Address: 198.51.100.1 and 2001:db8:123d:abc::1 (Firewall IP)
NAT Source IP Address: no
Destination Interface: inside
Destination IP Address: 198.51.100.2 and 2001:db8:123d:abc::2 (Management IP)
NAT Destination IP Address: no
Destination Port: SYSLOG (udp/514)
Action: permit
Log: yes
Notes: Allow outbound SYSLOG access. Tailor this to outbound ports you need.

Allowing Control Traffic

ICMP is a protocol that is fundamental to IPv4 and IPv6. Commonly used for Traceroute and Ping, but also used to perform REJECT responses and that sort of thing. We’re only going to let it be initiated *out* not in. Some people won’t allow this rule, or tailor it to more specific destinations.

Source Interface: inside
Source IP Address: any
NAT Source IP Address: 192.0.2.2 (The firewall IP address which may be replaced with 0.0.0.0 indicating "whatever IP address is bound to the outbound interface")
Destination Interface: outside
Destination IP Address: any
NAT Destination IP Address: no
Destination Port: icmp
Action: allow
Log: yes
Notes: ICMPv4 and ICMPv6 are different things. This is just the ICMPv4 version. IPv4 does require NAT, hence the difference from the IPv6 version below.

Source Interface: inside
Source IP Address: any
NAT Source IP Address: no
Destination Interface: outside
Destination IP Address: any
NAT Destination IP Address: no
Destination Port: icmpv6
Action: allow
Log: yes
Notes: ICMPv4 and ICMPv6 may be treated as different things. This is just the ICMPv6 version. IPv6 does not require NAT.

Protect the Firewall

There should be no other traffic going to the Firewall, so let’s drop everything. There are two types of “Deny” message – a “Reject” and a “Drop”. A Reject sends a message back from the host which is refusing the connection – usually the end server to say that the service didn’t want to reply to you, but if there’s a box in the middle – like a firewall – this reject (actually an ICMP packet) comes from the firewall instead. In this case it’s identifying that the firewall was refusing the connection for the node, so it advertises the fact the end server is protected by a security box. Instead, firewall administrators tend to use Drop, which just silently discards the initial request, leaving the initiating end to “Time Out”. You’re free to either “Reject” or “Drop” whenever we show “Deny” in the below policies, but bear it in mind that it’s less secure to use Reject than it is to Drop.

Source Interface: any
Source IP Address: any
NAT Source IP Address: no
Destination Interface: any
Destination IP Address: 192.0.2.2, 2001:db8:123c:abd::2, 198.51.100.1 and 2001:db8:123d:abc::1 (may also be represented as :: or 0.0.0.0 depending on the platform)
NAT Destination IP Address: no
Destination Port: any
Action: deny
Log: no
Notes: Drop everything targetted at the firewall IPs. If you have more NICs or additional IP addresses on the firewall, these will also need blocking.

“Normal” Inbound Traffic

After you’ve got your firewall protected, now you can sort out your “normal” traffic flows. I’m going to add a single inbound policy to represent the sort of traffic you might want to configure (in this case a simple web server), but bear in mind some environments don’t have any “inbound” rules (for example, most homes would be in this case), and some might need lots and lots of inbound rules. This is just to give you a flavour on what you might see here.

Source Interface: outside
Source IP Address: any
NAT Source IP Address: no
Destination Interface: inside
Destination IP Address: 192.0.2.2 (External IP)
NAT Destination IP Address: 198.51.100.2 (Internal IP)
Destination Port: tcp/80 (HTTP), tcp/443 (HTTPS)
Action: allow
Log: yes
Notes: This is the IPv4-only rule. Note a NAT MUST be applied here.

Source Interface: outside
Source IP Address: any
NAT Source IP Address: no
Destination Interface: inside
Destination IP Address: 2001:db8:123d:abc::2
NAT Destination IP Address: no
Destination Port: tcp/80 (HTTP), tcp/443 (HTTPS)
Action: allow
Log: yes
Notes: This is the IPv6-only rule. Note that NO NAT is required (but, you may wish to perform NAT, depending on your environment).

“Normal” Outbound Traffic

If you’re used to a DSL router, that basically just allows all outbound traffic. We’re going to implement that here. If you want to be more specific about things, you’d define your outbound rules like the inbound rules in the block above… but if you’re not that worried, then this rule below is generally going to be all OK :)

Source Interface: inside
Source IP Address: any
NAT Source IP Address: 192.0.2.2 (The firewall IP address which may be replaced with 0.0.0.0 indicating "whatever IP address is bound to the outbound interface")
Destination Interface: outside
Destination IP Address: any
NAT Destination IP Address: no
Destination Port: any
Action: allow
Log: yes
Notes: This is just the IPv4 version. IPv4 does require NAT, hence the difference from the IPv6 version below.

Source Interface: inside
Source IP Address: any
NAT Source IP Address: no
Destination Interface: outside
Destination IP Address: any
NAT Destination IP Address: no
Destination Port: any
Action: allow
Log: yes
Notes: This is just the IPv6 version. IPv6 does not require NAT.

Drop Rule

Following your permit rules above, you now need to drop everything else. Fortunately, by now, you’ve “white-listed” all the permitted traffic, so now we can just drop “everything”. So, let’s do that!

Source Interface: any
Source IP Address: any
NAT Source IP Address: no
Destination Interface: any
Destination IP Address: any
NAT Destination IP Address: no
Destination Port: any
Action: deny
Log: yes

And so that is a basic firewall policy… or at least, it’s the template I tend to stick to! :)

Expanding XFS drives with LVM

2019-01-232019-01-23 JonTheNiceGuy Leave a comment

Say, for example, you’ve got a lovely CentOS VM (using XFS by default) which has a disk that isn’t quite big enough. Fair enough, your VM Hypervisor is sensible enough to resize that disk without question… How do you resize the XFS partition? Assuming you’ve got your disk mounted as /dev/sda, and you’ve got a boot volume as partition 1 and a root volume as partition 2 (the standard install model)

parted /dev/sda resizepart 2 100%
partprobe /dev/sda
pvresize /dev/sda2
lvextend /dev/centos/root /dev/sda2
xfs_growfs /dev/mapper/centos-root

The graphical version of the steps above

Research via:

"Zenith Z-19 Terminal" from ajmexico on Flickr

Some things I learned this week while coding extensions to Ansible!

2019-01-182019-01-21 JonTheNiceGuy Leave a comment

If you follow any of the content I post around the internet, you might have seen me asking questions about trying to get data out of azure_rm_*_facts into something that’s usable. I can’t go into why I needed that data yet (it’s a little project I’m working on), but the upshot is that trying to manipulate data using “set_fact” with jinja is *doable* but *messy*. In the end, I decided to hand it all off to a new ansible module I’m writing. So, here are the things I learned about this.

There’s lots more documentation about writing a module (a plugin that let’s you do stuff) than there is about writing filters (things that change text inline) or lookups (things that let you search other data stores). In the end, while I could have spent the time to figure out how better to write a filter or a lookup, it actually makes more sense in my context to hand a module all my data, and say “Parse this” and register the result than it would have done to have the playbook constantly check whether things were in other things. I still might have to do that, but… you know, for now, I’ve got the bits I want! :)
I did start looking at writing a filter, and discovered that the “debugging advice” on the ansible site is all geared up to enable more modules than enabling filters… but I did discover that modules execute on their target (e.g. WebHost01) while filters and lookups execute on the local machine. Why does this matter? Well…..
While I was looking for documentation about debugging Ansible code, I stumbled over this page on debugging modules that makes it all look easy. Except, it’s only for debugging *MODULES* (very frustrating. Well, what does it actually mean? The modules get zipped up and sent to the host that will be executing the code, which means that with an extra flag to your playbook (ANSIBLE_KEEP_REMOTE_FILES – even if it’s going to be run on “localhost”), you get the combined output of the script placed into a path on your machine, which means you can debug that specific play. That doesn’t work for filters…
SOO, I jumped into #ansible on Freenode and asked for help. They in turn couldn’t help me (it’s more about writing playbooks than writing filters, modules, etc), so they directed me to #ansible-devel, where I was advised to use a python library called “q” (Edit, same day: my friend @mohclips pointed me to this youtube video from 2003 of the guy who wrote q explaining about it. Thanks Nick! I learned something *else* about this library).
Oh man, this is the motherlode. So, q makes life *VERY* easy. Assuming this is valid code: All you’d need to do would be to add two lines, as you’ll see here: This then dumps the output from each of the q(something) lines into /tmp/q for you to read at your leisure! (To be fair, I’d probably remove it after you’ve finished, so you don’t fill a disk :) )
And that’s when I discovered that it’s actually easier to use q() for all my python debugging purposes than it is to follow the advice above about debugging modules. Yehr, it’s basically a load of print statements, so you don’t get to see stack traces, or read all the variables, and you don’t get to step through code to see why decisions were taken… but for the rubbish code I produce, it’s easily enough for me!

Header image is “Zenith Z-19 Terminal” by “ajmexico” on Flickr and is released under a CC-BY license. Used with thanks!

"LEGO Factory Playset" from Brickset on Flickr

Building Azure Environments in Ansible

2018-12-212018-12-21 JonTheNiceGuy Leave a comment

Recently, I’ve been migrating my POV (proof of value) and POC (proof of concept) environment from K5 to Azure to be able to test vendor products inside Azure. I ran a few tests to build the environment using the native tools (the powershell scripts) and found that the Powershell way of delivering Azure environments seems overly complicated… particularly as I’m comfortable with how Ansible works.

To be fair, I also need to look at Terraform, but that isn’t what I’m looking at today :)

So, let’s start with the scaffolding. Any Ansible Playbook which deals with creating virtual machines needs to have some extra modules installed. Make sure you’ve got ansible 2.7 or later and the python azure library 2.0.0 or later (you can get both with pip for python).

Next, let’s look at the group_vars for this playbook.

--- project_prefix: env01 project_breakglass: "My$uper$ecretPassw0rd" project_location: uksouth image: ubuntu1804: offer: UbuntuServer publisher: Canonical sku: '18.04-LTS' version: latest provision_script: userdata/ubuntu.j2 fortigate603_payg: offer: fortinet_fortigate-vm_v5 publisher: fortinet sku: 'fortinet_fg-vm_payg' version: 6.0.3 plan: yes provision_script: userdata/fortigate.j2 network: wan: name: wan prefix: 10.10.10 protected: name: protected prefix: 10.10.20 gateway: 10.10.20.4 appliances: fw01: name: fw01 image: "{{ image.fortigate603_payg }}" size: Standard_F1 priority: 1 ports: - subnet: "{{ network.wan }}" - subnet: "{{ network.protected }}" private_ip: "{{ network.wan.prefix }}.4" public: false default_allow_policy: - source: port2 destination: port1 nat: yes vm01: name: vm01 image: "{{ image.ubuntu1804 }}" size: Standard_B1s priority: 2 ports: - subnet: "{{ network.protected }}" private_ip: "{{ network.protected.prefix }}.5" public: false vm02: name: vm02 image: "{{ image.ubuntu1804 }}" size: Standard_B1s priority: 0 ports: - subnet: "{{ network.wan }}" private_ip: "{{ network.wan.prefix }}.250" public: false

This file has several pieces. We define the project settings (anything prefixed project_ is a project setting), including the prefix used for all resources we create (in this case “env01“), and a standard password used for all VMs we create (in this case “My$uper$ecret$Passw0rd“).

Next we define the standard images to load from the Marketplace. You can extend this with other images, these are just the “easiest” ones that I’m most familiar with (your mileage may vary). Next up is the networks to build inside the VNet, and lastly we define the actual machines we want to build. If you’ve got questions about any of the values we define here, just let me know in the comments below :)

Next, we’ll start looking at the playbook (this has been exploded out – the full playbook is also in the gist).

vars: location: "{{ project_location|default('uksouth') }}" prefix: | {%- if lookup('env', 'ANSIBLE_PREFIX') != '' -%} {{ lookup('env', 'ANSIBLE_PREFIX') }} {%- else -%} {{ project_prefix|default(ansible_date_time.epoch|hash('md5')|truncate(8, true, '')) }} {%- endif %}" breakglass_pw: | {%- if lookup('env', 'BREAKGLASS') != '' -%} {{ lookup('env', 'BREAKGLASS') }} {%- else -%} {{ project_breakglass|default(ansible_date_time.epoch|hash('md5')|truncate(8, true, '') + '_Azure') }} {%- endif -%} tasks: - debug: var=prefix - debug: var=breakglass_pw

Here we start by pulling in the variables we might want to override, and we do this by reading system environment variables (ANSIBLE_PREFIX and BREAKGLASS) and using them if they’re set. If they’re not, use the project defaults, and if that hasn’t been set, use some pre-defined values… and then tell us what they are when we’re running the tasks (those are the debug: lines).

- name: "Create Resource Group" azure_rm_resourcegroup: name: "{{ prefix }}" location: "{{ location }}" # Yes, this is not good practice, but it's just for a POV/POC - name: "Create Any-Any Allow Security Group" azure_rm_securitygroup: resource_group: "{{ prefix }}" name: "{{ prefix }}AnyAnyAllow" rules: - name: AllowAllInbound protocol: "*" destination_port_range: "*" access: Allow priority: 100 direction: Inbound - name: AllowAllOutbound protocol: "*" destination_port_range: "*" access: Allow priority: 100 direction: Outbound - name: "Create Storage Account" azure_rm_storageaccount: resource_group: "{{ prefix }}" name: "{{ prefix }}storage" account_type: Standard_LRS - name: "Create VNET" azure_rm_virtualnetwork: resource_group: "{{ prefix }}" name: "{{ prefix }}network01" address_prefixes_cidr: "10.0.0.0/8"

This block is where we create our “Static Assets” – individual items that we will be consuming later. This shows a clear win here over the Powershell methods endorsed by Microsoft – here you can create a Resource Group (RG) as part of the playbook! We also create a single Storage Account for this RG and a single VNET too.

These creation rules are not suitable for production use, as this defines an “Any-Any” Security group! You should tailor your security groups for your need, not for blanket access in!

- name: "Schedule UDR Creation" azure_rm_routetable: resource_group: "{{ prefix }}" name: "{{ prefix }}{{ item.value.name }}_udr" with_dict: "{{ network }}" loop_control: label: "{{ prefix }}{{ item.value.name }}" when: item.value.gateway is defined async: 1000 poll: 0 register: sleeper - name: "Check UDRs Created" async_status: jid: "{{ item.ansible_job_id }}" register: sleeper_status until: sleeper_status.finished retries: 500 delay: 4 with_items: "{{ sleeper.results }}" when: item.ansible_job_id is defined loop_control: label: "{{ item._ansible_item_label }}"

This is where things start to get a bit more interesting – We’re using the “async/async_status” pattern here (and the rest of these sections) to start creating the resources in parallel. As far as I can tell, sometimes you’ll get a case where the async doesn’t quite get set up fast enough, then the async_status can’t track the resources properly, but re-running the playbook should be enough to sort that out, without slowing things down too much.

But what are we actually doing with this block of code? A UDR is a “User Defined Route” or routing table for Azure. Effectively, you treat each network interface as being plumbed directly to the router (none of this “same subnet broadcast” stuff works here!) so you can do routing at the router for all the networks.

By default there are some existing network routes (stuff to the internet flows to the internet, RFC1918 addresses are dropped with the exception of any RFC1918 addresses you have covered in your VNETs, and each of your subnets can reach each other “directly”). Adding a UDR overrides this routing table. The UDRs we’re creating here are applied at a subnet level, but currently don’t override any of the existing routes (they’re blank). We’ll start putting routes in after we’ve added the UDRs to the subnets. Talking of which….

- name: "Schedule Subnet Creation" azure_rm_subnet: resource_group: "{{ prefix }}" name: "{{ prefix }}{{ item.value.name }}" address_prefix: "{{ item.value.prefix }}.0/24" virtual_network: "{{ prefix }}network01" route_table: | {%- if item.value.gateway is defined -%} {{ prefix }}{{ item.value.name }}_udr {%- else -%} {{ omit }}{%- endif -%} with_dict: "{{ network }}" loop_control: label: "{{ prefix }}{{ item.value.name }}" async: 1000 poll: 0 register: sleeper - name: "Check Subnet Created" async_status: jid: "{{ item.ansible_job_id }}" register: sleeper_status until: sleeper_status.finished retries: 500 delay: 4 with_items: "{{ sleeper.results }}" when: item.ansible_job_id is defined loop_control: label: "{{ item._ansible_item_label }}"

Again, this block is not really suitable for production use, and assumes the VNET supernet of /8 will be broken down into several /24’s. In the “real world” you might deliver a handful of /26’s in a /24 VNET… or you might even have lots of disparate /24’s in the VNET which are then allocated exactly as individual /24 subnets… this is not what this model delivers but you might wish to investigate further!

- name: "Schedule Route Creation" azure_rm_route: resource_group: "{{ prefix }}" name: "{{ prefix }}{{ item.value.name }}" address_prefix: "0.0.0.0/0" route_table_name: "{{ prefix }}{{ item.value.name }}_udr" next_hop_type: virtual_appliance next_hop_ip_address: "{{ item.value.gateway }}" with_dict: "{{ network }}" loop_control: label: "{{ prefix }}{{ item.value.name }}" when: item.value.gateway is defined async: 1000 poll: 0 register: sleeper - name: "Check Routes Created" async_status: jid: "{{ item.ansible_job_id }}" register: sleeper_status until: sleeper_status.finished retries: 500 delay: 4 with_items: "{{ sleeper.results }}" when: item.ansible_job_id is defined loop_control: label: "{{ item._ansible_item_label }}"

Now that we’ve created our subnets, we can start adding the routing table to the UDR. This is a basic one – add a 0.0.0.0/0 route (internet access) from the “protected” network via the firewall. You can get a lot more specific than this – most people are likely to want to add the VNET range (in this case 10.0.0.0/8) via the firewall as well, except for this subnet (because otherwise, for example, 10.0.0.100 trying to reach 10.0.0.101 will go via the firewall too).

Without going too much into the intricacies of network architecture, if you are routing your traffic between subnets to the firewall, it’s probably better to get an appliance with more interfaces, so you can route traffic across the appliance, rather than going across a single interface as this will halve your traffic bandwidth (it’s currently capped 1Gb/s – so 500Mb/s).

Having mentioned “The Internet” – let’s give our firewall a public IP address, and create the rest of the interfaces as well.

- name: "Schedule Public IP creation" azure_rm_publicipaddress: resource_group: "{{ prefix }}" allocation_method: Static name: "{{ prefix }}{{ item.0.name }}port{{ item.1.subnet.name }}pubip" with_subelements: - "{{ appliances }}" - ports when: item.1.public is not defined or (item.1.public is defined and item.1.public == 'true') loop_control: label: "{{ prefix }}{{ item.0.name }}port{{ item.1.subnet.name }}pubip" async: 1000 poll: 0 register: sleeper - name: "Check Public IPs Created" async_status: jid: "{{ item.ansible_job_id }}" register: sleeper_status until: sleeper_status.finished retries: 500 delay: 4 with_items: "{{ sleeper.results }}" when: item.ansible_job_id is defined loop_control: label: "{{ item._ansible_item_label }}" - name: "Schedule Network Interface Creation" azure_rm_networkinterface: resource_group: "{{ prefix }}" name: "{{ prefix }}{{ item.0.name }}port{{ item.1.subnet.name }}" virtual_network: "{{ prefix }}network01" subnet: "{{ prefix }}{{ item.1.subnet.name }}" public_ip_name: "{{ prefix }}pubip01" security_group: "{{ prefix }}AnyAnyAllow" ip_configurations: - name: "{{ prefix }}{{ item.0.name }}port{{ item.1.subnet.name }}" public_ip_address_name: | {%- if item.1.public is not defined or (item.1.public is defined and item.1.public == 'true') -%} {{ prefix }}{{ item.0.name }}port{{ item.1.subnet.name }}pubip {%- else -%}{%- endif -%}" primary: yes with_subelements: - "{{ appliances }}" - ports loop_control: label: "{{ prefix }}{{ item.0.name }}port{{ item.1.subnet.name }}" async: 1000 poll: 0 register: sleeper - name: "Check Network Interfaces Created" async_status: jid: "{{ item.ansible_job_id }}" register: sleeper_status until: sleeper_status.finished retries: 500 delay: 4 with_items: "{{ sleeper.results }}" when: item.ansible_job_id is defined loop_control: label: "{{ item._ansible_item_label }}"

This script creates a public IP address by default for each interface unless you explicitly tell it not to (see lines 40, 53 and 62 in the group_vars file I rendered above). You could easily turn this around by changing the lines which contain this:

item.1.public is not defined or (item.1.public is defined and item.1.public == 'true')

into lines which contain this:

item.1.public is defined and item.1.public == 'true'

OK, having done all that, we’re now ready to build our virtual machines. I’ve introduced a “Priority system” here – VMs with priority 0 go first, then 1, and 2 go last. The code snippet below is just for priority 0, but you can easily see how you’d extrapolate that out (and in fact, the full code sample does just that).

- name: "Schedule Virtual Machine Creation - Priority 0 (or 'All')" azure_rm_virtualmachine: resource_group: "{{ prefix }}" name: "{{ prefix }}{{ item.value.name }}" vm_size: "{{ item.value.size }}" storage_account: "{{ prefix }}storage" storage_container: "{{ prefix }}{{ item.value.name }}os" storage_blob: "{{ prefix }}{{ item.value.name }}os.vhd" admin_username: breakglass admin_password: "{{ breakglass_pw }}" network_interfaces: | [ {%- for nw in item.value.ports -%} '{{ prefix }}{{ item.value.name }}port{{ nw.subnet.name }}' {%- if not loop.last -%}, {%- endif -%} {%- endfor -%} ] custom_data: | {%- if item.value.provision_script is defined and item.value.provision_script != '' -%} {%- include(item.value.provision_script) -%} {%- elif item.value.image.provision_script is defined and item.value.image.provision_script != '' -%} {%- include(item.value.image.provision_script) -%} {%- else -%} {{ omit }} {%- endif -%} image: publisher: "{{ item.value.image.publisher }}" offer: "{{ item.value.image.offer }}" sku: "{{ item.value.image.sku }}" version: "{{ item.value.image.version }}" plan: | {%- if item.value.image.plan is not defined -%}{{ omit }}{%- else -%} {'name': '{{ item.value.image.sku }}', 'publisher': '{{ item.value.image.publisher }}', 'product': '{{ item.value.image.offer }}' } {%- endif -%} with_dict: "{{ appliances }}" when: item.value.priority | default(0) == 0 loop_control: label: "{{ prefix }}{{ item.value.name }}" async: 1000 poll: 0 register: sleeper - name: "Check Virtual Machines Created - Priority 0" async_status: jid: "{{ item.ansible_job_id }}" register: sleeper_status until: sleeper_status.finished retries: 500 delay: 4 with_items: "{{ sleeper.results }}" when: item.ansible_job_id is defined loop_control: label: "{{ item._ansible_item_label }}"

There are a few blocks here to draw attention to :) I’ve re-jigged them a bit here so it’s clearer to understand, but when you see them in the main playbook they’re a bit more compact. Let’s start with looking at the Network Interfaces section!

network_interfaces: |
  [
    {%- for nw in item.value.ports -%}
      '{{ prefix }}{{ item.value.name }}port{{ nw.subnet.name }}'
      {%- if not loop.last -%}, {%- endif -%} 
    {%- endfor -%}
  ]

In this part, we loop over the ports defined for the virtual machine. This is because one device may have 1 interface, or four interfaces. YAML is parsed to make a JSON variable, so here we can create a JSON variable, that when the YAML is parsed it will just drop in. We’ve previously created all the interfaces to have names like this PREFIXhostnamePORTsubnetname (or aFW01portWAN in more conventional terms), so here we construct a JSON array, like this: ['aFW01portWAN'] but that could just as easily have been ['aFW01portWAN', 'aFW01portProtect', 'aFW01portMGMT', 'aFW01portSync']. This will then attach those interfaces to the virtual machine.

Next up, custom_data. This section is sometimes known externally as userdata or config_disk. My code has always referred to it as a “Provision Script” – hence the variable name in the code below!

custom_data: |
  {%- if item.value.provision_script is defined and item.value.provision_script != '' -%}
    {%- include(item.value.provision_script) -%}
  {%- elif item.value.image.provision_script is defined and item.value.image.provision_script != '' -%}
    {%- include(item.value.image.provision_script) -%}
  {%- else -%}
    {{ omit }}
  {%- endif -%}

Let’s pick this one apart too. If we’ve defined a provisioning script file for the VM, include it, if we’ve defined a provisioning script file for the image (or marketplace entry), then include that instead… otherwise, pretend that there’s no “custom_data” field before you submit this to Azure.

One last quirk to Azure, is that some images require a “plan” to go with it, and others don’t.

plan: |
  {%- if item.value.image.plan is not defined -%}{{ omit }}{%- else -%}
    {'name': '{{ item.value.image.sku }}',
     'publisher': '{{ item.value.image.publisher }}',
     'product': '{{ item.value.image.offer }}'
    }
  {%- endif -%}

So, here we say “if we’ve not got a plan, omit the value being passed to Azure, otherwise use these fields we previously specified. Weird huh?

The very last thing we do in the script is to re-render the standard password we’ve used for all these builds, so that we can check them out!

Want to review this all in one place?

Here’s the link to the full playbook, as well as the group variables (which should be in ./group_vars/all.yml) and two sample userdata files (which should be in ./userdata) for an Ubuntu machine (using cloud-init) and one for a FortiGate Firewall.

All the other files in that gist (prefixes from 10-16 and 00) are for this blog post only, and aren’t likely to work!

If you do end up using this, please drop me a note below, or star the gist! That’d be awesome!!

Image credit: “Lego Factory Playset” from Flickr by “Brickset” released under a CC-BY license. Used with Thanks!

Picture of a bull from an 1882 reference guide, as found on Flickr

Abattoir Architecture for Evergreen – using the Pets versus Cattle parable

2018-12-132019-01-03 JonTheNiceGuy Leave a comment

Work very generously sent me on a training course today about a cloud based technology we’re considering deploying.

During the course, the organiser threw a question to the audience about “who can explain what a container does?” and a small number of us ended up talking about Docker (primarily for Linux) and CGroups, and this then turned into a conversation about the exceedingly high rate of changes deployed by Amazon, Etsy and others who have completely embraced microservices and efficient CI/CD pipelines… and then I mentioned the parable of Pets versus Cattle.

The link above points to where the story comes from, but the short version is…

When you get a pet, it comes as a something like a puppy or kitten, you name it, you nurture it, bring it up to live in your household, and when it gets sick, because you’ve made it part of your family, you bring in a vet who nurses it back to health.
When you have cattle, it doesn’t have a name, it has a number, and if it gets sick, you isolate it from the herd and if it doesn’t get better, you take it out back and shoot it, and get a new one.

A large number of the audience either hadn’t heard of the parable, or if they had, hadn’t heard it delivered like this.

We later went on to discuss how this applies in a practical sense, not just in docker or kubernetes containers, but how it could be applied to Infrastructure as a Service (IaaS), even down to things like vendor supplied virtual firewalls where you have Infrastructure as Code (IaC).

If, in your environment, you have some service you treat like cattle – perhaps a cluster of firewalls behind a load balancer or a floating IP address and you need to upgrade it (because it isn’t well, or it’s not had the latest set of policy deployed to it). You don’t amend the policy on the boxes in question… No! You stand up a new service using your IaC with the latest policy deployed upon it, and then you would test it (to make sure it’s stood up right), and then once you’re happy it’s ready, you transition your service to the new nodes. Once the connections have drained from your old nodes, you take them out and shoot them.

Or, if you want this in pictures…

Stage 1 - Before the new service deployment — Stage 1 – Before the new service deployment

Stage 2 - The new service deployment is built and tested — Stage 2 – The new service deployment is built and tested

Stage 3 - Service transitions to the new service deployment — Stage 3 – Service transitions to the new service deployment

Stage 4 - The old service is demised, and any resources (including licenses) return to the pool — Stage 4 – The old service is demised, and any resources (including licenses) return to the pool

I was advised (by a very enthusiastic Mike until he realised that I intended to follow through with it) that the name for this should be as per the title. So, the next time someone asks me to explain how they could deploy, I’ll suggest they look for the Abattoir in my blog, because, you know, that’s normal, right? :)

Image credit: Image from page 255 of “Breeder and sportsman” (1882) via Internet Archive Book Image on Flickr

Troubleshooting FortiGate API issues with the CLI?

2018-12-102020-03-27 JonTheNiceGuy Leave a comment

One of my colleagues has asked me for some help with an Ansible script he’s writing to push some policy to a cloud hosted FortiGate appliance. Unfortunately, he kept getting some very weird error messages, like this one:

fatal: [localhost]: FAILED! => {"changed": false, "meta": {"build": 200, "error": -651, "http_method": "PUT", "http_status": 500, "mkey": "vip8080", "name": "vip", "path": "firewall", "revision": "36.0.0.10745196634707694665.1544442857", "serial": "CENSORED", "status": "error", "vdom": "root", "version": "v6.0.3"}, "msg": "Error in repo"}

This is using Fortinet’s own Ansible Modules, which, in turn use the fortiosapi python module.

This same colleague came across a post on the Fortinet Developer Network site (access to the site requires vendor approval), which said “this might be an internal bug, but to debug it, use the following”

fgt # diagnose debug enable

fgt # diagnose debug cli 8
Debug messages will be on for 30 minutes.

And then run your API commands. Your error message will be surfaced there… so here’s mine! (Mapped port doesn’t match extport in a vip).

0: config firewall vip
0: edit "vip8080"
0: unset src-filter
0: unset service
0: set extintf "port1"
0: set portforward enable
0: unset srcintf-filter
0: set mappedip "192.0.2.1-192.0.2.1"
0: unset extport
0: set extport 8080-8081
0: unset mappedport
0: set mappedport 8080
-651: end

Late edit 2020-03-27: I spotted a bug in the Ansible issues tracker today, and I added a note to the end of that bug mentioning that as well as diagnose debug cli 8, if that doesn’t give you enough logs to figure out what’s up, you can also try diagnose debug application httpsd -1 but this enables LOTS AND LOTS of logs, so really think twice before turning that one on!

Oh, and if 30 minutes isn’t enough, try diagnose debug duration 480 or however many minutes you think you need. Beware that it will write event logs out to the serial console even when you’ve logged out.

Creating Self Signed certificates in Ansible

2018-11-232019-01-23 JonTheNiceGuy 1 Comment

In my day job, I sometimes need to use a self-signed certificate when building a box. As I love using Ansible, I wanted to make the self-signed certificate piece something that was part of my Ansible workflow.

Here follows a bit of basic code that you could use to work through how the process of creating a self-signed certificate would work. I would strongly recommend using something more production-ready (e.g. LetsEncrypt) when you’re looking to move from “development” to “production” :)

What happens on an IT Penetration Test

2018-11-172018-12-12 JonTheNiceGuy Leave a comment

I recently was asked to describe what happens in a penetration test (pentest), how it’s organised and what happens after the test is completed.

Some caveats first:

While I’ve been involved in escorting penetration testers in controlled areas, and helping to provide environments for the tests to occur in, I’ve not been heavily involved in the set-up of one, so some of the details in that area are likely to be a bit fuzzy.
I’m not involved in procurement in any way, so I can’t endorse or discredit any particular testing organisations.
This is a personal viewpoint and doesn’t represent a professional recommendation of how a penetration test should or typically does occur.

So, what actually happens?…

Before the pentest begins, a testing firm would be sourced and the “Terms of Engagement” (TOE), or perhaps a list of requirements would be defined. This might result in a list of tasks that are expected to be performed, and some idea on resources required. It would also be the point where the initiator (the organisation getting the test performed) defines what is “In Scope” (is available to be tested) and what is “Out Of Scope” (and must not be tested).

Some of the usual tasks are:

Internet Scan (the testers will have a testing appliance that will scan all IPs provided to them, and all open ports on those IPs, looking for vulnerable services, servers and responding applications).
~~Black Box~~ [See note below] Red Team test (the testers are probing the network as though they were malicious outsiders with no knowledge of the system, similar to the sorts of things you might see in hacker movies – go through discovered documentation, socially engineer access to environments, run testing applications like NMAP and Metasploit, and generally see what stuff appears to be publicly accessible, and from there see how the environment appears once you have a foothold).
White Box test (the testers have access to internal documentation and/or source code about the environment they’re testing, and thus can run customised and specific tests against the target environment).
Configuration Analysis (access to servers, source code, firewall policies or network topology documentation intending to check where flaws may have been introduced).
Social Engineering test (see how amenable staff, customers and suppliers are to providing access to the target environment for your testing team).
Physical access test (prove whether the testing team can gain physical access to elements of your target, e.g. servers, documentation, management stations, signing keys, etc).
Some testing firms will also stress test any Denial Of Service Mitigations you may have in-place, but these must be carefully negotiated first with your bandwidth providers, their peering firms and so on, as they will more-than-likely disrupt more than just your services! DO NOT ENGAGE A DOS TEST LIGHTLY!

Once the Terms have been agreed and the duration of these tests have been ironed out (some tests could go on indefinitely, but you wouldn’t *really* want to pay the bills for an indefinite Black Box test, for example!), a project plan is usually defined showing these stages. Depending on the complexity of your environment, I might expect a reasonable duration for a small estate being approximately a day or two for each test. In a larger estate, particularly where little-to-no automation has been implemented, you may find (for example) a thorough Configuration Analysis of your server configurations taking weeks or even months.

Depending on how true-to-life the test “should” be, you may have the Physical Security assessment and Social Engineering tests be considered part of the Black Box test, or perhaps you may purposefully provide some entry point for the testing team to reduce the entry time. Most of the Black Box tests I’ve been involved in supporting have started from giving the testers access to a point inside your trusted network (e.g. a server which has been built for the purpose of giving access to testers or a VPN entry point with a “lax” firewall policy). Others will provide a “standard” asset (e.g. laptop) and user credential to the testing firm. Finally, some environments will put the testing firm through “recruitment” and put them in-situ in the firm for a week or two to bed them in before the testing starts… this is pretty extreme however!!

The Black Box test will typically be run before any others (except perhaps the Social Engineering and Physical Access tests) and without the knowledge of the “normal” administrators. This will also test the responses of the “Blue Team” (the system administrators and any security operations centre teams) to see whether they would notice an in-progress, working attack by a “Red Team” (the attackers).

After the Black Box test is completed, the “Blue Team” may be notified that there was a pentest, and then (providing it is being run) the testing organisation will start a White Box test will be given open access to the tested environment.

The Configuration Check will normally start with hands-on time with members of the “Blue Team” to see configuration and settings, and will compare these settings against known best practices. If there is an application being tested where source code is available to the testers, then they may check the source code against programming bad practices.

Once these tests are performed, the testing organisation will write a report documenting the state of the environment and rate criticality of the flaws against current recommendations.

The report would be submitted to the organisation who requested the test, and then the *real* fun begins – either fixing the flaws, or finger pointing at who let the flaws occur… Oh, and then scheduling the next pentest :)

I hope this has helped people who may be wondering what happens during a pentest!

Just to note – If you want to know more about pentests, and how they work in the real world, check out the podcast “Darknet Diaries“, and in particular episode 6 – “The Beirut Bank Job”. To get an idea of what the pentest is supposed to simulate, (although it’s a fictional series) “Mr Robot” (<- Amazon affiliate link) is very close to how I would imagine a sequence of real-world “Red Team” attacks might look like, and experts seem to agree!

Image Credit: “hacker” by Dani Latorre licensed under a Creative Commons, By-Attribution, Share-Alike license.

Additional note; 2018-12-12: Following me posting this to the #security channel on the McrTech Slack group, one of the other members of that group (Jay Harris from Digital Interruption) mentioned that I’d conflated a black box test and a red team test. A black box test is like a white box test, but with no documentation or access to the implementing team. It’s much slower than a white box test, because you don’t have access to the people to ask “why did you do this” or “what threat are you trying to mitigate here”. He’s right. This post was based on a previous experience I had with a red team test, but that they’d referred to it as a black box test, because that’s what the engagement started out as. Well spotted Jay!

He also suggested reading his post in a similar vein.

TCPDump Made Easier Parody Book Cover, with the subtitle "Who actually understands all those switches?"

One to use: tcpdump101.com

2018-08-312018-08-31 JonTheNiceGuy Leave a comment

I’m sure that anyone doing operational work has been asked at some point if you can run a “TCPDump” on something, or if you could get a “packet capture” – if you have, this tool (as spotted on the Check Point community sites) might help you!

https://tcpdump101.com

Using simple drop-down fields for filters and options and using simple prompts, this tool tells you how to run each of the packet capturing commands for common firewall products (FortiGate, ASA, Check Point) and the more generic tcpdump tool (indicated by a Linux Penguin, but it runs on all major desktop and server OSs, as well as rooted Android devices).

Well worth a check out!

Share this:

Allowing established, related and same-host traffic

Dropping Noisy Traffic

Permitting Management Traffic

Allowing Control Traffic

Protect the Firewall

“Normal” Inbound Traffic

“Normal” Outbound Traffic

Drop Rule

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

So, what actually happens?…

Share this:

Share this: