Tips – Page 5 – A nice guy's view on life

apt update && apt full-upgrade -y && apt autoremove -y && apt autoclean -y

Apt Updates with Ansible

2020-01-062020-01-06 JonTheNiceGuy 1 Comment

I’ve got a small Ansible script that I bundle up on Ubuntu boxes to do apt updates. This was originally a one-statement job, but I’ve added a few lines to it, so I thought I’d explain what I’m doing (more for myself, for later!)

Initally, I just had a task to do apt: upgrade=full update_cache=yes autoremove=yes autoclean-yes but if you’re running the script over and over again, well, this gets slow… So I added a tweak!

Here it is folks, in all it’s glory!

- hosts: all
  tasks:
  - name: Get stat of last run apt
    stat:
      path: /var/cache/apt/pkgcache.bin
    register: apt_run

  - name: "Apt update, Full-upgrade, autoremove, autoclean check"
    debug:
      msg: "Skipping apt-update, etc. actions as apt update was run today"
    when: "'%Y-%m-%d' | strftime(apt_run.stat.mtime) in ansible_date_time.date"

  - name: "Apt update, Full-upgrade, autoremove, autoclean"
    apt:
      upgrade: full
      update_cache: yes
      autoremove: yes
      autoclean: yes
    when: "'%Y-%m-%d' | strftime(apt_run.stat.mtime) not in ansible_date_time.date"

What does this do? Well, according to this AskUbuntu post, the best file to check if an update has been performed is /var/cache/apt/pkgcache.bin, so we check the status of that file. Most file systems available to Linux distributions provide the mtime – or “last modified time”. This is returned in the number of seconds since UTC 00:00:00 on the Unix Epoch (1970-01-01), so we need to convert that to a date., which we return as YYYY-MM-DD (e.g. today is 2020-01-06) and then compare that to what the system thinks today is. If the dates don’t equate (so one string doesn’t match the other – in other words, apt update wasn’t run today), it runs the update. If the dates do match up, we get a statement saying that apt update was already run.

Fun times!

"Infinite Mirror" by "Luca Sartoni" on Flickr

My Blogging Workflow

2019-12-132021-01-08 JonTheNiceGuy Leave a comment

Hello bold traveller! I recently got into a discussion with someone about how I put content into my blog and I realised that people don’t write this stuff down enough!

At the end of this post, I’d expect you to be able to use this information to be able to create your own, similar stack. If you don’t have the technical knowledge to be able to do so, you should at least have the pieces of information to ask someone else to help you build it. While I’d be flattered if you asked me for similar advice, I’m afraid I have limited time for new projects, even consultancy ones, and all I can do is point you back to this blog!

That said, consider this a bit of the “making the sausage” that most people don’t tend to share! 😀

“Swatch Water Store, Grand Central Station, NYC, 9/2016, pics by Mike Mozart of TheToyChannel and JeepersMedia on YouTube #Swatch #Watch” by “Mike Mozart” on Flickr

Time Based Security

2019-08-212019-08-21 JonTheNiceGuy Leave a comment

I came across the concept of “Time Based Security” (TBS) in the Sysadministrivia podcast, S4E13.

I’m still digging into the details of it, but in essence, the “Armadillo” (Crunchy on the outside, soft on the inside) protection model is broken (sometimes known as the “Fortress Model”). You assume that your impenetrable network boundary will prevent attackers from getting to your sensitive data. While this may stop them for a while, what you’re actually seeing here is one part of a complex protection system, however many organisations miss the fact that this is just one part.

The examples used in the only online content I’ve found about this refer to a burglary.

In this context, your “Protection” (P) is measured in time. Perhaps you have hardened glass that takes 20 seconds to break.

Next, we evaluate “Detection” (D) which is also, surprisingly enough, measured in time. As the glass is hit, it triggers an alarm to a security facility. That takes 20 seconds to respond and goes to a dispatch centre, another 20 seconds for that to be answered and a police officer dispatched.

The police officer being dispatched is the “Response” (R). The police take (optimistically) 2 minutes to arrive (it was written in the 90’s so the police forces weren’t decimated then).

So, in the TBS system, we say that Detection (D) of 40 seconds plus Response (R) of 120 seconds = 160 seconds. This is greater than Protection (P) of 20 seconds, so we have an Exposure (E) time of 140 seconds E = P – (D + R). The question that is posed is, how much damage can be done in E?

So, compare this to your average pre-automation SOC. Your firewall, SIEM (Security Incident Event Management system), IDS (Intrusion Detection System) or WAF (Web Application Firewall) triggers an alarm. Someone is trying to do something (e.g. Denial Of Service attack, password spraying or port scanning for vulnerable services) a system you’re responsible for. While D might be in the tiny fractions of a minute (perhaps let’s say 1 minute, for maths sake), R is likely to be minutes or even hours, depending on the refresh rate of the ticket management system or alarm system (again, for maths sake, let’s say 60 minutes). So, D+R is now 61 minutes. How long is P really going to hold? Could it be less than 30 minutes against a determined attacker? (Let’s assume P is 30 minutes for maths sake).

Let’s do the calculation for a pre-automation SOC (Security Operations Centre). P-(D+R)=E. E here is 31 minutes. How much damage can an attacker do in 31 minutes? Could they put a backdoor into your system? Can they download sensitive data to a remote system? Could they pivot to your monitoring system, and remove the logs that said they were in there?

If you consider how much smaller the D and R numbers become with an event driven SOAR (Security Orchestration and Automation Response) system – does that improve your P and E numbers? Consider that if you can get E to 0, this could be considered to be “A Secure Environment”.

Also, consider the fact that many of the tools we implement for security reduce D and R, but if you’re not monitoring the outputs of the Detection components, then your response time grows significantly. If your Detection component is misconfigured in that it’s producing too many False Positives (for example, “The Boy Who Cried Wolf“), so you don’t see the real incident, then your Response might only be when a security service notifies you that your data, your service or your money has been exposed and lost. And that wouldn’t be good now… Time to look into automation 😁

Featured image is “Swatch Water Store, Grand Central Station, NYC, 9/2016, pics by Mike Mozart of TheToyChannel and JeepersMedia on YouTube #Swatch #Watch” by “Mike Mozart” on Flickr and is released under a CC-BY license.

"Hacker" by "The Preiser Project" on Flickr

What to do when a website you’ve trusted gets hacked?

2019-07-192019-07-19 JonTheNiceGuy Leave a comment

Hello! Maybe you just got a sneeking suspicion that a website you trusted isn’t behaving right, perhaps someone told you that “unusual content” is being posted in your name somewhere, or, if you’re really lucky, you might have just had an email from a website like “HaveIBeenPwned.com” or “Firefox Monitor“. It might look something like this:

An example of an email from the service “Have I Been Pwned”

Of course, it doesn’t feel like you’re lucky! It’s OK. These things happen quite a lot of the time, and you’re not the only one in this boat!

How bad is it, Doc?

First of all, don’t panic! Get some idea of the scale of problem this is by looking at a few key things.

How recent was the breach? Give this a score between 1 (right now) and 10 (more than 1 month ago).
How many websites and services do you use this account on? Give this a score between 1 (right now) and 10 (OMG, this is *my* password, and I use it everywhere).
How many other services would use this account to authenticate to, or get a password reset from? Give this a score between 1 (nope, it’s just this website. We’re good) and 10 (It’s my email account, and everything I’ve ever signed up to uses this account as the login address… or it’s Facebook/Google and I use their authentication to login to everything else).
How much does your reputation hang on this website or any other websites that someone reusing the credentials of this account would get access to? Give this a score between 1 (meh, I post cat pictures from an anonymous username) and 10 (I’m an INFLUENCER HERE dagnamit! I get money because I said stuff here and/or my job is on that website, or I am publicly connected to my employer by virtue of that profile).
(Optional) If this is from a breach notification, does it say that it’s just email addresses (score 1), or that it includes passwords (score 5), unencrypted or plaintext passwords (score 8) or full credit card details (score 10)?

Once you’ve got an idea of scale (4 to 40 or 5 to 50, depending on whether you used that last question), you’ve got an idea of how potentially bad it is.

Take action!

Make a list of the websites you think that you need to change this password on.

Start with email accounts (GMail, Hotmail, Outlook, Yahoo, AOL and so on) – each email account that uses the same password needs to be changed, and this is because almost every website uses your email address to make a “password” change on it! (e.g. “Forgot your password, just type in your email address here, and we’ll send you a reset link“).
Prominent social media profiles (e.g. Facebook, Twitter, Instagram) come next, even if they’re not linked to your persona. This is where your potential reputation damage comes from!
Next up is *this* website, the one you got the breach notification for. After all, you know this password is “wild” now!

Change some passwords

This is a bit of a bind, but I’d REALLY recommend making a fresh password for each of those sites. There are several options for doing this, but my preferred option is to use a password manager. If you’re not very tech savvy, consider using the service Lastpass. If you’re tech savvy, and understand how to keep files in sync across multiple devices, you might be interested in using KeePassXC (my personal preference) or BitWarden instead.

No really. A fresh password. Per site. Honest. And not just “MyComplexPassw0rd-Hotmail” because there are ways of spotting you’ve done something like that, and when they come to your facebook account, they’ll try “MyComplexPassw0rd-Facebook” just to see if it gets them in.

ℹ️ Using a password manager gives you a unique, per-account password. I just generated a fresh one (for a dummy website), and it was 2-K$F+j7#Zz8b$A^]qj. But, fortunately, I don’t have to remember it. I can just copy and paste it in to the form when I need to change it, or perhaps, if you have a browser add-on, that’ll fill it in for you.

Making a list, and checking it twice!

Fab, so you’ve now got a lovely list of unique passwords. A bit like Santa, it’s time to check your list again. Assume that your list of sites you just changed passwords for were all compromised, because someone knew that password… I know, it’s a scary thought! So, have a look at all those websites you just changed the password on and figure out what they have links to, then you’ll probably make your list of things you need to change a bit bigger.

Not sure what they have links to?

Well, perhaps you’re looking at an email account… have a look through the emails you’ve received in the last month, three months or year and see how many of those come from “something” unique. Perhaps you signed up to a shopping site with that email address? It’s probably worth getting a password reset for that site done.

Perhaps you’re looking at a social media site that lets you login to other services? Check through those other services and make sure that “someone” hasn’t allowed access to a website they control. After all, you did lose access to that website, and so you don’t know what it’s connected to.

Also, check all of these sites, and make sure there aren’t any unexpected “active sessions” (where someone else is logged into your account still). If you have got any, kick them out :)

OK, so the horse bolted, now close the gate!

Once you’ve sorted out all of these passwords, it’s probably worth looking at improving your security in general. To do this, we need to think about how people get access to your account. As I wrote in my “What to do when your Facebook account gets hacked?” post:

What if you accidentally gave your password to someone? Or if you went to a website that wasn’t actually the right page and put your password in there by mistake? Falling prey to this when it’s done on purpose is known as social engineering or phishing, and means that someone else has your password to get into your account.
A quote from the “Second Factor” part of “What to do when your Facebook account gets hacked?“

The easiest way of locking this down is to use a “Second Factor” (sometimes abbreviated to 2FA). You need to give your password (“something you know”) to log into the website. Now you also need something separate, that isn’t in the same store. If this were a physical token (like a SoloKey, Yubikey, or a RSA SecurID token), it’d be “something you have” (after all, you need to carry around that “token” with you), but normally these days it’s something on your phone.

Some places will send you a text message, others will pop up an “approve login” screen (and, I should note, if you get one and YOU AREN’T LOGGING IN, don’t press “approve”!), or you might have a separate app (perhaps called “Google Authenticator”, “Authy” or something like “Duo Security”) that has a number that keeps changing.

You should then finish your login with a code from that app, SMS or token or reacting to that screen or perhaps even pressing a button on a thing you plug into your computer. If you want to know how to set this up, take a look at “TwoFactorAuth.org“, a website providing access to the documentation on setting up 2FA on many of the websites you currently use… but especially do this with your email accounts.

Featured image is “Hacker” by “The Preiser Project” on Flickr and is released under a CC-BY license.

How to quickly get the next SemVer for your app

2019-06-282023-09-06 JonTheNiceGuy 2 Comments

SemVer, short for Semantic Versioning is an easy way of numbering your software versions. They follow the model Major.Minor.Patch, like this 0.9.1 and has a very opinionated view on what is considered a Major “version bump” and what isn’t.

Sometimes, when writing a library, it’s easy to forget what version you’re on. Perhaps you have a feature change you’re working on, but also bug fixes to two or three previous versions you need to keep an eye on? How about an easy way of figuring out what that next bump should be?

In a recent conversation on the McrTech slack, Steven [0] mentioned he had a simple bash script for incrementing his SemVer numbers, and posted it over. Naturally, I tweaked it to work more easily for my usecases so, this is *mostly* Steven’s code, but with a bit of a wrapper before and after by me :)

#! /bin/bash RE='[^0-9]*$[0-9]*$[.]$[0-9]*$[.]$[0-9]*$$[0-9A-Za-z-]*$' step="$1" if [ -z "$1" ] then step=patch fi base="$2" if [ -z "$2" ] then base=$(git tag 2>/dev/null| tail -n 1) if [ -z "$base" ] then base=0.0.0 fi fi MAJOR=`echo $base | sed -e "s#$RE#\1#"` MINOR=`echo $base | sed -e "s#$RE#\2#"` PATCH=`echo $base | sed -e "s#$RE#\3#"` case "$step" in major) let MAJOR+=1 let MINOR=0 let PATCH=0 ;; minor) let MINOR+=1 let PATCH=0 ;; patch) let PATCH+=1 ;; esac echo "$MAJOR.$MINOR.$PATCH"

Late Edit: 2022-11-19 ictus4u spotted that I wasn’t handling the reset of PATCH to 0 when MINOR gets a bump. I fixed this in the above gist.

So how do you use this? Dead simple, use nextver in a tree that has an existing git tag SemVer to get the next patch number. If you want to bump it to the next minor or major version, try nextver minor or nextver major. If you don’t have a git tag, and don’t specify a SemVer number, then it’ll just assume you’re starting from fresh, and return 0.0.1 :)

Want to find more cool stuff from the original author of this work? Here is a video by the author :)

Featured image is “Feb 11” by “Gordon” on Flickr and is released under a CC-BY-SA license.

"funfair action" by "Jon Bunting" on Flickr

Improving the speed of Azure deployments in Ansible with Async

2019-05-152019-05-15 JonTheNiceGuy Leave a comment

Recently I was building a few environments in Azure using Ansible, and found this stanza which helped me to speed things up.

  - name: "Schedule UDR Creation"
    azure_rm_routetable:
      resource_group: "{{ resource_group }}"
      name: "{{ item.key }}_udr"
    loop: "{{ routetables | dict2items }}"
    loop_control:
        label: "{{ item.key }}_udr"
    async: 1000
    poll: 0
    changed_when: False
    register: sleeper

  - name: "Check UDRs Created"
    async_status:
      jid: "{{ item.ansible_job_id }}"
    register: sleeper_status
    until: sleeper_status.finished
    retries: 500
    delay: 4
    loop: "{{ sleeper.results|flatten(levels=1) }}"
    when: item.ansible_job_id is defined
    loop_control:
      label: "{{ item._ansible_item_label }}"

What we do here is to start an action with an “async” time (to give the Schedule an opportunity to register itself) and a “poll” time of 0 (to prevent the Schedule from waiting to be finished). We then tell it that it’s “never changed” (changed_when: False) because otherwise it always shows as changed, and to register the scheduled item itself as a “sleeper”.

After all the async jobs get queued, we then check the status of all the scheduled items with the async_status module, passing it the registered job ID. This lets me spin up a lot more items in parallel, and then “just” confirm afterwards that they’ve been run properly.

It’s not perfect, and it can make for rather messy code. But, it does work, and it’s well worth giving it the once over, particularly if you’ve got some slow-to-run tasks in your playbook!

Featured image is “funfair action” by “Jon Bunting” on Flickr and is released under a CC-BY license.

A web browser with the example.com web page loaded

Working around the fact that Ansible’s URI module doesn’t honour the no_proxy variable…

2019-05-142019-05-14 JonTheNiceGuy Leave a comment

An Ansible project I’ve been working on has tripped me up this week. I’m working with some HTTP APIs and I need to check early whether I can reach the host. To do this, I used a simple Ansible Core Module which lets you call an HTTP URI.

- uri:
    follow_redirects: none
    validate_certs: False
    timeout: 5
    url: "http{% if ansible_https | default(True) %}s{% endif %}://{{ ansible_host }}/login"
  register: uri_data
  failed_when: False
  changed_when: False

This all seems pretty simple. One of the environments I’m working in uses the following values in their environment:

http_proxy="http://192.0.2.1:8080"
https_proxy="http://192.0.2.1:8080"
no_proxy="10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 192.0.2.0/24, 198.51.100.0/24, 203.0.113.0/24"

And this breaks the uri module, because it tries to punt everything through the proxy if the “no_proxy” contains CIDR values (like 192.0.2.0/24) (there’s a bug raised for this)… So here’s my fix!

- set_fact:
    no_proxy_match: |
      {
        {% for no_proxy in (lookup('env', 'no_proxy') | replace(',', '') ).split() %}
          {% if no_proxy| ipaddr | type_debug != 'NoneType' %}
            {% if ansible_host | ipaddr(no_proxy) | type_debug != 'NoneType' %}
              "match": "True"
            {% endif %}
          {% endif %}
        {% endfor %}
      }

- uri:
    follow_redirects: none
    validate_certs: False
    timeout: 5
    url: "http{% if ansible_https | default(True) %}s{% endif %}://{{ ansible_host }}/login"
  register: uri_data
  failed_when: False
  changed_when: False
  environment: "{ {% if no_proxy_match.match | default(False) %}'no_proxy': '{{ ansible_host }}'{% endif %} }"

So, let’s break this down.

The key part to this script is that we need to override the no_proxy environment variable with the IP address that we’re trying to address (so that we’re not putting 16M addresses for 10.0.0.0/8 into no_proxy, for example). To do that, we use the exact same URI block, except for the environment line at the end.

In turn, the set_fact block steps through the no_proxy values, looking for IP Addresses to check ({% if no_proxy | ipaddr ... %}‌ says “if the no_proxy value is an IP Address, return it, but if it isn’t, return a ‘None’ value”) and if it’s an IP address or subnet mask, it checks to see whether the IP address of the host you’re trying to reach falls inside that IP Address or Subnet Mask ({% if ansible_host | ipaddr(no_proxy) ... %} says “if the ansible_host address falls inside the no_proxy range, then return it, otherwise return a ‘None’ value”). Both of these checks say “If this previous check returns anything other than a ‘None’ value, do the next thing”, and on the last check, the “next” thing is to set the flag ‘match’ to ‘true’. When we get to the environment variable, we say “if match is not true, it’s false, so don’t put a value in there”.

So that’s that! Yes, I could merge the set_fact block into the environment variable, but I do end up using that a fair amount. And really, if it was merged, that would be even MORE complicated to pick through.

I have raised a pull request on the Ansible project to update the documentation, so we’ll see whether we end up with people over here looking for ways around this issue. If so, let me know in the comments below! Thanks!!

"Wifi Here on a Blackboard" by "Jem Stone" on Flickr

Free Wi-Fi does not need to be password-less!

2019-03-282019-03-28 JonTheNiceGuy 2 Comments

Recently a friend of mine forwarded an email to me about a Wi-fi service he wanted to use from a firm, but he raised some technical questions with them which they seemed to completely misunderstand!

So, let’s talk about the misconceptions of Wi-fi passwords.

Many people assume that when you log into a system, it means that system is secure. For example, logging into a website makes sure that your data is secure and protected, right? Not necessarily – the password you entered could be on a web page that is not secured by TLS, or perhaps the web server doesn’t properly transfer it’s contents to a database. Maybe the website was badly written, and means it’s vulnerable to one of a handful of common attacks (with fun names like “Cross Site Scripting” or “SQL Injection Attacks”)…

People also assume the same thing about Wi-fi. You reached a log in page, so it must be secure, right? It depends. If you didn’t put in a password to access the Wi-fi in the first place (like in the image of the Windows 10 screen, or on my KDE Desktop) then you’re probably using Unsecured Wi-fi.

An example of a secured Wi-fi sign-in box on Windows 10

People like to compare network traffic to “sending things through the post”, notablycomparing E-Mail to “sending a postcard”, versus PGP encrypted E-Mail being compared to “sending a sealed letter”. Unencrypted Wi-fi is like using CB. Anyone who can hear your signal can understand what you are saying… but if you visit a website which uses HTTPS, then it’s like listening to someone saying random numbers over the radio.

And, if you’re using Unencrypted Wi-fi, it’s also possible for an attacker to see what website you visited, because the request for the address to reach on the Internet (e.g. “Google.com” = 172.217.23.14) is sent in the clear. Also because of the way that DNS works (that name to address matching thing) means that if someone knows you’re visiting a “site of interest” (like, perhaps a bank website), they can reply *before* the real DNS server, and tell you that the server on their machine is actually your bank’s website.

So, many of these things can be protected against by using a simple method, that many people who provide Wi-fi don’t do.

Turn on WPA2 (the authentication bit). Even if *everyone* uses the same password (which they’d have to for WPA2), the fact you’re logging into the Access Point means it creates a unique shared secret for your session.

“But hang on”, I hear the guy at the back cry, “you used the same password – how does that work?”

OK, so this is where the fun stuff starts. The password is just part of how you negotiate to get on to the network. There’s a complex beast of a method that explains how get a shared unique secret when you’re passing stuff around “in the clear”, and so as a result, when you first connect to that Wi-fi access point, and you hand over your password, it “Authorises” you on to the network, but then hands you over to the encryption part, where you generate a key and then use that to talk to each other. The encryption is the bit like “HTTPS”, where you make it so that people can’t see what you’re looking at.

“I got told that if everyone used the same password” said a hipster in the front row, “I wouldn’t be able to tell them apart.” Aha, not true. You can have a separate passphrase to access the Wi-fi from the Login page, after all, you’ve got to make sure that people aren’t breaking the rules (which they *TOTALLY* read, before clicking “I agree, just get me on the damn Wi-fi already”) by using your network.

“OK”, says the lady over on the right, “but when I connected to the Wi-fi, they asked me to log in using Facebook – that’s secure, right?”

Um, no. Well, maybe. See, if they gave you a WPA2 password to log into the Wi-fi, and then the first thing you got to was that login screen, then yep, it’s all good! {*} You can browse with (relative) impunity. But if they didn’t… well, not only are they asking you to shout your secrets on the radio, but if you’re really unlucky, the page asking you to log into Facebook might *also* not actually be Facebook, but another website that just looks like Facebook… after all, I’m sure that page you went to complained that it wasn’t Google or Facebook when you tried to open it…

{*} Except for the fact they’re asking you to tell them not only who you are, but who you’re also friends with, where you went to school, what your hobbies are, what groups you’re in, your date of birth and so on.

But anyway. I understand why those login screens are there. They’re asserting that not only do you understand that you mustn’t use their network for bad things, but that if the police come and ask them who used their network to do something naughty, they can say “He said his name was ‘Bob Smith’ and his email address was ‘bob@example.com’, Officer”…

It also means that the “free” service they provide to you, usually at some great expense (*eye roll*) can get them some return on investment (like, they just got your totally-real-and-not-at-all-made-up-email-address… honest, and they also know what websites you visited while you were there, which they can sell on).

So… What to do the next time you “need” Wi-fi, and there’s a free service there? Always use a VPN when you’re not using a network you trust. If the Wi-fi isn’t using WPA2 encryption (even something as simple as “Buy a drink first” is a great passphrase to use!) point them to this page, and tell them it’s virtually pain free (as long as the passphrase is easy to remember, easy to type and doesn’t have too many weird symbols in) and makes their service more safe and secure for their customers…

Featured image is “Wifi Here on a Blackboard” by “Jem Stone” on Flickr and is released under a CC-BY license.

"LEGO Factory Playset" from Brickset on Flickr

Building Azure Environments in Ansible

2018-12-212018-12-21 JonTheNiceGuy Leave a comment

Recently, I’ve been migrating my POV (proof of value) and POC (proof of concept) environment from K5 to Azure to be able to test vendor products inside Azure. I ran a few tests to build the environment using the native tools (the powershell scripts) and found that the Powershell way of delivering Azure environments seems overly complicated… particularly as I’m comfortable with how Ansible works.

To be fair, I also need to look at Terraform, but that isn’t what I’m looking at today :)

So, let’s start with the scaffolding. Any Ansible Playbook which deals with creating virtual machines needs to have some extra modules installed. Make sure you’ve got ansible 2.7 or later and the python azure library 2.0.0 or later (you can get both with pip for python).

Next, let’s look at the group_vars for this playbook.

--- project_prefix: env01 project_breakglass: "My$uper$ecretPassw0rd" project_location: uksouth image: ubuntu1804: offer: UbuntuServer publisher: Canonical sku: '18.04-LTS' version: latest provision_script: userdata/ubuntu.j2 fortigate603_payg: offer: fortinet_fortigate-vm_v5 publisher: fortinet sku: 'fortinet_fg-vm_payg' version: 6.0.3 plan: yes provision_script: userdata/fortigate.j2 network: wan: name: wan prefix: 10.10.10 protected: name: protected prefix: 10.10.20 gateway: 10.10.20.4 appliances: fw01: name: fw01 image: "{{ image.fortigate603_payg }}" size: Standard_F1 priority: 1 ports: - subnet: "{{ network.wan }}" - subnet: "{{ network.protected }}" private_ip: "{{ network.wan.prefix }}.4" public: false default_allow_policy: - source: port2 destination: port1 nat: yes vm01: name: vm01 image: "{{ image.ubuntu1804 }}" size: Standard_B1s priority: 2 ports: - subnet: "{{ network.protected }}" private_ip: "{{ network.protected.prefix }}.5" public: false vm02: name: vm02 image: "{{ image.ubuntu1804 }}" size: Standard_B1s priority: 0 ports: - subnet: "{{ network.wan }}" private_ip: "{{ network.wan.prefix }}.250" public: false

This file has several pieces. We define the project settings (anything prefixed project_ is a project setting), including the prefix used for all resources we create (in this case “env01“), and a standard password used for all VMs we create (in this case “My$uper$ecret$Passw0rd“).

Next we define the standard images to load from the Marketplace. You can extend this with other images, these are just the “easiest” ones that I’m most familiar with (your mileage may vary). Next up is the networks to build inside the VNet, and lastly we define the actual machines we want to build. If you’ve got questions about any of the values we define here, just let me know in the comments below :)

Next, we’ll start looking at the playbook (this has been exploded out – the full playbook is also in the gist).

vars: location: "{{ project_location|default('uksouth') }}" prefix: | {%- if lookup('env', 'ANSIBLE_PREFIX') != '' -%} {{ lookup('env', 'ANSIBLE_PREFIX') }} {%- else -%} {{ project_prefix|default(ansible_date_time.epoch|hash('md5')|truncate(8, true, '')) }} {%- endif %}" breakglass_pw: | {%- if lookup('env', 'BREAKGLASS') != '' -%} {{ lookup('env', 'BREAKGLASS') }} {%- else -%} {{ project_breakglass|default(ansible_date_time.epoch|hash('md5')|truncate(8, true, '') + '_Azure') }} {%- endif -%} tasks: - debug: var=prefix - debug: var=breakglass_pw

Here we start by pulling in the variables we might want to override, and we do this by reading system environment variables (ANSIBLE_PREFIX and BREAKGLASS) and using them if they’re set. If they’re not, use the project defaults, and if that hasn’t been set, use some pre-defined values… and then tell us what they are when we’re running the tasks (those are the debug: lines).

- name: "Create Resource Group" azure_rm_resourcegroup: name: "{{ prefix }}" location: "{{ location }}" # Yes, this is not good practice, but it's just for a POV/POC - name: "Create Any-Any Allow Security Group" azure_rm_securitygroup: resource_group: "{{ prefix }}" name: "{{ prefix }}AnyAnyAllow" rules: - name: AllowAllInbound protocol: "*" destination_port_range: "*" access: Allow priority: 100 direction: Inbound - name: AllowAllOutbound protocol: "*" destination_port_range: "*" access: Allow priority: 100 direction: Outbound - name: "Create Storage Account" azure_rm_storageaccount: resource_group: "{{ prefix }}" name: "{{ prefix }}storage" account_type: Standard_LRS - name: "Create VNET" azure_rm_virtualnetwork: resource_group: "{{ prefix }}" name: "{{ prefix }}network01" address_prefixes_cidr: "10.0.0.0/8"

This block is where we create our “Static Assets” – individual items that we will be consuming later. This shows a clear win here over the Powershell methods endorsed by Microsoft – here you can create a Resource Group (RG) as part of the playbook! We also create a single Storage Account for this RG and a single VNET too.

These creation rules are not suitable for production use, as this defines an “Any-Any” Security group! You should tailor your security groups for your need, not for blanket access in!

- name: "Schedule UDR Creation" azure_rm_routetable: resource_group: "{{ prefix }}" name: "{{ prefix }}{{ item.value.name }}_udr" with_dict: "{{ network }}" loop_control: label: "{{ prefix }}{{ item.value.name }}" when: item.value.gateway is defined async: 1000 poll: 0 register: sleeper - name: "Check UDRs Created" async_status: jid: "{{ item.ansible_job_id }}" register: sleeper_status until: sleeper_status.finished retries: 500 delay: 4 with_items: "{{ sleeper.results }}" when: item.ansible_job_id is defined loop_control: label: "{{ item._ansible_item_label }}"

This is where things start to get a bit more interesting – We’re using the “async/async_status” pattern here (and the rest of these sections) to start creating the resources in parallel. As far as I can tell, sometimes you’ll get a case where the async doesn’t quite get set up fast enough, then the async_status can’t track the resources properly, but re-running the playbook should be enough to sort that out, without slowing things down too much.

But what are we actually doing with this block of code? A UDR is a “User Defined Route” or routing table for Azure. Effectively, you treat each network interface as being plumbed directly to the router (none of this “same subnet broadcast” stuff works here!) so you can do routing at the router for all the networks.

By default there are some existing network routes (stuff to the internet flows to the internet, RFC1918 addresses are dropped with the exception of any RFC1918 addresses you have covered in your VNETs, and each of your subnets can reach each other “directly”). Adding a UDR overrides this routing table. The UDRs we’re creating here are applied at a subnet level, but currently don’t override any of the existing routes (they’re blank). We’ll start putting routes in after we’ve added the UDRs to the subnets. Talking of which….

- name: "Schedule Subnet Creation" azure_rm_subnet: resource_group: "{{ prefix }}" name: "{{ prefix }}{{ item.value.name }}" address_prefix: "{{ item.value.prefix }}.0/24" virtual_network: "{{ prefix }}network01" route_table: | {%- if item.value.gateway is defined -%} {{ prefix }}{{ item.value.name }}_udr {%- else -%} {{ omit }}{%- endif -%} with_dict: "{{ network }}" loop_control: label: "{{ prefix }}{{ item.value.name }}" async: 1000 poll: 0 register: sleeper - name: "Check Subnet Created" async_status: jid: "{{ item.ansible_job_id }}" register: sleeper_status until: sleeper_status.finished retries: 500 delay: 4 with_items: "{{ sleeper.results }}" when: item.ansible_job_id is defined loop_control: label: "{{ item._ansible_item_label }}"

Again, this block is not really suitable for production use, and assumes the VNET supernet of /8 will be broken down into several /24’s. In the “real world” you might deliver a handful of /26’s in a /24 VNET… or you might even have lots of disparate /24’s in the VNET which are then allocated exactly as individual /24 subnets… this is not what this model delivers but you might wish to investigate further!

- name: "Schedule Route Creation" azure_rm_route: resource_group: "{{ prefix }}" name: "{{ prefix }}{{ item.value.name }}" address_prefix: "0.0.0.0/0" route_table_name: "{{ prefix }}{{ item.value.name }}_udr" next_hop_type: virtual_appliance next_hop_ip_address: "{{ item.value.gateway }}" with_dict: "{{ network }}" loop_control: label: "{{ prefix }}{{ item.value.name }}" when: item.value.gateway is defined async: 1000 poll: 0 register: sleeper - name: "Check Routes Created" async_status: jid: "{{ item.ansible_job_id }}" register: sleeper_status until: sleeper_status.finished retries: 500 delay: 4 with_items: "{{ sleeper.results }}" when: item.ansible_job_id is defined loop_control: label: "{{ item._ansible_item_label }}"

Now that we’ve created our subnets, we can start adding the routing table to the UDR. This is a basic one – add a 0.0.0.0/0 route (internet access) from the “protected” network via the firewall. You can get a lot more specific than this – most people are likely to want to add the VNET range (in this case 10.0.0.0/8) via the firewall as well, except for this subnet (because otherwise, for example, 10.0.0.100 trying to reach 10.0.0.101 will go via the firewall too).

Without going too much into the intricacies of network architecture, if you are routing your traffic between subnets to the firewall, it’s probably better to get an appliance with more interfaces, so you can route traffic across the appliance, rather than going across a single interface as this will halve your traffic bandwidth (it’s currently capped 1Gb/s – so 500Mb/s).

Having mentioned “The Internet” – let’s give our firewall a public IP address, and create the rest of the interfaces as well.

- name: "Schedule Public IP creation" azure_rm_publicipaddress: resource_group: "{{ prefix }}" allocation_method: Static name: "{{ prefix }}{{ item.0.name }}port{{ item.1.subnet.name }}pubip" with_subelements: - "{{ appliances }}" - ports when: item.1.public is not defined or (item.1.public is defined and item.1.public == 'true') loop_control: label: "{{ prefix }}{{ item.0.name }}port{{ item.1.subnet.name }}pubip" async: 1000 poll: 0 register: sleeper - name: "Check Public IPs Created" async_status: jid: "{{ item.ansible_job_id }}" register: sleeper_status until: sleeper_status.finished retries: 500 delay: 4 with_items: "{{ sleeper.results }}" when: item.ansible_job_id is defined loop_control: label: "{{ item._ansible_item_label }}" - name: "Schedule Network Interface Creation" azure_rm_networkinterface: resource_group: "{{ prefix }}" name: "{{ prefix }}{{ item.0.name }}port{{ item.1.subnet.name }}" virtual_network: "{{ prefix }}network01" subnet: "{{ prefix }}{{ item.1.subnet.name }}" public_ip_name: "{{ prefix }}pubip01" security_group: "{{ prefix }}AnyAnyAllow" ip_configurations: - name: "{{ prefix }}{{ item.0.name }}port{{ item.1.subnet.name }}" public_ip_address_name: | {%- if item.1.public is not defined or (item.1.public is defined and item.1.public == 'true') -%} {{ prefix }}{{ item.0.name }}port{{ item.1.subnet.name }}pubip {%- else -%}{%- endif -%}" primary: yes with_subelements: - "{{ appliances }}" - ports loop_control: label: "{{ prefix }}{{ item.0.name }}port{{ item.1.subnet.name }}" async: 1000 poll: 0 register: sleeper - name: "Check Network Interfaces Created" async_status: jid: "{{ item.ansible_job_id }}" register: sleeper_status until: sleeper_status.finished retries: 500 delay: 4 with_items: "{{ sleeper.results }}" when: item.ansible_job_id is defined loop_control: label: "{{ item._ansible_item_label }}"

This script creates a public IP address by default for each interface unless you explicitly tell it not to (see lines 40, 53 and 62 in the group_vars file I rendered above). You could easily turn this around by changing the lines which contain this:

item.1.public is not defined or (item.1.public is defined and item.1.public == 'true')

into lines which contain this:

item.1.public is defined and item.1.public == 'true'

OK, having done all that, we’re now ready to build our virtual machines. I’ve introduced a “Priority system” here – VMs with priority 0 go first, then 1, and 2 go last. The code snippet below is just for priority 0, but you can easily see how you’d extrapolate that out (and in fact, the full code sample does just that).

- name: "Schedule Virtual Machine Creation - Priority 0 (or 'All')" azure_rm_virtualmachine: resource_group: "{{ prefix }}" name: "{{ prefix }}{{ item.value.name }}" vm_size: "{{ item.value.size }}" storage_account: "{{ prefix }}storage" storage_container: "{{ prefix }}{{ item.value.name }}os" storage_blob: "{{ prefix }}{{ item.value.name }}os.vhd" admin_username: breakglass admin_password: "{{ breakglass_pw }}" network_interfaces: | [ {%- for nw in item.value.ports -%} '{{ prefix }}{{ item.value.name }}port{{ nw.subnet.name }}' {%- if not loop.last -%}, {%- endif -%} {%- endfor -%} ] custom_data: | {%- if item.value.provision_script is defined and item.value.provision_script != '' -%} {%- include(item.value.provision_script) -%} {%- elif item.value.image.provision_script is defined and item.value.image.provision_script != '' -%} {%- include(item.value.image.provision_script) -%} {%- else -%} {{ omit }} {%- endif -%} image: publisher: "{{ item.value.image.publisher }}" offer: "{{ item.value.image.offer }}" sku: "{{ item.value.image.sku }}" version: "{{ item.value.image.version }}" plan: | {%- if item.value.image.plan is not defined -%}{{ omit }}{%- else -%} {'name': '{{ item.value.image.sku }}', 'publisher': '{{ item.value.image.publisher }}', 'product': '{{ item.value.image.offer }}' } {%- endif -%} with_dict: "{{ appliances }}" when: item.value.priority | default(0) == 0 loop_control: label: "{{ prefix }}{{ item.value.name }}" async: 1000 poll: 0 register: sleeper - name: "Check Virtual Machines Created - Priority 0" async_status: jid: "{{ item.ansible_job_id }}" register: sleeper_status until: sleeper_status.finished retries: 500 delay: 4 with_items: "{{ sleeper.results }}" when: item.ansible_job_id is defined loop_control: label: "{{ item._ansible_item_label }}"

There are a few blocks here to draw attention to :) I’ve re-jigged them a bit here so it’s clearer to understand, but when you see them in the main playbook they’re a bit more compact. Let’s start with looking at the Network Interfaces section!

network_interfaces: |
  [
    {%- for nw in item.value.ports -%}
      '{{ prefix }}{{ item.value.name }}port{{ nw.subnet.name }}'
      {%- if not loop.last -%}, {%- endif -%} 
    {%- endfor -%}
  ]

In this part, we loop over the ports defined for the virtual machine. This is because one device may have 1 interface, or four interfaces. YAML is parsed to make a JSON variable, so here we can create a JSON variable, that when the YAML is parsed it will just drop in. We’ve previously created all the interfaces to have names like this PREFIXhostnamePORTsubnetname (or aFW01portWAN in more conventional terms), so here we construct a JSON array, like this: ['aFW01portWAN'] but that could just as easily have been ['aFW01portWAN', 'aFW01portProtect', 'aFW01portMGMT', 'aFW01portSync']. This will then attach those interfaces to the virtual machine.

Next up, custom_data. This section is sometimes known externally as userdata or config_disk. My code has always referred to it as a “Provision Script” – hence the variable name in the code below!

custom_data: |
  {%- if item.value.provision_script is defined and item.value.provision_script != '' -%}
    {%- include(item.value.provision_script) -%}
  {%- elif item.value.image.provision_script is defined and item.value.image.provision_script != '' -%}
    {%- include(item.value.image.provision_script) -%}
  {%- else -%}
    {{ omit }}
  {%- endif -%}

Let’s pick this one apart too. If we’ve defined a provisioning script file for the VM, include it, if we’ve defined a provisioning script file for the image (or marketplace entry), then include that instead… otherwise, pretend that there’s no “custom_data” field before you submit this to Azure.

One last quirk to Azure, is that some images require a “plan” to go with it, and others don’t.

plan: |
  {%- if item.value.image.plan is not defined -%}{{ omit }}{%- else -%}
    {'name': '{{ item.value.image.sku }}',
     'publisher': '{{ item.value.image.publisher }}',
     'product': '{{ item.value.image.offer }}'
    }
  {%- endif -%}

So, here we say “if we’ve not got a plan, omit the value being passed to Azure, otherwise use these fields we previously specified. Weird huh?

The very last thing we do in the script is to re-render the standard password we’ve used for all these builds, so that we can check them out!

Want to review this all in one place?

Here’s the link to the full playbook, as well as the group variables (which should be in ./group_vars/all.yml) and two sample userdata files (which should be in ./userdata) for an Ubuntu machine (using cloud-init) and one for a FortiGate Firewall.

All the other files in that gist (prefixes from 10-16 and 00) are for this blog post only, and aren’t likely to work!

If you do end up using this, please drop me a note below, or star the gist! That’d be awesome!!

Image credit: “Lego Factory Playset” from Flickr by “Brickset” released under a CC-BY license. Used with Thanks!

Troubleshooting FortiGate API issues with the CLI?

2018-12-102020-03-27 JonTheNiceGuy Leave a comment

One of my colleagues has asked me for some help with an Ansible script he’s writing to push some policy to a cloud hosted FortiGate appliance. Unfortunately, he kept getting some very weird error messages, like this one:

fatal: [localhost]: FAILED! => {"changed": false, "meta": {"build": 200, "error": -651, "http_method": "PUT", "http_status": 500, "mkey": "vip8080", "name": "vip", "path": "firewall", "revision": "36.0.0.10745196634707694665.1544442857", "serial": "CENSORED", "status": "error", "vdom": "root", "version": "v6.0.3"}, "msg": "Error in repo"}

This is using Fortinet’s own Ansible Modules, which, in turn use the fortiosapi python module.

This same colleague came across a post on the Fortinet Developer Network site (access to the site requires vendor approval), which said “this might be an internal bug, but to debug it, use the following”

fgt # diagnose debug enable

fgt # diagnose debug cli 8
Debug messages will be on for 30 minutes.

And then run your API commands. Your error message will be surfaced there… so here’s mine! (Mapped port doesn’t match extport in a vip).

0: config firewall vip
0: edit "vip8080"
0: unset src-filter
0: unset service
0: set extintf "port1"
0: set portforward enable
0: unset srcintf-filter
0: set mappedip "192.0.2.1-192.0.2.1"
0: unset extport
0: set extport 8080-8081
0: unset mappedport
0: set mappedport 8080
-651: end

Late edit 2020-03-27: I spotted a bug in the Ansible issues tracker today, and I added a note to the end of that bug mentioning that as well as diagnose debug cli 8, if that doesn’t give you enough logs to figure out what’s up, you can also try diagnose debug application httpsd -1 but this enables LOTS AND LOTS of logs, so really think twice before turning that one on!

Oh, and if 30 minutes isn’t enough, try diagnose debug duration 480 or however many minutes you think you need. Beware that it will write event logs out to the serial console even when you’ve logged out.

Share this:

Share this:

Share this:

How bad is it, Doc?

Take action!

Change some passwords

Making a list, and checking it twice!

OK, so the horse bolted, now close the gate!

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this: