My current Ansible project relies on me collecting a lot of data from AWS and then checking it again later to see if something has changed.
This is fine for one-off tests (e.g. terraform destroy ; terraform apply ; ansible-playbook run.yml) but isn’t great for repetitive tests, especially if collecting the data takes many minutes to run all the actions, or if you have slow or unreliable internet in your development environment.
To get around this, I wrote a wrapper for caching this data.
At the top of my playbook, run.yml, I have these tasks:
- name: Set Online Status.
  # This stores the value of run_online, unless run_online
  # is not set, in which case, it defines it as "true".
  ansible.builtin.set_fact:
    run_online: |-
      {{- run_online | default(true) | bool -}}

- name: Create cache_data path.
  # This creates a "cached_data" directory in the same
  # path as the playbook.
  when: run_online | bool and cache_data | default(false) | bool
  delegate_to: localhost
  run_once: true
  ansible.builtin.file:
    path: "cached_data"
    state: directory
    mode: "0755"

- name: Create cache_data for host.
  # This creates a directory under "cached_data" in the same
  # path as the playbook, with the name of each of the inventory
  # items.
  when: run_online | bool and cache_data | default(false) | bool
  delegate_to: localhost
  ansible.builtin.file:
    path: "cached_data/{{ inventory_hostname }}"
    state: directory
    mode: "0755"
Running this sets up an expectation for the normal operation of the playbook: it is “online” by default.
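Concretely (and assuming the defaults above, where run_online is true and cache_data is false), the three modes of operation look something like this:

ansible-playbook run.yml                      # online, no cache written
ansible-playbook run.yml -e cache_data=true   # online, writing the cache as it goes
ansible-playbook run.yml -e run_online=false  # offline, reading from the cache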
Then, every time I need to do something “online”, for example collecting EC2 instance data (using the community.aws.ec2_instance_info module), I call out to (something like) this set of tasks, instead of just calling the module by itself. (One way to package these tasks up for reuse is sketched at the end of this post.)
- name: List all EC2 instances in the regions of interest.
  when: run_online | bool
  community.aws.ec2_instance_info:
    region: "{{ item.region_name }}"
  loop: "{{ regions }}"
  loop_control:
    label: "{{ item.region_name }}"
  register: regional_ec2

- name: "NOTE: Set regional_ec2 data path"
  # The counter gives each call to this wrapper its own cache file.
  # All values in a single set_fact are rendered against the
  # pre-task variables, so the filename has to repeat the "+ 1"
  # to match the freshly incremented counter.
  when: not run_online | bool or cache_data | default(false) | bool
  ansible.builtin.set_fact:
    regional_ec2_cached_data_file_loop: "{{ regional_ec2_cached_data_file_loop | default(0) | int + 1 }}"
    cached_data_filename: "cached_data/{{ inventory_hostname }}/{{ cached_data_file | default('regional_ec2') }}.{{ regional_ec2_cached_data_file_loop | default(0) | int + 1 }}.json"

- name: "NOTE: Cache/Get regional_ec2 data path"
  # Record the cache filename in the run's output, so it's easy
  # to find the file later.
  when: not run_online | bool or cache_data | default(false) | bool
  ansible.builtin.debug:
    msg: "File: {{ cached_data_filename }}"

- name: Cache all EC2 instances in the regions of interest.
  # Write the registered result to the controller-side cache file.
  when: run_online | bool and cache_data | default(false) | bool
  delegate_to: localhost
  ansible.builtin.copy:
    dest: "{{ cached_data_filename }}"
    mode: "0644"
    content: "{{ regional_ec2 }}"

- name: "OFFLINE: Load all EC2 instances in the regions of interest."
  # The Jinja2 include reads the cached file from disk, and the
  # templated result becomes the new value of regional_ec2.
  when: not run_online | bool
  ansible.builtin.set_fact:
    regional_ec2: "{% include( cached_data_filename ) %}"
The first task, if we’re still set to being “online”, will execute the module and register the result for later. If cache_data is configured, we generate a filename for the cache, record that filename to the log (via the debug task) and then store the data (using the copy task). So far, so online… but what happens when we don’t need the instances to be up and running? In that case, we use the set_fact module, triggered by running the playbook like this: ansible-playbook run.yml -e run_online=false. This reads the cached data out of that locally stored pool of data for later use.
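Since these five tasks always travel together, a natural next step is to keep them in their own task file and include them wherever the data is needed. My playbook above doesn’t show this, so treat the filename below as an illustrative sketch:

- name: Collect EC2 instance data, online or from the cache.
  ansible.builtin.include_tasks: cached_regional_ec2.yml  # hypothetical filename
  vars:
    cached_data_file: regional_ec2  # names the cache files; this is the default above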
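As an aside, if the {% include %} trick in that final task feels too magical, a more conventional pair (a variation on the mechanism above, not what my playbook uses) is to serialise the cache explicitly when writing it, with content: "{{ regional_ec2 | to_json }}" in the copy task, and then read it back with the file lookup and the from_json filter:

- name: "OFFLINE: Load all EC2 instances in the regions of interest."
  when: not run_online | bool
  ansible.builtin.set_fact:
    regional_ec2: "{{ lookup('file', cached_data_filename) | from_json }}"

The lookup runs on the controller, which is exactly where delegate_to: localhost put the cached_data directory in the first place.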
Featured image is ‘Geocache “Goodies”’ by ‘sk’ on Flickr and is released under a CC-BY-ND license.