I’ve been using Desktop Linux for probably 15 years, and Server Linux for more like 25 in one form or another. One of the things you learn to write pretty early on in Linux System Administration is Bash Scripting. Here’s a great example:
#!/bin/bash
i=0
until [ $i -eq 10 ]
do
  echo "Jon is the best!"
  (( i += 1 ))
done
Bash scripts are pretty easy to come up with: you just write the things you’d type into the interactive shell, and it does those same things for you! Yep, it’s pretty hard not to love Bash for a shell script. Oh, and it’s portable too! You can write the same Bash script for one flavour of Linux (like Ubuntu), and it’s probably going to work on another flavour of Linux (like Red Hat Enterprise Linux, or Arch, or OpenWRT).
But. There comes a point where a Bash script needs to be more than just a few commands strung together.
At work, I started writing a “simple” installer for a Kubernetes cluster – it provisions the cloud components with Terraform, and then once they’re done, it then starts talking to the Kubernetes API (all using the same CLI tools I use day-to-day) to install other components and services.
When the basic stuff works, it’s great. When it doesn’t work, it’s a bit of a nightmare, so I wrote some functions to put logs in a common directory, and another function to gracefully stop the script running when something fails, and then write those log files out to the screen, so I know what went wrong. And then I gave it to a colleague, and he ran it, and things broke in a way that didn’t make sense for either of us, so I wrote some more functions to trap that type of error, and try to recover from them.
And each time, the way I tested where it was working (or not working) was to just… run the shell script, and see what it told me. There had to be a better way.
Enter Python
Python earns my vote for a couple of reasons (and they might not be right for you!)
- I’ve been aware of the language for some time, and in fact, had patched a few code libraries in the past to use Ansible features I wanted.
- My preferred IDE (Integrated Development Environment), Visual Studio Code, has a step-by-step debugger I can use to work out what’s going on while my program runs
- It’s still portable! In fact, if anything, it’s probably more portable than Bash, because the version of Bash that ships with macOS is really old, so lots of “modern” features I’d expect in Bash and its associated tooling aren’t there! Python is Python everywhere.
- There’s an argument-parsing tool built into the core library, so if I want to handle things like
./myscript.py --some-long-feature "option-A" --some-long-feature "option-B" -a -s -h -o -r -t --argument
I can, without having to remember how to write that in Bash (which is a bit esoteric!)
- And lastly (for now, at least!), Python allows you to raise errors that can be surfaced up to other parts of your program.
Given all this, my personal preference is to write my shell scripts now in Python.
If you’ve not written Python before, variables are written without any prefix (unlike the $ you might have seen in PHP), and any flow control (like if, while, for), as well as any functions and classes, uses white-space indentation to show where that block finishes, like this:
def do_something():
    pass

if some_variable == 1:
    do_something()
    and_something_else()

while some_variable < 2:
    some_variable = some_variable * 2
Starting with Boilerplate
I start from a “standard” script I use. This has a lot of those functions I wrote previously for bash, but with cleaner code, and in a way that’s a bit more understandable. I’ll break down the pieces I use regularly.
Starting the script up
Here’s the first bit of code I always write, this goes at the top of everything
#!/usr/bin/env python3
import logging
logger = logging
This makes sure the code is portable, but always uses Python 3 and not Python 2. It also sets up the logging engine.
At the bottom I create a block which the “main” code will go into, and then run it.
def main():
    logger.basicConfig(level=logging.DEBUG)
    logger.debug('Started main')

if __name__ == "__main__":
    main()
Adding argument parsing
There’s a standard library module which takes command-line arguments and uses them in your script. It’s called argparse, and it looks like this:
#!/usr/bin/env python3
# It's convention to put all the imports at the top of your files
import argparse
import logging

logger = logging

def process_args():
    parser = argparse.ArgumentParser(
        description="A script to say hello world"
    )
    parser.add_argument(
        '--verbose',  # The stored variable can be found by getting args.verbose
        '-v',
        action="store_true",
        help="Be more verbose in logging [default: off]"
    )
    parser.add_argument(
        'who',  # This is a non-optional, positional argument called args.who
        help="The target of this script"
    )
    args = parser.parse_args()
    if args.verbose:
        logger.basicConfig(level=logging.DEBUG)
        logger.debug('Setting verbose mode on')
    else:
        logger.basicConfig(level=logging.INFO)
    return args

def main():
    args = process_args()
    print(f'Hello {args.who}')
    # Using f'' means you can include variables in the string
    # You could instead do print('Hello %s' % args.who)
    # but I always struggle to remember in what order I wrote things!

if __name__ == "__main__":
    main()
The order you put things in makes a lot of difference. You need to have the if __name__ == "__main__": line after you’ve defined everything else, but you can put def main(): wherever you want in the file (as long as it’s before the if __name__). By having everything in one file, it feels more like those Bash scripts I was talking about before. You can have imports (a bit like calling out to other shell scripts) and use those functions and classes in your code, but for the “simple” shell scripts, keeping everything in one file makes the most sense.
So what else do we do in Shell scripts?
Running commands
This is a class in its own right. You can pass an instance of it around in a variable, and it has functions and properties of its own. It’s a bit chunky, but it handles one of the biggest issues I have with Bash scripts: capturing both the “normal” output (stdout) and the “error” output (stderr) without needing to put them into an external file you read later to work out what you saw, as well as storing the return, exit or error code.
# Add these extra imports
import os
import subprocess

class RunCommand:
    command = ''
    cwd = ''
    running_env = {}
    stdout = []
    stderr = []
    exit_code = 999

    def __init__(
        self,
        command: list = [],
        cwd: str = None,
        env: dict = None,
        raise_on_error: bool = True
    ):
        self.command = command
        self.cwd = cwd
        self.running_env = os.environ.copy()
        if env is not None and len(env) > 0:
            for env_item in env.keys():
                self.running_env[env_item] = env[env_item]
        logger.debug(f'exec: {" ".join(command)}')
        try:
            result = subprocess.run(
                command,
                cwd=cwd,
                capture_output=True,
                text=True,
                check=True,
                env=self.running_env
            )
            # Store the result because it worked just fine!
            self.exit_code = 0
            self.stdout = result.stdout.splitlines()
            self.stderr = result.stderr.splitlines()
        except subprocess.CalledProcessError as e:
            # Or store the result from the exception(!)
            self.exit_code = e.returncode
            self.stdout = e.stdout.splitlines()
            self.stderr = e.stderr.splitlines()
        # If verbose mode is on, output the results and errors from the command execution
        if len(self.stdout) > 0:
            logger.debug(f'stdout: {self.list_to_newline_string(self.stdout)}')
        if len(self.stderr) > 0:
            logger.debug(f'stderr: {self.list_to_newline_string(self.stderr)}')
        # If it failed and we want to raise an exception on failure, record the command and args
        # then Raise Away!
        if raise_on_error and self.exit_code > 0:
            command_string = None
            args = []
            for element in command:
                if not command_string:
                    command_string = element
                else:
                    args.append(element)
            raise Exception(
                f'Error ({self.exit_code}) running command {command_string} with arguments {args}\nstderr: {self.stderr}\nstdout: {self.stdout}')

    def __repr__(self) -> str:  # Return a string representation of this class
        return "\n".join(
            [
                f"Command: {self.command}",
                f"Directory: {self.cwd if self.cwd is not None else '{current directory}'}",
                f"Env: {self.running_env}",
                f"Exit Code: {self.exit_code}",
                f"stdout: {self.stdout}",
                f"stderr: {self.stderr}"
            ]
        )

    def list_to_newline_string(self, list_of_messages: list):
        return "\n".join(list_of_messages)
So, how do we use this?
Well… you can do this: prog = RunCommand(['ls', '/tmp', '-l'])
which gives us back the prog object. If you then literally do print(prog)
it will print the result of the __repr__()
function:
Command: ['ls', '/tmp', '-l']
Directory: current directory
Env: <... a collection of things from your environment ...>
Exit Code: 0
stdout: total 1
drwx------ 1 root root 0 Jan 1 01:01 somedir
stderr:
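The env parameter, meanwhile, merges extra variables over a copy of your current environment. Here’s a standalone sketch of that same merging technique using subprocess directly (GREETING is a hypothetical variable name I’ve made up for this example):

```python
import os
import subprocess

# Copy the current environment and layer extra variables on top,
# which is the same merging technique the RunCommand class uses internally
running_env = os.environ.copy()
running_env['GREETING'] = 'hello'

result = subprocess.run(
    ['sh', '-c', 'echo "$GREETING"'],
    capture_output=True, text=True, env=running_env
)
print(result.stdout.strip())  # → hello
```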
But you can also do things like:
for line in prog.stdout:
print(line)
or:
try:
    prog = RunCommand(['false'], raise_on_error=True)
except Exception as e:
    logger.error(e)
    exit(1)
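A plain Exception doesn’t carry the exit code, so if you want to do something like exit(e.exit_code) in the caller, one option is a custom exception class that RunCommand could raise instead. This is my own sketch, not part of the template repo:

```python
class CommandError(Exception):
    # A hypothetical exception type that carries the exit code alongside the message
    def __init__(self, message: str, exit_code: int):
        super().__init__(message)
        self.exit_code = exit_code

try:
    raise CommandError('Error (1) running command false', exit_code=1)
except CommandError as e:
    print(f'{e} -> exit code {e.exit_code}')  # → Error (1) running command false -> exit code 1
```

Catching CommandError specifically (rather than a bare Exception) also stops you accidentally swallowing unrelated errors, like typos in your own code.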
Putting it together
So, I wrote all this up into a git repo that you’re more than welcome to take your own inspiration from! It’s licensed under an exceptionally permissive licence, so you can take it and use it without credit, but if you want to credit me in some way, feel free to point to this blog post or the git repo, which would be lovely of you.
Github: JonTheNiceGuy/python_shell_script_template
Featured image is “The Conch” by “Kurtis Garbutt” on Flickr and is released under a CC-BY license.
As always, another great/useful post Jon. Thanks for sharing. Your posts help to keep me current whilst I’m not directly working in a hands-on role 😁
Curious, where do Lambda and Google/Azure Functions come into play today in the admin space – where I used to keep a library of scripts in a folder on my PC, these would now live on GitHub/GitLab etc. Do admins today leverage FaaS over cronjobs?
And a follow-up question: I’ve played a lot with AI (ChatGPT/Gemini) to help rapidly prototype scripts & programs. I find it helpful but it still requires knowledge to be able to make the output actually work as expected – are admins making productive use of AI today? (Personally I feel it has become less useful over time and this may be by design to encourage upsell – always the cynic!)
Thanks again for sharing
Hey Graham, thanks for the reply (and my word, it’s lovely to hear from you!)
I can’t speak for other admins, but my current team doesn’t use FaaS for anything except where a SaaS provider has told us to implement it. That said, we’re pretty conservative about things which aren’t self-hosted, after we’ve had near-misses with 3rd-party breaches putting our users’ data at risk (but never, afaik, in the hands of attackers!)
I’d say that our model still looks like:
1. Script stored on GitHub, pull and run from your machine. Mostly for unprivileged users making local changes to their machines.
2. IaC stored in GitHub, run in specific trusted SaaS platforms on push, mostly unprivileged users in specific roles that need to change infra.
3. IaC stored in GitHub, run locally on an admin machine, essentially where sudo is required.
Regarding AI, it’s fair to say I’m quite distrustful of it. I essentially use it to fix my “google-foo” (a term that shows my age!) as a way to find the search terms I don’t remember or never knew.
I have tried several times to more directly incorporate AI tooling into my workflow, and each time removed it fairly soon after putting it in, because it was either plain wrong, did things that didn’t match what I needed, or did something that wasn’t in keeping with the rest of the project, and thus made the code less maintainable.