One of my colleagues asked today about using recursive git submodules. First, let’s quickly drill into what a Submodule is.
Git Submodules
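A submodule is a separate git repository, attached to the git repository you’re working on via two “touch points”: a file in the root directory called .gitmodules, and an entry in the parent repository’s tree (a “gitlink”) which records the commit the submodule should be at. When checked out, the submodule’s own git data, including its HEAD file, lives under the parent’s .git directory.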
When you clone a repository with a submodule attached, it creates the directory the submodule will be cloned into, but leaves it empty, unless you either run git submodule update --init --recursive or, when you clone the repository initially, ask it to pull any submodules recursively, like this: git clone https://your.vcs.example.org/someorg/somerepo.git --recursive
Git stores the commit reference of the submodule in the parent repository’s tree, and the submodule’s own checkout state lives in .git/modules/$SUBMODULE_NAME/HEAD (which contains the commit reference whenever the submodule is checked out at a specific commit rather than a branch). If you change a file in that submodule, it marks the path of the submodule as “dirty” (because you have an uncommitted change), and if you either commit that change, or pull an updated commit from the source repository, then it will mark the path of the submodule as having changed.
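If you want to see this for yourself, here is a rough sketch in a throwaway directory. The repository names (lib and parent) and the identity used for the commits are made up for illustration, and the protocol.file.allow setting is only there because the “remote” is a local path, which recent versions of git refuse to use for submodules by default.

```shell
set -eu
# Throwaway sandbox; every path and name below is hypothetical.
sandbox="$(mktemp -d)"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.org"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.org"

# A small repository with one commit, to act as the submodule's source.
git init -q "$sandbox/lib"
echo "hello" > "$sandbox/lib/README"
git -C "$sandbox/lib" add README
git -C "$sandbox/lib" commit -q -m "Initial commit"

# A parent repository that attaches it as a submodule.
git init -q "$sandbox/parent"
cd "$sandbox/parent"
git -c protocol.file.allow=always submodule --quiet add "$sandbox/lib" lib
git commit -q -m "Add lib submodule"

# The parent's tree holds a gitlink for the path: mode 160000, type "commit".
git ls-tree HEAD lib
recorded="$(git rev-parse HEAD:lib)"
checked_out="$(git -C lib rev-parse HEAD)"
echo "recorded=$recorded checked_out=$checked_out"

# An uncommitted edit inside the submodule shows up in the parent's status.
echo "change" >> lib/README
git status --short
```

The two commit references printed should match until you commit or pull something new inside the submodule.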
In other words, you can track two separate but linked parts of your code in the same tree, working on each in turn, and without impacting each other’s code base.
I’ve used this, mostly with Ansible playbooks, where I’ve consumed someone else’s role, like this:
My_Project
|
+- Roles
| |
| +- <SUBMODULE> someorg.some_role
| +- <SUBMODULE> anotherorg.another_role
+- inventory
+- playbook.yml
+- .git
| |
| +- HEAD
| +- modules
| +- etc
+- .gitmodules
The .gitmodules file looks like this:
[submodule "module1"]
path = module1
url = https://your.vcs.example.org/someorg/module1.git
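Because .gitmodules uses the same format as any other git config file, you can query it with git config rather than parsing it by hand. This is just a sketch: it writes the example file above into a temporary directory and reads it back.

```shell
set -eu
# Throwaway directory holding a copy of the example .gitmodules file.
sandbox="$(mktemp -d)"
cd "$sandbox"
cat > .gitmodules <<'EOF'
[submodule "module1"]
	path = module1
	url = https://your.vcs.example.org/someorg/module1.git
EOF

# List the path of every submodule declared in the file.
git config --file .gitmodules --get-regexp '^submodule\..*\.path$'

# Look up a single value by its key.
url="$(git config --file .gitmodules --get submodule.module1.url)"
echo "$url"
```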
Once you’ve checked out this submodule, you can do any normal operations in this submodule, like pulls, pushes, commits, tags, etc.
So, what happens when you want to nest this stuff?
Nesting Submodule Recursion
So, my colleague wanted to have files in three layers of directories. In this instance, I’ve simulated this by creating three directories: root, module1 and module2. Typically these would be pulled from their respective Git service paths, like GitHub or GitLab, but here I’m just using everything on my local file system. Wherever you see /tmp/ in the following screenshot, you could easily replace that with https://your.vcs.example.org/someorg/
So, here, we’ve created these three paths (basically to initialise the repositories), added a basic commit to the furthest submodule (module2), then done a submodule add into the next furthest submodule (module1) and finally added that into the root tree.
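The steps described above can be sketched roughly as follows. I’m using a temporary directory in place of /tmp/, the commit identity and messages are placeholders, and the protocol.file.allow setting is an assumption about your git version: recent releases refuse to clone submodules from local paths unless you allow it, and you shouldn’t need it for https:// URLs.

```shell
set -eu
# Stand-in for /tmp/ (or https://your.vcs.example.org/someorg/) in the text.
base="$(mktemp -d)"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.org"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.org"

# Create the three repositories.
for repo in root module1 module2; do
    git -c init.defaultBranch=master init -q "$base/$repo"
done

# 1. A basic commit in the furthest repository, module2.
cd "$base/module2"
echo "module2" > README
git add README
git commit -q -m "Initial commit in module2"

# 2. Attach module2 inside module1 and commit the result.
cd "$base/module1"
git -c protocol.file.allow=always submodule --quiet add "$base/module2" module2
git commit -q -m "Add module2 as a submodule"

# 3. Attach module1 inside root and commit the result.
cd "$base/root"
git -c protocol.file.allow=always submodule --quiet add "$base/module1" module1
git commit -q -m "Add module1 as a submodule"

# module1 is populated, but its nested module2 directory is still empty.
git submodule status --recursive
ls -A module1/module2
```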
Note, however, that when you perform the submodule add it doesn’t automatically clone any submodules nested inside the one you’re adding, and if you were to, from another machine, perform git clone https://your.vcs.example.org/someorg/root.git you wouldn’t get any of the submodules (neither module1 nor module2) without either adding --recursive to the clone command (like this: git clone --recursive https://your.vcs.example.org/someorg/root.git), or running the follow-up command git submodule update --init --recursive
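Here is a sketch comparing the two routes, again using local paths in a temporary directory instead of a real Git service. The clone directory names (clone-a and clone-b) are made up, and the first few lines only rebuild a root, module1 and module2 chain so the example stands on its own.

```shell
set -eu
base="$(mktemp -d)"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.org"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.org"

# Setup: root contains module1, which in turn contains module2.
for repo in module2 module1 root; do git init -q "$base/$repo"; done
echo "module2" > "$base/module2/README"
git -C "$base/module2" add README
git -C "$base/module2" commit -q -m "Initial commit"
git -C "$base/module1" -c protocol.file.allow=always submodule --quiet add "$base/module2" module2
git -C "$base/module1" commit -q -m "Add module2"
git -C "$base/root" -c protocol.file.allow=always submodule --quiet add "$base/module1" module1
git -C "$base/root" commit -q -m "Add module1"

# Route 1: ask for the submodules at clone time.
git -c protocol.file.allow=always clone -q --recursive "$base/root" "$base/clone-a"

# Route 2: a plain clone leaves the submodule directory empty...
git clone -q "$base/root" "$base/clone-b"
empty_before="$(ls -A "$base/clone-b/module1")"

# ...until you run the follow-up command.
git -C "$base/clone-b" -c protocol.file.allow=always submodule --quiet update --init --recursive

ls "$base/clone-a/module1/module2" "$base/clone-b/module1/module2"
```

Either way, both clones should end up with module2’s README in place.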
Oh, and if any of these submodules are updated? You need to go in and pull those updates, and then commit that change, like this!
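As a minimal sketch of that update, assuming a single submodule called module1 attached to root (the commit messages and identity are placeholders, and everything lives in a temporary directory):

```shell
set -eu
base="$(mktemp -d)"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.org"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.org"

# Setup: an upstream module1 with one commit, attached to root as a submodule.
git init -q "$base/module1"
echo "v1" > "$base/module1/README"
git -C "$base/module1" add README
git -C "$base/module1" commit -q -m "First commit"
git init -q "$base/root"
cd "$base/root"
git -c protocol.file.allow=always submodule --quiet add "$base/module1" module1
git commit -q -m "Add module1 as a submodule"

# Someone updates module1 upstream.
echo "v2" >> "$base/module1/README"
git -C "$base/module1" commit -q -a -m "Second commit"

# Go into the submodule and pull the update...
cd "$base/root/module1"
git pull -q --ff-only

# ...then record the new commit reference in the parent.
cd "$base/root"
git status --short
git add module1
git commit -q -m "Update module1 to its latest commit"
```

Until that last commit is made, anyone cloning root would still get the older commit of module1.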
One thing to watch out for in these submodules: if you’ve done a git clone of the root repo (using the terms from the above screen images), the submodules won’t be on the “master” branch (or a particular tag or branch name, for that matter), but will instead be checked out at the recorded commit reference. If you wanted to switch to a specific branch or tag, then you’d need to issue the command git checkout some_remote/some_branch or git checkout master before (in the above screen captures) running git pull
.
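To check which state a submodule is in, and to move it onto a branch, something like the following should work. It’s a sketch with a single hypothetical submodule (module1) whose default branch I’ve forced to be called master so the example matches the text.

```shell
set -eu
base="$(mktemp -d)"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.org"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.org"

# Setup: module1 (default branch forced to "master") attached to root.
git -c init.defaultBranch=master init -q "$base/module1"
echo "v1" > "$base/module1/README"
git -C "$base/module1" add README
git -C "$base/module1" commit -q -m "First commit"
git init -q "$base/root"
git -C "$base/root" -c protocol.file.allow=always submodule --quiet add "$base/module1" module1
git -C "$base/root" commit -q -m "Add module1 as a submodule"

# A fresh recursive clone checks the submodule out at the recorded commit.
git -c protocol.file.allow=always clone -q --recursive "$base/root" "$base/clone"
cd "$base/clone/module1"
before="$(git rev-parse --abbrev-ref HEAD)"   # "HEAD" means no branch is checked out
echo "before: $before"

# Switch to a named branch instead of the bare commit reference.
git checkout -q master
after="$(git rev-parse --abbrev-ref HEAD)"
echo "after: $after"
```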
If you have any questions or issues with this post, please either add a comment, or contact me via one of the methods at the top or side of this page!
Featured image is “Submarine” by “NH53” on Flickr and is released under a CC-BY license.