One of my colleagues asked today about using recursive git submodules. First, let’s quickly drill into what a Submodule is.
A submodule is a separate git repository, attached to the git repository you’re working on via two “touch points” – a file in the root directory called
.gitmodules, and, when checked out, the HEAD file in the
When you clone a repository with a submodule attached, git creates the directory the submodule will be cloned into but leaves it empty, unless you either run git submodule update --init --recursive afterwards or, when you clone the repository initially, ask it to pull any submodules recursively, like this: git clone https://your.vcs.example.org/someorg/somerepo.git --recursive.
Git stores the commit reference of the submodule in the parent repository’s tree, as a special “gitlink” entry at the submodule’s path. Once the submodule is checked out, its own HEAD lives in .git/modules/$SUBMODULE_NAME/HEAD. If you change a file in that submodule, git marks the path of the submodule as “dirty” (because you have an uncommitted change), and if you either commit that change, or pull an updated commit from the source repository, then it will mark the path of the submodule as having changed.
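If you want to see that commit reference for yourself, here’s a rough sketch using throwaway repositories. The temporary directory, the “demo” identity and the README file are all just assumptions for the demo, and the protocol.file.allow setting is only there because recent git releases refuse to clone submodules from local paths unless you tell them otherwise.

```shell
set -e
# Hypothetical identity, so the demo commits work on a machine with no git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

work=$(mktemp -d)   # throwaway directory standing in for your real remotes

# A tiny repository to use as the submodule source.
git init -q "$work/module1"
echo "hello" > "$work/module1/README"
git -C "$work/module1" add README
git -C "$work/module1" commit -q -m "Initial commit"

# The parent repository, with module1 attached as a submodule.
git init -q "$work/root"
git -C "$work/root" -c protocol.file.allow=always submodule --quiet add "$work/module1" module1
git -C "$work/root" commit -q -m "Add module1 as a submodule"

# The parent's tree lists the submodule path with mode 160000 and a commit hash.
git -C "$work/root" ls-tree HEAD module1

# Compare the recorded commit with what the submodule has checked out.
recorded=$(git -C "$work/root" ls-tree HEAD module1 | awk '{print $3}')
checked_out=$(git -C "$work/root/module1" rev-parse HEAD)
echo "recorded:    $recorded"
echo "checked out: $checked_out"
```

The two hashes should match until you commit or pull something new inside the submodule, which is exactly when the parent starts reporting the path as changed.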
In other words, you can track two separate but linked parts of your code in the same tree, working on each in turn, without either code base impacting the other.
I’ve used this, mostly with Ansible playbooks, where I’ve consumed someone else’s role, like this:
My_Project
|
+- Roles
|  |
|  +- <SUBMODULE> someorg.some_role
|  +- <SUBMODULE> anotherorg.another_role
+- inventory
+- playbook.yml
+- .git
|  |
|  +- HEAD
|  +- modules
|  +- etc
+- .gitmodules
The .gitmodules file looks like this:
[submodule "module1"]
	path = module1
	url = https://your.vcs.example.org/someorg/module1.git
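Because .gitmodules uses the same syntax as git’s own config files, you can ask git config to read it rather than parsing it by eye. This is a minimal sketch which writes the example above into a throwaway directory (the mktemp path is purely an assumption for the demo) and reads the values back:

```shell
set -e
work=$(mktemp -d)   # throwaway directory, purely for this demo

# Recreate the example .gitmodules file.
cat > "$work/.gitmodules" <<'EOF'
[submodule "module1"]
	path = module1
	url = https://your.vcs.example.org/someorg/module1.git
EOF

# Query individual values with "git config -f".
git config -f "$work/.gitmodules" --get submodule.module1.path
git config -f "$work/.gitmodules" --get submodule.module1.url

# List every submodule path recorded in the file.
git config -f "$work/.gitmodules" --get-regexp '^submodule\..*\.path$'
```

In a real checkout you’d point -f at the .gitmodules file in the root of your working tree instead.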
Once you’ve checked out this submodule, you can do any normal operations in this submodule, like pulls, pushes, commits, tags, etc.
So, what happens when you want to nest this stuff?
Nesting Submodule Recursion
So, my colleague wanted to have files in three layers of directories. In this instance, I’ve simulated this by creating three directories,
module2. Typically these would be pulled from their respective Git Service paths, like GitHub or GitLab, but here I’m just using everything on my local file system. Where, in the following screen shot, you see
/tmp/ you could easily replace that with
So, here, we’ve created these three paths (basically to initialise the repositories), added a basic commit to the furthest submodule (module2), then done a git submodule add into the next furthest submodule (module1) and finally added that into the root tree.
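The steps described here can be sketched roughly as follows. I’m assuming a throwaway directory from mktemp rather than any particular path, a made-up “demo” identity and a file called file.txt. The protocol.file.allow setting is there because recent git releases won’t clone submodules from the local file system by default.

```shell
set -e
# Hypothetical identity, so the demo commits work on a machine with no git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

work=$(mktemp -d)   # throwaway directory holding all three repositories

# Create the three repositories.
git init -q "$work/root"
git init -q "$work/module1"
git init -q "$work/module2"

# Add a basic commit to the furthest submodule (module2).
echo "module2 content" > "$work/module2/file.txt"
git -C "$work/module2" add file.txt
git -C "$work/module2" commit -q -m "Initial commit"

# Attach module2 to the next furthest submodule (module1).
git -C "$work/module1" -c protocol.file.allow=always submodule --quiet add "$work/module2" module2
git -C "$work/module1" commit -q -m "Add module2 as a submodule"

# Finally, attach module1 to the root tree.
git -C "$work/root" -c protocol.file.allow=always submodule --quiet add "$work/module1" module1
git -C "$work/root" commit -q -m "Add module1 as a submodule"

# Show what the root repository knows about its submodules.
git -C "$work/root" submodule status --recursive
```

At this point root/module1 has been cloned, but the module2 directory nested inside it has not been populated yet.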
Note, however, that when you perform the submodule add it doesn’t automatically clone any nested submodules, and if you were to perform a git clone from another machine, you wouldn’t get any of the submodules (neither module1 nor module2) without either adding --recursive to the clone command (like this: git clone --recursive https://your.vcs.example.org/someorg/root.git), or running the follow-up command git submodule update --init --recursive.
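To compare the two routes side by side, here’s a sketch that rebuilds the nested layout locally and then clones it both ways. The directory names “plain” and “full”, the “demo” identity and file.txt are my own inventions for the demo, and protocol.file.allow is only needed because everything lives on the local file system.

```shell
set -e
# Hypothetical identity, so the demo commits work on a machine with no git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

work=$(mktemp -d)

# Build the nested layout: root -> module1 -> module2.
git init -q "$work/module2"
echo "module2 content" > "$work/module2/file.txt"
git -C "$work/module2" add file.txt
git -C "$work/module2" commit -q -m "Initial commit"
git init -q "$work/module1"
git -C "$work/module1" -c protocol.file.allow=always submodule --quiet add "$work/module2" module2
git -C "$work/module1" commit -q -m "Add module2"
git init -q "$work/root"
git -C "$work/root" -c protocol.file.allow=always submodule --quiet add "$work/module1" module1
git -C "$work/root" commit -q -m "Add module1"

# Route 1: a plain clone leaves the submodule directory empty...
git clone -q "$work/root" "$work/plain"
before=$(ls -A "$work/plain/module1")
# ...until you run the follow-up command.
git -C "$work/plain" -c protocol.file.allow=always submodule --quiet update --init --recursive

# Route 2: ask for the submodules while cloning.
git -c protocol.file.allow=always clone -q --recursive "$work/root" "$work/full"

# Both copies now contain the file from the innermost submodule.
ls "$work/plain/module1/module2" "$work/full/module1/module2"
```

Either way you end up with the same working tree, so it mostly comes down to whether you remembered the flag when you cloned.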
Oh, and if any of these submodules are updated? You need to go in and pull those updates, and then commit that change, like this!
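As a minimal sketch of that update flow, assuming a two-level layout for brevity, a throwaway directory, a made-up “demo” identity and a README file that I’ve invented for the demo:

```shell
set -e
# Hypothetical identity, so the demo commits work on a machine with no git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

work=$(mktemp -d)

# Set up module1 and a root repository which uses it as a submodule.
git init -q "$work/module1"
echo "v1" > "$work/module1/README"
git -C "$work/module1" add README
git -C "$work/module1" commit -q -m "Initial commit"
git init -q "$work/root"
git -C "$work/root" -c protocol.file.allow=always submodule --quiet add "$work/module1" module1
git -C "$work/root" commit -q -m "Add module1"

# The source repository for module1 gains a new commit.
echo "v2" > "$work/module1/README"
git -C "$work/module1" commit -q -a -m "Update README"

# Go into the submodule and pull the update...
git -C "$work/root/module1" pull -q --ff-only

# ...at which point the root repository reports the submodule path as changed...
git -C "$work/root" status --short

# ...so stage and commit the new commit reference.
git -C "$work/root" add module1
git -C "$work/root" commit -q -m "Update module1 to its latest commit"
```

With nested submodules you repeat this from the inside out: update module2 inside module1, commit in module1, then update module1 inside root and commit there.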
The one thing to watch out for is that if you’ve done a git clone of the root repo (using the terms from the above screen images), the submodules won’t be on the “master” branch (or on a particular “tag” or “branch name”, for that matter), but will instead be checked out at the recorded commit reference, in what git calls a “detached HEAD” state. If you wanted to switch to a specific branch or tag, then you’d need to issue the command
git checkout some_remote/some_branch or
git checkout master instead of (in the above screen captures)
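Here’s a small sketch of what that looks like in practice. The “copy” directory, the “demo” identity and the README file are assumptions for the demo, and I look up the branch name rather than hard-coding it, because a fresh git init may give you “master” or “main” depending on your configuration.

```shell
set -e
# Hypothetical identity, so the demo commits work on a machine with no git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

work=$(mktemp -d)

# Set up module1 and a root repository which uses it as a submodule.
git init -q "$work/module1"
echo "hello" > "$work/module1/README"
git -C "$work/module1" add README
git -C "$work/module1" commit -q -m "Initial commit"
git init -q "$work/root"
git -C "$work/root" -c protocol.file.allow=always submodule --quiet add "$work/module1" module1
git -C "$work/root" commit -q -m "Add module1"

# Whatever the default branch is called on this machine.
branch=$(git -C "$work/module1" symbolic-ref --short HEAD)

# Clone the root repository, submodules included.
git -c protocol.file.allow=always clone -q --recursive "$work/root" "$work/copy"

# Freshly cloned, the submodule is on a detached HEAD, so this prints "HEAD".
state_before=$(git -C "$work/copy/module1" rev-parse --abbrev-ref HEAD)
echo "$state_before"

# Switch the submodule onto its branch and check again.
git -C "$work/copy/module1" checkout -q "$branch"
git -C "$work/copy/module1" rev-parse --abbrev-ref HEAD
```

Bear in mind that checking out a remote-tracking name like some_remote/some_branch still leaves you on a detached HEAD. It’s the local branch name which actually puts you on a branch.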
Featured image is “Submarine” by “NH53” on Flickr and is released under a CC-BY license.