Version Control Whodunnit

Your git history is not the lone historical log out there!

“Someone's force-pushed to staging, all our work is gone!”

A frenzy to restore the branch to what it once was ensues. Who's the mystery murderer? Most would come to terms with the reality that they will never find out, as force-pushes do not leave a trace in the git history.

Logs!

Most of us think of GitHub as simply a wrapper around a git server, one that adds features such as Pull Requests and CI/CD with GitHub Actions on top of the basic job of storing the git project. The history itself is always handled by git, hidden in the .git folder.

But because it's a wrapper, everything we do to our repository through it gets logged, whether we like it or not. Your git history is not the lone historical log out there!

Getting the Data

We're particularly interested in GitHub's repository events API. I personally suggest using GitHub's CLI tool, gh, for this, so you won't have to mess with tokens and the like.

Running the following command gives us a list of recent events across all branches of the specified repository. The API only goes back a limited time, so it's best to run it soon after the incident:

gh api \
-H "Accept: application/vnd.github+json" \
/repos/OWNER/REPO/events

Unfortunately, that gives us a whole load of events that we don't really care about. We'll use jq to keep only the PushEvent events on the staging branch.

# The endpoint returns a JSON array, so we select from its elements directly.
gh api \
  -H "Accept: application/vnd.github+json" \
  /repos/OWNER/REPO/events | \
  jq '[.[] | select(.type == "PushEvent" and .payload.ref == "refs/heads/staging")]'
Get repository events, filter PushEvents on staging branch

Save the resulting data, and move on to the next section.
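
If jq isn't at hand, the same filtering can be done on the saved response afterwards. The following is only a minimal Python sketch; the function name staging_pushes and the file name events.json mentioned in the comment are made up for illustration, and the sample data is invented to mirror the shape of the API response.

```python
import json


def staging_pushes(events, ref="refs/heads/staging"):
    """Keep only PushEvent entries whose payload targets the given ref."""
    return [
        e for e in events
        if e.get("type") == "PushEvent"
        and e.get("payload", {}).get("ref") == ref
    ]


# Invented sample in the same shape as the API response (a JSON array).
# With real data you would load your saved file instead, for example:
#   events = json.load(open("events.json"))   # hypothetical file name
sample = json.loads("""
[
  {"type": "PushEvent",  "payload": {"ref": "refs/heads/staging"}},
  {"type": "PushEvent",  "payload": {"ref": "refs/heads/main"}},
  {"type": "WatchEvent", "payload": {}}
]
""")

print(len(staging_pushes(sample)))  # 1
```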

Interpreting the Data

Here's a snippet of an event with a commit. This is what a “normal” event might look like.

{
        "id": "27209193720",
        "type": "PushEvent",
        "actor": {
            "id": 12345678,
            "login": "fakeuser-1",
            "display_login": "fakeuser-1",
            "gravatar_id": "",
            "url": "https://api.github.com/users/fakeuser-1",
            "avatar_url": "https://avatars.githubusercontent.com/u/12345678?"
        },
        "repo": {
            "id": 9988776655,
            "name": "fakeuser-1/sample-repo",
            "url": "https://api.github.com/repos/fakeuser-1/sample-repo"
        },
        "payload": {
            "repository_id": 9988776655,
            "push_id": 12682682377,
            "size": 1,
            "distinct_size": 1,
            "ref": "refs/heads/staging",
            "head": "ad739cb24299a10be537398706d06fa30d2cf989a",
            "before": "9294828aa5108407c7f09440508bc1e80ea0cc45",
            "commits": [
                {
                    "sha": "ad739cb24299a10be537398706d06fa30d2cf989a",
                    "author": {
                        "email": "fakeuser-1@fakeorg.com",
                        "name": "fakeuser-1 full name"
                    },
                    "message": "Add a confetti canon",
                    "distinct": true,
                    "url": "https://api.github.com/repos/fakeuser-1/sample-repo/commits/ad739cb24299a10be537398706d06fa30d2cf989a"
                }
            ]
        },
        "public": false,
        "created_at": "2023-02-20T15:16:09Z",
        "org": {
            "id": 56789,
            "login": "fakeorg",
            "gravatar_id": "",
            "url": "https://api.github.com/orgs/fakeorg",
            "avatar_url": "https://avatars.githubusercontent.com/u/56789?"
        }
    }

Here we can see that user fakeuser-1 pushed a single commit (size: 1) to the staging branch at 2023-02-20T15:16:09Z. Simple!

Now let's learn how to read a list of many such events in order to build a timeline of when and how a force-push occurred.
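
Reading those fields off by eye gets tedious with more than a handful of events. As a rough sketch, assuming events shaped like the one above, a small helper can pull out just the parts that matter. The name summarize_push is made up for this example, and the event below is a trimmed copy of the sample.

```python
def summarize_push(event):
    """Reduce a PushEvent to the fields needed for building a timeline."""
    payload = event["payload"]
    return {
        "display_login": event["actor"]["display_login"],
        "push_id": payload["push_id"],
        "size": payload["size"],
        "head": payload["head"],
        "before": payload["before"],
        "created_at": event["created_at"],
    }


# Trimmed copy of the sample event shown above.
event = {
    "type": "PushEvent",
    "actor": {"display_login": "fakeuser-1"},
    "payload": {
        "push_id": 12682682377,
        "size": 1,
        "ref": "refs/heads/staging",
        "head": "ad739cb24299a10be537398706d06fa30d2cf989a",
        "before": "9294828aa5108407c7f09440508bc1e80ea0cc45",
    },
    "created_at": "2023-02-20T15:16:09Z",
}

summary = summarize_push(event)
print(summary["display_login"], summary["size"], summary["created_at"])
```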

Tracing commits

Let's take a list of events and isolate just the display_login, push_id, size, head, and before keys. We'll replace the hashes and IDs with small numbers that are easier to follow. As in the API response, the newest event comes first.

[
    {
        "display_login": "user-1",
        "push_id": 4,
        "size": 5,
        "head": 6,
        "before": 3
    },
    {
        "display_login": "user-2",
        "push_id": 3,
        "size": 0,
        "head": 3,
        "before": 5
    },
    {
        "display_login": "user-2",
        "push_id": 2,
        "size": 1,
        "head": 5,
        "before": 4
    },
    {
        "display_login": "user-1",
        "push_id": 1,
        "size": 2,
        "head": 4,
        "before": 3
    }
]

Can you spot the two force-pushes here? The first one, push_id: 3, pushed by “user-2”, resets the branch to an earlier commit. Notice how the prior push (push_id: 2) brought the head up to 5, but push_id: 3, with size: 0 (no commits in the payload), brought the head back down to 3.
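
That reset pattern can be checked for mechanically. Here is a minimal sketch of the idea, assuming the simplified events from the list above and assuming that push_id increases with time, as it does in this example (with real data, sorting by created_at is probably safer). The function name find_resets is made up for illustration.

```python
def find_resets(pushes):
    """Return push_ids of zero-commit pushes that move the head to a commit seen earlier."""
    seen = set()      # every commit that has been a head or a before so far
    flagged = []
    # Walk the pushes oldest-first; the list itself is newest-first.
    for push in sorted(pushes, key=lambda p: p["push_id"]):
        if push["size"] == 0 and push["head"] in seen:
            flagged.append(push["push_id"])
        seen.update((push["before"], push["head"]))
    return flagged


# The simplified events from the list above.
pushes = [
    {"display_login": "user-1", "push_id": 4, "size": 5, "head": 6, "before": 3},
    {"display_login": "user-2", "push_id": 3, "size": 0, "head": 3, "before": 5},
    {"display_login": "user-2", "push_id": 2, "size": 1, "head": 5, "before": 4},
    {"display_login": "user-1", "push_id": 1, "size": 2, "head": 4, "before": 3},
]

print(find_resets(pushes))  # [3]
```

Only the reset-style push is flagged; a push that carries new commits passes this check whether or not it was forced.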

The other force-push… can't be traced here. Oops. It is push_id: 4, pushed by “user-1”, whose local branch had diverged from the remote back when the head was at commit 4. This force-push overwrote the history and moved the head from 3 (where push_id: 3 had left it) to 6, discarding whatever had happened on the remote between push_id: 1 and push_id: 3.

Unfortunately, that kind of force-push remains a mystery. Because the push contained new commits, the hashes alone don't tell us whether it was a normal push or a forced one, and we are left wondering what to do.

You could technically try to check out that branch and read the git log from there to see if anything doesn't line up, but it would likely be a futile effort.
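
One thing that may still be worth a try, if both the before and head commits of the suspicious push happen to exist in a local clone (they often won't, since overwritten commits are not fetched by default and may be garbage-collected), is asking git whether before is an ancestor of head. A normal push only adds commits on top of before, so a "no" points to a forced push. The sketch below wraps git merge-base --is-ancestor, which exits with 0 for "is an ancestor" and 1 for "is not"; the function names are made up for this example.

```python
import subprocess


def ancestry_command(before, head):
    """Build the git command that asks: is `before` an ancestor of `head`?"""
    return ["git", "merge-base", "--is-ancestor", before, head]


def was_fast_forward(before, head, repo_dir="."):
    """Return True if `head` builds on `before`, False if history was rewritten.

    Any exit code other than 0 or 1 (for example, when a commit is missing
    from the clone) is raised as an error instead of being guessed at.
    """
    result = subprocess.run(
        ancestry_command(before, head),
        cwd=repo_dir,
        capture_output=True,
        text=True,
    )
    if result.returncode in (0, 1):
        return result.returncode == 0
    raise RuntimeError(result.stderr.strip() or "git merge-base failed")
```

Calling was_fast_forward with the before and head hashes from a PushEvent, inside a clone of the repository, returns False when the push could not have been a plain fast-forward.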

Conclusion

There you go: your git-breaking culprits (or a dead end). What you do with this information is up to you, but try not to get them fired. Teach each other how to use git effectively, and when exactly a force-push is necessary.