Git for Postgres Hacking

2023-11-06

In Postgres development it’s normal for patch attempts to require many revisions and last a long time. I just sent in v17 of my SQL:2011 application time patch. The commitfest entry dates back to the summer of 2021, but it’s really a continuation of this thread from 2018. And it’s not done yet.

My work on multiranges is a similar story: 1.5 years from first patch to commit.

Today I saw this post by Julia Evans about problems people have with git rebase (also see the HN discussion), and it reminded me of my struggles with long-lived branches.

In my early days with git I avoided rebasing, because I wanted the history to be authentic. Nowadays I rebase pretty freely, both to move my commits on top of the latest master and to clean things up interactively so the commits show logical progress (with generous commit messages explaining the motivation and broad design decisions: the “why”).

But in my paid client work, PRs get merged pretty fast; there is nothing like the multi-year wait of Postgres hacking. With Postgres I’ve often wished for more history. It’s not my day job, so it’s hard to remember fine details about something from months or years ago. And I’ve changed direction a couple of times, and sometimes I want a way to consult that old history.

But with Postgres you don’t have any choice but to rebase. You send your patch files to a mailing list, and if they don’t apply cleanly no one will look at them. I’ve spent hours and hours rebasing patches because the underlying systems changed before they could get committed.

With multiranges this was tough, but at least it was just one patch file. Application time is a series of five patches, which have changed order over time and grown out of an original four. When it’s time to send a new version, I run git format-patch, which turns each commit into a .patch file. So I need to wind up with five well-groomed commits rebased on the latest master.

My personal copy of the postgres repo on GitHub has a bunch of silly-named branches for stashing work when I want to change direction, so the history isn’t totally lost. But for a long time I had no system. It feels like finding a spreadsheet named Annual Report - Copy of Jan 7.bak - final - FINAL.xls. After all these years it’s unmanageable. (Okay, at least I know not to name any Postgres submission “final”! ;-)

I think I’ve finally found a way to keep history that works for me. On my main valid-time branch I make a separate commit for each small change. I rebase to move them up and down so that they will squash cleanly into the five commits I need at the end. You can see that I have one main commit for each of the five patches, each followed by many commits with titles like fixup pks: fixed this or fixup fks: feedback from so-and-so. I rebase on master every so often, and I force-push all the time, since no one else uses the repo. (I do work on both a laptop and a desktop, though, so I have to remember to git fetch && git reset --hard origin/valid-time.)

When I’m ready to submit new patches, I take a snapshot with git checkout -b valid-time-v17-pre-squash and “make a backup” with git push -u. Then I make a branch to squash things (git checkout -b valid-time-v17). I run git rebase -i HEAD~60, press * on a pick, type cw fixup, then n.n.n.n.n.n. and so on until I have just the five commits. Then I run a script that does a clean build + test on each commit, since I want things to work at every point. While that’s running I write the email about the new patch, and hopefully send it in.
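The build-and-test script itself isn’t shown here, so what follows is only a minimal Python sketch of the idea, not that script. It assumes you start on a branch with a clean working tree, that the series sits on top of a base branch such as master, and that BUILD_AND_TEST is a placeholder for whatever clean build and regression run you use. It checks out each commit oldest first and reports the first one that fails.

```python
import subprocess
import sys
from typing import Callable, Iterable, Optional

# Placeholder: substitute your own clean build + regression run.
BUILD_AND_TEST = "make -s clean && make -s && make -s check"


def first_failing_commit(
    commits: Iterable[str],
    checkout: Callable[[str], object],
    build_and_test: Callable[[], bool],
) -> Optional[str]:
    """Check out each commit in order and return the first whose build/test fails."""
    for sha in commits:
        checkout(sha)
        if not build_and_test():
            return sha
    return None


def git_output(*args: str) -> str:
    return subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    ).stdout.strip()


def main(base: str) -> int:
    start = git_output("rev-parse", "--abbrev-ref", "HEAD")  # assumes you're on a branch
    commits = git_output("rev-list", "--reverse", f"{base}..HEAD").split()  # oldest first
    try:
        bad = first_failing_commit(
            commits,
            checkout=lambda sha: subprocess.run(["git", "checkout", "-q", sha], check=True),
            build_and_test=lambda: subprocess.run(BUILD_AND_TEST, shell=True).returncode == 0,
        )
    finally:
        # Always return to the branch we started on, even after a failure.
        subprocess.run(["git", "checkout", "-q", start], check=True)
    if bad is None:
        print(f"all {len(commits)} commits build and pass")
        return 0
    print(f"first failing commit: {bad}")
    return 1


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: check_series.py BASE", file=sys.stderr)
    else:
        sys.exit(main(sys.argv[1]))
```

Run it as python3 check_series.py master (the file name is hypothetical). Git also has a built-in alternative: git rebase --exec "<command>" master runs the command after each commit and stops at the first failure, which leaves you right where you need to fix it.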

So now I’m capturing the fine-grained history that went into each submission, and that won’t change no matter how aggressively I rebase the current work. I’m pretty happy with this flow. I wish I had started years ago.

One git feature I could almost use is git rebase -i --autosquash. (Here are some articles about it.) If a commit’s title is fixup! foo, then git will automatically mark that commit fixup instead of pick, and it will move it to just below whatever commit matches foo. I follow this pattern but with fixup, not fixup!, to keep it all manual. At first I just didn’t trust it (or myself).
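To make the autosquash behavior concrete, here is a simplified Python model of what it does to the rebase todo list. It is not git’s implementation: it matches a fixup! title only against the exact title of an earlier commit, and it ignores squash!, nested fixup! fixup!, and any looser matching git may fall back to. The TodoLine type is an assumption made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class TodoLine:
    action: str   # "pick" or "fixup"
    sha: str
    subject: str  # the commit title


def autosquash(todo: list[TodoLine]) -> list[TodoLine]:
    """Simplified model of rebase -i --autosquash (exact title matches only)."""
    result: list[TodoLine] = []
    for line in todo:
        if line.subject.startswith("fixup! "):
            wanted = line.subject[len("fixup! "):]
            # First earlier pick whose title matches exactly.
            target = next(
                (i for i, t in enumerate(result) if t.action == "pick" and t.subject == wanted),
                None,
            )
            if target is not None:
                # Step past fixups already attached to that commit, keeping their order.
                pos = target + 1
                while (
                    pos < len(result)
                    and result[pos].action == "fixup"
                    and result[pos].subject == line.subject
                ):
                    pos += 1
                result.insert(pos, TodoLine("fixup", line.sha, line.subject))
                continue
        result.append(line)  # no match: stays an ordinary pick
    return result
```

A title like fixup pks: fixed this passes through this model untouched, since nothing matches it.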

Now I’m ready to move to this workflow, but I’m not sure how to “match” one of my five main commits. I want a meaningful title (i.e. the first line of the commit message) for each little commit, so I use short abbreviations for the patch they target, e.g. fixup pks: Add documentation for pg_constraint.contemporal column. Git doesn’t know that it should match pks to Add temporal PRIMARY KEY and UNIQUE constraints and ignore everything after the colon. If there were a way to preserve tags after a rebase I think I could tag the main commit as pks and it might work (but maybe not with the extra stuff after the colon).
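The matching rule wanted here is easy to state in code, even though git has no setting for it. In this hypothetical Python sketch, ABBREVIATIONS maps a short tag to a main commit’s full title, only the word between fixup and the colon is used for matching, and misplaced_fixups walks the titles oldest first to report any fixup that isn’t sitting in its main commit’s block, which is the condition for squashing cleanly.

```python
import re
from typing import Optional

# Hypothetical tag -> main-commit title table; add one entry per main commit.
ABBREVIATIONS = {
    "pks": "Add temporal PRIMARY KEY and UNIQUE constraints",
}

# "fixup pks: anything" -> "pks"; everything after the colon is ignored.
FIXUP_TAG = re.compile(r"^fixup (\w+):")


def target_title(title: str, abbreviations: dict[str, str] = ABBREVIATIONS) -> Optional[str]:
    """Return the main-commit title a 'fixup <tag>: ...' commit belongs to, if known."""
    m = FIXUP_TAG.match(title)
    return abbreviations.get(m.group(1)) if m else None


def misplaced_fixups(titles: list[str], abbreviations: dict[str, str] = ABBREVIATIONS) -> list[str]:
    """Given titles oldest first, list fixups that are not inside their main commit's block."""
    current_main = None
    bad = []
    for title in titles:
        if not FIXUP_TAG.match(title):
            current_main = title  # a main commit starts a new block
        else:
            target = target_title(title, abbreviations)
            if target is None or target != current_main:
                bad.append(title)  # unknown tag, or under the wrong main commit
    return bad
```

Wiring a rule like this into the rebase itself would mean running it from a custom sequence editor rather than relying on --autosquash.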

You can have git generate the new commit message for you with git commit --fixup $sha, but it just copies the whole title verbatim, which is not what I want. Also, who wants to remember $sha for those five parent commits? And finally, I want to move these commits into place immediately, so I can build & test against each patch as I work. Git can’t move them for me without squashing them.

The Thoughtbot article linked above says you can use a regex, e.g. git commit --fixup :/pks, but: (1) The regex is used immediately to find the parent, but it gets replaced with that parent’s title. It doesn’t stay in your commit message. (2) If you give an additional commit message, it goes two lines below the fixup! line, so it’s not in the commit title. This only solves having to remember $sha.

What I really want is fixup! ^: blah blah blah where ^ means “the closest non-squashed parent”, and the ^ is resolved at rebase time, not commit time, and everything after the colon is not used for matching. (If it needs to be a regex then :/. is sufficient too.)

Anyway I’m using my manual process for now, since with vim I can change 60 picks to fixup in a few seconds. I’m not willing to lose meaningful titles to save a few seconds with fixup!.
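Because the fixup commits already sit just after their main commit, the “closest non-squashed parent” rule above reduces to “mark every fixup commit as fixup”, which a script could do in place of vim. This Python sketch is meant to be used as git’s sequence editor, which receives the todo file’s path as its argument. It assumes the default todo format (pick <sha> <subject>) with rebase.abbreviateCommands off, and it keeps the fixup-without-bang titles; the script name is hypothetical.

```python
import re
import sys

# A todo line such as "pick 1a2b3c4 fixup pks: fixed this". Titles using
# git's own "fixup! ..." form are deliberately left alone.
PICK_OF_FIXUP = re.compile(r"^pick \S+ fixup ")


def mark_fixups(todo: str) -> str:
    """Change 'pick' to 'fixup' for every commit titled 'fixup ...', keeping order."""
    lines = []
    for line in todo.splitlines(keepends=True):
        if PICK_OF_FIXUP.match(line):
            line = "fixup" + line[len("pick"):]
        lines.append(line)
    return "".join(lines)


def main(todo_path: str) -> None:
    # Git passes the path of the todo file; rewrite it in place.
    with open(todo_path, encoding="utf-8") as f:
        todo = f.read()
    with open(todo_path, "w", encoding="utf-8") as f:
        f.write(mark_fixups(todo))


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: mark_fixups.py TODO_FILE", file=sys.stderr)
    else:
        main(sys.argv[1])
```

For example: GIT_SEQUENCE_EDITOR="python3 mark_fixups.py" git rebase -i HEAD~60. The todo list is never shown for review, so the pre-squash backup branch is still worth making, and if the oldest commit in the range is itself a fixup, git will stop with an error because there is nothing before it to squash into.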

Nonetheless it would be nice to have one less step to remember. I can’t stop thinking about how to make this feature work for me. If someone has a suggestion, please do let me know.

Another approach is “stacked commits”. I went as far as installing git-branchless and reading the docs and some articles, but to be honest I never went beyond a few tests, and I haven’t thought about it for a few months. It’s in the back of my head to give it a more honest effort.

Illuminated Computing

Paul A. Jungwirth