Temporal PKs Merged!

2024-01-24

Today first thing in the morning I saw that the first part of my temporal tables work for Postgres got merged. It was two patches actually: a little one to add a new GiST support function and then the main patch adding support for temporal primary keys and unique constraints based on range types. The support for SQL:2011 PERIODs comes later; for now you must use ranges—although in my opinion that is better anyway. Also this patch allows multiranges or, keeping with Postgres’s long history of extensibility, any type with an overlaps operator. So unless some big problem appears, PKs and UNIQUE constraints are on track to be released in Postgres 17.

Probably I can get (basic) foreign keys into v17 too. Temporal update/delete, foreign keys with CASCADE, and PERIODs will more likely take ’til 18.

If you are interested in temporal features, early testing is always appreciated! :-)

Getting this into Postgres has been a ten-year journey, and the rest of this post is going to be a self-indulgent history of that work. You’ve been warned. :-)

It started in 2013 when I kept noticing my clients needed a better way to track the history of things that change over time, and I discovered Richard Snodgrass’s book Developing Time-Oriented Database Applications in SQL. He offered a rigorous, systematic approach, with working SQL solutions for everything. This was exactly what I needed. His approach was vastly better than the ad hoc history-tracking I’d seen so far. But no one had implemented any of it!

My first Postgres patch in 2015 was motivated by temporal databases: I added UUID support to the btree_gist extension. A temporal primary key is basically an exclusion constraint on (id WITH =, valid_at WITH &&), and I had a project with UUID ids. But that exclusion constraint requires a GiST index that knows how to perform equal comparisons against the id column and overlap comparisons against the valid_at column. Out-the-box GiST indexes can’t do that (unless your ids are something weird like range types). If your ids are integers, you can install btree_gist to create a GiST opclass that knows what integer = means, but at the time UUIDs were not supported. So I started there. I liked that temporal databases had a manageable feature set and a manageable body of literature, so that even a working programmer like me could break new ground (not like Machine Learning or even Time Series databases). Nonetheless that patch took a year and a half to get committed, and it was really other people like Chris Bandy who finished it.

I kept reading about temporal databases, and in 2017 I wrote a proof-of-concept for temporal foreign keys, mostly at AWS Re:Invent. I happened to be given a free registration & hotel room, but it was too late to register for any of the good talks. But all that time with nothing to do was fantastically productive, and I remember by the flight home I was adding tons of tests, trying to cover every feature permutation—ha, as if. A few days after I returned I also published my annotated bibliography, which I’ve updated many times since.

In Snodgrass a temporal foreign key is a page-and-a-half of SQL, mostly because a referencing row may need more than one referenced row to completely cover its time span. But I realized we could make the check much simpler if we used an aggregate function to combine all the relevant rows in the referenced table first. So I wrote range_agg, first as an extension, then as a core patch. Jeff Davis (who laid the foundation for temporal support with range types and exclusion constraints) said my function was too narrow and pushed me to implement multiranges, a huge improvement. Again it took a year and a half, and I had trouble making consistent progress. There was a lot of work at the end by Alvaro Herrera and Alexander Korotkov (and I’m sure others) to get it committed. That was a few days before Christmas 2020.

Although the Postgres review process can take a long time, I cherish how it pushes me to do better. As a consultant/freelancer I encounter codebases of, hmm, varying quality, and Postgres gives me an example of what high standards look like.

One thing I still remember from reading Programmers at Work many years ago was how many inteviewees said they tried to build things at a higher level of abstraction than they thought they’d need. I’ve seen enough over-engineered tangles and inner-platform effects that my own bias is much closer to YAGNI and keeping things concrete, but the advice in those interviews still prods me to discover good abstractions. The Postgres codebase is full of things like that, and really it’s such a huge project that strong organizing ideas are essential. Multiranges was a great example of how to take a concrete need and convert it into something more general-purpose. And I thought I was doing that already with range_agg! I think one thing that makes an abstraction good is a kind of definiteness, something opinionated. So it is not purely general, but really adds something new. It always requires an act of creation.

The coolest thing I’ve heard of someone doing with multiranges was using them in astronomy to search for neutrinos, gravitational waves, and gamma-ray bursts. By using multiranges, they were able to compare observations with maps of the night sky “orders of magnitude faster” than with other implementations. (Hopefully I’ve got that right: I read a pre-print of the paper but it was not all easy for me to understand!)

My first patch for an actual temporal feature was primary keys back in 2018. Then foreign keys followed in 2019, just a couple weeks before I gave a talk at PgCon about temporal databases. By the end of the year I had FOR PORTION OF as well. At first FOR PORTION OF was implemented in the Executor Phase, but when I gave a progress report for PgCon 2020 I was already working on a trigger-based reimplementation, though it wasn’t submitted until June 2021. I also pulled in work by Vik Fearing from 2018 to support ADD/DROP PERIOD.

Soon after that progress got harder: my wife and I had our sixth baby in August, and somehow he seemed to be more work than the others. I took over daily math lessons (we homeschool), and I had to let go my biggest client, who needed more hours than I could give. (I’m proud to have given them an orderly transition over several months though.) In January 2022 Peter Eisentraut gave me a thorough review, but I went silent. Still, I had a lot of encouragement from the community, especially Corey Huinker, and eventually doing Postgres got easier again. I had a talk accepted for PgCon 2023, and I worked hard to submit new patches, which I did only weeks before the conference.

The best part of PgCon was getting everyone who cared about my work together in the hallway to agree on the overall approach. I had worried for years about using ranges as well as PERIODs, since the standard doesn’t know anything about ranges. The second-best part was when someone told me I should stop calling myself a Postgres newbie.

At PgCon Peter asked me to re-organize the patches, essentially implementing PERIODs as GENERATED range columns. It made the code much nicer. I also went back to an Executor Phase approach for FOR PORTION OF. Using triggers had some problems around updateable views and READ COMMITTED transaction isolation.

Since May I’ve felt more consistent than during my other Postgres work. I’ve been kept busy by excellent feedback by a meticulous reviewer, Jian He, who has caught many bugs. Often as soon as I get caught up, before I’ve written the email with the new patch files, he finds more things!

Another thing that’s helped is going out once a week (for nearly a year now) to get early dinner then work on Postgres at a local bar. Somehow it’s much easier to do Postgres from somewhere besides my home office, where I do all my normal work. Getting dinner lets me read something related (lately Designing Data-Intensive Applications by Martin Klepmann and PostgreSQL 14 Internals by Egor Rogov), and it’s fun. Doing just a little every week helps me keep momentum, so that fitting in further progress here and there seems easy. I’m lucky to have a wife who has supported it so often, despite leaving her with the kids and dishes.

I think I have years more work of temporal features to add, first finishing SQL:2011 then going beyond (e.g. temporal outer joins, temporal aggregates, temporal upsert). It’s been a great pleasure!

blog comments powered by Disqus Next: Git for Postgres Hacking