Survey of SQL:2011 Temporal Features

2019-09-04

Introduction

This blog post is a survey of SQL:2011 Temporal features from MariaDB, IBM DB2, Oracle, and MS SQL Server. I’m working on adding temporal features to Postgres, so I wanted to see how other systems interpret the standard.

If you’re new to temporal databases, you also might enjoy this talk I gave at PGCon 2019.

In this post I cover both application-time (aka valid-time) and system-time, but I focus more on valid-time. Valid-time tracks the history of the thing “out there”, e.g. when a house was remodeled, when an employee got a raise, etc. System-time tracks the history of when you changed the database. In general system-time is more widely available, both as native SQL:2011 features and as extensions/plugins/etc., but is less interesting. It is great for compliance/auditing, but you’re unlikely to build application-level features on it. Also since it’s generated automatically you don’t need special DML commands for it, and it is less important to protect yourself with temporal primary and foreign keys.

At this point all the major systems I survey have some temporal support, although none of them support it completely. On top of that the standard itself is quite modest, although in some ways it can be interpreted more or less expansively.

The Standard

I’ll start by giving a quick overview of the standard. Here I’m working from the draft documents (downloaded from here), and my interpretation may not be correct. If you have any corrections please let me know! Also you can find a more complete description of the standard at this article by Kulkarni and Michels (pdf).

In SQL:2011 the gateway to temporal features is a PERIOD, which is something you declare on your table. It is a range-like structure derived from two existing date columns. (Actually the standard also supports timestamp and timestamp with time zone, but I’ll use date as a synecdoche throughout this post.)

Periods

You can declare a valid-time PERIOD when you create the table or afterwards:

CREATE TABLE t (
  id          INTEGER,
  valid_from  DATE,
  valid_til   DATE,
  PERIOD FOR valid_at (valid_from, valid_til)
);

You can call the PERIOD whatever you like except SYSTEM_TIME, which is magical and enables system-time features. Both of the PERIOD’s source columns must be NOT NULL; if they aren’t declared that way, they are converted automatically. (Most databases do the same thing with PRIMARY KEY columns.) Note that the NOT NULL requirement means that to represent “forever” or “until further notice” you must use a sentinel value like 3000-01-01.

Naturally a PERIOD adds an implicit constraint that valid_from must be less than valid_til.
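
For instance, a row that is valid “until further notice” might look like this (my own sketch, using the table above and 3000-01-01 as the sentinel):

```sql
-- "Until further notice" needs a far-future sentinel, since NULL is not allowed:
INSERT INTO t (id, valid_from, valid_til)
VALUES (1, DATE '2019-01-01', DATE '3000-01-01');
```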

You can also define a SYSTEM_TIME period and ask the database to track changes for you:

CREATE TABLE t (
  id        INTEGER,
  sys_from  TIMESTAMP GENERATED ALWAYS AS ROW START,
  sys_til   TIMESTAMP GENERATED ALWAYS AS ROW END,
  PERIOD FOR system_time (sys_from, sys_til)
) WITH SYSTEM VERSIONING;

Technically the standard lets you use DATE columns for system-time periods, but it’s hard to imagine how that would work in practice. Really anything short of the RDBMS’s finest granularity could “squeeze out” some history.

Primary Keys

If you have a valid-time PERIOD then you can declare a temporal primary key when you create the table:

CREATE TABLE t (
  id          INTEGER,
  valid_from  DATE,
  valid_til   DATE,
  PERIOD FOR valid_at (valid_from, valid_til),
  CONSTRAINT tpk_t PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
);

A temporal primary key is a lot like a normal primary key, except the scalar part (here just id) does not have to be unique, as long as rows with the same key don’t overlap in time. In other words you can give product 5 one price today and another tomorrow, and there’s no contradiction. But if you have two rows with the same scalar key covering the same date, that’s a violation of temporal entity integrity.
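
A sketch of what the key permits and forbids (the values are mine):

```sql
-- OK: same id, non-overlapping periods:
INSERT INTO t VALUES (5, DATE '2019-01-01', DATE '2019-06-01');
INSERT INTO t VALUES (5, DATE '2019-06-01', DATE '3000-01-01');

-- Violation: same id, overlapping period
-- (all three rows would cover 2019-05-01, breaking temporal entity integrity):
INSERT INTO t VALUES (5, DATE '2019-04-01', DATE '2019-07-01');
```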

Foreign Keys

Temporal referential integrity is like ordinary referential integrity, except the non-unique nature of temporal primary keys makes it trickier. In a temporal foreign key, the child row’s lifespan must be completely “covered” by one (or more!) rows in the parent table. In other words some parent record must exist for every moment the child record exists. You can declare a temporal foreign key between two tables that both have PERIODs, e.g.:

CREATE TABLE ch (
  id          INTEGER,
  valid_from  DATE,
  valid_til   DATE,
  t_id        INTEGER,
  PERIOD FOR valid_at (valid_from, valid_til),
  CONSTRAINT tpk_ch PRIMARY KEY (id, valid_at WITHOUT OVERLAPS),
  CONSTRAINT tfk_ch_to_t FOREIGN KEY (id, PERIOD valid_at)
    REFERENCES t (id, PERIOD valid_at)
);
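
To illustrate the “coverage” rule with some made-up data: the child row below is valid from 2019 to 2021, which is legal only because two adjacent parent rows jointly span that range:

```sql
-- Parent rows for id 1, adjacent in time:
INSERT INTO t VALUES (1, DATE '2018-01-01', DATE '2020-01-01');
INSERT INTO t VALUES (1, DATE '2020-01-01', DATE '2022-01-01');

-- Child row referencing t (id = 1): its lifespan [2019-01-01, 2021-01-01)
-- is covered by the two parent rows together, so the temporal FK is satisfied:
INSERT INTO ch VALUES (10, DATE '2019-01-01', DATE '2021-01-01', 1);
```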

Projecting

A PERIOD is not included in the projection when you SELECT * FROM t. It is questionable whether you can project it explicitly with SELECT *, valid_at FROM t, but since it’s not a full-fledged data type I’d say probably not.

Filtering

You also can’t reference a PERIOD in most other contexts, e.g. as a function input, a GROUP BY criterion, an ORDER BY key, or a join condition. You can use it in a “period predicate”, which lets you test these period relationships:

  • overlaps
  • equals
  • contains
  • precedes
  • succeeds
  • immediately precedes
  • immediately succeeds

Either side of the relationship can use a previously-named PERIOD or an anonymous dynamically-constructed one, e.g.

x.valid_at OVERLAPS PERIOD(y.valid_from, y.valid_til)

It’s not clear to me where you can use a period predicate, although the standard groups it with other kinds of predicate under the <predicate> object, so maybe anywhere you like? This browsable BNF grammar makes it easy to see that a <predicate> can go anywhere that accepts a boolean expression, which can be used in a <search condition>, which is what you put into your WHERE clause, or a join’s ON, or a CASE WHEN, or lots of other places. If you have a firmer read of the standard here, let me know!
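
If that generous reading is right, you could imagine queries like these (my own sketch, not taken from the standard’s examples):

```sql
-- A period predicate in a WHERE clause:
SELECT * FROM t
WHERE t.valid_at CONTAINS DATE '2020-01-01';

-- A period predicate in a join condition, comparing two named periods:
SELECT *
FROM   t
JOIN   ch ON ch.t_id = t.id
         AND ch.valid_at OVERLAPS t.valid_at;
```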

Also there is a special syntax for querying based on system-time. The standard doesn’t mention using it for valid-time, although you could imagine doing it:

SELECT * FROM t FOR SYSTEM_TIME AS OF t1
SELECT * FROM t FOR SYSTEM_TIME BETWEEN t1 AND t2
SELECT * FROM t FOR SYSTEM_TIME BETWEEN SYMMETRIC t1 AND t2
SELECT * FROM t FOR SYSTEM_TIME FROM t1 TO t2

If you ask for a limited time range, the start/end columns do not get truncated to match your request. In other words, if you query FOR SYSTEM_TIME BETWEEN '2000-01-01' AND '2020-01-01', your result records’ sys_til attributes are still 3000-01-01 (or whatever your sentinel is).

DML

In UPDATE and DELETE commands you can restrict the timespan you want changed:

UPDATE  t
FOR PORTION OF valid_at FROM t1 TO t2
SET     ...
...

and

DELETE FROM t
FOR PORTION OF valid_at FROM t1 TO t2
...

These commands may require special transformations if they “hit” only part of an existing record. For example if you delete the middle of a longer timespan, then you need to replace the old big record with your new version plus two short records (one on each end). An update is the same: after changing the targeted portion, you’d have to insert new records to preserve each end of the original. The standard gives careful instructions here: the RDBMS should include these extra inserts within the “primary effect” of the operation.
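
For example, suppose we start with one row that is valid for the whole 2010s and delete the middle year; hypothetically the table goes from one row to two (values are mine):

```sql
-- Before: (1, 2010-01-01, 2020-01-01)

DELETE FROM t
FOR PORTION OF valid_at
FROM DATE '2014-01-01' TO DATE '2015-01-01'
WHERE id = 1;

-- After: the untouched ends survive as two new rows:
--   (1, 2010-01-01, 2014-01-01)
--   (1, 2015-01-01, 2020-01-01)
```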

There is no need for any special syntax for INSERT, nor for special transformations.

The standard doesn’t have anything to say about a MERGE statement (in Postgres ON CONFLICT DO UPDATE), except in the case of system-time tables, where there is no new syntax and it does what you’d expect.

Questions

Since a PERIOD is attached to a table and isn’t part of the relational model, it isn’t part of a result set. It gets lost when you query a table. That makes it hard to query non-table temporal data, like views, subqueries, CTEs, and set-returning functions. (This was a major criticism of the original TSQL2 proposal from the 90s.) Nonetheless I can imagine how SQL:2011 leaves open some workarounds, e.g. by letting you use anonymous PERIODs inside period predicates, and letting you use period predicates as widely as possible. Also you could argue that projecting a PERIOD is unnecessary since you already have the start and end columns. So if an RDBMS gave you deep support for period predicates, composing temporal results would still be possible—albeit awkward. In practice though, no one does this, as we will see.

SQL:2011 also has no support for joining temporal results. You can effect an inner join with the OVERLAPS operator, but not the other kinds.

Snodgrass suggested that temporal databases should “coalesce” results before presenting them or at least before saving them to a table. Coalescing means that when two rows have adjacent or overlapping timespans and all other attributes are identical, they get merged into a single row. Duplicates are removed. This seems like good behavior, both for clarity and to avoid cutting up your data more and more finely as time goes on, but SQL:2011 doesn’t mention it.
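
You can coalesce by hand with ordinary window functions (this is a standard “gaps and islands” query, not anything from SQL:2011; if the table had more attributes you’d add them to the PARTITION BY):

```sql
-- Merge rows of t whose timespans touch or overlap, per id:
WITH numbered AS (
  SELECT id, valid_from, valid_til,
         -- 1 starts a new group; 0 continues the previous one:
         CASE WHEN valid_from <= MAX(valid_til) OVER (
                PARTITION BY id ORDER BY valid_from
                ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
              THEN 0 ELSE 1 END AS is_new_group
  FROM t
),
grouped AS (
  SELECT id, valid_from, valid_til,
         SUM(is_new_group) OVER (PARTITION BY id ORDER BY valid_from) AS grp
  FROM numbered
)
SELECT id, MIN(valid_from) AS valid_from, MAX(valid_til) AS valid_til
FROM grouped
GROUP BY id, grp;
```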

There is also no explicit mention of how triggers combine with the new temporal DML operations.

MariaDB

MySQL doesn’t support any temporal features, but recent versions of MariaDB have started to add support. Version 10.3.4 (released Jan 2018) included system-time support; Version 10.4.3 (Feb 2019), valid-time.

System Time

MariaDB supports the normal syntax for declaring system-time tables, but you can also use this abbreviated syntax if you like:

CREATE TABLE t (
  id INT
) WITH SYSTEM VERSIONING;

That will automatically add pseudo-columns named ROW_START and ROW_END (which likewise don’t appear in SELECT * FROM t).

Or the standard syntax works too:

CREATE TABLE t (
  id        INTEGER,
  sys_from  TIMESTAMP(6) GENERATED ALWAYS AS ROW START,
  sys_til   TIMESTAMP(6) GENERATED ALWAYS AS ROW END,
  PERIOD FOR SYSTEM_TIME (sys_from, sys_til)
) WITH SYSTEM VERSIONING;

Either way, for a timestamp(6) column (which is what the docs use) it looks like the max future date is 2038:

MariaDB [temporal]> insert into t (id) values (2);
Query OK, 1 row affected (0.008 sec)

MariaDB [temporal]> select * from t;
+------+----------------------------+----------------------------+
| id   | sys_from                   | sys_til                    |
+------+----------------------------+----------------------------+
|    2 | 2019-07-27 17:07:51.849190 | 2038-01-18 19:14:07.999999 |
+------+----------------------------+----------------------------+
1 row in set (0.004 sec)

That seems awfully soon to me.

You can use these three ways of asking for system-time filters:

SELECT * FROM t FOR SYSTEM_TIME AS OF '2020-01-01';
SELECT * FROM t FOR SYSTEM_TIME FROM '2020-01-01' TO '2030-01-01';
SELECT * FROM t FOR SYSTEM_TIME BETWEEN '2020-01-01' AND '2030-01-01';

MariaDB doesn’t know about BETWEEN SYMMETRIC.

You can also say FOR SYSTEM_TIME ALL, which is useful because the default (with no FOR SYSTEM_TIME at all) is to filter AS OF NOW().

MariaDB partially addresses the composability problem by letting you say FOR SYSTEM_TIME against a view, which “pushes down” the filter to the underlying tables. This even works if the view queries some non-system-time tables. Since every system-time PERIOD is named the same thing, the database can sensibly interpret FOR SYSTEM_TIME against your view.
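
So something like this should work (my own sketch of what the MariaDB docs describe; the view and filter are mine):

```sql
CREATE VIEW recent_t AS
  SELECT id FROM t WHERE id > 100;

-- The system-time filter gets pushed down to t's SYSTEM_TIME period:
SELECT * FROM recent_t FOR SYSTEM_TIME AS OF '2020-01-01';
```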

System-Time Partitions

To prevent tables getting too large, you can automatically partition a table by its SYSTEM_TIME:

CREATE TABLE t (
  id INT
) WITH SYSTEM VERSIONING
  PARTITION BY SYSTEM_TIME (
    PARTITION p_hist HISTORY,
    PARTITION p_curr CURRENT
  );

That will keep current records in one partition and historical records in another. You can also have multiple historical partitions and ask the system to switch to the next one every n rows. You can also drop older partitions to keep your data growth under control.
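
If my reading of the MariaDB docs is right, rotating history partitions every n rows looks something like this (the partition names and limit are mine):

```sql
CREATE TABLE t (
  id INT
) WITH SYSTEM VERSIONING
  PARTITION BY SYSTEM_TIME LIMIT 100000 (
    PARTITION p0 HISTORY,     -- filled first
    PARTITION p1 HISTORY,     -- used once p0 holds 100000 rows
    PARTITION p_curr CURRENT
  );
```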

Excluded Columns

To further economize on disk, you can qualify specific columns as WITHOUT SYSTEM VERSIONING to exclude them from history.
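
For example (my own sketch), a large column you don’t need history for can opt out:

```sql
CREATE TABLE t (
  id    INT,
  -- changes to notes are not recorded in the history:
  notes TEXT WITHOUT SYSTEM VERSIONING
) WITH SYSTEM VERSIONING;
```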

Application Time

Declaring an application-time PERIOD works, but you can’t include a temporal PRIMARY KEY:

CREATE TABLE t (
  id          INTEGER,
  valid_from  DATE,
  valid_til   DATE,
  PERIOD FOR valid_at (valid_from, valid_til),
  -- This next line breaks!:
  CONSTRAINT tpk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
);

Naturally you can’t create temporal foreign keys either.

If you omit the NOT NULL for the PERIOD source columns (as above), they become NOT NULL automatically.

DML

In UPDATE and DELETE statements you can use FOR PORTION OF valid_at, per the standard. You can’t use an anonymous period:

UPDATE  t
FOR PORTION OF PERIOD (valid_from, valid_til)
SET     ...

(It’s hard to imagine why you’d want to though.)

Projecting

You can’t SELECT a period, named or anonymous:

SELECT * FROM t;
SELECT *, valid_at FROM t;
SELECT *, PERIOD (valid_from, valid_til) FROM t;

Filtering

In a SELECT you can’t use FOR valid_at to filter things. That’s a little sad but perhaps understandable since arguably the standard only requires FOR SYSTEM_TIME. But period predicates don’t work either. These were all errors for me:

SELECT * FROM t WHERE valid_at CONTAINS '2020-01-01';
SELECT * FROM t WHERE valid_at OVERLAPS PERIOD('2020-01-01', '2030-01-01');

So if you want to ask questions about your valid-time history, you need to query against the scalar date columns.
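
In other words, you write the inequalities yourself. Assuming half-open [from, til) semantics, the two failed predicates above translate to something like:

```sql
-- valid_at CONTAINS '2020-01-01':
SELECT * FROM t
WHERE valid_from <= '2020-01-01' AND valid_til > '2020-01-01';

-- valid_at OVERLAPS ['2020-01-01', '2030-01-01'):
SELECT * FROM t
WHERE valid_from < '2030-01-01' AND valid_til > '2020-01-01';
```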

Triggers

You can declare triggers on valid-time tables, and the triggers do fire for the extra inserts. Here is what I did to test things:

CREATE TABLE thist (
  id INTEGER,
  old_valid_from DATE,
  old_valid_til DATE,
  new_valid_from DATE,
  new_valid_til DATE, op CHAR(1)
);

CREATE TRIGGER tins AFTER INSERT ON t
FOR EACH ROW
INSERT INTO thist VALUES 
(NEW.id, NULL, NULL, NEW.valid_from, NEW.valid_til, 'i');

CREATE TRIGGER tupd AFTER UPDATE ON t
FOR EACH ROW
INSERT INTO thist VALUES
(NEW.id, OLD.valid_from, OLD.valid_til, NEW.valid_from, NEW.valid_til, 'u');

CREATE TRIGGER tdel AFTER DELETE ON t
FOR EACH ROW
INSERT INTO thist VALUES
(OLD.id, OLD.valid_from, OLD.valid_til, NULL, NULL, 'd');

If you UPDATE in the middle of a larger record, you get two INSERTs for the unaltered ends followed by an UPDATE of the middle. (The INSERTs come first.) The NEW.valid_from and NEW.valid_til mark the part that is being inserted/updated, as you’d expect.

If you DELETE in the middle of a larger record, you also get two INSERTs followed by a DELETE of the part you touched. In the delete trigger the OLD.valid_{from,til} columns have their actual old values, not the slice you’re deleting. (This probably makes sense, but it feels a little too mechanical/literal. It means your DELETE trigger doesn’t know what slice of history you’re actually removing.)

Bitemporal

You can also define bitemporal tables!

IBM DB2

DB2 has the fullest temporal support of all the databases I examined. My tests used version 11.5.0.0 on Linux.

System Time

System-time works with a few syntax differences:

CREATE TABLE t (
  id        INTEGER NOT NULL PRIMARY KEY,
  sys_from  TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW BEGIN,
  sys_til   TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW END,
  PERIOD SYSTEM_TIME (sys_from, sys_til)
);

You have to omit WITH SYSTEM VERSIONING, and you have to explicitly make the period source columns NOT NULL. Also you say GENERATED ALWAYS AS ROW BEGIN not GENERATED ALWAYS AS ROW START. Finally it is PERIOD SYSTEM_TIME not PERIOD FOR SYSTEM_TIME.

The sentinel for “forever” is 9999-12-30-00.00.00.000000000000.

Application Time

DB2 supports many valid-time features—but only if you name the period BUSINESS_TIME. At IBM, it’s always business time! (I am shamelessly stealing this joke from my audience at PGCon 2019.)

Valid-time periods have the same syntax quirks as system-time.

You can define temporal primary keys!

According to the docs you can define temporal foreign keys, but I couldn’t make it work:

db2 => create table t2 (id integer not null, valid_from date not null, valid_til date not null, \
db2 (cont.) => t_id integer, period business_time (valid_from, valid_til), \
db2 (cont.) => constraint t2pk primary key (id, business_time without overlaps), \
db2 (cont.) => constraint tfk foreign key (t_id, period business_time) \
db2 (cont.) => references t (id, period business_time));
DB21034E  The command was processed as an SQL statement because it was not a 
valid Command Line Processor command.  During SQL processing it returned:
SQL0104N  An unexpected token "business_time" was found following "gn key 
(t_id, period".  Expected tokens may include:  "<space>".  SQLSTATE=42601

Someone else can’t make it work either, according to this forum thread.

ALTER TABLE failed for me too:

db2 => create table t2 (id integer not null, valid_from date not null, valid_til date not null, \
db2 (cont.) => t_id integer, period business_time (valid_from, valid_til), \
db2 (cont.) => constraint t2pk primary key (id, business_time without overlaps));
DB20000I  The SQL command completed successfully.
db2 => alter table t2 add constraint tfk foreign key (t_id, period business_time) \
db2 (cont.) => references t (id, period business_time);
DB21034E  The command was processed as an SQL statement because it was not a 
valid Command Line Processor command.  During SQL processing it returned:
SQL0104N  An unexpected token "business_time" was found following "gn key 
(t_id, period".  Expected tokens may include:  "<space>".  SQLSTATE=42601

If I learn a way to make it work, I’ll update this article.

Projecting

As usual SELECT * FROM t does not give you the period, and SELECT *, valid_at FROM t is an error. Periods are not first-class types.

Filtering

DB2 interprets the standard generously and lets you use the system-time SELECT syntax for application-time too:

SELECT * FROM t FOR business_time FROM t1 TO t2
SELECT * FROM t FOR business_time AS OF t1

but not:

SELECT * FROM t FOR business_time BETWEEN t1 AND t2

I couldn’t get any of the period predicates to work, e.g.:

db2 => select * from t where business_time contains '2015-01-01';
SQL0104N  An unexpected token "contains" was found following "where 
business_time".  Expected tokens may include:  "CONCAT".  SQLSTATE=42601

I also couldn’t do anything creative with anonymous periods, e.g.:

db2 => select * from t for period(valid_from, valid_til) as of '2015-01-01';
SQL0104N  An unexpected token "period" was found following "select * from t 
for".  Expected tokens may include:  "<space>".  SQLSTATE=42601

It doesn’t help even if you call it business_time:

db2 => select * from t for period business_time(valid_from, valid_til) as of '2015-01-01';
SQL0104N  An unexpected token "period business_time" was found following 
"select * from t for".  Expected tokens may include:  "<space>".  
SQLSTATE=42601

That means temporal features are going to break down when used with views, subqueries, CTEs, and set-returning functions. A period is tied to a table, but not a result set.

DML

IBM DML is pretty standard. You can UPDATE or DELETE FOR PORTION OF BUSINESS_TIME FROM '2010-06-01' TO '2010-06-15'. The extra INSERTs happen as expected.

Triggers

Like MariaDB, DB2 does call triggers for the derived INSERTs. Here is some setup to add a row to thist whenever a trigger gets called:

create table thist (id integer, old_valid_from date, old_valid_til date, new_valid_from date, new_valid_til date, op char(1));

create trigger tins after insert on t referencing new as new \
for each row insert into thist values \
(NEW.id, null, null, NEW.valid_from, NEW.valid_til, 'i');

create trigger tupd after update on t referencing old as old new as new \
for each row insert into thist values \
(NEW.id, OLD.valid_from, OLD.valid_til, NEW.valid_from, NEW.valid_til, 'u');

create trigger tdel after delete on t referencing old as old \
for each row insert into thist values \
(OLD.id, OLD.valid_from, OLD.valid_til, null, null, 'd');

If we UPDATE FOR PORTION OF in the middle of a larger record, our INSERT trigger is called twice:

db2 => update t \
db2 (cont.) => for portion of business_time \
db2 (cont.) => from '2015-01-01' to '2016-01-01' \
db2 (cont.) => set foo = 'bar';
DB20000I  The SQL command completed successfully.
db2 => select * from t;

ID          VALID_FROM VALID_TIL  FOO       
----------- ---------- ---------- ----------
          1 01/01/2015 01/01/2016 bar       
          1 01/01/2020 01/01/2030 -         
          1 01/01/2010 01/01/2015 -         
          1 01/01/2016 01/01/2020 -         

  4 record(s) selected.

db2 => select * from thist;

ID          OLD_VALID_FROM OLD_VALID_TIL NEW_VALID_FROM NEW_VALID_TIL OP
----------- -------------- ------------- -------------- ------------- --
          1 -              -             01/01/2010     01/01/2015    i 
          1 -              -             01/01/2016     01/01/2020    i 
          1 01/01/2010     01/01/2020    01/01/2015     01/01/2016    u 

  3 record(s) selected.

Bitemporal

Bitemporal works too!

Oracle

For my tests I used Oracle 19c (version 19.3) for Linux and ran it on CentOS 7.

System time

Oracle has its own way of tracking table history, so it doesn’t bother with SQL:2011 system-time.

Application time

Oracle lets you declare a PERIOD, but like MariaDB you can’t define a temporal primary key:

CREATE TABLE t (
  id          INTEGER,
  valid_from  DATE,
  valid_til   DATE,
  PERIOD FOR valid_at (valid_from, valid_til),
  -- This next line breaks!:
  CONSTRAINT tpk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
);

Of course that means no foreign keys either.

One interesting thing is that a PERIOD doesn’t force your columns to NOT NULL:

SQL> desc t;        
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                                 NUMBER(38)
 VALID_FROM                                         DATE
 VALID_TIL                                          DATE

And that’s because nulls are allowed in PERIOD-source columns:

SQL> insert into t values (6, null, null);

1 row created.

SQL> select * from t where id = 6;

        ID VALID_FRO VALID_TIL
---------- --------- ---------
         6

Projecting

When you say SELECT * FROM t you don’t get the period. You can’t say this either, although in Oracle’s case it’s a parser error:

SELECT *, valid_at FROM t;

This doesn’t work either:

SELECT *, 1+1 FROM t;

But if you avoid the *, you can select it:

SQL> SELECT id, valid_from, valid_til, valid_at FROM t;

        ID VALID_FRO VALID_TIL   VALID_AT
---------- --------- --------- ----------
         1 01-JAN-00 01-JAN-30      33426
         2 01-JAN-10 01-JAN-30      33426
         3 01-JAN-20 01-JAN-30      33426
         4 01-JAN-00 01-JAN-10      33426

The result doesn’t mean much to me though. Anyone have any ideas?

Filtering

As in DB2, you can filter by a valid-time period, although the syntax is a little non-standard (and wordy):

SQL> SELECT * FROM t
  2  AS OF PERIOD FOR valid_at DATE '2005-01-01';

        ID VALID_FRO VALID_TIL
---------- --------- ---------
         1 01-JAN-00 01-JAN-30
         4 01-JAN-00 01-JAN-10
         6

Incidentally, you can see here that NULL in a period means “unbounded”. You can also make just one of the bounds NULL, and AS OF queries give the expected results. This is just like Postgres ranges! If Oracle does this for PERIODs, perhaps Postgres should too?

You can use BETWEEN too, but its syntax is similarly garbled:

SQL> SELECT * FROM t
  2  VERSIONS PERIOD FOR valid_at
  3  BETWEEN DATE '2025-01-01' AND DATE '2035-01-01';

        ID VALID_FRO VALID_TIL
---------- --------- ---------
         2 01-JAN-10 01-JAN-30
         1 01-JAN-00 01-JAN-30
         3 01-JAN-20 01-JAN-30
         6

Anonymous periods don’t seem to work though:

SQL> SELECT * FROM t
  2  AS OF PERIOD FOR (valid_from, valid_til) DATE '2005-01-01';
AS OF PERIOD FOR (valid_from, valid_til) DATE '2005-01-01'
                 *
ERROR at line 2:
ORA-00904: : invalid identifier

You also can’t use standard period predicates:

SQL> SELECT * FROM t WHERE valid_at CONTAINS DATE '2015-01-01';
SELECT * FROM t WHERE valid_at CONTAINS DATE '2015-01-01'
                               *
ERROR at line 1:
ORA-00920: invalid relational operator


SQL> SELECT * FROM t WHERE valid_at OVERLAPS PERIOD('2015-01-01', '2020-01-01');
SELECT * FROM t WHERE valid_at OVERLAPS PERIOD('2015-01-01', '2020-01-01')
                               *
ERROR at line 1:
ORA-00920: invalid relational operator

DML

Oracle doesn’t understand FOR PORTION OF:

SQL> UPDATE t FOR PORTION OF valid_at
  2  FROM DATE '2005-01-01' TO DATE '2006-01-01'
  3  SET id = 8 WHERE id = 1;
UPDATE t FOR PORTION OF valid_at
             *
ERROR at line 1:
ORA-00905: missing keyword


SQL> DELETE FROM t FOR PORTION OF valid_at
  2  FROM DATE '2005-01-01' TO DATE '2006-01-01'
  3  WHERE id = 1;
DELETE FROM t FOR PORTION OF valid_at
              *
ERROR at line 1:
ORA-00933: SQL command not properly ended

Triggers

In Oracle you can define triggers on tables with a valid-time period, but without temporal DML there are no interesting questions about how they should behave. Nonetheless here are the same triggers as above but in Oracle syntax (in case I ever want to test this in the future):

CREATE TABLE thist (
  id INTEGER,
  old_valid_from DATE,
  old_valid_til DATE,
  new_valid_from DATE,
  new_valid_til DATE, op CHAR(1)
);

CREATE TRIGGER tins AFTER INSERT ON t
FOR EACH ROW
BEGIN
INSERT INTO thist VALUES 
(:NEW.id, NULL, NULL, :NEW.valid_from, :NEW.valid_til, 'i');
END;
/

CREATE TRIGGER tupd AFTER UPDATE ON t
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
BEGIN
INSERT INTO thist VALUES
(:NEW.id, :OLD.valid_from, :OLD.valid_til, :NEW.valid_from, :NEW.valid_til, 'u');
END;
/

CREATE TRIGGER tdel AFTER DELETE ON t
REFERENCING OLD AS OLD
FOR EACH ROW
BEGIN
INSERT INTO thist VALUES
(:OLD.id, :OLD.valid_from, :OLD.valid_til, NULL, NULL, 'd');
END;
/

MS SQL Server

I tested an evaluation copy of MS SQL Server 2017 (version 14.0.1000.169, RTM).

SQL Server doesn’t support application-time periods at all, just system-time.

System Time

The syntax for system-time tables is just a little non-standard:

CREATE TABLE dbo.t (
  id INTEGER PRIMARY KEY,
  valid_from datetime2 GENERATED ALWAYS AS ROW START,
  valid_til datetime2 GENERATED ALWAYS AS ROW END,
  PERIOD FOR SYSTEM_TIME (valid_from, valid_til)
) WITH (
  SYSTEM_VERSIONING = ON
);

Note the parens, the underscore, and the = ON.

The history is stored in a separate invisible table with a generated name. But you can query that table like any other, so if you want to give it a nicer name you can:

WITH (
  SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.thist)
);
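
Putting the two snippets together, the full DDL with a named history table looks like this (the table and column names are mine):

```sql
CREATE TABLE dbo.t (
  id INTEGER PRIMARY KEY,
  valid_from datetime2 GENERATED ALWAYS AS ROW START,
  valid_til  datetime2 GENERATED ALWAYS AS ROW END,
  PERIOD FOR SYSTEM_TIME (valid_from, valid_til)
) WITH (
  -- history rows go to dbo.thist instead of a generated name:
  SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.thist)
);
```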

The valid_til sentinel will be 9999-12-31 23:59:59.9999999.

To ask about a certain time you can say any of these:

SELECT * FROM t FOR SYSTEM_TIME AS OF '2020-01-01';
SELECT * FROM t FOR SYSTEM_TIME BETWEEN '2020-01-01' AND '2030-01-01';
SELECT * FROM t FOR SYSTEM_TIME FROM '2020-01-01' TO '2030-01-01';

but not BETWEEN SYMMETRIC.

Conclusion

So basically everyone has at least one kind of PERIOD.

Everyone but Oracle has system-time (and they have another older approach).

The only database with temporal primary keys is DB2. They claim to have temporal foreign keys too, but I couldn’t make it work.

I was pleased that two databases let you select with FOR and a valid-time period. No one lets you build anonymous periods (in FOR, FOR PORTION OF, or elsewhere), and no one supports period predicates.

With temporal DML, the extra inserts seem to be consistent (between MariaDB and DB2), and both databases fire triggers on them the same way.

I hope this helps the Postgres community work out their own temporal behavior with respect to the standard. I think it was an interesting study in its own right, too. One thing I learned is that “every other RDBMS supports SQL:2011” is only sort of true, at least as of today. :-)

Drawing Redux Form FieldArrays with Pug

2019-01-09

Having been spoiled by the Rails Prime Stack for nearly a decade, in my React projects I prefer using Pug (née Jade) instead of JSX for its Haml-like syntax. A lot of people praise Pug and Haml for saving them typing, and while that’s nice, the real appeal to me is how easy they are to read. You spend a lot more time reading code than writing it, and Pug/Haml make the document structure immediately obvious. Making closing-tag errors obsolete is pretty nice, too.

With the babel-plugin-transform-react-pug package, you can replace your JSX with something like this:

class MyApp extends React.Component {
  render() {
    return pug`
      Provider(store=configureStore(this.props))
        table
          tbody
            tr
              td table
              td layout
              td anyone?
    `;
  }
}

But Pug is definitely not as widely adopted within React as Haml is within Rails, and it shows. I ran into a tricky issue using Pug to render FieldArrays with Redux Form and React Bootstrap. To combine those two packages I’m basically following John Bennett’s advice, except with Pug.

Here is how typical scalar fields work:

const renderInput = ({input, label, type, meta}) => pug`
  FormGroup(controlId=input.name validationState=${validState(meta)})
    Col(componentClass=${ControlLabel} sm=2)
      = label || humanizeString(input.name)
    Col(sm=5)
      FormControl(...input type=type)
`

class EditClientPage extends React.Component {
  render() {
    return pug`
      Form(horizontal onSubmit=${this.props.handleSubmit})
        fieldset
          Field(name='name' component=renderInput type='text')
          Field(name='paymentDue' component=renderInput type='number')
    `
  }
}

That’s from my time-tracking app where I wrote about using FieldArray with redux-form-validators. The Field component is from Redux Form, and everything else is from React-Bootstrap (Form, FormGroup, Col, ControlLabel, and FormControl). You can see that Field expects a custom component to draw its details, giving you total flexibility in how you structure your form’s DOM. Most of the Bootstrap components go inside the custom component used by Field.

So far that’s pretty nice, but if you have a FieldArray you need more nesting. A FieldArray is also part of Redux Form, and is used to draw a list of child records with their sub-fields. In my case I want the page to have one or more “work categories”, each with a name and hourly rate, e.g. “Design” and “Development”.

Like Field, a FieldArray delegates rendering to a custom component. Then that component will render the individual Fields (each with their own custom component in turn). If you adapted the Redux Form docs’ example, you might try something like this:

const renderSimpleInput = ({input, placeholder, type, meta}) => pug`
  span(class=${validClass(meta)})
    FormControl(...input placeholder=placeholder type=type)
`

const renderWorkCategories = ({fields, meta}) => pug`
  .noop
    = fields.map((wc, index) => pug`
      FormGroup(key=${index})
        Col(sm=2)
          Field(name=${`${wc}.name`} component=renderSimpleInput type='text' placeholder='name')
        Col(sm=5)
          Field(name=${`${wc}.rate`} component=renderSimpleInput type='number' placeholder='rate')
        Col(sm=2)
          a.btn.btn-default(onClick=${()=>fields.remove(index)}) remove
    `)
    FormGroup
      Col(smOffset=2 sm=5)
        a.btn.btn-default(onClick=${()=>fields.push({})}) add rate
`

class EditClientPage extends React.Component {
  render() {
    return pug`
      ...
      FieldArray(name='workCategories' component=${renderWorkCategories})
      ...
    `
  }
}

The problem is that you can’t nest pug strings like that. I’m not sure if the problem is with the Babel transformer or the pug parser itself, but you get an error. Of course that’s not idiomatic Pug anyway, but surprisingly, you can’t use Pug’s each command either:

const renderWorkCategories = ({fields, meta}) => pug`
  .noop
    each wc, index in fields
      FormGroup(key=${index})
        Col(sm=2)
          Field(name=${`${wc}.name`} component=renderSimpleInput type='text' placeholder='name')
        Col(sm=5)
          Field(name=${`${wc}.rate`} component=renderSimpleInput type='number' placeholder='rate')
        Col(sm=2)
          a.btn.btn-default(onClick=${()=>fields.remove(index)}) remove
    ...
`

This gives you the error Expected "fields" to be an array because it was passed to each. Apparently Redux Form is not using a normal array here, but its own special object.

The trick is to call getAll, like this:

each wc, index in fields.getAll()
  FormGroup(key=${index})
    Col(sm=2)
      Field(name=${`workCategories[${index}].name`} component=renderSimpleInput type='text' placeholder='name')
    Col(sm=5)
      Field(name=${`workCategories[${index}].rate`} component=renderSimpleInput type='number' placeholder='rate')
    Col(sm=2)
      a.btn.btn-default(onClick=${()=>fields.remove(index)}) remove

Note that we also had to stop using ${wc} and are building the field name “by hand”. Personally I think we can stop here and be done, but if that feels like breaking encapsulation to you, or if you want something more generic that doesn’t need to “know” its FieldArray name, there is another way to do it. Even if it’s a bit too much for a real project, it’s interesting enough that it’s maybe worth seeing.

To start, we need to call fields.map with another custom component. This almost works:

const renderWorkCategory = (wc, index) => pug`
  FormGroup(key=${index})
    Col(sm=2)
      Field(name=${`${wc}.name`} component=renderSimpleInput type='text' placeholder='name')
    Col(sm=5)
      Field(name=${`${wc}.rate`} component=renderSimpleInput type='number' placeholder='rate')
    Col(sm=2)
      a.btn.btn-default(onClick=${()=>fields.remove(index)}) remove
`

const renderWorkCategories = ({fields, meta}) => pug`
  .noop
    = fields.map(renderWorkCategory)
    ...
`

The only problem is the remove button: fields is no longer in scope!

The solution is to use currying. The component we hand to fields.map will be a partially-applied function, generated by passing in fields early. ES6 syntax makes it really easy. The full code looks like this:

const renderWorkCategory = (fields) => (wc, index) => pug`
  FormGroup(key=${index})
    Col(sm=2)
      Field(name=${`${wc}.name`} component=renderSimpleInput type='text' placeholder='name')
    Col(sm=5)
      Field(name=${`${wc}.rate`} component=renderSimpleInput type='number' placeholder='rate')
    Col(sm=2)
      a.btn.btn-default(onClick=${()=>fields.remove(index)}) remove
`

const renderWorkCategories = ({fields, meta}) => pug`
  .noop
    = fields.map(renderWorkCategory(fields))
    ...
`

You may recall we also used currying to combine redux-form-validators with FieldArray. It can really come in handy!

As I’ve said before, programming one thing is usually easy; it gets hard when we try to do many things at once. Here I showed how to use Pug with a FieldArray from Redux Form, on a page styled with React-Bootstrap. I hope you found it useful if, like me, you are trying to have your cake and eat it too. :-)

UPDATE: It turns out I was making things too complicated: fields.map takes an optional third parameter, fields. That means there is no need to curry renderWorkCategory and pass in fields early. Instead of this:

const renderWorkCategory = (fields) => (wc, index) => ...

you can just say this:

const renderWorkCategory = (wc, index, fields) => ...

I guess it pays to read the documentation! :-)

Validating FieldArrays in Redux Form

2019-01-08

I have a Redux Form project where I’m using redux-form-validators to get easy Rails-style validations. Its docs explain how to define validators for normal scalar fields, but I thought I’d add how to do the same for a FieldArray, which is a list of zero or more sub-fields.

In my case I’m working on a time-tracking and invoicing application, where admins can edit a client. The form has regular fields for the client’s name, the invoicing frequency, etc., and then it also has “work categories”. Each client has one or more work categories, and a work category is just a name and an hourly rate. For instance you might charge one rate for design and another for development, or you might track retainer hours at $0/hour and extra hours at your normal rate.

Redux Form makes it really easy to include child records right inside the main form using FieldArray. Their docs give a nice example of how to validate those nested fields, but it’s pretty DIY and verbose.

With redux-form-validators on the other hand, it’s easy. First you define a validation object with the rules for each field, like so:

const validations = {
  name: [required()],
  paymentDue: [required(), numericality({int: true, '>=': 0})],
  // ...
};

Then you write a little validation function to pass to Redux Form:

const validate = (values) => {
  const errors = {}
  for (let field in validations) {
    let value = values[field]
    errors[field] = validations[field].map(validateField => {
      return validateField(value, values)
    }).find(x => x) // Take the first non-null error message.
  }
  return errors
};

export default connect(mapStateToProps, mapDispatchToProps)(
  reduxForm({
    form: 'editClient',
    validate
  })(EditClientPage)
);

That is just straight from the docs. For each field it iterates over its validators and reports the first error it finds. It’s simple, but it doesn’t know how to handle nesting like with FieldArray.
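To see it run end to end, here is a self-contained sketch of that flat version; required and numericality below are hand-rolled stand-ins for the redux-form-validators functions of the same names, not the real library code:

```javascript
// Simplified stand-in validators: each returns an error string or undefined.
const required = () => (value) =>
  value === undefined || value === null || value === '' ? 'is required' : undefined;
const numericality = ({int} = {}) => (value) =>
  value !== undefined && (isNaN(value) || (int && !Number.isInteger(Number(value))))
    ? 'must be numeric' : undefined;

const validations = {
  name:       [required()],
  paymentDue: [required(), numericality({int: true})],
};

const validate = (values) => {
  const errors = {};
  for (let field in validations) {
    let value = values[field];
    errors[field] = validations[field]
      .map(validateField => validateField(value, values))
      .find(x => x); // take the first non-null error message
  }
  return errors;
};

console.log(validate({name: '', paymentDue: 'soon'}));
// { name: 'is required', paymentDue: 'must be numeric' }
```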

But notice you can really do whatever you like. It’s a bit striking for someone used to Rails how exposed and customizable everything is here. So how can we rewrite validate to be smarter?

Ideally we’d like to support validations both on individual sub-fields, like the name of a single work category, and also on the workCategories as a whole, for example checking that you have at least one. This is exactly what the Redux Form example does above: it checks that your club has at least one member, and that each member has a first and last name.

Because this is a FieldArray, Redux Form expects a different structure for its entry in the errors object returned by validate. Normally you’d have a string value, like this:

{
  paymentDue: 'must be numeric'
}

But for a FieldArray you want to pass an array with one error object for each corresponding FieldArray element, like this:

{
  workCategories: [
    {},   // no errors
    {name: 'is required'},
    {name: 'is too short', rate: 'must be numeric'},
  ]
}

Those sub-errors will get passed in the meta object given to the component used to render each Field. Again, you can see that happening in the Redux Form example.

In addition, the array may have its own _error attribute for top-level errors. That gets passed as meta.error to the FieldArray component itself. So _error is not just a convention; it’s a “magic” attribute built into Redux Form. We want to set it too.
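This works because JavaScript arrays are objects and can carry arbitrary extra properties alongside their elements, which is exactly what Redux Form exploits; a quick illustration:

```javascript
// An errors array for a FieldArray: one object per element...
const workCategoryErrors = [{}, {name: 'is required'}];
// ...plus a top-level _error hanging off the array itself:
workCategoryErrors._error = 'Please enter at least one work category.';

console.log(workCategoryErrors.length);  // 2 (the extra property is not an element)
console.log(workCategoryErrors._error);  // Please enter at least one work category.
```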

I want a way to define all these validations in one big object: top-level fields, the FieldArray itself, and the fields of individual FieldArray records. Here is the structure I set up:

const validations = {
  name:       [required()],
  paymentDue: [required(), numericality({int: true, '>=': 0})],
  // ...
  workCategories: {
    _error: [
      required({msg: "Please enter at least one work category."}),
      length({min: 1, msg: "Please enter at least one work category."})
    ],
    name:   [required()],
    rate:   [required(), numericality({'>=': 0})],
  },
};

Then instead of the recommended validate function I used this:

const buildErrors = (validations) => (values) => {
  const errors = {};
  for (let field in validations) {
    if (field === '_error') continue;
    let value = values[field];
    const fieldValidations = validations[field];
    if (fieldValidations.constructor === Array) {
      errors[field] = fieldValidations
          .map(validateField => validateField(value, values))
          .find(x => x);
    } else {
      errors[field] = value ? value.map(o => buildErrors(fieldValidations)(o)) : [];
      if (fieldValidations._error) {
        errors[field]._error = fieldValidations._error
            .map(validateField => validateField(value, values))
            .find(x => x);
      }
    }
  }
  return errors;
};

export default connect(mapStateToProps, mapDispatchToProps)(
  reduxForm({
    form: 'editClient',
    validate: buildErrors(validations),
  })(EditClientPage)
);

There are a few things to note here: I’m using a recursive function, where each call handles one “level” of the validations object. It iterates over the field names, and if a field’s entry is an array of validators, it handles it as before. Otherwise it expects a nested object structured just like the outermost object, in which each sub-field has its own array of validators. There may also be a list of validators under _error, and those are handled specially. I’m using currying so that I can build a top-level validation function for reduxForm as well as nested functions for each FieldArray.

This function also supports reporting errors both on sub-fields and the whole FieldArray at the same time. That wouldn’t happen if the only FieldArray check was to have at least one element, like here, but you can imagine scenarios where you want to validate the FieldArray as a whole even when it isn’t empty.
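Here is the whole thing running end to end, again with simplified stand-in validators rather than the real redux-form-validators functions:

```javascript
// Stand-in validators: each returns an error string or undefined.
const required = ({msg = 'is required'} = {}) => (value) =>
  value === undefined || value === null || value === '' ||
  (Array.isArray(value) && value.length === 0) ? msg : undefined;
const numericality = () => (value) =>
  value !== undefined && isNaN(value) ? 'must be numeric' : undefined;

const validations = {
  name: [required()],
  workCategories: {
    _error: [required({msg: 'Please enter at least one work category.'})],
    name:   [required()],
    rate:   [required(), numericality()],
  },
};

const buildErrors = (validations) => (values) => {
  const errors = {};
  for (let field in validations) {
    if (field === '_error') continue;
    let value = values[field];
    const fieldValidations = validations[field];
    if (fieldValidations.constructor === Array) {
      errors[field] = fieldValidations
          .map(validateField => validateField(value, values))
          .find(x => x);
    } else {
      // A nested FieldArray: one error object per element, plus _error.
      errors[field] = value ? value.map(o => buildErrors(fieldValidations)(o)) : [];
      if (fieldValidations._error) {
        errors[field]._error = fieldValidations._error
            .map(validateField => validateField(value, values))
            .find(x => x);
      }
    }
  }
  return errors;
};

const errors = buildErrors(validations)({
  name: 'Acme',
  workCategories: [{name: 'Design', rate: 100}, {rate: 'cheap'}],
});
console.log(errors.workCategories[1]);
// { name: 'is required', rate: 'must be numeric' }
```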

I’m happy that this approach lets me combine the ease of redux-form-validators with FieldArray to get the best of both worlds. I also like that buildErrors is general enough that I can move it to a utility collection, and not write it separately for each form in my app.

Also: you might enjoy my follow-up article showing how to render the form Fields and FieldArrays with Pug and Bootstrap.

Testing Your ActionMailer Configuration

2018-08-04

Sometimes with Rails you just want to test basic email sending from the console. Here is how you can do it, without needing a Mailer or views or anything:

ActionMailer::Base.mail(
  to:      'dest@example.com',
  from:    'src@example.com', 
  subject: 'test',
  body:    'testing',
).deliver_now

Or in older versions use deliver instead of deliver_now. This will use your ActionMailer settings but doesn’t require anything else, so you can test just the SMTP config (or whatever).

Adding an if method to ActiveRecord::Relation

2018-06-30

I often find myself chaining ActiveRecord scopes, like this:

q = Article
  .not_deleted
  .published_within(start_date, end_date)
q = q.with_tags(tags) if tags.any?

I wish there were a nicer way to put in the conditional scopes, without assigning to a temporary variable. What I really want is to write this:

Article
  .not_deleted
  .published_within(start_date, end_date)
  .if(tags.any?) { with_tags(tags) }

Wouldn’t that be nicer?

Many years ago I brought this up on the pdxruby mailing list, and no one seemed very interested, but I’ve always wanted it in my projects.

Here is a naïve implementation, which just for simplicity I’ll add to Object instead of ActiveRecord::Relation (where it really belongs):

class Object

  def if(condition, &block)
    condition ? instance_eval(&block) : self
  end

end

5.if(true)  { self + 7 }    # equals 12
5.if(false) { self + 7 }    # equals 5

That almost works! The problem is that inside the block, things don’t fully act like a closure. Local variables captured by the block still resolve (blocks are closures for locals), but bare method calls don’t: everything inside the block is evaluated with self set to the ActiveRecord::Relation instance, so if tags were a method on the surrounding object, our example above would give a NameError (undefined local variable or method `tags').
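Here is a plain-Ruby sketch of both the happy path and the failure; no Rails needed, and the Report class with its tags method is made up purely for illustration:

```ruby
# The naive implementation again: instance_eval the block against the receiver.
class Object
  def if(condition, &block)
    condition ? instance_eval(&block) : self
  end
end

puts 5.if(true)  { self + 7 }   # 12
puts 5.if(false) { self + 7 }   # 5

class Report
  def tags                      # a method on the caller, not a local variable
    [:ruby, :rails]
  end

  def run
    # Inside the block, self is the Array, so the bare call to `tags`
    # can't be found there:
    [].if(true) { tags.size }
  end
end

begin
  Report.new.run
rescue NameError => e
  puts "NameError: #{e.message}"
end
```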

Fortunately there is a way to fix it! Before calling the block, we can save the outside self, and then we can use method_missing to delegate any failures to there.

There is a nice writeup of this “cloaking” trick if you want more details. But if you read that article, perhaps you will notice it is not thread-safe, because it temporarily adds a method to the class, and two threads could stomp on each other if they did that at the same time.

This approach was in Rails ActiveSupport for a while as Proc#bind. They even fixed the multi-threading problem (more or less . . .) by generating a different method name every time. Unfortunately that created a new problem: since define_method takes a symbol, this creates more and more symbols, which in the Rubys of that era were never garbage collected (symbol GC only arrived in Ruby 2.2). Effectively this was a memory leak, and eventually the Rails team deprecated it.

But we can still add something similar that doesn’t leak memory and is thread-safe. We just have to protect the brief moment when we define the method and then remove it, which is simple with a Mutex. In theory taking a lock adds some overhead, and possibly contention, but we don’t expect this to be a “hot spot”, so in practice the contention should be zero and the overhead trivial.

And here is our implementation (not on Object any more):

module ActiveRecord
  class Relation

    CLOAKER_MUTEX = Mutex.new

    def if(condition, &block)
      if condition
        meth = self.class.class_eval do
          CLOAKER_MUTEX.synchronize do
            define_method :cloaker_, &block
            meth = instance_method :cloaker_
            remove_method :cloaker_
            meth
          end
        end
        with_previous_context(block.binding) { meth.bind(self).call }
      else
        self
      end
    end

    def with_previous_context(binding, &block)
      @previous_context = binding.eval('self')
      result = block.call
      @previous_context = nil
      result
    end

    def method_missing(method, *args, &block)
      super
    rescue NameError => e
      if @previous_context
        @previous_context.send(method, *args, &block)
      else
        raise e
      end
    end
  end
end

Put that in config/initializers and try it out!

An nginx HTTP-to-HTTPS Redirect Mystery, and Configuration Advice

2018-03-25

I noticed a weird thing last night on an nginx server I administer. The logs were full of lines like this:

42.232.104.114 - - [25/Mar/2018:04:50:49 +0000] "GET http://www.ioffer.com/i/new-fashion-fine-gold-bracelet-versaec-bracelet-641175733 HTTP/1.1" 301 185 "http://www.ioffer.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Hotbar 4.1.8.0; RogueCleaner; Alexa Toolbar)"

Traffic was streaming in continuously: maybe ten or twenty requests per second.

At first I thought the server had been hacked, but really it seemed people were just sending lots of traffic and getting 301 redirects. I could reproduce the problem with a telnet session:

$ telnet 198.51.100.42 80
Trying 198.51.100.42...
Connected to example.com.
Escape character is '^]'.
GET http://www.ioffer.com/i/men-shouder-messenger-bag-briefcase-shoulder-bags-649303549 HTTP/1.1
Host: www.ioffer.com

HTTP/1.1 301 Moved Permanently
Server: nginx/1.10.1
Date: Sun, 25 Mar 2018 04:56:06 GMT
Content-Type: text/html
Content-Length: 185
Connection: keep-alive
Location: https://www.ioffer.com/i/men-shouder-messenger-bag-briefcase-shoulder-bags-649303549

<html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx/1.10.1</center>
</body>
</html>

In that session, I typed the first two lines after Escape character..., plus the blank line following. Normally a browser would not include a whole URL after GET, only the path, like GET /about.html HTTP/1.1, but including the whole URL is used when going through a proxy. Also it may be possible to leave off the Host header. Technically it is required for HTTP/1.1, so I added it just out of habit. I didn’t test without it.

So what was happening here? I was following some common advice to redirect HTTP to HTTPS, using configuration like this:

server {
  listen 80;
  server_name example.com *.example.com;
  return 301 https://$host$request_uri;
}

The problem is that $host evaluates to whatever the client sends. In order of precedence, it can be (1) the host name from the request line (as in my example), (2) the Host header, or (3) what you declared as the server_name for the matching block. A safer alternative is to send people to https://$server_name$request_uri. Then everything is under your control. You can see people recommending that on the ServerFault page.

That advice breaks down, though, when you declare more than one server_name, or when one of them is a wildcard. The $server_name variable always evaluates to the first one, and it doesn’t expand wildcards. (How could it?) That wouldn’t work for me, because in this project admins can add new subdomains at any time, and I don’t want to update nginx config files when that happens.
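To make the pitfall concrete, here is a hypothetical config of the shape I had to avoid (example.com stands in for the real domain):

```nginx
server {
  listen 80;
  server_name example.com *.example.com;
  # $server_name is always the first name declared above ("example.com"),
  # even for a request to foo.example.com, and wildcards are never expanded,
  # so this would redirect every subdomain to the apex domain:
  return 301 https://$server_name$request_uri;
}
```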

Eventually I solved it using a config like this:

server {
  listen 80 default_server;
  server_name example.com;
  return 301 https://example.com$request_uri;
}
server {
  listen 80;
  server_name *.example.com;
  return 301 https://$host$request_uri;
}

Notice the default_server modifier. If any traffic actually matches *.example.com, it will use the second block, but otherwise it will fall back to the first block, where there is no $host variable, but just a hardcoded redirect to my own domain. After I made this change, I immediately saw traffic getting the redirect and making a second request back to my own machine, usually getting a 404. I expect pretty soon whoever is sending this traffic will knock it off. If not, I guess it’s free traffic for me. :-)

(Technically default_server is not required since if no block is the declared default, nginx will make the first the default automatically, but being explicit seems like an improvement, especially here where it matters.)

I believe I could also use a setup like this:

server {
  listen 80 default_server;
  return 301 "https://www.youtube.com/watch?v=dQw4w9WgXcQ";
}
server {
  listen 80;
  server_name example.com *.example.com;
  return 301 https://$host$request_uri;
}

There I list all my legitimate domains in the second block, so the default only matches when people are playing games. I guess I’m too nice to do that for real though, and anyway it would make me nervous that some other misconfiguration would activate that first block more often than I intended.

I’d still like to know what the point of this abuse was. My server wasn’t acting as an open proxy exactly, because it wasn’t fulfilling these requests on behalf of the clients and passing along the response (confirmed with tcpdump -n 'tcp[tcpflags] & (tcp-syn) != 0 and src host 198.51.100.42'); it was just sending a redirect. So what was it accomplishing?

The requests were for only a handful of different domains, mostly Chinese. They came from a variety of IPs. Sometimes an IP would make requests for hours and then disappear. The user agents varied too: most were normal, like Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0), but some mentioned toolbars like the example above.

I guess if it were DNS sending them to my server there would (possibly) be a redirect loop, which I wasn’t seeing. So was my server configured as their proxy?

To learn a little more, I moved nginx over to port 81 and ran this:

mkfifo reply
netcat -kl 80 < reply | tee saved | netcat 127.0.0.1 81 > reply

(At first instead of netcat I tried ./mitmproxy --save-stream-file +http.log --listen-port 80 --mode reverse:http://127.0.0.1:81 --set keep_host_header, but it threw errors on requests with full URLs (GET http://example.com/ HTTP/1.1) because it thought it should only see those in regular mode.)

Once netcat was running I could tail -F saved in another session. I saw requests like this:

GET http://www.douban.com/ HTTP/1.1
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)
Accept: text/html, */*
Accept-Language: zh-cn; en-us
Referer: http://www.douban.com/
Host: www.douban.com
Pragma: no-cache

I also saw one of these:

CONNECT www.kuaishou.com:443 HTTP/1.0
User-Agent: Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11
Host: www.kuaishou.com:443
Content-Length: 0
Proxy-Connection: Keep-Alive
Pragma: no-cache

That is a more normal proxy request, although it seems like it was just regular scanning, because I’ve always returned a 400 to those.

Maybe the requests that were getting 301’d were just regular vulnerability scanning too? I don’t know. It seemed like something more specific than that.

The negatives for me were noisy logs and elevated bandwidth/CPU. Not a huge deal, but whatever was going on, I didn’t want to be a part of it.

. . .

By the way, as long as we’re talking about redirecting HTTP to HTTPS, I should mention HSTS, which is a way of telling browsers never to use HTTP here in the future. If you’re doing a redirect like this, it may be a good thing to add (to the HTTPS response, not the HTTP one). On the other hand it has some risks, if in the future you ever want to use HTTP again.
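If you do decide to add HSTS, the usual nginx approach is an add_header directive in the HTTPS server block. A minimal sketch (the one-year max-age is a common choice; start with something much shorter while testing, since browsers will remember it):

```nginx
server {
  listen 443 ssl;
  server_name example.com;
  # ssl_certificate and ssl_certificate_key directives go here

  # Tell browsers to insist on HTTPS for this host for the next year:
  add_header Strict-Transport-Security "max-age=31536000" always;
}
```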
