Benchmarking Temporal Foreign Keys

Way back in February Peter Eisentraut asked me if I’d tested the performance of my patch to add temporal foreign keys to Postgres.

Have you checked that the generated queries can use indexes and have suitable performance? Do you have example execution plans maybe?

Here is a report on the tests I made. I gave a talk about this last month at pdxpug, but this blog post will be easier to access, and I’ll focus just on the foreign key results.

Method

As far as I know there are no published benchmark schemas or workloads for temporal data. Since the tables require start/end columns, you can’t use an existing benchmark like TPC-H. The tables built into pgbench are no use either. I’m not even sure where to find a public dataset. The closest is something called “Incumben”, mentioned in the “Temporal Alignment” paper. The authors say it has 85,857 entries for job assignments across 49,195 employees at the University of Arizona—but I can’t find any trace of it online. (I’ll update here if I hear back from them about it.)

So I built a temporal benchmark of my own using CMU’s Benchbase framework. (Thanks to Mark Wong and Grant Holly for that recommendation!) It also uses employees and positions, both temporal tables with a valid_at column (a daterange). Each position has a reference to an employee, checked by a temporal foreign key. Primary and foreign keys have GiST indexes combining the integer part and the range part. Here is the DDL:

CREATE TABLE employees (
    id          int GENERATED BY DEFAULT AS IDENTITY NOT NULL,
    valid_at    daterange NOT NULL,
    name        text NOT NULL,
    salary      int NOT NULL,
    PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
);

CREATE TABLE positions (
    id          int GENERATED BY DEFAULT AS IDENTITY NOT NULL,
    valid_at    daterange NOT NULL,
    name        text NOT NULL,
    employee_id int NOT NULL,
    PRIMARY KEY (id, valid_at WITHOUT OVERLAPS),
    FOREIGN KEY (employee_id, PERIOD valid_at) REFERENCES employees (id, PERIOD valid_at)
);
CREATE INDEX idx_positions_employee_id ON positions USING gist (employee_id, valid_at);

Naturally you can’t run that unless you’ve compiled Postgres with the temporal patches above.

The benchmark has procedures that exercise foreign keys (update/delete employee, insert/update position). There are other procedures too: selecting one row, selecting many rows, inner join, outer join, semijoin, antijoin. I plan to add aggregates and set operations (union/except/intersect), as well as better queries for sequenced vs non-sequenced semantics. But right now the foreign key procedures are better developed than anything else. I also plan to change the SQL from rangetypes to standard SQL:2011 PERIODs, at least for non-Postgres RDBMSes. I’ll write more about all that later; this post is about foreign keys.
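
To make that concrete, the foreign-key procedures boil down to DML along these lines (a sketch only; the id and date values are invented, and the real SQL lives in the benchmark code):

-- Inserting or updating a position fires the FK check against employees:
INSERT INTO positions (valid_at, name, employee_id)
VALUES (daterange('2020-10-10', '2020-12-12'), 'Engineer', 500);

-- Updating or deleting an employee fires the reverse check against any
-- positions that still reference it:
UPDATE employees
SET    valid_at = daterange('2020-01-01', '2020-06-01')
WHERE  id = 500
AND    valid_at = daterange('2020-01-01', '2021-01-01');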

range_agg Implementation

Temporal foreign keys in Postgres are implemented like this:

SELECT 1
FROM    (
  SELECT pkperiodatt AS r
  FROM   [ONLY] pktable x
  WHERE  pkatt1 = $1 [AND ...]
  AND    pkperiodatt && $n
  FOR KEY SHARE OF x
) x1
HAVING $n <@ range_agg(x1.r)
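
For the benchmark schema, that template expands to roughly the following (the id and range literals are the same ones that appear in the query plans below):

SELECT 1
FROM    (
  SELECT valid_at AS r
  FROM   employees x
  WHERE  id = 500
  AND    valid_at && daterange('2020-10-10', '2020-12-12')
  FOR KEY SHARE OF x
) x1
HAVING daterange('2020-10-10', '2020-12-12') <@ range_agg(x1.r);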

This is very similar to non-temporal checks. The main difference is that we use range_agg to aggregate the referenced records, since satisfying the reference may require combining several of them. For example if the employee got a raise in the middle of the position, neither employee record alone covers the position’s valid time:

Temporal foreign key

In our query, the HAVING checks that the “sum” of the employee times covers the position time.
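
You can see that aggregation behavior in isolation with a standalone query like this (dates invented; the raise splits the employee into two adjacent rows on 2020-06-01):

-- Neither row contains the position's range by itself,
-- but the aggregated multirange does:
SELECT daterange('2020-03-01', '2020-09-01') <@ range_agg(r) AS covered
FROM (VALUES
  (daterange('2020-01-01', '2020-06-01')),  -- employee row before the raise
  (daterange('2020-06-01', '2021-01-01'))   -- employee row after the raise
) AS employee_rows(r);
-- Returns true, even though neither input range covers the position alone.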

A subquery is not logically required, but Postgres currently doesn’t allow FOR KEY SHARE in a query with aggregations.

I like this query because it works not just for rangetypes, but multiranges too. In fact we could easily support arbitrary types, as long as the user provides an opclass with an appropriate support function (similar to the stratnum support function introduced for temporal primary keys). We would call that function in place of range_agg. But how does it perform?

EXISTS implementation

I compared this query with two others. The original implementation for temporal foreign keys appears on pages 128–129 of Developing Time-Oriented Database Applications in SQL by Richard Snodgrass. I call this the “EXISTS implementation”. Here is the SQL I used:

SELECT 1
-- There was a PK when the FK started:
WHERE EXISTS (
  SELECT  1
  FROM    [ONLY] <pktable>
  WHERE   pkatt1 = $1 [AND ...]
  AND     COALESCE(lower(pkperiodatt), '-Infinity')
       <= COALESCE(lower($n), '-Infinity')
  AND     COALESCE(lower($n), '-Infinity')
       <  COALESCE(upper(pkperiodatt), 'Infinity')
)
-- There was a PK when the FK ended:
AND EXISTS (
  SELECT  1
  FROM    [ONLY] <pktable>
  WHERE   pkatt1 = $1 [AND ...]
  AND     COALESCE(lower(pkperiodatt), '-Infinity')
       <  COALESCE(upper($n), 'Infinity')
  AND     COALESCE(upper($n), 'Infinity')
       <= COALESCE(upper(pkperiodatt), 'Infinity')
)
-- There are no gaps in the PK:
-- (i.e. there is no PK that ends early,
-- unless a matching PK record starts right away)
AND NOT EXISTS (
  SELECT  1
  FROM    [ONLY] <pktable> AS pk1
  WHERE   pkatt1 = $1 [AND ...]
  AND     COALESCE(lower($n), '-Infinity')
       <  COALESCE(upper(pkperiodatt), 'Infinity')
  AND     COALESCE(upper(pkperiodatt), 'Infinity')
       <  COALESCE(upper($n), 'Infinity')
  AND     NOT EXISTS (
    SELECT  1
    FROM    [ONLY] <pktable> AS pk2
    WHERE   pk1.pkatt1 = pk2.pkatt1 [AND ...]
            -- but skip pk1.pkperiodatt && pk2.pkperiodatt
    AND     COALESCE(lower(pk2.pkperiodatt), '-Infinity')
         <= COALESCE(upper(pk1.pkperiodatt), 'Infinity')
    AND     COALESCE(upper(pk1.pkperiodatt), 'Infinity')
         <  COALESCE(upper(pk2.pkperiodatt), 'Infinity')
  )
);

The main idea here is that we check three things: (1) the referencing row is covered in the beginning, (2) it is covered in the end, (3) in between, the referenced row(s) have no gaps.

I made a few changes to the original:

  • It can’t be a CHECK constraint, since it references other rows.
  • There is less nesting. The original is wrapped in a big NOT EXISTS and looks for bad rows. Essentially it says “there are no invalid records.” In Postgres we check one referencing row at a time, and we give a result if it is valid. You could say we look for good rows. This also requires inverting the middle-layer EXISTS and NOT EXISTS predicates, and changing ORs to ANDs. I’ve often run into trouble with OR, so this is probably fortunate.
  • We have to “unwrap” the start/end times since they are stored in a rangetype. I could have used rangetype operators here, but I wanted to keep the adaptation as straightforward as possible, and the previous changes felt like a lot already. Unwrapping requires dealing with unbounded ranges, so I’m using plus/minus Infinity as a sentinel. This is not perfectly accurate, since in ranges a null bound is “further out” than a plus/minus Infinity. (Try select '{(,)}'::datemultirange - '{(-Infinity,Infinity)}'::datemultirange.) But again, solving that was taking me too far from the original, and it’s fine for a benchmark.
  • We need to lock the rows with FOR KEY SHARE in the same way as above. We need to do this in each branch, since they may use different rows.

Given the complexity, I didn’t expect this query to perform very well.

lag implementation

Finally there is an implementation in Vik Fearing’s periods extension. This is a lot like the EXISTS implementation, except to check for gaps it uses the lag window function. Here is the SQL I tested:

SELECT  1
FROM    (
  SELECT  uk.uk_start_value,
          uk.uk_end_value,
          NULLIF(LAG(uk.uk_end_value) OVER
            (ORDER BY uk.uk_start_value), uk.uk_start_value) AS x
  FROM   (
    SELECT  coalesce(lower(x.pkperiodatt), '-Infinity') AS uk_start_value,
            coalesce(upper(x.pkperiodatt), 'Infinity') AS uk_end_value
    FROM    pktable AS x
    WHERE   pkatt1 = $1 [AND ...]
    AND     x.pkperiodatt && $n
    FOR KEY SHARE OF x
  ) AS uk
) AS uk
WHERE   uk.uk_start_value < upper($n)
AND     uk.uk_end_value >= lower($n)
HAVING  MIN(uk.uk_start_value) <= lower($n)
AND     MAX(uk.uk_end_value) >= upper($n)
AND     array_agg(uk.x) FILTER (WHERE uk.x IS NOT NULL) IS NULL

Again I had to make some adaptations to the original:

  • There is less nesting, for similar reasons as before.
  • We unwrap the ranges, much like the EXISTS version. Again there is an Infinity-vs-null discrepancy, but it is harder to deal with since the query uses null entries in the lag result to indicate gaps.
  • I couldn’t resist using && instead of separate <= and >= comparisons in the most-nested part to find the relevant rows. The change was obvious enough, and if it makes any difference it should speed things up, so if anything it makes the comparison more fair.

I made a new branch rooted in my valid-time branch, and added an extra commit to switch between the implementations with a compile-time flag. By default we still use range_agg, but instead you can say ‑DRI_TEMPORAL_IMPL_LAG or ‑DRI_TEMPORAL_IMPL_EXISTS. I installed each implementation in a separate cluster, listening on ports 5460, 5461, and 5462 respectively.

I also included procedures in Benchbase to simply run the above queries as SELECTs. Since we are doing quite focused microbenchmarking here, I thought that would be less noisy than doing the same DML for each implementation. It also means we can run a mix of all three implementations together: they use the same cluster, and if there is any noise on the machine it affects them all. If you look at my temporal benchmark code, you’ll see the same SQL but adapted for the employees/positions tables.

Hypothesis

Here is the query plan for the range_agg implementation:

Aggregate
  Filter: ('[2020-10-10,2020-12-12)'::daterange <@ range_agg(x1.r))
  ->  Subquery Scan on x1
    ->  LockRows
      ->  Index Scan using employees_pkey on employees x
        Index Cond: ((id = 500) AND (valid_at && '[2020-10-10,2020-12-12)'::daterange))

It uses the index, and it all seems like what we’d want. It is not an Index Only Scan, but that’s because we lock the rows. Non-temporal foreign keys are the same way. This should perform pretty well.

Here is the query plan for the EXISTS implementation:

Result
  One-Time Filter: ((InitPlan 1).col1 AND (InitPlan 2).col1 AND (NOT (InitPlan 4).col1))
  InitPlan 1
  ->  LockRows
    ->  Index Scan using employees_pkey on employees x
      Index Cond: ((id = 500) AND (valid_at && '[2020-10-10,2020-12-12)'::daterange))
      Filter: ((COALESCE(lower(valid_at), '-infinity'::date) <= '2020-10-10'::date) AND ('2020-10-10'::date < COALESCE(upper(valid_at), 'infinity'::date)))
  InitPlan 2
  ->  LockRows
    ->  Index Scan using employees_pkey on employees x_1
      Index Cond: ((id = 500) AND (valid_at && '[2020-10-10,2020-12-12)'::daterange))
      Filter: ((COALESCE(lower(valid_at), '-infinity'::date) < '2020-12-12'::date) AND ('2020-12-12'::date <= COALESCE(upper(valid_at), 'infinity'::date)))
  InitPlan 4
  ->  LockRows
    ->  Index Scan using employees_pkey on employees pk1
      Index Cond: ((id = 500) AND (valid_at && '[2020-10-10,2020-12-12)'::daterange))
      Filter: (('2020-10-10'::date < COALESCE(upper(valid_at), 'infinity'::date)) AND (COALESCE(upper(valid_at), 'infinity'::date) < '2020-12-12'::date) AND (NOT EXISTS(SubPlan 3)))
      SubPlan 3
      ->  LockRows
        ->  Index Scan using employees_pkey on employees pk2
          Index Cond: (id = pk1.id)
          Filter: ((COALESCE(lower(valid_at), '-infinity'::date) <= COALESCE(upper(pk1.valid_at), 'infinity'::date)) AND (COALESCE(upper(pk1.valid_at), 'infinity'::date) < COALESCE(upper(valid_at), 'infinity'::date)))

That looks like a lot of work!

And here is the plan for the lag implementation:

Aggregate
  Filter: ((array_agg(uk.x) FILTER (WHERE (uk.x IS NOT NULL)) IS NULL) AND (min(uk.uk_start_value) <= '2020-10-10'::date) AND (max(uk.uk_end_value) >= '2020-12-12'::date))
  ->  Subquery Scan on uk
    Filter: ((uk.uk_start_value < '2020-12-12'::date) AND (uk.uk_end_value >= '2020-10-10'::date))
    ->  WindowAgg
      ->  Sort
        Sort Key: uk_1.uk_start_value
        ->  Subquery Scan on uk_1
          ->  LockRows
            ->  Index Scan using employees_pkey on employees x
              Index Cond: ((id = 500) AND (valid_at && '[2020-10-10,2020-12-12)'::daterange))

This looks a lot like the range_agg version. We still use our index. There is an extra Sort step, but internally the range_agg function must do much the same thing (if not something worse). Maybe the biggest difference (though a slight one) is aggregating twice.

So I expect range_agg to perform the best, with lag a close second, and EXISTS far behind.

One exception may be a single referencing row that spans many referenced rows. If range_agg is O(n²), it should fall behind as the referenced rows increase.
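
For example, a setup like this would produce that shape (a sketch with invented values; the benchmark doesn’t generate this case yet, see Further Work below):

-- One employee chopped into one-day slices for all of 2020:
INSERT INTO employees (id, valid_at, name, salary)
SELECT 9999, daterange(g::date, (g + interval '1 day')::date), 'Chopped Up', 100
FROM generate_series(timestamp '2020-01-01', timestamp '2020-12-30', interval '1 day') AS g;

-- One position spanning all of them, so the FK check must aggregate every slice:
INSERT INTO positions (valid_at, name, employee_id)
VALUES (daterange('2020-01-01', '2020-12-31'), 'Spanning Position', 9999);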

Results

I started by running a quick test on my laptop, an M2 MacBook Air with 16 GB of RAM. I tested the DML commands on each cluster, one after another. Then I checked the Benchbase summary file for the throughput. The results were what I expected:

early results

Similarly, range_agg had the best latency at the 25th, 50th, 75th, 90th, and 99th percentiles:

latency comparison

But the differences throughout are pretty small. Also, at the time my Benchbase procedures used a lot of synchronized blocks to ensure there were few foreign key failures, and that kind of locking seemed like it might throw off the results. I needed to do more than this casual check.

I ran all the subsequent benchmarks on my personal desktop, running Ubuntu 22.04.

It was hard to make things reproducible, but I wrote various scripts as I went, and I tried to capture results. The repo for all that is here. My pdxpug talk above contains some reflections about improving my benchmark methodology.

I also removed the synchronized blocks and dealt with foreign key failures a better way (by categorizing them as errors but not raising an exception).

The first more careful tests used the direct SELECT statements.

Again, the 95th percentile latency was what I expected:

95% latency comparison

But surprisingly, the winner for mean latency was EXISTS:

mean latency comparison

A clue was in the Benchbase output showing successful transactions vs errors. (The Noop procedure is there so I can make the proportions 33/33/33/1 instead of 33/33/34.):

Completed Transactions:
com.oltpbenchmark.benchmarks.temporal.procedures.CheckForeignKeyRangeAgg/01      [72064] ********************************************************************************
com.oltpbenchmark.benchmarks.temporal.procedures.CheckForeignKeyLag/02           [71479] *******************************************************************************
com.oltpbenchmark.benchmarks.temporal.procedures.CheckForeignKeyExists/03        [71529] *******************************************************************************
com.oltpbenchmark.benchmarks.temporal.procedures.Noop/04                         [ 4585] *****
Aborted Transactions:
<EMPTY>

Rejected Transactions (Server Retry):
<EMPTY>

Rejected Transactions (Retry Different):
<EMPTY>

Unexpected SQL Errors:
com.oltpbenchmark.benchmarks.temporal.procedures.CheckForeignKeyRangeAgg/01      [80861] ********************************************************************************
com.oltpbenchmark.benchmarks.temporal.procedures.CheckForeignKeyLag/02           [80764] *******************************************************************************
com.oltpbenchmark.benchmarks.temporal.procedures.CheckForeignKeyExists/03        [80478] *******************************************************************************

More than half of the transactions were an invalid reference.

And if we put one of those into EXPLAIN ANALYZE, we see that most of the plan was never executed:

Result (actual time=0.034..0.035 rows=0 loops=1)
  One-Time Filter: ((InitPlan 1).col1 AND (InitPlan 2).col1 AND (NOT (InitPlan 4).col1))
  InitPlan 1
  ->  LockRows (actual time=0.033..0.033 rows=0 loops=1)
    ->  Index Scan using employees_pkey on employees x (actual time=0.033..0.033 rows=0 loops=1)
      Index Cond: ((id = 5999) AND (valid_at && '[2020-10-10,2020-12-12)'::daterange))
      Filter: ((COALESCE(lower(valid_at), '-infinity'::date) <= '2020-10-10'::date) AND ('2020-10-10'::date < COALESCE(upper(valid_at), 'infinity'::date)))
  InitPlan 2
  ->  LockRows (never executed)
    ->  Index Scan using employees_pkey on employees x_1 (never executed)
      Index Cond: ((id = 5999) AND (valid_at && '[2020-10-10,2020-12-12)'::daterange))
      Filter: ((COALESCE(lower(valid_at), '-infinity'::date) < '2020-12-12'::date) AND ('2020-12-12'::date <= COALESCE(upper(valid_at), 'infinity'::date)))
  InitPlan 4
  ->  LockRows (never executed)
    ->  Index Scan using employees_pkey on employees pk1 (never executed)
      Index Cond: ((id = 5999) AND (valid_at && '[2020-10-10,2020-12-12)'::daterange))
      Filter: (('2020-10-10'::date < COALESCE(upper(valid_at), 'infinity'::date)) AND (COALESCE(upper(valid_at), 'infinity'::date) < '2020-12-12'::date) AND (NOT EXISTS(SubPlan 3)))
      SubPlan 3
      ->  LockRows (never executed)
        ->  Index Scan using employees_pkey on employees pk2 (never executed)
          Index Cond: (id = pk1.id)
          Filter: ((COALESCE(lower(valid_at), '-infinity'::date) <= COALESCE(upper(pk1.valid_at), 'infinity'::date)) AND (COALESCE(upper(pk1.valid_at), 'infinity'::date) < COALESCE(upper(valid_at), 'infinity'::date)))

In this example, the beginning of the referencing range wasn’t covered, so Postgres never had to check the rest. Essentially the query is a AND b AND c, so Postgres can short-circuit the evaluation as soon as it finds a to be false. Using range_agg or lag doesn’t allow this, because an aggregate/window function has to run to completion to get a result.

As confirmation (a bit gratuitous to be honest), I ran the EXISTS benchmark with this bpftrace script:

// Count how many exec nodes per query were required,
// and print a histogram of how often each count happens.
// Run this for each FK implementation separately.
// My hypothesis is that the EXISTS implementation calls ExecProcNode far fewer times,
// but only if the FK is invalid.

u:/home/paul/local/bench-*/bin/postgres:standard_ExecutorStart {
  @nodes[tid] = 0
}
u:/home/paul/local/bench-*/bin/postgres:ExecProcNode {
  @nodes[tid] += 1
}
u:/home/paul/local/bench-*/bin/postgres:standard_ExecutorEnd {
  @calls = hist(@nodes[tid]);
  delete(@nodes[tid]);
}

For EXISTS I got this histogram when there were no invalid references:

@calls:
[0]                    6 |                                                    |
[1]                    0 |                                                    |
[2, 4)                 0 |                                                    |
[4, 8)            228851 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[8, 16)                1 |                                                    |
[16, 32)               1 |                                                    |
[32, 64)               2 |                                                    |
[64, 128)              2 |                                                    |
[128, 256)             2 |                                                    |
[256, 512)             5 |                                                    |

But with 50%+ errors I got this:

@calls:
[0]                    6 |                                                    |
[1]                    0 |                                                    |
[2, 4)            218294 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)            183438 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         |
[8, 16)              231 |                                                    |
[16, 32)               1 |                                                    |
[32, 64)               2 |                                                    |
[64, 128)              2 |                                                    |
[128, 256)             2 |                                                    |
[256, 512)             5 |                                                    |

So more than half the time, Postgres ran the query with half as many steps (maybe even a quarter as many).

After tuning the random numbers to bring errors closer to 1%, I got results more like the original ones. Mean latency:

mostly valid mean latency comparison

Median latency:

mostly valid median latency comparison

95th percentile latency:

mostly valid 95% latency comparison

Conclusions

All the foreign key implementations have the expected query plans: we use indexes where we should, and so on.

When most foreign key references are valid, range_agg outperforms the other two implementations by a small but consistent amount. But with a large number of invalid references, EXISTS is a lot faster.

In most applications I’ve seen, foreign keys are used as guardrails, and we expect 99% of checks to pass (or more really). When using ON DELETE CASCADE the situation is different, but these benchmarks are for NO ACTION or RESTRICT, and I don’t think CASCADE affords the EXISTS implementation the same shortcuts. So it seems right to optimize for the mostly-valid case, not the more-than-half-invalid case.

These results are good news, because range_agg is also more general: it supports multiranges and custom types.

Further Work

There are more things I’d like to benchmark (and if I do I’ll update this post):

  • Replace separate start/end comparisons with range operators in the EXISTS and lag implementations. I just need to make sure they still pass all the tests when I do that.
  • Correct the Infinity-vs-null discrepancy.
  • Monitor the CPU and disk activity under each implementation and compare the results. I don’t think I’ll see any difference in disk, but CPU might be interesting.
  • Compare different scale factors (i.e. starting number of employees/positions).
  • Compare implementations when an employee is chopped into many small records, and a single position spans all of them. If range_agg is O(n²) that should be worse than the sorting in the other options.
  • Compare temporal foreign keys to non-temporal foreign keys (based on B-tree indexes, not GiST). I’m not sure yet how to do this in a meaningful way. Of course B-trees are faster in general, but how do I use them to achieve the same primary key and foreign key constraints? Maybe the best way is to create the tables without constraints, give them only B-tree indexes, and run the direct SELECT statements, not the DML (see the sketch after this list).
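
Here is what that last comparison might look like, as a sketch (the table and index names are invented; the columns mirror the temporal schema above):

-- Hypothetical non-temporal comparison tables: same columns,
-- no temporal constraints, plain B-tree indexes instead of GiST:
CREATE TABLE employees_btree (
    id          int NOT NULL,
    valid_at    daterange NOT NULL,
    name        text NOT NULL,
    salary      int NOT NULL
);
CREATE INDEX idx_employees_btree_id ON employees_btree (id);

CREATE TABLE positions_btree (
    id          int NOT NULL,
    valid_at    daterange NOT NULL,
    name        text NOT NULL,
    employee_id int NOT NULL
);
CREATE INDEX idx_positions_btree_employee_id ON positions_btree (employee_id);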

Benchbase Documentation

2024-08-26

Benchbase is a framework from Carnegie Mellon for benchmarking databases. It comes with support for about 20 benchmarks and about as many DBMSes.

Benchbase started life as OLTPBench and was introduced in an academic paper from 2014.

After using Benchbase for the last month, I found the documentation to be pretty shallow, so this is my effort to improve things. A lot of this material was covered in my pdxpug talk last week.

Running

Benchbase is written in Java and uses Maven for building and running.

Following their README, first you build a tarball for your DBMS like this:

./mvnw clean package -P postgres

Then you expand the tarball and run a benchmark like this:

cd target
tar xvzf benchbase-postgres.tgz
cd benchbase-postgres
java -jar benchbase.jar -b tpcc -c config/postgres/sample_tpcc_config.xml --create=true --load=true --execute=true

The -b option says which benchmark you want to run.

The -c option points to a config file (covered below).

The --create option doesn’t run CREATE DATABASE, but creates the schema for the benchmark.

The --load option fills the schema with its starting data. The time for this is not included in the benchmark results.

The --execute option actually runs the benchmark. I often ran ‑‑create=true ‑‑load=true ‑‑execute=false to populate a database named e.g. benchbase_template, then createdb -T benchbase_template benchbase to make a quick copy, then ‑‑create=false ‑‑load=false ‑‑execute=true to run the benchmark. That helps iteration time a lot when you have a big load. But for higher-quality results you should do it all in one go, after running initdb, as Melanie Plageman points out in one of her talks. (Sorry, I haven’t been able to find the reference again, but if I do I’ll post a link here.)
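
For reference, the createdb -T copy is the same as this SQL (using the database names above):

-- Equivalent to `createdb -T benchbase_template benchbase`:
CREATE DATABASE benchbase TEMPLATE benchbase_template;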

If you are writing Java code for your own benchmark, then this one-liner is a lot faster than all that tarball stuff:

./mvnw clean compile exec:java -P postgres -Dexec.args="-b tpcc -c config/postgres/sample_tpcc_config.xml --create=true --load=true --execute=true"

Of course you can skip the clean and compile if you like.

Unfortunately the exec:java target has been broken since 2023, but I submitted a pull request.

Configuration

The benchmark behavior is controlled by the XML config file. The most complete docs are in the original OLTPBench repo’s Github wiki, although if you read the paper you’ll learn many other things you can control with this file. You can also look at a sample config file for your benchmark + database.

The file begins with connection details like this:

<type>POSTGRES</type>
<driver>org.postgresql.Driver</driver>
<url>jdbc:postgresql://localhost:5432/benchbase?sslmode=disable&amp;ApplicationName=tpcc&amp;reWriteBatchedInserts=true</url>
<username>admin</username>
<password>password</password>

The <isolation> element controls the transaction isolation level:

<isolation>TRANSACTION_SERIALIZABLE</isolation>

You can ask to reconnect after a connection failure:

<reconnectOnConnectionFailure>true</reconnectOnConnectionFailure>

I haven’t investigated exactly how that is used.

You can also open a new connection for every transaction:

<newConnectionPerTxn>true</newConnectionPerTxn>

By default that is false, but you may want to make it true if you are focusing on your database’s connection overhead.

Loading

Here are some elements that apply to the loading step (not the actual benchmark run):

<scalefactor>1</scalefactor>
<batchsize>128</batchsize>

Each benchmark interprets scalefactor in its own way. For TPC-C this is the number of warehouses. For Twitter you get 500 users and 20,000 tweets, multiplied by the scalefactor.

Then batchsize just tells the loader how many insert statements to combine into each batch, for a quicker load.

Execution

You also list all the “procedures” the benchmark is capable of (or just the ones you care about):

<transactiontypes>
    <transactiontype>
        <name>NewOrder</name>
    </transactiontype>
    <transactiontype>
        <name>Payment</name>
    </transactiontype>
    <transactiontype>
        <name>OrderStatus</name>
    </transactiontype>
    <transactiontype>
        <name>Delivery</name>
    </transactiontype>
    <transactiontype>
        <name>StockLevel</name>
    </transactiontype>
</transactiontypes>

Each procedure is defined in a Java file.

Besides <name>, you can also include <preExecutionWait> and <postExecutionWait> to give a delay in milliseconds before/after running the transaction. So this is one way to add “think time”.

There is also a concept of “supplemental” procedures, but that is not controlled by the config file. Only the SEATS and AuctionMark benchmarks use it. From quickly scanning the code, I think it lets a benchmark define procedures without depending on the user to list them. They won’t be added to the normal transaction queue, but the benchmark can run them elsewhere as needed. For example SEATS uses its supplemental procedure to find out which airports/flights/etc were added in the load step, so it can use them.

The top-level <terminals> element controls the concurrency. This is how many simultaneous connections you want:

<terminals>1</terminals>

But the real behavior comes from the <works> element. This contains <work> child elements, each one a “phase” of your benchmark. For example:

<works>
    <work>
        <time>60</time>
        <rate>10000</rate>
        <weights>45,43,4,4,4</weights>
    </work>
</works>

Here we have one phase lasting 60 seconds.

The <weights> refer to the <transactiontypes> above. Each weight is a percentage giving the share of that procedure in the total transactions. They must add to 100%.

The <rate> gives the targeted transactions per second (per terminal). Mostly this is a way to slow things down, not to speed things up: it is another way to include “think time” in between transactions. If your run doesn’t achieve this rate, it’s not an error.

Each phase can override the top-level concurrency with <active_terminals>5</active_terminals>.

Also you can let the phase start gradually with <work arrival="poisson">. The OLTP-Bench paper demonstrates this technique.

In addition a benchmark may understand other XML elements. For example Twitter lets you give <tracefile> and <tracefile2>, and the benchmark will use those to read tweet ids and user ids (respectively), which it will use as inputs for its transactions (but not every transaction type uses both).

PDXPUG Talk: Benchbase and Temporal Foreign Keys

Last Thursday I gave a talk at PDXPUG about using Benchbase to compare the performance of temporal foreign keys. It was a lot of fun, with a really good turnout. There were even folks from Seattle and Bend. After listening for an hour, people stuck around and talked about databases and benchmarks for another two, then the last few holdouts went out for drinks for another hour and a half. At least half the audience were way more qualified to give the talk than me. To my surprise Mark Callaghan, who has published database benchmarks non-stop for years, was there.

I had two major goals: to document how to use Benchbase and to report on comparing three implementations of temporal foreign keys. A couple minor goals were to share the start of a broader general-purpose benchmark for temporal databases and to talk about a benchmarking methodology, especially mistakes I made and how I tried to improve.

Temporal Ops

2024-07-17

One silver lining of temporal primary & foreign keys getting reverted is I got to meet Hettie Dombrovskaya and Boris Novikov.

I’ve been working with them to write SQL for various temporal operations not covered by the SQL:2011 standard. There is no support there for outer joins, semijoins, antijoins, aggregates, or set operations (UNION, INTERSECT, EXCEPT). As far as I know no one has ever shown how to implement those operations in SQL. I have queries so far for outer join, semijoin, and antijoin, and I’m planning to include aggregates based on this article by Boris. The set operations look pretty easy to me, so hopefully I’ll have those soon too.

If you’re interested, the repo is on Github.

Debugging the Sprinkler System

2024-07-14

Saturday I debugged the sprinklers.

I thought I turned them on two weeks ago, and I heard someone’s sprinklers outside my window that next Monday morning at 5 a.m., but after a week of 100-degree days it was clear ours weren’t doing their job. I had skipped my usual routine of checking each line, unearthing the sunken heads, and replacing what had failed. So now I had to deal with it.

Somehow after living here for ten years I still found two new heads I had never seen before. Here is a map I’ve kept for years, maybe since our first summer:

sprinkler map

It has every sprinkler head I’ve seen. Going by the rate I charge clients, that map is worth thousands of dollars.

In the bottom corner is the box where the water comes in from the street. There are more boxes where valves let water into each line.

One year I came across a buried water spigot in the middle of the grass. Then I lost it again.

But this was a valuable spigot. It was over by our raised beds, where there is no other convenient water. You have to drag a hose from across the yard to water there. In 2022 I borrowed a neighbor’s metal detector. I still couldn’t find it. Finally I tore up the grass with a shovel, probing what must have been a 20’ x 20’ area, until finally I heard a metal clink. I extended the pipe and topped it with a copper rabbit spigot I won as a kid at the Redlands Garden Show for a potted cactus garden. I’ve carried that rabbit with me for 35 years, waiting for a chance to use it.

rabbit spigot

That was two years ago. It’s on my map.

So why is our grass dying?

Naturally I run our sprinklers off a raspberry pi. I set it up years ago, back in 2016. The controller that came with the house was dying. Two-thirds of the time when I tried to water line 12 or 13, line 4 or 5 would turn on instead. (Yes, we have 13 sprinkler lines. It’s a big yard.) Almost always it was off by 8, or sometimes 4: pretty clearly some loose wires. Why spend fifty bucks to replace it when I could spend days building my own? Look, at least there is no Kubernetes or CI pipeline, okay?

There were raspi sprinkler products you could buy, and I think I saw an open source project, but that didn’t seem like fun. I wanted control and flexibility. I wanted power. I wanted Raspbian, Python, and cron.

Here is my script, called sprinkle:

#!/usr/bin/env python

# sprinkle - Raspberry Pi sprinkler controller

import time
import RPi.GPIO as GPIO
import sys, signal

# Your sprinkler lines:
# Your sprinkler line 1 goes in array position 0,
# then sprinkler line 2 goes in array position 1,
# etc.
# Each value is the Raspi GPIO pin
# you will connect to that line.
# So if you say
#   sprinkler_lines = [6, 19]
# then you should connect pin 6 to sprinkler line 1,
# and pin 19 to sprinkler line 2.
# sprinkler_lines = [21, 20, 16, 12, 25, 24, 23, 26, 19, 13, 6, 5, 22]
sprinkler_lines = [23, 24, 25, 16, 12, 20, 21, 22, 5, 6, 13, 19, 26]

def usage(err_code):
    print("USAGE: sprinkle.py <sprinkler_line> <number_of_minutes>")
    sys.exit(err_code)

def int_or_usage(str):
    try:
        return int(str)
    except ValueError:
        usage(1)

if len(sys.argv) != 3:
    usage(1)

sprinkler_line    = int_or_usage(sys.argv[1])
number_of_minutes = int_or_usage(sys.argv[2])

if sprinkler_line < 1 or sprinkler_line > len(sprinkler_lines):
    print("I only know about sprinkler lines 1 to %d." % len(sprinkler_lines))
    sys.exit(1)

if number_of_minutes < 1 or number_of_minutes > 30:
    print("I don't want to run the sprinklers for %d minutes." % number_of_minutes)
    sys.exit(1)


def exit_gracefully(signal, frame):
    GPIO.cleanup()
    sys.exit(0)
signal.signal(signal.SIGINT, exit_gracefully)


active_pin = sprinkler_lines[sprinkler_line - 1]
GPIO.setmode(GPIO.BCM)
for pin in sprinkler_lines:
    GPIO.setup(pin, GPIO.OUT)
    GPIO.output(pin, False)
GPIO.output(active_pin, True)
time.sleep(60 * number_of_minutes)
GPIO.output(active_pin, False)
exit_gracefully(None, None)

That is a lot of code but it turns on one GPIO pin, sleeps a while, then turns it off. Near the top you can see an array that maps sprinkler lines to GPIO pins. I kept the old sprinkler numbering, so it matches the notes the old owners left us. Array position n means sprinkler line n+1.

Then I have a higher-level script I run each morning out of cron, which does the front on even days and the back on odd. It logs when it starts and finishes, which has helped me a lot:

#!/usr/bin/env python

# do-yard - Run sprinklers for the whole yard.
# We do the front yard on even days and the back yard on odd days.

import time
from subprocess import call

t = time.localtime()
if t.tm_yday % 2:
    print("%s: Starting the back" % time.strftime("%Y-%m-%d %H:%M:%S", t))
    # odd days we do the back yard:
    for line in [4, 5, 6, 7, 8, 12]:
        call(["/home/pi/sprinkle", str(line), "5"])
        time.sleep(60)
    print("%s: Finished the back" % time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()))

else:
    print("%s: Starting the front" % time.strftime("%Y-%m-%d %H:%M:%S", t))
    # even days we do the front yard (and a little bit of the back):
    for line in [1, 2, 3, 9, 10, 11, 13]:
        call(["/home/pi/sprinkle", str(line), "5"])
        time.sleep(60)
    print("%s: Finished the front" % time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()))

The hard part was figuring out the wiring. I’ve never gone much further than Ohm’s Law. For a long time I was stuck working out how to drive the sprinkler valves. Sprinkler valves use a solenoid to open and shut. In my garage, 13 colored wires come out of the ground, along with one neutral white wire to complete the circuit. Then plugged into the wall is an adapter to produce 24 volt AC, and two wires come out of that. In between used to be the old controller. It would send 24 VAC down whichever wire matched the sprinkler line (& ~(1 << 3)).

The pi outputs 3.3 volts DC. At first I thought there was an integrated circuit that could convert the signal for me, but eventually I resigned myself to using a bank of relays:

raspi sprinklers

Oh also I never learned how to solder.

A relay is a mechanical system. The AC power goes through, but it’s blocked by an open switch. The DC power is on another circuit, and it activates an electromagnet that closes the switch. When you turn on the signal, you see a red light, and the switch closing makes a loud click.

A bank of 16 relays cost $12, almost as much as a sprinkler controller, so I really wanted my ICs to work out. Oh well.

So today I started with checking the log. Well no, because the pi wasn’t responding to ssh again.

It has always been temperamental. After a few hours the wifi dies, sometimes sooner. Pulling the plug for a moment fixes it, but then you have to wait while it boots. So I have to bring a laptop down to the garage, even just to check on things. Today I thought I would finally fix that.

Other people have the same problem. One reported culprit is power-saving mode. I checked and mine was running that way:

pi@raspberrypi:~ $ iw dev wlan0 get power_save
Power save: on

The nicest advice I found was to disable it at boot with systemd. Just run this:

sudo systemctl --full --force edit wifi_powersave@.service

and in your editor enter—ugh, nano? That had to be fixed.

Setting EDITOR in root’s ~/.profile should do it.

No? ~/.bashrc then?

Still no? Back to Stack Overflow… .

No clues. I guess I’m on my own.

What is this .selected_editor file in root’s home directory? Hmm, it already says vim.

Is sudo even launching its command through a shell? Probably not, once I think of it. If it just execs the command directly, no wonder ~/.profile does nothing.

More Stack Overflow. Most questions are about visudo, and I see something called sudoedit, and people are asking how to control which editor that launches. (Why not just run the editor you want? The man page says it lets you keep your own editor configuration. Like my own ~/.vimrc? That’s cool. Really? How does that work?) But in my case the editor is getting launched by systemd. Surely we would have all been happier if we’d just gone with runit?

Does root have $SYSTEMD_EDITOR set? Surely not—no, too bad.

Of course I could just edit the file myself, but it’s the principle of the thing.

Okay, I give up:

sudo visudo -f /etc/sudoers.d/20_editor

I typed this:

Defaults env_keep += "editor EDITOR"

So now when I run sudo, it will pass along my own $EDITOR choice.

Is this a security hole? I can imagine some possible issues on a server, but for the pi in my garage it seems okay.

Now systemd launches vim! Shamelessly I copied and pasted:

[Unit]
Description=Set WiFi power save %i
After=sys-subsystem-net-devices-wlan0.device

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/iw dev wlan0 set power_save %i

[Install]
WantedBy=sys-subsystem-net-devices-wlan0.device

I’ve never seen this %i thing before. The idea is it lets you do this:

sudo systemctl disable wifi_powersave@off.service
sudo systemctl enable wifi_powersave@on.service

or this:

sudo systemctl disable wifi_powersave@on.service
sudo systemctl enable wifi_powersave@off.service

That’s cool.

Oh, better not forget to run it now too:

sudo iw dev wlan0 set power_save off

So I turned off power saving. Maybe that will fix the wifi.

Let’s check the log file. Have the sprinklers been running?:

2024-06-30 06:00:01: Starting the front
2024-06-30 06:42:03: Finished the front
2024-07-01 06:00:01: Starting the back
2024-07-01 06:36:03: Finished the back
2024-07-02 06:00:01: Starting the front
2024-07-02 06:42:02: Finished the front
2024-07-03 06:00:01: Starting the back
2024-07-03 06:36:02: Finished the back
2024-07-04 06:00:01: Starting the front
2024-07-04 06:42:03: Finished the front
2024-07-05 06:00:01: Starting the back
2024-07-05 06:36:03: Finished the back
2024-07-06 06:00:01: Starting the front
2024-07-06 06:42:03: Finished the front
2024-07-07 06:00:01: Starting the back
2024-07-07 06:36:02: Finished the back
2024-07-08 06:00:01: Starting the front
2024-07-08 06:42:03: Finished the front
2024-07-09 06:00:01: Starting the back
2024-07-09 06:36:03: Finished the back
2024-07-10 06:00:01: Starting the front
2024-07-10 06:42:02: Finished the front
2024-07-11 06:00:02: Starting the back
2024-07-11 06:36:03: Finished the back
2024-07-12 06:00:01: Starting the front
2024-07-12 06:42:03: Finished the front
2024-07-13 06:00:01: Starting the back
2024-07-13 06:36:03: Finished the back

They’ve been running all along! 40 minutes for the front, 30 for the back.

But clearly they’re doing nothing. The pi is turning on a pin then just sitting there.

So there must be a loose connection.

I tried line 3: ./sprinkle 3 10. No red light, no click. Line 10. No red light, no click. Line 2. No red light, no click.

I went upstairs to fetch my multimeter. Time to test connectivity and voltage.

multimeter

How in the world did I wire this thing anyway?

Then I noticed a couple red wire loops, connecting GPIO pins to the breadboard power rail, but detached now from the power rail. The pins both said 5V. (That tiny text was easier to read in 2016.) So those came loose? What if I put them back in again? I think I remember . . . wasn’t this supposed to power the relay?

Trying my sprinkle command again made the light come on! I must have missed the click though. Were the sprinklers running? No? What if I try a few lines? I’m really not hearing the click. But the light is on.

relay with red light

How does each relay work again? I set the multimeter to connectivity to probe each pair of posts. They were more connected than I expected. Was that bad? Okay I remember the white neutral wire running from one relay to another in series. And the colored wires go out and into the ground, one per relay.

I remember something about those two little red wire loops. They really looked disconnected on purpose. They weren’t just loose, they were completely out of the breadboard.

Is anything else loose? A bit, but when I fix it nothing changes.

I remember those two red wires. They are supposed to give 5 volts to power the relay, but it never worked, did it? It was supposed to, but it didn’t. Like the pi just didn’t have enough oomph. Or was the board supposed to power the pi?

What are these other two thin black wires leaving the relay board? Where do they go? Off to the right, oh, to a power adapter! Two weeks ago I plugged in the adapter for the pi, and I plugged in the 24 VAC adapter, but the relays need power too, and they get it from the power strip over by the garage freezer.

I guess this is why phone support asks if you’ve plugged it in.

sprinklers running

Temporal Reverted

2024-07-05

My work adding temporal primary keys and foreign keys to Postgres was reverted from v17. The problem is empty ranges (and multiranges). An empty range doesn’t overlap anything, including another empty range. So 'empty' && 'empty' is false. But temporal PKs are essentially an exclusion constraint using (id WITH =, valid_at WITH &&). Therefore you can insert duplicates, as long as the range is empty:

INSERT INTO t (id, valid_at, name) VALUES (5, 'empty', 'foo');
INSERT INTO t (id, valid_at, name) VALUES (5, 'empty', 'bar');
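
Both inserts succeed against a hand-built equivalent of that constraint too; here is a minimal sketch (the column list is assumed from the examples, and the integer part needs btree_gist):

CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE t (
    id       int NOT NULL,
    valid_at daterange NOT NULL,
    name     text,
    EXCLUDE USING gist (id WITH =, valid_at WITH &&)
);
-- Neither INSERT above conflicts, because 'empty' && 'empty' is false,
-- so the exclusion constraint never rejects the second row.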

That might be okay for some users, but it surely breaks expectations for others. And it’s a questionable thing to do that we should probably just forbid. The SQL standard forbids empty PERIODs, so we should make sure that using plain ranges does the same. Adding a record with an empty application time doesn’t really have a meaning in the temporal model.

I think this is a pretty small bump in the road. At the Postgres developers conference we found a good solution to excluding empty ranges. My original attempt used CHECK constraints, but that had a lot of complications. Forbidding them in the executor is a lot simpler. I’ve already sent in a new set of patches for v18 that implement that change.
