Whenever you feel that itch…
The answer is: Yes you can! And you should! Let’s see how…
Calculating time differences between rows
Let’s consider the following database containing timestamps (e.g. in a log database). We’re using PostgreSQL syntax for this:
CREATE TABLE timestamps (
  ts timestamp
);

INSERT INTO timestamps VALUES
  ('2015-05-01 12:15:23.0'),
  ('2015-05-01 12:15:24.0'),
  ('2015-05-01 12:15:27.0'),
  ('2015-05-01 12:15:31.0'),
  ('2015-05-01 12:15:40.0'),
  ('2015-05-01 12:15:55.0'),
  ('2015-05-01 12:16:01.0'),
  ('2015-05-01 12:16:03.0'),
  ('2015-05-01 12:16:04.0'),
  ('2015-05-01 12:16:04.0');
Obviously, in a real system you’d also add constraints, indexes, etc. Now, let’s assume that each individual timestamp represents an event in your system, and you’d like to keep track of how much time has passed since the previous event. I.e. you’d like the following result:
ts                   delta
-------------------------------
2015-05-01 12:15:23
2015-05-01 12:15:24  00:00:01
2015-05-01 12:15:27  00:00:03
2015-05-01 12:15:31  00:00:04
2015-05-01 12:15:40  00:00:09
2015-05-01 12:15:55  00:00:15
2015-05-01 12:16:01  00:00:06
2015-05-01 12:16:03  00:00:02
2015-05-01 12:16:04  00:00:01
2015-05-01 12:16:04  00:00:00
In other words
- ts1 (12:15:23) + delta (00:00:01) = ts2 (12:15:24)
- ts2 (12:15:24) + delta (00:00:03) = ts3 (12:15:27)
- …
This can be achieved very easily with the LAG() window function:
SELECT
  ts,
  ts - lag(ts, 1) OVER (ORDER BY ts) delta
FROM timestamps
ORDER BY ts;
The above reads simply:
Give me the difference between the ts value of the current row and the ts value of the row that “lags” behind this row by one, with rows ordered by ts.
Easy, right? With LAG() you can actually access any row from another row within a “sliding window” by simply specifying the lag index.
We’ve already described this wonderful window function in a previous blog post.
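As a minimal sketch of what “specifying the lag index” can look like, the query below reads the row two positions back, supplies a default for rows that have no predecessor, and uses LEAD() to look forward instead. It assumes the timestamps table from above; the column aliases (delta_2, delta_or_zero, time_to_next) and the window name w are made up for this illustration.

```sql
-- Sketch against the timestamps table defined earlier (PostgreSQL syntax).
SELECT
  ts,
  -- Offset 2: distance to the row two positions back
  ts - lag(ts, 2) OVER w      AS delta_2,
  -- Third argument: default used when no such row exists. Here it is the
  -- row's own ts, so the first row yields a zero interval instead of NULL
  ts - lag(ts, 1, ts) OVER w  AS delta_or_zero,
  -- LEAD() looks forward instead of backward
  lead(ts, 1) OVER w - ts     AS time_to_next
FROM timestamps
WINDOW w AS (ORDER BY ts)
ORDER BY ts;
```

In this sketch, delta_2 is NULL for the first two rows and time_to_next is NULL for the last row, because no row exists at the requested offset and no default was given.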
Bonus: A running total interval
In addition to the difference between this timestamp and the previous one, we might be interested in the total difference between this timestamp and the first timestamp. This may sound like a running total (see our previous article about running totals using SQL), but it can be calculated much more easily using FIRST_VALUE(), a “cousin” of LAG():
SELECT
  ts,
  ts - lag(ts, 1) OVER w delta,
  ts - first_value(ts) OVER w total
FROM timestamps
WINDOW w AS (ORDER BY ts)
ORDER BY ts;
… the above query then yields
ts                   delta     total
---------------------------------------
2015-05-01 12:15:23            00:00:00
2015-05-01 12:15:24  00:00:01  00:00:01
2015-05-01 12:15:27  00:00:03  00:00:04
2015-05-01 12:15:31  00:00:04  00:00:08
2015-05-01 12:15:40  00:00:09  00:00:17
2015-05-01 12:15:55  00:00:15  00:00:32
2015-05-01 12:16:01  00:00:06  00:00:38
2015-05-01 12:16:03  00:00:02  00:00:40
2015-05-01 12:16:04  00:00:01  00:00:41
2015-05-01 12:16:04  00:00:00  00:00:41
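To see why this is the same thing as a running total, the sketch below sums the individual deltas instead. It assumes the timestamps table from above; the derived-table alias t and the three-argument form of lag() (which defaults the first row’s delta to a zero interval) are choices made for this illustration only.

```sql
-- Sketch: the same total, computed as a running sum of the deltas.
SELECT
  ts,
  delta,
  -- Running sum over all rows up to (and peers of) the current row
  sum(delta) OVER (ORDER BY ts) AS total
FROM (
  SELECT
    ts,
    -- Default the missing predecessor to ts itself, so the first delta
    -- is a zero interval rather than NULL
    ts - lag(ts, 1, ts) OVER (ORDER BY ts) AS delta
  FROM timestamps
) t
ORDER BY ts;
```

The total column should match the FIRST_VALUE() result, but this variant needs a derived table because window function calls cannot be nested. That extra step is what makes FIRST_VALUE() the simpler choice here.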
Extra bonus: The total since a “reset” event
We can take this as far as we want. Let’s assume that we want to reset the total from time to time:
-- Recreate the table with an additional event column
DROP TABLE IF EXISTS timestamps;

CREATE TABLE timestamps (
  ts    timestamp,
  event varchar(50)
);

INSERT INTO timestamps VALUES
  ('2015-05-01 12:15:23.0', null),
  ('2015-05-01 12:15:24.0', null),
  ('2015-05-01 12:15:27.0', 'reset'),
  ('2015-05-01 12:15:31.0', null),
  ('2015-05-01 12:15:40.0', null),
  ('2015-05-01 12:15:55.0', 'reset'),
  ('2015-05-01 12:16:01.0', null),
  ('2015-05-01 12:16:03.0', null),
  ('2015-05-01 12:16:04.0', null),
  ('2015-05-01 12:16:04.0', null);
We can now run the following query:
SELECT
  ts,
  ts - lag(ts, 1) OVER (ORDER BY ts) delta,
  ts - first_value(ts) OVER (PARTITION BY c ORDER BY ts) total
FROM (
  SELECT
    COUNT(*) FILTER (WHERE event = 'reset') OVER (ORDER BY ts) c,
    ts
  FROM timestamps
) timestamps
ORDER BY ts;
… to produce
ts                   delta     total
---------------------------------------
2015-05-01 12:15:23            00:00:00
2015-05-01 12:15:24  00:00:01  00:00:01
2015-05-01 12:15:27  00:00:03  00:00:00  <-- reset
2015-05-01 12:15:31  00:00:04  00:00:04
2015-05-01 12:15:40  00:00:09  00:00:13
2015-05-01 12:15:55  00:00:15  00:00:00  <-- reset
2015-05-01 12:16:01  00:00:06  00:00:06
2015-05-01 12:16:03  00:00:02  00:00:08
2015-05-01 12:16:04  00:00:01  00:00:09
2015-05-01 12:16:04  00:00:00  00:00:09
The beautiful part is in the derived table:
SELECT
  COUNT(*) FILTER (WHERE event = 'reset') OVER (ORDER BY ts) c,
  ts
FROM timestamps
This derived table assigns each timestamp to a “partition”, identified by the number of “reset” events seen so far. The result of the above subquery is:
c  ts
----------------------
0  2015-05-01 12:15:23
0  2015-05-01 12:15:24
1  2015-05-01 12:15:27  <-- reset
1  2015-05-01 12:15:31
1  2015-05-01 12:15:40
2  2015-05-01 12:15:55  <-- reset
2  2015-05-01 12:16:01
2  2015-05-01 12:16:03
2  2015-05-01 12:16:04
2  2015-05-01 12:16:04
As you can see, the COUNT(*) window function counts all “reset” events up to and including the current row, ordered by timestamp. This count can then be used in the PARTITION BY clause of the FIRST_VALUE() window function in order to find the first timestamp in each partition, i.e. the time of the most recent “reset” event (or, before any reset has occurred, the very first timestamp):
ts - first_value(ts) OVER (PARTITION BY c ORDER BY ts) total
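The FILTER clause is a comparatively recent addition (PostgreSQL 9.4 / SQL:2003, as noted in the article list below). On a database that supports window functions but not FILTER, the same partition counter can probably be emulated with a CASE expression. The following is a sketch under that assumption, using the timestamps table with the event column from above; the derived-table alias t is arbitrary.

```sql
-- Sketch: emulating COUNT(*) FILTER (WHERE ...) with a CASE expression.
SELECT
  ts,
  ts - first_value(ts) OVER (PARTITION BY c ORDER BY ts) AS total
FROM (
  SELECT
    ts,
    -- CASE yields NULL for non-reset rows, and COUNT(expr) ignores NULLs,
    -- so only 'reset' rows increase the counter
    count(CASE WHEN event = 'reset' THEN 1 END) OVER (ORDER BY ts) AS c
  FROM timestamps
) t
ORDER BY ts;
```

The counter c takes the same values as in the FILTER version (0, 0, 1, 1, 1, 2, ...), so the total column resets at the same rows.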
Conclusion
It’s almost a running gag on this blog to say that…
There was SQL before window functions and SQL after window functions
Window functions are extremely powerful and they’re a part of the SQL standard, supported in most commercial databases, in PostgreSQL, in Firebird 3.0, and in CUBRID. If you aren’t using them already, start using them today!
If you liked this article, find out more about window functions in any of the following articles:
- NoSQL? No, SQL! – How to Calculate Running Totals
- The Difference Between ROW_NUMBER(), RANK(), and DENSE_RANK()
- Probably the Coolest SQL Feature: Window Functions
- Still Using Windows 3.1? So why stick to SQL-92?
- The Awesome PostgreSQL 9.4 / SQL:2003 FILTER Clause for Aggregate Functions
