Implementing a generic REDUCE aggregate function with SQL

So, @rotnroll666 nerd sniped me again. Apparently, the Neo4j Cypher query language supports arbitrary reductions, just like any functional collection API, oh say, the JDK Stream API:

Stream.of(2, 4, 3, 1, 6, 5)
      .reduce((i, j) -> i * j)
      .ifPresent(System.out::println); // Prints 720

SQL doesn’t have this, yet it would be very useful to be able to occasionally do that. An arbitrary reduction can be implemented “easily” in SQL. Let’s look at the above multiplication reduction. In PostgreSQL, you’d write it like this:

with t(i) as (values (2), (4), (3), (1), (6), (5))
select 
  (
    with recursive
      u(i, o) as (
         select i, o
         from unnest(array_agg(t.i)) with ordinality as u(i, o)
      ),
      r(i, o) as (
        select u.i, u.o from u where o = 1
        union all 
        select r.i * u.i, u.o from u join r on u.o = r.o + 1
        --     ^^^^^^^^^ reduction
      )
    select i from r
    order by o desc
    limit 1
  )
from t;

Woah. That’s a bit of a syntactic beast. Let’s decompose it.

The aggregate function

First off, if we were summing the values, we’d use the built-in SUM function, like this:

with t(i) as (values (2), (4), (3), (1), (6), (5))
select sum(i)
from t;

That would produce 21. If you’re willing to lose precision, you could emulate PRODUCT() using logarithms. But we wrote REDUCE(), a hypothetical one, like this:

with t(i) as (values (2), (4), (3), (1), (6), (5))
select reduce(
  t1.i * t2.i referencing accumulated as t1, accumulating as t2
)
from t;

This is SQL, so the lambda expression would obviously use a ton of keywords, completely novel and unique to this particular function, and you’d need jOOQ to make it composable . Essentially, we’d have some sort of reduction expression based on two pseudo tables:

The accumulated table containing the result
The accumulating table (or rather row)

A reduction is a generic aggregate function that operates on groups. So, we will have to re-use some SQL aggregate function mechanism to achieve the desired behaviour.

Using ARRAY_AGG() to get the aggregation effect

First off, let’s do some aggregation. PostgreSQL’s ARRAY_AGG() is perfect for this job, because it

Aggregates
Yet kinda leaves the data untouched, unlike e.g. SUM()

In a way, it’s a collection like Stream.collect(), not a reduction.

If we use ARRAY_AGG() in a correlated subquery, we’ll still get the aggregation effect, but we can unnest the array again to a table, in order to operate on it. You can see this in the following example:

with t(i) as (values (2), (4), (3), (1), (6), (5))
select 
  (
    select string_agg(i::text, ', ')
    from unnest(array_agg(t.i)) as u(i)
  )
from t;

This yields:

2, 4, 3, 1, 6, 5

Not a very useful thing to do, aggregate, unnest, and aggregate again, but it shows the power of nesting an aggregate function in a correlated subquery’s FROM clause. If your RDBMS doesn’t have arrays, maybe you can do the same thing using JSON_ARRAYAGG and JSON_TABLE, or XMLAGG and XMLTABLE.

Disclaimer: PostgreSQL often Does The Right Thing. I think you’d be more hard pressed to juggle with SQL syntax as elegantly in most other RDBMS, so this approach isn’t portable. But as Lætitia Avrot so elegantly put it:
People might come into #PostgreSQL because it's free, but they stay because it's good.
— Lætitia (@l_avrot) February 5, 2021

Next step, generate row numbers

There are mainly 2 ways how we can generate row numbers in our example:

With the ROW_NUMBER() window function
Using WITH ORDINALITY, a standard SQL syntax that you can use with UNNEST() to produce the array ordinals (1-based indexes) as a separate column. (I’ve learned today that you can use WITH ORDINALITY with any function in Postgr e SQL, which is again awesome!)

Adapting our previous example for some visualisation:

with t(i) as (values (2), (4), (3), (1), (6), (5))
select 
  (
    select string_agg(row(i, o)::text, ', ')
    from unnest(array_agg(t.i)) with ordinality as u(i, o)
  )
from t;

(Awesome, that row constructor!)

This produces:

(2,1), (4,2), (3,3), (1,4), (6,5), (5,6)

Doesn’t look fancy, but imagine we group by even numbers:

with t(i) as (values (2), (4), (3), (1), (6), (5))
select 
  i % 2,
  (
    select string_agg(row(i, o)::text, ', ')
    from unnest(array_agg(t.i)) with ordinality as u(i, o)
  )
from t
group by i % 2;

The result is now:

i % 2	string_agg
0	(2,1), (4,2), (6,3)
1	(3,1), (1,2), (5,3)

It’s a bit weird, right? We GROUP BY in the outer query, and the entire correlated subquery is the aggregate function based on the fact that its FROM clause contains ARRAY_AGG(). This isn’t so much different from this query:

with t(i) as (values (2), (4), (3), (1), (6), (5))
select 1 + sum(i) + 2
from t;

We’re used to building scalar expressions from aggregate functions all the time. This is nothing fancy. We can easily also just wrap the function in another subquery:

with t(i) as (values (2), (4), (3), (1), (6), (5))
select (select 1 + sum(i) + 2)
from t;

From here, it’s not far fetched to extend the aggregate-function-in-scalar-subquery approach to the FROM clause, and then unnesting the aggregation again. This may not “click” immediately. The GROUP BY clause in SQL is a bit weird, syntactically.

Remark: Regrettably, PostgreSQL doesn’t allow using aggregate functions in the FROM clause on the same query level like in a correlated subquery. I was going to show a fancy LATERAL version, but this doesn’t work (yet).

Now, recurse

The final bit is the recursion with the r table:

with t(i) as (values (2), (4), (3), (1), (6), (5))
select 
  (
    with recursive
      u(i, o) as (
         select i, o
         from unnest(array_agg(t.i)) with ordinality as u(i, o)
      ),
      r(i, o) as (
        select u.i, u.o from u where o = 1
        union all 
        select r.i * u.i, u.o from u join r on u.o = r.o + 1
        --     ^^^^^^^^^ reduction
      )
    select i from r
    order by o desc
    limit 1
  )
from t;

We simply recurse on the ordinality. The first subquery of UNION ALL produces the first row of our data, namely (1, 1). The next iterations just always multiply the result of r.i by the value of u.i from the next row by ordinality. This is probably best shown visually:

r.i	r.o	u.i
2 = u.i (first iteration)	1	2
8 = prev r.i * u.i	2	4
24 = prev r.i * u.i	3	3
24 = prev r.i * u.i	4	1
144 = prev r.i * u.i	5	6
720 = prev r.i * u.i	6	5

Finally, we don’t care about SQL’s set-based way of working. I.e. we don’t care about the whole set of multiplications that are shown in the table above. We only care about the last row, ordered by the ordinality, which contains our result in r.i

Done!

Using group by

Just as shown before, we can easily add a GROUP BY clause to the outer query. E.g. let’s multiply odd and even numbers separately:

with t(i) as (values (2), (4), (3), (1), (6), (5))
select 
  i % 2,
  (
    with recursive
      u(i, o) as (
         select i, o
         from unnest(array_agg(t.i)) with ordinality as u(i, o)
      ),
      r(i, o) as (
        select u.i, u.o from u where o = 1
        union all 
        select r.i * u.i, u.o from u join r on u.o = r.o + 1
      )
    select i from r
    order by o desc
    limit 1
  ),
  string_agg(i::text, ' * ')
from t
group by i % 2

I’ve added another aggregate function STRING_AGG() for good measure to get:

i % 2	i	string_agg
0	48	2 * 4 * 6
1	15	3 * 1 * 5

Wonderful, isn’t it? Now, I wasn’t able to just add an OVER() clause right there. That produced

SQL Error [42P20]: ERROR: window functions are not allowed in functions in FROM

Maybe that will work as well, in the near future? Or, I might come up with another hack to make it work, in case of which I’ll update this post.

jOOQ support

Obviously, this will be supported in jOOQ soon: https://github.com/jOOQ/jOOQ/issues/11385. The syntax will be again much more bearable:

ctx.select(T.I.mod(inline(2)), reduce(T.I, (i1, i2) -> i1.times(i2)))
   .from(T.I)
   .groupBy(T.I.mod(inline(2)))
   .fetch();

Other emulations using actual CREATE AGGREGATE FUNCTION will be investigated as well, in the near future.

Implementing a generic REDUCE aggregate function with SQL

The aggregate function

Using ARRAY_AGG() to get the aggregation effect

Next step, generate row numbers

Now, recurse

Using group by

jOOQ support

Trending Articles

Bath man appears in court charged with attempted murder of a man...

MACLEAN, Allan

Black Angus Grilled Artichokes

Practice Sheet of Right form of verbs for HSC Students

Police blotter for Jan. 12

99 God Status for Whatsapp, Facebook

Rajasthan Board 12th Science Result 2018 name wise- RBSE 12th commerce result...

Notorious Naushad of Ippa gang nabbed

Child Kidnapping: Amy McNeil was kidnapped on her way to school by 5 adults;...

Sonible Smartlimit v1.1.5-R2R

NCERT Solutions for Class 9th Sanskrit Chapter 3 पाथेयम्

मतलबी दोस्त स्टेट्स | Matlabi Dost Status in Hindi – Selfish Friends Status

Arrow Flash 2 – Sinhala Dubbed – Episode 23 – 20th March 2016

[GET] AI Traffic Goldmine

[E² Plugin] HDF-Radio

Universal Multi-Patch v1.3 By RADIXX11

IWAN – Thanks and Praise ( Throw Back Thursday )

RONALD P SONDERGAARD Arrested by Miami-Dade County Corrections on Mar 03, 2017

मुख मैथुन से उठाएं सेक्स का भरपूर मज़ा, जानें क्या है इसका सही तरीकामुख मैथुन...

HSSC Excise & Taxation Inspector Result 2017 Scorecard/ Category Wise Merit List