Quantcast
Channel: sql – Java, SQL and jOOQ.
Viewing all articles
Browse latest Browse all 426

Quantified Comparison Predicates – Some of SQL’s Rarest Species

$
0
0

A recent Tweet by Aaron Bertrand (whom you’ve certainly encountered on Stack Overflow) has triggered my interest

Indeed, few people I’ve met and who’ve visited my SQL masterclass have heard about the ANY and ALL quantifiers, which you can use in SQL, let alone used them on a regular basis (or ever used them at all).

What are these things?

The formal definition is rather simple. Let

R <comp op> <quantifier> S

… be a quantified comparison predicate where

  • R is a row
  • <comp op> is a comparison operator (like =, !=, etc.)
  • <quantifier> is ANY or ALL
  • S is a subquery

You can now write things like:

SELECT *
FROM author AS a
WHERE a.author_id = ANY (
  SELECT b.author_id
  FROM book AS b
)

The above reads relatively intuitively (depending on your understanding of “intuitive”):

I want all authors whose author_id is equal to “ANY” of the author_id values contained in book

In fact, the SQL standard defines the IN predicate as being just syntax sugar for the = ANY() quantified comparison predicate.

8.4 <in predicate>

Let RVC be the <row value predicand> and
let IPV be the <in predicate value>.

...

The expression

  RVC IN IPV

is equivalent to

  RVC = ANY IPV

OK, so why even use these weird operators?

The interesting case for these operators is not where the comparison operator is an equality (“=”) or non-equality (“!=” or “”). The interesting case is where we’re comparing things with less-than or greater-than operators, such as:

SELECT *
FROM author AS a
WHERE a.age > ALL (
  SELECT c.age
  FROM customer AS c
)

Or, in plain English:

I want all authors whose age is greater than the age of all customers

That’s neat, isn’t it? Of course, there are (at least) two alternative ways of tackling the same query, namely:

Comparing only with the MAX()

SELECT *
FROM author AS a
WHERE a.age > (
  SELECT MAX(c.age)
  FROM customer AS c
)

This solution compares the author’s age only with the maximum customer age, which obviously yields the same result. Once we’ve found an author older than the oldest customer, then that author is older than ALL the customers.

Instead of using MAX(), we can use the following, semantically equivalent query:

Comparing only with the “first”

SELECT *
FROM author AS a
WHERE a.age > (
  SELECT c.age
  FROM customer AS c
  ORDER BY c.age DESC
  LIMIT 1
)

I’m using PostgreSQL syntax (LIMIT 1), although most databases have some way of returning only one row from a query (see the jOOQ manual for details). This is the same as using MAX(). Some databases recognise the fact that we’re only returning a single row, and they won’t sort the entire result set first in O(N log N), but yield the maximum value only in O(N), and only once for all authors, instead of once per author. Whether this optimisation is done in your database, I can’t say. You should measure this yourself, and check out execution plans.

So, again, why use quantifiers?

Aaron from the twitter conversation would advise against using this “unintuitive” syntax:

Perhaps. Ultimately, you should always choose performance first, and then – most certainly – intuitiveness second (because some poor soul might need to maintain your query). But personally, I find these quantifiers quite elegant for three reasons:

  1. They express the quantification right where it belongs. With the comparison operator. Compare this with the solution using LIMIT, which may be far away, visually, from the greater-than operator. Quantifiers are much more concise, even than when using MAX() (in my opinion)
  2. They’re very set oriented. I like thinking in terms of sets when I work with SQL. Whenever I can omit the ORDER BY clause, I will. If only to avoid potentially slow operations (in case the database doesn’t optimise this, and a full O(N log N) sort operation is invoked)
  3. Quantified comparison predicates work on rows too, not just on single values.

Check this out:

SELECT (c1.last_name, c1.first_name) >= ALL (
  SELECT c2.last_name, c2.first_name
  FROM customer AS c2
)
FROM customer AS c1
WHERE c1.id = 1

The above query will yield a single column containing a “TRUE” or “FALSE” value (e.g. in PostgreSQL, which supports this exact syntax). The idea is that we’re running a query for customer with id = 1 and we want to see if that customer’s (last_name, first_name) tuple is “after” all the other customers’. Or in plain English:

Am I at the end of the phone book?

Again, you could do this with LIMIT:

SELECT (c1.last_name, c1.first_name) >= (
  SELECT c2.last_name, c2.first_name
  FROM customer AS c2
  ORDER BY c2.last_name DESC, c2.first_name DESC
  LIMIT 1
)
FROM customer AS c1
WHERE c1.id = 1

… but I find this much less elegant.

Unfortunately, this time there’s no way to solve this problem with MAX(). No database (that I’m aware of) supports using row value expressions (tuples) for aggregate functions like MAX(). This would be cool:

SELECT (c1.last_name, c1.first_name) >= (
  SELECT MAX((c2.last_name, c2.first_name))
  FROM customer AS c2
)
FROM customer AS c1
WHERE c1.id = 1

Or this:

SELECT (c1.last_name, c1.first_name) >= (
  SELECT MAX(ROW(c2.last_name, c2.first_name))
  FROM customer AS c2
)
FROM customer AS c1
WHERE c1.id = 1

Of course, the above is just a constructed example. Yes, you could also use window functions instead (e.g. LEAD()).

Conclusion

Quantified comparison predicates are very rarely seen in the wild. One reason is, few people know about them, or think of them when they’re solving a problem. Another reason might be, they’re not well optimised in some databases, so it’s actually a bad idea to use them.

Nonetheless, I find them to be very interesting SQL trivia to know about. And who knows, perhaps one day, you do run into a problem that is indeed best solved with quantified comparison predicates. Then you’ll shine with the most optimal solution.

Further reading


Filed under: sql Tagged: ALL, ANY, Quantified Comparison Predicates, sql

Viewing all articles
Browse latest Browse all 426

Trending Articles