A question that is frequently occurring among my SQL training‘s participants is:
What’s the difference between putting a predicate in the
JOIN .. ON
clause and theWHERE
clause?
I can definitely see how that’s confusing some people, as there seems to be no difference at first sight, when running queries like these, e.g. in Oracle. I’m using the Sakila database, as always:
-- First query SELECT a.actor_id, a.first_name, a.last_name, count(fa.film_id) FROM actor a LEFT JOIN film_actor fa ON a.actor_id = fa.actor_id WHERE fa.film_id < 10 GROUP BY a.actor_id, a.first_name, a.last_name ORDER BY count(fa.film_id) ASC;
This will yield something like:
ACTOR_ID FIRST_NAME LAST_NAME COUNT -------------------------------------- 108 WARREN NOLTE 3 162 OPRAH KILMER 3 19 BOB FAWCETT 2 10 CHRISTIAN GABLE 2 53 MENA TEMPLE 2 137 MORGAN WILLIAMS 1 2 NICK WAHLBERG 1
Of course, we could have written this instead, and received the same result:
-- Second query SELECT a.actor_id, a.first_name, a.last_name, count(fa.film_id) FROM actor a JOIN film_actor fa ON a.actor_id = fa.actor_id AND fa.film_id < 10 GROUP BY a.actor_id, a.first_name, a.last_name ORDER BY count(fa.film_id) DESC;
Now, I’ve moved the FILM_ID < 10
filter from the WHERE
clause to the ON
clause. But the execution plan is the same for both queries:
--------------------------------------------------------- | Id | Operation | Name | Rows | --------------------------------------------------------- | 0 | SELECT STATEMENT | | 49 | | 1 | SORT ORDER BY | | 49 | | 2 | HASH GROUP BY | | 49 | |* 3 | HASH JOIN | | 49 | |* 4 | INDEX FAST FULL SCAN| PK_FILM_ACTOR | 49 | | 5 | VIEW | VW_GBF_7 | 200 | | 6 | TABLE ACCESS FULL | ACTOR | 200 | --------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("ITEM_1"="FA"."ACTOR_ID") 4 - filter("FA"."FILM_ID"<10)
It does not seem matter at all. Both queries yield the same result as well as the same plan. So…
Are ON and WHERE really the same thing?
They are when you run an inner join. But they are not when you run an outer join.
And now, let’s compare these two queries here:
-- First query SELECT a.actor_id, a.first_name, a.last_name, count(fa.film_id) FROM actor a LEFT JOIN film_actor fa ON a.actor_id = fa.actor_id WHERE fa.film_id < 10 GROUP BY a.actor_id, a.first_name, a.last_name ORDER BY count(fa.film_id) ASC;
Yielding
ACTOR_ID FIRST_NAME LAST_NAME COUNT -------------------------------------- 194 MERYL ALLEN 1 198 MARY KEITEL 1 30 SANDRA PECK 1 85 MINNIE ZELLWEGER 1 123 JULIANNE DENCH 1
Notice that with this syntax, we’re not getting any actors that have no films with FILM_ID < 10
. We should get dozens! How about this:
-- Second query SELECT a.actor_id, a.first_name, a.last_name, count(fa.film_id) FROM actor a LEFT JOIN film_actor fa ON a.actor_id = fa.actor_id AND fa.film_id < 10 GROUP BY a.actor_id, a.first_name, a.last_name ORDER BY count(fa.film_id) ASC;
This used to produce the same result for an (INNER
) JOIN
, but given the LEFT JOIN
, we’re now not getting Susan Davis in the result:
ACTOR_ID FIRST_NAME LAST_NAME COUNT ----------------------------------------- 3 ED CHASE 0 4 JENNIFER DAVIS 0 5 JOHNNY LOLLOBRIGIDA 0 6 BETTE NICHOLSON 0 ... 1 PENELOPE GUINESS 1 200 THORA TEMPLE 1 2 NICK WAHLBERG 1 198 MARY KEITEL 1
The plans are also different:
--------------------------------------------------------- | Id | Operation | Name | Rows | --------------------------------------------------------- | 0 | SELECT STATEMENT | | 49 | | 1 | SORT ORDER BY | | 49 | | 2 | HASH GROUP BY | | 49 | |* 3 | HASH JOIN | | 49 | |* 4 | INDEX FAST FULL SCAN| PK_FILM_ACTOR | 49 | | 5 | VIEW | VW_GBF_7 | 200 | | 6 | TABLE ACCESS FULL | ACTOR | 200 | --------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("ITEM_1"="FA"."ACTOR_ID") 4 - filter("FA"."FILM_ID"<10)
No outer join here! Versus
--------------------------------------------------------------- | Id | Operation | Name | Rows | --------------------------------------------------------------- | 0 | SELECT STATEMENT | | 200 | | 1 | SORT ORDER BY | | 200 | | 2 | MERGE JOIN OUTER | | 200 | | 3 | TABLE ACCESS BY INDEX ROWID| ACTOR | 200 | | 4 | INDEX FULL SCAN | PK_ACTOR | 200 | |* 5 | SORT JOIN | | 44 | | 6 | VIEW | VW_GBC_5 | 44 | | 7 | HASH GROUP BY | | 44 | |* 8 | INDEX FAST FULL SCAN | PK_FILM_ACTOR | 49 | --------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 5 - access("A"."ACTOR_ID"="ITEM_1"(+)) filter("A"."ACTOR_ID"="ITEM_1"(+)) 8 - filter("FILM_ID"(+)<10)
The first query did not produce an outer join operation, the second one did!
What’s the difference?
The difference is:
- An
INNER JOIN
produces all the actors who played in at least one film, filtering out the actors who did not play in a film. That’s the very definition of an inner join. If we filter the films withFILM_ID < 10
, that simply means we don’t want any actors without such films either. - A
LEFT JOIN
will produce all the rows from the left side of the join, regardless if there is a matching row on the right side of the join.
In both cases, the matching rows are determined by the ON
clause. If two rows don’t match, then:
- The
INNER JOIN
removes them both from the result - The
LEFT JOIN
retains the left row in the result
But regardless what the JOIN
produces, the WHERE
clause will again remove rows that do not satisfy the filter. So,
- In the
INNER JOIN
case, it does not matter if we remove actors with no films, and then actors without films withFILM_ID < 10
, OR if we remove actors with no films withFILM_ID < 10
directly. They’re going to be removed anyway. - In the
LEFT JOIN
case, it does matter if we retain actors with no films, and then remove actors without films withFILM_ID < 10
(in case of which actors without films will be removed again), OR if we retain actors without films withFILM_ID < 10
, and then not apply any further filters.
Conclusion
For INNER JOIN
, WHERE
predicates and ON
predicates have the same effect.
For OUTER JOIN
, WHERE
predicates and ON
predicates have a different effect.
In general, it is always best to put a predicate where it belongs, logically. If the predicate is related to a JOIN
operation, it belongs in the ON
clause. If a predicate is related to a filter applied to the entire FROM
clause, it belongs in the WHERE
clause.