SQL is a really cool language. I can write really complex business logic with this logic programming language. I was again thrilled about SQL recently, at a customer site:
But whenever I tweet something like the above, the inevitable happened. I was nerd sniped. Oleg Šelajev from ZeroTurnaround challenged me to prove that SQL is so awesome:
Given a string, find all substrings from that string, which are palindromes. Challenge accepted! (For the moment, let’s forget about algorithmic complexity.)
Here’s the Full Query
Spoiler first.
Bear with me, I’ll explain it step by step afterwards. Here’s with PostgreSQL syntax:
WITH RECURSIVE words (word) AS ( VALUES ('pneumonoultramicroscopicsilicovolcanoconiosis'), ('pseudopseudohypoparathyroidism'), ('floccinaucinihilipilification'), ('antidisestablishmentarianism'), ('supercalifragilisticexpialidocious'), ('incomprehensibilities'), ('honorificabilitudinitatibus'), ('tattarrattat') ), starts (word, start) AS ( SELECT word, 1 FROM words UNION ALL SELECT word, start + 1 FROM starts WHERE start < length(word) ), palindromes (word, palindrome, start, length) AS ( SELECT word, substring(word, start, x), start, x FROM starts CROSS JOIN (VALUES(0), (1)) t(x) UNION ALL SELECT word, palindrome, start, length + 2 FROM ( SELECT word, substring(word, start - length / 2, length) AS palindrome, start, length FROM palindromes ) AS p WHERE start - length / 2 > 0 AND start + (length - 1) / 2 <= length(word) AND substring(palindrome, 1, 1) = substring(palindrome, length(palindrome), 1) ) SELECT DISTINCT word, trim(replace(word, palindrome, ' ' || upper(palindrome) || ' ')) AS palindromes FROM palindromes WHERE length(palindrome) > 1 ORDER BY 2
(You can run it yourself on SQLFiddle)
The result being:
word |palindromes ----------------------------------------------|----------------------------------------------- antidisestablishmentarianism |ant IDI sestablishmentarianism antidisestablishmentarianism |antidi SES tablishmentarianism floccinaucinihilipilification |flo CC inaucinihilipilification floccinaucinihilipilification |floccinauc INI hilipilification floccinaucinihilipilification |floccinaucin IHI lipilification floccinaucinihilipilification |floccinaucinih ILI p ILI fication floccinaucinihilipilification |floccinaucinih ILIPILI fication floccinaucinihilipilification |floccinaucinihi LIPIL ification floccinaucinihilipilification |floccinaucinihil IPI lification floccinaucinihilipilification |floccinaucinihilipil IFI cation honorificabilitudinitatibus |h ONO rificabilitudinitatibus honorificabilitudinitatibus |honor IFI cabilitudinitatibus honorificabilitudinitatibus |honorificab ILI tudinitatibus honorificabilitudinitatibus |honorificabilitud INI tatibus honorificabilitudinitatibus |honorificabilitudin ITATI bus honorificabilitudinitatibus |honorificabilitudini TAT ibus incomprehensibilities |incompr EHE nsibilities incomprehensibilities |incomprehens IBI lities incomprehensibilities |incomprehensib ILI ties incomprehensibilities |incomprehensibil ITI es pneumonoultramicroscopicsilicovolcanoconiosis |pneum ONO ultramicroscopicsilicovolcanoconios pneumonoultramicroscopicsilicovolcanoconiosis |pneumonoultramicroscopics ILI covolcanoconios pneumonoultramicroscopicsilicovolcanoconiosis |pneumonoultramicroscopicsilic OVO lcanoconios pneumonoultramicroscopicsilicovolcanoconiosis |pneumonoultramicroscopicsilicovolca NOCON ios pneumonoultramicroscopicsilicovolcanoconiosis |pneumonoultramicroscopicsilicovolcan OCO nios pneumonoultramicroscopicsilicovolcanoconiosis |pneumonoultramicroscopicsilicovolcanoconio SIS pseudopseudohypoparathyroidism |pseudopseudohy POP arathyroidism pseudopseudohypoparathyroidism |pseudopseudohypop ARA thyroidism pseudopseudohypoparathyroidism |pseudopseudohypoparathyro IDI sm supercalifragilisticexpialidocious |supercalifrag ILI sticexpialidocious tattarrattat |t ATTA rr ATTA t tattarrattat |t ATTARRATTA t tattarrattat |ta TT arra TT at tattarrattat |ta TTARRATT at tattarrattat |tat TARRAT tat tattarrattat |TAT tarrat TAT tattarrattat |tatt ARRA ttat tattarrattat |tatta RR attat tattarrattat |TATTARRATTAT
This query uses a couple of nice features:
Common Table Expressions
They’re the only way to declare variables in SQL – i.e. you can “store” a query in such a table expression, assign a name to it, and reuse it several times. This is done with the WITH
clause. The nice thing about common table expressions is that they are allowed to be RECURSIVE
(depending on the database, the RECURSIVE
keyword may be required / optional / not available).
VALUES() clause
The VALUES()
clause is a very handy tool to create ad-hoc data in the form of tables. We did this to create a table called WORDS
, which contains a couple of words inside of which we’d like to look for palindromes. In some databases (including DB2 and PostgreSQL), it’s totally possible to just use the VALUES()
clause as a standalone clause instead of SELECT
:
VALUES ('pneumonoultramicroscopicsilicovolcanoconiosis'), ('pseudopseudohypoparathyroidism'), ('floccinaucinihilipilification'), ('antidisestablishmentarianism'), ('supercalifragilisticexpialidocious'), ('incomprehensibilities'), ('honorificabilitudinitatibus'), ('tattarrattat')
Recursive Generation of Integers
In order to look for palindromes, the algorithm used here lists, for each word, the character index of each individual character in the word:
WITH RECURSIVE ... starts (word, start) AS ( SELECT word, 1 FROM words UNION ALL SELECT word, start + 1 FROM starts WHERE start < length(word) ), ...
For example, if we used the word “word”…
WITH RECURSIVE words (word) AS (VALUES('word')), starts (word, start) AS ( SELECT word, 1 FROM words UNION ALL SELECT word, start + 1 FROM starts WHERE start < length(word) ) SELECT * FROM starts
We’d get the following result:
word |start -----|------ word |1 word |2 word |3 word |4
The idea of the algorithm is that we start from a given character and “fan out” in both directions of a character, again recursively. The characters to the left and to the right of such a character must be the same and so on.
If you’re interested in the syntax, stay tuned. There will be a blog post coming up soon.
Using CROSS JOIN to run the Algorithm Twice
There are two types of palindromes:
- Those with an even amount of characters: “”, “aa”, “abba”, “babbab”
- Those with an odd amount of characters: “b”, “aba”, “aabaa”
In our algorithm, we’re treating them in almost the same way by running the algorithm twice. Whenever you think “hey, I need to run this twice in SQL”, CROSS JOIN
could be a good candidate, as we’re creating a cartesian product between:
- The previous “starts” table, giving the starting character index
- A table containing values 0 (palindromes with even amounts of letters) and 1 (palindromes with odd amounts of letters)
For more information about CROSS JOIN
, read our article about JOINs.
Run this query for illustration:
WITH RECURSIVE words (word) AS (VALUES('word')), starts (word, start) AS ( SELECT word, 1 FROM words UNION ALL SELECT word, start + 1 FROM starts WHERE start < length(word) ) SELECT * FROM starts CROSS JOIN (VALUES(0),(1)) AS t(x)
Output:
word |start |x -----|------|-- word |1 |0 <-- Even length palindromes centering around position 1 length 0 word |1 |1 <-- Odd length palindromes centering around position 1 length 1 word |2 |0 word |2 |1 word |3 |0 word |3 |1 word |4 |0 word |4 |1
Trivially, a palindrome of length 0 (empty string) or length 1 (“w”, “o”, “r”, “d”) is acceptable in principle, but boring. We’ll filter them out later, but keep them in the algorithm as this simplifies the algorithm. If you want to tune the query, you could prevent generating them in the first place.
The Palindrome Algorithm
Thus far, we’ve just prepared utility data to calculate palindromes:
- The words inside of which to search for palindromes
- The individual character indexes to start fanning out from, in each individual word
- The minimum palindrome length 0 (even) or 1 (odd)
Now comes the interesting part. Recursively fanning out from a starting character to find more palindromes, stopping the fanning out as soon as the new candidate is no longer a plaindrome:
WITH RECURSIVE ... palindromes (word, palindrome, start, length) AS ( -- This part just creates the even/odd semantics SELECT word, substring(word, start, x), start, x FROM starts CROSS JOIN (VALUES(0), (1)) t(x) UNION ALL -- This part recurses by "fanning out" SELECT word, palindrome, start, length + 2 FROM ( SELECT word, substring(word, start - length / 2, length) AS palindrome, start, length FROM palindromes ) AS p WHERE start - length / 2 > 0 AND start + (length - 1) / 2 <= length(word) AND substring(palindrome, 1, 1) = substring(palindrome, length(palindrome), 1) ) ...
It isn’t really so hard in fact. The recursion part selects recursively from the PALINDROMES
table (the previously calculated palindromes). That table has 4 columns:
WORD
: The word we’re looking for palindromes in. This is always the same per recursionPALINDROME
: The palindrome, i.e. the substring inside of a word. This changes per recursionSTART
: The start character index from which we started fanning out. This is always the same per recursionLENGTH
: The palindrome length. This increases by 2 per recursion
Let’s look at the result for “floccinaucinihilipilification”:
flo CC inaucinihilipilification floccinauc INI hilipilification floccinaucin IHI lipilification floccinaucinih ILI p ILI fication floccinaucinih ILIPILI fication floccinaucinihi LIPIL ification floccinaucinihil IPI lification floccinaucinihilipil IFI cation
There are a total of 8 distinct palindromes contained in this word (and believe it or not, the word exists, too. Pronunciation is another thing).
Let’s “debug” the algorithm to get to this list (remember, SQL indexes are 1 based):
Start 1-4: No palindromes Start 5: Even palindrome [4:5] flo CC inaucinihilipilification Start 6-11: No palindromes (I wont' repeat this further down) Start 12: Odd palindrome [11:13] floccinauc INI hilipilification Start 14: Odd palindrome [13:15] floccinaucin IHI lipilification Start 16: Odd palindrome [15:17] floccinaucinih ILI p ILI fication Start 18: Odd palindrome [17:19], [16:20], [15:21] (Fanning out 3 times) floccinaucinihil IPI lification floccinaucinihi LIPIL ification floccinaucinih ILIPILI fication Start 20: Odd palindrome [19:17] (already found this) floccinaucinih ILI p ILI fication Start 22: Odd palindrome [21:23] floccinaucinihilipil IFI cation
The IPI, LIPIL, ILIPILI chain is the most interesting. We’ve succeeded to fan out 3 times adding new characters from WORD on both sides of the initial character.
When do we stop fanning out? Whenever one of these conditions hold true:
WHERE start - length / 2 > 0 AND start + (length - 1) / 2 <= length(word) AND substring(palindrome, 1, 1) = substring(palindrome, length(palindrome), 1)
I.e.
- When we’ve reached the beginning of WORD (no more characters to the left)
- When we’ve reached the end of WORD (no more characters to the right)
- When the letter to the left of the palindrome of the previous recursion doesn’t match the letter to the right of the palindrome
That last predicate could also simply read:
AND palindrome = reverse(palindrome)
But that might be a bit slower as it compares something that we’ve already proven to be true in the previous recursion.
Finally, formatting the result
The final part isn’t too interesting anymore:
SELECT DISTINCT word, trim(replace(word, palindrome, ' ' || upper(palindrome) || ' ')) AS palindromes FROM palindromes WHERE length(palindrome) > 1 ORDER BY 2
We’ll simply:
- Select
DISTINCT
results only, as palindromes might appear several times in a word - Replace the palindrome substring by its upper case version and add some whitespace to better visualise it. That’s totally optional, of course
- Remove the trivial palindromes of length 0 and 1
And we’re done! Again, feel free to play around with this on SQLFiddle, or, much better, provide a cool palindrome algorithm implementation in your favourite language in the comments section, or on Twitter:
More beautiful SQL in these articles here:
- SQL, Streams, For Comprehension… It’s All the Same
- A Beginner’s Guide to the True Order of SQL Operations
- A Probably Incomplete, Comprehensive Guide to the Many Different Ways to JOIN Tables in SQL
- 10 SQL Tricks That You Didn’t Think Were Possible
Filed under: sql Tagged: Common Table Expressions, Palindromes, Recursive SQL, sql Image may be NSFW.
Clik here to view.
Clik here to view.
Clik here to view.
Clik here to view.
Clik here to view.
Clik here to view.
Clik here to view.
Clik here to view.
