SQL Server supports transforming flat tabular SQL result sets into hierarchical structures by convention, using the FOR XML or FOR JSON syntaxes. This is less verbose than the standard SQL/XML or SQL/JSON APIs, although the standard ones are more powerful.
In this blog post, I’d like to show a few core features of the SQL Server syntax, and what they correspond to in standard SQL. jOOQ 3.14 will support both SQL Server’s syntax and the standard syntax, and will be able to translate from one to the other, such that you can use SQL Server syntax also on Db2, MariaDB, MySQL, Oracle, PostgreSQL. You can play around with the current state of development on our website here.
As always, using the Sakila database, here’s a simple example as a teaser:
-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
FOR XML RAW;

-- Db2, Oracle, PostgreSQL
SELECT xmlagg(xmlelement(
  NAME row,
  xmlattributes(
    t.first_name AS first_name,
    t.last_name AS last_name,
    t.title AS title
  )
))
FROM (
  -- Original query here
  SELECT a.first_name, a.last_name, f.title
  FROM actor a
  JOIN film_actor fa ON a.actor_id = fa.actor_id
  JOIN film f ON fa.film_id = f.film_id
) AS t
Producing in both cases something like:
<row first_name="PENELOPE" last_name="GUINESS" title="OKLAHOMA JUMANJI"/>
<row first_name="PENELOPE" last_name="GUINESS" title="RULES HUMAN"/>
<row first_name="PENELOPE" last_name="GUINESS" title="SPLASH GUMP"/>
<row first_name="PENELOPE" last_name="GUINESS" title="VERTIGO NORTHWEST"/>
<row first_name="PENELOPE" last_name="GUINESS" title="WESTWARD SEABISCUIT"/>
<row first_name="PENELOPE" last_name="GUINESS" title="WIZARD COLDBLOODED"/>
<row first_name="NICK" last_name="WAHLBERG" title="ADAPTATION HOLES"/>
<row first_name="NICK" last_name="WAHLBERG" title="APACHE DIVINE"/>
FOR XML and FOR JSON concepts
As the teaser above shows, the SQL Server syntax is far more concise, and it seems to produce a reasonable default behaviour. The Db2, Oracle, PostgreSQL (and SQL standard) SQL/XML APIs are more verbose, but also more powerful. For example, with the standard APIs it is very easy to map column a to attribute x and column b to a nested XML element y.
The advantages of both approaches are clear. SQL Server’s approach is much more usable in the general case. But what is the general case? Let’s summarise a few key parallels between SQL result sets, and XML/JSON data structures:
- Tables are XML elements, or JSON arrays
- Rows are XML elements, or JSON objects
- Column values are XML elements or attributes, or JSON attributes
- GROUP BY and ORDER BY can be seen as a way to nest data
Tables are XML elements or JSON arrays
Tables (i.e. sets of data) are not a foreign concept to either XML or JSON documents. The most natural way to represent a set of data in XML is a set of elements using the same element name, optionally wrapped by a wrapper element. For example:
<!-- With wrapper element -->
<films>
  <film title="OKLAHOMA JUMANJI"/>
  <film title="RULES HUMAN"/>
  <film title="SPLASH GUMP"/>
</films>

<!-- Without wrapper element -->
<film title="OKLAHOMA JUMANJI"/>
<film title="RULES HUMAN"/>
<film title="SPLASH GUMP"/>
The distinction of whether a wrapper element is added is mostly significant when nesting data.
With JSON, the obvious choice of data structure to represent a table is an array. For example:
[
  {"title": "OKLAHOMA JUMANJI"},
  {"title": "RULES HUMAN"},
  {"title": "SPLASH GUMP"}
]
Rows are XML elements or JSON objects
As we’ve already seen above, a SQL row is represented in XML using an element.
<film title="OKLAHOMA JUMANJI"/>
The question is only what the element name should be. It can usually be any of:
- A standard name, such as “row”
- The name of the table the row stems from
- A custom name
In JSON, it is an object.
{"title": "OKLAHOMA JUMANJI"}
Unlike in XML, there is no such thing as an element name, so the row is “anonymous”. The row type is defined by what table / array the JSON object is contained in.
Column values are XML elements or attributes, or JSON attributes
We have a few more choices for how to represent SQL column values in XML. There are mainly two:
- Represent values as attributes
- Represent values as elements
Scalar values can easily be represented as attributes. If a value needs further nesting (e.g. an array, user defined type, etc.), then elements are a better choice. In most cases, the choice is not relevant, so we can pick either:
<!-- Using attributes -->
<film film_id="635" title="OKLAHOMA JUMANJI"/>

<!-- Using elements from table and column names -->
<film>
  <film_id>635</film_id>
  <title>OKLAHOMA JUMANJI</title>
</film>

<!-- Using standard element names -->
<row>
  <value name="film_id" value="635"/>
  <value name="title" value="OKLAHOMA JUMANJI"/>
</row>
There are a few other reasonable default options for the representation of a column value in XML.
In JSON, on the other hand, there are two main reasonable approaches. In most cases, an object will be chosen, where column values are identified by column name. But just like SQL records are a mixture between “structs” and “tuples”, we could imagine a representation that maps column values to array indexes as well:
// Using objects
{"film_id": 635, "title": "OKLAHOMA JUMANJI"}

// Using arrays
[635, "OKLAHOMA JUMANJI"]
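To make these representation choices concrete, here is a minimal Python sketch that renders one row in each of the four shapes discussed in this section. It only illustrates the conventions outside of the database; the helper names (row_as_attributes, row_as_elements, row_as_object, row_as_array) are made up for this illustration and are not part of SQL Server, the SQL standard, or jOOQ.

```python
import json
import xml.etree.ElementTree as ET

# Sample row taken from the film example in this section.
columns = ["film_id", "title"]
row = (635, "OKLAHOMA JUMANJI")


def row_as_attributes(name, columns, row):
    """Render the row as a single XML element with one attribute per column."""
    element = ET.Element(name, {c: str(v) for c, v in zip(columns, row)})
    return ET.tostring(element, encoding="unicode")


def row_as_elements(name, columns, row):
    """Render the row as an XML element with one child element per column."""
    element = ET.Element(name)
    for c, v in zip(columns, row):
        ET.SubElement(element, c).text = str(v)
    return ET.tostring(element, encoding="unicode")


def row_as_object(columns, row):
    """Render the row as a JSON object keyed by column name ("struct" style)."""
    return json.dumps(dict(zip(columns, row)))


def row_as_array(row):
    """Render the row as a JSON array indexed by column position ("tuple" style)."""
    return json.dumps(list(row))


if __name__ == "__main__":
    print(row_as_attributes("film", columns, row))
    print(row_as_elements("film", columns, row))
    print(row_as_object(columns, row))
    print(row_as_array(row))
```

Note that the XML variants turn every value into text, whereas the JSON variants keep the number 635 as a number. That difference is inherent in the two formats, not a choice made by this sketch.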
GROUP BY and ORDER BY can be seen as a way to nest data
So far, all data was represented in a flat way, just like the SQL table. There was some nesting when wrapping XML elements or JSON arrays in some wrapper element or object, or when representing XML data with more elements rather than attributes, but the data was still always tabular.
Very often, we want to consume data in a hierarchical form, though. An actor played in films, so we’d like to group the films by actor, rather than repeating the actor information for every film. In general, operations like GROUP BY or ORDER BY serve this purpose. GROUP BY allows for aggregating all data into nested data structures per group (e.g. into strings, arrays, XML elements, JSON arrays, JSON objects). ORDER BY does the same “visually”, perhaps a bit less formally. When we look at this set of XML elements, we can see visually that they’re “grouped” (i.e. ordered) by actor:
<row first_name="PENELOPE" last_name="GUINESS" title="OKLAHOMA JUMANJI"/>
<row first_name="PENELOPE" last_name="GUINESS" title="RULES HUMAN"/>
<row first_name="PENELOPE" last_name="GUINESS" title="SPLASH GUMP"/>
<row first_name="PENELOPE" last_name="GUINESS" title="VERTIGO NORTHWEST"/>
<row first_name="PENELOPE" last_name="GUINESS" title="WESTWARD SEABISCUIT"/>
<row first_name="PENELOPE" last_name="GUINESS" title="WIZARD COLDBLOODED"/>
<row first_name="NICK" last_name="WAHLBERG" title="ADAPTATION HOLES"/>
<row first_name="NICK" last_name="WAHLBERG" title="APACHE DIVINE"/>
SQL Server supports such grouping in at least two ways:
- Implicitly by convention, using ORDER BY
- Explicitly, by creating correlated subqueries
The implicit approach could transform the above flat representation into something like this:
<a first_name="PENELOPE" last_name="GUINESS">
  <f title="OKLAHOMA JUMANJI"/>
  <f title="RULES HUMAN"/>
  <f title="SPLASH GUMP"/>
  <f title="VERTIGO NORTHWEST"/>
  <f title="WESTWARD SEABISCUIT"/>
  <f title="WIZARD COLDBLOODED"/>
</a>
<a first_name="NICK" last_name="WAHLBERG">
  <f title="ADAPTATION HOLES"/>
  <f title="APACHE DIVINE"/>
</a>
… where “a” and “f” are the table aliases used in the query (actor a and film f).
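The idea behind the implicit approach can be sketched in a few lines of Python: walk over the ordered rows and open a new parent element whenever the parent columns change. This is only an illustration of the convention under that assumption, not SQL Server's actual implementation, and the function name nest_by_actor is hypothetical.

```python
from itertools import groupby
import xml.etree.ElementTree as ET

# A few rows from the flat result above, already ordered by actor.
rows = [
    ("PENELOPE", "GUINESS", "OKLAHOMA JUMANJI"),
    ("PENELOPE", "GUINESS", "RULES HUMAN"),
    ("PENELOPE", "GUINESS", "SPLASH GUMP"),
    ("NICK", "WAHLBERG", "ADAPTATION HOLES"),
    ("NICK", "WAHLBERG", "APACHE DIVINE"),
]


def nest_by_actor(rows):
    """Create one <a> element per run of adjacent rows with the same actor,
    and one nested <f> element per film in that run."""
    actors = []
    for (first_name, last_name), films in groupby(rows, key=lambda r: (r[0], r[1])):
        a = ET.Element("a", {"first_name": first_name, "last_name": last_name})
        for _, _, title in films:
            ET.SubElement(a, "f", {"title": title})
        actors.append(a)
    return actors


if __name__ == "__main__":
    for a in nest_by_actor(rows):
        print(ET.tostring(a, encoding="unicode"))
```

Because itertools.groupby only merges adjacent rows, the same actor would appear in several separate elements if the input were not ordered by actor. In this sketch, that is why the ordering is what produces the "grouping".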
How do FOR XML and FOR JSON work in detail?
There are several features that can be combined in SQL Server. The complete picture can be seen from the docs. We’ll omit a few features in this blog post here.
- The transformation algorithm: RAW (flat results, only in XML), AUTO (hierarchical, automatic results), PATH (hierarchical, explicit results)
- The “root” name, which corresponds to an XML wrapper element, or a JSON wrapper object
- XML only: Whether values should be placed in ELEMENTS or attributes
- JSON only: INCLUDE_NULL_VALUES specifies whether NULL values are explicit, or implicit (absent from the JSON object).
- JSON only: WITHOUT_ARRAY_WRAPPER specifies whether the set of JSON objects should be listed as a JSON array, or a comma separated list of objects (which could be combined with other queries)
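The two JSON-only flags are not shown in the SQL examples of this post, so here is a small Python sketch of their effect as described in the list above. The function to_json and its parameters are hypothetical names for this illustration, and the sample rows (including the NULL column) are assumed data rather than actual query output.

```python
import json

# Assumed sample rows: a film title plus a column that happens to be NULL.
rows = [
    {"title": "OKLAHOMA JUMANJI", "original_language_id": None},
    {"title": "RULES HUMAN", "original_language_id": None},
]


def to_json(rows, include_null_values=False, without_array_wrapper=False):
    """Serialise rows as JSON objects, mimicking the two flags.

    include_null_values: keep keys whose value is NULL (None) as explicit nulls;
        otherwise such keys are absent from the object.
    without_array_wrapper: emit a comma separated list of objects instead of
        wrapping them in a JSON array.
    """
    objects = []
    for row in rows:
        if not include_null_values:
            row = {k: v for k, v in row.items() if v is not None}
        objects.append(json.dumps(row))
    body = ",".join(objects)
    return body if without_array_wrapper else "[" + body + "]"


if __name__ == "__main__":
    print(to_json(rows))
    print(to_json(rows, include_null_values=True))
    print(to_json(rows, without_array_wrapper=True))
```

The output without the array wrapper is not a valid JSON document on its own when there is more than one row, which is why that flag is mostly useful when the result gets embedded into something else.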
This list is not complete; there are more flags and features. But instead of discussing them in theory, let’s look at a few examples:
FOR XML RAW
Producing flat results with attributes for values
-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR XML RAW;

-- Standard SQL
SELECT xmlagg(xmlelement(
  NAME row,
  xmlattributes(
    t.first_name AS first_name,
    t.last_name AS last_name,
    t.title AS title
  )
))
FROM (
  SELECT a.first_name, a.last_name, f.title
  FROM actor a
  JOIN film_actor fa ON a.actor_id = fa.actor_id
  JOIN film f ON fa.film_id = f.film_id
  ORDER BY 1, 2, 3
) AS t
This produces
<row first_name="NICK" last_name="WAHLBERG" title="SMILE EARRING"/>
<row first_name="NICK" last_name="WAHLBERG" title="WARDROBE PHANTOM"/>
<row first_name="PENELOPE" last_name="GUINESS" title="ACADEMY DINOSAUR"/>
<row first_name="PENELOPE" last_name="GUINESS" title="ANACONDA CONFESSIONS"/>
FOR XML RAW, ROOT
Producing flat results with attributes for values, and a root element to wrap the listed elements
-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR XML RAW, ROOT('rows');

-- Standard SQL
SELECT xmlelement(
  NAME rows,
  xmlagg(xmlelement(
    NAME row,
    xmlattributes(
      t.first_name AS first_name,
      t.last_name AS last_name,
      t.title AS title
    )
  ))
)
FROM (
  SELECT a.first_name, a.last_name, f.title
  FROM actor a
  JOIN film_actor fa ON a.actor_id = fa.actor_id
  JOIN film f ON fa.film_id = f.film_id
  ORDER BY 1, 2, 3
) AS t
This produces
<rows>
  <row first_name="NICK" last_name="WAHLBERG" title="SMILE EARRING"/>
  <row first_name="NICK" last_name="WAHLBERG" title="WARDROBE PHANTOM"/>
  <row first_name="PENELOPE" last_name="GUINESS" title="ACADEMY DINOSAUR"/>
  <row first_name="PENELOPE" last_name="GUINESS" title="ANACONDA CONFESSIONS"/>
</rows>
FOR XML RAW, ELEMENTS
Producing flat results with elements for values.
-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR XML RAW, ELEMENTS;

-- Standard SQL
SELECT xmlagg(xmlelement(
  NAME row,
  xmlelement(NAME first_name, first_name),
  xmlelement(NAME last_name, last_name),
  xmlelement(NAME title, title)
))
FROM (
  SELECT a.first_name, a.last_name, f.title
  FROM actor a
  JOIN film_actor fa ON a.actor_id = fa.actor_id
  JOIN film f ON fa.film_id = f.film_id
  ORDER BY 1, 2, 3
) AS t
This produces
<row>
  <first_name>NICK</first_name>
  <last_name>WAHLBERG</last_name>
  <title>SMILE EARRING</title>
</row>
<row>
  <first_name>NICK</first_name>
  <last_name>WAHLBERG</last_name>
  <title>WARDROBE PHANTOM</title>
</row>
<row>
  <first_name>PENELOPE</first_name>
  <last_name>GUINESS</last_name>
  <title>ACADEMY DINOSAUR</title>
</row>
<row>
  <first_name>PENELOPE</first_name>
  <last_name>GUINESS</last_name>
  <title>ANACONDA CONFESSIONS</title>
</row>
This could also be combined with ROOT, which we’re omitting for brevity.
FOR XML/JSON AUTO
This approach derives results completely automatically from your query structure. Mainly:
- The SELECT clause defines in what order XML or JSON data is nested.
- The FROM clause defines the table names (via aliasing), which are translated to XML element or JSON object attribute names.
- The ORDER BY clause produces the “grouping”, which is translated to nesting XML elements or JSON objects.
-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR XML AUTO;

-- Standard SQL
SELECT xmlagg(e)
FROM (
  SELECT xmlelement(
    NAME a,
    xmlattributes(
      t.first_name AS first_name,
      t.last_name AS last_name
    ),
    xmlagg(xmlelement(
      NAME f,
      xmlattributes(t.title AS title)
    ))
  ) AS e
  FROM (
    SELECT a.first_name, a.last_name, f.title
    FROM actor a
    JOIN film_actor fa ON a.actor_id = fa.actor_id
    JOIN film f ON fa.film_id = f.film_id
    ORDER BY 1, 2, 3
  ) AS t
  GROUP BY first_name, last_name
) AS t
Notice how this emulation requires two steps of XMLAGG with GROUP BY. It gets more hairy with more tables being joined and projected! I won’t add more complex examples here, but try it online!
This produces
<a first_name="NICK" last_name="WAHLBERG">
  <f title="SMILE EARRING"/>
  <f title="WARDROBE PHANTOM"/>
</a>
<a first_name="PENELOPE" last_name="GUINESS">
  <f title="ACADEMY DINOSAUR"/>
  <f title="ANACONDA CONFESSIONS"/>
</a>
Let’s try the same thing again with JSON:
-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR JSON AUTO;

-- Standard SQL
-- Keys are written in lower case to match the result shown below
SELECT json_arrayagg(e)
FROM (
  SELECT JSON_OBJECT(
    KEY 'first_name' VALUE first_name,
    KEY 'last_name' VALUE last_name,
    KEY 'f' VALUE JSON_ARRAYAGG(JSON_OBJECT(
      KEY 'title' VALUE title
      ABSENT ON NULL
    ))
    ABSENT ON NULL
  ) e
  FROM (
    SELECT a.first_name, a.last_name, f.title
    FROM actor a
    JOIN film_actor fa ON a.actor_id = fa.actor_id
    JOIN film f ON fa.film_id = f.film_id
    ORDER BY 1, 2, 3
  ) t
  GROUP BY first_name, last_name
) t
The result being:
[
  {
    "first_name": "NICK",
    "last_name": "WAHLBERG",
    "f": [
      { "title": "SMILE EARRING" },
      { "title": "WARDROBE PHANTOM" }
    ]
  },
  {
    "first_name": "PENELOPE",
    "last_name": "GUINESS",
    "f": [
      { "title": "ACADEMY DINOSAUR" },
      { "title": "ANACONDA CONFESSIONS" }
    ]
  }
]
FOR XML/JSON AUTO, ROOT
Like before, we could wrap this in a root XML element or a root JSON object if need be.
-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR XML AUTO, ROOT;

-- Standard SQL
-- The wrapper element is named "root", matching the default name in the result below
SELECT xmlelement(
  NAME root,
  xmlagg(e)
)
FROM (
  SELECT xmlelement(
    NAME a,
    xmlattributes(
      t.first_name AS first_name,
      t.last_name AS last_name
    ),
    xmlagg(xmlelement(
      NAME f,
      xmlattributes(t.title AS title)
    ))
  ) e
  FROM (
    SELECT a.first_name, a.last_name, f.title
    FROM actor a
    JOIN film_actor fa ON a.actor_id = fa.actor_id
    JOIN film f ON fa.film_id = f.film_id
    ORDER BY 1, 2, 3
  ) t
  GROUP BY first_name, last_name
) t
This does the same thing as before, but just wraps the previous root XMLAGG() element in another XMLELEMENT() function call.
This produces
<root>
  <a first_name="NICK" last_name="WAHLBERG">
    <f title="SMILE EARRING"/>
    <f title="WARDROBE PHANTOM"/>
  </a>
  <a first_name="PENELOPE" last_name="GUINESS">
    <f title="ACADEMY DINOSAUR"/>
    <f title="ANACONDA CONFESSIONS"/>
  </a>
</root>
Let’s try the same thing again with JSON:
-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR JSON AUTO, ROOT;

-- Standard SQL
-- Keys are written in lower case to match the result shown below
SELECT JSON_OBJECT(KEY 'a' VALUE json_arrayagg(e))
FROM (
  SELECT JSON_OBJECT(
    KEY 'first_name' VALUE first_name,
    KEY 'last_name' VALUE last_name,
    KEY 'f' VALUE JSON_ARRAYAGG(JSON_OBJECT(
      KEY 'title' VALUE title
      ABSENT ON NULL
    ))
    ABSENT ON NULL
  ) e
  FROM (
    SELECT a.first_name, a.last_name, f.title
    FROM actor a
    JOIN film_actor fa ON a.actor_id = fa.actor_id
    JOIN film f ON fa.film_id = f.film_id
    ORDER BY 1, 2, 3
  ) t
  GROUP BY first_name, last_name
) t
The result being:
{
  "a": [
    {
      "first_name": "NICK",
      "last_name": "WAHLBERG",
      "f": [
        { "title": "SMILE EARRING" },
        { "title": "WARDROBE PHANTOM" }
      ]
    },
    {
      "first_name": "PENELOPE",
      "last_name": "GUINESS",
      "f": [
        { "title": "ACADEMY DINOSAUR" },
        { "title": "ANACONDA CONFESSIONS" }
      ]
    }
  ]
}
FOR XML AUTO, ELEMENTS
Like before, instead of producing attributes, we might decide to produce elements instead (in XML only):
-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR XML AUTO, ELEMENTS;

-- Standard SQL
SELECT xmlagg(e)
FROM (
  SELECT xmlelement(
    NAME a,
    xmlelement(NAME first_name, first_name),
    xmlelement(NAME last_name, last_name),
    xmlagg(xmlelement(
      NAME f,
      xmlelement(NAME title, title)
    ))
  ) e
  FROM (
    SELECT a.first_name, a.last_name, f.title
    FROM actor a
    JOIN film_actor fa ON a.actor_id = fa.actor_id
    JOIN film f ON fa.film_id = f.film_id
    ORDER BY 1, 2, 3
  ) t
  GROUP BY first_name, last_name
) t
Not much has changed, except for the fact that a set of XMLELEMENT() calls is made, rather than XMLATTRIBUTES() calls.
This produces
<a>
  <first_name>NICK</first_name>
  <last_name>WAHLBERG</last_name>
  <f>
    <title>SMILE EARRING</title>
  </f>
  <f>
    <title>WARDROBE PHANTOM</title>
  </f>
</a>
<a>
  <first_name>PENELOPE</first_name>
  <last_name>GUINESS</last_name>
  <f>
    <title>ACADEMY DINOSAUR</title>
  </f>
  <f>
    <title>ANACONDA CONFESSIONS</title>
  </f>
</a>
FOR XML/JSON PATH
The PATH strategy is my personal favourite. It is used to create nested XML or JSON path structures more explicitly, and also allows for additional nesting levels when grouping projections together. This is best shown by example. Notice how I’m now using aliases for my columns, and the alias looks like an XPath expression using '/' (slashes):
-- SQL Server
SELECT
  a.first_name AS [author/first_name],
  a.last_name AS [author/last_name],
  f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR XML PATH;

-- Standard SQL
SELECT xmlagg(xmlelement(
  NAME row,
  xmlelement(
    NAME author,
    xmlelement(NAME first_name, "author/first_name"),
    xmlelement(NAME last_name, "author/last_name")
  ),
  xmlelement(NAME title, title)
))
FROM (
  SELECT
    a.first_name AS "author/first_name",
    a.last_name AS "author/last_name",
    f.title
  FROM actor a
  JOIN film_actor fa ON a.actor_id = fa.actor_id
  JOIN film f ON fa.film_id = f.film_id
  ORDER BY 1, 2, 3
) t
Check out how, by convention, we’re now getting an additional level of nesting for author related columns under the row/author element:
<row>
  <author>
    <first_name>NICK</first_name>
    <last_name>WAHLBERG</last_name>
  </author>
  <title>SMILE EARRING</title>
</row>
<row>
  <author>
    <first_name>NICK</first_name>
    <last_name>WAHLBERG</last_name>
  </author>
  <title>WARDROBE PHANTOM</title>
</row>
<row>
  <author>
    <first_name>PENELOPE</first_name>
    <last_name>GUINESS</last_name>
  </author>
  <title>ACADEMY DINOSAUR</title>
</row>
<row>
  <author>
    <first_name>PENELOPE</first_name>
    <last_name>GUINESS</last_name>
  </author>
  <title>ANACONDA CONFESSIONS</title>
</row>
This is really neat! The SQL Server syntax is definitely much more convenient for this common use-case.
Let’s try the same thing again with JSON. The only thing we change is that we now use a JSON-path-ish syntax using dots ('.') rather than slashes ('/'):
-- SQL Server
SELECT
  a.first_name AS [author.first_name],
  a.last_name AS [author.last_name],
  f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR JSON PATH;

-- Standard SQL
-- The aliases contain dots, so they must be referenced as quoted identifiers
SELECT JSON_ARRAYAGG(JSON_OBJECT(
  KEY 'author' VALUE JSON_OBJECT(
    KEY 'first_name' VALUE "author.first_name",
    KEY 'last_name' VALUE "author.last_name"
  ),
  KEY 'title' VALUE title
  ABSENT ON NULL
))
FROM (
  SELECT
    a.first_name AS "author.first_name",
    a.last_name AS "author.last_name",
    f.title
  FROM actor a
  JOIN film_actor fa ON a.actor_id = fa.actor_id
  JOIN film f ON fa.film_id = f.film_id
  ORDER BY 1, 2, 3
) t
The result being (again, with nested objects):
[
  {
    "author": {
      "first_name": "NICK",
      "last_name": "WAHLBERG"
    },
    "title": "SMILE EARRING"
  },
  {
    "author": {
      "first_name": "NICK",
      "last_name": "WAHLBERG"
    },
    "title": "WARDROBE PHANTOM"
  },
  {
    "author": {
      "first_name": "PENELOPE",
      "last_name": "GUINESS"
    },
    "title": "ACADEMY DINOSAUR"
  },
  {
    "author": {
      "first_name": "PENELOPE",
      "last_name": "GUINESS"
    },
    "title": "ANACONDA CONFESSIONS"
  }
]
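The path convention itself is simple enough to sketch outside of the database. The following Python snippet splits each column alias on the separator and builds the nested object step by step. It is only a sketch of the convention shown above, assuming aliases never collide (e.g. no column aliased both "author" and "author.first_name"); the function name nest is hypothetical.

```python
import json


def nest(row, separator="."):
    """Turn a flat mapping of path-like aliases into nested dictionaries.

    Each alias is split on the separator; all segments but the last one
    name nested objects, and the last segment names the value's key.
    """
    result = {}
    for alias, value in row.items():
        *parents, leaf = alias.split(separator)
        target = result
        for parent in parents:
            # Reuse the nested object if an earlier column already created it.
            target = target.setdefault(parent, {})
        target[leaf] = value
    return result


if __name__ == "__main__":
    flat_row = {
        "author.first_name": "NICK",
        "author.last_name": "WAHLBERG",
        "title": "SMILE EARRING",
    }
    print(json.dumps(nest(flat_row), indent=2))
```

Passing separator="/" gives the same nesting for the XML-style aliases, although producing actual XML elements from the nested dictionaries would need an extra step.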
For more sophisticated nesting, including nesting of collections, a correlated subquery is needed in SQL Server, also with a FOR XML or FOR JSON syntax.
Conclusion
XML and JSON are popular document formats outside and inside of the database. SQL Server has some of the most convenient syntax for most cases, while standard SQL offers more basic, and thus more powerful, building blocks. In standard SQL, almost any kind of XML or JSON projection is possible, and with XMLTABLE() and JSON_TABLE(), the documents can be transformed back to SQL tables, as well. In many applications, using these XML or JSON features natively would lead to much less boilerplate code, as many applications do not need middleware between the database and some client, just to transform data between formats.
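To illustrate what "transforming back to tables" means, here is a minimal Python sketch that flattens the nested JSON document from the FOR JSON AUTO example into rows again. It mimics the idea behind JSON_TABLE() in plain Python and does not use any SQL syntax; the function name flatten is hypothetical.

```python
import json

# The nested document produced by the FOR JSON AUTO example in this post.
document = json.loads("""
[
  {"first_name": "NICK", "last_name": "WAHLBERG",
   "f": [{"title": "SMILE EARRING"}, {"title": "WARDROBE PHANTOM"}]},
  {"first_name": "PENELOPE", "last_name": "GUINESS",
   "f": [{"title": "ACADEMY DINOSAUR"}, {"title": "ANACONDA CONFESSIONS"}]}
]
""")


def flatten(actors):
    """Produce one (first_name, last_name, title) row per nested film,
    repeating the actor columns for each of the actor's films."""
    return [
        (actor["first_name"], actor["last_name"], film["title"])
        for actor in actors
        for film in actor["f"]
    ]


if __name__ == "__main__":
    for row in flatten(document):
        print(row)
```

This is the inverse of the nesting shown earlier: the hierarchy is unrolled, and the parent values are repeated on every row, just like in the original join.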
Most ORMs don’t expose this functionality for a variety of reasons, the main one being that the devil is in the details. While both XML and JSON are nicely standardised, the implementations differ greatly:
- The SQL/XML standard is implemented mostly by Db2, Oracle, and PostgreSQL. Many dialects offer some XML capabilities, but not as impressive as the standard and the previous three. SQL Server has FOR XML, which is very powerful for standard XML serialisations, but may be a bit difficult to use for edge cases.
- The SQL/JSON standard was added late and is implemented again to large extents by Db2 and Oracle, but increasingly also by MariaDB and MySQL. PostgreSQL (and by consequence, compatible dialects, like CockroachDB) had their own proprietary functions and APIs, which are not compatible with the standard. And again, SQL Server has FOR JSON, which works well for standard serialisations, but a bit less well for edge cases.
These technologies are poorly adopted in clients because of the many subtle differences. jOOQ has been leveling out these minor differences for many years without hiding the core functionality. SQL/XML and SQL/JSON are perfect use-cases for jOOQ 3.14 (due in Q2 2020), which now allows for using both the standard SQL/XML and SQL/JSON syntaxes as well as the SQL Server FOR XML and FOR JSON syntax in the jOOQ Professional and Enterprise Editions.
Before jOOQ 3.14 is out, you can already play with the current functionality on our website: https://www.jooq.org/translate