Copyright © 2013-2014 Inria
This work is licensed under a Creative Commons Attribution License 1.0 Generic License. This copyright applies to the SQL AST Vocabulary Specification in Turtle and RDF/XML, and to the present documentation.The SQL AST Vocabulary complies with W3C's RDF 1.1 and RDFS 1.1 recommendations.
The SQL AST Vocabulary allows SQL code abstract syntax trees to be published in RDF.
As relational databases remain widespread in enterprise systems, most business intelligence and data analytics software make large use of SQL [SQL2] to query and manipulate their content in a deterministic and efficient manner, and use the result sets to populate dynamic documents such as visualizations or infographics. While such documents remain disparate from a system to another, they often share the use of (vendor-specific dialects of) SQL, which provides a formal ground for edition, comparison or retrieval of the documents, provided the queries are modelled with a richer representation than their textual serialization.
Abstract syntax trees allow to build structured representations of code for any language with a grammar: AST nodes carrying labels can stand for keywords, objects, variables, constants or any language element while the tree structure allows to abstract away the language's concepts of scope or dependency. ASTs are a machine-readable format for concrete syntax code, and can be decorated with implicit or contextual knowledge from the grammar. We hereby propose to model ASTs with RDF graphs. AST structures naturally fit RDF graphs: nodes map resources, node labels map resources' types, and grammar knowledge map to vocabulary's semantics, e.g. with subsumption.
With this vocabulary, we take the case of modeling SQL ASTs, largely inspired by the SPIN work which represents SPARQL queries in RDF [SPIN]. We propose a toolbox of RDF solutions to represent SQL queries with a concision requirement in order to ensure the conveniency of these ASTs'' manipulation with SPARQL (edition, comparison or retrieval). Therefore we focus on Turtle serialization [TURTLE] of the ASTs, as it is close to SPARQL triple pattern syntax. We build an ontological view for a subset of the most widely used SQL constructs, and leave extensions to dialects specialists who can extend it according to their SQL vendor-specific features (e.g. built-in functions), according to the web of data reusability best practices.
In vocabulary outline diagrams, black arrows represent rdfs:subClassOf
relationships between classes, white arrows represent rdfs:subPropertyOf
relationships between properties, and empty arrows represent other properties as specified with their labels. The recommended prefix for this vocabulary is sql:
, unless otherwise specified it is assumed as the default prefix in outline diagrams.
All AST nodes are typed with one of the language's abstract classes that are further detailed below. The default way to represent an AST node's children is by using the generic property sql:args
and an grouping them into an rdf:List
collection to enforce children ordering.
SQL statement classes represent the different types of instructions that can be communicated as queries to a RDB. This vocabulary essentially focuses on SELECT data manipulation statements which are most commonly found in data analytics use cases.
SQL statement are segmented in clauses, which are represented with properties. These are all subproperties of sql:clause
. Some of the clauses are bound to specific statements, which is represented with their property's rdfs:domain
. Other clauses are generic an can be used in different types of statements. For instance, the FROM clause can be found in SELECT and DELETE statements, its content, and therefore its abstract syntax subtree, differs accordingly. When a single type of object can be found in a clause, the rdfs:range
of its property is specified (with LIMIT, INTO and VALUES clauses). Other clauses have no domain nor range.
SQL predicates are expressions or sets of expressions combined with logical operators, and are evaluated as booleans that are used in WHERE and HAVING clauses and FROM's clauses join conditions.
SQL expressions combine symbols, literals and operators and evaluate to values. In an AST, the expression concept is often omitted and is represented by its highest-precedence operator or function, or sequences of value objects.
SQL operators are symbols that translate to an action to be performed on one or more expressions, and also include wildcards as nullary operators whose evaluation depend on the local statement context.
SQL functions are the most vendor-specific part of SQL, hence we provide only the generic aggregation functions, and abstract classes that characterize functions depending on the data types they handle or return.
Finally, we give a list of catalog objects that can be manipulated in queries, by referencing them with their identifiers. These identifiers can be qualified ones, e.g. the "name" column from the "customers" table from the "operations" schema is referred to in SQL using the membership operator '.
', resulting in operations.customers.name
being its qualified identifier. This translates to Turtle as:
[ a sql:ColumnIdentifier ; sql:args ( [ a sql:Schema ; rdfs:label "operations" ] [ a sql:Table ; rdfs:label "customers" ] [ a sql:Column ; rdfs:label "name" ] ) ]
Hence, the catalog objects classes below are merely used to type the blank nodes that bear the unqualified names of objects as their rdfs:label
s. Each type of catalog object also has a corresponding identifier class which is a subclass of sql:ObjectIdentifier
, it can then be added any number of unqualified raw catalog object nodes (in the above example, three).
These examples show how to model given SQL queries using the RDFS vocabulary hereby specified. Each example presents the source SQL code and its target RDF representation with the Turtle format. Original queries are adapted from those generated by the TPC-DC benchmark query generator [TPCDS].
This first example shows a simple query often used in SQL tutorials. It queries a table called "Cutomers" by selecting all its columns. The root blank node refers to the SELECT statement, which is described with two triples for its two clauses, the first one being the projection clause labeled as "SELECT" and the second one being the FROM one. Here the table identifier is not qualified, so its single argument is the table name.
SELECT * FROM customers;
@prefix sql: <http://ns.inria.fr/ast/sql#> . @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . [ a sql:Select ; sql:select ( [ a sql:All ] ) ; sql:from ( [ a sql:TableIdentifier ; sql:args ( [ a sql:Table ; rdfs:label "customers" ] ) ] ) ] .
This second example shows the generic use of sql:args
predicate to specify any AST node's children, and longer lists of elements. Literals used in the query are represented as (eventually abbreviated) datatyped literals, according to the mapping of SQL types with RDF literals proposed in the R2RML recommendation [R2RML]. A sequence of identical yet independant operators, such as several conjunctions of predicate expressions in a WHERE clause, is factorized into a unique blank node which gets all the conditions as arguments.
SELECT l_orderkey, sum( l_extendedprice * ( 1 - l_discount) ) AS revenue, o_orderdate, o_shippriority FROM customer, orders, lineitem WHERE c_mktsegment = 'FURNITURE' AND c_custkey = o_custkey AND l_orderkey = o_orderkey AND o_orderdate < 2013-12-21 AND l_shipdate > date 2014-01-06 GROUP BY l_orderkey, o_orderdate, o_shippriority ORDER BY revenue, o_orderdate;
@prefix sql: <http://ns.inria.fr/ast/sql#> . @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . @prefix xsd: <http://www.w3.org/2001/XMLSchema#> . [ a sql:Select ; sql:select ( [ a sql:ColumnIdentifier ; sql:args ( [ a sql:Column ; rdfs:label "L_ORDERKEY" ] ) ] [ a sql:Alias ; rdfs:label "REVENUE" ; sql:args ( [ a sql:Sum ; sql:args ( [ a sql:Times ; sql:args ( [ a sql:ColumnIdentifier ; sql:args ( [ a sql:Column ; rdfs:label "L_EXTENDEDPRICE" ] ) ] [ a sql:Minus ; sql:args ( 1 [ a sql:ColumnIdentifier ; sql:args ( [ a sql:Column ; rdfs:label "L_DISCOUNT" ] ) ] ) ] ) ] ) ] ) ] [ a sql:ColumnIdentifier ; sql:args ( [ a sql:Column ; rdfs:label "O_ORDERDATE" ] ) ] [ a sql:ColumnIdentifier ; sql:args ( [ a sql:Column ; rdfs:label "O_SHIPPRIORITY" ] ) ] ) ; sql:from ( [ a sql:TableIdentifier ; sql:args ( [ a sql:Table ; rdfs:label "CUSTOMER" ] ) ] [ a sql:TableIdentifier ; sql:args ( [ a sql:Table ; rdfs:label "ORDER" ] ) ] [ a sql:TableIdentifier ; sql:args ( [ a sql:Table ; rdfs:label "LINEITEM" ] ) ] ) ; sql:where ( [ a sql:And ; sql:args ( [ a sql:Equals ; sql:args ( [ a sql:ColumnIdentifier ; sql:args ( [ a sql:Column ; rdfs:label "C_MKTSEGMENT" ] ) ] "FURNITURE" ) ] [ a sql:Equals ; sql:args ( [ a sql:ColumnIdentifier ; sql:args ( [ a sql:Column ; rdfs:label "C_CUSTKEY" ] ) ] [ a sql:ColumnIdentifier ; sql:args ( [ a sql:Column ; rdfs:label "O_CUSTKEY" ] ) ] ) ] [ a sql:Equals ; sql:args ( [ a sql:ColumnIdentifier ; sql:args ( [ a sql:Column ; rdfs:label "L_ORDERKEY" ] ) ] [ a sql:ColumnIdentifier ; sql:args ( [ a sql:Column ; rdfs:label "O_ORDERKEY" ] ) ] ) ] [ a sql:LessThan ; sql:args ( [ a sql:ColumnIdentifier ; sql:args ( [ a sql:Column ; rdfs:label "O_ORDERDATE" ] ) ] "2013-12-21"^^xsd:date ) ] [ a sql:GreaterThan ; sql:args ( [ a sql:ColumnIdentifier ; sql:args ( [ a sql:Column ; rdfs:label "L_SHIPDATE" ] ) ] "2014-01-06"^^xsd:date ) ] ) ] ) ; sql:groupBy ( [ a sql:ColumnIdentifier ; sql:args ( [ a sql:Column ; rdfs:label "L_ORDERKEY" ] ) ] [ a sql:ColumnIdentifier ; sql:args ( [ a sql:Column ; rdfs:label "O_ORDERDATE" ] ) ] [ a sql:ColumnIdentifier ; sql:args ( [ a sql:Column ; rdfs:label "O_SHIPPRIORITY" ] ) ] ) ; sql:orderBy ( [ a sql:ColumnIdentifier ; sql:args ( [ a sql:Column ; rdfs:label "REVENUE" ] ) ] [ a sql:ColumnIdentifier ; sql:args ( [ a sql:Column ; rdfs:label "O_ORDERDATE" ] ) ] ) ] .
IRI: http://ns.inria.fr/ast/sql#AccessControlStatement
IRI: http://ns.inria.fr/ast/sql#Plus
IRI: http://ns.inria.fr/ast/sql#AggregateFunction
A SQL aggregate function returns a single value given the values of multiple rows from a column.
IRI: http://ns.inria.fr/ast/sql#AggregationExpression
A SQL aggregation expression consists in the use of an aggregate function.
IRI: http://ns.inria.fr/ast/sql#Alias
The SQL AS operator used to give an identifier to an expression.
IRI: http://ns.inria.fr/ast/sql#All
The SQL '*' wildcard operator, substitutes to all the columns of the tables in the FROM clause.
IRI: http://ns.inria.fr/ast/sql#AlterStatement
A SQL ALTER statement.
IRI: http://ns.inria.fr/ast/sql#ArithmeticOperator
A SQL arithmetic operator used to perform arithmetic calculation.
IRI: http://ns.inria.fr/ast/sql#Avg
The SQL aggregate function that returns the mean of the given expression.
IRI: http://ns.inria.fr/ast/sql#Function
A SQL built-in function that performs calculation and returns a value. May be vendor-specific.
IRI: http://ns.inria.fr/ast/sql#CallStatement
A SQL CALL statement.
IRI: http://ns.inria.fr/ast/sql#CaseExpression
A SQL case expression enables the use of the IF <condition> THEN <consequent> ELSE <alternative> logic within SQL statements.
IRI: http://ns.inria.fr/ast/sql#Cast
A SQL scalar function that converts an expression to a target data type
IRI: http://ns.inria.fr/ast/sql#CommitStatement
A SQL COMMIT statement.
IRI: http://ns.inria.fr/ast/sql#ComparisonOperator
A binary operator used to compare two values, and when evaluated returns one of the TRUE / FALSE / UNKNOWN truth values.
IRI: http://ns.inria.fr/ast/sql#Concatenation
A SQL concatenation operator that returns the combination of two strings, expressions or values.
IRI: http://ns.inria.fr/ast/sql#ConnectStatement
A SQL CONNECT statement.
IRI: http://ns.inria.fr/ast/sql#Count
The SQL aggregate function that returns the number of rows or non NULL values from the result set.
IRI: http://ns.inria.fr/ast/sql#CreateStatement
A SQL CREATE statement.
IRI: http://ns.inria.fr/ast/sql#CrossJoin
Join predicate used to perform the cross-product of two tables.
IRI: http://ns.inria.fr/ast/sql#DataDefinitionStatement
A SQL data definition statement.
IRI: http://ns.inria.fr/ast/sql#DataManipulationStatement
A SQL data manipulation statement.
IRI: http://ns.inria.fr/ast/sql#DataTypeConversionFunction
A SQL scalar function.
IRI: http://ns.inria.fr/ast/sql#CatalogObject
A database catalog object refernce, identified with delimited identifiers listed as separated arguments.
IRI: http://ns.inria.fr/ast/sql#DateTimeFunction
A SQL date time function returns a date or time value or performs a specific date or time value computation.
IRI: http://ns.inria.fr/ast/sql#DeleteStatement
A SQL DELETE statement.
IRI: http://ns.inria.fr/ast/sql#NotEquals
IRI: http://ns.inria.fr/ast/sql#Div
IRI: http://ns.inria.fr/ast/sql#DropStatement
A SQL DROP statement.
IRI: http://ns.inria.fr/ast/sql#Equals
IRI: http://ns.inria.fr/ast/sql#Except
A SQL exclusion operator used to return the result set produced by removing the result rows of the second query that appear in the result set of the first query.
IRI: http://ns.inria.fr/ast/sql#Exists
Unary predicate that returns one of the TRUE or FALSE truth values depending on whether the evaluated subquery returns an empty result set.
IRI: http://ns.inria.fr/ast/sql#Expression
A SQL expression denotes any clause that when evaluated returns values.
IRI: http://ns.inria.fr/ast/sql#FullOuterJoin
IRI: http://ns.inria.fr/ast/sql#FunctionExpression
A SQL function expressions consists in the use of any built-in SQL function.
IRI: http://ns.inria.fr/ast/sql#ASTNode
Abstract class subsumed by all AST node classes.
IRI: http://ns.inria.fr/ast/sql#Grant
A SQL GRANT statement used to give a set of privileges or roles to specified users that are actionable on specified database objects.
IRI: http://ns.inria.fr/ast/sql#GreaterThan
IRI: http://ns.inria.fr/ast/sql#GreaterEquals
IRI: http://ns.inria.fr/ast/sql#IOStatement
A SQL import or export statement.
IRI: http://ns.inria.fr/ast/sql#In
Variadic predicate used to assert whether the first provided expression's value is within the set of following provided expressions' values.
IRI: http://ns.inria.fr/ast/sql#InnerJoin
IRI: http://ns.inria.fr/ast/sql#InsertStatement
A SQL INSERT statement.
IRI: http://ns.inria.fr/ast/sql#Intersect
A SQL intersection operator used to return the common result rows of the combination of the result sets of two or more queries.
IRI: http://ns.inria.fr/ast/sql#IsNull
Unary predicate that compares the evaluated expression's value with NULL
IRI: http://ns.inria.fr/ast/sql#JoinPredicate
Binary predicate used join two or more tables.
IRI: http://ns.inria.fr/ast/sql#LeftOuterJoin
IRI: http://ns.inria.fr/ast/sql#LessThan
IRI: http://ns.inria.fr/ast/sql#LessEquals
IRI: http://ns.inria.fr/ast/sql#Like
Binary predicate that returns one of the TRUE or FALSE truth values depending on whether the string resulting in the evaluation of the first expression matches the lightweight regular expression described in the second expression.
IRI: http://ns.inria.fr/ast/sql#LockTableStatement
A SQL LOCK TABLE statement.
IRI: http://ns.inria.fr/ast/sql#And
IRI: http://ns.inria.fr/ast/sql#ObjectIdentifier
A database catalog object identifier, qualified or not.
IRI: http://ns.inria.fr/ast/sql#Or
IRI: http://ns.inria.fr/ast/sql#Not
A SQL logical negation that can be used to negate any condition.
IRI: http://ns.inria.fr/ast/sql#LogicalOperator
A SQL logical connective used to perform logical calculation.
IRI: http://ns.inria.fr/ast/sql#Max
The SQL aggregate function that returns the maximum value of the given expression.
IRI: http://ns.inria.fr/ast/sql#Min
The SQL aggregate function that returns the minimum value of the given expression.
IRI: http://ns.inria.fr/ast/sql#Mult
IRI: http://ns.inria.fr/ast/sql#Negate
A unary arithmetic operator used to produce the negative of its operand.
IRI: http://ns.inria.fr/ast/sql#Now
A SQL now function is a nullary function that returns the current timestamp.
IRI: http://ns.inria.fr/ast/sql#NumberFunction
A SQL number function that returns a numeric value and reads numeric values or string with numeric characters.
IRI: http://ns.inria.fr/ast/sql#Operator
A SQL operator used to perform arithmetic calculations, comparisons or value assignments.
IRI: http://ns.inria.fr/ast/sql#OuterJoin
IRI: http://ns.inria.fr/ast/sql#Predicate
Represents a collection of one or more expressions or subqueries, that may be combined with logical operators, and when evaluated returns one of the TRUE / FALSE / UNKNOWN truth values.
IRI: http://ns.inria.fr/ast/sql#ProceduralStatement
A SQL procedural statement.
IRI: http://ns.inria.fr/ast/sql#Between
Ternary predicate used to assert whether a provided expression's value is within the range specified by two other expressions' values.
IRI: http://ns.inria.fr/ast/sql#Index
A SQL reference or declarative statement for a catalog index.
IRI: http://ns.inria.fr/ast/sql#IndexIdentifier
A valid identifier for a catalog index.
IRI: http://ns.inria.fr/ast/sql#Procedure
A SQL reference or declarative statement for a catalog procedure.
IRI: http://ns.inria.fr/ast/sql#ProcedureIdentifier
A valid identifier for a catalog procedure.
IRI: http://ns.inria.fr/ast/sql#Role
A SQL reference or declarative statement for a catalog role.
IRI: http://ns.inria.fr/ast/sql#RoleIdentifier
A valid identifier for a catalog role.
IRI: http://ns.inria.fr/ast/sql#Schema
A SQL reference or declarative statement for a catalog schema.
IRI: http://ns.inria.fr/ast/sql#SchemaIdentifier
A valid identifier for a catalog schema.
IRI: http://ns.inria.fr/ast/sql#Sequence
A SQL reference or declarative statement for a catalog sequence.
IRI: http://ns.inria.fr/ast/sql#SequenceIdentifier
A valid identifier for a catalog sequence.
IRI: http://ns.inria.fr/ast/sql#Column
A SQL reference or declarative statement for a catalog column.
IRI: http://ns.inria.fr/ast/sql#ColumnIdentifier
A valid identifier for a catalog column.
IRI: http://ns.inria.fr/ast/sql#Table
A SQL reference or declarative statement for a catalog table.
IRI: http://ns.inria.fr/ast/sql#TableIdentifier
A valid identifier for a catalog table.
IRI: http://ns.inria.fr/ast/sql#User
A SQL reference or declarative statement for a catalog user.
IRI: http://ns.inria.fr/ast/sql#UserIdentifier
A valid identifier for a catalog user.
IRI: http://ns.inria.fr/ast/sql#View
A SQL reference or declarative statement for a catalog view.
IRI: http://ns.inria.fr/ast/sql#ViewIdentifier
A valid identifier for a catalog view.
IRI: http://ns.inria.fr/ast/sql#Revoke
A SQL REVOKE statement used to revoke a set of privileges or roles to specified users from specified database objects.
IRI: http://ns.inria.fr/ast/sql#RightOuterJoin
IRI: http://ns.inria.fr/ast/sql#RollbackStatement
A SQL ROLLBACK statement.
IRI: http://ns.inria.fr/ast/sql#ScalarFunction
A SQL scalar function returns a single value given the input values. The input value is of a SQL primitive data type.
IRI: http://ns.inria.fr/ast/sql#SelectStatement
A SQL SELECT statement.
IRI: http://ns.inria.fr/ast/sql#SessionManagementStatement
A SQL session management statement.
IRI: http://ns.inria.fr/ast/sql#SetOperator
A SQL set operator used to perform set operation on the results of two or more queries.
IRI: http://ns.inria.fr/ast/sql#SetStatement
Assigns a value to a session variable identifier.
IRI: http://ns.inria.fr/ast/sql#StdDev
The SQL aggregate function that returns the standard deviation (square root of the variance) of the given expression.
IRI: http://ns.inria.fr/ast/sql#Statement
A SQL statement.
IRI: http://ns.inria.fr/ast/sql#StringOperator
IRI: http://ns.inria.fr/ast/sql#Minus
IRI: http://ns.inria.fr/ast/sql#Sum
The SQL aggregate function that returns the sum of the given expression.
IRI: http://ns.inria.fr/ast/sql#TableFunction
A SQL function that returns a virtual table.
IRI: http://ns.inria.fr/ast/sql#TransactionManagementStatement
A SQL transaction management statement.
IRI: http://ns.inria.fr/ast/sql#Union
A SQL union operator used to return the combination of the result sets of two or more queries.
IRI: http://ns.inria.fr/ast/sql#UnionAll
A SQL union operator used to return the combination of the result sets of two or more queries, including duplicate result rows.
IRI: http://ns.inria.fr/ast/sql#UpdateStatement
A SQL UPDATE statement.
IRI: http://ns.inria.fr/ast/sql#Var
The SQL aggregate function that returns the variance of the given expression.
IRI: http://ns.inria.fr/ast/sql#args
The generic predicate used to attach an ordered list of children to any AST node.
IRI: http://ns.inria.fr/ast/sql#clause
A SQL clause is a part of a statement.
IRI: http://ns.inria.fr/ast/sql#from
The SQL clause that specifies the target sources (tables, views, subqueries) of a SELECT statement.
IRI: http://ns.inria.fr/ast/sql#groupBy
The SQL clause that specifies the columns whose values will be considered to cluster the result set of selected rows.
IRI: http://ns.inria.fr/ast/sql#having
The SQL clause that specifies the filtering predicates to apply on the clustered result set resulting of a GROUP BY clause.
IRI: http://ns.inria.fr/ast/sql#into
The SQL clause that specifies into which table the following values or select statement's result set are to be inserted into.
IRI: http://ns.inria.fr/ast/sql#limit
The SQL clause that limits the number of records from the result set that should be returned.
IRI: http://ns.inria.fr/ast/sql#orderBy
The SQL clause that specifies how to order the result set's records according to given expressions.
IRI: http://ns.inria.fr/ast/sql#select
The SQL clause that specifies the projected tables of a SELECT statement.
IRI: http://ns.inria.fr/ast/sql#values
The SQL clause that specifies a list of values or expressions evaluating to values.
IRI: http://ns.inria.fr/ast/sql#where
The SQL clause that specifies the filtering predicates to apply on the inputs of the FROM clause.
HTML layout adapted after processing of the RDFS vocabulary through LODE, Live OWL Documentation Environment.
A SQL access control statement.