XQuery: An XML Query Language


1 Introduction

As increasing amounts of information are stored, exchanged, and presented using XML, the ability to intelligently query XML data sources becomes increasingly important.

XQuery is derived from an XML query language called Quilt. XQuery is designed to meet the requirements identified by the W3C XML Query Working Group. It is designed to be a small, easily implementable language in which queries are concise and easily understood. It is also flexible enough to query a broad spectrum of XML information sources, including both databases and documents.

2 The XQuery Language

XQuery is a functional language in which a query is represented as an expression. XQuery supports several kinds of expressions, and the structure and appearance of a query may differ significantly depending on which kinds of expressions are used.

The principal forms of XQuery expressions are as follows:

  1. Path expressions
  2. Element constructors
  3. FLWR expressions
  4. Expressions involving operators and functions
  5. Conditional expressions
  6. Quantified expressions
  7. Expressions that test or modify datatypes

In XQuery, keywords (such as FOR and LET) are case-insensitive, whereas identifiers are case-sensitive. A query may contain a comment, which is ignored during query processing. The beginning delimiter of a comment is a pound symbol ("#") and the ending delimiter is a newline character, as illustrated below.

# This is a comment

2.1 Path Expressions

One of the forms of an XQuery expression is a path expression, based on the syntax of XPath. XPath is a notation for navigating along "paths" in an XML document.

A path expression can begin with an expression that identifies a specific node or sequence of nodes in a document. For example, the function document(string) returns the root node of a named document. A path expression can also begin with "/" or "//", which represents an implicit root node determined by the environment in which the path expression is executed. The execution environment of a path expression also defines a "context node," which can be referenced by a dot (".") inside the path expression. The result of a path expression is a sequence of nodes or primitive values. The nodes in a path expression result are in document order, that is, ordered according to their position in the original hierarchy.
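As a rough illustration of these ideas outside XQuery, the Python sketch below uses the standard library module xml.etree.ElementTree, whose findall method accepts a small subset of XPath. The sample document and the names SAMPLE, book_titles, and all_titles are invented for this example; the code only models the behavior described above and is not XQuery itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
SAMPLE = """
<bib>
  <book>
    <title>First</title>
    <chapter><title>Nested</title></chapter>
  </book>
  <book><title>Second</title></book>
</bib>
"""

# Stands in for the node that document("...") would return in the text.
root = ET.fromstring(SAMPLE)

# Roughly //book/title: <title> children of any <book> element.
book_titles = [t.text for t in root.findall(".//book/title")]

# Roughly //title: every <title> element at any depth.
all_titles = [t.text for t in root.iter("title")]

print(book_titles)
print(all_titles)
```

Both lists come back in document order, which matches the ordering rule stated above.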

2.2 Element Constructors

In addition to searching for elements in existing documents, a query often needs to generate new elements. The simplest way to generate a new element is to embed the element directly in a query using XML notation. This type of XQuery expression is called an element constructor. By adopting XML notation for element constructors, XQuery allows literal XML fragments to be "pasted" into queries.

Generate an <emp> element that has an "empid" attribute and nested <name> and <job> elements (the name and job values shown are placeholders).

<emp empid = "10"> <name>Jane Doe</name> <job>Editor</job> </emp>

Often the content of an element or the value of an attribute needs to be computed by some expression. An XQuery expression that is used inside an element constructor is enclosed in curly braces to indicate that the expression is to be evaluated rather than treated as text. In the following example, attribute values and element contents are specified in the form of variables named $id, $name, and $job.

Generate an <emp> element that has an "empid" attribute. The value of the attribute and the content of the element are specified by variables that are bound in other parts of the query.

<emp empid = {$id}> {$name} {$job} </emp>
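For comparison, the following Python sketch builds the same kind of element from computed values using xml.etree.ElementTree. The helper name make_emp and the sample values are assumptions made for this illustration; the sketch passes plain strings for the name and job, which is a simplification of binding variables elsewhere in a query.

```python
import xml.etree.ElementTree as ET

def make_emp(emp_id, name, job):
    """Build an <emp> element from computed values (hypothetical helper)."""
    emp = ET.Element("emp", {"empid": str(emp_id)})
    ET.SubElement(emp, "name").text = name
    ET.SubElement(emp, "job").text = job
    return emp

# Placeholder values, invented for the example.
emp_xml = ET.tostring(make_emp(10, "Jane Doe", "Editor"), encoding="unicode")
print(emp_xml)
```

The element constructor notation achieves the same result declaratively: the literal markup supplies the structure and the enclosed expressions supply the computed parts.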

2.3 FLWR Expressions

A FLWR (pronounced "flower") expression is constructed from FOR, LET, WHERE, and RETURN clauses, which must appear in a specific order. A FLWR expression binds values to one or more variables and then uses these variables to construct a result.

A FOR-clause is used whenever iteration is needed. The result of the FOR-clause is a sequence of tuples, each of which contains a binding for each of the variables in the FOR-clause. A LET-clause also binds one or more variables to one or more expressions. Unlike a FOR-clause, however, a LET-clause simply binds each variable to the value of its respective expression without iteration. A FLWR expression may contain several FOR-clauses and LET-clauses. The result of the FOR and LET clauses is an ordered sequence of tuples of bound variables. A FLWR expression that contains no FOR-clauses generates exactly one binding-tuple. The order of the tuples generated by the FOR and LET clauses is determined by the order in which values are returned by the FOR-clause expressions.

The binding-tuples generated by the FOR and LET clauses are subject to further filtering by an optional WHERE-clause. Only those tuples for which the condition in the WHERE-clause is true are used to invoke the RETURN-clause. The WHERE-clause may contain several predicates, connected by AND and OR.
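The notion of binding-tuples can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions: the function names binding_tuples and where are invented, each tuple is represented as a dictionary, and the first FOR variable is assumed to vary slowest, as in nested iteration.

```python
from itertools import product

def binding_tuples(xs, ys):
    """Model FOR $x IN xs, $y IN ys with LET $n := count(ys)."""
    n = len(ys)  # LET: bound once to the whole value, without iteration
    # FOR: one tuple per combination of values, first variable varying slowest
    return [{"x": x, "y": y, "n": n} for x, y in product(xs, ys)]

def where(tuples, predicate):
    """Model a WHERE-clause: keep only the tuples that satisfy the predicate."""
    return [t for t in tuples if predicate(t)]

tuples = binding_tuples([1, 2], ["a", "b"])
kept = where(tuples, lambda t: t["x"] > 1 and t["y"] == "a")
print(kept)
```

Only the tuples that survive the filter would be passed on to the RETURN-clause.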

Consider some examples of FLWR expressions based on a document named "booklist.xml" that contains a sequence of <book> elements. Each <book> element, in turn, contains a <title> element, one or more <author> elements, a <publisher> element, a <year> element, and a <price> element.

List the titles of books published by Pearson in 2006.

FOR $a IN document("booklist.xml")//book
WHERE $a/publisher = "Pearson"
AND $a/year = "2006"
RETURN $a/title 
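The same query can be approximated in Python with a list comprehension, which makes the roles of the FOR, WHERE, and RETURN clauses easy to see. The sample data in BOOKLIST and the function name titles are invented for this sketch; only the element names come from the description of "booklist.xml" above.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents for "booklist.xml", invented for this illustration.
BOOKLIST = """
<booklist>
  <book><title>Alpha</title><author>A. One</author>
        <publisher>Pearson</publisher><year>2006</year><price>40</price></book>
  <book><title>Beta</title><author>B. Two</author>
        <publisher>Pearson</publisher><year>2005</year><price>55</price></book>
  <book><title>Gamma</title><author>C. Three</author>
        <publisher>Other Press</publisher><year>2006</year><price>30</price></book>
  <book><title>Delta</title><author>D. Four</author>
        <publisher>Pearson</publisher><year>2006</year><price>120</price></book>
</booklist>
"""

def titles(xml_text, publisher, year):
    root = ET.fromstring(xml_text)
    return [
        book.findtext("title")                      # RETURN $a/title
        for book in root.iter("book")               # FOR $a IN ...//book
        if book.findtext("publisher") == publisher  # WHERE $a/publisher = ...
        and book.findtext("year") == year           # AND $a/year = ...
    ]

pearson_2006 = titles(BOOKLIST, "Pearson", "2006")
print(pearson_2006)
```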

2.4 Sorting

It is sometimes necessary to control the order of elements in a sequence. A sequence can be ordered by means of a SORTBY clause that contains one or more "ordering expressions." Each ordering expression can be followed by the word ASCENDING or DESCENDING, which specifies the direction of the sort (ASCENDING is the default).

List all books with price greater than $100, in order by first author; within each group of books with the same first author, list the books in order by title.

document("bib.xml")//book[price > 100] SORTBY (author[1], title)
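A multi-key sort of this kind corresponds to sorting on a tuple of keys. The Python sketch below is an approximation under stated assumptions: the sample data in BIB and the function name expensive_books_sorted are invented, and only ascending order (the default) is modeled.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents for "bib.xml", invented for this illustration.
BIB = """
<bib>
  <book><title>Zebra</title><author>Adams</author><price>150</price></book>
  <book><title>Apple</title><author>Baker</author><author>Adams</author><price>200</price></book>
  <book><title>Mango</title><author>Adams</author><price>120</price></book>
  <book><title>Cheap</title><author>Adams</author><price>20</price></book>
</bib>
"""

def expensive_books_sorted(xml_text, min_price=100):
    root = ET.fromstring(xml_text)
    # book[price > 100]
    selected = [b for b in root.iter("book")
                if float(b.findtext("price")) > min_price]
    # findtext("author") returns the first <author> child, i.e. author[1].
    return sorted(selected,
                  key=lambda b: (b.findtext("author"), b.findtext("title")))

sorted_titles = [b.findtext("title") for b in expensive_books_sorted(BIB)]
print(sorted_titles)
```

The first ordering expression is the major sort key and the second breaks ties within it.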

2.5 Operators in Expressions

Like most query languages, XQuery provides a variety of operators that can be used in expressions, and allows parenthesized expressions to serve as operands.

2.5.1 Arithmetic operators

Expr ("+" | "-") Expr
Expr ("*" | "div" | "mod") Expr
("-" | "+") Expr

XQuery provides the usual arithmetic operators for addition, subtraction, multiplication, division, and modulus, in the usual binary and unary forms and with the usual meanings. When one or more operands is a node, the content of the node is extracted by an implicit call to the data function and converted to a number before the operation is performed; if this conversion is not possible, an error results.
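The implicit conversion of node operands can be sketched in Python as follows. The helper name number_value is hypothetical, and using float with ValueError as the error is an assumption of this sketch, not a statement about how an XQuery processor represents numbers or errors.

```python
import xml.etree.ElementTree as ET

def number_value(operand):
    """Model the implicit data() call: extract node content, then convert."""
    if isinstance(operand, ET.Element):
        operand = "".join(operand.itertext())
    return float(operand)  # raises ValueError if conversion is not possible

price = ET.fromstring("<price>12.50</price>")
total = number_value(price) * 2 + number_value(1)

try:
    number_value(ET.fromstring("<price>unknown</price>"))
    conversion_failed = False
except ValueError:
    conversion_failed = True

print(total, conversion_failed)
```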

2.5.2 Comparison operators

Expr ("=" | "!=" | "==" | "!==") Expr
Expr ("<" | "<=" | ">" | ">=") Expr

XQuery supports several comparison operators, each of which takes two operands and returns a Boolean result.

2.5.3 Logical operators

Expr "or" Expr
Expr "and" Expr

The AND and OR operators of XQuery take two Boolean operands and return a Boolean result. Unlike many query languages, XQuery does not support a logical NOT operator. However, it does provide a function not() that takes a Boolean value as its argument and returns the logical negation of the argument.

2.5.4 Sequence-related Operators

"(" ExprSequence? ")"
Expr ("," Expr)*
Expr "to" Expr
Expr ("union" | "|") Expr
Expr ("intersect" | "except") Expr
Expr ("before" | "after") Expr

The basic XQuery operator for forming sequences is the comma operator (","). The comma operator can be applied to any two expressions to combine them into a sequence. Because the comma is also used to separate the arguments of a function call, parentheses may be needed when a sequence is used as the argument of a function, as illustrated below.

f(1, 2, 3)

denotes a function call with three scalar arguments.

f((1, 2), 3)

denotes a function call with two arguments, the first of which is a sequence of two values.

Another way to generate a sequence is by means of the TO operator. TO is a binary operator that converts both of its operands to integers. It then generates a sequence containing all the integers from the left-hand operand to the right-hand operand, inclusive. For example, the expression 12 TO 8 is equivalent to the expression 12, 11, 10, 9, 8.

The operators UNION, INTERSECT, and EXCEPT can be used to combine node sequences to form new node sequences. UNION (equivalent to "|") returns a sequence containing those nodes that are members of either the left-hand or the right-hand operand. INTERSECT returns a sequence containing those nodes that are members of both the left-hand and right-hand operands. EXCEPT returns a sequence containing those nodes that are members of the left-hand but not the right-hand operand. The result of UNION, INTERSECT, or EXCEPT contains no duplicate nodes.
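The sequence operators described above can be modeled in Python as a minimal sketch. The function names to_range, union, intersect, and except_ are invented for this illustration. Nodes are compared by identity rather than by value, and results are returned in order of first occurrence, which is an assumption of the sketch since this section does not state the order of the result.

```python
import xml.etree.ElementTree as ET

def to_range(left, right):
    """Model `left TO right` as described here: inclusive, counting down
    when the left-hand operand is the larger one."""
    left, right = int(left), int(right)
    step = 1 if left <= right else -1
    return list(range(left, right + step, step))

def union(a, b):
    """Nodes in either operand, with duplicate nodes removed."""
    result, seen = [], set()
    for node in list(a) + list(b):
        if id(node) not in seen:  # identity, not value, decides duplicates
            seen.add(id(node))
            result.append(node)
    return result

def intersect(a, b):
    """Nodes that are members of both operands."""
    in_b = {id(node) for node in b}
    return union([node for node in a if id(node) in in_b], [])

def except_(a, b):
    """Nodes that are members of the left operand but not the right."""
    in_b = {id(node) for node in b}
    return union([node for node in a if id(node) not in in_b], [])

root = ET.fromstring("<r><a/><b/><c/></r>")
a, b, c = list(root)
print(to_range(12, 8))
print([n.tag for n in union([a, b], [b, c])])
```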

2.6 Conditional Expressions

"if" "(" Expr ")" "then" Expr "else" Expr

A conditional expression evaluates a test expression and then returns one of two result expressions. If the value of the test expression is True, the value of the first result expression is returned; otherwise, the value of the second result expression is returned.

As an example of a conditional expression, the following query makes a list of holdings, ordered by title. For journals, it includes the editor, and for all other holdings, it includes the author.

FOR $h IN //holding
RETURN
   <holding>
      {$h/title,
       IF ($h/@type = "Journal")
       THEN $h/editor
       ELSE $h/author}
   </holding> SORTBY (title)
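The logic of this query can be sketched in Python with a conditional expression inside a loop. The sample data in HOLDINGS and the function name list_holdings are invented for this illustration, and each result is represented as a simple tuple rather than a constructed element.

```python
import xml.etree.ElementTree as ET

# Hypothetical holdings data, invented for this illustration.
HOLDINGS = """
<holdings>
  <holding type="Journal"><title>Quarterly Review</title><editor>E. Editor</editor></holding>
  <holding type="Book"><title>A Handbook</title><author>A. Author</author></holding>
</holdings>
"""

def list_holdings(xml_text):
    root = ET.fromstring(xml_text)
    rows = []
    for h in root.iter("holding"):
        # IF ($h/@type = "Journal") THEN $h/editor ELSE $h/author
        tag = "editor" if h.get("type") == "Journal" else "author"
        rows.append((h.findtext("title", default=""), tag, h.findtext(tag)))
    return sorted(rows, key=lambda row: row[0])  # SORTBY (title)

holdings = list_holdings(HOLDINGS)
print(holdings)
```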

2.7 Quantified Expressions

Occasionally it is necessary to test for the existence of some element that satisfies a condition, or to determine whether all elements in some collection satisfy a condition. For this purpose, XQuery provides two forms of expression called the "some" expression and the "every" expression. These forms of expression are also known as quantified expressions. The "some" expression uses an existential quantifier, and the "every" expression uses a universal quantifier.

"some" Variable "in" Expr "satisfies" Expr
"every" Variable "in" Expr "satisfies" Expr

The value of a "some" expression is always True or False. The following query finds the titles of books in which both sailing and windsurfing are mentioned in the same paragraph, assuming that paragraphs are marked up as <para> elements.

FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES
   (contains($p, "sailing") AND contains($p, "windsurfing"))
RETURN $b/title
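Quantified expressions correspond closely to Python's built-in any and all functions. In the sketch below, the sample data in BOOKS, the element name para, and the helper names are all assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; the <para> element name is an assumption.
BOOKS = """
<books>
  <book><title>Water Sports</title>
    <para>Both sailing and windsurfing need wind.</para></book>
  <book><title>Separate Topics</title>
    <para>All about sailing.</para><para>All about windsurfing.</para></book>
</books>
"""

def para_text(p):
    return "".join(p.itertext())

def some_para_mentions(book, words):
    """SOME $p IN $b//para SATISFIES (every word occurs in $p)."""
    return any(all(w in para_text(p) for w in words)
               for p in book.iter("para"))

def every_para_mentions(book, word):
    """EVERY $p IN $b//para SATISFIES contains($p, word)."""
    return all(word in para_text(p) for p in book.iter("para"))

root = ET.fromstring(BOOKS)
first_book, second_book = list(root.iter("book"))
matching = [b.findtext("title") for b in root.iter("book")
            if some_para_mentions(b, ["sailing", "windsurfing"])]
print(matching)
```

The second book mentions both words, but never in the same paragraph, so only the first title is returned.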

2.8 Datatypes

XQuery has a type system that is based on XML Schema. By using the datatype names defined in the namespace http://www.w3.org/2001/XMLSchema (hereafter abbreviated as xsd), all the primitive and derived datatypes of XML Schema can be used in queries.

In XQuery, type names appear in function declarations, where they specify the types of the function parameters and result. Certain XML Schema datatypes have literal forms that are recognized by XQuery, as illustrated by the following example:

Type           Example of literal
xsd:integer    47, -369

2.9 Functions

XQuery provides a core library of built-in functions. The XQuery core function library contains aggregation functions such as avg, sum, count, max, and min, and many other useful functions. For example, the distinct function eliminates duplicate nodes from a sequence, and the empty function returns True if and only if its argument is an empty sequence.

In addition to the core functions, XQuery allows users to define functions of their own. A function definition specifies the name of the function, the names and datatypes of the parameters, and the datatype of the result. A function definition also provides an expression (called the "function body") that defines how the result of the function is computed from its parameters.

"define" "function" QName "(" ParamList? ")" ("returns" Datatype)? EnclosedExpr
Param ("," Param)*
Datatype? Variable
QName "(" (Expr ("," Expr)*)? ")"

If a function parameter is declared using a name but no type, it is considered to have the default type "any node." If the RETURNS clause is omitted from a function definition, the result-type of the function is considered to be "any sequence of nodes."

XQuery Version 1 does not allow user-defined functions to be overloaded. However, some of the built-in functions in the XQuery core library are overloaded. For example, the string function of XPath can convert an instance of almost any type into a string, and it can be invoked with either one argument or zero arguments.

A function may be defined recursively--that is, it may reference its own definition. The next query contains an example of a recursive function that computes the depth of an element hierarchy. In its definition, the user-defined function depth calls the built-in functions empty and max.

Find the maximum depth of the document named "partlist.xml".

NAMESPACE xsd = "http://www.w3.org/2001/XMLSchema"
DEFINE FUNCTION depth($e) RETURNS xsd:integer
{
   # An empty element has depth 1
   # Otherwise, add 1 to max depth of children
   IF (empty($e/*)) THEN 1
   ELSE max(depth($e/*)) + 1
}

depth(document("partlist.xml"))
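The same recursion can be written in Python as a minimal sketch. The sample string PARTLIST is invented for this illustration and stands in for the contents of "partlist.xml".

```python
import xml.etree.ElementTree as ET

def depth(elem):
    """Depth of an element hierarchy: 1 for a leaf, else 1 + deepest child."""
    children = list(elem)
    if not children:  # IF (empty($e/*)) THEN 1
        return 1
    return max(depth(child) for child in children) + 1  # ELSE max(...) + 1

# Hypothetical stand-in for the contents of "partlist.xml".
PARTLIST = "<partlist><part><part><part/></part></part><part/></partlist>"
max_depth = depth(ET.fromstring(PARTLIST))
print(max_depth)
```

The recursion terminates because every call descends one level and a leaf element returns without calling itself.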


3 Conclusion

With the emergence of XML, the distinctions among various forms of information, such as documents and databases, are quickly disappearing. XQuery is designed to support queries against a broad spectrum of information sources. The versatility of XQuery will help XML to realize its potential as a universal medium for data interchange.

