XML XPath
XPath is a way to select tags from an XML document; it works a bit like a file path and a bit like a CSS selector.
Example Document
The examples on this page assume the following XML document.
An Overview of Dataformats.
Describes Dataformats
JSON
Data Serialization
XML
Documents
Querying Nodes
XPath works like a file path.
Example: To address the title tag (a node in XPath terminology):
/catalog/title
Note: If you are inside a context (e.g. one of the item elements), you can also use relative paths, just like relative file paths: use (the item's use child), ./name or ../title.
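The following is a minimal sketch of absolute and relative paths. It uses Java's built-in javax.xml.xpath engine, which implements XPath 1.0. The markup of the example document is not shown here, so the class embeds a hypothetical, trimmed-down catalog reconstructed from the text. The class name XPathPaths is also made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class XPathPaths {
    // Hypothetical, trimmed-down stand-in for the example document.
    static final String XML =
        "<catalog><title>An Overview of Dataformats.</title>"
        + "<item><name>JSON</name><use>Data Serialization</use></item>"
        + "<item><name>XML</name><use>Documents</use></item></catalog>";

    static List<String> run() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xp = XPathFactory.newInstance().newXPath();
            List<String> out = new ArrayList<>();
            // Absolute path, resolved from the document root.
            out.add(xp.evaluate("/catalog/title", doc));
            // NODE keeps only the first match in document order: the JSON item.
            Node item = (Node) xp.evaluate("/catalog/item", doc, XPathConstants.NODE);
            // Relative paths, resolved against that item as the context node.
            out.add(xp.evaluate("use", item));
            out.add(xp.evaluate("./name", item));
            out.add(xp.evaluate("../title", item));
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```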
For a wildcard on a single level, use * in place of the tag name.
Example: The same as the previous example, but with a wildcard:
/*/title
By using //, you select all elements matching the rest of the path, no matter where in the document they are.
Example: Select all three use tags:
//use
Note: Using // in the middle of a path, like /catalog//use, will also work.
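The wildcard and // forms can be tried the same way. This sketch again uses Java's javax.xml.xpath and a hypothetical version of the example document. That version assumes three use elements, one of them directly under catalog. The helper returns the text of every selected node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathWildcards {
    // Hypothetical stand-in for the example document (markup assumed).
    static final String XML =
        "<catalog><title>An Overview of Dataformats.</title><use>Describes Dataformats</use>"
        + "<item><name>JSON</name><use>Data Serialization</use></item>"
        + "<item><name>XML</name><use>Documents</use></item></catalog>";

    // Evaluates expr against the document and returns each selected node's text.
    static List<String> select(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("/*/title"));      // * matches any root element name
        System.out.println(select("//use"));         // every use element, at any depth
        System.out.println(select("/catalog//use")); // every use element below /catalog
    }
}
```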
Querying Attributes
To get an attribute of a tag, use the notation /path/to/tag/@attribute.
Example: To get the href attributes:
//link/@href
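Selecting an attribute returns attribute nodes rather than elements. This sketch uses Java's javax.xml.xpath. The link elements and their href values are invented for illustration (example.org URLs). Each result is printed as name=value to show that the attribute node still carries its name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathAttributes {
    // Hypothetical stand-in for the example document; the href values are made up.
    static final String XML =
        "<catalog><title>An Overview of Dataformats.</title>"
        + "<item><name>JSON</name>"
        + "<link type='wikipedia' href='https://example.org/wiki/JSON'/></item>"
        + "<item><name>XML</name>"
        + "<link type='wikipedia' href='https://example.org/wiki/XML'/>"
        + "<link type='spec' href='https://example.org/spec/xml'/></item></catalog>";

    // Returns "name=value" for every attribute node the expression selects.
    static List<String> select(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node attr = nodes.item(i);
                out.add(attr.getNodeName() + "=" + attr.getNodeValue());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        select("//link/@href").forEach(System.out::println);
    }
}
```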
Dealing with Multiple Elements using Predicates
When an XPath yields multiple results, you may not want all of the elements that match.
Writing a number in square brackets works similar to an array access in a programming language, as in tag[1]. The difference is that XPath starts counting at 1, so tag[1] is the first match.
Example: Select the second item from the catalog:
/catalog/item[2]
In place of the number, you can also use expressions like last(), last()-1 or position()>1.
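Here is a sketch of numeric predicates with Java's javax.xml.xpath. The catalog is hypothetical and has a third, invented item (CSV) so that last(), last()-1 and position()>1 select different things. For each selected item, the helper evaluates the relative path name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathPositions {
    // Hypothetical catalog; the third item (CSV) is invented so the predicates differ.
    static final String XML = "<catalog>"
        + "<item><name>JSON</name></item>"
        + "<item><name>XML</name></item>"
        + "<item><name>CSV</name></item></catalog>";

    // Selects items with expr, then returns the name child of each selected item.
    static List<String> names(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList items = (NodeList) xp.evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                out.add(xp.evaluate("name", items.item(i)));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(names("/catalog/item[1]"));            // [JSON]: counting starts at 1
        System.out.println(names("/catalog/item[2]"));            // [XML]
        System.out.println(names("/catalog/item[last()]"));       // [CSV]
        System.out.println(names("/catalog/item[last()-1]"));     // [XML]
        System.out.println(names("/catalog/item[position()>1]")); // [XML, CSV]
    }
}
```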
One can also compare against attributes here.
Example: Only select the links with type="wikipedia":
//link[@type='wikipedia']
Note on quoting: XPath accepts both single and double quotes for string literals. Using single quotes makes it easy to embed an XPath in a double-quoted XML attribute.
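This sketch compares against an attribute using Java's javax.xml.xpath and a hypothetical catalog whose link type and href values are made up. Both quoting styles are shown, and they select the same nodes. In the Java source the double quotes simply have to be escaped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathAttributeFilter {
    // Hypothetical links; type and href values are invented for illustration.
    static final String XML = "<catalog>"
        + "<item><name>JSON</name>"
        + "<link type='wikipedia' href='https://example.org/wiki/JSON'/></item>"
        + "<item><name>XML</name>"
        + "<link type='wikipedia' href='https://example.org/wiki/XML'/>"
        + "<link type='spec' href='https://example.org/spec/xml'/></item></catalog>";

    // Returns the value of every node the expression selects.
    static List<String> select(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getNodeValue());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Single-quoted literal, convenient inside double-quoted XML attributes.
        System.out.println(select("//link[@type='wikipedia']/@href"));
        // Double-quoted literal, equally valid XPath.
        System.out.println(select("//link[@type=\"wikipedia\"]/@href"));
    }
}
```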
Boolean logic: Conditions can be combined with the keywords and and or, and negated with the not() function.
There are many more possibilities; the w3schools tutorial has a more complete list of predicates.
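This sketch combines predicates with and, or and not(), using Java's javax.xml.xpath and the same kind of made-up link elements. contains() is a standard XPath 1.0 string function. It is used here only to provide a second condition.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathBoolean {
    // Hypothetical links; type and href values are invented for illustration.
    static final String XML = "<catalog>"
        + "<item><name>JSON</name>"
        + "<link type='wikipedia' href='https://example.org/wiki/JSON'/></item>"
        + "<item><name>XML</name>"
        + "<link type='wikipedia' href='https://example.org/wiki/XML'/>"
        + "<link type='spec' href='https://example.org/spec/xml'/></item></catalog>";

    // Returns the value of every node the expression selects.
    static List<String> select(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getNodeValue());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // or: either condition may hold (all three links here).
        System.out.println(select("//link[@type='spec' or @type='wikipedia']/@href"));
        // and: both conditions must hold.
        System.out.println(select("//link[@type='wikipedia' and contains(@href, 'XML')]/@href"));
        // not(): negate a condition.
        System.out.println(select("//link[not(@type='wikipedia')]/@href"));
    }
}
```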
Getting Node Text
To get the text inside a node, append /text() to its path.
Example: Get the text of the first item's name:
//item[1]/name/text()
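text() selects the text node itself, not the element around it. This sketch uses Java's javax.xml.xpath with a minimal hypothetical catalog and checks the type of the returned node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathText {
    // Hypothetical, minimal catalog.
    static final String XML =
        "<catalog><item><name>JSON</name></item><item><name>XML</name></item></catalog>";

    static NodeList select(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node text = select("//item[1]/name/text()").item(0);
        // The result is a text node, not the name element.
        System.out.println(text.getNodeType() == Node.TEXT_NODE);
        System.out.println(text.getNodeValue());
    }
}
```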
Getting Attribute Text
To get the text from an attribute, prefix the attribute name with an @.
Example: Get all URLs from the link elements of the example document:
//link/@href
Note: In some contexts this selects the attribute as a key-value pair. To get only the value, wrap it like this: concat('',//link/@href) (only keeps the first match) or string-join(//link/@href,' ') (keeps all matches, but requires XPath 2.0 support).
Example: Using xquilla to output all link URLs from the example document, with newlines as delimiters. The printf is used to convert the \n into a real newline.
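Java's built-in javax.xml.xpath engine only implements XPath 1.0, so string-join() is not available there. This sketch shows the concat('', ...) behaviour, where only the first match survives the conversion to a string. It then joins all matches with newlines in Java instead. The href values are invented.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathJoin {
    // Hypothetical links; the href values are made up.
    static final String XML = "<catalog>"
        + "<item><link href='https://example.org/wiki/JSON'/></item>"
        + "<item><link href='https://example.org/wiki/XML'/>"
        + "<link href='https://example.org/spec/xml'/></item></catalog>";

    static List<String> run() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xp = XPathFactory.newInstance().newXPath();
            List<String> out = new ArrayList<>();
            // XPath 1.0 turns a node-set into a string using only its first node.
            out.add(xp.evaluate("concat('', //link/@href)", doc));
            // No string-join() in XPath 1.0: fetch all nodes and join them in Java.
            NodeList hrefs = (NodeList) xp.evaluate("//link/@href", doc, XPathConstants.NODESET);
            StringJoiner joined = new StringJoiner("\n");
            for (int i = 0; i < hrefs.getLength(); i++) {
                joined.add(hrefs.item(i).getNodeValue());
            }
            out.add(joined.toString());
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```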
Combine multiple XPaths
Sometimes the result of one XPath isn't enough and you want the combined results of multiple XPaths (like a SQL UNION).
To achieve this, join multiple XPaths with the pipe character |, like this: /xpath1 | /xpath2
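Here is a small union sketch using Java's javax.xml.xpath against a hypothetical catalog. The result is a single node-set, so the title and both name elements come back together.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathUnion {
    // Hypothetical, minimal catalog.
    static final String XML =
        "<catalog><title>An Overview of Dataformats.</title>"
        + "<item><name>JSON</name></item><item><name>XML</name></item></catalog>";

    // Returns the text of every node selected by the (possibly combined) expression.
    static List<String> select(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two paths joined with | yield one combined node-set.
        System.out.println(select("/catalog/title | //name"));
    }
}
```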
XPath with Namespaces
Some of your nodes may be part of a namespace, e.g. through the <namespace:tag> syntax or the <tag xmlns="https://example.org"> attribute.
Note: When using the xmlns attribute, all child elements of that node are also in that namespace unless declared otherwise.
When a namespace is in use, you'll notice that your usual queries suddenly stop matching: in XPath 1.0, a name without a prefix only matches elements that are in no namespace.
Example: An Atom feed:
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>slatecave.net</title>
  <!-- Rest of feed -->
</feed>
To get the title, one might try the XPath /feed/title, but this will fail because of the namespace.
If it is possible to declare the namespace (e.g. in an XSLT stylesheet), do that and select the element with the namespace prefix, e.g. /atom:feed/atom:title.
If it is not possible to declare the namespace, you can work around that by using the local-name() function like this: *[local-name()='feed']/*[local-name()='title']
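This sketch shows both approaches with Java's javax.xml.xpath. It assumes a minimal Atom feed in the http://www.w3.org/2005/Atom namespace with slatecave.net as its title. The prefix atom is chosen freely and mapped through a NamespaceContext. DocumentBuilderFactory is not namespace-aware by default, so namespace awareness has to be switched on explicitly.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class XPathNamespaces {
    static final String ATOM = "http://www.w3.org/2005/Atom";
    // Minimal hypothetical feed; a real feed has more children.
    static final String XML = "<feed xmlns='" + ATOM + "'><title>slatecave.net</title></feed>";

    static String[] run() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xp = XPathFactory.newInstance().newXPath();
            // Unprefixed names only match elements in no namespace, so this is "".
            String plain = xp.evaluate("/feed/title", doc);
            // Workaround that needs no namespace declaration at all.
            String local = xp.evaluate("*[local-name()='feed']/*[local-name()='title']", doc);
            // Declare the prefix "atom" for the XPath engine.
            xp.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "atom".equals(prefix) ? ATOM : XMLConstants.NULL_NS_URI;
                }
                @Override
                public String getPrefix(String uri) {
                    return ATOM.equals(uri) ? "atom" : null;
                }
                @Override
                public Iterator<String> getPrefixes(String uri) {
                    return ATOM.equals(uri)
                        ? Collections.singletonList("atom").iterator()
                        : Collections.<String>emptyIterator();
                }
            });
            String prefixed = xp.evaluate("/atom:feed/atom:title", doc);
            return new String[] {plain, local, prefixed};
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String s : run()) System.out.println("[" + s + "]");
    }
}
```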
Useful Links
All of the following lead to the w3schools page.
Note: The related RFC 9535 (JSONPath: Query Expressions for JSON) might also be of interest.
Playing with XPath
Using xmllint
The xmllint command-line utility can evaluate XPath expressions with its --xpath option.
Note: xmllint and XPath are the jq of XML; if they seem a bit clunky, that is because they have been around for a freaking long time.
Compatibility: xmllint only supports XPath 1.0; if you get an error stating that a function is unregistered, that's why. This mostly affects functions that somehow involve lists.
Using xquilla
The xquilla command-line tool supports XPath 2.0 and a whole lot of other XML functionality. It usually reads its query from a file and applies it to an XML document.
Getting information from HTML
Example: Get the title of the HTML document.
Example: Get the social media preview description.
Explanation: Both examples use - to tell xmllint to read from its standard input (here, the output of curl), enable the HTML parser with the --html option, and discard any errors from xmllint using 2>&-, which tells the shell to close standard error.