By Slatian

XPath is a way to select tags from an XML document. It works a bit like a file path and a bit like a CSS selector.

Example Document

The examples on this page assume the following XML document.

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
	<title>An Overview of Dataformats.</title>
	<use>Describes Dataformats</use>
	<item>
		<name>JSON</name>
		<use>Data Serialization</use>
		<link type="website" href="https://www.json.org" />
		<link type="wikipedia" href="https://en.wikipedia.org/wiki/JSON" />
	</item>
	<item>
		<name>XML</name>
		<use>Data Serialization</use>
		<link type="website" href="https://www.xml.com/" />
		<link type="wikipedia" href="https://en.wikipedia.org/wiki/XML" />
	</item>
</catalog>

Querying Nodes

XPath works like a file-path.

Example: To address the title tag (a node in XPath terminology):

/catalog/title

Note: If you are inside a context (e.g. one of the item elements) you can also use it like a relative file path: use, ./name, ../title

For wildcards on a single level one can use a * for the tag name.

Example: Same as previous but with a wildcard:

/*/title

By using // one can select all elements matching the following description no matter where in the document they are.

Example: Select all three use tags:

//use

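The three query styles above can be tried out with Python's standard library. This is only a minimal sketch: the document is a shortened stand-in for the example document (the name values are assumptions), and xml.etree.ElementTree understands just a subset of XPath and evaluates paths relative to an element, so /catalog/title becomes ./title on the root element.

```python
import xml.etree.ElementTree as ET

# Assumed, shortened stand-in for the example document.
doc = """<catalog>
  <title>An Overview of Dataformats.</title>
  <use>Describes Dataformats</use>
  <item><name>JSON</name><use>Data Serialization</use></item>
  <item><name>XML</name><use>Data Serialization</use></item>
</catalog>"""

root = ET.fromstring(doc)  # root is the <catalog> element

# /catalog/title, written relative to the root element
title = root.find("./title")

# /catalog/*/use: a wildcard for the level between catalog and use
nested_uses = root.findall("./*/use")

# //use: every use element, no matter how deep it is nested
all_uses = root.findall(".//use")

print(title.text, len(nested_uses), len(all_uses))
```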

Note: The // also works in the middle of a path, e.g. /catalog//use.

Querying Attributes

To get the attribute of a tag use the notation /path/to/tag/@attribute.

Example: To get the href attributes:

//link/@href

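For comparison, a small Python sketch of the same idea. Note the assumption: xml.etree.ElementTree cannot end a path in @attribute, so the sketch selects the link elements and reads the attribute from each one. The document is a shortened stand-in for the example document.

```python
import xml.etree.ElementTree as ET

# Assumed, shortened stand-in for the example document.
doc = """<catalog>
  <item>
    <link type="website" href="https://www.json.org" />
    <link type="wikipedia" href="https://en.wikipedia.org/wiki/JSON" />
  </item>
</catalog>"""

root = ET.fromstring(doc)

# Select the elements (//link), then read the href attribute of each.
hrefs = [link.get("href") for link in root.findall(".//link")]

print(hrefs)
```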

Dealing with Multiple Elements using Predicates

An XPath may yield multiple results when one does not actually want all the elements that match a tag.

Writing a number in square brackets works similarly to an array access in a programming language: tag[1]. Note that counting starts at 1, not 0.

Example: Select the second item from the catalog:

/catalog/item[2]

In place of the number one can also use expressions like last(), last()-1 or position()>1.

One can also compare against attributes here.

Example: Only select the type="wikipedia" links:

//link[@type='wikipedia']

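The predicates shown so far also work in Python's xml.etree.ElementTree, which makes them easy to experiment with. A minimal sketch, assuming a shortened stand-in for the example document (the name values are assumptions) and paths written relative to the root element:

```python
import xml.etree.ElementTree as ET

# Assumed, shortened stand-in for the example document.
doc = """<catalog>
  <item>
    <name>JSON</name>
    <link type="website" href="https://www.json.org" />
    <link type="wikipedia" href="https://en.wikipedia.org/wiki/JSON" />
  </item>
  <item>
    <name>XML</name>
    <link type="website" href="https://www.xml.com/" />
    <link type="wikipedia" href="https://en.wikipedia.org/wiki/XML" />
  </item>
</catalog>"""

root = ET.fromstring(doc)

# Positions start at 1, so item[2] is the second item.
second = root.find("./item[2]")

# With two items, last() points at the same element.
last = root.find("./item[last()]")

# Compare against an attribute value.
wiki_links = root.findall(".//link[@type='wikipedia']")
wiki_hrefs = [link.get("href") for link in wiki_links]

print(second.find("name").text, second is last, wiki_hrefs)
```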

Note on Quoting: XPath accepts both single and double quotes for strings. Using single quotes makes it easy to put an XPath into a double-quoted XML attribute.

Boolean Logic: Boolean logic is implemented with the and and or keywords and the not() function.

There are way more possibilities; the w3schools tutorial has a more complete list of predicates.

Getting Node Text

To get the text inside a node, append /text() to its path.

Example: Get the text from the first item's name:

/catalog/item[1]/name/text()

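In Python's xml.etree.ElementTree there is no text() step; as a rough equivalent one selects the element and reads its .text attribute. A minimal sketch, assuming a shortened stand-in for the example document with assumed name values:

```python
import xml.etree.ElementTree as ET

# Assumed, shortened stand-in for the example document.
doc = """<catalog>
  <item><name>JSON</name></item>
  <item><name>XML</name></item>
</catalog>"""

root = ET.fromstring(doc)

# Select the name element of the first item, then read its text.
first_name = root.find("./item[1]/name").text

print(first_name)
```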

Combine multiple XPaths

Sometimes the result of one XPath isn't enough and you want the combined results of multiple XPaths. (Like a SQL union)

To achieve this you join multiple XPaths using a pipe | character like this: /xpath1 | /xpath2
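Python's xml.etree.ElementTree does not understand the | operator, so the following sketch only emulates the idea: it runs several paths and merges the results while skipping duplicates. The union helper is hypothetical, the document is an assumed stand-in, and unlike a real XPath union the result is not sorted into document order.

```python
import xml.etree.ElementTree as ET

# Assumed, shortened stand-in for the example document.
doc = """<catalog>
  <title>An Overview of Dataformats.</title>
  <item><name>JSON</name></item>
  <item><name>XML</name></item>
</catalog>"""

root = ET.fromstring(doc)


def union(element, *paths):
    """Hypothetical helper: merge the results of several paths, skipping duplicates."""
    seen = set()
    merged = []
    for path in paths:
        for node in element.findall(path):
            if id(node) not in seen:
                seen.add(id(node))
                merged.append(node)
    return merged


# Rough equivalent of: /catalog/title | //name | /catalog/item[1]/name
nodes = union(root, "./title", ".//name", "./item[1]/name")
texts = [node.text for node in nodes]

print(texts)
```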

XPath with Namespaces

Some of your nodes may be part of a namespace, declared either with the <namespace:tag> prefix syntax or with an xmlns attribute like <tag xmlns="https://example.org">.

Note: When using the xmlns attribute all children of that node are also in that namespace unless declared otherwise.

When a namespace is used you'll notice that your usual queries don't work for some strange reason.

A snippet of an Atom feed serves as the example document here:
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
    <!-- Rest of feed -->

To get the title one may want to try the XPath /feed/title, but this will fail because of the namespace.

If it is possible to declare the namespace (e.g. in an XSLT stylesheet) then do that and select the element with the namespace prefix, e.g. /atom:feed/atom:title.

If it is not possible to declare the Namespace you can work around that by using the local-name() function like this: *[local-name()='feed']/*[local-name()='title']
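Both approaches have a counterpart in Python's xml.etree.ElementTree, sketched below. Assumptions: the feed is a made-up two-line stand-in with a placeholder title, the prefix name atom is arbitrary, and the {*} wildcard (Python 3.8 or newer) is only similar in spirit to local-name(), not the same mechanism.

```python
import xml.etree.ElementTree as ET

# Assumed feed; the title text is a placeholder.
feed = """<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <title>Example Feed</title>
</feed>"""

root = ET.fromstring(feed)  # root is the <feed> element

# The plain query finds nothing, because title is in the Atom namespace.
plain = root.find("title")

# Declaring a prefix for the namespace makes the query work.
ns = {"atom": "http://www.w3.org/2005/Atom"}
declared = root.find("atom:title", ns)

# {*} matches the tag name in any namespace (Python 3.8+).
wildcard = root.find("{*}title")

print(plain, declared.text, wildcard.text)
```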


Playing with XPath

Using xmllint

The xmllint command line utility can evaluate XPath expressions.

xmllint <file.xml> --xpath "<xpath>"

Note: xmllint and XPath are the jq of XML. If they seem a bit clunky, that is because they have been around for a freaking long time.

Getting information from HTML

Example: Get the title of the HTML Document.

curl https://slatecave.net | \
xmllint - --html --xpath "/html/head/title/text()" 2>&-

Example: Get the social media preview description.

curl https://slatecave.net | \
xmllint - --html --xpath \
	"string(/html/head/meta[@property='og:description']/@content)" 2>&-

Explanation: Both examples use - to tell xmllint to read from its standard input (that is the curl output here), enable the HTML parser with the --html option and discard any errors from xmllint using 2>&-, which tells the shell to close the standard error.