RDF and IRI Syntax
RDF uses IRIs to refer to resources.
This page pulls together the syntax requirements for IRIs into one place. There are several documents, mostly RFCs, that define IRIs and specific IRI schemes. Getting IRIs right makes published data more likely to be readable by other systems.
Passing all the syntax rules does not, on its own, make an IRI a good choice.
URI schemes can add constraints on the URI syntax; for example, RFC 7230 defines the http and https schemes.
This article introduces the term ‘RDF Reference’ to bring all these requirements together in one definition.
This article will use URI and IRI interchangeably. “IRIs are a generalization of URIs that permits a wider range of Unicode characters.”
The RDF Concepts document says:
“IRIs in the RDF abstract syntax MUST be absolute, and MAY contain a fragment identifier.”
“Relative IRIs must be resolved against a base IRI to make them absolute.”
As of RFC 3986, relative IRIs are called “relative references”.
Definition of URI syntax
RFC 3986 section 4.1 defines “URI”.
A “URI” is a URI-reference after it has been resolved.
The relevant part of the grammar in RFC3986 appendix-A is:
absolute-URI = scheme ":" hier-part [ "?" query ]
URI          = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
hier-part    = "//" authority path-abempty
             / path-absolute
             / path-rootless
             / path-empty
This already brings out one important point: the absolute URI. An absolute URI has a URI scheme and does not have a fragment.
An absolute URI with a fragment is simply a “URI”.
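A rough way to see the difference between these rules is to split a string with the generic regular expression from RFC 3986 Appendix B and look at which components are present. This is a sketch only: `classify` is a hypothetical helper, and the regular expression separates components without checking which characters are legal.

```python
import re

# Component splitter from RFC 3986, Appendix B. It matches any string and
# only separates the parts; it does not check which characters are legal.
URI_SPLIT = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def classify(ref: str) -> str:
    """Hypothetical helper: name the RFC 3986 rule a reference falls under."""
    m = URI_SPLIT.match(ref)
    scheme, fragment = m.group(2), m.group(9)
    if scheme is None:
        return "relative-ref"   # must be resolved against a base first
    if fragment is None:
        return "absolute-URI"   # has a scheme, no fragment
    return "URI"                # has a scheme and a fragment


for s in ["http://example/book", "http://example/book#ch1",
          "book#ch1", "urn:isbn:0451450523"]:
    print(s, "->", classify(s))
```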
What we want is the full URI rule, together with the requirement that the “//” authority form is used if the scheme involves that component.
Specific URI schemes can add further requirements.
HTTP (RFC 7230) requires an http URI to have the “//” authority part.
URN (RFC 8141) does not have an authority part; it uses the path-rootless production. It additionally requires the form urn:NID:NSS (two colons), where the NID part must be at least two characters and the NSS part at least one character.
RDF syntaxes may use relative references. When a document is parsed, any relative reference is converted to a URI so that it identifies the same resource everywhere. This is called resolving against a base URI; there is always a base.
Relative references are shortcuts for the full URI obtained by resolving against the base.
By the time they reach the RDF abstract syntax (the data structure), relative references should have been converted.
By the definitions in RFC 3986:
are absolute URIs: each has a URI scheme and does not have a fragment.
With non-strict resolution and a valid HTTP base URI, the examples won’t appear in the RDF abstract syntax.
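To make strict and non-strict resolution concrete, here is a compact Python transcription of the reference resolution algorithm in RFC 3986 section 5.2 (merge, remove_dot_segments and recomposition). It is a sketch under stated assumptions: inputs are taken to be already valid, schemes are compared case-sensitively as in the RFC pseudocode, and no normalization is done. The `strict` flag selects the RFC's optional backwards-compatible behaviour, where a reference with the same scheme as the base is treated as relative.

```python
import re
from typing import NamedTuple, Optional

# Component splitter from RFC 3986, Appendix B.
URI_SPLIT = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


class Parts(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]


def split(ref: str) -> Parts:
    m = URI_SPLIT.match(ref)
    return Parts(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))


def remove_dot_segments(path: str) -> str:
    """RFC 3986 section 5.2.4: process '.' and '..' segments."""
    inp, out = path, ""
    while inp:
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        elif inp.startswith("/./"):
            inp = "/" + inp[3:]
        elif inp == "/.":
            inp = "/"
        elif inp.startswith("/../") or inp == "/..":
            inp = "/" + inp[4:]
            out = out[: max(out.rfind("/"), 0)]  # drop the last output segment
        elif inp in (".", ".."):
            inp = ""
        else:
            # Move the first segment (with its leading '/', if any) to the output.
            end = inp.find("/", 1 if inp.startswith("/") else 0)
            end = len(inp) if end == -1 else end
            out += inp[:end]
            inp = inp[end:]
    return out


def merge(base: Parts, ref_path: str) -> str:
    """RFC 3986 section 5.2.3."""
    if base.authority is not None and base.path == "":
        return "/" + ref_path
    return base.path[: base.path.rfind("/") + 1] + ref_path


def recompose(t: Parts) -> str:
    """RFC 3986 section 5.3."""
    out = f"{t.scheme}:" if t.scheme is not None else ""
    if t.authority is not None:
        out += "//" + t.authority
    out += t.path
    if t.query is not None:
        out += "?" + t.query
    if t.fragment is not None:
        out += "#" + t.fragment
    return out


def resolve(base: str, ref: str, strict: bool = True) -> str:
    """RFC 3986 section 5.2.2; strict=False ignores a scheme equal to the base's."""
    b, r = split(base), split(ref)
    if r.scheme is not None and (strict or r.scheme != b.scheme):
        t = Parts(r.scheme, r.authority, remove_dot_segments(r.path), r.query, r.fragment)
    elif r.authority is not None:
        t = Parts(b.scheme, r.authority, remove_dot_segments(r.path), r.query, r.fragment)
    elif r.path == "":
        query = r.query if r.query is not None else b.query
        t = Parts(b.scheme, b.authority, b.path, query, r.fragment)
    else:
        path = r.path if r.path.startswith("/") else merge(b, r.path)
        t = Parts(b.scheme, b.authority, remove_dot_segments(path), r.query, r.fragment)
    return recompose(t)


base = "http://a/b/c/d;p?q"
print(resolve(base, "../g"))                  # http://a/b/g
print(resolve(base, "http:g"))                # http:g
print(resolve(base, "http:g", strict=False))  # http://a/b/c/g
```

The base and the "http:g" case are the ones used in RFC 3986 section 5.4.2: a strict parser keeps http:g, an absolute URI that is not a valid http URI, while a non-strict parser turns it into a proper http URI.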
Notes about IRIs/URIs
- Space (U+0020) is not legal in an IRI.
- Characters such as “:” are legal in the path, query and fragment components.
- “{” and “}” are not legal in IRIs.
- “[” and “]” are only legal as IPv6 address delimiters.
- Encoding with “%hex” is not an escape mechanism. “%20” puts three characters into the IRI: “%”, “2” and “0”.
- Normalization of the syntax (RFC 3986, section 6.2.2) gives some simple rules to make it easier to compare URIs as strings.
- By the URI syntax, the two characters after “%” must be legal hex digits (0-9, A-F or a-f); “%ST” is a syntax error.
- Normalization prefers the letters “A-F” to be uppercase.
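A minimal sketch of the two %-encoding rules above: a syntax check that every “%” starts a %HH triplet, and the RFC 3986 normalization that uppercases the hex digits inside each triplet. Both function names are hypothetical, and neither decodes anything, in line with %-encoding not being an escape mechanism.

```python
import re

# A '%' that is NOT followed by two hex digits is a syntax error.
BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")
# A well-formed %HH triplet.
TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")


def percent_encoding_ok(iri: str) -> bool:
    """True if every '%' in the string begins a well-formed %HH triplet."""
    return BAD_PERCENT.search(iri) is None


def normalize_percent_case(iri: str) -> str:
    """Uppercase the hex digits inside %HH triplets only; nothing is decoded."""
    return TRIPLET.sub(lambda m: m.group(0).upper(), iri)


print(percent_encoding_ok("http://example/a%20b"))        # True
print(percent_encoding_ok("http://example/%ST"))          # False
print(normalize_percent_case("http://example/a%2fb%3a"))  # http://example/a%2Fb%3A
```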
http-URI = "http:" "//" authority path-abempty [ "?" query ] [ "#" fragment ]
So the http URI scheme adds the requirement that there must be a “//” and an authority (host and optional port), followed by a path that is either absolute (starts with “/”) or empty (no “/”).
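A minimal sketch of the extra shape check that the http and https rules impose, assuming the input is already a syntactically valid URI. `follows_http_rule` is a hypothetical name; it looks only at the scheme, the presence of a non-empty “//” authority and the path shape, and does not validate the host or port themselves.

```python
import re

# Component splitter from RFC 3986, Appendix B.
URI_SPLIT = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def follows_http_rule(uri: str) -> bool:
    """Hypothetical check for the http-URI / https-URI shape."""
    m = URI_SPLIT.match(uri)
    scheme, authority, path = m.group(2), m.group(4), m.group(5)
    if scheme is None or scheme.lower() not in ("http", "https"):
        return False
    # No "//" (None) or an empty authority (""): RFC 7230 forbids an empty
    # host; a non-empty authority is used here as a rough stand-in for that.
    if not authority:
        return False
    # path-abempty: either empty or starting with "/".
    return path == "" or path.startswith("/")


print(follows_http_rule("http://example/path"))  # True
print(follows_http_rule("http:path"))            # False
```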
assigned-name = "urn" ":" NID ":" NSS
NID           = (alphanum) 0*30(ldh) (alphanum)
ldh           = alphanum / "-"
NSS           = pchar *(pchar / "/")
The older RFC 2141 allowed “X-…” as NID.
While a URN allows r-component, q-component and f-component parts, the latter being a URI fragment, usually just the assigned-name form is used.
The urn scheme only applies to ASCII, not the additional characters of IRIs.
An NID must be at least 2 characters, and the first and last must be alphanumeric.
An NSS must be at least one character.
It is not uncommon to see urn:x:... in test data; unfortunately, that isn’t a legal URI in the urn scheme because “x” is the NID part and is too short.
By the general URI syntax, the URI path component is the “NID:NSS” part of the URN.
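The URN rules above can be checked with a regular expression transcribed from the RFC 8141 ABNF. This is a sketch under stated assumptions: it accepts only the assigned-name form (no r-, q- or f-components), treats the “urn” prefix case-insensitively, allows ASCII only, and `is_assigned_name` is a hypothetical helper name.

```python
import re

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"  (RFC 3986)
PCHAR = r"(?:[A-Za-z0-9._~!$&'()*+,;=:@-]|%[0-9A-Fa-f]{2})"

ASSIGNED_NAME = re.compile(
    r"[uU][rR][nN]:"
    # NID: 2 to 32 characters, alphanumeric at both ends, '-' allowed inside.
    r"(?P<nid>[A-Za-z0-9][A-Za-z0-9-]{0,30}[A-Za-z0-9]):"
    # NSS: at least one pchar, then pchars or '/'.
    rf"(?P<nss>{PCHAR}(?:{PCHAR}|/)*)"
)


def is_assigned_name(uri: str) -> bool:
    """True if the whole string matches urn:NID:NSS."""
    return ASSIGNED_NAME.fullmatch(uri) is not None


print(is_assigned_name("urn:isbn:0451450523"))  # True
print(is_assigned_name("urn:x:abc"))            # False: NID too short
```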
The correct way to use UUIDs as URIs is the urn:uuid: form defined in RFC 4122.
Hex digits should be lower case.
There was a proposal for a “uuid” scheme; it is sometimes seen, but it was only ever a draft.
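Python's standard `uuid` module already produces this form: `UUID.urn` gives the urn:uuid: URI with lowercase hex digits. The two wrapper functions below are hypothetical conveniences around it; the second also accepts uppercase hex or the draft “uuid:” prefix as input and re-expresses it in the urn:uuid: form.

```python
import uuid


def uuid_urn() -> str:
    """Mint a fresh random (version 4) UUID as a urn:uuid: URI."""
    return uuid.uuid4().urn


def to_uuid_urn(text: str) -> str:
    """Re-express an existing UUID string in the lowercase urn:uuid: form."""
    return uuid.UUID(text).urn


print(to_uuid_urn("66D5B9E2-5ABE-49BE-BFC9-1ED0D997E07F"))
# urn:uuid:66d5b9e2-5abe-49be-bfc9-1ed0d997e07f
```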
“file” URI scheme
The file URI scheme had for a long time been only loosely defined in RFC 1738 section 3.10. Common usage was beyond the definition; character set issues were unclear.
RFC 8089 is a formal definition compatible with the URI syntax of RFC 3986. It includes common usage such as relative filenames (relative to their point of use); for example, file:directory/file.txt can be used.
While the file scheme is of limited use for RDF, it is useful when working with local files, for example when using RDF for configuration files on the local machine. Such URIs will be of the form file:/// (that is, three ‘/’), using the file scheme convention that “localhost” can be dropped.
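In Python, `pathlib` can produce such URIs from absolute paths, and `as_uri()` %-encodes characters that are not legal in a URI, such as space. The paths below are hypothetical configuration files; `PurePosixPath` is used so the output does not depend on the operating system.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical local configuration file; as_uri() requires an absolute path.
config = PurePosixPath("/etc/myapp/config.ttl")
uri = config.as_uri()
print(uri)                          # file:///etc/myapp/config.ttl

# "localhost" is dropped, so the authority is empty: hence three '/'.
print(repr(urlsplit(uri).netloc))   # ''

# A space in the filename becomes %20 in the URI.
print(PurePosixPath("/tmp/my file.ttl").as_uri())  # file:///tmp/my%20file.ttl
```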
RDF References
It would be useful to pull all these considerations together into a distinct piece of terminology, an “RDF Reference”:
- It is a URI, that is, it has been resolved against a base.
  It has a scheme name. It may have query and fragment parts. There is always a “path”, even if it is the empty string.
- It follows the additional restrictions of its URI scheme.
  This can be tricky for a parser to check if it does not know the scheme, but software generating a URI should follow the rules of the URI scheme.
  The scheme-specific rules for the http, https and urn schemes are required:
  - If ‘http:’, it follows the HTTP scheme rule.
  - If ‘https:’, it follows the HTTPS scheme rule.
  - If ‘urn:’, it matches the requirement for “urn:2+chars:1+char”.
- Hex in %-encoding SHOULD be uppercase.
In Turtle-related syntaxes, there are two places where “partial URIs” are used: PREFIX and BASE.
PREFIX u: <urn:uuid:>
u:66d5b9e2-5abe-49be-bfc9-1ed0d997e07f
urn:uuid: is not a legal URN.
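Prefixed names are expanded by plain string concatenation of the prefix IRI and the local part, not by resolution, which is why a namespace such as urn:uuid: works here even though it is not a legal URN on its own. A minimal sketch, assuming a hypothetical `prefixes` table and ignoring Turtle's escape handling in local names:

```python
def expand_pname(prefixes: dict[str, str], pname: str) -> str:
    """Expand a Turtle prefixed name such as 'u:66d5...' by concatenation."""
    # Prefix labels cannot contain ':', so split at the first one;
    # the local part may itself contain further ':' characters.
    prefix, sep, local = pname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"undeclared prefix in {pname!r}")
    return prefixes[prefix] + local


prefixes = {"u": "urn:uuid:"}
print(expand_pname(prefixes, "u:66d5b9e2-5abe-49be-bfc9-1ed0d997e07f"))
# urn:uuid:66d5b9e2-5abe-49be-bfc9-1ed0d997e07f
```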
BASE <urn:>
## Resolves to urn:uuid:66d5...
<uuid:66d5b9e2-5abe-49be-bfc9-1ed0d997e07f>
urn: is not a legal URN.
Links to relevant documents: