RDF Binary using Apache Thrift

RDF Binary using Apache Thrift (“RDF Thrift”) is a binary format for RDF and RDF-related data.

The W3C standard RDF syntaxes are text- or XML-based. All of them incur parsing costs. The most human-readable formats also cost more to write, and they scale poorly because the data must be analysed for pretty printing rather than simply streamed to output.

N-Triples and N-Quads are often used as datastore dump formats and for publishing large datasets, because they have been found to be the best formats to read and write for these uses. Yet they are still text formats.

SPARQL result sets have a number of related syntaxes, based on JSON, XML, TSV or CSV. Again, these are text-based formats.

Binary formats are faster to process because they do not incur the parsing costs of text-based formats. “RDF Thrift” defines a basic encoding for RDF terms, then builds data formats for RDF graphs, RDF datasets and SPARQL result sets on top of it. This gives a basis for high-performance linked data systems.
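To make the parsing-cost argument concrete, here is a minimal Python sketch of a tagged, length-prefixed binary encoding of RDF terms and triples. It is a hypothetical toy format, not the RDF Thrift wire format: the tag values, the field layout and the helper names (`encode_term`, `decode_triple` and so on) are illustrative assumptions. What it shows is that a reader simply jumps from one length prefix to the next, with none of the delimiter scanning, unescaping or tokenising that a text syntax such as N-Triples requires.

```python
import struct
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    iri: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lex: str
    lang: str = ""      # empty string means "no language tag"
    datatype: str = ""  # empty string means "no explicit datatype"


Term = Union[IRI, BNode, Literal]

# Hypothetical tag values for this sketch only (not RDF Thrift field ids).
TAG_IRI, TAG_BNODE, TAG_LITERAL = 1, 2, 3


def _put_str(out: bytearray, s: str) -> None:
    """Append a 4-byte big-endian length followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    out += struct.pack(">I", len(data))
    out += data


def _get_str(buf: bytes, pos: int) -> tuple[str, int]:
    """Read a length-prefixed string; return it and the next offset."""
    (n,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    return buf[pos:pos + n].decode("utf-8"), pos + n


def encode_term(term: Term, out: bytearray) -> None:
    """Write one tag byte, then the term's length-prefixed string fields."""
    if isinstance(term, IRI):
        out.append(TAG_IRI)
        _put_str(out, term.iri)
    elif isinstance(term, BNode):
        out.append(TAG_BNODE)
        _put_str(out, term.label)
    elif isinstance(term, Literal):
        out.append(TAG_LITERAL)
        for field in (term.lex, term.lang, term.datatype):
            _put_str(out, field)
    else:
        raise TypeError(f"not an RDF term: {term!r}")


def decode_term(buf: bytes, pos: int) -> tuple[Term, int]:
    """Dispatch on the tag byte; no text scanning or unescaping is needed."""
    tag = buf[pos]
    pos += 1
    if tag == TAG_IRI:
        iri, pos = _get_str(buf, pos)
        return IRI(iri), pos
    if tag == TAG_BNODE:
        label, pos = _get_str(buf, pos)
        return BNode(label), pos
    if tag == TAG_LITERAL:
        lex, pos = _get_str(buf, pos)
        lang, pos = _get_str(buf, pos)
        datatype, pos = _get_str(buf, pos)
        return Literal(lex, lang, datatype), pos
    raise ValueError(f"unknown tag {tag}")


def encode_triple(s: Term, p: Term, o: Term) -> bytes:
    out = bytearray()
    for term in (s, p, o):
        encode_term(term, out)
    return bytes(out)


def decode_triple(buf: bytes, pos: int = 0) -> tuple[tuple[Term, Term, Term], int]:
    s, pos = decode_term(buf, pos)
    p, pos = decode_term(buf, pos)
    o, pos = decode_term(buf, pos)
    return (s, p, o), pos
```

A literal containing quotes and a newline, such as `Literal('say "hi"\n', lang="en")`, round-trips without any escaping, because its extent is given by the length prefix rather than by a closing quote.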

Apache Thrift provides an efficient, widely used binary encoding layer with a large number of language bindings. (Thrift also provides a cross-language service interaction model on top of this encoding layer; that is not used here.)
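As an illustration of the kind of compact encoding Thrift offers, the sketch below hand-codes two building blocks of Thrift's compact protocol: zigzag-mapped variable-length integers and length-prefixed UTF-8 strings. It is written from scratch for illustration and is not the Apache Thrift library's API; the function names are hypothetical, and a real system would use the generated Thrift bindings rather than code like this.

```python
def zigzag32(n: int) -> int:
    """Map a signed 32-bit int to unsigned so small magnitudes stay small:
    0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ..."""
    if not -2**31 <= n < 2**31:
        raise ValueError("value out of i32 range")
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def unzigzag(z: int) -> int:
    """Inverse of zigzag32."""
    return (z >> 1) ^ -(z & 1)


def write_varint(out: bytearray, v: int) -> None:
    """Unsigned varint: 7 payload bits per byte, high bit set on all but the last byte."""
    while True:
        byte = v & 0x7F
        v >>= 7
        if v:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned varint starting at pos; return (value, next offset)."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def write_i32(out: bytearray, n: int) -> None:
    write_varint(out, zigzag32(n))


def write_string(out: bytearray, s: str) -> None:
    """String: varint byte length (not zigzagged) followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    write_varint(out, len(data))
    out += data
```

Small integers, positive or negative, take a single byte: -1 encodes as `0x01`, while 150 zigzags to 300 and encodes as the two bytes `0xAC 0x02`.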

Document license: Apache License.