Gravsearch: Virtual Graph Search

Basic Concept

Gravsearch is intended to offer the advantages of SPARQL endpoints (particularly the ability to perform queries using complex search criteria) while avoiding their drawbacks in terms of performance and security (see The Enduring Myth of the SPARQL Endpoint). It also lets clients work with a simpler RDF data model than the one the API uses to store data in the triplestore, and it makes better error checking possible.

Rather than being processed directly by the triplestore, a Gravsearch query is interpreted by the API, which enforces certain restrictions on the query, and implements paging and permission checking. The API server generates SPARQL based on the Gravsearch query submitted, queries the triplestore, filters the results according to the user's permissions, and returns each page of query results as an API response. Thus, Gravsearch is a hybrid between a RESTful API and a SPARQL endpoint.

A Gravsearch query conforms to a subset of the syntax of a SPARQL CONSTRUCT query, with some additional restrictions and functionality. In particular, the variable representing the top-level (or 'main') resource that will appear in each search result must be identified, statements must be included to specify the types of the entities being queried, OFFSET is used to control paging, and ORDER BY is used to sort the results.

It is certainly possible to write Gravsearch queries by hand, but we expect that in general, they will be automatically generated by client software, e.g. by a client user interface.

For a more detailed overview of Gravsearch, see Gravsearch: Transforming SPARQL to query humanities data.

Submitting Gravsearch Queries

The recommended way to submit a Gravsearch query is via HTTP POST:

HTTP POST to http://host/v2/searchextended

This works like the "query via POST directly" operation of the SPARQL 1.1 Protocol: the query is sent unencoded as the HTTP request message body, in the UTF-8 charset.

It is also possible to submit a Gravsearch query using HTTP GET. The entire query must be URL-encoded and included as the last element of the URL path:

HTTP GET to http://host/v2/searchextended/QUERY
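As a minimal client-side sketch (Python standard library only), the two submission styles could be prepared as follows. The base URL http://0.0.0.0:3333 assumes a locally running stack, the function names are hypothetical, and the application/sparql-query Content-Type is an assumption borrowed from the SPARQL 1.1 Protocol rather than a documented DSP-API requirement.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: a locally running DSP stack listening on port 3333.
API_BASE = "http://0.0.0.0:3333"


def build_post_request(query: str, base: str = API_BASE) -> Request:
    """Prepare a POST request: the query is the unencoded UTF-8 message body."""
    return Request(
        url=f"{base}/v2/searchextended",
        data=query.encode("utf-8"),
        # Assumption: media type taken from the SPARQL 1.1 Protocol.
        headers={"Content-Type": "application/sparql-query; charset=utf-8"},
        method="POST",
    )


def build_get_url(query: str, base: str = API_BASE) -> str:
    """Build a GET URL with the whole query URL-encoded as the last path element."""
    # safe="" also encodes "/", so the query stays a single path segment;
    # non-ASCII characters become UTF-8 percent escapes.
    return f"{base}/v2/searchextended/{quote(query, safe='')}"
```

Passing the prepared request to urllib.request.urlopen would send it. POST is preferable for long queries, since URL length limits may apply to GET.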

The response to a Gravsearch query is an RDF graph, which can be requested in various formats (see Responses Describing Resources).

To request the number of results rather than the results themselves, you can do a count query:

HTTP POST to http://host/v2/searchextended/count

The response to a count query request is an object with one predicate, http://schema.org/numberOfItems, with an integer value.
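The count can be read from the response with a small helper like the one below. This sketch assumes a JSON-LD response in which the predicate appears either as the full IRI or compacted to schema:numberOfItems; the exact shape depends on the JSON-LD context the server returns, and parse_count is a hypothetical name.

```python
import json

SCHEMA_NUMBER_OF_ITEMS = "http://schema.org/numberOfItems"


def parse_count(body: str) -> int:
    """Extract the integer count from a JSON-LD count-query response body."""
    doc = json.loads(body)
    # Assumption: the key is either the full IRI or the "schema:" compact form.
    for key in (SCHEMA_NUMBER_OF_ITEMS, "schema:numberOfItems"):
        if key in doc:
            value = doc[key]
            # Tolerate a typed-literal object such as {"@value": "402"}.
            if isinstance(value, dict):
                value = value["@value"]
            return int(value)
    raise KeyError("response contains no schema:numberOfItems")
```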

If a Gravsearch query times out, the API returns 504 Gateway Timeout.

Gravsearch and API Schemas

A Gravsearch query can be written in either of the two DSP-API v2 schemas. The simple schema is easier to work with, and is sufficient if you don't need to query anything below the level of a DSP-API value. If your query needs to refer to standoff markup, you must use the complex schema. Each query must use a single schema, with one exception (see Date Comparisons).

Gravsearch query results can be requested in the simple or complex schema; see API Schema.

All examples below assume that the DSP stack is running locally, which is why their ontology IRIs begin with http://0.0.0.0:3333. If you are querying another stack, you can find the IRIs of the ontologies you are targeting by requesting the ontology metadata.

Using the Simple Schema

To write a query in the simple schema, use the knora-api ontology in the simple schema, and use the simple schema for any other DSP ontologies the query refers to, e.g.:

PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
PREFIX incunabula: <http://0.0.0.0:3333/ontology/0803/incunabula/simple/v2#>

In the simple schema, DSP-API values are represented as literals, which can be used in FILTER expressions (see Filtering on Values in the Simple Schema).

Using the Complex Schema

To write a query in the complex schema, use the knora-api ontology in the complex schema, and use the complex schema for any other DSP ontologies the query refers to, e.g.:

PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX incunabula: <http://0.0.0.0:3333/ontology/0803/incunabula/v2#>

In the complex schema, DSP-API values are represented as objects belonging to subclasses of knora-api:Value, e.g. knora-api:TextValue, and have predicates of their own, which can be used in FILTER expressions (see Filtering on Values in the Complex Schema).

Main and Dependent Resources

The main resource is the top-level resource in a search result. Other resources that are in some way connected to the main resource are referred to as dependent resources. If the client asks for a resource A relating to a resource B, then all matches for A will be presented as main resources and those for B as dependent resources. The main resource must be represented by a variable, marked with knora-api:isMainResource, as explained under CONSTRUCT Clause.

Virtual Incoming Links

Depending on the ontology design, a resource A may point to B or vice versa. For example, a page A is part of a book B via the property incunabula:partOf. If A is marked as the main resource, B is nested as a dependent resource in A's link value incunabula:partOfValue. But if B is marked as the main resource, B has no link value pointing to A, because it is A that points to B. Instead, B has a virtual property knora-api:hasIncomingLinkValue containing A's link value:

"knora-api:hasIncomingLinkValue" : {
    "@id" : "http://rdfh.ch/A/values/xy",
    "@type" : "knora-api:LinkValue",
    "knora-api:linkValueHasSource" : {
      "@id" : "http://rdfh.ch/A",
      "@type" : "incunabula:page",
      "incunabula:partOfValue" : {
        "@id" : "http://rdfh.ch/A/values/xy",
        "@type" : "knora-api:LinkValue",
        "knora-api:linkValueHasTargetIri" : {
          "@id" : "http://rdfh.ch/B"
        }
      }
    }
  },

Note that the virtually inserted link value inverts the relation by using knora-api:linkValueHasSource. The source of the link is A and its target B is only represented by an IRI (knora-api:linkValueHasTargetIri) since B is the main resource.
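A client that wants the resources pointing at a main resource can read them from these virtual link values. The sketch below assumes compacted JSON-LD keys exactly as in the example above, and that several incoming links arrive as a JSON array; incoming_link_sources is a hypothetical helper.

```python
def incoming_link_sources(resource: dict) -> list:
    """Return the IRIs of the resources that link to the given main resource."""
    links = resource.get("knora-api:hasIncomingLinkValue", [])
    # JSON-LD uses a bare object for a single value and an array for several.
    if isinstance(links, dict):
        links = [links]
    # Each virtual link value names its source via knora-api:linkValueHasSource.
    return [link["knora-api:linkValueHasSource"]["@id"] for link in links]
```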

Dedicated Endpoint for Querying Incoming Links

Internal use only: this endpoint is intended for internal use and may be subject to change.

The dedicated endpoint for querying incoming links was introduced in DSP-API v31.10.0.

HTTP GET to http://host/v2/searchIncomingLinks/[resourceIri]?offset=[pageNumber]

The route has two parameters: the resource IRI, and the page number, which translates to the SPARQL OFFSET. The page number defaults to 0. The resource IRI must be URL-encoded and included as the last element of the URL path. Here is an example request for the resource http://rdfh.ch/0001/a-thing-picture with page number 1:

http://host/v2/searchIncomingLinks/http%3A%2F%2Frdfh.ch%2F0001%2Fa-thing-picture?offset=1
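The request URL can also be built programmatically. This sketch (hypothetical function name, placeholder host) percent-encodes the whole resource IRI, including ":" and "/", so that it occupies a single path segment, and reproduces the example URL above.

```python
from urllib.parse import quote, urlencode


def incoming_links_url(host: str, resource_iri: str, offset: int = 0) -> str:
    """Build a GET URL for /v2/searchIncomingLinks for one page of results."""
    # safe="" encodes ":" and "/" as well, keeping the IRI in one path segment.
    encoded_iri = quote(resource_iri, safe="")
    query_string = urlencode({"offset": offset})
    return f"{host}/v2/searchIncomingLinks/{encoded_iri}?{query_string}"
```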

Graph Patterns and Result Graphs

The WHERE clause of a Gravsearch query specifies a graph pattern. Each query result will match this graph pattern, and will have the form of a graph whose starting point is a main resource. The query's graph pattern, and hence each query result graph, can span zero or more levels of relations between resources. For example, a query could request regions in images on pages of books written by a certain author, articles by authors who were students of a particular professor, or authors of texts that refer to events that took place within a certain date range.

Permission Checking

Each matching resource is returned with the values that the user has permission to see. If the user does not have permission to see a matching main resource, it is hidden in the results. If a user does not have permission to see a matching dependent resource, the link value is hidden.

Paging

Gravsearch results are returned in pages. The maximum number of main resources per page is determined by the API (and can be configured in application.conf via the setting app/v2/resources-sequence/results-per-page). If some resources have been filtered out because the user does not have permission to see them, a page could contain fewer results, or no results. If it is possible that more results are available in subsequent pages, the Gravsearch response will contain the predicate knora-api:mayHaveMoreResults with the boolean value true, otherwise it will not contain this predicate. Therefore, to retrieve all available results, the client must request each page one at a time, until the response does not contain knora-api:mayHaveMoreResults.
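This paging rule can be expressed as a loop. In the sketch below, fetch_page is a hypothetical caller-supplied function that submits the query with the given OFFSET and returns the parsed JSON-LD response; the compacted key knora-api:mayHaveMoreResults is an assumption about the response context.

```python
from typing import Callable, Iterator

MAY_HAVE_MORE = "knora-api:mayHaveMoreResults"


def iterate_pages(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield successive result pages until the API signals there are no more."""
    offset = 0
    while True:
        page = fetch_page(offset)
        yield page
        # A short or even empty page is not a stop signal by itself, since
        # results may have been hidden by permission checks; only the absence
        # of mayHaveMoreResults=true is.
        if page.get(MAY_HAVE_MORE) is not True:
            break
        offset += 1
```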

Inference

Gravsearch queries are understood to imply a subset of RDFS reasoning, which the API implements by expanding the incoming query.

Specifically, if a statement pattern specifies a property, the pattern will also match subproperties of that property, and if a statement specifies that a subject has a particular rdf:type, the statement will also match subjects belonging to subclasses of that type.

If you know that reasoning will not return any additional results for your query, you can disable it by adding this line to the WHERE clause, which may improve query performance:

knora-api:GravsearchOptions knora-api:useInference false .

Gravsearch Syntax

Every Gravsearch query is a valid SPARQL 1.1 CONSTRUCT query. However, Gravsearch supports only a subset of the elements that can be used in a SPARQL CONSTRUCT query, and a Gravsearch CONSTRUCT clause must indicate which variable is to be used for the main resource in each search result.

Supported SPARQL Syntax

The current version of Gravsearch accepts CONSTRUCT queries whose WHERE clauses use the following patterns, with the specified restrictions:

OPTIONAL: cannot be nested in a UNION.
UNION: cannot be nested in a UNION.
FILTER: may contain a complex expression using the Boolean operators AND and OR, as well as comparison operators. The left argument of a comparison operator must be a query variable. A Knora ontology entity IRI used in a FILTER must be a property IRI.
FILTER NOT EXISTS
MINUS
OFFSET: the OFFSET is needed for paging. Unlike in standard SPARQL, it does not give a number of results to skip, but the number of the requested page of results. The default value is 0, which refers to the first page of results.
ORDER BY: In SPARQL, the result of a CONSTRUCT query is an unordered set of triples. However, a Gravsearch query returns an ordered list of resources, which can be ordered by the values of specified properties. If the query is written in the complex schema, items below the level of DSP-API values may not be used in ORDER BY.
BIND: The value assigned must be a DSP resource IRI.

Resources, Properties, and Values

Resources can be represented either by an IRI or by a variable, except for the main resource, which must be represented by a variable.

It is possible to write a Gravsearch query in which the IRI of the main resource is already known, e.g. to request specific information about that resource and perhaps about linked resources. In this case, the IRI of the main resource must be assigned to a variable using BIND. Because BIND statements slow the query down, we recommend using them only when necessary.

Properties can be represented by an IRI or a query variable. If a property is represented by a query variable, it can be restricted to certain property IRIs using a FILTER.

A Knora value (i.e. a value attached to a knora-api:Resource) must be represented as a query variable.

Filtering on Values

Filtering on Values in the Simple Schema

In the simple schema, a variable representing a DSP-API value can be used directly in a FILTER expression. For example:

?book incunabula:title ?title .
FILTER(?title = "Zeitglöcklein des Lebens und Leidens Christi")

Here the type of ?title is xsd:string.

The following value types can be compared with literals in FILTER expressions in the simple schema:

Text values (xsd:string)
URI values (xsd:anyURI)
Integer values (xsd:integer)
Decimal values (xsd:decimal)
Boolean values (xsd:boolean)
Date values (knora-api:Date)
List values (knora-api:ListNode)

List values can only be searched for using the equality operator (=), which performs an exact match on a list node's label. A list node can have labels in several languages; if any one of them matches, the node is considered a match. Note that in the simple schema, a matching label is not guaranteed to identify a unique list node (unlike in the complex schema, where list nodes are identified by IRI).

A DSP-API value may not be represented as the literal object of a predicate; for example, this is not allowed:

?book incunabula:title "Zeitglöcklein des Lebens und Leidens Christi" .

Filtering on Values in the Complex Schema

In the complex schema, variables representing DSP-API values are not literals. You must add something to the query (generally a statement) to get a literal from a DSP-API value. For example:

?book incunabula:title ?title .
?title knora-api:valueAsString "Zeitglöcklein des Lebens und Leidens Christi" .

Here the type of ?title is knora-api:TextValue. Note that no FILTER is needed in this example. But if you want to use a different comparison operator, you need a FILTER:

?page incunabula:seqnum ?seqnum .
?seqnum knora-api:intValueAsInt ?seqnumInt .
FILTER(?seqnumInt <= 10)

To match a date value in the complex schema, you must use the knora-api:toSimpleDate function in a FILTER (see Date Comparisons). The predicates of knora-api:DateValue (knora-api:dateValueHasStartYear, etc.) are not available in Gravsearch.

Date Comparisons

In the simple schema, you can compare a date value directly with a knora-api:Date in a FILTER:

?book incunabula:pubdate ?pubdate .
FILTER(?pubdate < "JULIAN:1497"^^knora-api:Date)

In the complex schema, you must use the function knora-api:toSimpleDate, passing it the variable representing the date value. The date literal used in the comparison must still be a knora-api:Date in the simple schema. This is the only case in which you can use both schemas in a single query:

PREFIX incunabula: <http://0.0.0.0:3333/ontology/0803/incunabula/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX knora-api-simple: <http://api.knora.org/ontology/knora-api/simple/v2#>

CONSTRUCT {
    ?book knora-api:isMainResource true .
    ?book incunabula:pubdate ?pubdate .
} WHERE {
    ?book a incunabula:book .
    ?book incunabula:pubdate ?pubdate .
    FILTER(knora-api:toSimpleDate(?pubdate) < "JULIAN:1497"^^knora-api-simple:Date)
} ORDER BY ?pubdate

You can also use knora-api:toSimpleDate to search for date tags in standoff text markup (see Matching Standoff Dates).

Note that the given date value for comparison must have the following format:

(GREGORIAN|JULIAN|ISLAMIC):\d{1,4}(-\d{1,2}(-\d{1,2})?)?( BC| AD| BCE| CE)?(:\d{1,4}(-\d{1,2}(-\d{1,2})?)?( BC| AD| BCE| CE)?)?

For example, a date can be exact, like GREGORIAN:2015-12-03, or a period, like GREGORIAN:2015-12-03:2015-12-04. Dates may also have month or year precision, e.g. ISLAMIC:1407-02 (the whole second month of the year 1407) or JULIAN:1330 (the whole year 1330). An optional era indicator (BCE, CE, BC, or AD) can be added to a date; if no era is given, AD is assumed. The era can be given for a single date, e.g. GREGORIAN:1220 BC, or for each end of a period, e.g. GREGORIAN:600 BC:480 BC.
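A client can check a date literal against this format before submitting a query. The sketch below compiles the regular expression above unchanged and uses it to build a complex-schema FILTER with knora-api:toSimpleDate. The function names are hypothetical, and the generated FILTER assumes the query declares the knora-api and knora-api-simple prefixes as in the Date Comparisons example.

```python
import re

# The documented date format, compiled unchanged.
SIMPLE_DATE = re.compile(
    r"(GREGORIAN|JULIAN|ISLAMIC):\d{1,4}(-\d{1,2}(-\d{1,2})?)?( BC| AD| BCE| CE)?"
    r"(:\d{1,4}(-\d{1,2}(-\d{1,2})?)?( BC| AD| BCE| CE)?)?"
)


def is_valid_simple_date(date: str) -> bool:
    """True if the whole string matches the knora-api:Date format."""
    return SIMPLE_DATE.fullmatch(date) is not None


def to_simple_date_filter(variable: str, operator: str, date: str) -> str:
    """Build a complex-schema FILTER comparing a date value with a simple-schema date."""
    if not is_valid_simple_date(date):
        raise ValueError(f"not a valid knora-api:Date literal: {date!r}")
    return (f"FILTER(knora-api:toSimpleDate(?{variable}) {operator} "
            f'"{date}"^^knora-api-simple:Date)')
```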

Searching for Matching Words

The function knora-api:matchText searches for matching words anywhere in a text value and is implemented using a full-text search index if available. The first argument must represent a text value (a knora-api:TextValue in the complex schema, or an xsd:string in the simple schema). The second argument is a string literal containing the words to be matched, separated by spaces. The function supports the Lucene Query Parser syntax. Note that Lucene's default operator is a logical OR when several search terms are submitted.

This function can only be used as the top-level expression in a FILTER.

For example, to search for titles that contain the word 'Zeitglöcklein' or the word 'Lebens' (Lucene's default OR applies; write "Zeitglöcklein AND Lebens" to require both):

?book incunabula:title ?title .
FILTER knora-api:matchText(?title, "Zeitglöcklein Lebens")
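Because of Lucene's default OR, a client that wants every word to match has to say so explicitly. The hypothetical helper below joins the words with Lucene's AND operator (assuming the full-text index honors it, as the Lucene Query Parser syntax does) and escapes backslashes and double quotes for the SPARQL string literal. It does not escape Lucene's own special characters, so the words are assumed to be plain terms.

```python
def match_text_filter(variable: str, words: list, require_all: bool = True) -> str:
    """Build a knora-api:matchText FILTER for the given words."""
    # Lucene's default operator is OR; joining with AND requires every word.
    operator = " AND " if require_all else " "
    terms = operator.join(words)
    # Escape backslashes first, then double quotes, for a valid SPARQL string literal.
    literal = terms.replace("\\", "\\\\").replace('"', '\\"')
    return f'FILTER knora-api:matchText(?{variable}, "{literal}")'
```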

Filtering Text by Language

To filter a text value by language in the simple schema, use the SPARQL lang function on the text value, e.g.:

FILTER(lang(?text) = "fr")

In the complex schema, the lang function is not supported. Use the text value's knora-api:textValueHasLanguage predicate instead:

?text knora-api:textValueHasLanguage "fr" .

Regular Expressions

The SPARQL regex function is supported. In the simple schema, you can use it directly on the text value, e.g.

?book incunabula:title ?title .
FILTER regex(?title, "Zeit", "i")

In the complex schema, use it on the object of the text value's knora-api:valueAsString predicate:

?book incunabula:title ?title .
?title knora-api:valueAsString ?titleStr .
FILTER regex(?titleStr, "Zeit", "i")

Searching for Text Markup

To refer to standoff markup in text values, you must write your query in the complex schema.

A knora-api:TextValue can have the property knora-api:textValueHasStandoff, whose objects are the standoff markup tags in the text. You can match the tags you're interested in using rdf:type or other properties of each tag.

Matching Text in a Standoff Tag

The function knora-api:matchTextInStandoff searches for standoff tags containing certain terms. The implementation is optimised using the full-text search index if available. The function takes three arguments:

A variable representing a text value.
A variable representing a standoff tag.
A string literal containing space-separated search terms.

This function can only be used as the top-level expression in a FILTER. For example:

PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX standoff: <http://api.knora.org/ontology/standoff/v2#>
PREFIX beol: <http://0.0.0.0:3333/ontology/0801/beol/v2#>

CONSTRUCT {
    ?letter knora-api:isMainResource true .
    ?letter beol:hasText ?text .
} WHERE {
    ?letter a beol:letter .
    ?letter beol:hasText ?text .
    ?text knora-api:textValueHasStandoff ?standoffParagraphTag .
    ?standoffParagraphTag a standoff:StandoffParagraphTag .
    FILTER knora-api:matchTextInStandoff(?text, ?standoffParagraphTag, "Grund Richtigkeit")
}

Here we are looking for letters containing the words "Grund" and "Richtigkeit" within a single paragraph.

Matching Standoff Links

If you are only interested in specifying that a resource has some text value containing a standoff link to another resource, the most efficient way is to use the property knora-api:hasStandoffLinkTo, whose subjects and objects are resources. This property is automatically maintained by the API. For example:

PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX beol: <http://0.0.0.0:3333/ontology/0801/beol/v2#>

CONSTRUCT {
    ?letter knora-api:isMainResource true .
    ?letter beol:hasText ?text .
} WHERE {
    ?letter a beol:letter .
    ?letter beol:hasText ?text .
    ?letter knora-api:hasStandoffLinkTo ?person .
    ?person a beol:person .
    ?person beol:hasIAFIdentifier ?iafIdentifier .
    ?iafIdentifier knora-api:valueAsString "(VIAF)271899510" .
}

Here we are looking for letters containing a link to the historian Claude Jordan, who is identified by his Integrated Authority File identifier, (VIAF)271899510.

However, if you need to specify the context in which the link tag occurs, you must use the function knora-api:standoffLink. It takes three arguments:

A variable or IRI representing the resource that is the source of the link.
A variable representing the standoff link tag.
A variable or IRI representing the resource that is the target of the link.

This function can only be used as the top-level expression in a FILTER. For example:

PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX standoff: <http://api.knora.org/ontology/standoff/v2#>
PREFIX beol: <http://0.0.0.0:3333/ontology/0801/beol/v2#>

CONSTRUCT {
    ?letter knora-api:isMainResource true .
    ?letter beol:hasText ?text .
} WHERE {
    ?letter a beol:letter .
    ?letter beol:hasText ?text .
    ?text knora-api:textValueHasStandoff ?standoffLinkTag .
    ?standoffLinkTag a knora-api:StandoffLinkTag .
    FILTER knora-api:standoffLink(?letter, ?standoffLinkTag, ?person)
    ?person a beol:person .
    ?person beol:hasIAFIdentifier ?iafIdentifier .
    ?iafIdentifier knora-api:valueAsString "(VIAF)271899510" .
    ?standoffLinkTag knora-api:standoffTagHasStartParent ?standoffItalicTag .
    ?standoffItalicTag a standoff:StandoffItalicTag .
}

This has the same effect as the previous example, except that because we are matching the link tag itself, we can specify that its immediate parent is a StandoffItalicTag.

If you actually want to get the target of the link (in this example, ?person) in the search results, you need to add a statement like ?letter knora-api:hasStandoffLinkTo ?person . to the WHERE clause and to the CONSTRUCT clause:

PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX standoff: <http://api.knora.org/ontology/standoff/v2#>
PREFIX beol: <http://0.0.0.0:3333/ontology/0801/beol/v2#>

CONSTRUCT {
    ?letter knora-api:isMainResource true .
    ?letter beol:hasText ?text .
    ?letter knora-api:hasStandoffLinkTo ?person .
} WHERE {
    ?letter a beol:letter .
    ?letter beol:hasText ?text .
    ?text knora-api:textValueHasStandoff ?standoffLinkTag .
    ?standoffLinkTag a knora-api:StandoffLinkTag .
    FILTER knora-api:standoffLink(?letter, ?standoffLinkTag, ?person)
    ?person a beol:person .
    ?person beol:hasIAFIdentifier ?iafIdentifier .
    ?iafIdentifier knora-api:valueAsString "(VIAF)271899510" .
    ?standoffLinkTag knora-api:standoffTagHasStartParent ?standoffItalicTag .
    ?standoffItalicTag a standoff:StandoffItalicTag .
    ?letter knora-api:hasStandoffLinkTo ?person .
}

Matching Standoff Dates

You can use the knora-api:toSimpleDate function (see Date Comparisons) to match dates in standoff date tags, i.e. instances of knora-api:StandoffDateTag or of one of its subclasses. For example, here we are looking for a text containing an anything:StandoffEventTag (which is a project-specific subclass of knora-api:StandoffDateTag) representing an event that occurred sometime during the month of December 2016:

PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX anything: <http://0.0.0.0:3333/ontology/0001/anything/v2#>
PREFIX knora-api-simple: <http://api.knora.org/ontology/knora-api/simple/v2#>

CONSTRUCT {
    ?thing knora-api:isMainResource true .
    ?thing anything:hasText ?text .
} WHERE {
    ?thing a anything:Thing .
    ?thing anything:hasText ?text .
    ?text knora-api:textValueHasStandoff ?standoffEventTag .
    ?standoffEventTag a anything:StandoffEventTag .
    FILTER(knora-api:toSimpleDate(?standoffEventTag) = "GREGORIAN:2016-12 CE"^^knora-api-simple:Date)
}

Matching Ancestor Tags

Suppose we want to search for a standoff date in a paragraph, but we know that the paragraph tag might not be the immediate parent of the date tag. For example, the date tag might be in an italics tag, which is in a paragraph tag. In that case, we can use the inferred property knora-api:standoffTagHasStartAncestor. We can modify the previous example to do this:

PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX standoff: <http://api.knora.org/ontology/standoff/v2#>
PREFIX anything: <http://0.0.0.0:3333/ontology/0001/anything/v2#>
PREFIX knora-api-simple: <http://api.knora.org/ontology/knora-api/simple/v2#>

CONSTRUCT {
    ?thing knora-api:isMainResource true .
    ?thing anything:hasText ?text .
} WHERE {
    ?thing a anything:Thing .
    ?thing anything:hasText ?text .
    ?text knora-api:textValueHasStandoff ?standoffDateTag .
    ?standoffDateTag a knora-api:StandoffDateTag .
    FILTER(knora-api:toSimpleDate(?standoffDateTag) = "GREGORIAN:2016-12-24 CE"^^knora-api-simple:Date)
    ?standoffDateTag knora-api:standoffTagHasStartAncestor ?standoffParagraphTag .
    ?standoffParagraphTag a standoff:StandoffParagraphTag .
}

Filtering on rdfs:label

The rdfs:label of a resource is not a DSP-API value, but you can still search for it. This works the same way in the simple and complex schemas:

Using a string literal object:

?book rdfs:label "Zeitglöcklein des Lebens und Leidens Christi" .

Using a variable and a FILTER:

?book rdfs:label ?label .
FILTER(?label = "Zeitglöcklein des Lebens und Leidens Christi")

Using the regex function:

?book rdfs:label ?bookLabel .
FILTER regex(?bookLabel, "Zeit", "i")

To match words in an rdfs:label using the full-text search index, use the knora-api:matchLabel function, which works like knora-api:matchText, except that the first argument is a variable representing a resource:

FILTER knora-api:matchLabel(?book, "Zeitglöcklein")

Filtering on Resource IRIs

A FILTER can compare a variable with another variable or IRI representing a resource. For example, to find a letter whose author and recipient are different persons:

PREFIX beol: <http://0.0.0.0:3333/ontology/0801/beol/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>

CONSTRUCT {
    ?letter knora-api:isMainResource true .
    ?letter beol:hasAuthor ?person1 .
    ?letter beol:hasRecipient ?person2 .
} WHERE {
    ?letter a beol:letter .
    ?letter beol:hasAuthor ?person1 .
    ?letter beol:hasRecipient ?person2 .
    FILTER(?person1 != ?person2) .
}
OFFSET 0

To find a letter whose author is not a person with a specified IRI:

PREFIX beol: <http://0.0.0.0:3333/ontology/0801/beol/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>

CONSTRUCT {
    ?letter knora-api:isMainResource true .
    ?letter beol:hasAuthor ?person1 .
    ?letter beol:hasRecipient ?person2 .
} WHERE {
    ?letter a beol:letter .
    ?letter beol:hasAuthor ?person1 .
    ?letter beol:hasRecipient ?person2 .
    FILTER(?person1 != <http://rdfh.ch/0801/F4n1xKa3TCiR4llJeElAGA>) .
}
OFFSET 0

CONSTRUCT Clause

In the CONSTRUCT clause of a Gravsearch query, the variable representing the main resource must be indicated with knora-api:isMainResource true. Exactly one variable representing a resource must be marked in this way.

Any other statements in the CONSTRUCT clause must also be present in the WHERE clause. If a variable representing a resource or value is used in the WHERE clause but not in the CONSTRUCT clause, the matching resources or values will not be included in the results.

If the query is written in the complex schema, all variables in the CONSTRUCT clause must refer to DSP-API resources, DSP-API values, or properties. Data below the level of values may not be mentioned in the CONSTRUCT clause.

Predicates from the rdf, rdfs, and owl ontologies may not be used in the CONSTRUCT clause. The rdfs:label of each matching resource is always returned, so there is no need to mention it in the query.
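Before submitting a query, a client might sanity-check the main-resource rule. The hypothetical helper below is a naive textual check, not a SPARQL parser: it assumes the prefix name knora-api, a CONSTRUCT template without nested braces or braces inside comments, and that each isMainResource statement starts with its subject variable.

```python
import re

# Naive patterns: the CONSTRUCT template, and "?var knora-api:isMainResource true".
_CONSTRUCT_TEMPLATE = re.compile(r"CONSTRUCT\s*\{(.*?)\}\s*WHERE", re.DOTALL | re.IGNORECASE)
_MAIN_RESOURCE = re.compile(r"\?\w+\s+knora-api:isMainResource\s+true\b")


def has_single_main_resource(query: str) -> bool:
    """Check that the CONSTRUCT clause marks exactly one main-resource variable."""
    match = _CONSTRUCT_TEMPLATE.search(query)
    if match is None:
        return False
    return len(_MAIN_RESOURCE.findall(match.group(1))) == 1
```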

Gravsearch by Example

In this section, we provide some sample queries of different complexity to illustrate the usage of Gravsearch.

Getting All the Components of a Compound Resource

In order to get all the components of a compound resource, the following Gravsearch query can be sent to the API.

In this case, the compound resource is an incunabula:book identified by the IRI http://rdfh.ch/0803/c5058f3a and the components are of type incunabula:page (test data for the Incunabula project). Since inference is assumed, we can use knora-api:StillImageRepresentation (incunabula:page is one of its subclasses). This makes the query more generic and allows for reuse (for instance, a client might want to query different types of compound resources defined in different ontologies).

ORDER BY is used to sort the components by their sequence number.

OFFSET is set to 0 to get the first page of results.

PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>

CONSTRUCT {
   ?component knora-api:isMainResource true . # marking of the component searched for as the main resource, required
   ?component knora-api:seqnum ?seqnum . # return the sequence number in the response
   ?component knora-api:hasStillImageFileValue ?file . # return the StillImageFile in the response
} WHERE {
   ?component a knora-api:StillImageRepresentation . # restriction of the type of component
   ?component knora-api:isPartOf <http://rdfh.ch/0803/c5058f3a> . # component relates to a compound resource via this property
   ?component knora-api:seqnum ?seqnum . # component must have a sequence number
   ?component knora-api:hasStillImageFileValue ?file . # component must have a StillImageFile
}
ORDER BY ASC(?seqnum) # order by sequence number, ascending
OFFSET 0 # get first page of results

The incunabula:book with the IRI http://rdfh.ch/0803/c5058f3a has 402 pages. (This number can be obtained with a count query; see Submitting Gravsearch Queries.) However, with OFFSET 0, only the first page of results is returned. The same query can be sent again with OFFSET 1 to get the next page of results, and so forth. Because a page may contain fewer results than the configured maximum (see the settings in app/v2 in application.conf) when some resources are hidden by permission checks, a short or empty page does not mean that no more results are available; keep requesting pages until a response no longer contains knora-api:mayHaveMoreResults (see Paging).

By design, it is not possible for the client to get more than one page of results at a time; this is intended to prevent performance problems that would be caused by huge responses. A client that wants to download all the results of a query must request each page sequentially.

Let's assume the client is not interested in all of the book's pages, but only in the first ten. In that case, the sequence number can be restricted using a FILTER added to the query's WHERE clause:

FILTER (?seqnum <= 10)

The book's first page has sequence number 1, so with this FILTER only the first ten pages are returned.

This query would be exactly the same in the complex schema, except for the expansion of the knora-api prefix:

PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>

Traversing Multiple Links

Here we are looking for regions of pages that are part of books that have a particular title. In the simple schema:

PREFIX incunabula: <http://0.0.0.0:3333/ontology/0803/incunabula/simple/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>

CONSTRUCT {
  ?region knora-api:isMainResource true ;
    knora-api:isRegionOf ?page .

  ?page incunabula:partOf ?book .

  ?book incunabula:title ?title .
} WHERE {
  ?region a knora-api:Region ;
    knora-api:isRegionOf ?page .

  ?page a incunabula:page ;
    incunabula:partOf ?book .

  ?book incunabula:title ?title .

  FILTER(?title = "Zeitglöcklein des Lebens und Leidens Christi")
}

In the complex schema:

PREFIX incunabula: <http://0.0.0.0:3333/ontology/0803/incunabula/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>

CONSTRUCT {
  ?region knora-api:isMainResource true ;
    knora-api:isRegionOf ?page .

  ?page incunabula:partOf ?book .

  ?book incunabula:title ?title .
} WHERE {
  ?region a knora-api:Region ;
    knora-api:isRegionOf ?page .

  ?page a incunabula:page ;
    incunabula:partOf ?book .

  ?book incunabula:title ?title .

  ?title knora-api:valueAsString "Zeitglöcklein des Lebens und Leidens Christi" .
}

If we remove the line ?book incunabula:title ?title . from the CONSTRUCT clause, so that the CONSTRUCT clause no longer mentions ?title, the response will contain the same matching resources, but the titles of those resources will not be included in the response.

Requesting a Graph Starting with a Known Resource

Here the IRI of the main resource is already known and we want specific information about it, as well as about related resources. In this case, the IRI of the main resource must be assigned to a variable using BIND:

PREFIX beol: <http://0.0.0.0:3333/ontology/0801/beol/simple/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>

CONSTRUCT {
  ?letter knora-api:isMainResource true ;
    beol:creationDate ?date ;
    ?linkingProp1 ?person1 .

  ?person1 beol:hasFamilyName ?familyName .
} WHERE {
  BIND(<http://rdfh.ch/0801/_B3lQa6tSymIq7_7SowBsA> AS ?letter)

  ?letter a beol:letter ;
    beol:creationDate ?date ;
    ?linkingProp1 ?person1 .

  FILTER(?linkingProp1 = beol:hasAuthor || ?linkingProp1 = beol:hasRecipient)

  ?person1 beol:hasFamilyName ?familyName .
} ORDER BY ?date

This query would be the same in the complex schema, except for the prefix expansions:

PREFIX beol: <http://0.0.0.0:3333/ontology/0801/beol/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
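To run this query for a particular letter, a client only has to substitute the IRI into the BIND and POST the query text, unencoded and in UTF-8, to /v2/searchextended. The Python sketch below prepares (but does not send) that request using urllib.request from the standard library. The base URL http://localhost:3333, the helper names, and the Content-Type header (borrowed from the SPARQL 1.1 Protocol's direct POST) are assumptions; check them against your server.

```python
import urllib.request

# Assumption: the base URL of your DSP-API server; adjust as needed.
API_BASE = "http://localhost:3333"

PREFIXES = {
    "simple": (
        "PREFIX beol: <http://0.0.0.0:3333/ontology/0801/beol/simple/v2#>\n"
        "PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>\n"
    ),
    "complex": (
        "PREFIX beol: <http://0.0.0.0:3333/ontology/0801/beol/v2#>\n"
        "PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>\n"
    ),
}

# The query from above, with the letter IRI as a %s placeholder.
TEMPLATE = """
CONSTRUCT {
  ?letter knora-api:isMainResource true ;
    beol:creationDate ?date ;
    ?linkingProp1 ?person1 .

  ?person1 beol:hasFamilyName ?familyName .
} WHERE {
  BIND(<%s> AS ?letter)

  ?letter a beol:letter ;
    beol:creationDate ?date ;
    ?linkingProp1 ?person1 .

  FILTER(?linkingProp1 = beol:hasAuthor || ?linkingProp1 = beol:hasRecipient)

  ?person1 beol:hasFamilyName ?familyName .
} ORDER BY ?date
"""


def letter_query(letter_iri: str, schema: str = "simple") -> str:
    # Reject characters that SPARQL forbids inside <...>, so the IRI cannot alter the query.
    if any(c in '<>"{}|^`\\' or ord(c) <= 0x20 for c in letter_iri):
        raise ValueError(f"not a valid IRI reference: {letter_iri!r}")
    return PREFIXES[schema] + TEMPLATE % letter_iri


def gravsearch_request(query: str, api_base: str = API_BASE) -> urllib.request.Request:
    """Prepare (without sending) an HTTP POST to /v2/searchextended with the raw query as body."""
    return urllib.request.Request(
        url=api_base.rstrip("/") + "/v2/searchextended",
        data=query.encode("utf-8"),  # unencoded query text in UTF-8
        method="POST",
        # Assumption: the media type the SPARQL 1.1 Protocol uses for direct POST.
        headers={"Content-Type": "application/sparql-query; charset=utf-8"},
    )
```

To actually send the request, pass it to urllib.request.urlopen. The IRI check rejects exactly the characters that SPARQL does not allow inside <...>, so a malformed IRI cannot change the structure of the generated query.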

Searching for a List Value Referring to a Particular List Node

Since list nodes are represented by their IRIs in the complex schema, a list node is identified uniquely (unlike in the simple schema). In addition, all subnodes of the given list node are considered matches.

PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX anything: <http://0.0.0.0:3333/ontology/0001/anything/v2#>

CONSTRUCT {
    ?thing knora-api:isMainResource true .
    ?thing anything:hasListItem ?listItem .
} WHERE {
    ?thing anything:hasListItem ?listItem .
    ?listItem knora-api:listValueAsListNode <http://rdfh.ch/lists/0001/treeList02> .
}
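Because subnodes also match, a query on a list node behaves like a query on the whole subtree below it. The Python sketch below models that matching rule on a made-up hierarchy: only treeList02 comes from the example, and the other node names are hypothetical. It illustrates the semantics only, not how DSP-API evaluates the query.

```python
from collections import deque


def matching_nodes(node: str, children: dict[str, list[str]]) -> set[str]:
    """Return the given list node together with all of its descendants (breadth-first)."""
    found = {node}
    queue = deque([node])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in found:  # also guards against accidental cycles in the data
                found.add(child)
                queue.append(child)
    return found


def list_value_matches(value_node: str, query_node: str, children: dict[str, list[str]]) -> bool:
    """Would a list value pointing at value_node match a query on query_node?"""
    return value_node in matching_nodes(query_node, children)


# Hypothetical hierarchy; only "treeList02" is taken from the example query.
CHILDREN = {
    "root": ["treeList02", "otherBranch"],
    "treeList02": ["child"],
    "child": ["grandchild"],
}
```

Here a value pointing at grandchild matches a query on treeList02, while a value pointing at otherBranch does not.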

Type Inference

Gravsearch needs to be able to determine the types of the entities that query variables and IRIs refer to in the WHERE clause. In most cases, it can infer these from context and from the ontologies used. In particular, it needs to know:

The type of the subject and object of each statement.
The type that is expected as the object of each predicate.

Type Annotations

When one or more types cannot be inferred, Gravsearch will return an error message indicating the entities for which it could not determine types. The missing information must then be given by adding type annotations to the query. This can always be done by adding statements with the predicate rdf:type. The subject must be a resource or value, and the object must either be knora-api:Resource (if the subject is a resource) or the subject's specific type (if it is a value).

For example, consider this query that uses a non-DSP property:

PREFIX incunabula: <http://0.0.0.0:3333/ontology/0803/incunabula/simple/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
PREFIX dcterms: <http://purl.org/dc/terms/>

CONSTRUCT {
    ?book knora-api:isMainResource true ;
        dcterms:title ?title .

} WHERE {
    ?book dcterms:title ?title .
}

This produces the error message:

The types of one or more entities could not be determined:
  ?book, <http://purl.org/dc/terms/title>, ?title

To solve this problem, it is enough to specify the types of ?book and ?title; the type of the expected object of dcterms:title can then be inferred from the type of ?title.

PREFIX incunabula: <http://0.0.0.0:3333/ontology/0803/incunabula/simple/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
PREFIX dcterms: <http://purl.org/dc/terms/>

CONSTRUCT {
    ?book knora-api:isMainResource true ;
        dcterms:title ?title .

} WHERE {

    ?book rdf:type incunabula:book ;
        dcterms:title ?title .

    ?title rdf:type xsd:string .

}

It would also be possible to annotate the property itself, using the predicate knora-api:objectType; then the type of ?title would be inferred:

PREFIX incunabula: <http://0.0.0.0:3333/ontology/0803/incunabula/simple/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
PREFIX dcterms: <http://purl.org/dc/terms/>

CONSTRUCT {
    ?book knora-api:isMainResource true ;
        dcterms:title ?title .

} WHERE {

    ?book rdf:type incunabula:book ;
        dcterms:title ?title .

    dcterms:title knora-api:objectType xsd:string .

}

Note that it only makes sense to use dcterms:title in the simple schema, because its object is supposed to be a literal.
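How annotations fill these gaps can be illustrated with a toy fixpoint computation in Python. This is not Gravsearch's type inspector: it treats prefixed names as plain strings, recognizes only annotations written as rdf:type or knora-api:objectType, and propagates types only between a property and its object. It does, however, reproduce the three situations above: no annotations, annotated ?book and ?title, and an annotated dcterms:title.

```python
RDF_TYPE = "rdf:type"
OBJECT_TYPE = "knora-api:objectType"


def infer_types(triples, ontology_object_types=None):
    """Toy type inference over (subject, predicate, object) triples.

    Returns (entity_types, property_object_types, unresolved), where unresolved lists,
    in order of appearance, the entities whose types could not be determined.
    """
    entity_types = {}                                 # variable or IRI -> its type
    object_types = dict(ontology_object_types or {})  # property -> expected object type
    statements = []
    for s, p, o in triples:
        if p == RDF_TYPE:
            entity_types[s] = o      # annotation of a resource or value
        elif p == OBJECT_TYPE:
            object_types[s] = o      # annotation of a property
        else:
            statements.append((s, p, o))

    changed = True
    while changed:                   # propagate until nothing new is learned
        changed = False
        for _, p, o in statements:
            if p in object_types and o not in entity_types:
                entity_types[o] = object_types[p]  # property -> type of its object
                changed = True
            elif o in entity_types and p not in object_types:
                object_types[p] = entity_types[o]  # object -> property's object type
                changed = True

    unresolved = []
    for s, p, o in statements:
        for entity, known in ((s, entity_types), (p, object_types), (o, entity_types)):
            if entity not in known and entity not in unresolved:
                unresolved.append(entity)
    return entity_types, object_types, unresolved
```

With only ?book dcterms:title ?title, all three entities stay unresolved, matching the error message above; annotating ?title lets the toy infer the object type of dcterms:title, and annotating dcterms:title lets it infer the type of ?title.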

Here is another example, using a non-DSP class:

PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

CONSTRUCT {
    ?person knora-api:isMainResource true .
} WHERE {
    ?person a foaf:Person .
    ?person foaf:familyName ?familyName .
    FILTER(?familyName = "Meier")
}

This produces the error message:

Types could not be determined for one or more entities: ?person

The solution is to specify that ?person is a knora-api:Resource:

PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

CONSTRUCT {
    ?person knora-api:isMainResource true .
} WHERE {
    ?person a foaf:Person .
    ?person a knora-api:Resource .
    ?person foaf:familyName ?familyName .
    FILTER(?familyName = "Meier")
}

Inconsistent Types

Gravsearch will also reject a query if an entity is used with inconsistent types. For example:

PREFIX incunabula: <http://0.0.0.0:3333/ontology/0803/incunabula/simple/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>

CONSTRUCT {
    ?book knora-api:isMainResource true ;
        incunabula:pubdate ?pubdate .
} WHERE {
    ?book a incunabula:book ;
        incunabula:pubdate ?pubdate .

  FILTER(?pubdate = "JULIAN:1497-03-01") .
}

This returns the error message:

One or more entities have inconsistent types:

<http://0.0.0.0:3333/ontology/0803/incunabula/simple/v2#pubdate>
  knora-api:objectType <http://api.knora.org/ontology/knora-api/simple/v2#Date> ;
  knora-api:objectType <http://www.w3.org/2001/XMLSchema#string> .

?pubdate rdf:type <http://api.knora.org/ontology/knora-api/simple/v2#Date> ;
  rdf:type <http://www.w3.org/2001/XMLSchema#string> .

This is because the incunabula ontology says that the object of incunabula:pubdate must be a knora-api:Date, but the FILTER expression compares ?pubdate with an xsd:string. The solution is to specify the type of the literal in the FILTER:

PREFIX incunabula: <http://0.0.0.0:3333/ontology/0803/incunabula/simple/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>

CONSTRUCT {
    ?book knora-api:isMainResource true ;
        incunabula:pubdate ?pubdate .
} WHERE {
    ?book a incunabula:book ;
        incunabula:pubdate ?pubdate .

  FILTER(?pubdate = "JULIAN:1497-03-01"^^knora-api:Date) .
}
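The check behind this error amounts to collecting every type constraint on each entity and flagging entities that end up with more than one type. A minimal Python sketch of that idea follows. The constraint list for the pubdate example is written out by hand rather than derived from the ontology, and treating a literal without ^^ as xsd:string is a simplification that matches the error message above.

```python
from collections import defaultdict

XSD_STRING = "xsd:string"


def literal_type(literal: str) -> str:
    """Datatype of a literal as written in a query: its ^^datatype, else xsd:string."""
    suffix = literal[literal.rfind('"') + 1:]
    return suffix[2:] if suffix.startswith("^^") else XSD_STRING


def find_inconsistencies(facts):
    """Group (entity, type) facts and return the entities that received more than one type."""
    types_by_entity = defaultdict(set)
    for entity, typ in facts:
        types_by_entity[entity].add(typ)
    return {e: sorted(ts) for e, ts in types_by_entity.items() if len(ts) > 1}


def pubdate_facts(filter_literal: str):
    """Hand-written facts for FILTER(?pubdate = <filter_literal>) in the pubdate example."""
    lit_type = literal_type(filter_literal)
    return [
        ("incunabula:pubdate", "knora-api:Date"),  # objectType from the ontology
        ("?pubdate", "knora-api:Date"),            # ?pubdate is the object of incunabula:pubdate
        ("?pubdate", lit_type),                    # the FILTER compares ?pubdate with the literal
        ("incunabula:pubdate", lit_type),          # ...which also constrains the property's objectType
    ]
```

With the plain literal, both incunabula:pubdate and ?pubdate receive knora-api:Date and xsd:string, as in the error message; with the typed literal, no inconsistencies remain.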

Scoping Issues

SPARQL is evaluated from the bottom up. A UNION block therefore opens a new scope, in which variables bound at higher levels are not necessarily in scope. This can cause unexpected results if queries are not carefully designed. Gravsearch tries to prevent this by rejecting queries in the following cases.

FILTER in UNION

A FILTER in a UNION block can only use variables that are bound in the same block, otherwise the query will be rejected. This query is invalid because ?text is not bound in the UNION block containing the FILTER where the variable is used:

PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
PREFIX mls: <http://0.0.0.0:3333/ontology/0807/mls/simple/v2#>

CONSTRUCT {
    ?lemma knora-api:isMainResource true .
    ?lemma mls:hasLemmaText ?text .        
} WHERE {
    ?lemma a mls:Lemma .
    ?lemma mls:hasLemmaText ?text .

    {
        ?lemma mls:hasPseudonym ?pseudo .
        FILTER regex(?pseudo, "Abel", "i") .
    } UNION {
        FILTER regex(?text, "Abel", "i") .
    }
}
ORDER BY ASC(?text)
OFFSET 0

It can be corrected like this:

PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
PREFIX mls: <http://0.0.0.0:3333/ontology/0807/mls/simple/v2#>

CONSTRUCT {
    ?lemma knora-api:isMainResource true .
    ?lemma mls:hasLemmaText ?text .        
} WHERE {
    ?lemma a mls:Lemma .
    ?lemma mls:hasLemmaText ?text .

    {
        ?lemma mls:hasPseudonym ?pseudo .
        FILTER regex(?pseudo, "Abel", "i") .
    } UNION {
        ?lemma mls:hasLemmaText ?text .
        FILTER regex(?text, "Abel", "i") .
    }
}
ORDER BY ASC(?text)
OFFSET 0
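The rule amounts to a per-block check: the variables used in a block's FILTERs must be bound by that block's own statement patterns, and bindings outside the block are deliberately ignored. The Python sketch below encodes that rule on a hand-built representation of the two UNION blocks; the UnionBlock class is hypothetical and not part of any Gravsearch parser.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]


@dataclass
class UnionBlock:
    """One branch of a UNION: its statement patterns and the variables its FILTERs use."""
    patterns: list[Triple] = field(default_factory=list)
    filter_vars: set[str] = field(default_factory=set)


def bound_variables(patterns: list[Triple]) -> set[str]:
    """Variables that appear in the given statement patterns."""
    return {term for triple in patterns for term in triple if term.startswith("?")}


def check_filters_in_union(blocks: list[UnionBlock]) -> list[str]:
    """Report FILTER variables that are not bound inside their own UNION block."""
    errors = []
    for number, block in enumerate(blocks, start=1):
        # Bindings at the top level of the WHERE clause are not considered here.
        for var in sorted(block.filter_vars - bound_variables(block.patterns)):
            errors.append(f"UNION block {number}: {var} is used in a FILTER but not bound in the block")
    return errors
```

For the invalid query, the second block uses ?text in its FILTER without binding it, so one error is reported; adding ?lemma mls:hasLemmaText ?text to that block, as in the corrected query, clears it.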

ORDER BY

A variable used in ORDER BY must be bound at the top level of the WHERE clause. This query is invalid, because ?int is not bound at the top level of the WHERE clause:

PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX anything: <http://0.0.0.0:3333/ontology/0001/anything/v2#>

CONSTRUCT {
    ?thing knora-api:isMainResource true .
    ?thing anything:hasInteger ?int .
    ?thing anything:hasRichtext ?richtext .
    ?thing anything:hasText ?text .
} WHERE {
    ?thing a knora-api:Resource .
    ?thing a anything:Thing .

    {
        ?thing anything:hasRichtext ?richtext .
        FILTER knora-api:matchText(?richtext, "test")
        ?thing anything:hasInteger ?int .
    }
    UNION
    {
        ?thing anything:hasText ?text .
        FILTER knora-api:matchText(?text, "test")
        ?thing anything:hasInteger ?int .
    }
}
ORDER BY (?int)

It can be corrected like this:

PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX anything: <http://0.0.0.0:3333/ontology/0001/anything/v2#>

CONSTRUCT {
    ?thing knora-api:isMainResource true .
    ?thing anything:hasInteger ?int .
    ?thing anything:hasRichtext ?richtext .
    ?thing anything:hasText ?text .
} WHERE {
    ?thing a knora-api:Resource .
    ?thing a anything:Thing .
    ?thing anything:hasInteger ?int .

    {
        ?thing anything:hasRichtext ?richtext .
        FILTER knora-api:matchText(?richtext, "test")
    }
    UNION
    {
        ?thing anything:hasText ?text .
        FILTER knora-api:matchText(?text, "test")
    }
}
ORDER BY (?int)
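The ORDER BY rule can likewise be expressed as a set difference between the ORDER BY variables and the variables bound by top-level statement patterns. In this minimal Python sketch, the top-level patterns of the two queries above are written out by hand, and the UNION blocks are left out because bindings inside them do not count; the function name is hypothetical.

```python
Triple = tuple[str, str, str]


def unbound_order_by_vars(top_level: list[Triple], order_by: list[str]) -> list[str]:
    """Return the ORDER BY variables that no top-level statement pattern binds."""
    bound = {term for triple in top_level for term in triple if term.startswith("?")}
    return [var for var in order_by if var not in bound]


# Top-level statement patterns of the two WHERE clauses above (UNION blocks omitted).
INVALID_TOP_LEVEL = [("?thing", "a", "knora-api:Resource"), ("?thing", "a", "anything:Thing")]
VALID_TOP_LEVEL = INVALID_TOP_LEVEL + [("?thing", "anything:hasInteger", "?int")]
```

For the invalid query the function reports ?int; for the corrected query it reports nothing.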

Query Optimization by Dependency

The query performance of triplestores, such as Fuseki, is highly dependent on the order of query patterns. To improve performance, Gravsearch automatically reorders the statement patterns in the WHERE clause according to their dependencies on each other, to minimise the number of possible matches for each pattern.

Consider the following Gravsearch query:

PREFIX beol: <http://0.0.0.0:3333/ontology/0801/beol/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>

CONSTRUCT {
  ?letter knora-api:isMainResource true .
  ?letter ?linkingProp1  ?person1 .
  ?letter ?linkingProp2  ?person2 .
  ?letter beol:creationDate ?date .
} WHERE {
  ?letter beol:creationDate ?date .

  ?letter ?linkingProp1 ?person1 .
  FILTER(?linkingProp1 = beol:hasAuthor || ?linkingProp1 = beol:hasRecipient )

  ?letter ?linkingProp2 ?person2 .
  FILTER(?linkingProp2 = beol:hasAuthor || ?linkingProp2 = beol:hasRecipient )

  ?person1 beol:hasIAFIdentifier ?gnd1 .
  ?gnd1 knora-api:valueAsString "(DE-588)118531379" .

  ?person2 beol:hasIAFIdentifier ?gnd2 .
  ?gnd2 knora-api:valueAsString "(DE-588)118696149" .
} ORDER BY ?date

Gravsearch optimises the performance of this query by moving these statements to the top of the WHERE clause:

  ?gnd1 knora-api:valueAsString "(DE-588)118531379" .
  ?gnd2 knora-api:valueAsString "(DE-588)118696149" .

The rest of the WHERE clause then reads:

  ?person1 beol:hasIAFIdentifier ?gnd1 .
  ?person2 beol:hasIAFIdentifier ?gnd2 .
  ?letter ?linkingProp1 ?person1 .
  FILTER(?linkingProp1 = beol:hasAuthor || ?linkingProp1 = beol:hasRecipient )

  ?letter ?linkingProp2 ?person2 .
  FILTER(?linkingProp2 = beol:hasAuthor || ?linkingProp2 = beol:hasRecipient )
  ?letter beol:creationDate ?date .
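The exact reordering algorithm is not described here, so the following Python sketch is only one hypothetical heuristic with a similar effect. It treats each statement pattern as an edge from subject to object, computes how deeply each object is nested below the main resource, puts patterns with deeper objects first, and, at equal depth, puts patterns whose object is itself constrained by other patterns before those whose object is a leaf. FILTERs are ignored.

```python
from collections import defaultdict

Pattern = tuple[str, str, str]  # (subject, predicate, object)


def reorder_by_dependency(patterns: list[Pattern]) -> list[Pattern]:
    """Hypothetical heuristic: put the most deeply nested, most constrained patterns first."""
    parents = defaultdict(set)   # node -> subjects that point to it
    children = defaultdict(set)  # node -> objects it points to
    for s, _, o in patterns:     # predicates (even variable ones) are ignored
        parents[o].add(s)
        children[s].add(o)

    depths = {}

    def depth(node, path=frozenset()):
        """Length of the longest subject-to-object chain ending at node."""
        if node in depths:
            return depths[node]
        if node in path:
            raise ValueError(f"cyclic dependency at {node}")
        ps = parents.get(node, set())
        d = 1 + max(depth(p, path | {node}) for p in ps) if ps else 0
        depths[node] = d
        return d

    order = sorted(
        range(len(patterns)),
        key=lambda i: (
            -depth(patterns[i][2]),            # deeper objects first
            not children.get(patterns[i][2]),  # objects that are also subjects before leaves
            i,                                 # otherwise keep the original order
        ),
    )
    return [patterns[i] for i in order]
```

Applied to the seven statement patterns of the query above, in their original order, this heuristic returns them in the order shown for this example: the two valueAsString patterns, then the two hasIAFIdentifier patterns, then the two link patterns, and finally beol:creationDate. The real optimiser may order other queries differently.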
