API

Atom feeds

The media items published on Open Images are also offered through a range of Atom feeds. These Atom feeds can be used to display content of Open Images in external applications (like Miro) or websites by using a number of search criteria (recent additions to the platform, recent additions of a specific user, material related to a certain media item, material related to a certain query, etc.).

OAI-PMH

All Open Images media items and their descriptions (metadata) are also accessible via an Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) API. This allows third parties to access Open Images in a structured way. OAI-PMH is a powerful tool for data and metadata sharing between institutions and platforms. For example, OAI-PMH can be used to harvest all data available on the server, or to request specific records and periodic updates.

The specification of OAI-PMH can be found here. It covers all options for this type of API. Open Images implements this protocol with two specific metadata formats. The details of these formats are discussed below for developers and administrators who are already familiar with this type of API. If you are not familiar with OAI-PMH, we recommend reading this introduction to OAI-PMH.

Available Elements

The Open Images OAI implementation uses two different metadata formats. The first is 'oai_dc' (OAI Dublin Core), the minimum data set that OAI-PMH requires for records. Dublin Core is a set of elements that can describe physical objects, and oai_dc contains the 15 elements it specifies. The second, more comprehensive format is 'oai_oi' (OAI Open Images), a refinement of these core elements. It is an Open Images-specific implementation consisting of a mixture of DC Terms and an XML interpretation of ccREL.

We strongly recommend using the oai_oi metadata format when using this API, because it contains more detailed information about the files in Open Images. The oai_dc format is simpler and more easily interchangeable with other OAI implementations that are based on Dublin Core.

OAI-PMH is XML based. All responses from this API can be validated against the corresponding XSDs. An XML Schema Definition (XSD) describes what an XML file looks like: which elements are and are not allowed, what type they are and how often they occur.

These are the most important XSDs for Open Images:

http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd
This XSD describes what a result of an OAI-PMH request looks like. The document allows you to check whether the request is properly answered.

http://www.openbeelden.nl/feeds/oai/oai_oi.xsd
When you use the Open Images specific metadata format you can check whether each individual record is valid using this XSD.

http://www.openarchives.org/OAI/2.0/oai_dc.xsd
When you use the Dublin Core minimum metadata format you can check the validity of each individual record with this XSD.

Please refer to http://www.openbeelden.nl/feeds/oai/?verb=ListMetadataFormats for an OAI implementation of this information.
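As an illustration, the response to the ListMetadataFormats request can be read with Python's standard library. This is a minimal sketch: the sample response below is hand-written and abbreviated, the metadataNamespace given for oai_oi is a placeholder rather than the real value, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH response envelope.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written, abbreviated sample of a ListMetadataFormats response.
# ASSUMPTION: the oai_oi metadataNamespace is a placeholder, not the real one.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>oai_oi</metadataPrefix>
      <schema>http://www.openbeelden.nl/feeds/oai/oai_oi.xsd</schema>
      <metadataNamespace>http://example.org/placeholder/oai_oi/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def list_metadata_formats(xml_text):
    """Return a dict mapping each metadataPrefix to its schema (XSD) URL."""
    root = ET.fromstring(xml_text)
    formats = {}
    for fmt in root.iterfind(".//oai:metadataFormat", OAI_NS):
        prefix = fmt.findtext("oai:metadataPrefix", namespaces=OAI_NS)
        schema = fmt.findtext("oai:schema", namespaces=OAI_NS)
        formats[prefix] = schema
    return formats
```

Calling list_metadata_formats on a real response body should yield one entry per supported format, which can then be used to pick the XSD for validation.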

An implementation of Dublin Core elements is always an interpretation of Dublin Core’s guidelines. Below is a list of every field in both metadata formats with explanation and comments:

oai_dc

element | explanation | comments
dc:title | Title of the item. | -
dc:creator | The creator/producer of the item. | For a person name, the format "surname, name" is used, optionally followed by a role in brackets. For example: "Doe, John (producer)"
dc:subject | Words that describe the item, usually from a closed vocabulary (thesaurus). | This includes the names of people who are present in the media item. These follow the same format as above. Multiple keywords are possible.
dc:description | A textual description of the item. | -
dc:publisher | The uploader of the item to Open Images. | Two values are present for this field: the user name and a URL to the profile of the user on Open Images.
dc:contributor | Persons/entities that have contributed to the creation of the item. | For person names, the same format as mentioned above is used. Multiple values are possible.
dc:date | The original publication date of the item. | By default, this is the moment of uploading to Open Images; users can adjust this manually if necessary.
dc:type | The media type of the item. | Items on Open Images are of the types video, audio or still image, indicated using the DCMI Type Vocabulary: http://dublincore.org/documents/dcmi-type-vocabulary/
dc:format | The various formats in which the item is available on Open Images. | There are always multiple formats of an item present.
dc:identifier | The catalog number of the item (if derived from an existing collection). | -
dc:source | A reference to the original carrier/source of the item (if any). | -
dc:language | The language of the item itself (not of the description). | This value is given as an ISO 639-1 code.
dc:relation | A statement about the sources from which the item is derived (if any). | -
dc:coverage | The geographic location(s) of the item. | Usually this is a written place name. Multiple values are possible.
dc:rights | The license conditions under which the item has been made available. | All items on Open Images are available under a Creative Commons license or are in the public domain. The value of this field is a URL. For example: "http://creativecommons.org/licenses/by-sa/3.0/nl/deed.en"
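
The "surname, name (role)" convention described for dc:creator, dc:subject and dc:contributor can be split into its parts with a regular expression. The sketch below is one possible reading of that convention; the function name is hypothetical, and values that do not follow the person-name format (for example organisation names or plain keywords) simply return None.

```python
import re

# "surname, name" optionally followed by "(role)", as described in the table.
NAME_PATTERN = re.compile(
    r"^\s*(?P<surname>[^,()]+?)\s*,\s*(?P<name>[^()]+?)\s*"
    r"(?:\((?P<role>[^)]*)\))?\s*$"
)


def parse_person(value):
    """Split 'Doe, John (producer)' into surname, name and optional role.

    Returns None when the value is not in the person-name format.
    """
    match = NAME_PATTERN.match(value)
    if match is None:
        return None
    role = match.group("role")
    return {
        "surname": match.group("surname"),
        "name": match.group("name"),
        "role": role.strip() if role else None,
    }
```

Returning None rather than raising keeps the helper usable on fields such as dc:subject, where person names and ordinary keywords are mixed.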

oai_oi

element | explanation | comments
oi:title | Title of the item. | -
oi:alternative | Subtitle of the item (optional). | -
oi:creator | The creator/producer of the item. | For a person name, the format "surname, name" is used, optionally followed by a role in brackets. For example: "Doe, John (producer)"
oi:subject | Words that describe the item, usually from a closed vocabulary (thesaurus). | This includes the names of people who are present in the media item. These follow the same format as above. Multiple keywords are possible.
oi:description | An introductory description of the item. | -
oi:abstract | A detailed description of the item. | -
oi:publisher | The uploader of the item to Open Images. | Two values are present for this field: the user name and a URL to the profile of the user on Open Images.
oi:contributor | Persons/entities that have contributed to the creation of the item. | For person names, the same format as mentioned above is used. Multiple values are possible.
oi:date | The original publication date of the item. | By default, this is the moment of uploading to Open Images; users can adjust this manually if necessary.
oi:type | The media type of the item. | Items on Open Images are of the types video, audio or still image, indicated using the DCMI Type Vocabulary: http://dublincore.org/documents/dcmi-type-vocabulary/
oi:extent | The length of the item. | The duration is given in ISO 8601 duration notation: http://en.wikipedia.org/wiki/ISO_8601#Durations
oi:medium | The various formats in which the item is available on Open Images. | There are always multiple formats of an item present.
oi:identifier | The catalog number of the item (if derived from an existing collection). | -
oi:source | A reference to the original carrier/source of the item (if any). | -
oi:language | The language of the item itself (not of the description). | This value is given as an ISO 639-1 code.
oi:references | A statement about the sources from which the item is derived (if any). | -
oi:spatial | The geographic location(s) of the item. | Usually this is a written place name. Multiple values are possible.
oi:attributionName | The name of one or more makers, which must be mentioned for proper attribution when the item is reused. | For person names, the same format as mentioned above is used. Multiple values are possible.
oi:attributionURL | The location of the original item, which should be referenced when the item is reused. | The value of this field is a URL that refers to the item on Open Images. For example: "http://www.openimages.eu/media/23173"
oi:license | The license conditions under which the item has been made available. | All items on Open Images are available under a Creative Commons license or are in the public domain. The value of this field is a URL. For example: "http://creativecommons.org/licenses/by-sa/3.0/nl/deed.en"
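
Since oi:extent is expressed as an ISO 8601 duration, a harvester will usually want to convert it to a number of seconds. The following is a minimal sketch that handles only the day and time components (for example "PT1M30S"); it assumes Open Images does not use year, month or week components, which is not confirmed here, and the function name is hypothetical.

```python
import re
from datetime import timedelta

# Subset of ISO 8601 durations: optional days, then optional T + H/M/S parts.
# ASSUMPTION: year, month and week components do not occur in oi:extent.
DURATION_PATTERN = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def parse_duration(value):
    """Convert an ISO 8601 duration such as 'PT1M30S' to a timedelta."""
    match = DURATION_PATTERN.match(value)
    # Reject non-matching strings and bare "P" / "PT" without any component.
    if match is None or not any(match.groupdict().values()):
        raise ValueError("unsupported duration: %r" % value)
    parts = {
        unit: float(number)
        for unit, number in match.groupdict().items()
        if number is not None
    }
    return timedelta(**parts)
```

For instance, parse_duration("PT1M30S").total_seconds() gives the length of a ninety-second clip as a float.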

Request Examples

Resumption Tokens

Responses contain at most 100 records. To retrieve more, OAI-PMH uses resumption tokens: each response includes a token that can be used to request the next 100 records, and that response in turn contains a new token for the 100 records after that. An example of a resumption token is given below:

<resumptionToken cursor="100" completeListSize="12000">!f!u!oai_dc!100</resumptionToken>

If you are, for example, interested in all media that have last been edited in 2010, you can use the following query:

http://www.openbeelden.nl/feeds/oai/?verb=ListRecords&metadataPrefix=oai_dc&from=2010-01-01&until=2010-12-31

This will probably result in more than 100 records. By passing the resumption token returned in each response, you can request the following pages of 100 records each. The resumptionToken argument replaces all other arguments except verb:

http://www.openbeelden.nl/feeds/oai/?verb=ListRecords&resumptionToken=!f2009-01-01!u!oai_dc!100
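
The paging described above can be sketched as a loop that keeps requesting pages until no resumption token is returned. This is an illustrative sketch using only the Python standard library, not the way Open Images or any particular harvester implements it; the function names are hypothetical, and the fetch function is passed in as a parameter so the loop can be tried without network access.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "http://www.openbeelden.nl/feeds/oai/"
OAI = "{http://www.openarchives.org/OAI/2.0/}"  # namespace in ElementTree form


def fetch(params):
    """Perform one OAI-PMH request and return the raw response body."""
    url = BASE_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        return response.read()


def harvest(params, fetch_page=fetch):
    """Yield every <record> element, following resumption tokens."""
    while True:
        root = ET.fromstring(fetch_page(params))
        yield from root.iter(OAI + "record")
        token = root.find(f".//{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # no token, or an empty one: this was the last page
        # The token is sent together with the verb only.
        params = {"verb": params["verb"], "resumptionToken": token.text.strip()}
```

A call such as harvest({"verb": "ListRecords", "metadataPrefix": "oai_dc", "from": "2010-01-01", "until": "2010-12-31"}) would then iterate over all records edited in 2010, one page of 100 at a time.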

Metadata Formats

Open Images supports two metadata formats in its OAI implementation. The metadataPrefix argument of an OAI-PMH request specifies the format in which you want to receive your results. You have the following options:

  • metadataPrefix=oai_dc
  • metadataPrefix=oai_oi

Identification

Identification is important for many automatic harvesters of an OAI implementation:

http://www.openbeelden.nl/feeds/oai/?verb=Identify

The response to this request identifies the OAI repository and contains other information about this implementation.
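
To show what a harvester might do with the Identify response, the sketch below collects its child elements into a dictionary. The sample response is hand-written and abbreviated, and its values (repository name, earliest datestamp) are illustrative rather than taken from the live server; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample. ASSUMPTION: the values are illustrative only.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example repository</repositoryName>
    <baseURL>http://www.openbeelden.nl/feeds/oai/</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <earliestDatestamp>2000-01-01</earliestDatestamp>
    <granularity>YYYY-MM-DD</granularity>
  </Identify>
</OAI-PMH>"""


def parse_identify(xml_text):
    """Return the children of <Identify> as a {tag: text} dictionary."""
    root = ET.fromstring(xml_text)
    identify = root.find("oai:Identify", OAI_NS)
    if identify is None:
        raise ValueError("not an Identify response")
    # Drop the "{namespace}" part of each tag; a repeated element keeps
    # only its last value in this simple sketch.
    return {
        child.tag.rpartition("}")[2]: (child.text or "").strip()
        for child in identify
    }
```

The granularity and earliestDatestamp entries are the ones a harvester typically needs to choose valid 'from' and 'until' values.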

Retrieve All Information

With the request below you can query all records in this repository with the oai_dc metadata format. This request can be used for an initial copy of all data on Open Images. We do not recommend using it often, because it places a heavy load on our database:

http://www.openbeelden.nl/feeds/oai/?verb=ListRecords&metadataPrefix=oai_dc

Periodic Updates

Using the following request you can update your copy of the data periodically. For example, you can check every day whether new or updated material is available on Open Images. By applying the 'from' and 'until' variables (YYYY-MM-DD) you can retrieve specific periods. If one of these variables is missing, the system uses the first or last possible record, respectively.

http://www.openbeelden.nl/feeds/oai/?verb=ListRecords&metadataPrefix=oai_dc&from=2010-11-01&until=2010-12-01
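
A periodic job can build this request from dates rather than hard-coded strings. The following is a small sketch under that assumption; the function name is hypothetical, and leaving out the second date relies on the default behaviour for a missing 'until' described above.

```python
from datetime import date
from urllib.parse import urlencode

BASE_URL = "http://www.openbeelden.nl/feeds/oai/"


def update_url(since, until=None, metadata_prefix="oai_dc"):
    """Build a ListRecords URL for records changed between two dates.

    `since` and `until` are datetime.date objects; `until` may be omitted,
    in which case the argument is left out of the request.
    """
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "from": since.isoformat(),  # YYYY-MM-DD
    }
    if until is not None:
        params["until"] = until.isoformat()
    return BASE_URL + "?" + urlencode(params)
```

A daily job could store the date of its last successful run and pass it as `since` on the next run.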

Retrieve Specific Collections

OAI-PMH has the ability to request specific sets of objects. Use the following URL for a list of the sets available:

http://www.openbeelden.nl/feeds/oai/?verb=ListSets

Use the value of the 'setSpec' element to dig deeper into a specific collection. All of the above examples can also be used with the 'set' variable.

For example, you can request all records in the set "beeldengeluid" edited or uploaded in November 2010 by using the following URL:  

http://www.openbeelden.nl/feeds/oai/?verb=ListRecords&metadataPrefix=oai_dc&from=2010-11-01&until=2010-12-01&set=beeldengeluid
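
The two steps above, reading the available sets and then requesting the records of one set, can be sketched as follows. The sample ListSets response is hand-written, its setName is a placeholder, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "http://www.openbeelden.nl/feeds/oai/"
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample. ASSUMPTION: the setName value is a placeholder.
SAMPLE_LIST_SETS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set>
      <setSpec>beeldengeluid</setSpec>
      <setName>Example set name</setName>
    </set>
  </ListSets>
</OAI-PMH>"""


def list_sets(xml_text):
    """Return a list of (setSpec, setName) tuples from a ListSets response."""
    root = ET.fromstring(xml_text)
    return [
        (
            item.findtext("oai:setSpec", namespaces=OAI_NS),
            item.findtext("oai:setName", namespaces=OAI_NS),
        )
        for item in root.iterfind(".//oai:set", OAI_NS)
    ]


def set_records_url(set_spec, from_date=None, until_date=None,
                    metadata_prefix="oai_dc"):
    """Build a ListRecords URL restricted to one set (dates as YYYY-MM-DD)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    params["set"] = set_spec
    return BASE_URL + "?" + urlencode(params)
```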

Retrieve Specific Records

If you want to retrieve information from a specific record you can use the following structure:

http://www.openbeelden.nl/feeds/oai/?verb=GetRecord&identifier=oai:openimages.eu:47540&metadataPrefix=oai_dc

Each record has its own unique identifier on Open Images (the numeric value that follows /media/ in the media item URL). In the OAI identifier this number follows the prefix oai:openimages.eu:, as in the example above. You can use this identifier to retrieve a single record and keep it up to date.
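
Deriving the GetRecord request from a media item URL can be sketched as shown below. The identifier prefix "oai:openimages.eu:" is inferred from the single example request above and is assumed to hold for all records; the function names are hypothetical.

```python
import re
from urllib.parse import urlencode, urlparse

BASE_URL = "http://www.openbeelden.nl/feeds/oai/"
# ASSUMPTION: every record uses this prefix, as in the example request.
IDENTIFIER_PREFIX = "oai:openimages.eu:"


def oai_identifier(media_url):
    """Turn a media item URL (.../media/<number>) into an OAI identifier."""
    match = re.search(r"/media/(\d+)", urlparse(media_url).path)
    if match is None:
        raise ValueError("no /media/<number> part in %r" % media_url)
    return IDENTIFIER_PREFIX + match.group(1)


def get_record_url(media_url, metadata_prefix="oai_dc"):
    """Build the GetRecord URL for one media item."""
    params = {
        "verb": "GetRecord",
        "identifier": oai_identifier(media_url),
        "metadataPrefix": metadata_prefix,
    }
    # Keep ":" unescaped so the URL reads like the example request.
    return BASE_URL + "?" + urlencode(params, safe=":")
```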

Software

There are some good tools available to make OAI-PMH API calls. For example, there is a Python implementation that works well. See:

http://www.infrae.com/download/OAI/pyoai

This tool is used by Wikimedia to harvest Open Images on a monthly basis. This allows material with compatible license conditions to be automatically uploaded to Wikimedia Commons, for reuse within Wikipedia for example. See:

https://fisheye.toolserver.org/browse/multichill/bot/openbeelden/openbeelden_uploader.py?r=HEAD