Background

Sara El-Gebali

19 June 2025


📚 1. Start with Metadata

DataCite metadata is structured data, shown here as simplified JSON. It tells us things like:

{
  "title": "Climate Data 2024",
  "creator": "Alice Smith",
  "publicationYear": "2024"
}

This is useful, but only humans (or systems programmed specifically for this format) know what these fields mean.
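To make that point concrete, here is a minimal Python sketch. The function `describe` and the second record layout are hypothetical and only illustrate that knowledge of the field names lives in the program, not in the data.

```python
import json

# The record from the example above, as a JSON string.
record_json = '{"title": "Climate Data 2024", "creator": "Alice Smith", "publicationYear": "2024"}'
record = json.loads(record_json)


def describe(rec):
    """Build a citation-like string; the field names are hard-coded knowledge."""
    return f'{rec["creator"]} ({rec["publicationYear"]}): {rec["title"]}'


print(describe(record))  # Alice Smith (2024): Climate Data 2024

# A hypothetical record from another system that calls the same things differently.
other = {"name": "Climate Data 2024", "author": "Alice Smith", "datePublished": "2024"}
try:
    describe(other)
except KeyError as missing:
    print("unknown layout, missing field:", missing)
```

The second record carries the same information, yet the program cannot use it because nothing tells it that `author` and `creator` mean the same thing.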

🧠 2. Add Meaning with an Ontology

An ontology defines:

What creator means (e.g., it refers to a Person)
What title means (e.g., it’s a label for a resource)
How these things relate (e.g., every Dataset must have at least one title)

It uses RDF, RDFS, and OWL to formalize this. Once you do that, machines can understand the structure and the meaning. This is great for automated reasoning, quality checks, and linking with other systems.
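As a rough illustration of such a quality check, the sketch below stores statements as plain Python triples and looks for datasets without a title. It is a stand-in, not an OWL reasoner: all `ex:` names are hypothetical, and the check is closed-world (it treats a missing title as an error), whereas OWL reasoning works under an open-world assumption.

```python
# Triples are (subject, predicate, object) tuples; all ex: names are hypothetical.
RDF_TYPE = "rdf:type"

graph = {
    ("ex:dataset1", RDF_TYPE, "ex:Dataset"),
    ("ex:dataset1", "ex:hasTitle", "Climate Data 2024"),
    ("ex:dataset1", "ex:hasCreator", "ex:AliceSmith"),
    ("ex:AliceSmith", RDF_TYPE, "ex:Person"),
    ("ex:dataset2", RDF_TYPE, "ex:Dataset"),  # no title, on purpose
}


def instances_of(graph, cls):
    """Return every subject declared to be of the given class."""
    return {s for s, p, o in graph if p == RDF_TYPE and o == cls}


def missing_property(graph, cls, prop):
    """Return the instances of cls that have no value for prop."""
    have = {s for s, p, o in graph if p == prop}
    return sorted(instances_of(graph, cls) - have)


print(missing_property(graph, "ex:Dataset", "ex:hasTitle"))  # ['ex:dataset2']
```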

🔗 3. Align Concepts with Mappings

But DataCite isn’t the only standard. Others, like DCAT, schema.org, Dublin Core, and Wikidata, also define things like title, creator, and publicationYear. To make DataCite metadata understandable to those systems, we create mappings.

Mappings say:

DataCite creator = foaf:maker or schema:author
DataCite title = dct:title or schema:name

They use statements like:
ex:hasCreator owl:equivalentProperty schema:author .

So if a tool understands schema:author, it can now understand DataCite’s creator field too.
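A small Python sketch of what that means in practice: given an equivalence statement, a tool can copy each triple to the equivalent property. The data and the helper `expand_equivalents` are hypothetical, and the function makes a single pass only (it does not chain equivalences transitively as a full reasoner would).

```python
EQUIV = "owl:equivalentProperty"

mappings = {
    ("ex:hasCreator", EQUIV, "schema:author"),
    ("ex:hasTitle", EQUIV, "schema:name"),
}

data = {
    ("ex:dataset1", "ex:hasCreator", "ex:AliceSmith"),
    ("ex:dataset1", "ex:hasTitle", "Climate Data 2024"),
}


def expand_equivalents(data, mappings):
    """Add a copy of each triple under every property declared equivalent."""
    equivalents = {}
    for a, p, b in mappings:
        if p == EQUIV:
            # Equivalence is symmetric, so record both directions.
            equivalents.setdefault(a, set()).add(b)
            equivalents.setdefault(b, set()).add(a)
    inferred = set(data)
    for s, p, o in data:
        for q in equivalents.get(p, ()):
            inferred.add((s, q, o))
    return inferred


expanded = expand_equivalents(data, mappings)
authors = sorted(o for s, p, o in expanded if p == "schema:author")
print(authors)  # ['ex:AliceSmith']
```

A consumer that only knows `schema:author` now finds the creator without ever having seen the DataCite field name.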

🔁 4. Transform Records with Crosswalks

A crosswalk is a recipe for transforming a full record from one format/schema into another. Think of it like a conversion chart:

DataCite Field	schema.org Equivalent
title	name
creator.name	author.name
publicationYear	datePublished

This allows you to:

Export DataCite metadata in schema.org JSON-LD
Harvest and reuse metadata in another system (e.g., a library catalog or web search engine)
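The conversion chart above can be sketched as a small Python script. This is one possible approach, not a reference implementation: the helper names are made up for illustration, and the input record uses a simplified nested `creator` object so that the `creator.name` row has something to act on.

```python
# The crosswalk table, with dotted paths for nested fields.
CROSSWALK = {
    "title": "name",
    "creator.name": "author.name",
    "publicationYear": "datePublished",
}


def get_path(record, path):
    """Follow a dotted path such as 'creator.name'; return None if absent."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def set_path(record, path, value):
    """Create nested dicts as needed and set the value at a dotted path."""
    *parents, last = path.split(".")
    current = record
    for key in parents:
        current = current.setdefault(key, {})
    current[last] = value


def apply_crosswalk(source, crosswalk):
    """Copy every mapped field that is present in source into a new record."""
    target = {}
    for src_path, dst_path in crosswalk.items():
        value = get_path(source, src_path)
        if value is not None:
            set_path(target, dst_path, value)
    return target


datacite = {
    "title": "Climate Data 2024",
    "creator": {"name": "Alice Smith"},
    "publicationYear": "2024",
}
print(apply_crosswalk(datacite, CROSSWALK))
# {'name': 'Climate Data 2024', 'author': {'name': 'Alice Smith'}, 'datePublished': '2024'}
```

Keeping the table as data, separate from the code that applies it, means the same function can serve other target formats by swapping in a different table.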

🚀 Putting It All Together

Here’s how they all relate:

Role	What it does	How it helps with DataCite
Ontology	Describes what the data means using concepts, logic, and relationships	Helps machines understand and reason over DataCite metadata
Mapping	Aligns concepts in DataCite to other vocabularies (like schema.org or Dublin Core)	Enables linking and integration with other systems
Crosswalk	Translates complete records from one metadata format into another	Makes DataCite metadata usable in external formats like schema.org JSON-LD

Ontologies define meaning; mappings align that meaning across systems; crosswalks translate whole records between formats.

🧩 Example in Action

Let’s say you want to expose a DataCite record in schema.org so Google Dataset Search can find it:

Ontology: Define that your resource is a Dataset, and creator is a Person.
Mapping: Link your creator field to schema:author.
Crosswalk: Use a table to translate the JSON fields:

✅ This is a schema.org JSON-LD representation of a dataset.
🟢 It reflects metadata from DataCite fields, translated into schema.org terms.

{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "name": "Climate Data 2024",
  "author": {
    "@type": "Person",
    "name": "Alice Smith"
  },
  "datePublished": "2024"
}

The crosswalk can be:

A table
A script or mapping file (e.g., XSLT, SPARQL CONSTRUCT, or Python script)
An ontology alignment (owl:equivalentProperty)

It is what enables the conversion of a DataCite JSON record into the JSON-LD snippet above.

Now your data is:
✅ Machine-readable
✅ Interoperable
✅ Findable on the web

Understanding Crosswalks and Mappings

Crosswalks and mappings play a critical role in making metadata interoperable across different schemas, domains, and systems. While ontologies formalize the semantics of a single schema, crosswalks and mappings are about connecting concepts between multiple schemas.

A crosswalk is a structured mapping between elements in two or more metadata standards. It shows how a field or concept in one schema (e.g., DataCite) aligns with one in another (e.g., Dublin Core, schema.org, MARC).

Mappings can be:

Exact (e.g., DataCite:title is equivalent to dct:title)
Close (e.g., creatorName maps to foaf:name, though with some nuance)
One-to-many (e.g., a complex DataCite creator object may map to multiple elements in another schema)
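The one-to-many case is the least obvious of the three, so here is a hedged Python sketch. The input is a simplified creator object (real DataCite records structure identifiers and affiliations differently), and `map_creator` is a hypothetical helper. The target property names `familyName`, `givenName`, `identifier`, and `affiliation` are schema.org terms.

```python
def map_creator(creator):
    """Map one simplified creator object to several schema.org properties."""
    person = {"@type": "Person", "name": creator["name"]}
    # A "Family, Given" name can be split into two target elements.
    if "," in creator["name"]:
        family, given = (part.strip() for part in creator["name"].split(",", 1))
        person["familyName"] = family
        person["givenName"] = given
    if "nameIdentifier" in creator:
        person["identifier"] = creator["nameIdentifier"]
    if "affiliation" in creator:
        person["affiliation"] = {"@type": "Organization", "name": creator["affiliation"]}
    return person


creator = {
    "name": "Smith, Alice",
    "nameIdentifier": "https://orcid.org/...",  # elided, as in the examples in this post
    "affiliation": "Example University",  # hypothetical
}
print(map_creator(creator)["givenName"])  # Alice
```

One source field (`name`) ends up in three target elements, which is why a flat field-to-field table is not always enough.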

Why Crosswalks & Mappings Matter

Interoperability: They allow metadata from one system to be interpreted and used in another.
Aggregation: Services like Europeana or OpenAIRE rely on mappings to bring together data from multiple sources.
Conversion & Exchange: They enable tools to export and import data in different formats (e.g., JSON-LD, XML, RDF).
Schema Alignment: Helps unify different vocabularies under common terms (e.g., aligning DataCite and schema.org for dataset indexing by search engines).

Common Mapping Targets for DataCite

DataCite Field	Target Vocabulary Term
identifier	dct:identifier
creator	schema:creator (property), foaf:Person (class of the value)
title	dct:title, schema:name
publicationYear	dct:issued, schema:datePublished
resourceType	dcat:Dataset, bibo:Document

Mappings can be formalized using semantic web technologies:

owl:equivalentProperty or owl:equivalentClass for exact semantic alignment
skos:exactMatch, skos:closeMatch, and skos:relatedMatch for vocabulary-level mapping

Crosswalks can also be maintained in structured formats like spreadsheets, RDF, JSON-LD, or XSLT depending on the application context.
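As a sketch of how a spreadsheet-style crosswalk could be turned into formal statements, the Python below emits one Turtle line per row. The rows, the relation labels (`exact`, `exact-class`, `close`), and the `ex:` terms are assumptions for illustration; prefix declarations are left out for brevity.

```python
# (source term, relation label, target term); the labels are our own convention.
MAPPINGS = [
    ("ex:hasTitle", "exact", "dct:title"),
    ("ex:hasCreatorName", "close", "foaf:name"),
    ("ex:Dataset", "exact-class", "schema:Dataset"),
]

# Which predicate expresses each kind of relation.
PREDICATES = {
    "exact": "owl:equivalentProperty",
    "exact-class": "owl:equivalentClass",
    "close": "skos:closeMatch",
}


def to_turtle(mappings):
    """Render each mapping row as a single Turtle statement."""
    lines = []
    for source, relation, target in mappings:
        lines.append(f"{source} {PREDICATES[relation]} {target} .")
    return "\n".join(lines)


print(to_turtle(MAPPINGS))
```

Recording the strength of each match in the table keeps the "close, with some nuance" cases from being silently promoted to exact equivalences.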

Understanding Ontologies in the context of DataCite

An ontology defines the concepts, relationships, constraints, and logical rules that describe a particular domain. Unlike a traditional metadata schema that primarily captures structure and formatting, an ontology captures semantic meaning. This enables automated reasoning, data integration, and intelligent querying across systems.

In practice, an ontology allows you to:

Define classes (e.g., Dataset, Person, Organization)
Specify properties (e.g., hasAuthor, hasPublicationYear)
Declare relationships (e.g., a Dataset is authored by a Person)
Add logic and constraints (e.g., every Dataset must have at least one title)

Metadata Schema vs Ontology

Feature	Metadata Schema	Ontology
Purpose	Structural – defines fields and formats	Semantic – defines meaning, logic, and relationships
Example (DataCite)	JSON structure without formal semantics	OWL representation with reasoning support

While the DataCite metadata schema is highly structured, it lacks formal semantics. It defines fields like creator, title, and publicationYear, but it does not describe their interrelationships or logical constraints in a machine-interpretable way.

Why Use an Ontology for DataCite?

From Structure to Meaning

Metadata Schema (e.g., DataCite JSON): says “This field is called creator and its value is a name.”
Ontology (RDF/OWL): expresses “This Dataset must be linked to a Person as a creator, who may have an ORCID, and may be affiliated with an institution that has a country of operation.”

Benefits

1. Semantic Search: Enables concept-based search, such as “Find all works authored by researchers affiliated with EU institutions.” Even if the institution names vary, ontologies allow reasoning over geographic and organizational relationships.
2. Reasoning: Lets machines infer new facts. If an author is affiliated with an institution located in France, the system can infer that the author is a researcher at a French institution, unless constraints are added that rule the inference out.
3. Knowledge Graph Integration: Ontologies support linking to external vocabularies like:

dct:title, dct:identifier
foaf:Person, schema:Dataset
owl:sameAs for external identifiers like ORCID, ROR

Example mappings
ex:hasTitle owl:equivalentProperty dct:title .
ex:hasCreator owl:equivalentProperty foaf:maker .
ex:Dataset owl:equivalentClass schema:Dataset .

4. Machine Interoperability: Supports intelligent agents, recommendation engines, and automated workflows.
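The semantic search benefit can be sketched in a few lines of Python. The graph, the property names, and the people in it are all hypothetical; the point is only that the query follows relationships (work, creator, institution, country, EU membership) instead of matching institution names.

```python
graph = {
    ("ex:work1", "ex:hasCreator", "ex:alice"),
    ("ex:work2", "ex:hasCreator", "ex:bob"),
    ("ex:alice", "ex:affiliatedWith", "ex:instituteA"),
    ("ex:bob", "ex:affiliatedWith", "ex:instituteB"),
    ("ex:instituteA", "ex:locatedIn", "ex:France"),
    ("ex:instituteB", "ex:locatedIn", "ex:Canada"),
    ("ex:France", "ex:memberOf", "ex:EU"),
}


def objects(graph, subject, predicate):
    """Return all objects for a given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def works_by_eu_affiliated_authors(graph):
    """Follow work -> creator -> institution -> country -> EU membership."""
    hits = set()
    for work, p, creator in graph:
        if p != "ex:hasCreator":
            continue
        for inst in objects(graph, creator, "ex:affiliatedWith"):
            for country in objects(graph, inst, "ex:locatedIn"):
                if "ex:EU" in objects(graph, country, "ex:memberOf"):
                    hits.add(work)
    return sorted(hits)


print(works_by_eu_affiliated_authors(graph))  # ['ex:work1']
```

In a real deployment this traversal would more likely be a SPARQL query over an RDF store; the plain-Python version just makes the chain of relationships visible.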

How RDF, RDFS, OWL, and SKOS Compare

Language	What it Does	Why Use It for DataCite
RDF	Stores data as triples	Foundation layer to model metadata as a graph
RDFS	Adds class/property structure	Defines the roles and types of metadata elements
OWL	Adds logic, constraints, and inference	Enables validation, reasoning, AI integration
SKOS	Models controlled vocabularies, taxonomies, and thesauri	Ideal for tagging, browsing, subject vocabularies

JSON vs RDF/RDFS/OWL - Example Comparison

| **Concept** | **DataCite JSON** | **Explanation (JSON)** | **RDF** | **Explanation (RDF)** | **RDFS** | **Explanation (RDFS)** | **OWL** | **Explanation (OWL)** |
|--------------|-------------------|------------------------|---------|-----------------------|----------|------------------------|---------|------------------------|
| Creator | "creators": [{"name": "Smith, Alice"}] | Lists the dataset's creator using a simple string value. | ex:Dataset123 ex:hasCreator ex:AliceSmith | Links the dataset to a named individual as a resource. | ex:hasCreator rdfs:domain ex:Dataset ; rdfs:range ex:Person | Declares that hasCreator links datasets to persons. | Restriction: someValuesFrom ex:Person | Adds logic: every dataset must have at least one creator of type person. |
| Creator ORCID | "nameIdentifier": "https://orcid.org/..." | Specifies a string ORCID ID attached to the creator. | ex:AliceSmith ex:hasORCID "https://orcid.org/..." | Connects the person to their ORCID using a data property. | ex:hasORCID rdfs:range xsd:anyURI | Declares the expected type of ORCID values (URI strings). | ex:AliceSmith owl:sameAs <https://orcid.org/...> | Semantically links the individual to their global ORCID identity. |
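To show what the RDFS column buys you, here is a minimal Python sketch of the standard RDFS domain and range rules applied to the Creator row. It is a hand-rolled illustration, not a full RDFS reasoner, and `infer_types` is a hypothetical helper.

```python
schema = {
    ("ex:hasCreator", "rdfs:domain", "ex:Dataset"),
    ("ex:hasCreator", "rdfs:range", "ex:Person"),
}
data = {("ex:Dataset123", "ex:hasCreator", "ex:AliceSmith")}


def infer_types(data, schema):
    """Apply the RDFS domain and range rules once; return new rdf:type triples."""
    domains = {(p, c) for p, k, c in schema if k == "rdfs:domain"}
    ranges = {(p, c) for p, k, c in schema if k == "rdfs:range"}
    inferred = set()
    for s, p, o in data:
        for prop, cls in domains:
            if prop == p:
                # The subject of the property belongs to its domain class.
                inferred.add((s, "rdf:type", cls))
        for prop, cls in ranges:
            if prop == p:
                # The object of the property belongs to its range class.
                inferred.add((o, "rdf:type", cls))
    return inferred


for triple in sorted(infer_types(data, schema)):
    print(triple)
# ('ex:AliceSmith', 'rdf:type', 'ex:Person')
# ('ex:Dataset123', 'rdf:type', 'ex:Dataset')
```

Note that nobody stated that `ex:AliceSmith` is a person; the type follows from how the property was declared.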

SKOS: A Lightweight Alternative

SKOS (Simple Knowledge Organization System) is used to model controlled vocabularies, taxonomies, and thesauri.

Use SKOS when you want:

Subject tagging (e.g., marine biology)
Simple hierarchies (broader/narrower concepts)
Multilingual support

Example:
ex:marineBiology a skos:Concept ;
    skos:prefLabel "Marine Biology"@en ;
    skos:broader ex:biology .

SKOS does not support logical constraints or inference, unlike OWL.
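Even without OWL-style logic, an application can still walk the hierarchy itself, for example to offer broader subjects while browsing. The Python sketch below does this for the concept from the example; the parent of `ex:biology` is hypothetical, and the dictionary assumes a single broader concept per term, which SKOS does not require.

```python
# concept -> its broader concept (one parent each, as a simplification).
broader = {
    "ex:marineBiology": "ex:biology",
    "ex:biology": "ex:naturalSciences",  # hypothetical parent
}


def ancestors(concept, broader):
    """Return the chain of broader concepts, nearest first."""
    chain = []
    seen = {concept}
    while concept in broader:
        concept = broader[concept]
        if concept in seen:  # guard against accidental cycles
            break
        seen.add(concept)
        chain.append(concept)
    return chain


print(ancestors("ex:marineBiology", broader))  # ['ex:biology', 'ex:naturalSciences']
```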