Background
📚 1. Start with Metadata
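DataCite metadata is structured data, available as JSON (and XML). In simplified form, it records things like the following (real DataCite JSON nests titles and creators in arrays):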
{
  "title": "Climate Data 2024",
  "creator": "Alice Smith",
  "publicationYear": "2024"
}
This is useful, but only humans (or systems programmed specifically for this format) know what these fields mean.
🧠 2. Add Meaning with an Ontology
An ontology defines:
- What creator means (e.g., it refers to a Person)
- What title means (e.g., it’s a label for a resource)
- How these things relate (e.g., every Dataset must have at least one title)
An ontology formalizes these definitions using RDF, RDFS, and OWL. Once they are formal, machines can interpret both the structure and the meaning of the metadata. That supports automated reasoning, quality checks, and linking with other systems.
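To make this concrete, here is a minimal sketch that turns the simplified record above into subject-predicate-object triples. Each node is explicitly typed, so the creator becomes a `schema:Person` rather than a bare string. The `ex:` identifiers (`ex:dataset1`, `ex:alice`, `ex:hasCreator`, and so on) are hypothetical placeholders. Triples are plain Python tuples, whereas a real pipeline would typically use an RDF library such as rdflib.

```python
# Simplified DataCite record from the example above.
record = {
    "title": "Climate Data 2024",
    "creator": "Alice Smith",
    "publicationYear": "2024",
}

def record_to_triples(rec, dataset_id="ex:dataset1", creator_id="ex:alice"):
    """Return (subject, predicate, object) tuples for one record.

    All "ex:" identifiers are hypothetical placeholders.
    """
    return [
        # Say explicitly what kind of thing each node is.
        (dataset_id, "rdf:type", "schema:Dataset"),
        (creator_id, "rdf:type", "schema:Person"),
        # The creator is now a linked node, not just a string.
        (dataset_id, "ex:hasCreator", creator_id),
        (creator_id, "schema:name", rec["creator"]),
        (dataset_id, "ex:hasTitle", rec["title"]),
        (dataset_id, "ex:publicationYear", rec["publicationYear"]),
    ]

for triple in record_to_triples(record):
    print(triple)
```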
🔗 3. Align Concepts with Mappings
But DataCite isn’t the only standard. Others, like DCAT, schema.org, Dublin Core, and Wikidata, also define things like title, creator, and publicationYear. To make DataCite metadata understandable to those systems, we create mappings.
Mappings say:
- DataCite creator = foaf:maker or schema:author
- DataCite title = dct:title or schema:name
They use statements like:
ex:hasCreator owl:equivalentProperty schema:author .
So if a tool understands schema:author, it can now understand DataCite’s creator field too.
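What `owl:equivalentProperty` licenses can be sketched in a few lines. If two properties are equivalent, every triple using one implies the same triple using the other, in both directions. The sketch below is a simplified rule applier, not an OWL reasoner. It repeats until nothing new is added, so chains of equivalences are followed. The triples and the `ex:` names are hypothetical.

```python
# Pairs of properties declared owl:equivalentProperty (hypothetical "ex:" term).
equivalences = [("ex:hasCreator", "schema:author")]

def apply_equivalent_properties(triples, equivalences):
    """Return the input triples plus all triples implied by property equivalences."""
    # Equivalence is symmetric, so index each pair in both directions.
    partners = {}
    for p1, p2 in equivalences:
        partners.setdefault(p1, set()).add(p2)
        partners.setdefault(p2, set()).add(p1)
    result = set(triples)
    changed = True
    while changed:  # repeat until a fixed point, which also handles chains
        changed = False
        for s, p, o in list(result):
            for q in partners.get(p, ()):
                if (s, q, o) not in result:
                    result.add((s, q, o))
                    changed = True
    return result

data = {("ex:dataset1", "ex:hasCreator", "ex:alice")}
print(apply_equivalent_properties(data, equivalences))
```

A tool that only queries `schema:author` will now find the creator, even though the source data used `ex:hasCreator`.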
🔁 4. Transform Records with Crosswalks
A crosswalk is a recipe for transforming a full record from one format/schema into another. Think of it like a conversion chart:
DataCite Field | schema.org Equivalent |
---|---|
title | name |
creator.name | author.name |
publicationYear | datePublished |
This allows you to:
- Export DataCite metadata in schema.org JSON-LD
- Harvest and reuse metadata in another system (e.g., a library catalog or web search engine)
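The conversion chart above can be implemented almost literally as a lookup table. In this minimal sketch, dotted paths such as `creator.name` address nested fields. It assumes the creator is an object with a `name` key, as the chart implies, rather than the plain string used in the first example. The helper names are illustrative, not part of any DataCite or schema.org tooling.

```python
# The conversion chart as data: DataCite path -> schema.org path.
CROSSWALK = {
    "title": "name",
    "creator.name": "author.name",
    "publicationYear": "datePublished",
}

def get_path(obj, path):
    """Follow a dotted path into nested dicts; return None if any step is missing."""
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj

def set_path(obj, path, value):
    """Create intermediate dicts as needed and set the value at a dotted path."""
    *parents, last = path.split(".")
    for key in parents:
        obj = obj.setdefault(key, {})
    obj[last] = value

def crosswalk(record, table=CROSSWALK):
    """Translate one record field by field according to the table."""
    out = {}
    for source, target in table.items():
        value = get_path(record, source)
        if value is not None:  # skip fields the source record does not have
            set_path(out, target, value)
    return out

datacite = {
    "title": "Climate Data 2024",
    "creator": {"name": "Alice Smith"},
    "publicationYear": "2024",
}
print(crosswalk(datacite))
# {'name': 'Climate Data 2024', 'author': {'name': 'Alice Smith'}, 'datePublished': '2024'}
```

Keeping the chart as data means that adding or correcting a field mapping does not require touching the conversion logic.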
🚀 Putting It All Together
Here’s how they all relate:
Role | What it does | How it helps with DataCite |
---|---|---|
Ontology | Describes what the data means using concepts, logic, and relationships | Helps machines understand and reason over DataCite metadata |
Mapping | Aligns concepts in DataCite to other vocabularies (like schema.org or Dublin Core) | Enables linking and integration with other systems |
Crosswalk | Translates complete records from one metadata format into another | Makes DataCite metadata usable in external formats like schema.org JSON-LD |
Ontologies define meaning; mappings align that meaning across systems; crosswalks apply those alignments to convert whole records.
🧩 Example in Action
Let’s say you want to expose a DataCite record in schema.org so Google Dataset Search can find it:
- Ontology: Define that your resource is a Dataset, and creator is a Person.
- Mapping: Link your creator field to schema:author.
- Crosswalk: Use a table to translate the JSON fields:
The result is a schema.org JSON-LD representation of the dataset, with its DataCite fields translated into schema.org terms:
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "Climate Data 2024",
  "author": {
    "@type": "Person",
    "name": "Alice Smith"
  },
  "datePublished": "2024"
}
The crosswalk can be:
- A table
- A script or mapping file (e.g., XSLT, SPARQL CONSTRUCT, or a Python script)
- An ontology alignment (owl:equivalentProperty)
It is the crosswalk that converts a DataCite JSON record into the JSON-LD snippet above.
Now your data is:
✅ Machine-readable
✅ Interoperable
✅ Findable on the web
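For search engines such as Google Dataset Search to find the record, the JSON-LD is usually embedded in the dataset's landing page inside a `<script type="application/ld+json">` element. Below is a minimal sketch of that last step, assuming the simplified record from the start of this page. It includes `"@context"`, which JSON-LD needs to resolve the schema.org terms. Google's Dataset guidelines also ask for properties such as `description`, which this simplified record does not have.

```python
import json

def to_schema_org(record):
    """Crosswalk the simplified DataCite record into a schema.org Dataset."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": record["title"],
        "author": {"@type": "Person", "name": record["creator"]},
        "datePublished": record["publicationYear"],
    }

def as_script_tag(jsonld):
    """Wrap JSON-LD in the <script> element that embeds it in an HTML page."""
    body = json.dumps(jsonld, indent=2, ensure_ascii=False)
    # Escape "</" so a value containing "</script>" cannot close the tag early;
    # "\/" is a valid JSON escape for "/".
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'

record = {"title": "Climate Data 2024", "creator": "Alice Smith", "publicationYear": "2024"}
print(as_script_tag(to_schema_org(record)))
```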
Understanding Crosswalks and Mappings
Crosswalks and mappings play a critical role in making metadata interoperable across different schemas, domains, and systems. While ontologies formalize the semantics of a single schema, crosswalks and mappings are about connecting concepts between multiple schemas.
A crosswalk is a structured mapping between elements in two or more metadata standards. It shows how a field or concept in one schema (e.g., DataCite) aligns with one in another (e.g., Dublin Core, schema.org, MARC).
Mappings can be:
- Exact (e.g., DataCite:title is equivalent to dct:title)
- Close (e.g., creatorName maps to foaf:name, although DataCite recommends the "Family, Given" format for personal names, which foaf:name does not require)
- One-to-many (e.g., a complex DataCite creator object may map to multiple elements in another schema)
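A one-to-many mapping can be sketched as a function that fans a single DataCite creator object out into several schema.org fields. The DataCite field names used here (`nameType`, `givenName`, `familyName`, `nameIdentifiers`, `nameIdentifierScheme`) follow DataCite's JSON as we understand it. Check them against the schema version you use. The ORCID value is a placeholder, not a real identifier.

```python
def map_creator(creator):
    """Fan one DataCite creator object out into several schema.org fields."""
    # DataCite distinguishes personal and organizational names via nameType.
    kind = "Organization" if creator.get("nameType") == "Organizational" else "Person"
    target = {"@type": kind, "name": creator.get("name")}
    # Copy the separate name parts when the source provides them.
    for field in ("givenName", "familyName"):
        if creator.get(field):
            target[field] = creator[field]
    # ORCID identifiers become schema.org sameAs links.
    orcids = [ni["nameIdentifier"] for ni in creator.get("nameIdentifiers", [])
              if ni.get("nameIdentifierScheme") == "ORCID"]
    if orcids:
        target["sameAs"] = orcids
    return target

creator = {
    "name": "Smith, Alice",
    "nameType": "Personal",
    "givenName": "Alice",
    "familyName": "Smith",
    "nameIdentifiers": [{
        "nameIdentifier": "https://orcid.org/0000-0000-0000-0000",  # placeholder
        "nameIdentifierScheme": "ORCID",
    }],
}
print(map_creator(creator))
```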
Why Crosswalks & Mappings Matter
- Interoperability: They allow metadata from one system to be interpreted and used in another.
- Aggregation: Services like Europeana or OpenAIRE rely on mappings to bring together data from multiple sources.
- Conversion & Exchange: They enable tools to export and import data in different formats (e.g., JSON-LD, XML, RDF).
- Schema Alignment: They help unify different vocabularies under common terms (e.g., aligning DataCite and schema.org so that search engines can index datasets).
Common Mapping Targets for DataCite
DataCite Field | Target Vocabulary Term |
---|---|
identifier | dct:identifier |
creator | schema:creator (value typed as foaf:Person) |
title | dct:title, schema:name |
publicationYear | dct:issued, schema:datePublished |
resourceType | a class via rdf:type (e.g., dcat:Dataset, bibo:Document) |
Mappings can be formalized using semantic web technologies:
- owl:equivalentProperty or owl:equivalentClass for exact semantic alignment
- skos:exactMatch, skos:closeMatch, and skos:relatedMatch for vocabulary-level mapping
Crosswalks can also be maintained in structured formats like spreadsheets, RDF, JSON-LD, or XSLT depending on the application context.
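A mapping spreadsheet can be turned into formal statements with very little code. This minimal sketch emits SKOS mapping triples in Turtle from a table of rows, each giving a source term, a target term and a match type. The `ex:` namespace IRI and the row contents are hypothetical. One caveat: SKOS mapping properties are formally defined between `skos:Concept` resources. Applying them to schema properties therefore implicitly treats those properties as concepts, which is a common pragmatic choice but worth knowing about.

```python
# Each row: (source term, target term, SKOS mapping property). Hypothetical terms.
MAPPINGS = [
    ("ex:title", "dct:title", "exactMatch"),
    ("ex:creatorName", "foaf:name", "closeMatch"),
    ("ex:publicationYear", "schema:datePublished", "closeMatch"),
]

PREFIXES = {
    "ex": "https://example.org/datacite#",  # hypothetical namespace
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dct": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "schema": "https://schema.org/",
}

# The SKOS mapping properties.
ALLOWED = {"exactMatch", "closeMatch", "relatedMatch", "broadMatch", "narrowMatch"}

def to_turtle(mappings):
    """Serialize mapping rows as Turtle, rejecting unknown match types."""
    lines = [f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items()]
    lines.append("")
    for source, target, match in mappings:
        if match not in ALLOWED:
            raise ValueError(f"unknown SKOS mapping property: {match}")
        lines.append(f"{source} skos:{match} {target} .")
    return "\n".join(lines)

print(to_turtle(MAPPINGS))
```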
Understanding Ontologies in the Context of DataCite
An ontology defines the concepts, relationships, constraints, and logical rules that describe a particular domain. Unlike a traditional metadata schema that primarily captures structure and formatting, an ontology captures semantic meaning. This enables automated reasoning, data integration, and intelligent querying across systems.
In practice, an ontology allows you to:
- Define classes (e.g., Dataset, Person, Organization)
- Specify properties (e.g., hasAuthor, hasPublicationYear)
- Declare relationships (e.g., a Dataset is authored by a Person)
- Add logic and constraints (e.g., every Dataset must have at least one title)
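A constraint such as "every Dataset must have at least one title" can be checked with a simple closed-world count. Note that OWL itself uses an open-world assumption: a reasoner given `owl:minCardinality 1` would not flag a missing title, only assume one exists. Checks like the one below are therefore usually written in SHACL (`sh:minCount`) or in code. This sketch uses hypothetical triples and `ex:` names.

```python
# Hypothetical triples: ex:dataset2 is declared a Dataset but has no title.
triples = {
    ("ex:dataset1", "rdf:type", "schema:Dataset"),
    ("ex:dataset1", "ex:hasTitle", "Climate Data 2024"),
    ("ex:dataset2", "rdf:type", "schema:Dataset"),
}

def datasets_missing(triples, cls="schema:Dataset", prop="ex:hasTitle", min_count=1):
    """Return instances of `cls` that have fewer than `min_count` values for `prop`."""
    instances = {s for s, p, o in triples if p == "rdf:type" and o == cls}
    return sorted(
        s for s in instances
        if sum(1 for s2, p, _ in triples if s2 == s and p == prop) < min_count
    )

print(datasets_missing(triples))  # ['ex:dataset2']
```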
Metadata Schema vs Ontology
Feature | Metadata Schema | Ontology |
---|---|---|
Purpose | Structural – defines fields and formats | Semantic – defines meaning, logic, and relationships |
Example (DataCite) | JSON structure without formal semantics | OWL representation with reasoning support |
While the DataCite metadata schema is highly structured, it lacks formal semantics. It defines fields like creator, title, and publicationYear, but it does not describe their interrelationships or logical constraints in a machine-interpretable way.
Why Use an Ontology for DataCite?
From Structure to Meaning
- Metadata Schema (e.g., DataCite JSON): says “This field is called creator and its value is a name.”
- Ontology (RDF/OWL): expresses “This Dataset must be linked to a Person as a creator, who may have an ORCID, and may be affiliated with an institution that has a country of operation.”
Benefits
1. Semantic Search: enables concept-based search, for example:
“Find all works authored by researchers affiliated with EU institutions.”
Even if the institution names vary, ontologies allow reasoning over geographic and organizational relationships.
2. Reasoning: lets machines infer new facts.
If an author is affiliated with an institution located in France, the system can infer that the author works in France. Such inferences follow only from the rules the ontology defines, so those rules and their constraints need to be chosen carefully.
3. Knowledge Graph Integration: ontologies support linking to external vocabularies such as:
- dct:title, dct:identifier
- foaf:Person, schema:Dataset
- owl:sameAs for external identifiers like ORCID, ROR
Example mappings:
ex:hasTitle owl:equivalentProperty dct:title .
ex:hasCreator owl:equivalentProperty foaf:maker .
ex:Dataset owl:equivalentClass schema:Dataset .
4. Machine Interoperability: supports intelligent agents, recommendation engines, and automated workflows.
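The semantic search and reasoning benefits above can be combined in a toy example. A single rule infers where each researcher is based from their affiliation and their institution's location. A query can then find works by researchers at EU institutions, even though no record states this directly. The graph, the rule, and every `ex:` name (`ex:affiliatedWith`, `ex:locatedIn`, `ex:memberOf`, `ex:basedIn`) are hypothetical. A production system would express the rule in OWL, SWRL or SPARQL and run it on a triple store.

```python
# Hypothetical toy graph: works, authors, institutions and countries.
triples = {
    ("ex:work1", "ex:hasCreator", "ex:alice"),
    ("ex:work2", "ex:hasCreator", "ex:bob"),
    ("ex:alice", "ex:affiliatedWith", "ex:instA"),
    ("ex:bob", "ex:affiliatedWith", "ex:instB"),
    ("ex:instA", "ex:locatedIn", "ex:France"),
    ("ex:instB", "ex:locatedIn", "ex:Canada"),
    ("ex:France", "ex:memberOf", "ex:EU"),
}

def infer_based_in(triples):
    """Rule: affiliatedWith(x, i) and locatedIn(i, c) => basedIn(x, c)."""
    located = {(s, o) for s, p, o in triples if p == "ex:locatedIn"}
    new = {(x, "ex:basedIn", c)
           for x, p, i in triples if p == "ex:affiliatedWith"
           for i2, c in located if i2 == i}
    return triples | new

def works_by_eu_researchers(triples):
    """Works whose creator is based in a country that is a member of the EU."""
    eu = {s for s, p, o in triples if p == "ex:memberOf" and o == "ex:EU"}
    in_eu = {s for s, p, o in triples if p == "ex:basedIn" and o in eu}
    return sorted(w for w, p, a in triples if p == "ex:hasCreator" and a in in_eu)

print(works_by_eu_researchers(triples))  # [] (nothing states basedIn yet)
graph = infer_based_in(triples)
print(works_by_eu_researchers(graph))    # ['ex:work1']
```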
How RDF, RDFS, OWL, and SKOS Compare
Language | What it Does | Why Use It for DataCite |
---|---|---|
RDF | Stores data as triples | Foundation layer to model metadata as a graph |
RDFS | Adds class/property structure | Defines the roles and types of metadata elements |
OWL | Adds logic, constraints, and formal semantics for reasoning | Enables validation, reasoning, AI integration |
SKOS | Models controlled vocabularies, taxonomies, and thesauri | Ideal for tagging, browsing, subject vocabularies |
JSON vs RDF/RDFS/OWL - Example Comparison
SKOS: A Lightweight Alternative
SKOS (Simple Knowledge Organization System) is used to model controlled vocabularies, taxonomies, and thesauri.
Use SKOS when you want:
- Subject tagging (e.g., marine biology)
- Simple hierarchies (broader/narrower concepts)
- Multilingual support
Example:
ex:marineBiology a skos:Concept ;
    skos:prefLabel "Marine Biology"@en ;
    skos:broader ex:biology .
Unlike OWL, SKOS is not designed for logical constraints or rich inference; its built-in semantics are deliberately lightweight.
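SKOS hierarchies are still useful for subject search. `skos:broader` is not itself transitive, but SKOS defines `skos:broaderTransitive` (of which `skos:broader` is a subproperty), and tools often compute that closure themselves. In this minimal sketch, a search for "biology" also returns records tagged with the narrower concept "marine biology". The concepts beyond `ex:marineBiology` and `ex:biology`, and the record tags, are hypothetical.

```python
# Hypothetical concept scheme: concept -> its skos:broader concepts.
broader = {
    "ex:marineBiology": ["ex:biology"],
    "ex:biology": ["ex:lifeSciences"],
    "ex:oceanography": ["ex:earthSciences"],
}

def ancestors(concept, broader):
    """All concepts reachable via skos:broader (the broaderTransitive closure)."""
    seen, stack = set(), list(broader.get(concept, []))
    while stack:
        c = stack.pop()
        if c not in seen:  # the seen-set also guards against cycles
            seen.add(c)
            stack.extend(broader.get(c, []))
    return seen

def tagged_under(query, tags, broader):
    """Records tagged with `query` or with any concept narrower than it."""
    return sorted(rec for rec, concept in tags.items()
                  if concept == query or query in ancestors(concept, broader))

tags = {"ex:record1": "ex:marineBiology", "ex:record2": "ex:oceanography"}
print(tagged_under("ex:biology", tags, broader))  # ['ex:record1']
```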