This document is a DRAFT RO-Crate profile for Language Data resources. The
profile specifies the contents of RO-Crate Metadata Documents for language
resources and gives guidance on how to structure language data collections both
at the RO-Crate package level and in a repository containing multiple packages.
This profile assumes that the principles and standards set out in the PILARS protocols, or similar compatible approaches, are being used.
The core metadata vocabularies for this profile are:
RO-Crate recommendations for data packaging and basic discoverability metadata,
which is mostly Schema.org terms with a handful of additions. Following
RO-Crate practice, basic metadata terms such as "who, what, where" and
bibliographic-style descriptions are chosen from Schema.org (in preference to
other vocabularies such as Dublin Core or FOAF) where possible, with domain-specific
vocabularies used for things which are not common across domains
(such as types of language).
This document is primarily for use by tool developers, data scientists
and metadata specialists developing scripts or systems for user
communities. It is not intended for use by non-specialists.
Just as we would not expect repository users to type Dublin Core
metadata in XML format by hand, we do not expect our users to have to
deal directly with the JSON-LD presented here. This document is for tool
developers to build systems that crosswalk data from existing systems,
or allow for user-friendly data entry.
About this Profile
This profile covers various kinds of crate metadata:
Structural RO-Crate metadata: how the root dataset links to files, and
the abstract structure of nested collections (e.g. collections/corpora or other
curated datasets) and objects of study (linguistic Items, Sessions or Texts).
This profile assumes that a repository (for example, an OCFL storage root,
with an API for accessing it) exists and that it can at a minimum support
(a) listing all items of the repository and returning their RO-Crate metadata, and
(b) retrieving an item given its ID.
Types of language data: is this resource a dialogue? A written text? A
transcript or other annotation? Which file has which kind of data in it? What
is inside CSV and other structured files? The vocabulary used for
language-specific data is the
Language Data Commons vocabulary
which is being developed alongside this profile.
Contextual metadata: how to link people who had speaking,
authoring or collection roles, as well as places and subjects.
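The minimum repository capabilities assumed above, (a) listing and (b) retrieval by ID, can be sketched as a small in-memory class. This is an illustrative sketch only: the class name InMemoryRepository, its method names and the example ID are hypothetical and are not defined by this profile. A real repository would sit in front of storage such as an OCFL storage root.

```python
class InMemoryRepository:
    """Hypothetical stand-in for a repository API (not defined by this profile)."""

    def __init__(self):
        # Maps an item ID (a URI) to that item's RO-Crate metadata document.
        self._items = {}

    def add_item(self, item_id, metadata):
        self._items[item_id] = metadata

    def list_items(self):
        """(a) List every item together with its RO-Crate metadata."""
        return [
            {"id": item_id, "metadata": metadata}
            for item_id, metadata in self._items.items()
        ]

    def get_item(self, item_id):
        """(b) Retrieve one item's RO-Crate metadata by ID (KeyError if unknown)."""
        return self._items[item_id]


repo = InMemoryRepository()
repo.add_item(
    "arcp://name,example-corpus/item/1",  # hypothetical ID
    {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": []},
)
print([entry["id"] for entry in repo.list_items()])
```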
Structural Metadata
The structural elements of a Language Data Commons RO-Crate are:
A Collection / Object hierarchy to allow language data to be
grouped. For example, a corpus with sub-corpora, or collections of
items (objects) from a particular region.
Dataset and File entities (as per RO-Crate). Files may be referenced
locally or via URI, for example, from an API. If an RO-Crate contains files, they MUST be linked to the root dataset as per the RO-Crate specification using either:
hasPart relationships on the object(s), or
isPartOf relationships on the file(s).
NOTE: The terms Collection and Object
are encoded in RO-Crate metadata using RepositoryCollection and
RepositoryObject types respectively. These in turn are re-named versions
of the Portland Common Data Model types,
pcdm:Collection
and
pcdm:Object.
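The file-linking requirement above (every file linked through hasPart or isPartOf) lends itself to an automated check. The following is a minimal sketch, assuming the crate's @graph has already been loaded as a list of dictionaries. The function names and the example file names are hypothetical, and the check only looks for a direct link; it does not verify the full chain back to the root dataset.

```python
def as_list(value):
    """Normalise a JSON-LD property value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def unlinked_files(graph):
    """Return the @ids of File entities with no hasPart or isPartOf link."""
    # Every @id that some entity points to through hasPart.
    referenced = {
        ref["@id"]
        for entity in graph
        for ref in as_list(entity.get("hasPart"))
    }
    return [
        entity["@id"]
        for entity in graph
        if "File" in as_list(entity.get("@type"))
        and entity["@id"] not in referenced
        and not as_list(entity.get("isPartOf"))
    ]


# Hypothetical example graph: one file linked each way, one orphan.
graph = [
    {"@id": "./", "@type": ["Dataset", "RepositoryObject"],
     "hasPart": [{"@id": "audio.wav"}]},
    {"@id": "audio.wav", "@type": "File"},
    {"@id": "transcript.csv", "@type": "File", "isPartOf": {"@id": "./"}},
    {"@id": "orphan.txt", "@type": "File"},
]
print(unlinked_files(graph))
```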
A conformant RO-Crate:
Class: Dataset #class_Dataset
A body of structured information describing some topic(s) of interest.
Instances of this type MAY be present in the crate.
ldac:interpreter: The contributor renders the discourse recorded in the resource into another language in real time, or the contributor explains the discourse recorded in the resource.
ldac:performer: The participant performed some portion of a recorded, filmed, or transcribed resource. It is recommended that this term be used only for creative participants whose role is not better indicated by a more specific term, such as 'speaker', 'signer', or 'singer'.
ldac:signer: The contributor was a principal signer in a resource that consists of a recording, a film, or a transcription of a recorded resource. Signers are those whose gestures predominate in a recorded or filmed resource. (The resource may be a transcription of that recording).
ldac:singer: The participant sang, either individually or as part of a group, in a resource that consists of a recording, a film, or a transcription of a recorded resource.
ldac:speaker: The contributor was a principal speaker in a resource that consists of a recording, a film, or a transcription of a recorded resource. Speakers are those whose voices predominate in a recorded or filmed resource. (The resource may be a transcription of that recording).
The place(s) that are the focus of the content. It is a sub-property of contentLocation intended primarily for more technical and detailed materials. For example, with a dataset, it indicates the areas that the dataset describes: a dataset of Cape York languages would have a spatialCoverage value describing the place, i.e. the outline of the Cape.
The range of years of creation for items in this dataset using a slash, e.g. 1900/1945. If there are sub-collections with different coverages put this on the sub-collections not the top-level.
Additional information on licensing options for using the data, e.g. 'Contact the Data Steward to discuss license terms'.
http://schema.org/Text
A collection such as a corpus may be stored in a repository or
transmitted either as:
A distributed collection: a set of individual RO-Crates which
reference separate collection records with one Object and one
Collection per crate.
A bundled single crate: contains all the Collection and
Object data.
Distributed collections may reference member Collections or Objects in a
pcdm:hasMember property but should not include descriptions of Objects that
are stored elsewhere in the repository.
Classes
In linked data, a class is a resource that represents a concept or entity. Classes specific to the Language Data Commons Schema include:
ldac:CollectionProtocol: A description of how this Object or Collection was obtained, such as the strategy used for selecting written source texts, or the prompts given to participants.
DataReuseLicense: A license document, setting out terms for reuse of data.
Bidirectional Relationships
The relational hierarchy between Collections, Objects and Files is represented bidirectionally in an RO-Crate by the terms hasPart/isPartOf and pcdm:hasMember/pcdm:memberOf.
| Superset Term | Inverse Of | Subset Term |
|---|---|---|
| pcdm:hasMember | ⟷ | pcdm:memberOf |
| hasPart | ⟷ | isPartOf |
Objects are placed in a Collection using the pcdm:memberOf property, which is required. The inverse will be encoded automatically using the pcdm:hasMember property on a Collection. Similarly, if using pcdm:hasMember, pcdm:memberOf will also be automatically encoded.
The same relationship applies for hasPart and isPartOf at the Object and File levels.
| Superset Level | Relationship | Subset Level |
|---|---|---|
| Collection | → pcdm:hasMember → | Object |
| Collection | ← pcdm:memberOf ← | Object |
| Object | → hasPart → | File |
| Object | ← isPartOf ← | File |
Depending on the data, using one term over another may be preferable when creating the hierarchical relationship. For example, if you are describing multiple files in a spreadsheet, it is easier to use isPartOf at the File level referencing the Object it belongs to, rather than listing all the hasPart entries at the Object level.
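The automatic encoding of inverse properties described above could be implemented along the following lines. This is a sketch rather than the behaviour of any particular tool: add_inverse_links and as_list are hypothetical helper names, the example @ids are invented, and references to entities that are not described in the same crate are simply skipped.

```python
INVERSES = {
    "pcdm:memberOf": "pcdm:hasMember",
    "pcdm:hasMember": "pcdm:memberOf",
    "isPartOf": "hasPart",
    "hasPart": "isPartOf",
}


def as_list(value):
    """Normalise a JSON-LD property value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def add_inverse_links(graph):
    """Add the missing inverse of each membership or part link, in place."""
    by_id = {entity["@id"]: entity for entity in graph}
    for entity in graph:
        for prop, inverse in INVERSES.items():
            for ref in as_list(entity.get(prop)):
                target = by_id.get(ref["@id"])
                if target is None:
                    continue  # described in another crate; nothing to update
                back_link = {"@id": entity["@id"]}
                existing = as_list(target.get(inverse))
                if back_link not in existing:
                    target[inverse] = existing + [back_link]
    return graph


# Hypothetical bundled crate: links are only given "upwards".
graph = [
    {"@id": "#corpus", "@type": ["Dataset", "RepositoryCollection"]},
    {"@id": "#item-1", "@type": ["Dataset", "RepositoryObject"],
     "pcdm:memberOf": {"@id": "#corpus"}},
    {"@id": "item-1.wav", "@type": "File", "isPartOf": {"@id": "#item-1"}},
]
add_inverse_links(graph)
print(graph[0]["pcdm:hasMember"], graph[1]["hasPart"])
```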
The following diagram shows how these relationships are encoded in a single "bundled" RO-Crate.
The next diagram shows how distributed crates (with one RO-Crate per Object and Collection) are linked.
Which linking strategy is used is an implementation choice for
repository developers.
When to choose collection-as-crate ("bundled") vs collection-in-multiple-crates ("distributed")
Use a single bundled crate for a collection when all of these conditions are true:
The collection is final and is expected to be stable, i.e. there is
negligible chance of having to withdraw any of its contents or
files.
The collection and all its files can easily be transferred in a
single transaction - say 20 GB total.
All the material in the corpus shares the same license for reuse.
Split a collection into distributed RepositoryCollection and
RepositoryObject crates, with one crate per repository object,
when any of these conditions are true:
The collection is not yet stable:
New items are being added or changed.
There is a chance that some data may have to be taken down or withdrawn at the request of participants.
The total size of the collection will present challenges for
data transfer.
There is more than one data reuse license applicable.
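The criteria above can be summarised as a simple decision rule. The sketch below is illustrative only: the function name and parameters are hypothetical, and the 20 GB default is taken from the indicative figure mentioned above rather than being a hard limit set by this profile.

```python
def choose_packaging(is_stable, total_size_gb, license_count, max_bundle_gb=20):
    """Return "bundled" only when every bundling condition holds.

    is_stable      -- collection is final, with negligible chance of withdrawals
    total_size_gb  -- total size of the collection and all of its files
    license_count  -- number of distinct data reuse licenses that apply
    max_bundle_gb  -- assumed transfer threshold (indicative, not normative)
    """
    if is_stable and total_size_gb <= max_bundle_gb and license_count == 1:
        return "bundled"
    return "distributed"


print(choose_packaging(is_stable=True, total_size_gb=5, license_count=1))
print(choose_packaging(is_stable=True, total_size_gb=5, license_count=2))
```

Which threshold is appropriate depends on the transfer mechanisms available to a given repository.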
Collection
A collection is a group of related Objects. Examples of collections
include corpora, and sub-corpora, as well as aggregations of cultural
objects such as PARADISEC collections which bring together items
collected in a region or in a session with informants. This follows the
Alveo usage:
Items [Objects in this model] are grouped into collections which might
correspond to curated corpora such as ACE or informal collections such as a
sample of documents from the AustLit archive.
When an RO-Crate is used to package a collection that is part of
another Collection it has a pcdm:memberOf property which references a
resolvable ID (within the context of a repository or service) of the
parent Collection. The Collection may also list its members in a pcdm:hasMember
property, but this is not required.
The root dataset must have at least these @type values: ["Dataset", "RepositoryCollection"]
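A check for the required @type values might look like the following sketch. The helper names, the example @ids and the example name are hypothetical; only the two required type values come from the text above.

```python
REQUIRED_COLLECTION_TYPES = {"Dataset", "RepositoryCollection"}


def as_list(value):
    """Normalise a JSON-LD property value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def missing_root_types(root, required=REQUIRED_COLLECTION_TYPES):
    """Return the required @type values that the root dataset lacks."""
    return sorted(required - set(as_list(root.get("@type"))))


# Hypothetical root dataset for a collection crate.
root = {
    "@id": "arcp://name,example-corpus/corpus/root",
    "@type": ["Dataset", "RepositoryCollection"],
    "name": "Example corpus",
    "pcdm:hasMember": [{"@id": "arcp://name,example-corpus/item/1"}],
}
print(missing_root_types(root))
print(missing_root_types({"@id": "./", "@type": "Dataset"}))
```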
A Collection is a group of resources. Collections have descriptive metadata, access metadata, and may link to works and/or collections. By default, member works and collections are an unordered set, but can be ordered using the ORE Proxy class.
Instances of this type MAY be present in the crate.
Date information which cannot be put in one of the standard date formats, e.g. 'mid-1970s', or it is not clear, for example, if it is a creation or publication date.
An Object is a single unit linked to tightly related files, for example,
a dialogue or session in a speech study, or a work (document) in a written
corpus. This is based on the use of the term Item in Alveo:
The data model that we have developed for the storage of language
resources is built around the concept of an item which corresponds
(loosely) to a record of a single communication event. An item is
often associated with a single text, audio or video resource but could
include a number of resources, for example, the different channels of
audio recording, or an audio recording and associated textual
transcript. Items are grouped into collections which might correspond
to curated corpora such as ACE or informal collections such as a
sample of documents from the AustLit archive.
https://www.researchonline.mq.edu.au/vital/access/services/Download/mq:37347/DS01
The definition of an object is necessarily loose and needs to reflect
what data owners have chosen to do with their collections in the past.
If an RO-Crate contains a single Object, the Root Dataset would have a
@type property of ["Dataset", "RepositoryObject"] with a
conformsTo property pointing to the Language Data Commons Object profile
https://w3id.org/ldac/profile#Object (this document).
If an RO-Crate contains an entire collection, each Object has a
@type property of ["Dataset", "RepositoryObject"] and a conformsTo
property referencing this document. For example:
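As a minimal sketch (in Python, building the JSON-LD as a dictionary), an Object entity inside a collection crate could be constructed as follows. The helper make_object, the @id values and the name are hypothetical; the @type and conformsTo values are those given above, and pcdm:memberOf places the Object in its Collection.

```python
import json


def make_object(object_id, name, collection_id):
    """Build a RepositoryObject entity that belongs to a Collection."""
    return {
        "@id": object_id,
        "@type": ["Dataset", "RepositoryObject"],
        "conformsTo": {"@id": "https://w3id.org/ldac/profile#Object"},
        "name": name,
        "pcdm:memberOf": {"@id": collection_id},
    }


# Hypothetical identifiers, for illustration only.
interview = make_object(
    "arcp://name,example-corpus/item/1",
    "Interview 1",
    "arcp://name,example-corpus/corpus/root",
)
print(json.dumps(interview, indent=2))
```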
Objects SHOULD have files (which may be included in an RO-Crate for the
object, or as part of a collection crate).
In this example, the Object in question is an interview from a speech
corpus with three files. The diagram shows the relationships between
the object and its files, and the contextual metadata of a Person who
takes the role of the speaker/informant (discussed in more detail
below).
There are a number of terms that can be used to characterise resources -
these use the Schema.org mechanism of DefinedTerm and DefinedTermSet.
A RepositoryObject:
Class: RepositoryObject #class_RepositoryObject
An Object is an intellectual entity, sometimes called a "work", "digital object", etc. Objects have descriptive metadata, access metadata, may contain files and other Objects as member "components". Each level of a work is therefore represented by an Object instance, and is capable of standing on its own, being linked to from Collections and other Objects. Member Objects can be ordered using the ORE Proxy class.
Instances of this type MAY be present in the crate.
The identifier property represents any kind of identifier for any kind of [[Thing]], such as ISBNs, GTIN codes, UUIDs etc. Schema.org provides dedicated properties for representing many of these, either as textual strings or as URL (URI) links. See background notes for more details.
The temporalCoverage of a CreativeWork indicates the period that the content applies to, i.e. that it describes, either as a DateTime or as a textual string indicating a time period in ISO 8601 time interval format. In the case of a Dataset it will typically indicate the relevant time period in a precise notation (e.g. for a 2011 census dataset, the year 2011 would be written "2011/2012"). Other forms of content, e.g. ScholarlyArticle, Book, TVSeries or TVEpisode, may indicate their temporalCoverage in broader terms, textually or via a well-known URL. Written works such as books may sometimes have precise temporal coverage too, e.g. a work set in 1939 - 1945 can be indicated in ISO 8601 interval format via "1939/1945". Open-ended date ranges can be written with ".." in place of the end date. For example, "2015-11/.." indicates a range beginning in November 2015 and with no specified final date. This is tentative and might be updated in future when ISO 8601 is officially updated.
http://schema.org/Text
Files
There are three important types of files (or references to other
works) that may be included: ldac:PrimaryMaterial which is a recording or
original text, or a citation of or proxy for it, ldac:DerivedMaterial which
has been generated or sampled from primary material by a process such as format
conversion or digitization, and ldac:Annotation, which contains one or more types of
analysis of the ldac:PrimaryMaterial or ldac:DerivedMaterial.
A File:
Class: File #class_File
A media object, such as an image, video, audio, or text object embedded in a web page or a downloadable dataset i.e. DataDownload.
Instances of this type MAY be present in the crate.
This property references another resource from which the current resource is derived, e.g. downsampling audio or video files, or extracting text from a PDF.
ldac:PrimaryMaterial may be a video or audio file if it is available, or may be a ContextualEntity referencing a primary text such as a book.
ldac:DerivedMaterial
ldac:DerivedMaterial is a non-analytical derivation from ldac:PrimaryMaterial, for example, downsampled video or excerpted text.
ldac:Annotation
ldac:Annotation is a description or analysis of other material. More than one type of annotation may be present in a file.
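The three material types can be illustrated with a small set of File entities. This is a sketch under stated assumptions: the file names are invented, the helper names are hypothetical, and using ldac:annotationOf directly on a File entity is an assumption made for the example.

```python
from collections import defaultdict


def as_list(value):
    """Normalise a JSON-LD property value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


# Hypothetical files for one interview.
files = [
    {"@id": "interview.wav", "@type": "File", "encodingFormat": "audio/wav",
     "ldac:materialType": {"@id": "ldac:PrimaryMaterial"}},
    {"@id": "interview.mp3", "@type": "File", "encodingFormat": "audio/mpeg",
     "ldac:materialType": {"@id": "ldac:DerivedMaterial"},
     "ldac:derivationOf": {"@id": "interview.wav"}},
    {"@id": "interview.csv", "@type": "File", "encodingFormat": "text/csv",
     "ldac:materialType": {"@id": "ldac:Annotation"},
     "ldac:annotationOf": {"@id": "interview.wav"}},
]


def files_by_material_type(entities):
    """Group File @ids by the @id of their ldac:materialType value(s)."""
    grouped = defaultdict(list)
    for entity in entities:
        for material in as_list(entity.get("ldac:materialType")):
            grouped[material["@id"]].append(entity["@id"])
    return dict(grouped)


print(files_by_material_type(files))
```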
Describing the columns in CSV or other tabular data
CSV or similar tabular files are often used to represent transcribed
speech or sign language data, sometimes also with time codes. To enable
automated identification of which column is which, use a Frictionless Table
Schema described by a File entity in the crate.
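A minimal sketch of this approach is shown below. The column names, file names and the column_index helper are hypothetical. The "fields" list with "name" and "type" keys follows the Frictionless Table Schema format, while linking the CSV file to the schema with a direct conformsTo reference is a simplifying assumption for this example.

```python
import json

# Table Schema descriptor; this would be saved as transcript_schema.json.
table_schema = {
    "fields": [
        {"name": "start", "type": "number", "description": "Start time in seconds"},
        {"name": "end", "type": "number", "description": "End time in seconds"},
        {"name": "speaker", "type": "string", "description": "Speaker label"},
        {"name": "text", "type": "string", "description": "Transcribed utterance"},
    ]
}

# File entities for the crate's @graph (hypothetical names).
schema_file = {
    "@id": "transcript_schema.json",
    "@type": "File",
    "name": "Table Schema for transcript CSV files",
    "encodingFormat": "application/json",
}
csv_file = {
    "@id": "transcript.csv",
    "@type": "File",
    "encodingFormat": "text/csv",
    "conformsTo": {"@id": schema_file["@id"]},  # assumed direct link
}


def column_index(schema, field_name):
    """Return the zero-based position of a named column (ValueError if absent)."""
    names = [field["name"] for field in schema["fields"]]
    return names.index(field_name)


print(json.dumps(table_schema, indent=2))
print(column_index(table_schema, "speaker"))
```

A tool that reads transcript.csv can then follow conformsTo to the schema and find, for instance, the speaker column without relying on a fixed column order.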
Places
The place in which data was collected may be indicated using the contentLocation property.
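As an illustration, a Place with a point geometry could be attached to an Object as sketched below. The helper names, @id values, place name and coordinates are hypothetical examples. The sketch assumes a Place linked through geo to a Geometry entity carrying a geosparql:asWKT value; WKT points are written with longitude first.

```python
def wkt_point(longitude, latitude):
    """Serialise a point as WKT (x = longitude, y = latitude)."""
    return f"POINT({longitude} {latitude})"


def make_place(place_id, name, longitude, latitude):
    """Build a Place entity and the Geometry entity it references."""
    geometry = {
        "@id": f"{place_id}-geo",
        "@type": "Geometry",
        "geosparql:asWKT": wkt_point(longitude, latitude),
    }
    place = {
        "@id": place_id,
        "@type": "Place",
        "name": name,
        "geo": {"@id": geometry["@id"]},
    }
    return place, geometry


place, geometry = make_place("#place-sydney", "Sydney", 151.2093, -33.8688)

# Hypothetical Object that records where its data was collected.
interview = {
    "@id": "arcp://name,example-corpus/item/1",
    "@type": ["Dataset", "RepositoryObject"],
    "contentLocation": {"@id": place["@id"]},
}
print(geometry["geosparql:asWKT"])
```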
Identifiers for Objects and Collections MUST be URIs.
Internally, identifiers for all entities that do not have their own URIs
may use the Archive and Packaging identifier scheme (ARCP), which allows for a DNS-like namespacing of
identifiers. For example, the Sydney Speaks corpus top-level
collection would have the ID:
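A minimal sketch of how name-based ARCP identifiers of the form arcp://name,<namespace>/<path> can be constructed is given below. The helper arcp_name_id and the namespace "example-corpus" are hypothetical and do not reproduce the identifier of any real corpus.

```python
from urllib.parse import quote


def arcp_name_id(namespace, *path_segments):
    """Build a name-based ARCP URI, percent-encoding each path segment."""
    path = "/".join(quote(segment, safe="") for segment in path_segments)
    return f"arcp://name,{namespace}/{path}"


print(arcp_name_id("example-corpus", "corpus", "root"))
print(arcp_name_id("example-corpus", "item", "session 1"))
```

Keeping the namespace constant across a collection gives every entity in it a predictable, collision-resistant identifier.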
Some corpora express ages and other demographics of participants - this
presents a data modelling challenge, as age and some other variables change
over time, so if the same person appears over time then we need to have a
base Person with date of birth etc. as well as time-based instances of the person
with an age, social status, gender etc. at that time.
There are three levels at which contributions to an object can be
modelled:
Include one or more Person items as context in a crate and reference
them with properties such as creator or the
Language Data Commons Vocabulary properties such as ldac:compiler
or ldac:depositor. The @id of the person MUST be a URI and SHOULD
be re-used where the same person appears in multiple objects in a
collection or repository.
For longitudinal studies where it is important to record changing
demographic information for a Person, or where precision is
required in listing contributions to a work use
ldac:PersonSnapshot.
If it is important to record many contributions to a work (e.g. in
analysis of a joint work), or if more precision is required in
describing the provenance of items, use Action. See, for example, this
work on The Declaration of Rights of Man and of the Citizen
(Lorber-Kasunic & Sweetapple).
NOTE: If this approach is used, special care will have to be taken in
developing user interfaces and/or training communities to use this way
of modelling metadata; the user need not see the underlying
structure. This profile does not give advice about how to do this as
we have not seen a use case that requires it.
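The first of these levels, Person entities referenced by role properties, can be sketched as follows. The @id values and names are hypothetical, as are the helper functions, and ROLE_PROPERTIES lists only a few of the role properties as an example.

```python
# A small, non-exhaustive selection of role properties.
ROLE_PROPERTIES = ["creator", "ldac:speaker", "ldac:interviewer",
                   "ldac:compiler", "ldac:depositor"]


def as_list(value):
    """Normalise a JSON-LD property value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


# Hypothetical Object with two contributors described as Person entities.
graph = [
    {"@id": "arcp://name,example-corpus/item/1",
     "@type": ["Dataset", "RepositoryObject"],
     "ldac:speaker": [{"@id": "https://example.org/person/a"}],
     "ldac:interviewer": {"@id": "https://example.org/person/b"}},
    {"@id": "https://example.org/person/a", "@type": "Person", "name": "Speaker A"},
    {"@id": "https://example.org/person/b", "@type": "Person", "name": "Interviewer B"},
]


def contributions(graph, object_id):
    """List (role property, person name) pairs for one Object."""
    by_id = {entity["@id"]: entity for entity in graph}
    obj = by_id[object_id]
    pairs = []
    for prop in ROLE_PROPERTIES:
        for ref in as_list(obj.get(prop)):
            person = by_id.get(ref["@id"], {})
            # Fall back to the @id when the Person is described elsewhere.
            pairs.append((prop, person.get("name", ref["@id"])))
    return pairs


print(contributions(graph, "arcp://name,example-corpus/item/1"))
```

Because each Person has a URI as its @id, the same entity can be referenced from many Objects in a collection.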
Collection events such as "Sessions"
Where data is collected from participants in a speech study with
elicitation tasks such as "sessions" (see this IMDI
document)
or field interviews, this can be recorded in metadata via the
CollectionEvent class.
The indirection in this conforms-to relationship allows multiple
objects to have a conformsTo property indicating that they conform
to the same schema while each keeps a local copy of the schema, in line
with the RO-Crate best practice of keeping all the context needed to use
a data package inside the package where possible.
Indicates that a DataReuseLicense requires some kind of authorization step, from SelfAuthorization (click-through) to processes that require a data steward to grant permission.
An annotation that assigns lexical elements of language to classes on the basis of their distributional properties (for sign languages, the term 'sign class' is appropriate).
An annotation that includes information about the sound system of a language, such as the contrasts between sounds which make up the sound system and the locally conditioned realisations of sounds which characterise speech in the language.
An annotation that provides a symbolic record of intonation, stress, tone or other suprasegmental features, which is expressed independently of regular phonetic transcription.
A user is expected to explicitly agree to a set of license terms. This may be combined with AccessControlList to indicate that even a user who has been pre-approved for a license must still agree to the license terms.
Users may apply for a license via some workflow, such as a form, with the decision being made by a DataSteward or their delegate about whether to grant the license.
A user can be authorised to access data by clicking that they agree to a license, or filling out a form to check their understanding, which can be validated by a machine and does not require human intervention.
An interactive discourse with two or more participants. Examples of dialogues include conversations, interviews, correspondence, consultations, greetings and leave-takings.
Language whose primary function is to be part of play, or a style of speech that involves a creative manipulation of the structures of the language. Examples of ludic discourse are play languages, jokes, secret languages, and speech disguises.
A discourse, monologic or co-constructed, which represents temporally organised events. Types of narratives include historical, traditional, and personal narratives, myths, folktales, fables, and humorous stories.
The art of public speaking, or of speaking eloquently according to rules or conventions. Examples of oratory include sermons, lectures, political speeches, and invocations.
This is derived from another source, such as a Primary Material, via some process, e.g. a downsampled video or a sample or an abstract of a resource that is not an annotation (an analysis or description).
The resource was written using a writing implement such as a pen, pencil, brush or computer stylus (except where the digital handwriting is converted to standard text).
The resource contains text produced on a typewriter.
30 classes · 99 properties
Classes
CollectionEvent — A description of an event at which one or more PrimaryMaterials were captured, e #class_CollectionEvent
CreativeWork — The most generic kind of creative work, including books, movies, photographs, software programs, etc #class_CreativeWork
DataDepositLicense — A license document setting out terms for deposit into a repository #class_DataDepositLicense
DataLicense — A license document for data licensing #class_DataLicense
DataReuseLicense — A license document, setting out terms for reuse of data #class_DataReuseLicense
Dataset — A body of structured information describing some topic(s) of interest #class_Dataset
dct:Collection — An aggregation of resources #class_dct:Collection
dct:Dataset — Data encoded in a defined structure #class_dct:Dataset
dct:Event — A non-persistent, time-based occurrence #class_dct:Event
dct:Image — A visual representation other than text #class_dct:Image
dct:InteractiveResource — A resource requiring interaction from the user to be understood, executed, or experienced #class_dct:InteractiveResource
dct:MovingImage — A series of visual representations imparting an impression of motion when shown in succession #class_dct:MovingImage
dct:PhysicalObject — An inanimate, three-dimensional object or substance #class_dct:PhysicalObject
dct:Service — A system that provides one or more functions #class_dct:Service
dct:Software — A computer program in source or compiled form #class_dct:Software
dct:Sound — A resource primarily intended to be heard #class_dct:Sound
dct:StillImage — A static visual representation #class_dct:StillImage
dct:Text — A resource consisting primarily of words for reading #class_dct:Text
File — A media object, such as an image, video, audio, or text object embedded in a web page or a downloadable dataset i #class_File
Geometry — A coherent set of direct positions in space #class_Geometry
Language — Natural languages such as Spanish, Tamil, Hindi, English, etc #class_Language
ldac:CollectionProtocol — A description of how this Object or Collection was obtained, such as the strategy used for selecting written source texts, or the prompts given to participants #class_ldac:CollectionProtocol
Organization — An organization such as a school, NGO, corporation, club, etc #class_Organization
Person — A person (alive, dead, undead, or fictional) #class_Person
Place — Entities that have a somewhat fixed, physical extension #class_Place
README Entity — A Data Package MUST contain a README file that describes the contents of the package #README_Entity
RepositoryCollection — A Collection is a group of resources #class_RepositoryCollection
RepositoryObject — An Object is an intellectual entity, sometimes called a "work", "digital object", etc #class_RepositoryObject
RO-Crate Metadata Descriptor — An RO-Crate @graph must contain an entity of Type @CreativeWork which is known as the RO-Crate Metadata descriptor #RO-Crate_Metadata_Descriptor
Root Data Entity — The Root Data Entity for an RO-Crate #Root_Data_Entity
Properties
conformsTo — A link to the language data commons RO-Crate profile for collections #prop_conformsTo_RepositoryCollection
conformsTo — A link to the language data commons RO-Crate profile for collections #prop_conformsTo_RepositoryObject
contentLocation — The location depicted or described in the content #prop_contentLocation_RepositoryCollection
contentSize — File size in (mega/kilo)bytes #prop_contentSize_File
creator — The creator/author of this CreativeWork #prop_creator_RepositoryObject
creditText — A free text bibliographic citation for this material, e #prop_creditText_Dataset
dateCreated — The (earliest) date the data in this dataset were created #prop_dateCreated_RepositoryCollection
dateCreated — The date on which the CreativeWork was created or the item was added to a DataFeed #prop_dateCreated_RepositoryObject
datePublished — A date that this collection was published #prop_datePublished_Dataset
dct:rightsHolder — The person or organisation owning or managing rights over the resource #prop_dct:rightsHolder_Dataset
description — An abstract of the collection #prop_description_Dataset
description — A description of the item #prop_description_RepositoryObject
encodingFormat — The media type typically expressed using a MIME format #prop_encodingFormat_File
funder — The organisation(s) responsible for funding the creation or collection of this dataset #prop_funder_Dataset
geo — The geographic coordinates of the place #prop_geo_Place
geosparql:asWKT — The WKT serialisation of the geometry #prop_geosparql:asWKT_Geometry
hasPart — An item or CreativeWork that is part of this item, or CreativeWork (in some sense) #prop_hasPart_Dataset
hasPart — An item or CreativeWork that is part of this item, or CreativeWork (in some sense) #prop_hasPart_File
holdingArchive — Organisation where the original of this work or collection is housed #prop_holdingArchive_RepositoryCollection
identifier — The identifier property represents any kind of identifier for any kind of [[Thing]], such as ISBNs, GTIN codes, UUIDs etc #prop_identifier_RepositoryObject
inLanguage — The language in which the resource is written #prop_inLanguage_RepositoryCollection
isAccessibleForFree — This is available under an Open Access license #prop_isAccessibleForFree_Dataset
isBasedOn — Link to or description of an original resource #prop_isBasedOn_Dataset
isbn — The ISBN for this work, if applicable #prop_isbn_CreativeWork
isPartOf — An item or CreativeWork that this item, or CreativeWork (in some sense), is part of #prop_isPartOf_Dataset
issn — The ISSN for this publication #prop_issn_CreativeWork
ldac:access — Whether this is an open or restricted access license #prop_ldac:access_DataReuseLicense
ldac:accessControlList — When a license has an authorizationWorkflow property with a value of the DefinedTerm AccessControlList this property has a URI value that points to a list of userIDs #prop_ldac:accessControlList_DataReuseLicense
ldac:age — The age of a person #prop_ldac:age_Person
ldac:annotationOf — This resource contains some kind of description that adds information to the resource it references #prop_ldac:annotationOf_Dataset
ldac:annotationType — The type of an Annotation resource #prop_ldac:annotationType_CreativeWork
ldac:annotator — The participant produced an annotation of this or a related resource #prop_ldac:annotator_Dataset
ldac:authorizationWorkflow — By what process a user is granted authorization to a license #prop_ldac:authorizationWorkflow_DataReuseLicense
ldac:channels — The number of audio channels this resource contains (e #prop_ldac:channels_CreativeWork
ldac:collectionEventType — A kind of CollectionEvent characterised by some specific procedures, e #prop_ldac:collectionEventType_CollectionEvent
ldac:collectionProtocolType — A description of the process used to collect or collate data, such as prompts given to participants, or how texts are selected for inclusion in a collection #prop_ldac:collectionProtocolType_ldac:CollectionProtocol
ldac:communicationMode — The mode (spoken, written, signed etc #prop_ldac:communicationMode_CreativeWork
ldac:compiler — The participant is responsible for collecting the sub-parts of the resource together #prop_ldac:compiler_Dataset
ldac:consultant — The participant contributes expertise to the creation of a work, for example by contributing knowledge of their native language #prop_ldac:consultant_Dataset
ldac:dataInputter — The participant responsible for entering, re-typing, and/or structuring the data contained in the resource #prop_ldac:dataInputter_Dataset
ldac:dateFreeText — Date information which cannot be put in one of the standard date formats, e #prop_ldac:dateFreeText_RepositoryCollection
ldac:depositor — The participant responsible for depositing the resource in an archive #prop_ldac:depositor_Dataset
ldac:derivationOf — This property references another resource from which the current resource is derived, e #prop_ldac:derivationOf_File
ldac:developer — The participant developed the methodology or tools (including software) that constitute the resource, or that were used to create the resource #prop_ldac:developer_Dataset
ldac:doi — A Digital Object Identifier, e #prop_ldac:doi_Dataset
ldac:editor — The participant reviewed, corrected, and/or tested the resource #prop_ldac:editor_Dataset
ldac:hasAnnotation — This resource is referenced by another resource that adds information to it such as a translation, transcription or other analysis #prop_ldac:hasAnnotation_RepositoryObject
ldac:hasCollectionProtocol — A link to a CollectionProtocol object with (at least) a summary of how resources were selected or elicited for this collection/sub-collection #prop_ldac:hasCollectionProtocol_Dataset
ldac:hasDerivation — This property references another resource that is derived from it, such as a downsampled audio or video file, or text extracted from a PDF #prop_ldac:hasDerivation_File
ldac:illustrator — The participant contributed drawings or other illustrations to the resource #prop_ldac:illustrator_Dataset
ldac:indexableText — One or more target File(s) that together contain the full text of an item – each file should indicate its language #prop_ldac:indexableText_CreativeWork
ldac:interpreter — The contributor renders the discourse recorded in the resource into another language in real time, or the contributor explains the discourse recorded in the resource #prop_ldac:interpreter_Dataset
ldac:interviewee — The participant was a respondent in an interview #prop_ldac:interviewee_Dataset
ldac:interviewer — The participant conducted an interview that forms part of the resource #prop_ldac:interviewer_Dataset
ldac:isDeIdentified — The data in this item has had potentially identifying information removed, which may include replacing names with pseudonyms #prop_ldac:isDeIdentified_CreativeWork
ldac:itemLocation — Current location of the item, e #prop_ldac:itemLocation_RepositoryCollection
ldac:linguisticGenre — A linguistic classification of the genre of this resource #prop_ldac:linguisticGenre_CreativeWork
ldac:material — Description of the original media, e #prop_ldac:material_CreativeWork
ldac:materialType — Indicates whether the material in a file is the original (primary) source or is derived from it or describes it via annotation #prop_ldac:materialType_File
ldac:openAccessIndex — One or more public index types allowed by a license, e #prop_ldac:openAccessIndex_CreativeWork
ldac:participant — The participant was present during the creation of the resource, but did not contribute substantially to its content #prop_ldac:participant_Dataset
ldac:performer — The participant performed some portion of a recorded, filmed, or transcribed resource #prop_ldac:performer_Dataset
ldac:photographer — The participant took the photograph, or shot the film, that appears in or constitutes the resource #prop_ldac:photographer_Dataset
ldac:recorder — The participant operated the recording machinery used to create the resource #prop_ldac:recorder_Dataset
ldac:register — The type of register (any of the varieties of a language that a speaker uses in a particular social context [Merriam-Webster]) of the contents of a language resource #prop_ldac:register_CreativeWork
ldac:researcher — The resource was created as part of the participant's research, or the research presents interim or final results from the participant's research #prop_ldac:researcher_Dataset
ldac:researchParticipant — The participant acted as a research subject or responded to a questionnaire, the results of which study form the basis of the resource #prop_ldac:researchParticipant_Dataset
ldac:responder — The participant was an interlocutor in some sort of discourse event, but only reacted to the contributions of others #prop_ldac:responder_Dataset
ldac:reviewDate — The date that this license should be reviewed #prop_ldac:reviewDate_DataLicense
ldac:signer — The contributor was a principal signer in a resource that consists of a recording, a film, or a transcription of a recorded resource #prop_ldac:signer_Dataset
ldac:singer — The participant sang, either individually or as part of a group, in a resource that consists of a recording, a film, or a transcription of a recorded resource #prop_ldac:singer_Dataset
ldac:speaker — The contributor was a principal speaker in a resource that consists of a recording, a film, or a transcription of a recorded resource #prop_ldac:speaker_Dataset
ldac:sponsor — The participant contributed financial support to the creation of the resource #prop_ldac:sponsor_Dataset
ldac:subjectLanguage — The languages that the materials in the collection are about (not the language that it is in) #prop_ldac:subjectLanguage_RepositoryCollection
ldac:transcriber — The participant produced a transcription of this or a related resource #prop_ldac:transcriber_Dataset
ldac:translator — The participant produced a translation of this or a related resource #prop_ldac:translator_Dataset
ldac:writtenLanguageFormat — The format of the resource resulting from the way the text was produced (handwritten, typeset, typewritten) #prop_ldac:writtenLanguageFormat_CreativeWork
license — A license document that applies to this content, typically indicated by URL #prop_license_Dataset
license — A license document that applies to this content, typically indicated by URL #prop_license_RepositoryObject
location — A location for the organisation, e #prop_location_Organization
name — The name of this data collection #prop_name_Dataset
pcdm:hasMember — The sub-collections, if any, associated with this collection #prop_pcdm:hasMember_Dataset
pcdm:memberOf — Links from a Repository Object or Collection to a containing Repository Object or Collection #prop_pcdm:memberOf_Dataset
publisher — The organisation that published this work #prop_publisher_CreativeWork
publisher — The organisation responsible for releasing this dataset #prop_publisher_Dataset
recipient — The person or organisation responsible for creating this work #prop_recipient_CreativeWork
spatialCoverage — The place(s) that are the focus of the content #prop_spatialCoverage_Dataset
temporalCoverage — The range of years of creation for items in this dataset using a slash, e.g. 1900/1945 #prop_temporalCoverage_Dataset
temporalCoverage — The temporalCoverage of a CreativeWork indicates the period that the content applies to, i.e. that it describes #prop_temporalCoverage_RepositoryObject
usageInfo — Additional information on licensing options for using the data, e #prop_usageInfo_Dataset
Types of entities (specializations of Classes) and expected Properties
Class: CollectionEvent #class_CollectionEvent
A description of an event at which one or more PrimaryMaterials were captured, e.g. as video or audio.
Instances of this type MAY be present in the crate.
The type of register (any of the varieties of a language that a speaker uses in a particular social context [Merriam-Webster]) of the contents of a language resource.
When a license has an authorizationWorkflow property with a value of the DefinedTerm AccessControlList this property has a URI value that points to a list of userIDs.
The contributor renders the discourse recorded in the resource into another language in real time, or the contributor explains the discourse recorded in the resource.
The participant performed some portion of a recorded, filmed, or transcribed resource. It is recommended that this term be used only for creative participants whose role is not better indicated by a more specific term, such as 'speaker', 'signer', or 'singer'.
The contributor was a principal signer in a resource that consists of a recording, a film, or a transcription of a recorded resource. Signers are those whose gestures predominate in a recorded or filmed resource. (The resource may be a transcription of that recording).
The participant sang, either individually or as part of a group, in a resource that consists of a recording, a film, or a transcription of a recorded resource.
The contributor was a principal speaker in a resource that consists of a recording, a film, or a transcription of a recorded resource. Speakers are those whose voices predominate in a recorded or filmed resource. (The resource may be a transcription of that recording).
The place(s) that are the focus of the content. It is a sub-property of contentLocation intended primarily for more technical and detailed materials. For example, with a dataset, it indicates areas that the dataset describes: a dataset of Cape York languages would have spatialCoverage which was the place: the outline of the Cape.
The range of years of creation for items in this dataset using a slash, e.g. 1900/1945. If there are sub-collections with different coverages put this on the sub-collections not the top-level.
This property references another resource from which the current resource is derived, e.g. downsampling audio or video files, or extracting text from a PDF.
A description of how this Object or Collection was obtained, such as the strategy used for selecting written source texts, or the prompts given to participants.
Instances of this type MAY be present in the crate.
A description of the process used to collect or collate data, such as prompts given to participants, or how texts are selected for inclusion in a collection.
A Collection is a group of resources. Collections have descriptive metadata, access metadata, and may link to works and/or collections. By default, member works and collections are an unordered set, but can be ordered using the ORE Proxy class.
Instances of this type MAY be present in the crate.
Date information which cannot be put in one of the standard date formats, e.g. 'mid-1970s', or it is not clear, for example, if it is a creation or publication date.
An Object is an intellectual entity, sometimes called a "work", "digital object", etc. Objects have descriptive metadata, access metadata, may contain files and other Objects as member "components". Each level of a work is therefore represented by an Object instance, and is capable of standing on its own, being linked to from Collections and other Objects. Member Objects can be ordered using the ORE Proxy class.
Instances of this type MAY be present in the crate.
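The Collection and Object descriptions above imply a containment hierarchy that tools will often need to traverse. The following is a minimal sketch, not part of the profile, of building a parent-to-members index from the `@graph` of an RO-Crate Metadata Document. It assumes that the compacted property keys appear as `pcdm:hasMember` and `pcdm:memberOf`, as they are written in this profile, and that references are JSON-LD objects of the form `{"@id": ...}`. The function names and the sample graph are hypothetical.

```python
from collections import defaultdict


def as_list(value):
    """JSON-LD values may be a single item or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def membership_index(graph):
    """Map each container @id to the sorted @ids of its direct members.

    Links are read in both directions, so a crate that only records
    pcdm:memberOf (or only pcdm:hasMember) still yields a full index.
    """
    members = defaultdict(set)
    for entity in graph:
        for ref in as_list(entity.get("pcdm:hasMember")):
            members[entity["@id"]].add(ref["@id"])
        for ref in as_list(entity.get("pcdm:memberOf")):
            members[ref["@id"]].add(entity["@id"])
    return {parent: sorted(children) for parent, children in members.items()}


# Hypothetical graph: a top-level collection, a sub-collection and one object.
graph = [
    {"@id": "./", "@type": ["Dataset", "RepositoryCollection"],
     "pcdm:hasMember": [{"@id": "#sub-collection"}]},
    {"@id": "#sub-collection", "@type": "RepositoryCollection",
     "pcdm:memberOf": {"@id": "./"}},
    {"@id": "#item-1", "@type": "RepositoryObject",
     "pcdm:memberOf": {"@id": "#sub-collection"}},
]
index = membership_index(graph)
```

Reading both link directions is a design choice made for this sketch; a stricter tool might instead report crates in which the two directions disagree.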
The identifier property represents any kind of identifier for any kind of Thing, such as ISBNs, GTIN codes, UUIDs etc. Schema.org provides dedicated properties for representing many of these, either as textual strings or as URL (URI) links. See background notes for more details.
The temporalCoverage of a CreativeWork indicates the period that the content applies to, i.e. that it describes, either as a DateTime or as a textual string indicating a time period in ISO 8601 time interval format. In the case of a Dataset it will typically indicate the relevant time period in a precise notation (e.g. for a 2011 census dataset, the year 2011 would be written "2011/2012"). Other forms of content, e.g. ScholarlyArticle, Book, TVSeries or TVEpisode, may indicate their temporalCoverage in broader terms - textually or via well-known URL. Written works such as books may sometimes have precise temporal coverage too, e.g. a work set in 1939 - 1945 can be indicated in ISO 8601 interval format via "1939/1945". Open-ended date ranges can be written with ".." in place of the end date. For example, "2015-11/.." indicates a range beginning in November 2015 and with no specified final date. This is tentative and might be updated in future when ISO 8601 is officially updated.
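As an illustration of the slash-separated interval notation described above, the following sketch splits a temporalCoverage string into its start and end parts. It is a deliberately simplified assumption of what a crosswalk script might do: it accepts only `YYYY`, `YYYY-MM` or `YYYY-MM-DD` on each side and `..` as an open end, and it does not handle single DateTime values, times of day, or free-text periods. The function name is hypothetical.

```python
import re

# One date component: YYYY, YYYY-MM or YYYY-MM-DD (no time-of-day support).
_DATE = r"\d{4}(?:-\d{2}(?:-\d{2})?)?"
# start/end, where end may be ".." for an open-ended range.
_INTERVAL = re.compile(rf"^({_DATE})/({_DATE}|\.\.)$")


def parse_temporal_coverage(value):
    """Return (start, end) as strings; end is None for an open-ended range.

    Raises ValueError if the value is not in start/end interval form.
    """
    match = _INTERVAL.match(value.strip())
    if match is None:
        raise ValueError(f"not a slash-separated interval: {value!r}")
    start, end = match.groups()
    return start, (None if end == ".." else end)


closed = parse_temporal_coverage("1939/1945")
open_ended = parse_temporal_coverage("2015-11/..")
```

Values that fail this check, such as 'mid-1970s', are candidates for a free-text date property rather than temporalCoverage.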
This property on the RO-Crate Metadata Descriptor references the Root Data Entity. In a SoSS+ profile there may be Schemas present for more than one 'flavour' of Root Data Entity with different @type arrays or @conformsTo references (or other specializations).
There must be a file with the path README.html in the root of the RO-Crate, and it must be described by an entity of type README_Entity with an @id of README.html.
The Root Data Entity for an RO-Crate. This is the main entity of the RO-Crate and is the one that is referenced by the RO-Crate Metadata Descriptor. In this profile, it is a Dataset and RepositoryCollection.
At least 1 instance of this type MUST be present in the crate.
A maximum of 1 instance of this type MAY be present in the crate.
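The requirement above can be checked mechanically. The sketch below, offered as one possible approach rather than a reference validator, locates the Root Data Entity by following the `about` property of the RO-Crate Metadata Descriptor and then confirms that its `@type` includes both Dataset and RepositoryCollection. It assumes the descriptor has the `@id` `ro-crate-metadata.json`, as in RO-Crate 1.1; the function names and the sample graph are hypothetical.

```python
def as_list(value):
    """JSON-LD values may be a single item or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_root_data_entity(graph):
    """Follow the Metadata Descriptor's `about` link and check the root's types."""
    by_id = {entity["@id"]: entity for entity in graph}
    descriptor = by_id.get("ro-crate-metadata.json")
    if descriptor is None:
        raise ValueError("no RO-Crate Metadata Descriptor in @graph")
    root = by_id.get(descriptor.get("about", {}).get("@id"))
    if root is None:
        raise ValueError("descriptor 'about' does not resolve to an entity")
    missing = {"Dataset", "RepositoryCollection"} - set(as_list(root.get("@type")))
    if missing:
        raise ValueError(f"root data entity lacks types: {sorted(missing)}")
    return root


# Hypothetical minimal graph for illustration.
graph = [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
     "about": {"@id": "./"}},
    {"@id": "./", "@type": ["Dataset", "RepositoryCollection"],
     "name": "Example collection"},
]
root = find_root_data_entity(graph)
```

Because the descriptor points at exactly one entity, resolving the root this way also respects the "at least 1, at most 1" constraint without scanning the graph by type.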
There must be a file with the path README.html in the root of the RO-Crate, and it must be described by an entity of type README_Entity with an @id of README.html.
Property: about ⓘ #RO-Crate_Metadata_Descriptor.about
Description
Range
Occurs in Domain(s)
This property on the RO-Crate Metadata Descriptor references the Root Data Entity. In a SoSS+ profile there may be Schemas present for more than one 'flavour' of Root Data Entity with different @type arrays or @conformsTo references (or other specializations).
The identifier property represents any kind of identifier for any kind of Thing, such as ISBNs, GTIN codes, UUIDs etc. Schema.org provides dedicated properties for representing many of these, either as textual strings or as URL (URI) links. See background notes for more details.
When a license has an authorizationWorkflow property with a value of the DefinedTerm AccessControlList this property has a URI value that points to a list of userIDs.
A description of the process used to collect or collate data, such as prompts given to participants, or how texts are selected for inclusion in a collection.
Date information which cannot be put in one of the standard date formats, e.g. 'mid-1970s', or it is not clear, for example, if it is a creation or publication date.
This property references another resource from which the current resource is derived, e.g. downsampling audio or video files, or extracting text from a PDF.
The contributor renders the discourse recorded in the resource into another language in real time, or the contributor explains the discourse recorded in the resource.
The participant performed some portion of a recorded, filmed, or transcribed resource. It is recommended that this term be used only for creative participants whose role is not better indicated by a more specific term, such as 'speaker', 'signer', or 'singer'.
The type of register (any of the varieties of a language that a speaker uses in a particular social context [Merriam-Webster]) of the contents of a language resource.
The contributor was a principal signer in a resource that consists of a recording, a film, or a transcription of a recorded resource. Signers are those whose gestures predominate in a recorded or filmed resource. (The resource may be a transcription of that recording).
The participant sang, either individually or as part of a group, in a resource that consists of a recording, a film, or a transcription of a recorded resource.
The contributor was a principal speaker in a resource that consists of a recording, a film, or a transcription of a recorded resource. Speakers are those whose voices predominate in a recorded or filmed resource. (The resource may be a transcription of that recording).
The place(s) that are the focus of the content. It is a sub-property of contentLocation intended primarily for more technical and detailed materials. For example, with a dataset, it indicates areas that the dataset describes: a dataset of Cape York languages would have spatialCoverage which was the place: the outline of the Cape.
The range of years of creation for items in this dataset using a slash, e.g. 1900/1945. If there are sub-collections with different coverages put this on the sub-collections not the top-level.
The temporalCoverage of a CreativeWork indicates the period that the content applies to, i.e. that it describes, either as a DateTime or as a textual string indicating a time period in ISO 8601 time interval format. In the case of a Dataset it will typically indicate the relevant time period in a precise notation (e.g. for a 2011 census dataset, the year 2011 would be written "2011/2012"). Other forms of content, e.g. ScholarlyArticle, Book, TVSeries or TVEpisode, may indicate their temporalCoverage in broader terms - textually or via well-known URL. Written works such as books may sometimes have precise temporal coverage too, e.g. a work set in 1939 - 1945 can be indicated in ISO 8601 interval format via "1939/1945". Open-ended date ranges can be written with ".." in place of the end date. For example, "2015-11/.." indicates a range beginning in November 2015 and with no specified final date. This is tentative and might be updated in future when ISO 8601 is officially updated.