
Language Data Commons Schema

This is a language data schema, in the style of the Schema.org schema. It is based on OLAC terms for use in the LDaCA project and is published at https://w3id.org/ldac/terms. This schema builds on Schema.org and is intended to be used with the Language Data Commons RO-Crate Profile: https://w3id.org/ldac/profile.

5 classes · 54 properties

Classes

  • CollectionEvent - A description of an event at which one or more PrimaryMaterials were captured, e.g. as video or audio
  • CollectionProtocol - A description of how this Object or Collection was obtained, such as the strategy used for selecting written source texts, or the prompts given to participants
  • DataDepositLicense - A license document setting out terms for deposit into a repository
  • DataLicense - A license document for data licensing
  • DataReuseLicense - A license document, setting out terms for reuse of data

Properties

  • access - Whether this is an open or restricted access license
  • accessControlList - When a license has an authorizationWorkflow property with a value of the DefinedTerm AccessControlList this property has a URI value that points to a list of userIDs
  • age - The age or age range of a person, e.g. 25, 30-50, >50
  • annotationOf - This resource contains some kind of description that adds information to the resource it references
  • annotationType - The type of annotation for Annotation resources
  • annotator - The participant produced an annotation of this or a related resource
  • authorizationWorkflow - By what process a user is granted authorization to a license
  • channels - The number of audio channels this resource contains (e.g. 1, 2, 5.1)
  • collectionEventType - A kind of CollectionEvent characterised by some specific procedures, e.g. a psycholinguistic experiment
  • collectionProtocolType - A description of the process used to collect or collate data, such as prompts given to participants, or how texts are selected for inclusion in a collection
  • communicationMode - The mode (spoken, written, signed etc.) of this resource
  • compiler - The participant is responsible for collecting the sub-parts of the resource together
  • consultant - The participant contributes expertise to the creation of a work, for example by contributing knowledge of their native language
  • dataInputter - The participant was responsible for entering, re-typing, and/or structuring the data contained in the resource
  • dateFreeText - Date information that cannot be put in one of the standard date formats, e.g. "mid-1970s", or where it is not clear, for example, whether it is a creation or publication date
  • depositor - The participant was responsible for depositing the resource in an archive
  • derivationOf - This property references another resource from which the current resource is derived, e.g. downsampling audio or video files, or extracting text from a PDF
  • developer - The participant developed the methodology or tools (including software) that constitute the resource, or that were used to create the resource
  • doi - A Digital Object Identifier
  • editor - The participant reviewed, corrected, and/or tested the resource
  • geoJSON - A valid GeoJSON feature or feature collection as a string that can be parsed as JSON
  • hasAnnotation - This resource is referenced by another resource that describes it such as a translation, transcription or other analysis
  • hasCollectionProtocol - This resource was assembled or collected according to the linked protocol
  • hasDerivation - This property references another resource that is derived from it such as a downsampled audio or video file, or text extracted from a PDF
  • illustrator - The participant contributed drawings or other illustrations to the resource
  • indexableText - One or more target File(s) that together contain the full text of an item; each file should indicate its language
  • interpreter - The contributor renders the discourse recorded in the resource into another language in real time, or the contributor explains the discourse recorded in the resource
  • interviewee - The participant was a respondent in an interview
  • interviewer - The participant conducted an interview that forms part of the resource
  • isDeIdentified - The data in this item has had potentially identifying information removed, which may include replacing names with pseudonyms
  • itemLocation - Current location of the item, e.g. where a set of audio tapes are stored
  • linguisticGenre - A linguistic classification of the genre of this resource
  • mainText - Identifies the most relevant sub-component for computational text analytics
  • material - Description of the original media, e.g. audio cassette tapes, participant questionnaires, field notes
  • materialType - Indicates whether the material in a file is the original (primary) source or is derived from it or describes it via annotation
  • openAccessIndex - One or more public index types allowed by a license, e.g. FullText indexing may be allowed for discovery even when an item is not
  • orthographicNotes - A description of the specific orthographic writing system(s) used in the material (e.g. Latin, Cyrillic, Australian English, IPA), or particular conventions required to understand the material (e.g. O* = ø)
  • participant - The participant was present during the creation of the resource, but did not contribute substantially to its content
  • performer - The participant performed some portion of a recorded, filmed, or transcribed resource
  • photographer - The participant took the photograph, or shot the film, that appears in or constitutes the resource
  • recorder - The participant operated the recording machinery used to create the resource
  • register - The type of register (any of the varieties of a language that a speaker uses in a particular social context [Merriam-Webster]) of the contents of a language resource
  • researcher - The resource was created as part of the participant's research, or the research presents interim or final results from the participant's research
  • researchParticipant - The participant acted as a research subject or responded to a questionnaire, the results of which study form the basis of the resource
  • responder - The participant was an interlocutor in some sort of discourse event, but only reacted to the contributions of others
  • reviewDate - The date that this license should be reviewed
  • signer - The contributor was a principal signer in a resource that consists of a recording, a film, or a transcription of a recorded resource
  • singer - The participant sang, either individually or as part of a group, in a resource that consists of a recording, a film, or a transcription of a recorded resource
  • speaker - The contributor was a principal speaker in a resource that consists of a recording, a film, or a transcription of a recorded resource
  • sponsor - The participant contributed financial support to the creation of the resource
  • subjectLanguage - The language(s) that this annotation resource is about
  • transcriber - The participant produced a transcription of this or a related resource
  • translator - The participant produced a translation of this or a related resource
  • writtenLanguageFormat - The format of the resource resulting from the way the text was produced (handwritten, typeset, typewritten)

Types of entities (specializations of Classes) and expected Properties

Class: CollectionEvent

A description of an event at which one or more PrimaryMaterials were captured, e.g. as video or audio.

Instances of this type MAY be present in the crate.

Min Count Max Count
N/A N/A
Property Required Description Range Value
@type Yes
collectionEventType No A kind of CollectionEvent characterised by some specific procedures, e.g. a psycholinguistic experiment. Session
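
As an illustration, a CollectionEvent entity might be written into an RO-Crate metadata file roughly as follows. This is a sketch only: the `ldac:` prefix (assumed to be mapped to https://w3id.org/ldac/terms# in the crate's @context), the `#session-01` identifier and the `name` value are hypothetical and are not defined by this schema.

```python
import json

# Hypothetical CollectionEvent entity; "ldac:" is assumed to be a prefix
# for https://w3id.org/ldac/terms# declared in the crate's @context.
collection_event = {
    "@id": "#session-01",             # hypothetical local identifier
    "@type": "ldac:CollectionEvent",  # @type is the only required property
    "name": "Recording session 1",    # hypothetical label
    # Optional: the kind of event, here the defined term Session.
    "ldac:collectionEventType": {"@id": "ldac:Session"},
}

print(json.dumps(collection_event, indent=2))
```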

Class: CollectionProtocol

A description of how this Object or Collection was obtained, such as the strategy used for selecting written source texts, or the prompts given to participants.

Instances of this type MAY be present in the crate.

Min Count Max Count
N/A N/A
Property Required Description Range Value
@type Yes
collectionProtocolType No A description of the process used to collect or collate data, such as prompts given to participants, or how texts are selected for inclusion in a collection.

Class: DataDepositLicense

A license document setting out terms for deposit into a repository.

Instances of this type MAY be present in the crate.

Min Count Max Count
N/A N/A
Property Required Description Range Value
@type Yes
No properties defined for this class

Class: DataLicense

A license document for data licensing. This is a superclass of DataReuseLicense and DataDepositLicense.

Instances of this type MAY be present in the crate.

Min Count Max Count
N/A N/A
Property Required Description Range Value
@type Yes
reviewDate No The date that this license should be reviewed.

Class: DataReuseLicense

A license document, setting out terms for reuse of data.

Instances of this type MAY be present in the crate.

Min Count Max Count
N/A N/A
Property Required Description Range Value
@type Yes
access No Whether this is an open or restricted access license.
accessControlList No When a license has an authorizationWorkflow property with a value of the DefinedTerm AccessControlList this property has a URI value that points to a list of userIDs. http://schema.org/URL
authorizationWorkflow No By what process a user is granted authorization to a license.
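
The dependency between authorizationWorkflow and accessControlList can be checked mechanically. The sketch below assumes entities are plain dictionaries using a hypothetical `ldac:` prefix; the function name, the identifiers and the example.org URL are illustrative only.

```python
def acl_is_consistent(license_entity: dict) -> bool:
    """Return False if AccessControlList is chosen but no list URL is given."""
    workflows = license_entity.get("ldac:authorizationWorkflow", [])
    if isinstance(workflows, dict):  # a single value may not be wrapped in a list
        workflows = [workflows]
    uses_acl = any(w.get("@id") == "ldac:AccessControlList" for w in workflows)
    has_acl_url = bool(license_entity.get("ldac:accessControlList"))
    return has_acl_url or not uses_acl


# Hypothetical restricted license that combines two workflows.
restricted_license = {
    "@id": "#restricted-license",
    "@type": "ldac:DataReuseLicense",
    "ldac:access": {"@id": "ldac:AuthorizedAccess"},
    "ldac:authorizationWorkflow": [
        {"@id": "ldac:AccessControlList"},
        {"@id": "ldac:AgreeToTerms"},
    ],
    "ldac:accessControlList": "https://example.org/acl/project-members",
}

print(acl_is_consistent(restricted_license))
```

A license that names AccessControlList as a workflow but omits accessControlList would fail this check, while an open license with neither property passes.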

All Properties

Property: access

Description Range Occurs in Domain(s)
Whether this is an open or restricted access license. DataReuseLicense

Property: accessControlList

Description Range Occurs in Domain(s)
When a license has an authorizationWorkflow property with a value of the DefinedTerm AccessControlList this property has a URI value that points to a list of userIDs. http://schema.org/URL DataReuseLicense

Property: age

Description Range Occurs in Domain(s)
The age or age range of a person, e.g. 25, 30-50, >50. If an age is specified, a specializationOf pointing to a 'canonical' ageless version of that Person can also be included. http://schema.org/Text http://schema.org/Person
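
The schema gives only examples of age values (25, 30-50, >50) and does not define a formal syntax. A tool that wants to sanity-check values could use something like the sketch below; the pattern, including support for "<", is an assumption and not part of the schema.

```python
import re

# Assumed pattern: a whole number, a "low-high" range, or a "<"/">" bound.
AGE_PATTERN = re.compile(r"\d+|\d+-\d+|[<>]\d+")


def looks_like_age(value: str) -> bool:
    """Return True if the text matches one of the assumed age forms."""
    return AGE_PATTERN.fullmatch(value.strip()) is not None


for candidate in ["25", "30-50", ">50", "about forty"]:
    print(candidate, looks_like_age(candidate))
```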

Property: annotationOf

Description Range Occurs in Domain(s)
This resource contains some kind of description that adds information to the resource it references. PrimaryMaterial http://schema.org/CreativeWork

Property: annotationType

Description Range Occurs in Domain(s)
The type of annotation for Annotation resources. http://schema.org/CreativeWork

Property: annotator

Description Range Occurs in Domain(s)
The participant produced an annotation of this or a related resource. http://schema.org/Organization, http://schema.org/Person http://schema.org/CreativeWork

Property: authorizationWorkflow

Description Range Occurs in Domain(s)
By what process a user is granted authorization to a license. DataReuseLicense

Property: channels

Description Range Occurs in Domain(s)
The number of audio channels this resource contains (e.g. 1, 2, 5.1). http://schema.org/CreativeWork

Property: collectionEventType

Description Range Occurs in Domain(s)
A kind of CollectionEvent characterised by some specific procedures, e.g. a psycholinguistic experiment. Session CollectionEvent

Property: collectionProtocolType

Description Range Occurs in Domain(s)
A description of the process used to collect or collate data, such as prompts given to participants, or how texts are selected for inclusion in a collection. CollectionProtocol

Property: communicationMode

Description Range Occurs in Domain(s)
The mode (spoken, written, signed etc.) of this resource. There may be more than one value for this property. http://schema.org/CreativeWork
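
Because communicationMode may carry more than one value, code that reads it should accept both a single value and a list. A minimal sketch, assuming dictionary entities and a hypothetical `ldac:` prefix:

```python
def communication_modes(entity: dict) -> list:
    """Return the entity's communicationMode values as a list of term IDs."""
    value = entity.get("ldac:communicationMode", [])
    if not isinstance(value, list):  # a single value may appear unwrapped
        value = [value]
    return [v["@id"] for v in value]


# Hypothetical item that is both spoken and gestural.
item = {
    "@id": "#item-01",
    "ldac:communicationMode": [
        {"@id": "ldac:SpokenLanguage"},
        {"@id": "ldac:Gesture"},
    ],
}
print(communication_modes(item))
```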

Property: compiler

Description Range Occurs in Domain(s)
The participant is responsible for collecting the sub-parts of the resource together. http://schema.org/Organization, http://schema.org/Person http://schema.org/CreativeWork

Property: consultant

Description Range Occurs in Domain(s)
The participant contributes expertise to the creation of a work, for example by contributing knowledge of their native language. http://schema.org/Organization, http://schema.org/Person http://schema.org/CreativeWork

Property: dataInputter

Description Range Occurs in Domain(s)
The participant was responsible for entering, re-typing, and/or structuring the data contained in the resource. http://schema.org/Organization, http://schema.org/Person http://schema.org/CreativeWork

Property: dateFreeText

Description Range Occurs in Domain(s)
Date information that cannot be put in one of the standard date formats, e.g. "mid-1970s", or where it is not clear, for example, whether it is a creation or publication date. http://schema.org/CreativeWork

Property: depositor

Description Range Occurs in Domain(s)
The participant was responsible for depositing the resource in an archive. http://schema.org/Organization, http://schema.org/Person http://schema.org/CreativeWork

Property: derivationOf

Description Range Occurs in Domain(s)
This property references another resource from which the current resource is derived, e.g. downsampling audio or video files, or extracting text from a PDF. Annotation, PrimaryMaterial DerivedMaterial

Property: developer

Description Range Occurs in Domain(s)
The participant developed the methodology or tools (including software) that constitute the resource, or that were used to create the resource. http://schema.org/Organization, http://schema.org/Person http://schema.org/CreativeWork

Property: doi

Description Range Occurs in Domain(s)
A Digital Object Identifier. http://schema.org/CreativeWork

Property: editor

Description Range Occurs in Domain(s)
The participant reviewed, corrected, and/or tested the resource. http://schema.org/Organization, http://schema.org/Person http://schema.org/CreativeWork

Property: geoJSON

Description Range Occurs in Domain(s)
A valid GeoJSON feature or feature collection as a string that can be parsed as JSON. Text http://schema.org/GeoCoordinates, http://schema.org/GeoShape, http://schema.org/Language
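
Since the value is a string rather than a nested object, the GeoJSON has to be serialised when written and parsed when read. The sketch below shows the round trip; the coordinates, identifiers and the `ldac:` prefix are illustrative assumptions.

```python
import json

# Hypothetical GeoJSON feature; coordinates are [longitude, latitude].
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [149.13, -35.28]},
    "properties": {"name": "Example point"},
}

geo_entity = {
    "@id": "#place-geo",  # hypothetical identifier
    "@type": "GeoCoordinates",
    # The property holds the GeoJSON as a string, not as nested JSON.
    "ldac:geoJSON": json.dumps(feature),
}

# A consumer parses the string back into a structure before using it.
parsed = json.loads(geo_entity["ldac:geoJSON"])
print(parsed["type"], parsed["geometry"]["coordinates"])
```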

Property: hasAnnotation

Description Range Occurs in Domain(s)
This resource is referenced by another resource that describes it such as a translation, transcription or other analysis. Annotation http://schema.org/CreativeWork

Property: hasCollectionProtocol

Description Range Occurs in Domain(s)
This resource was assembled or collected according to the linked protocol. CollectionProtocol http://schema.org/CreativeWork

Property: hasDerivation

Description Range Occurs in Domain(s)
This property references another resource that is derived from it such as a downsampled audio or video file, or text extracted from a PDF. DerivedMaterial PrimaryMaterial
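
Read together, the descriptions of derivationOf and hasDerivation suggest that the two properties point in opposite directions between a source and the material derived from it. Assuming that reading, a helper can keep both sides in step; the function name, file names and `ldac:` prefix below are hypothetical.

```python
def link_derivation(source: dict, derived: dict) -> None:
    """Record the derivation on both entities (assumes list-valued properties)."""
    source.setdefault("ldac:hasDerivation", []).append({"@id": derived["@id"]})
    derived.setdefault("ldac:derivationOf", []).append({"@id": source["@id"]})


# Hypothetical primary recording and a downsampled copy of it.
recording = {"@id": "audio/session-01.wav", "@type": "File"}
downsampled = {"@id": "audio/session-01.mp3", "@type": "File"}

link_derivation(recording, downsampled)
print(recording["ldac:hasDerivation"], downsampled["ldac:derivationOf"])
```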

Property: illustrator

Description Range Occurs in Domain(s)
The participant contributed drawings or other illustrations to the resource. http://schema.org/Organization, http://schema.org/Person http://schema.org/CreativeWork

Property: indexableText

Description Range Occurs in Domain(s)
One or more target File(s) that together contain the full text of an item; each file should indicate its language. http://schema.org/MediaObject http://schema.org/CreativeWork

Property: interpreter

Description Range Occurs in Domain(s)
The contributor renders the discourse recorded in the resource into another language in real time, or the contributor explains the discourse recorded in the resource. http://schema.org/Organization, http://schema.org/Person http://schema.org/CreativeWork

Property: interviewee

Description Range Occurs in Domain(s)
The participant was a respondent in an interview. http://schema.org/Organization, http://schema.org/Person http://schema.org/CreativeWork

Property: interviewer

Description Range Occurs in Domain(s)
The participant conducted an interview that forms part of the resource. http://schema.org/Organization, http://schema.org/Person http://schema.org/CreativeWork
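
Role properties such as interviewer and interviewee link a CreativeWork to a Person or Organization. The sketch below shows one possible arrangement and a check that every role reference resolves to an entity in the graph; all identifiers, names and the `ldac:` prefix are hypothetical.

```python
# Hypothetical people taking part in an interview.
interviewer = {"@id": "#person-a", "@type": "Person", "name": "Interviewer A"}
interviewee = {"@id": "#person-b", "@type": "Person", "name": "Interviewee B"}

# Hypothetical item that references them through role properties.
interview_item = {
    "@id": "#interview-01",
    "@type": "CreativeWork",
    "ldac:linguisticGenre": {"@id": "ldac:Interview"},
    "ldac:interviewer": {"@id": interviewer["@id"]},
    "ldac:interviewee": {"@id": interviewee["@id"]},
}

graph = [interview_item, interviewer, interviewee]

# Check that each role points at an entity that exists in the graph.
known_ids = {entity["@id"] for entity in graph}
roles = ["ldac:interviewer", "ldac:interviewee"]
unresolved = [r for r in roles if interview_item[r]["@id"] not in known_ids]
print(unresolved)
```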

Property: isDeIdentified

Description Range Occurs in Domain(s)
The data in this item has had potentially identifying information removed, which may include replacing names with pseudonyms. http://schema.org/Boolean http://schema.org/CreativeWork, http://schema.org/Person

Property: itemLocation

Description Range Occurs in Domain(s)
Current location of the item, e.g. where a set of audio tapes are stored. http://schema.org/Organization, http://schema.org/Place http://pcdm.org/models#Object, http://schema.org/Collection

Property: linguisticGenre

Description Range Occurs in Domain(s)
A linguistic classification of the genre of this resource. http://schema.org/CreativeWork

Property: mainText

Description Range Occurs in Domain(s)
Identifies the most relevant sub-component for computational text analytics. http://schema.org/MediaObject http://schema.org/CreativeWork

Property: material

Description Range Occurs in Domain(s)
Description of the original media, e.g. audio cassette tapes, participant questionnaires, field notes. http://schema.org/CreativeWork

Property: materialType

Description Range Occurs in Domain(s)
Indicates whether the material in a file is the original (primary) source or is derived from it or describes it via annotation. http://schema.org/MediaObject

Property: openAccessIndex

Description Range Occurs in Domain(s)
One or more public index types allowed by a license, e.g. FullText indexing may be allowed for discovery even when an item is not. http://schema.org/CreativeWork

Property: orthographicNotes

Description Range Occurs in Domain(s)
A description of the specific orthographic writing system(s) used in the material (e.g. Latin, Cyrillic, Australian English, IPA), or particular conventions required to understand the material (e.g. O* = ø). http://schema.org/Text http://schema.org/CreativeWork

Property: participant

Description Range Occurs in Domain(s)
The participant was present during the creation of the resource, but did not contribute substantially to its content. http://schema.org/Organization, http://schema.org/Person http://schema.org/CreativeWork

Property: performer

Description Range Occurs in Domain(s)
The participant performed some portion of a recorded, filmed, or transcribed resource. It is recommended that this term be used only for creative participants whose role is not better indicated by a more specific term, such as 'speaker', 'signer', or 'singer'. http://schema.org/Organization, http://schema.org/Person http://schema.org/CreativeWork

Property: photographer

Description Range Occurs in Domain(s)
The participant took the photograph, or shot the film, that appears in or constitutes the resource. http://schema.org/Organization, http://schema.org/Person http://schema.org/CreativeWork

Property: recorder

Description Range Occurs in Domain(s)
The participant operated the recording machinery used to create the resource. http://schema.org/Organization, http://schema.org/Person http://schema.org/CreativeWork

Property: register

Description Range Occurs in Domain(s)
The type of register (any of the varieties of a language that a speaker uses in a particular social context [Merriam-Webster]) of the contents of a language resource. http://schema.org/CreativeWork

Property: researcher

Description Range Occurs in Domain(s)
The resource was created as part of the participant's research, or the research presents interim or final results from the participant's research. http://schema.org/Organization, http://schema.org/Person http://schema.org/CreativeWork

Property: researchParticipant

Description Range Occurs in Domain(s)
The participant acted as a research subject or responded to a questionnaire, the results of which study form the basis of the resource. http://schema.org/Organization, http://schema.org/Person http://schema.org/CreativeWork

Property: responder

Description Range Occurs in Domain(s)
The participant was an interlocutor in some sort of discourse event, but only reacted to the contributions of others. http://schema.org/Organization, http://schema.org/Person http://schema.org/CreativeWork

Property: reviewDate

Description Range Occurs in Domain(s)
The date that this license should be reviewed. DataLicense

Property: signer

Description Range Occurs in Domain(s)
The contributor was a principal signer in a resource that consists of a recording, a film, or a transcription of a recorded resource. Signers are those whose gestures predominate in a recorded or filmed resource. (The resource may be a transcription of that recording). http://schema.org/Organization, http://schema.org/Person http://schema.org/CreativeWork

Property: singer

Description Range Occurs in Domain(s)
The participant sang, either individually or as part of a group, in a resource that consists of a recording, a film, or a transcription of a recorded resource. http://schema.org/Organization, http://schema.org/Person http://schema.org/CreativeWork

Property: speaker

Description Range Occurs in Domain(s)
The contributor was a principal speaker in a resource that consists of a recording, a film, or a transcription of a recorded resource. Speakers are those whose voices predominate in a recorded or filmed resource. (The resource may be a transcription of that recording). http://schema.org/Organization, http://schema.org/Person http://schema.org/CreativeWork

Property: sponsor

Description Range Occurs in Domain(s)
The participant contributed financial support to the creation of the resource. http://schema.org/Organization, http://schema.org/Person http://schema.org/CreativeWork

Property: subjectLanguage

Description Range Occurs in Domain(s)
The language(s) that this annotation resource is about. http://schema.org/Language http://schema.org/CreativeWork

Property: transcriber

Description Range Occurs in Domain(s)
The participant produced a transcription of this or a related resource. http://schema.org/Organization, http://schema.org/Person http://schema.org/CreativeWork

Property: translator

Description Range Occurs in Domain(s)
The participant produced a translation of this or a related resource. http://schema.org/Organization, http://schema.org/Person http://schema.org/CreativeWork

Property: writtenLanguageFormat

Description Range Occurs in Domain(s)
The format of the resource resulting from the way the text was produced (handwritten, typeset, typewritten). http://schema.org/CreativeWork

Defined Term Sets

Defined Term Set: AccessTypes

ID: https://w3id.org/ldac/terms#AccessTypes

A set of defined terms to specify whether a DataReuseLicense allows open or restricted (authorised) access.

Term Description
access Whether this is an open or restricted access license.
AuthorizedAccess Indicates that a DataReuseLicense requires some kind of authorization step, from SelfAuthorization (click-through) to processes that require a data steward to grant permission.
OpenAccess Data covered by this license may be accessed as long as the license is served alongside it, and does not require any specific authorization step.
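
The ID given for this term set is the schema namespace followed by the set name as a fragment. The sketch below builds IRIs on that pattern; applying the same pattern to individual terms such as OpenAccess is an assumption, and `term_iri` is a hypothetical helper.

```python
LDAC_NS = "https://w3id.org/ldac/terms#"


def term_iri(name: str) -> str:
    """Return the namespace IRI with the given name as its fragment."""
    return LDAC_NS + name


print(term_iri("AccessTypes"))
```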

Defined Term Set: AnnotationTypeTerms

ID: https://w3id.org/ldac/terms#AnnotationTypeTerms

The set of expected values for annotation types.

Term Description
annotationType The type of annotation for Annotation resources.
Gestural The resource describes the gestural content of the resource it annotates.
Orthographic The resource contains annotations using orthography (a writing system) as opposed to a coded representation such as a phonetic transcription.
PartOfSpeech An annotation that assigns lexical elements of language to classes on the basis of their distributional properties (for sign languages, the term 'sign class' is appropriate).
Phonemic An annotation that represents speech in terms of the sound contrasts made in a language.
Phonetic A representation of speech in terms of the sounds produced, typically using the International Phonetic Alphabet.
Phonological An annotation that includes information about the sound system of a language, such as the contrasts between sounds which make up the sound system and the locally conditioned realisations of sounds which characterise speech in the language.
Prosodic An annotation that provides a symbolic record of intonation, stress, tone or other suprasegmental features, which is expressed independently of regular phonetic transcription.
Semantic The resource includes annotation or analysis concerning the encoding of meaning.
Syntactic The resource contains annotation or analysis describing the combinatorial patterns of words in another resource.
Transcription The resource contains a transcription, which is a written representation (orthographic or coded) of an audio or visual signal.
Translation This is a translation of a resource in another language.

Defined Term Set: AuthorizationWorkflows

ID: https://w3id.org/ldac/terms#AuthorizationWorkflows

A set of DefinedTerms for access authorization mechanisms - some of these may be combined, e.g. AccessControlList and AgreeToTerms, but SelfAuthorization and AgreeToTerms would be redundant.

Term Description
AccessControlList License grants access to data based on a list of approved users, specified using the property accessControlList.
AgreeToTerms A user is expected to explicitly agree to a set of license terms. This may be combined with AccessControlList to indicate that even a user who has been pre-approved for a license must agree to the license terms.
AuthorizationByApplication Users may apply for a license via some workflow, such as a form, with the decision being made by a DataSteward or their delegate about whether to grant the license.
AuthorizationByInvitation A data steward or administrator is expected to use an access control system to invite users, for example, participants, collaborators or students.
authorizationWorkflow By what process a user is granted authorization to a license.
SelfAuthorization A user can be authorised to access data by clicking that they agree to a license, or filling out a form to check their understanding, which can be validated by a machine and does not require human intervention.
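
The note that some workflows combine sensibly while others are redundant can be expressed as a simple check. Only the SelfAuthorization and AgreeToTerms pair is named as redundant in this document; the function and constant names below are hypothetical.

```python
# The one redundant pairing named in the term set description.
REDUNDANT_PAIRS = [frozenset({"SelfAuthorization", "AgreeToTerms"})]


def redundant_combinations(workflows: list) -> list:
    """Return every known redundant pair contained in the chosen workflows."""
    chosen = set(workflows)
    return [pair for pair in REDUNDANT_PAIRS if pair <= chosen]


print(redundant_combinations(["AccessControlList", "AgreeToTerms"]))
print(redundant_combinations(["SelfAuthorization", "AgreeToTerms"]))
```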

Defined Term Set: CollectionEventTypeTerms

ID: https://w3id.org/ldac/terms#CollectionEventTypeTerms

A set of terms which are expected values for CollectionEvent types.

Term Description
collectionEventType A kind of CollectionEvent characterised by some specific procedures, e.g. a psycholinguistic experiment.
Session A collection event that is a recording or elicitation session with participants.

Defined Term Set: CollectionProtocolTypeTerms

ID: https://w3id.org/ldac/terms#CollectionProtocolTypeTerms

A set of terms which are expected values for CollectionProtocol types.

Term Description
collectionProtocolType A description of the process used to collect or collate data, such as prompts given to participants, or how texts are selected for inclusion in a collection.
ElicitationTask The collection protocol includes a task-based prompt to participants.
MaterialSelectionCriteria A description of the criteria used to select texts in a collection.

Defined Term Set: CommunicationModeTerms

ID: https://w3id.org/ldac/terms#CommunicationModeTerms

A set of expected values for the property communicationMode.

Term Description
communicationMode The mode (spoken, written, signed etc.) of this resource. There may be more than one value for this property.
Gesture The resource contains non-linguistic gestural communication (i.e. not sign language).
SignedLanguage The resource contains data for which the medium of interaction was signing.
Song "Words or sounds [articulated] in succession with musical inflections or modulations of the voice" OED.
SpokenLanguage The resource contains data for which the medium of interaction was speech.
WhistledLanguage The resource contains data for which the medium of interaction was whistling.
WrittenLanguage The resource contains data for which the medium of interaction was writing.

Defined Term Set: IndexTypes

ID: https://w3id.org/ldac/terms#IndexTypes

A set of defined terms for types of indexing, such as FullText.

Term Description
FullText A text index that makes the full text of a data resource findable via a search interface.
openAccessIndex One or more public index types allowed by a license, e.g. FullText indexing may be allowed for discovery even when an item is not.

Defined Term Set: LinguisticGenreTerms

ID: https://w3id.org/ldac/terms#LinguisticGenreTerms

A set of expected values for the linguistic genre of a resource.

Term Description
Dialogue An interactive discourse with two or more participants. Examples of dialogues include conversations, interviews, correspondence, consultations, greetings and leave-takings.
Drama A planned, creative, rendition of discourse with two or more participants intended for presentation to an audience.
Formulaic The resource is a ritually or conventionally structured discourse.
Informational Discourse whose primary purpose is to inform the audience about the natural or social world.
Interview The resource is a conversation where one or more speakers are directing the conversation.
Lexicon The resource includes a systematic listing of lexical items.
linguisticGenre A linguistic classification of the genre of this resource.
Ludic Ludic discourse is language whose primary function is to be part of play, or a style of speech that involves a creative manipulation of the structures of the language. Examples of ludic discourse are play languages, jokes, secret languages, and speech disguises.
Narrative A discourse, monologic or co-constructed, which represents temporally organised events. Types of narratives include historical, traditional, and personal narratives, myths, folktales, fables, and humorous stories.
Oratory The art of public speaking, or of speaking eloquently according to rules or conventions. Examples of oratory include sermons, lectures, political speeches, and invocations.
Procedural An explanation or description of a method, process, or situation having ordered steps.
Report A factual account of some event or circumstance.
Thesaurus The resource contains a list or data structure consisting of words or concepts arranged according to sense.

Defined Term Set: MaterialTypes

ID: https://w3id.org/ldac/terms#MaterialTypes

A set of terms describing the relationship of a resource to the original data source.

Term Description
materialType Indicates whether the material in a file is the original (primary) source or is derived from it or describes it via annotation.

Defined Term Set: WrittenLanguageTypeTerms

ID: https://w3id.org/ldac/terms#WrittenLanguageTypeTerms

A set of expected types for WrittenLanguage communication mode (this set is incomplete - more work needed).

Term Description
Handwritten The resource was written using a writing implement such as a pen, pencil, brush or computer stylus (except where the digital handwriting is converted to standard text).
Typeset The resource has been formatted for printing or display.
Typewritten The resource contains text produced on a typewriter.
writtenLanguageFormat The format of the resource resulting from the way the text was produced (handwritten, typeset, typewritten).