This is a language data schema, in the style of the Schema.org schema. It is based on OLAC terms for use in the LDaCA project and is published at https://w3id.org/ldac/terms. This schema builds on Schema.org and is intended to be used with the Language Data Commons RO-Crate Profile: https://w3id.org/ldac/profile.
5 classes · 54 properties
Classes
CollectionEvent: A description of an event at which one or more PrimaryMaterials were captured, e.g. as video or audio
CollectionProtocol: A description of how this Object or Collection was obtained, such as the strategy used for selecting written source texts, or the prompts given to participants
DataDepositLicense: A license document setting out terms for deposit into a repository
DataLicense: A license document for data licensing
DataReuseLicense: A license document setting out terms for reuse of data
Properties
access: Whether this is an open or restricted access license
accessControlList: When a license has an authorizationWorkflow property with a value of the DefinedTerm AccessControlList, this property has a URI value that points to a list of userIDs
annotationOf: This resource contains some kind of description that adds information to the resource it references
annotationType: The type of annotation for Annotation resources
annotator: The participant produced an annotation of this or a related resource
authorizationWorkflow: By what process a user is granted authorization to a license
channels: The number of audio channels this resource contains
collectionEventType: A kind of CollectionEvent characterised by some specific procedures
collectionProtocolType: A description of the process used to collect or collate data, such as prompts given to participants, or how texts are selected for inclusion in a collection
compiler: The participant is responsible for collecting the sub-parts of the resource together
consultant: The participant contributes expertise to the creation of a work, for example by contributing knowledge of their native language
dataInputter: The participant was responsible for entering, re-typing, and/or structuring the data contained in the resource
dateFreeText: Date information that cannot be expressed in one of the standard date formats, e.g. "mid-1970s", or whose meaning is unclear, for example whether it is a creation or publication date
depositor: The participant was responsible for depositing the resource in an archive
derivationOf: This property references another resource from which the current resource is derived, e.g. by downsampling audio or video files, or extracting text from a PDF
developer: The participant developed the methodology or tools (including software) that constitute the resource, or that were used to create the resource
editor: The participant reviewed, corrected, and/or tested the resource
geoJSON: A valid GeoJSON Feature or FeatureCollection, as a string that can be parsed as JSON
hasAnnotation: This resource is referenced by another resource that describes it, such as a translation, transcription or other analysis
hasCollectionProtocol: This resource was assembled or collected according to the linked protocol
hasDerivation: This property references another resource that is derived from this one, such as a downsampled audio or video file, or text extracted from a PDF
illustrator: The participant contributed drawings or other illustrations to the resource
indexableText: One or more target File(s) that together contain the full text of an item; each file should indicate its language
interpreter: The contributor renders the discourse recorded in the resource into another language in real time, or the contributor explains the discourse recorded in the resource
interviewee: The participant was a respondent in an interview
interviewer: The participant conducted an interview that forms part of the resource
isDeIdentified: The data in this item has had potentially identifying information removed, which may include replacing names with pseudonyms
materialType: Indicates whether the material in a file is the original (primary) source or is derived from it or describes it via annotation
openAccessIndex: One or more public index types allowed by a license, e.g. FullText indexing may be allowed for discovery even when the item itself is not public
orthographicNotes: A description of the specific orthographic writing system(s) used in the material (e.g. Latin, Cyrillic, Australian English, IPA), or particular conventions required to understand the material (e.g. O* = ø)
participant: The participant was present during the creation of the resource, but did not contribute substantially to its content
performer: The participant performed some portion of a recorded, filmed, or transcribed resource
photographer: The participant took the photograph, or shot the film, that appears in or constitutes the resource
recorder: The participant operated the recording machinery used to create the resource
register: The type of register (any of the varieties of a language that a speaker uses in a particular social context [Merriam-Webster]) of the contents of a language resource
researcher: The resource was created as part of the participant's research, or the research presents interim or final results from the participant's research
researchParticipant: The participant acted as a research subject or responded to a questionnaire, the results of which study form the basis of the resource
responder: The participant was an interlocutor in some sort of discourse event, but only reacted to the contributions of others
reviewDate: The date that this license should be reviewed
signer: The contributor was a principal signer in a resource that consists of a recording, a film, or a transcription of a recorded resource
singer: The participant sang, either individually or as part of a group, in a resource that consists of a recording, a film, or a transcription of a recorded resource
speaker: The contributor was a principal speaker in a resource that consists of a recording, a film, or a transcription of a recorded resource
sponsor: The participant contributed financial support to the creation of the resource
subjectLanguage: The language(s) that this annotation resource is about
transcriber: The participant produced a transcription of this or a related resource
translator: The participant produced a translation of this or a related resource
writtenLanguageFormat: The format of the resource resulting from the way the text was produced (handwritten, typeset, typewritten)
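The geoJSON property above is described as a string that must parse as JSON and hold a feature or feature collection. A minimal sketch of that check follows, using only the Python standard library. The function name `parse_geojson` and the sample coordinates are hypothetical, and the check covers only the top-level `type` member, not full GeoJSON validity.

```python
import json


def parse_geojson(value: str) -> dict:
    """Parse a geoJSON property value and check its top-level type.

    Assumption: only the "type" member is inspected; geometry and
    coordinate validity are not checked here.
    """
    data = json.loads(value)  # raises json.JSONDecodeError (a ValueError) on invalid JSON
    if not isinstance(data, dict) or data.get("type") not in ("Feature", "FeatureCollection"):
        raise ValueError("expected a GeoJSON Feature or FeatureCollection")
    return data


# Hypothetical value, for illustration only.
example = json.dumps({
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [153.0, -27.5]},
    "properties": {},
})
feature = parse_geojson(example)
```

A bare geometry such as a Point is rejected by this sketch, because the description names only features and feature collections.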
Types of entities (specializations of Classes) and expected Properties
Class: CollectionEvent
A description of an event at which one or more PrimaryMaterials were captured, e.g. as video or audio.
Instances of this type MAY be present in the crate.
Class: CollectionProtocol
A description of how this Object or Collection was obtained, such as the strategy used for selecting written source texts, or the prompts given to participants.
Instances of this type MAY be present in the crate.
A description of the process used to collect or collate data, such as prompts given to participants, or how texts are selected for inclusion in a collection.
Class: DataDepositLicense
A license document setting out terms for deposit into a repository.
Instances of this type MAY be present in the crate.
Min Count: N/A
Max Count: N/A
Property | Required | Description | Range | Value
@type | Yes
No properties defined for this class
Class: DataLicense
A license document for data licensing. This is a superclass of DataReuseLicense and DataDepositLicense.
Instances of this type MAY be present in the crate.
When a license has an authorizationWorkflow property with a value of the DefinedTerm AccessControlList, this property has a URI value that points to a list of userIDs.
The age or age range of a person, e.g. 25, 30-50, >50. If an age is specified, a specializationOf pointing to a 'canonical' ageless version of that Person can also be included.
http://schema.org/Text
http://schema.org/Person
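The age description above gives three textual forms: a single age (25), a range (30-50), and an open-ended lower bound (>50). As a rough sketch, such values could be normalised into a (minimum, maximum) pair. The function name `parse_age` is hypothetical, and treating ">50" as "51 or older" assumes ages are whole years, which the schema does not state.

```python
import re
from typing import Optional, Tuple

AgeRange = Tuple[Optional[int], Optional[int]]


def parse_age(value: str) -> AgeRange:
    """Turn '25', '30-50' or '>50' into an inclusive (min, max) pair.

    None as the maximum means the range has no upper bound.
    """
    text = value.strip()
    single = re.fullmatch(r"\d+", text)
    if single:
        age = int(text)
        return (age, age)
    span = re.fullmatch(r"(\d+)\s*-\s*(\d+)", text)
    if span:
        return (int(span.group(1)), int(span.group(2)))
    lower = re.fullmatch(r">\s*(\d+)", text)
    if lower:
        # Assumption: whole-year ages, so ">50" starts at 51.
        return (int(lower.group(1)) + 1, None)
    raise ValueError(f"unrecognised age value: {value!r}")


ages = [parse_age(v) for v in ("25", "30-50", ">50")]
```

Values in any other form raise a ValueError in this sketch rather than being guessed at.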
Property: annotationOf
Description
Range
Occurs in Domain(s)
This resource contains some kind of description that adds information to the resource it references.
A description of the process used to collect or collate data, such as prompts given to participants, or how texts are selected for inclusion in a collection.
Date information that cannot be expressed in one of the standard date formats, e.g. "mid-1970s", or whose meaning is unclear, for example whether it is a creation or publication date.
http://schema.org/CreativeWork
Property: depositor
Description
Range
Occurs in Domain(s)
The participant was responsible for depositing the resource in an archive.
This property references another resource from which the current resource is derived, e.g. by downsampling audio or video files, or extracting text from a PDF.
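Because derivationOf and hasDerivation describe the same link from opposite ends, one direction can be computed from the other. The sketch below builds the inverse index from entities that declare derivationOf. It assumes entities are plain dictionaries with "@id" keys and reference values of the form {"@id": ...}, in the usual JSON-LD style; the file names and the function name `derivation_index` are hypothetical.

```python
from collections import defaultdict


def derivation_index(entities):
    """Map each source @id to the @ids of entities that are derived from it."""
    index = defaultdict(list)
    for entity in entities:
        refs = entity.get("derivationOf", [])
        if isinstance(refs, dict):  # a single reference rather than a list
            refs = [refs]
        for ref in refs:
            index[ref["@id"]].append(entity["@id"])
    return dict(index)


# Hypothetical entities, for illustration only.
entities = [
    {"@id": "recording.wav"},
    {"@id": "recording.mp3", "derivationOf": {"@id": "recording.wav"}},
    {"@id": "paper.txt", "derivationOf": [{"@id": "paper.pdf"}]},
]
index = derivation_index(entities)
```

Each key in `index` is a resource that could carry a hasDerivation property listing the corresponding values.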
One or more target File(s) that together contain the full text of an item; each file should indicate its language.
http://schema.org/MediaObject
http://schema.org/CreativeWork
Property: interpreter
Description
Range
Occurs in Domain(s)
The contributor renders the discourse recorded in the resource into another language in real time, or the contributor explains the discourse recorded in the resource.
A linguistic classification of the genre of this resource.
http://schema.org/CreativeWork
Property: mainText
Description
Range
Occurs in Domain(s)
Identifies the most relevant sub-component for computational text analytics.
http://schema.org/MediaObject
http://schema.org/CreativeWork
Property: material
Description
Range
Occurs in Domain(s)
Description of the original media, e.g. audio cassette tapes, participant questionnaires, field notes.
http://schema.org/CreativeWork
Property: materialType
Description
Range
Occurs in Domain(s)
Indicates whether the material in a file is the original (primary) source or is derived from it or describes it via annotation.
http://schema.org/MediaObject
Property: openAccessIndex
Description
Range
Occurs in Domain(s)
One or more public index types allowed by a license, e.g. FullText indexing may be allowed for discovery even when the item itself is not public.
http://schema.org/CreativeWork
Property: orthographicNotes
Description
Range
Occurs in Domain(s)
A description of the specific orthographic writing system(s) used in the material (e.g. Latin, Cyrillic, Australian English, IPA), or particular conventions required to understand the material (e.g. O* = ø).
http://schema.org/Text
http://schema.org/CreativeWork
Property: participant
Description
Range
Occurs in Domain(s)
The participant was present during the creation of the resource, but did not contribute substantially to its content.
The participant performed some portion of a recorded, filmed, or transcribed resource. It is recommended that this term be used only for creative participants whose role is not better indicated by a more specific term, such as 'speaker', 'signer', or 'singer'.
The type of register (any of the varieties of a language that a speaker uses in a particular social context [Merriam-Webster]) of the contents of a language resource.
http://schema.org/CreativeWork
Property: researcher
Description
Range
Occurs in Domain(s)
The resource was created as part of the participant's research, or the research presents interim or final results from the participant's research.
The contributor was a principal signer in a resource that consists of a recording, a film, or a transcription of a recorded resource. Signers are those whose gestures predominate in a recorded or filmed resource. (The resource may be a transcription of that recording.)
The participant sang, either individually or as part of a group, in a resource that consists of a recording, a film, or a transcription of a recorded resource.
The contributor was a principal speaker in a resource that consists of a recording, a film, or a transcription of a recorded resource. Speakers are those whose voices predominate in a recorded or filmed resource. (The resource may be a transcription of that recording.)
Indicates that a DataReuseLicense requires some kind of authorization step, from SelfAuthorization (click-through) to processes that require a data steward to grant permission.
An annotation that assigns lexical elements of language to classes on the basis of their distributional properties (for sign languages, the term 'sign class' is appropriate).
An annotation that includes information about the sound system of a language, such as the contrasts between sounds which make up the sound system and the locally conditioned realisations of sounds which characterise speech in the language.
An annotation that provides a symbolic record of intonation, stress, tone or other suprasegmental features, which is expressed independently of regular phonetic transcription.
A set of DefinedTerms for access authorization mechanisms. Some of these may be combined, e.g. AccessControlList and AgreeToTerms, but SelfAuthorization and AgreeToTerms would be redundant.
A user is expected to explicitly agree to a set of license terms. This may be combined with AccessControlList to indicate that even a user who has been pre-approved for a license must agree to the license terms.
Users may apply for a license via some workflow, such as a form, with the decision being made by a DataSteward or their delegate about whether to grant the license.
A user can be authorised to access data by clicking that they agree to a license, or filling out a form to check their understanding, which can be validated by a machine and does not require human intervention.
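The combination rule stated above (AccessControlList with AgreeToTerms is meaningful, SelfAuthorization with AgreeToTerms is redundant) could be checked mechanically. The sketch below encodes only the one redundant pair the text names; the function name `redundant_combinations` and the idea of passing workflow terms as plain strings are assumptions for illustration, not part of the schema.

```python
# Only the pair named as redundant in the text is listed; other pairs are
# not judged either way by this sketch.
REDUNDANT_PAIRS = {frozenset({"SelfAuthorization", "AgreeToTerms"})}


def redundant_combinations(workflows):
    """Return the redundant pairs present among a license's workflow terms."""
    selected = set(workflows)
    return [sorted(pair) for pair in REDUNDANT_PAIRS if pair <= selected]


combined = redundant_combinations(["AccessControlList", "AgreeToTerms"])
redundant = redundant_combinations(["SelfAuthorization", "AgreeToTerms"])
```

An empty result means none of the listed redundant pairs was found, not that the combination has been confirmed as valid.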
A description of the process used to collect or collate data, such as prompts given to participants, or how texts are selected for inclusion in a collection.
An interactive discourse with two or more participants. Examples of dialogues include conversations, interviews, correspondence, consultations, greetings and leave-takings.
Ludic discourse is language whose primary function is to be part of play, or a style of speech that involves a creative manipulation of the structures of the language. Examples of ludic discourse are play languages, jokes, secret languages, and speech disguises.
A discourse, monologic or co-constructed, which represents temporally organised events. Types of narratives include historical, traditional, and personal narratives, myths, folktales, fables, and humorous stories.
The art of public speaking, or of speaking eloquently according to rules or conventions. Examples of oratory include sermons, lectures, political speeches, and invocations.
The resource was written using a writing implement such as a pen, pencil, brush or computer stylus (except where the digital handwriting is converted to standard text).