5,586 research outputs found

Building a Disciplinary, World-Wide Data Infrastructure

Author: Almas Bridget M.
Arviset Christophe
Bartolo Laura
Broeder Daan
Genova Françoise
Law Emily
McMahon Brian
Publication venue: 'Ubiquity Press, Ltd.'
Publication date: 19/03/2017

Sharing scientific data, with the objective of making it fully discoverable, accessible, assessable, intelligible, usable, and interoperable, requires work at the disciplinary level to define in particular how the data should be formatted and described. Each discipline has its own organization and history as a starting point, and this paper explores the way a range of disciplines, namely materials science, crystallography, astronomy, earth sciences, humanities and linguistics get organized at the international level to tackle this question. In each case, the disciplinary culture with respect to data sharing, science drivers, organization and lessons learnt are briefly described, as well as the elements of the specific data infrastructure which are or could be shared with others. Commonalities and differences are assessed. Common key elements for success are identified: data sharing should be science driven; defining the disciplinary part of the interdisciplinary standards is mandatory but challenging; sharing of applications should accompany data sharing. Incentives such as journal and funding agency requirements are also similar. For all, it also appears that social aspects are more challenging than technological ones. Governance is more diverse, and linked to the discipline organization. CODATA, the RDA and the WDS can facilitate the establishment of disciplinary interoperability frameworks. Being problem-driven is also a key factor of success for building bridges to enable interdisciplinary research. Comment: Proceedings of the session "Building a disciplinary, world-wide data infrastructure" of SciDataCon 2016, held in Denver, CO, USA, 12-14 September 2016, to be published in ICSU CODATA Data Science Journal in 201

arXiv.org e-Print Archive

Directory of Open Access Journals

Linking thesauri to the linked open data cloud for improved media retrieval

Author: Braeckman Karel
De Sutter Robbie
Debevere Pedro
Mannens Erik
Van de Walle Rik
Van Deursen Davy
Publication date: 01/01/2011

Ghent University Academic Bibliography

A short survey of discourse representation models

Author: Buckingham Shum S.
Clark T.
de Waard A.
Groza T.
Handschuh S.
Publication date: 01/10/2009

With the advancement of technology and the wide adoption of ontologies as knowledge representation formats, in the last decade, a handful of models were proposed for the externalization of the rhetoric and argumentation captured within scientific publications. Conceptually, most of these models share a similar representation form of the scientific publication, i.e. as a series of interconnected elementary knowledge items. The main differences are given by the terminology used, the types of rhetorical and/or argumentation relations connecting the knowledge items and the foundational theories supporting these relations. This paper analyzes the state of the art and provides a concise comparative overview of the five most prominent discourse representation models, with the goal of sketching a unified model for discourse representation.

Open Research Online (The Open University)

A cross-linguistic database of phonetic transcription systems

Author: Anderson C.
Chacon T.
Fehn A.
Forkel R.
List J.
Tresoldi T.
Walworth M.
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2018

Contrary to what non-practitioners might expect, the systems of phonetic notation used by linguists are highly idiosyncratic. Not only do various linguistic subfields disagree on the specific symbols they use to denote the speech sounds of languages, but also in large databases of sound inventories considerable variation can be found. Inspired by recent efforts to link cross-linguistic data with the help of reference catalogues (Glottolog, Concepticon) across different resources, we present initial efforts to link different phonetic notation systems to a catalogue of speech sounds. This is achieved with the help of a database accompanied by a software framework that uses a limited but easily extendable set of non-binary feature values to allow for quick and convenient registration of different transcription systems, while at the same time linking to additional datasets with restricted inventories. Linking different transcription systems enables us to conveniently translate between different phonetic transcription systems, while linking sounds to databases allows users quick access to various kinds of metadata, including feature values, statistics on phoneme inventories, and information on prosody and sound classes. In order to prove the feasibility of this enterprise, we supplement an initial version of our cross-linguistic database of phonetic transcription systems (CLTS), which currently registers five transcription systems and links to fifteen datasets, as well as a web application, which permits users to conveniently test the power of the automatic translation across transcription systems.

Biblioteka Nauki - repozytorium artykułów

MPG.PuRe

Language resources and linked data: a practical perspective

Author: Baron Ciro
Dojchinovski Milan
Flati Tiziano
Gracia del Río Jorge
McCrae John P.
Vila Suero Daniel
Publication venue: E.T.S. de Ingenieros Informáticos (UPM)
Publication date: 01/01/2014

Recently, experts and practitioners in language resources have started recognizing the benefits of the linked data (LD) paradigm for the representation and exploitation of linguistic data on the Web. The adoption of the LD principles is leading to an emerging ecosystem of multilingual open resources that conform to the Linguistic Linked Open Data Cloud, in which datasets of linguistic data are interconnected and represented following common vocabularies, which facilitates linguistic information discovery, integration and access. In order to contribute to this initiative, this paper summarizes several key aspects of the representation of linguistic information as linked data from a practical perspective. The main goal of this document is to provide the basic ideas and tools for migrating language resources (lexicons, corpora, etc.) as LD on the Web and to develop some useful NLP tasks with them (e.g., word sense disambiguation). Such material was the basis of a tutorial given at the EKAW’14 conference, which is also reported in the paper.

Archivo Digital UPM

$OntoMath^{PRO}$ Ontology: A Linked Data Hub for Mathematics

Author: C. Bizer
C. David
C. Lange
E. Sirin
E.V. Biryaltsev
F. Kamareddine
H. Barendregt
H.S. Barrows
M. Doerr
M. Kohlhase
N. Sloane
O. Nevzorova
O.A. Nevzorova
Publication date: 01/01/2014

In this paper, we present an ontology of mathematical knowledge concepts that covers a wide range of the fields of mathematics and introduces a balanced representation between comprehensive and sensible models. We demonstrate the applications of this representation in information extraction, semantic search, and education. We argue that the ontology can be a core of future integration of math-aware data sets in the Web of Data and, therefore, provide mappings onto relevant datasets, such as DBpedia and ScienceWISE. Comment: 15 pages, 6 images, 1 table, Knowledge Engineering and the Semantic Web - 5th International Conferenc

arXiv.org e-Print Archive

Crossref

Interoperability of language-related information: mapping the BLL Thesaurus to Lexvo and Glottolog

Author: Abromeit Frank
Chiarcos Christian
Dimitrova Vanya
Fäth Christian
Renner-Westermann Heike
Publication date: 27/04/2023

Since 2013, the thesaurus of the Bibliography of Linguistic Literature (BLL Thesaurus) has been applied in the context of the Linguistik portal, a hub for linguistically relevant information. Several consecutive projects focus on the modeling of the BLL Thesaurus as an ontology and its linking to terminological repositories in the Linguistic Linked Open Data (LLOD) cloud. Those mappings facilitate the connection between the Linguistik portal and the cloud. In the paper, we describe the current efforts to establish interoperability between the language-related index terms and repositories providing language identifiers for the web of Linked Data. After an introduction of Lexvo and Glottolog, we outline the scope, the structure, and the peculiarities of the BLL Thesaurus. We discuss the challenges for the design of scientifically plausible language classification and the linking between divergent classifications. We describe the prototype of the linking model and propose pragmatic solutions for structural or conceptual conflicts. Additionally, we depict the benefits of the envisaged interoperability, both for the Linguistik portal and for the Linked Open Data community in general.

OPUS Augsburg

How FAIR are CMC Corpora?

Author: Alexander König
Egon Stemle
Jennifer-Carmen Frey
Publication venue: Cergy-Pontoise
Publication date: 01/01/2019

In recent years, research data management has also become an important topic in the less data-intensive areas of the Social Sciences and Humanities (SSH). Funding agencies as well as research communities demand that empirical data collected and used for scientific research is managed and preserved in such a way that research results are reproducible. In order to account for this, the FAIR guiding principles for data stewardship have been established as a framework for good data management, aiming at the findability, accessibility, interoperability, and reusability of research data. This article investigates 24 European CMC corpora with regard to their compliance with the FAIR principles and discusses to what extent the deposit of research data in repositories of data preservation initiatives such as CLARIN, Zenodo or Metashare can assist in the provision of FAIR corpora.

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Models to represent linguistic linked data

Author: A. GÓMEZ-PÉREZ
Borin
Crystal
E. MONTIEL-PONSODA
Ehrmann
Farrar
Fellbaum
Hanks
Hayes
Hellmann
Ide
J. BOSQUE-GIL
J. GRACIA
Klimek
Mel’cuk
Menke
Ogden
Peirce
Pustejovsky
Schuurman
Trippel
Vila-Suero
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2018

As the interest of the Semantic Web and computational linguistics communities in linguistic linked data (LLD) keeps increasing and the number of contributions that dwell on LLD rapidly grows, scholars (and linguists in particular) interested in the development of LLD resources sometimes find it difficult to determine which mechanism is suitable for their needs and which challenges have already been addressed. This review seeks to present the state of the art on the models, ontologies and their extensions to represent language resources as LLD by focusing on the nature of the linguistic content they aim to encode. Four basic groups of models are distinguished in this work: models to represent the main elements of lexical resources (group 1), vocabularies developed as extensions to models in group 1 and ontologies that provide more granularity on specific levels of linguistic analysis (group 2), catalogues of linguistic data categories (group 3) and other models such as corpora models or service-oriented ones (group 4). Contributions encompassed in these four groups are described, highlighting their reuse by the community and the modelling challenges that are still to be faced

Crossref

Repositorio Universidad de Zaragoza

Archivo Digital UPM