Development of linguistic linked open data resources for collaborative data-intensive research in the language sciences
Making diverse data in linguistics and the language sciences open, distributed, and accessible: perspectives from language and language acquisition researchers and technical LOD (linked open data) researchers. This volume examines the challenges inherent in making diverse data in linguistics and the language sciences open, distributed, integrated, and accessible, thereby fostering wide data sharing and collaboration. It is unique in integrating the perspectives of language researchers and technical LOD researchers. Reporting on both active research needs in the field of language acquisition and technical advances in the development of data interoperability, the book demonstrates the advantages of an international infrastructure for scholarship in the language sciences. With contributions by researchers who produce complex data content and by scholars involved in both the technology and the conceptual foundations of LLOD (linguistic linked open data), the book focuses on language acquisition because this area involves complex and diverse data sets, cross-linguistic analyses, and urgent collaborative research. The contributors discuss a variety of research methods, resources, and infrastructures. Contributors: Isabelle Barrière, Nan Bernstein Ratner, Steven Bird, Maria Blume, Ted Caldwell, Christian Chiarcos, Cristina Dye, Suzanne Flynn, Claire Foley, Nancy Ide, Carissa Kang, D. Terence Langendoen, Barbara Lust, Brian MacWhinney, Jonathan Masci, Steven Moran, Antonio Pareja-Lora, Jim Reidy, Oya Y. Rieger, Gary F. Simons, Thorsten Trippel, Kara Warburton, Sue Ellen Wright, Claus Zin
Linked disambiguated distributional semantic networks
We present a new hybrid lexical knowledge base that combines the contextual information of distributional models with the conciseness and precision of manually constructed lexical networks. Computing our count-based distributional model involves inducing word senses for single-word and multi-word terms, disambiguating word similarity lists, extracting taxonomic relations with patterns, and deriving context clues for disambiguation in context. In contrast to dense vector representations, our resource is human-readable and interpretable, and can thus be easily embedded within the Semantic Web ecosystem.
Using Biographical Texts as Linked Data for Prosopographical Research and Applications
This paper argues that representing texts as semantic Linked Data provides a useful basis for analyzing their contents in Digital Humanities research and for developing Cultural Heritage applications. The idea is to transform Cultural Heritage texts into a knowledge graph and a Linked Data service that can be used flexibly in different applications via a SPARQL endpoint. The argument is discussed and evaluated in the context of biographical and prosopographical research, and in a case study where over 13,000 life stories from the biographical collections of the Biographical Centre of the Finnish Literature Society were transformed into RDF, enriched by data linking, and published in a SPARQL endpoint. Tools for biography and prosopography, data clustering, network analysis, and linguistic analysis were created, with promising first results.
A Model for Language Annotations on the Web
Several annotation models have been proposed to enable a multilingual Semantic Web. Such models focus on the word and its morphology and assume that the language tag and URI come from external resources. These resources, such as ISO 639 and Glottolog, have limited coverage of the world's languages and at best a very limited thesaurus-like structure, which hampers language annotation and hence constrains research in Digital Humanities and other fields. To take over this task, which the current models 'outsource', we developed a model for representing information about languages, the Model for Language Annotation (MoLA), so that basic language information can be recorded consistently and thereby also queried and analyzed. This includes the various types of languages, language families, and the relations among them. MoLA is formalized in OWL so that it can integrate with Linguistic Linked Data resources. Sufficient coverage of MoLA is demonstrated with the use case of French.
Towards an interoperable ecosystem of AI and LT platforms: a roadmap for the implementation of different levels of interoperability
With regard to the wider area of AI/LT platform interoperability, we concentrate on two core aspects: (1) cross-platform search and discovery of resources and services; and (2) composition of cross-platform service workflows. We devise five levels of platform interoperability, of increasing complexity, that we suggest implementing in a wider federation of AI/LT platforms. We illustrate the approach using five emerging AI/LT platforms: AI4EU, ELG, Lynx, QURATOR, and SPEAKER.