4 research outputs found

    Challenges for the representation of morphology in ontology lexicons

    Get PDF
    Recent years have experienced a growing trend in the publication of language resources as Linguistic Linked Data (LLD) to enhance their discovery, reuse and the interoperability of tools that consume language data. To this aim, the OntoLex-lemon model has emerged as a de facto standard to represent lexical data on the Web. However, traditional dictionaries contain a considerable amount of morphological information which is not straightforwardly representable as LLD within the current model. In order to fill this gap a new Morphology Module of OntoLex-lemon is currently being developed. This paper presents the results of this model as on-going work as well as the underlying challenges that emerged during the module development. Based on the MMoOn Core ontology, it aims to account for a wide range of morphological information, ranging from endings to derive whole paradigms to the decomposition and generation of lexical entries which is in compliance to other OntoLex-lemon modules and facilitates the encoding of complex morphological data in ontology lexicons

    Models to represent linguistic linked data

    Get PDF
    As the interest of the Semantic Web and computational linguistics communities in linguistic linked data (LLD) keeps increasing and the number of contributions that dwell on LLD rapidly grows, scholars (and linguists in particular) interested in the development of LLD resources sometimes find it difficult to determine which mechanism is suitable for their needs and which challenges have already been addressed. This review seeks to present the state of the art on the models, ontologies and their extensions to represent language resources as LLD by focusing on the nature of the linguistic content they aim to encode. Four basic groups of models are distinguished in this work: models to represent the main elements of lexical resources (group 1), vocabularies developed as extensions to models in group 1 and ontologies that provide more granularity on specific levels of linguistic analysis (group 2), catalogues of linguistic data categories (group 3) and other models such as corpora models or service-oriented ones (group 4). Contributions encompassed in these four groups are described, highlighting their reuse by the community and the modelling challenges that are still to be faced

    Representing Swedish Lexical Resources in RDF with lemon

    No full text
    Borin L, Dannells D, Forsberg M, McCrae J. Representing Swedish Lexical Resources in RDF with lemon. In: Proceedings of the ISWC 2014 Posters & Demonstrations Track - a track within the 13th International Semantic Web Conference. 2014: 329-332

    Inducing the Cross-Disciplinary Usage of Morphological Language Data Through Semantic Modelling

    Get PDF
    Despite the enormous technological advancements in the area of data creation and management the vast majority of language data still exists as digital single-use artefacts that are inaccessible for further research efforts. At the same time the advent of digitisation in science increased the possibilities for knowledge acquisition through the computational application of linguistic information for various disciplines. The purpose of this thesis, therefore, is to create the preconditions that enable the cross-disciplinary usage of morphological language data as a sub-area of linguistic data in order to induce a shared reusability for every research area that relies on such data. This involves the provision of morphological data on the Web under an open license and needs to take the prevalent diversity of data compilation into account. Various representation standards emerged across single disciplines which lead to heterogeneous data that differs with regard to complexity, scope and data formats. This situation requires a unifying foundation enabling direct reusability. As a solution to fill the gap of missing open data and to overcome the presence of isolated datasets a semantic data modelling approach is applied. Being rooted in the Linked Open Data (LOD) paradigm it pursues the creation of data as uniquely identifiable resources that are realised as URIs, accessible on the Web, available under an open license, interlinked with other resources, and adhere to Linked Data representation standards such as the RDF format. Each resource then contributes to the LOD cloud in which they are all interconnected. This unification results from ontologically shared bases that formally define the classification of resources and their relation to other resources in a semantically interoperable manner. Subsequently, the possibility of creating semantically structured data has sparked the formation of the Linguistic Linked Open Data (LLOD) research community and LOD sub-cloud containing primarily language resources. Over the last decade, ontologies emerged mainly for the domain of lexical language data which lead to a significant increase in Linked Data-based linguistic datasets. However, an equivalent model for morphological data is still missing, leading to a lack of this type of language data within the LLOD cloud. This thesis presents six publications that are concerned with the peculiarities of morphological data and the exploration of their semantic representation as an enabler of cross-disciplinary reuse. The Multilingual Morpheme Ontology (MMoOn Core) as well as an architectural framework for morphemic dataset creation as RDF resources are proposed as the first comprehensive domain representation model adhering to the LOD paradigm. It will be shown that MMoOn Core permits the joint representation of heterogeneous data sources such as interlinear glossed texts, inflection tables, the outputs of morphological analysers, lists of morphemic glosses or word-formation rules which are all equally labelled as “morphological data” across different research areas. Evidence for the applicability and adequacy of the semantic modelling entailed by the MMoOn Core ontology is provided by two datasets that were transformed from tabular data into RDF: the Hebrew Morpheme Inventory and Xhosa RDF dataset. Both further demonstrate how their integration into the LLOD cloud - by interlinking them with external language resources - yields insights that could not be obtained from the initial source data. Altogether the research conducted in this thesis establishes the foundation for an interoperable data exchange and the enrichment of morphological language data. It strives to achieve the broader goal of advancing language data-driven research by overcoming data barriers and discipline boundaries
    corecore