    Models to represent linguistic linked data

    As interest in linguistic linked data (LLD) keeps growing in the Semantic Web and computational linguistics communities, and the number of contributions on LLD rapidly increases, scholars (linguists in particular) who want to develop LLD resources sometimes find it difficult to determine which mechanisms suit their needs and which challenges have already been addressed. This review presents the state of the art in models, ontologies and their extensions for representing language resources as LLD, focusing on the nature of the linguistic content they aim to encode. Four basic groups of models are distinguished in this work: models to represent the main elements of lexical resources (group 1), vocabularies developed as extensions to models in group 1 and ontologies that provide more granularity on specific levels of linguistic analysis (group 2), catalogues of linguistic data categories (group 3) and other models such as corpora models or service-oriented ones (group 4). Contributions encompassed in these four groups are described, highlighting their reuse by the community and the modelling challenges that are still to be faced.

    When linguistics meets web technologies. Recent advances in modelling linguistic linked data

    This article provides an up-to-date and comprehensive survey of models (including vocabularies, taxonomies and ontologies) used for representing linguistic linked data (LLD). It focuses on the latest developments in the area and both builds upon and complements previous works covering similar territory. The article begins with an overview of recent trends that have had an impact on linked data models and vocabularies, such as the growing influence of the FAIR guidelines, the funding of several major projects in which LLD is a key component, and the increasing importance of the relationship between the digital humanities and LLD. Next, we give an overview of some of the best-known vocabularies and models in LLD. We then look at some of the latest developments in community standards and initiatives such as OntoLex-Lemon, as well as recent work carried out on corpora, annotation and LLD, including a discussion of the LLD metadata vocabularies META-SHARE and lime, and of language identifiers. In the following part of the paper we look at work carried out in a number of recent projects that has had a significant impact on LLD vocabularies and models.
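
    To make the OntoLex-Lemon model mentioned in this abstract more concrete, here is a minimal Python sketch that writes a single lexical entry as Turtle using only core OntoLex terms (ontolex:LexicalEntry, ontolex:canonicalForm, ontolex:Form, ontolex:writtenRep, ontolex:sense, ontolex:LexicalSense, ontolex:reference). It is an illustration, not taken from the article: the ex: namespace, the function name lexical_entry_turtle and the DBpedia reference are hypothetical, and a real resource would normally be built with an RDF library rather than string formatting.

```python
# Minimal sketch: serialise one lexical entry with the OntoLex-Lemon core
# vocabulary as Turtle text. The "ex:" namespace, the function name and the
# example entry are hypothetical; only the ontolex: terms come from the model.
# No escaping is done, so lemmas containing quotes are not supported.

ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
EX = "http://example.org/lexicon/"  # hypothetical namespace for the entries


def lexical_entry_turtle(lemma: str, lang: str, reference: str) -> str:
    """Return Turtle for an entry with one canonical form and one sense.

    `reference` is the IRI of the ontology concept the sense points to.
    """
    entry_id = lemma.replace(" ", "_")
    lines = [
        f"@prefix ontolex: <{ONTOLEX}> .",
        f"@prefix ex: <{EX}> .",
        "",
        # The entry itself links to its form and to its sense.
        f"ex:{entry_id} a ontolex:LexicalEntry ;",
        f"    ontolex:canonicalForm ex:{entry_id}_form ;",
        f"    ontolex:sense ex:{entry_id}_sense .",
        "",
        # The form carries the language-tagged written representation.
        f"ex:{entry_id}_form a ontolex:Form ;",
        f'    ontolex:writtenRep "{lemma}"@{lang} .',
        "",
        # The sense points to the meaning in an external ontology.
        f"ex:{entry_id}_sense a ontolex:LexicalSense ;",
        f"    ontolex:reference <{reference}> .",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(lexical_entry_turtle("bank", "en", "http://dbpedia.org/resource/Bank"))
```

    The sense does not define the meaning itself but points to a concept in an external ontology, which reflects the "semantics by reference" principle behind lemon.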

    Identifying Dataset Characteristics for Federated SPARQL Queries (Identifikasi Karakteristik Dataset untuk Federated SPARQL Query)

    Federated SPARQL query engines have been developed that can query multiple distributed SPARQL endpoints, making it possible to retrieve data from several sources. When executing a query, each engine performs differently. One factor that affects a query engine's performance is the characteristics of the RDF datasets it accesses, such as the number of triples, classes, properties, subjects, entities and objects, and the spreading factor of the dataset. This final project identifies the characteristics of RDF datasets and determines which of them influence query engine performance. Ten datasets taken from other research publications were characterised, and the relationship between dataset characteristics and query engine performance was tested using the federated SPARQL query engine FedX. The analysis shows that the number of triples and the number of classes associated with the query tend to affect query engine performance, whereas the number of properties associated with the dataset, the spreading factor of the dataset, and the spreading factor of the dataset associated with the query tend to have no effect.
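
    Several of the dataset characteristics listed in this abstract can be counted directly from the triples. The Python sketch below is an illustration under stated assumptions, not the method used in the study: triples are plain string tuples, "classes" are taken to be the distinct objects of rdf:type, and entities and the spreading factor are left out because their exact definitions are not given here. The function name dataset_characteristics is hypothetical.

```python
# Minimal sketch: count some RDF dataset characteristics from an in-memory
# list of (subject, predicate, object) triples. Assumptions: terms are plain
# strings and "classes" are the distinct objects of rdf:type. Entities and
# the spreading factor are omitted (their definitions are not given here).

from typing import Dict, Iterable, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
Triple = Tuple[str, str, str]


def dataset_characteristics(triples: Iterable[Triple]) -> Dict[str, int]:
    """Count distinct triples, subjects, properties, objects and classes."""
    seen = set()  # an RDF graph is a set, so duplicate triples count once
    subjects, properties, objects, classes = set(), set(), set(), set()
    for s, p, o in triples:
        if (s, p, o) in seen:
            continue
        seen.add((s, p, o))
        subjects.add(s)
        properties.add(p)
        objects.add(o)
        if p == RDF_TYPE:
            classes.add(o)  # the object of rdf:type names a class
    return {
        "triples": len(seen),
        "subjects": len(subjects),
        "properties": len(properties),
        "objects": len(objects),
        "classes": len(classes),
    }


if __name__ == "__main__":
    ex = "http://example.org/"  # hypothetical example namespace
    data = [
        (ex + "alice", RDF_TYPE, ex + "Person"),
        (ex + "bob", RDF_TYPE, ex + "Person"),
        (ex + "alice", ex + "knows", ex + "bob"),
        (ex + "alice", ex + "knows", ex + "bob"),  # duplicate, counted once
    ]
    print(dataset_characteristics(data))
    # {'triples': 3, 'subjects': 2, 'properties': 2, 'objects': 2, 'classes': 1}
```

    Against a live endpoint, the same counts would more typically come from SPARQL 1.1 aggregates, for example SELECT (COUNT(DISTINCT ?s) AS ?subjects) WHERE { ?s ?p ?o }, instead of loading every triple into memory.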

    Countering language attrition with PanLex and the Web of Data
