23 research outputs found

    27 pawns ready for action: A multi-indicator methodology and evaluation of thesaurus management tools from a LOD perspective

    Purpose – The purpose of this paper is to propose a methodology for assessing management tools for thesauri and other controlled vocabularies that can represent content using the Simple Knowledge Organization System (SKOS) data model, and their use in a Linked Open Data (LOD) paradigm. It analyses a selected set of tools in order to prove the validity of the method. Design/methodology/approach – A set of 27 criteria grouped in five evaluation indicators is proposed and applied to ten vocabulary management applications that are compliant with the SKOS data model. Previous studies of controlled vocabulary management software are gathered and analyzed to compare the evaluation parameters used and the results obtained for each tool. Findings – The results indicate that the tool that obtains the highest score in every indicator is PoolParty. TemaTres and Intelligent Theme Manager rank second and third, respectively, but score lower in most of the evaluation items. The use of a broad set of criteria to evaluate vocabulary management tools gives satisfactory results. The set of five indicators and 27 criteria proposed here represents a useful evaluation system for selecting current and future tools to manage vocabularies. Research limitations/implications – The paper assesses only the ten most important and well-known software tools used for thesaurus and vocabulary management as of October 2016. However, the evaluation criteria could be applied to new software that may appear in the future to create and manage SKOS vocabularies in compliance with LOD standards. Originality/value – The originality of this paper lies in the proposed indicators and criteria for evaluating vocabulary management tools. Those criteria and indicators can also be valuable for future software. The indicators are also applied to the most exhaustive and qualified list of tools of this kind.
The paper will help designers, information architects, metadata librarians, and other staff involved in the design of digital information systems to choose the right tool to manage their vocabularies in a LOD/vocabulary scenario.

    A decade of Semantic Web research through the lenses of a mixed methods approach

    The identification of research topics and trends is an important scientometric activity, as it can help guide the direction of future research. In the Semantic Web area, topic and trend detection was initially performed primarily through qualitative, top-down approaches that rely on expert knowledge. More recently, data-driven, bottom-up approaches have been proposed that offer a quantitative analysis of the evolution of a research domain. In this paper, we aim to provide a broader and more complete picture of Semantic Web topics and trends by adopting a mixed methods methodology, which allows for the combined use of both qualitative and quantitative approaches. Concretely, we build on a qualitative analysis of the main seminal papers, which adopt a top-down approach, and on quantitative results derived with three bottom-up, data-driven approaches (Rexplore, Saffron, PoolParty) on a corpus of Semantic Web papers published between 2006 and 2015. In this process, we use the latter both for “fact-checking” the former and to derive key findings about the strengths and weaknesses of top-down and bottom-up approaches to research topic identification. Although we provide a detailed study of the past decade of Semantic Web research, the findings and the methodology are relevant not only for our community but also for research fields beyond the Semantic Web.

    Engineering Agile Big-Data Systems

    To be effective, data-intensive systems require extensive ongoing customisation to reflect changing user requirements, organisational policies, and the structure and interpretation of the data they hold. Manual customisation is expensive, time-consuming, and error-prone. In large complex systems, the value of the data can be such that exhaustive testing is necessary before any new feature can be added to the existing design. In most cases, the precise details of requirements, policies and data will change during the lifetime of the system, forcing a choice between expensive modification and continued operation with an inefficient design. Engineering Agile Big-Data Systems outlines an approach to dealing with these problems in software and data engineering, describing a methodology for aligning these processes throughout product lifecycles. It discusses tools which can be used to achieve these goals, and, in a number of case studies, shows how the tools and methodology have been used to improve a variety of academic and business systems.

    Transforming Thesaurus Records into MARC 21 and MADS: Designing a Framework for Libraries

    Purpose – This paper analyzes various thesaurus formats for converting data and how they can easily be implemented in libraries. These data formats are important because they make it easy to transfer data from one system to another. The main focus of this system is on the data format of the Thesaurus Construction. Methodology – It is made with the TemaTres tool, which is used by many other tools. It has many new and modern features that librarians can use to create a new interface. In other words, other software can be linked very easily through these formats. Building this system involves four main steps: (i) Study the Thesaurus Subject Repositories; (ii) Comparative Study of Controlled Vocabulary Tools; (iii) Construction of Controlled Vocabularies; (iv) Creation of Formats for Thesaurus. Findings – Users will benefit from this interface because they will be able to access all the information they need very easily. In addition, two of these formats, MARC21 and MADS, can be imported into Koha, allowing users to access additional information from Koha's OPAC interface that is located within TemaTres. Originality – With these concepts, a thesaurus on any subject can be created and data can be linked with other software. Any type of linked data format can be published with the help of Apache Jena and Apache Jena Fuseki for external integration and easy access to metadata. A prototype vocabulary can therefore be created through this system, from which all libraries can benefit.

    Methods for Building Semantic Portals

    Semantic portals are information systems which collect information from several sources and combine them using semantic web technologies into a user interface that solves information needs of users. Creating such portals requires methods and tools from multiple disciplines, including knowledge representation, information retrieval, information extraction, and user interface design. This thesis explores methods for building and improving semantic portals and other semantic web applications with contributions in three areas. The studies included in the thesis draw from the design science methodology in information systems research. First, a method is presented for creating faceted search user interfaces for semantic portals that use controlled vocabularies with a complex hierarchical structure. The results show that the method allows the creation of user-centric search facets that hide the complex hierarchies from the user, resulting in a user-friendly faceted search interface. Second, the creation of structured metadata from text documents is enhanced by adapting a state-of-the-art automatic subject indexing system to Finnish language texts. The results show that using a suitable combination of existing tools, automatic subject indexing quality comparable to that of human indexers can be attained in a highly inflected language such as Finnish. Finally, the quality of controlled vocabularies such as thesauri and lightweight ontologies is examined by developing a set of quality criteria for vocabularies expressed using the SKOS standard, and methods for correcting structural problems in SKOS vocabularies are presented. The results show that most published SKOS vocabularies suffer from quality issues and violate the SKOS integrity conditions. However, the great majority of such problems were corrected by the methods presented in this dissertation.
The methods have been implemented in several real-world applications, including the HealthFinland health information portal, the ARPA information extraction toolkit, and the ONKI ontology library system.

    Supporting the Linked Data Life Cycle Using an Integrated Tool Stack


    Linked Open Data - Creating Knowledge Out of Interlinked Data: Results of the LOD2 Project

    Database Management; Artificial Intelligence (incl. Robotics); Information Systems and Communication Servic