
    Thesauri on the Web: current developments and trends

    This article provides an overview of recent developments relating to the application of thesauri in information organisation and retrieval on the World Wide Web. It describes some recent thesaurus projects undertaken to facilitate resource description and discovery and access to wide-ranging information resources on the Internet. Types of thesauri available on the Web, thesauri integrated in databases and information retrieval systems, and multiple-thesaurus systems for cross-database searching are also discussed. Collective efforts and events addressing the standardisation and novel applications of thesauri are briefly reviewed.

    Towards a Universal Wordnet by Learning from Combined Evidence

    Lexical databases are invaluable sources of knowledge about words and their meanings, with numerous applications in areas like NLP, IR, and AI. We propose a methodology for the automatic construction of a large-scale multilingual lexical database where words of many languages are hierarchically organized in terms of their meanings and their semantic relations to other words. This resource is bootstrapped from WordNet, a well-known English-language resource. Our approach extends WordNet with around 1.5 million meaning links for 800,000 words in over 200 languages, drawing on evidence extracted from a variety of resources including existing (monolingual) wordnets, (mostly bilingual) translation dictionaries, and parallel corpora. Graph-based scoring functions and statistical learning techniques are used to iteratively integrate this information and build an output graph. Experiments show that this wordnet has a high level of precision and coverage, and that it can be useful in applied tasks such as cross-lingual text classification.

    Git4Voc: Git-based Versioning for Collaborative Vocabulary Development

    Collaborative vocabulary development in the context of data integration is the process of finding consensus between the experts of the different systems and domains. The complexity of this process increases with the number of people involved, the variety of the systems to be integrated and the dynamics of their domains. In this paper we argue that realizing a powerful version control system is at the heart of the problem. Driven by this idea and by the success of Git in software development, we investigate the applicability of Git to collaborative vocabulary development. Although vocabulary development and software development have many more similarities than differences, there are still important differences, and these need to be considered when developing a successful versioning and collaboration system for vocabularies. This paper therefore starts by presenting the challenges we faced while creating vocabularies collaboratively and discusses how this differs from software development. Based on these insights, we propose Git4Voc, which comprises guidelines on how Git can be adopted for vocabulary development. Finally, we demonstrate how Git hooks can be implemented to go beyond the plain functionality of Git by realizing vocabulary-specific features such as syntactic validation and semantic diffs.
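    A semantic diff compares two versions of a vocabulary as sets of RDF statements rather than as lines of text, so reordering triples or changing whitespace produces no spurious change. The following is a minimal sketch under stated assumptions, not the Git4Voc implementation: it assumes the vocabulary is serialized as N-Triples with one triple per line, and the function names `parse_ntriples` and `semantic_diff` as well as the example.org URIs are hypothetical. Because a malformed line raises an error, the parse step also doubles as a crude syntactic check.

```python
import re

# Simplified N-Triples pattern (assumption): subject and predicate contain no
# whitespace, and the object runs up to the final "." of the line. Blank-node
# rules, multi-line literals and escapes of the full N-Triples grammar are ignored.
TRIPLE = re.compile(r'^(\S+)\s+(\S+)\s+(.+?)\s*\.$')

def parse_ntriples(text):
    """Return the set of (subject, predicate, object) tuples in an N-Triples string.

    Raises ValueError on a line that does not look like a triple, so the same
    function can serve as a rough syntactic validation step in a hook."""
    triples = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: not a valid triple: {line!r}")
        triples.add(match.groups())
    return triples

def semantic_diff(old_text, new_text):
    """Compare two vocabulary versions as triple sets; return (added, removed)."""
    old, new = parse_ntriples(old_text), parse_ntriples(new_text)
    return sorted(new - old), sorted(old - new)

if __name__ == "__main__":
    SKOS = "http://www.w3.org/2004/02/skos/core#"
    old = f'<http://example.org/voc#drought> <{SKOS}prefLabel> "Drought"@en .\n'
    # Same prefLabel triple (reordered, extra spaces) plus one new altLabel.
    new = (f'<http://example.org/voc#drought> <{SKOS}altLabel> "Dry spell"@en .\n'
           f'<http://example.org/voc#drought>   <{SKOS}prefLabel> "Drought"@en .\n')
    added, removed = semantic_diff(old, new)
    for s, p, o in added:
        print("+", s, p, o)
    for s, p, o in removed:
        print("-", s, p, o)
```

    Inside a hook, the two inputs would be the committed and the working-copy versions of the vocabulary file (for example, the output of `git show HEAD:<path>` and the file on disk); that plumbing is omitted here.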

    Digital information support for concept design

    This paper outlines the issues in the effective utilisation of digital resources in conceptual design. Access to appropriate information acts as a stimulus and can lead to better substantiated concepts. This paper addresses the issues of presenting such information in digital form for effective use, exploring digital libraries and groupware as relevant literature areas, and argues that improved integration of these two technologies is necessary to better support the concept generation task. The development of the LauLima learning environment and digital library is then outlined. Despite its attempts to integrate the designers' working space and digital resources, continuing issues in library utilisation and in the migration of information to design concepts are highlighted through a class study. In light of this, new models of interaction to increase information use are explored.

    Report of the Stanford Linked Data Workshop

    The Stanford University Libraries and Academic Information Resources (SULAIR), together with the Council on Library and Information Resources (CLIR), conducted a week-long workshop on the prospects for a large-scale, multi-national, multi-institutional prototype of a Linked Data environment for discovery of and navigation among the rapidly, chaotically expanding array of academic information resources. As preparation for the workshop, CLIR sponsored a survey by Jerry Persons, Chief Information Architect emeritus of SULAIR, which was originally published for workshop participants as background to the workshop and is now publicly available. The original intention of the workshop was to devise a plan for such a prototype. However, such was the diversity of knowledge, experience, and views of the potential of Linked Data approaches that the workshop participants turned to two more fundamental goals: building common understanding and enthusiasm on the one hand, and identifying opportunities and challenges to be confronted in the preparation of the intended prototype and its operation on the other. In pursuit of those objectives, the workshop participants produced:
    1. a value statement addressing the question of why a Linked Data approach is worth prototyping;
    2. a manifesto for Linked Libraries (and Museums and Archives and …);
    3. an outline of the phases in a life cycle of Linked Data approaches;
    4. a prioritized list of known issues in generating, harvesting & using Linked Data;
    5. a workflow with notes for converting library bibliographic records and other academic metadata to URIs;
    6. examples of potential “killer apps” using Linked Data; and
    7. a list of next steps and potential projects.
    This report includes a summary of the workshop agenda, a chart showing the use of Linked Data in cultural heritage venues, and short biographies and statements from each of the participants.

    The development and interlinkage of a drought vocabulary in the EuroGEOSS interoperable catalogue infrastructure

    Metadata catalogues are used to facilitate the discovery of data and web services in, e.g., growing collections of Earth observation resources. Two conditions need to be met to successfully retrieve resources from catalogues: the metadata describing resources have to be complete and accurate, and the keywords used in searches must be semantically related to the keywords contained in the metadata descriptions. One method of increasing the rate of successfully retrieved metadata in catalogues is the use of controlled vocabularies. Such vocabularies can be used for annotating metadata with appropriate keywords and can also be presented to catalogue users for specifying search terms. In the process of preparing metadata for drought-related data and services within the EuroGEOSS project, the need for a drought-specific vocabulary arose. This paper presents this drought vocabulary, the methodology followed for its development and its integration in the EuroGEOSS drought infrastructure, and discusses its usefulness for the drought thematic area. Here, usefulness is measured by an increased use of search terms drawn from an appropriate vocabulary and by an increase in the successful retrieval of resources. In particular, metadata must be annotated with appropriate keywords from a controlled vocabulary, thesaurus or ontology suitable for that particular field.

    Automated tagging of environmental data using a novel SKOS formatted environmental thesaurus

    There is an increasing need to use the widest range of data to address issues of environmental management and change, which is reflected in an increasing emphasis from government funding agencies on better management of, and access to, environmental data. Bringing together different environmental datasets to confidently enable integrated analysis requires reference to common standards and definitions, which are frequently lacking in environmental data owing to the broad subject area and the lack of metadata. Automatically including, within datasets, controlled vocabulary concepts from publicly available standard vocabularies facilitates accurate annotation and makes metadata creation more efficient. To this end, we have developed a thesaurus capable of describing environmental chemistry datasets. We demonstrate a novel method for tagging datasets by inserting this thesaurus into a Laboratory Information Management System, enabling automated tagging of data and thus promoting semantic interoperability between tagged data resources. Being available on the web and formatted using the Simple Knowledge Organisation System (SKOS) semantic standard, this thesaurus can provide links both to and from other relevant thesauri, thus facilitating a linked data approach. Future developments will see the user community extend the thesaurus, in terms of both the concepts included and links to externally hosted vocabularies. By employing a Linked Open Data approach, we anticipate that Web-based tools will be able to use concepts from the thesaurus to discover and link data to other information sources, including use in national assessment of the extent and condition of environmental resources.
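    The core idea of tagging data with thesaurus concepts can be illustrated without a Laboratory Information Management System. The sketch below is a hypothetical, simplified stand-in, not the method described in the paper: it assumes each thesaurus concept has a URI, a SKOS preferred label and optional alternative labels, and it tags a free-text dataset description with every concept whose label appears in it as a whole word or phrase. The `Concept` class, the function names, the `example.org` URIs and the labels are all illustrative assumptions.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Concept:
    """A minimal stand-in for a SKOS concept (hypothetical structure)."""
    uri: str
    pref_label: str                                  # skos:prefLabel
    alt_labels: list = field(default_factory=list)   # skos:altLabel values

def build_label_index(concepts):
    """Map every lower-cased preferred or alternative label to its concept URI.

    Simplification: if two concepts share a label, the later one wins."""
    index = {}
    for concept in concepts:
        for label in [concept.pref_label, *concept.alt_labels]:
            index[label.lower()] = concept.uri
    return index

def tag_text(text, index):
    """Return the sorted URIs of concepts whose labels occur in text.

    Matching is case-insensitive and uses word boundaries, so the label "ph"
    does not match inside "phosphate". Labels that begin or end with
    punctuation would need a different boundary rule."""
    lowered = text.lower()
    tags = set()
    for label, uri in index.items():
        if re.search(r"\b" + re.escape(label) + r"\b", lowered):
            tags.add(uri)
    return sorted(tags)

if __name__ == "__main__":
    EX = "http://example.org/envthes/"
    thesaurus = [
        Concept(EX + "nitrate", "nitrate", ["NO3"]),
        Concept(EX + "dissolvedOrganicCarbon", "dissolved organic carbon", ["DOC"]),
        Concept(EX + "pH", "pH"),
    ]
    index = build_label_index(thesaurus)
    record = "Stream water samples analysed for NO3, phosphate and dissolved organic carbon."
    print(tag_text(record, index))
```

    Returning concept URIs rather than label strings is what makes the tags linkable: two datasets tagged with the same URI can be related even when their descriptions use different synonyms (here, "NO3" and "nitrate" resolve to the same concept).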

    27 pawns ready for action: A multi-indicator methodology and evaluation of thesaurus management tools from a LOD perspective

    Purpose – The purpose of this paper is to propose a methodology for assessing thesaurus and other controlled-vocabulary management tools that can represent content using the Simple Knowledge Organization System (SKOS) data model, and their use in a Linked Open Data (LOD) paradigm. It analyses a selected set of tools in order to demonstrate the validity of the method.
    Design/methodology/approach – A set of 27 criteria grouped into five evaluation indicators is proposed and applied to ten vocabulary management applications that are compliant with the SKOS data model. Previous studies of controlled vocabulary management software are gathered and analysed to compare the evaluation parameters used and the results obtained for each tool.
    Findings – The results indicate that the tool that obtains the highest score in every indicator is PoolParty. The second and third tools are, respectively, TemaTres and Intelligent Theme Manager, although they score lower in most of the evaluation items. Using a broad set of criteria to evaluate vocabulary management tools gives satisfactory results. The set of five indicators and 27 criteria proposed here represents a useful evaluation system for selecting current and future tools to manage vocabularies.
    Research limitations/implications – The paper only assesses the ten most important or well-known software tools for thesaurus and vocabulary management as of October 2016. However, the evaluation criteria could be applied to new software that may appear in the future to create or manage SKOS vocabularies in compliance with LOD standards.
    Originality/value – The originality of this paper lies in the proposed indicators and criteria for evaluating vocabulary management tools. These criteria and indicators may also be valuable for future software. The indicators are also applied to the most exhaustive and qualified list of tools of this kind.
    The paper will help designers, information architects, metadata librarians, and other staff involved in the design of digital information systems to choose the right tool to manage their vocabularies in a LOD/vocabulary scenario.

    HILT: High-Level Thesaurus Project. Phase IV and Embedding Project Extension: Final Report

    Ensuring that Higher Education (HE) and Further Education (FE) users of the JISC IE can find appropriate learning, research and information resources by subject search and browse, in an environment where most national and institutional service providers (usually for very good local reasons) use different subject schemes to describe their resources, is a major challenge facing the JISC domain (and, indeed, other domains beyond JISC). Encouraging the use of standard terminologies in some services (institutional repositories, for example) is a related challenge. Under the auspices of the HILT project, JISC has been investigating mechanisms to assist the community with this problem through a JISC Shared Infrastructure Service that would help optimise the value obtained from expenditure on content and services by facilitating subject-search-based resource sharing to benefit users in the learning and research communities. The project has been through a number of phases, with work from earlier phases reported both in published work elsewhere and in project reports (see the project website: http://hilt.cdlr.strath.ac.uk/). HILT Phase IV had two elements: the core project, whose focus was 'to research, investigate and develop pilot solutions for problems pertaining to cross-searching multi-subject scheme information environments, as well as providing a variety of other terminological searching aids', and a short extension to encompass the pilot embedding of routines to interact with HILT M2M services in the user interfaces of various information services serving the JISC community. Both elements contributed to the developments summarised in this report.

    39 hints to facilitate the use of semantics for data on agriculture and nutrition

    In this paper, we report on the outputs of the Agrisemantics Working Group of the Research Data Alliance (RDA) and their adoption. These outputs consist of a set of recommendations to facilitate the adoption of semantic technologies and methods for the purpose of data interoperability in the field of agriculture and nutrition. From 2016 to 2019, the group gathered researchers and practitioners at the intersection of information technology and agricultural science to study all aspects of the life cycle of semantic resources: conceptualization, editing, sharing, standardization, services, alignment and long-term support. First, the working group carried out a landscape study, a study of the uses of semantics in agrifood, and then collected use cases for the exploitation of semantic resources (a generic term encompassing vocabularies, terminologies, thesauri and ontologies). The resulting requirements were synthesized into 39 “hints” for users and developers of semantic resources and for providers of semantic resource services. We believe that adopting these recommendations will engage the agrifood sciences in a necessary transition towards leveraging data production, sharing and reuse, and towards adopting the FAIR data principles. The paper includes examples of the adoption of these requirements and a discussion of their contribution to the field of data science.