2,524 research outputs found

    Query Expansion for Survey Question Retrieval in the Social Sciences

    In recent years, the importance of research data and the need to archive and share it in the scientific community have increased enormously. This introduces a whole new set of challenges for digital libraries. In the social sciences, typical research data sets consist of surveys and questionnaires. In this paper we focus on the use case of social science survey question reuse and on mechanisms to support users in formulating queries for data sets. We describe and evaluate thesaurus- and co-occurrence-based approaches for query expansion to improve retrieval quality in digital libraries and research data archives. The challenge here is to translate the information need and the underlying sociological phenomena into proper queries. As we show, retrieval quality can be improved by adding related terms to the queries. In a direct comparison, automatically expanded queries using extracted co-occurring terms can provide better results than queries manually reformulated by a domain expert and better results than a keyword-based BM25 baseline. Comment: to appear in Proceedings of the 19th International Conference on Theory and Practice of Digital Libraries 2015 (TPDL 2015).
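
    The co-occurrence-based expansion mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy documents, and the use of raw document-level co-occurrence counts as the ranking score are all assumptions made for illustration, since the abstract does not state how terms are extracted or weighted.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(docs):
    """Count how often two distinct terms appear in the same document."""
    pair_counts = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        for a, b in combinations(terms, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def expand_query(query, pair_counts, top_k=2):
    """Append the top_k terms that co-occur most often with the query terms.

    Raw counts are an illustrative assumption; ties are broken alphabetically.
    """
    query_terms = query.lower().split()
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a in query_terms and b not in query_terms:
            scores[b] += n
        elif b in query_terms and a not in query_terms:
            scores[a] += n
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return query_terms + [term for term, _ in ranked[:top_k]]


# Hypothetical toy collection standing in for survey question texts.
docs = ["job income", "job income tax", "job stress", "weather rain"]
pair_counts = build_cooccurrence(docs)
expanded = expand_query("job", pair_counts, top_k=1)
```

    The expanded term list would then be passed to the retrieval model in place of the original query.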

    Towards a Universal Wordnet by Learning from Combined Evidence

    Lexical databases are invaluable sources of knowledge about words and their meanings, with numerous applications in areas like NLP, IR, and AI. We propose a methodology for the automatic construction of a large-scale multilingual lexical database where words of many languages are hierarchically organized in terms of their meanings and their semantic relations to other words. This resource is bootstrapped from WordNet, a well-known English-language resource. Our approach extends WordNet with around 1.5 million meaning links for 800,000 words in over 200 languages, drawing on evidence extracted from a variety of resources including existing (monolingual) wordnets, (mostly bilingual) translation dictionaries, and parallel corpora. Graph-based scoring functions and statistical learning techniques are used to iteratively integrate this information and build an output graph. Experiments show that this wordnet has a high level of precision and coverage, and that it can be useful in applied tasks such as cross-lingual text classification.

    A Spinning Wheel for YARN: User Interface for a Crowdsourced Thesaurus

    The YARN (Yet Another RussNet) project, started in 2013, aims at creating a large open thesaurus for Russian using crowdsourcing. This paper describes the synset assembly interface developed within the project: the motivation behind it, its design, usage scenarios, implementation details, and first experimental results.

    Indexing Languages for Information Management, a Promising Future or an Obsolete Resource?

    Indexing languages have traditionally been an essential tool for organizing and retrieving documentary information. The inclusion of indexing languages in the digital environment opens new frontiers as well as new opportunities. This study traces the historical evolution of indexing languages and their application in the document management field. We analyze diverse trends in their digital use from two perspectives: their integration with other digital and linguistic resources, and their adaptation to the Web environment. Finally, we analyze how these languages are used in Web 2.0 and how ontologies are incorporated in the Semantic Web. This work was carried out within the framework of a research project financed by the Spanish government (Ministerio de Educación y Ciencia, Secretaría de Estado de Universidades e Investigación, TIN 2007-67153).

    Geographical information retrieval with ontologies of place

    Geographical context is required for many information retrieval tasks in which the target of the search may be documents, images or records that are referenced to geographical space only by means of place names. Often there may be an imprecise match between the query name and the names associated with candidate sources of information. There is therefore a need for geographical information retrieval facilities that can rank the relevance of candidate information with respect to geographical closeness of place as well as semantic closeness with respect to the information of interest. Here we present an ontology of place that combines limited coordinate data with semantic and qualitative spatial relationships between places. This parsimonious model of geographical place supports maintenance of knowledge of place names that relate to extensive regions of the Earth at multiple levels of granularity. The ontology has been implemented with a semantic modelling system linking non-spatial conceptual hierarchies with the place ontology. A hierarchical spatial distance measure is combined with Euclidean distance between place centroids to create a hybrid spatial distance measure. This is integrated with thematic distance, based on classification semantics, to create an integrated semantic closeness measure that can be used for relevance ranking of retrieved objects.
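
    The hybrid spatial distance described in this abstract can be sketched as a weighted combination of a hierarchy-based distance and a centroid distance. This is only an illustration under stated assumptions: the abstract does not give the formula, so the weighted sum, the weights, the normalisation constants, the toy containment hierarchy, and the planar centroid coordinates below are all hypothetical. Thematic distance is left out.

```python
import math

# Hypothetical containment hierarchy (child -> parent) and planar centroids.
PARENT = {
    "Cardiff": "Wales", "Swansea": "Wales", "Wales": "UK",
    "Bristol": "England", "England": "UK", "UK": None,
}
CENTROID = {"Cardiff": (0.0, 0.0), "Swansea": (-3.0, 4.0), "Bristol": (3.0, 4.0)}


def ancestors(place):
    """Return the place followed by its chain of containing regions."""
    chain = []
    while place is not None:
        chain.append(place)
        place = PARENT[place]
    return chain


def hierarchical_distance(a, b):
    """Number of steps from a and b up to their lowest shared region."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    for steps_a, node in enumerate(chain_a):
        if node in chain_b:
            return steps_a + chain_b.index(node)
    raise ValueError("places share no common region")


def euclidean_distance(a, b):
    """Straight-line distance between the two place centroids."""
    return math.dist(CENTROID[a], CENTROID[b])


def hybrid_distance(a, b, w_hier=0.5, w_euclid=0.5, max_hier=4, max_euclid=10.0):
    """Weighted sum of both distances, each scaled to [0, 1] (assumed form)."""
    h = hierarchical_distance(a, b) / max_hier
    e = min(euclidean_distance(a, b) / max_euclid, 1.0)
    return w_hier * h + w_euclid * e
```

    In this toy data Swansea and Bristol are equally far from Cardiff by centroid, so the hierarchy term alone ranks Swansea (same parent region) as closer than Bristol.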

    Automatic thesaurus construction

    Sydney, NS

    A review of the state of the art in Machine Learning on the Semantic Web: Technical Report CSTR-05-003
