4,301 research outputs found

    Using semantic indexing to improve searching performance in web archives

    Get PDF
    The sheer volume of electronic documents being published on the Web can be overwhelming for users if the searching aspect is not properly addressed. This problem is particularly acute inside archives and repositories containing large collections of web resources or, more precisely, web pages and other web objects. Using the existing search capabilities in web archives, results can be compromised because of the size of data, content heterogeneity and changes in scientific terminologies and meanings. During the course of this research, we will explore whether semantic web technologies, particularly ontology-based annotation and retrieval, could improve precision in search results in multi-disciplinary web archives

    HILT IV : subject interoperability through building and embedding pilot terminology web services

    Get PDF
    A report of work carried out within the JISC-funded HILT Phase IV project, the paper looks at the project's context against the background of other recent and ongoing terminologies work, describes its outcome and conclusions, including technical outcomes and terminological characteristics, and considers possible future research and development directions. The Phase IV project has taken HILT to the point where the launch of an operational support service in the area of subject interoperability is a feasible option and where both investigation of specific needs in this area and practical collaborative work are sensible and feasible next steps. Moving forward requires detailed work, not only on terminology interoperability and associated service delivery issues, but also on service and end user needs and engagement, service sustainability issues, and the practicalities of interworking with other terminology services and projects in UK, Europe, and global contexts

    Use of Solr and Xapian in the Invenio document repository software

    Full text link
    Invenio is a free comprehensive web-based document repository and digital library software suite originally developed at CERN. It can serve a variety of use cases from an institutional repository or digital library to a web journal. In order to fully use full-text documents for efficient search and ranking, Solr was integrated into Invenio through a generic bridge. Solr indexes extracted full-texts and most relevant metadata. Consequently, Invenio takes advantage of Solr's efficient search and word similarity ranking capabilities. In this paper, we first give an overview of Invenio, its capabilities and features. We then present our open source Solr integration as well as scalability challenges that arose for an Invenio-based multi-million record repository: the CERN Document Server. We also compare our Solr adapter to an alternative Xapian adapter using the same generic bridge. Both integrations are distributed with the Invenio package and ready to be used by the institutions using or adopting Invenio

    HILT : High-Level Thesaurus Project. Phase IV and Embedding Project Extension : Final Report

    Get PDF
    Ensuring that Higher Education (HE) and Further Education (FE) users of the JISC IE can find appropriate learning, research and information resources by subject search and browse in an environment where most national and institutional service providers - usually for very good local reasons - use different subject schemes to describe their resources is a major challenge facing the JISC domain (and, indeed, other domains beyond JISC). Encouraging the use of standard terminologies in some services (institutional repositories, for example) is a related challenge. Under the auspices of the HILT project, JISC has been investigating mechanisms to assist the community with this problem through a JISC Shared Infrastructure Service that would help optimise the value obtained from expenditure on content and services by facilitating subject-search-based resource sharing to benefit users in the learning and research communities. The project has been through a number of phases, with work from earlier phases reported, both in published work elsewhere, and in project reports (see the project website: http://hilt.cdlr.strath.ac.uk/). HILT Phase IV had two elements - the core project, whose focus was 'to research, investigate and develop pilot solutions for problems pertaining to cross-searching multi-subject scheme information environments, as well as providing a variety of other terminological searching aids', and a short extension to encompass the pilot embedding of routines to interact with HILT M2M services in the user interfaces of various information services serving the JISC community. Both elements contributed to the developments summarised in this report

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    Get PDF
    After addressing the state-of-the-art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we have identified and analyzed gaps within European research effort during our second year. In this period we focused on three directions, notably technological issues, user-centred issues and use-cases and socio- economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of functional breakdown of generic multimedia search engine, and secondly, a representative use-cases descriptions with the related discussion on requirement for technological challenges. Both studies have been carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations in international conferences, and surveys addressed to EU projects coordinators as well as National initiatives coordinators. Based on the obtained feedback we identified two types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges, but have impact on innovation progress. New socio-economic trends are presented as well as emerging legal challenges

    Terminology server for improved resource discovery: analysis of model and functions

    Get PDF
    This paper considers the potential to improve distributed information retrieval via a terminologies server. The restriction upon effective resource discovery caused by the use of disparate terminologies across services and collections is outlined, before considering a DDC spine based approach involving inter-scheme mapping as a possible solution. The developing HILT model is discussed alongside other existing models and alternative approaches to solving the terminologies problem. Results from the current HILT pilot are presented to illustrate functionality and suggestions are made for further research and development

    Multiple Retrieval Models and Regression Models for Prior Art Search

    Get PDF
    This paper presents the system called PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach presents three main characteristics: 1. The usage of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the present track (English, French, German) producing ten different sets of ranked results. 2. The merging of the different results based on multiple regression models using an additional validation set created from the patent collection. 3. The exploitation of patent metadata and of the citation structures for creating restricted initial working sets of patents and for producing a final re-ranking regression model. As we exploit specific metadata of the patent documents and the citation relations only at the creation of initial working sets and during the final post ranking step, our architecture remains generic and easy to extend

    BlogForever D3.2: Interoperability Prospects

    Get PDF
    This report evaluates the interoperability prospects of the BlogForever platform. Therefore, existing interoperability models are reviewed, a Delphi study to identify crucial aspects for the interoperability of web archives and digital libraries is conducted, technical interoperability standards and protocols are reviewed regarding their relevance for BlogForever, a simple approach to consider interoperability in specific usage scenarios is proposed, and a tangible approach to develop a succession plan that would allow a reliable transfer of content from the current digital archive to other digital repositories is presented

    CRIS-IR 2006

    Get PDF
    The recognition of entities and their relationships in document collections is an important step towards the discovery of latent knowledge as well as to support knowledge management applications. The challenge lies on how to extract and correlate entities, aiming to answer key knowledge management questions, such as; who works with whom, on which projects, with which customers and on what research areas. The present work proposes a knowledge mining approach supported by information retrieval and text mining tasks in which its core is based on the correlation of textual elements through the LRD (Latent Relation Discovery) method. Our experiments show that LRD outperform better than other correlation methods. Also, we present an application in order to demonstrate the approach over knowledge management scenarios.Fundação para a CiĂȘncia e a Tecnologia (FCT) Denmark's Electronic Research Librar
    • 

    corecore