13,433 research outputs found

    Classifying document types to enhance search and recommendations in digital libraries

    In this paper, we address the problem of classifying documents available from the global network of (open access) repositories according to their type. We show that the metadata provided by repositories enabling us to distinguish research papers, theses and slides are missing in over 60% of cases. While these metadata describing document types are useful in a variety of scenarios ranging from research analytics to improving search and recommender (SR) systems, this problem has not yet been sufficiently addressed in the context of the repositories infrastructure. We have developed a new approach for classifying document types using supervised machine learning based exclusively on text-specific features. We achieve a 0.96 F1-score using the random forest and AdaBoost classifiers, which are the best-performing models on our data. By analysing the SR system logs of the CORE [1] digital library aggregator, we show that users are an order of magnitude more likely to click on research papers and theses than on slides. This suggests that using document types as a feature for ranking/filtering SR results in digital libraries has the potential to improve user experience. Comment: 12 pages, 21st International Conference on Theory and Practice of Digital Libraries (TPDL), 2017, Thessaloniki, Greece

    An Experimental Digital Library Platform - A Demonstrator Prototype for the DigLib Project at SICS

    Within the framework of the Digital Library project at SICS, this thesis describes the implementation of a demonstrator prototype of a digital library (DigLib): an experimental platform integrating several functions in one common interface. It includes descriptions of the structure and formats of the digital library collection, the tailoring of the search engine Dienst, the construction of a keyword extraction tool, and the design and development of the interface. The platform was realised through sicsDAIS, an agent interaction and presentation system, and is to be used for testing and evaluating various tools for information seeking. The platform supports various user interaction strategies by providing: search in bibliographic records (Dienst); an index of keywords (the Keyword Extraction Function (KEF)); and browsing through the hierarchical structure of the collection. KEF was developed for this thesis work, and extracts and presents keywords from Swedish documents. Although based on a comparatively simple algorithm, KEF contributes by meeting a long-felt need in the area of Information Retrieval. Evaluations of the tasks and the interface still remain to be done, but the digital library is very much up and running. Because the platform is implemented through sicsDAIS, DigLib can deploy additional tools and search engines without interfering with already running modules. If desired, agents providing services other than those SICS can supply can be plugged in.

    Natural language processing

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.

    Users' perception of relevance of spoken documents

    We present the results of a study of users' perception of relevance of documents. The aim is to study experimentally how users' perception varies depending on the form in which retrieved documents are presented. Documents retrieved in response to a query are presented to users in a variety of ways, from full text to a machine-spoken, query-biased, automatically generated summary, and the difference in users' perception of relevance is studied. The experimental results suggest that the effectiveness of advanced multimedia information retrieval applications may be affected by the low level of users' perception of relevance of retrieved documents.

    DARIAH and the Benelux


    Integrating information seeking and information structuring: spatial hypertext as an interface to the digital library.

    Information seeking is the task of finding documents that satisfy the information needs of a person or organisation. Digital libraries are one means of providing documents to meet the information needs of their users, i.e. a resource to support information seeking. Therefore, research into the activity of information seeking is key to the development and understanding of digital libraries. Information structuring is the activity of organising documents found in the process of information seeking. Information structuring can be seen either as part of information seeking or as a separate, complementary activity. It is a task performed by the seekers themselves and targeted by them to support their understanding and the management of later seeking activity. Though information structuring is an important task, it receives sparse support in current digital library systems. Spatial hypertexts are computer software systems that have been specifically developed to support information structuring. However, they are seldom connected to systems that support information seeking. Thus, to date, the two inter-related activities of information seeking and information structuring have been supported by disjoint computer systems. However, a variety of research strongly indicates that in physical environments, information seeking and information structuring are closely inter-related activities. Given this connection, this thesis explores whether a similar relationship can be found in electronic information seeking environments. However, given the absence of a software system that supports both activities well, there is an immediate practical problem. In this thesis, I introduce an integrated information seeking and structuring system, called Garnet, that provides a spatial hypertext interface that also supports information seeking in a digital library.
    The opportunity of supporting information seeking by the artefacts of information structuring is explored in the Garnet system, drawing on the benefits previously found in supporting one information seeking activity with the artefacts of another. Garnet and its use are studied in a qualitative user study that results in the comparison of user behaviour in a combined electronic environment with previous studies in physical environments. The response of participants to using Garnet is reported, particularly regarding their perceptions of the combined system and the quality of the interaction. Finally, the potential value of the artefacts of information structuring to support information seeking is also evaluated.

    Digital Image Access & Retrieval

    The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval.