
    Escaping the Trap of too Precise Topic Queries

    At the very center of digital mathematics libraries lie controlled vocabularies that qualify the {\it topic} of the documents. These topics are used both when submitting a document to a digital mathematics library and when searching it: queries are refined by topics because they allow a precise classification of the mathematics area a document addresses. However, there is a major risk that users specify their queries with topics that are too precise: they may employ a topic that is only "close-by" and thus fail to match the right resource. We call this the {\it topic trap}. Since 2009 this issue has appeared frequently on the i2geo.net platform, and other mathematics portals experience the same phenomenon. One approach to this issue is to introduce tolerance into the way queries are interpreted, in particular by including fuzzy matches; but this introduces noise, which may prevent the user from understanding how the search engine works. In this paper, we propose a way to escape the topic trap by navigating between related topics and displaying the count of search results for each topic, so that a search for a close-by topic is one click away from a previous search. The approach was realized in the i2geo search engine and is described in detail; the relation of being {\it related} is computed by textual analysis of the concepts' definitions fetched from the Wikipedia encyclopedia. Comment: 12 pages, Conference on Intelligent Computer Mathematics 2013, Bath, UK
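    The {\it related} relation above is computed by textual analysis of concept definitions. A minimal sketch of that idea, assuming definition texts have already been fetched from Wikipedia; the sample topics, definitions, similarity measure, and threshold below are hypothetical illustrations, not the paper's actual data or algorithm:

```python
import math
import re
from collections import Counter

def term_vector(text):
    # Bag-of-words over lowercased tokens; very short, stop-like words are dropped.
    return Counter(w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3)

def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical topic definitions, standing in for text fetched from Wikipedia.
definitions = {
    "right triangle": "a triangle in which one angle is a right angle",
    "isosceles triangle": "a triangle that has two sides of equal length",
    "circle": "a shape consisting of all points at a given distance from a centre",
}

def related_topics(topic, defs, threshold=0.1):
    """Rank the other topics by the textual similarity of their definitions."""
    base = term_vector(defs[topic])
    scored = [(cosine(base, term_vector(d)), t) for t, d in defs.items() if t != topic]
    return [t for score, t in sorted(scored, reverse=True) if score >= threshold]
```

    A topic page can then render each entry of `related_topics(...)` next to its search-result count, so that a close-by topic is one click away from the previous search.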

    Optimising metadata to make high-value content more accessible to Google users

    Purpose: This paper shows how information in digital collections that have been catalogued using high-quality metadata can be retrieved more easily by users of search engines such as Google. Methodology/approach: The research and proposals described arose from an investigation into the observed phenomenon that pages from the Glasgow Digital Library (gdl.cdlr.strath.ac.uk) were regularly appearing near the top of Google search results shortly after publication, without any deliberate effort to achieve this. The reasons for this phenomenon are now well understood and are described in the second part of the paper. The first part provides context with a review of the impact of Google and a summary of recent initiatives by commercial publishers to make their content more visible to search engines. Findings/practical implications: The literature research provides firm evidence of a trend amongst publishers to ensure that their online content is indexed by Google, in recognition of its popularity with Internet users. The practical research demonstrates how search engine accessibility can be compatible with the use of established collection management principles and high-quality metadata. Originality/value: The concept of data shoogling is introduced, involving some simple techniques for metadata optimisation. Details of its practical application are given, to illustrate how those working in academic, cultural, and public-sector organisations could make their digital collections more easily accessible via search engines, without compromising any existing standards and practices.
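    One simple form the metadata optimisation described above can take is exposing catalogue metadata to crawlers as HTML `<meta>` tags, following the standard Dublin Core-in-HTML convention. A minimal sketch; the record below is illustrative, not an actual Glasgow Digital Library record:

```python
from html import escape

def dc_meta_tags(record):
    """Render each Dublin Core field of a record as a <meta> tag for the page <head>."""
    return "\n".join(
        f'<meta name="DC.{escape(field)}" content="{escape(value)}">'
        for field, value in record.items()
    )

record = {  # hypothetical catalogue record
    "title": "Red Clydeside: a history of the labour movement",
    "creator": "University of Strathclyde",
    "subject": "Labour history",
}
print(dc_meta_tags(record))
```

    Because the tags are generated from the same catalogue record used within the library, search-engine visibility is gained without duplicating or degrading the existing metadata.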

    Challenges in distributed information search in a semantic digital library

    Nowadays an enormous quantity of heterogeneous and distributed information is stored in current digital libraries. Access to these collections poses a serious challenge, however, because present search techniques based on manually annotated metadata and linear replay of material selected by the user do not scale effectively or efficiently to large collections. Artificial intelligence and the Semantic Web provide a common framework that allows knowledge to be shared and reused. In this paper we propose a comprehensive approach for discovering information objects in large digital collections, based on analysis of the semantic metadata recorded in those objects and on the application of expert system technologies. We suggest a conceptual architecture for a semantic and intelligent search engine. OntoFAMA is a collaborative effort that proposes a new form of interaction between people and the digital library, where the latter adapts to individuals and their surroundings. We have used the Case-Based Reasoning methodology to develop a prototype supporting efficient knowledge retrieval from the digital library of the University of Seville.
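    The retrieve step of the Case-Based Reasoning cycle mentioned above can be sketched as follows; the case base, the Jaccard similarity measure, and the document identifiers are hypothetical illustrations, not the OntoFAMA implementation:

```python
def jaccard(a, b):
    # Set overlap between two term sets, in [0, 1].
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical case base: (query terms of a past session, documents that satisfied it).
case_base = [
    ({"baroque", "architecture", "seville"}, ["doc-12", "doc-31"]),
    ({"flamenco", "history"}, ["doc-7"]),
]

def retrieve(query_terms):
    """RETRIEVE phase of the CBR cycle: reuse the solution of the most similar past case."""
    best = max(case_base, key=lambda case: jaccard(query_terms, case[0]))
    return best[1]
```

    In a full CBR cycle the reused result would then be revised against the new query and, if useful, retained as a new case.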

    Intelligent information processing in a digital library using semantic web

    With the explosive growth of information, it is becoming increasingly difficult to retrieve relevant documents with current search engines alone. The information is treated as an ordinary database that manages contents and positions. For the individual user, there is a great deal of useless information in addition to the substantial amount of useful information. This poses new challenges to the document community and motivates researchers to look for intelligent information retrieval approaches and ontologies that search and/or filter information automatically based on some higher level of understanding. We study how to improve the efficiency of search methods and classify search patterns into several models based on agent profiles built on an ontology. We have proposed a method to efficiently search for target information on a digital library network with multiple independent information sources. This paper outlines the development of an expert prototype system, based on an ontology, for information retrieval in the digital library of the University of Seville. The results of this study demonstrate that by improving representation, incorporating more metadata from within the information and the ontology into the retrieval process, the effectiveness of information retrieval is enhanced. We used Protégé and jCOLIBRI for developing the ontology and creating the expert system, respectively.
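    One common way an ontology supports retrieval of this kind is query expansion over narrower concepts. A minimal sketch; the ontology fragment below is hypothetical, not the Seville ontology:

```python
# Hypothetical ontology fragment: each concept maps to its narrower terms.
ontology = {
    "music": ["opera", "flamenco"],
    "manuscript": ["codex", "incunabulum"],
}

def expand(query_terms):
    """Expand each query term with its sub-concepts, so documents indexed
    under a narrower term still match the broader query."""
    expanded = set(query_terms)
    for term in query_terms:
        expanded.update(ontology.get(term, []))
    return expanded
```

    The expanded term set is then passed to the underlying search engine in place of the raw query.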

    An Integrated Framework for Discovering Digital Library Collections

    Information seekers are generally on their own to discover and use a research library's growing array of digital collections, and coordination of these collections' development and maintenance is often not optimal. The frequent lack of a conscious design for how collections fit together is of equal concern because it means that research libraries are not making the most of the substantial investments they are making in digital initiatives. This paper proposes a framework for a research library's digital collections that offers integrated discovery and a set of best practices to underpin collection building, federated access, and sustainability. The framework's purpose is to give information seekers a powerful and easy way to search across existing and future collections and to retrieve integrated sets of results. The paper and its recommendations are based upon research undertaken by the author and a team of librarians and technologists at Cornell University Library. The team conducted structured interviews of forty-five library staff members involved in digital collection building at Cornell, studied an inventory of the library's more than fifty digital collections, and evaluated seven existing OAI and federated search production or prototype systems. The paper discusses the team's research and the rationale for its recommendations: present a cohesive view of the library's digital collections for both browsing and searching at the object level; take a programmatic (rather than project-based) approach to digital collection building; require that all new digital collections conform to library-developed and agreed-upon OAI best practices for data providers; and implement organizational structures to sustain the library's digital collections over the long term.
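    OAI-based federated access of the kind recommended above starts with harvesting requests against each data provider. A minimal helper for building an OAI-PMH v2.0 ListRecords request URL; the base URL and set name in the usage example are placeholders, not Cornell endpoints:

```python
from urllib.parse import urlencode

def oai_list_records(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH v2.0 ListRecords request URL.

    `verb` and `metadataPrefix` are required by the protocol; `set` optionally
    restricts the harvest to one collection (set) on the repository.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"

url = oai_list_records("https://example.org/oai", set_spec="maps")
```

    A harvester fetches this URL, stores the returned Dublin Core records centrally, and the aggregated index is what gives information seekers a single search across all collections.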

    Classifying document types to enhance search and recommendations in digital libraries

    In this paper, we address the problem of classifying documents available from the global network of (open access) repositories according to their type. We show that the metadata enabling us to distinguish research papers, theses, and slides are missing in over 60% of cases. While these metadata describing document types are useful in a variety of scenarios, ranging from research analytics to improving search and recommender (SR) systems, this problem has not yet been sufficiently addressed in the context of the repositories infrastructure. We have developed a new approach for classifying document types using supervised machine learning based exclusively on text-specific features. We achieve a 0.96 F1-score using the random forest and AdaBoost classifiers, which are the best-performing models on our data. By analysing the SR system logs of the CORE [1] digital library aggregator, we show that users are an order of magnitude more likely to click on research papers and theses than on slides. This suggests that using document types as a feature for ranking/filtering SR results in digital libraries has the potential to improve the user experience. Comment: 12 pages, 21st International Conference on Theory and Practice of Digital Libraries (TPDL), 2017, Thessaloniki, Greece
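    A rough sketch of text-specific feature extraction for document-type classification; the features, thresholds, and rule-based decision below are illustrative stand-ins for the paper's trained random forest and AdaBoost models:

```python
def features(text):
    """Text-specific features: no metadata, only the document's own text."""
    lines = text.splitlines()
    n_words = len(text.split())
    return {
        "n_words": n_words,
        "avg_line_words": n_words / max(len(lines), 1),
        "has_references": "references" in text.lower(),
    }

def classify(text):
    """Toy decision rule standing in for a trained classifier."""
    f = features(text)
    if f["avg_line_words"] < 6 and f["n_words"] < 200:
        return "slides"          # short, fragmented bullet lines
    if f["n_words"] > 20000:
        return "thesis"          # book-length running text
    return "research paper"
```

    In the supervised setting, the same feature vectors would be fed to a classifier trained on labelled examples instead of hand-written thresholds.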

    Usability aspects of the inside-in approach for ancillary search tasks on the web

    Given the huge amount of data available over the Web nowadays, search engines have become essential tools that help users find the information they are looking for. Nonetheless, search engines often return large sets of results which must be filtered by users to find the suitable information items. In many cases, however, filtering is not enough: the results returned by the engine require users to perform a secondary search to complement the current information, giving rise to ancillary search tasks. Such ancillary search tasks create a nested context for user tasks that increases the articulatory distance between users and their ultimate goal. In this paper, we analyze the interplay between such ancillary searches and primary search tasks on the Web. Moreover, we describe the inside-in approach, which aims at reducing the articulatory distance between interleaved tasks by allowing users to perform ancillary search tasks without losing context. The inside-in approach is illustrated by means of a case study based on ancillary searches for coauthors in a digital library, using an information visualization technique.

    Searching for old news: User interests and behavior within a national collection

    Modeling user interests helps to improve system support or refine recommendations in Interactive Information Retrieval. The aim of this study is to identify user interests in different parts of an online collection and investigate the related search behavior. To do this, we propose to use the metadata of selected facets and clicked documents as features for clustering sessions identified in user logs. We evaluate the session clusters by measuring their stability over a six-month period. We apply our approach to data from the National Library of the Netherlands, a typical digital library with a richly annotated historical newspaper collection and a faceted search interface. Our results show that users interested in specific parts of the collection use different search techniques. We demonstrate that a metadata-based clustering helps to reveal and understand user interests in terms of the collection, and how search behavior is related to specific parts within the collection.
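    The session-clustering idea can be sketched as follows, assuming each session is summarised by the facet values of its clicked documents; the greedy assignment, similarity threshold, and sample sessions below are simplifications, not the study's actual clustering method or data:

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two sparse facet-count vectors.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster_sessions(sessions, threshold=0.5):
    """Greedily assign each session to the first cluster it resembles,
    otherwise start a new cluster seeded by that session."""
    clusters = []  # each cluster is a list of session vectors
    for vec in sessions:
        for members in clusters:
            if cosine(vec, members[0]) >= threshold:
                members.append(vec)
                break
        else:
            clusters.append([vec])
    return clusters

sessions = [  # hypothetical facet counts per session
    Counter({"type:advert": 3, "decade:1930": 2}),
    Counter({"type:advert": 1, "decade:1930": 1}),
    Counter({"type:obituary": 4}),
]
```

    Cluster stability over time can then be checked by re-clustering a later log sample and comparing cluster memberships.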

    An examination of automatic video retrieval technology on access to the contents of an historical video archive

    Purpose – This paper aims to provide an initial understanding of the constraints that historical video collections pose to video retrieval technology and the potential that online access offers to both archive and users. Design/methodology/approach – A small and unique collection of videos on customs and folklore was used as a case study. Multiple methods were employed to investigate the effectiveness of the technology and the modality of user access. Automatic keyframe extraction was tested on the visual content, while the audio stream was used for automatic classification of speech and music clips. User access (search vs browse) was assessed in a controlled user evaluation. A focus group and a survey provided insight into the actual use of the analogue archive. The results of these multiple studies were then compared and integrated (triangulation). Findings – The amateur material challenged automatic techniques for video and audio indexing, suggesting that the technology must be tested against the material before deciding on a digitisation strategy. Two user interaction modalities, browsing vs searching, were tested in a user evaluation. Results show users preferred searching, but browsing becomes essential when the search engine fails to match the query and indexed words. Browsing was also valued for serendipitous discovery; however, the organisation of the archive was judged cryptic and therefore of limited use. This indicates that the categorisation of an online archive should be designed for users who might not understand the current classification. The focus group and the survey showed clearly the advantage of online access even when the quality of the video surrogate is poor. The evidence gathered suggests that the creation of a digital version of a video archive requires rethinking the collection in terms of the new medium: a new archive should be specially designed to exploit the potential that the digital medium offers. Similarly, users' needs have to be considered before designing the digital library interface, as they are likely to differ from those imagined. Originality/value – This paper is a first attempt to understand the advantages and limitations of video retrieval technology for small video archives like those often found in special collections.

    Identifying semantically similar Arabic words using a large vocabulary speech recognition system

    Users search digital libraries for book references using one or more attributes such as keywords, subject, and author name. Some book titles contain the keyword that the user specified, and these titles directly qualify as candidate results. On the other hand, there are titles that are relevant but do not contain the exact search keyword; a user expects to retrieve all titles that are relevant to a specified keyword. Similarly, when searching for an author name, the system should be able to retrieve the different forms of the name. The library science community developed a mechanism called authority control that allows the user to perform a comprehensive search and retrieve all the records relevant to the query keyword. In this paper we propose an approach that allows the user to query an Arabic audio library by voice. We use a combination of class-based language models and robust interpretation to recognize and identify the spoken keywords. The mechanism uses a Large Vocabulary Continuous Speech Recognition (LVCSR) system to implement the functionality of the authority control system. A series of experiments was performed to assess the accuracy and robustness of the proposed approach: restricted-grammar recognition with semantic interpretation, class-based statistical language models (CB-SLM) with robust interpretation, and generalized CB-SLM. The results show that the combination of CB-SLM and robust interpretation provides better accuracy and robustness than traditional grammar-based parsing.
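    The authority-control behaviour described above (retrieving every form of a name) can be sketched in text form; the authority file entries and the difflib-based matching are hypothetical illustrations, not the paper's speech-based mechanism:

```python
import difflib

# Hypothetical authority file: preferred heading -> known variant forms.
authority_file = {
    "Ibn Khaldun": ["Ibn Khaldoun", "Abenjaldun", "Ibn Haldun"],
}

def resolve(name):
    """Map any recognised variant (including near misses) to its preferred heading."""
    variants = {v: h for h, vs in authority_file.items() for v in vs + [h]}
    match = difflib.get_close_matches(name, variants, n=1, cutoff=0.8)
    return variants[match[0]] if match else None
```

    In the paper's setting the input form would come from the LVCSR output rather than typed text, but the final lookup against the authority file plays the same role.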