
    Supporting polyrepresentation in a quantum-inspired geometrical retrieval framework

    The relevance of a document has many facets beyond the usual topical one, and these have to be considered to satisfy a user's information need. Multiple representations of documents, such as user-given reviews or the actual document content, can give evidence towards certain facets of relevance. In this respect, polyrepresentation of documents, where such evidence is combined, is a crucial concept for estimating the relevance of a document. In this paper, we discuss how a geometrical retrieval framework inspired by quantum mechanics can be extended to support polyrepresentation. We show by example how different representations of a document can be modelled in a Hilbert space, similar to physical systems known from quantum mechanics. We further illustrate how these representations are combined by means of the tensor product to support polyrepresentation, and discuss the case where representations of documents are not independent from a user's point of view. Besides giving a principled framework for polyrepresentation, this approach has the potential to capture and formalise the complex interdependent relationships that the different representations can have with each other.
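To make the tensor-product step concrete, here is a minimal sketch under stated assumptions: each representation (a hypothetical "content" space and a hypothetical "review" space) is a unit vector in a small real 2-d space, the combined document is the tensor (Kronecker) product of the two, and relevance is read as the squared overlap with a query vector, as is common in quantum-inspired IR. The paper's exact model, spaces, and vectors are not given in the abstract; every value below is illustrative.

```python
import math

def normalize(v):
    """Scale a real vector to unit length (a pure state in this sketch)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def tensor(u, v):
    """Tensor (Kronecker) product of two vectors: components u_i * v_j."""
    return [a * b for a in u for b in v]

def inner(u, v):
    """Real inner product <u|v>."""
    return sum(a * b for a, b in zip(u, v))

def prob(query, doc):
    """Squared overlap |<q|d>|^2, read here as a relevance probability."""
    return inner(query, doc) ** 2

# Hypothetical 2-d representation spaces: document content and user reviews.
content_doc = normalize([3.0, 1.0])
review_doc = normalize([1.0, 1.0])
content_q = normalize([1.0, 0.0])
review_q = normalize([1.0, 0.0])

# Combined (polyrepresented) states live in the 4-d product space.
combined_doc = tensor(content_doc, review_doc)
combined_q = tensor(content_q, review_q)

p_joint = prob(combined_q, combined_doc)
p_product = prob(content_q, content_doc) * prob(review_q, review_doc)
print(round(p_joint, 4), round(p_product, 4))
```

Because both states are product states, `p_joint` equals `p_product` (0.45 here): the representations contribute independently. A combined document vector that cannot be factored into such a product, for example `normalize([1.0, 0.0, 0.0, 1.0])`, is one way the dependence between representations mentioned above could be expressed.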

    Classification of Visualization Types and Perspectives in Patents

    Due to the swift growth of patent applications each year, information and multimedia retrieval approaches that facilitate patent exploration and retrieval are of utmost importance. Different types of visualizations (e.g., graphs, technical drawings) and perspectives (e.g., side view, perspective) are used to visualize details of innovations in patents. The classification of these images enables a more efficient search and allows for further analysis. So far, datasets for image type classification lack some visualization types that are important for patents. Furthermore, related work does not make use of recent deep learning approaches such as transformers. In this paper, we adopt state-of-the-art deep learning methods for the classification of visualization types and perspectives in patent images. We extend the CLEF-IP dataset for image type classification in patents to ten classes and provide manual ground truth annotations. In addition, we derive a set of hierarchical classes from a dataset that provides weakly labeled data for image perspectives. Experimental results demonstrate the feasibility of the proposed approaches. Source code, models, and the dataset will be made publicly available. Comment: Accepted at the International Conference on Theory and Practice of Digital Libraries (TPDL) 2023 (they hold the copyright to publish the camera-ready version of this work).

    The Janus Faced Scholar: a Festschrift in honour of Peter Ingwersen


    What is the influence of genre during the perception of structured text for retrieval and search?

    This thesis presents an investigation into the high value of structured text (or form) in the context of genre within Information Retrieval. In particular, it asks how these structured texts are perceived and why they are not more heavily used within the Information Retrieval and Search communities. The main motivation is to identify the features through which people can exploit genre within Information Search and Retrieval, in particular in categorisation and search tasks. To do this, it was vital to record and analyse how and why this was done during typical tasks. The literature review highlighted two previous studies (Toms & Campbell 1999a; Watt 2009) that reported pilot studies of genre categorisation and information searching. Both studies, along with other findings from the literature review, inspired the work contained within this thesis. Genre is notoriously hard to define, but a very useful framework of Purpose and Form, developed by Yates & Orlikowski (1992), was used to design two user studies for the research reported in the thesis. The two studies consisted of, first, a categorisation task (e-mails), and second, a set of six simulated situations in Wikipedia; both collected quantitative data from eye tracking experiments as well as qualitative user data. The results of both studies showed the extent to which participants used the form features of the presented stimuli: in particular, how these features were used, which ocular behaviours (skimming or scanning) and specific features were employed, and which were the most important.
The main contributions of this thesis are twofold. First, task-based user evaluations employing simulated search scenarios revealed how and why users make decisions while interacting with the textual features of structure and layout within a discourse community. Second, an extensive evaluation of the quantitative data revealed the features that participants used in the user studies, the effects of genre interpretation on the search and categorisation process, and the perceptual processes used in the various communities. This will benefit the re-development of information systems. As far as is known, this is the first detailed and systematic investigation into the types of features, value of form, perception of features, and layout of genre using eye tracking in online communities such as Wikipedia.

    On Term Selection Techniques for Patent Prior Art Search

    A patent is a set of exclusive rights granted to an inventor to protect an invention for a limited period of time. Patent prior art search involves finding previously granted patents, scientific articles, product descriptions, or any other published work that may be relevant to a new patent application. Many well-known information retrieval (IR) techniques (e.g., typical query expansion methods) that have proven effective for ad hoc search are unsuccessful for patent prior art search. In this thesis, we mainly investigate why generic IR techniques are not effective for prior art search on the CLEF-IP test collection. First, we analyse the errors caused by data curation and by experimental settings, such as applying the International Patent Classification codes assigned to the patent topics to filter the search results. Then, we investigate the influence of term selection on retrieval performance on the CLEF-IP prior art test collection, starting with the description section of the reference patent and using language model (LM) and BM25 scoring functions. We find that an oracular relevance feedback system, which extracts terms from the judged relevant documents, far outperforms the baseline (0.48 vs. 0.11) and, at 0.48 mean average precision (MAP), performs twice as well as the best participant in CLEF-IP 2010 (0.22). We find a very clear threshold on term selection value for choosing terms. We also notice that most of the useful feedback terms are actually present in the original query, and hypothesise that the baseline system can be substantially improved by removing negative query terms. We try four simple automated approaches to identify negative terms for query reduction, but none of them improves on the baseline performance.
However, we show that a simple interactive approach with minimal feedback, in which terms are selected only from the first retrieved relevant document, outperforms the best result from CLEF-IP 2010, suggesting the promise of interactive methods for term selection in patent prior art search.
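The abstract names BM25 scoring and a term selection value threshold but gives no formulas, so the following is only a sketch under assumptions: Okapi BM25 with the common defaults `k1=1.2`, `b=0.75` and a non-negative IDF variant, and Robertson's offer weight (number of relevant documents containing the term times its Robertson/Sparck Jones relevance weight) standing in for the term selection value. The toy collection, the `threshold` value, and the helper names `offer_weight` and `select_terms` are hypothetical and do not reproduce the CLEF-IP setup.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of one tokenised document for a bag of query terms."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    tf = Counter(doc)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in docs if term in d)
        if df == 0:
            continue  # a term absent from the collection contributes nothing
        # A common non-negative IDF variant; other BM25 IDF forms exist.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        f = tf[term]
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * f * (k1 + 1) / (f + norm)
    return score

def offer_weight(term, relevant, docs):
    """Assumed term selection value: r * w_RSJ (Robertson's offer weight)."""
    N, R = len(docs), len(relevant)
    n = sum(1 for d in docs if term in d)       # documents containing the term
    r = sum(1 for d in relevant if term in d)   # relevant documents containing it
    w = math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                 ((n - r + 0.5) * (R - r + 0.5)))
    return r * w

def select_terms(relevant, docs, threshold):
    """Keep feedback terms from the relevant documents whose weight exceeds the threshold."""
    candidates = {t for d in relevant for t in d}
    scored = {t: offer_weight(t, relevant, docs) for t in candidates}
    kept = [t for t, s in scored.items() if s > threshold]
    return sorted(kept, key=lambda t: (-scored[t], t))  # best first, ties by name

# Hypothetical toy collection; each document is a list of tokens.
docs = [
    "laser diode cooling housing".split(),
    "laser diode driver circuit".split(),
    "bicycle frame housing".split(),
    "coffee machine heating element".split(),
    "semiconductor laser cooling fins".split(),
]
relevant = [docs[0], docs[4]]  # stands in for the judged relevant documents (oracle)

selected = select_terms(relevant, docs, threshold=1.0)
scores = [bm25_score(selected, d, docs) for d in docs]
ranking = sorted(range(len(docs)), key=lambda i: -scores[i])
print(selected, ranking)
```

Passing only the first retrieved relevant document as `relevant` would mirror the minimal-feedback interactive setting described above; the oracle variant uses all judged relevant documents.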

    Investigating User Search Tactic Patterns and System Support in Using Digital Libraries

    This study investigates users' application of search tactics and the system support for them in digital libraries. A user study was conducted with sixty digital library users. The study was designed to answer three research questions: (1) How do users apply different types of search tactics in a search process while conducting different search tasks? (2) How does the system support users in applying different types of search tactics? (3) How do users' search tactic application and system support for different types of search tactics affect search outputs? Sixty student subjects were recruited from different disciplines at a state research university. Multiple methods were employed to collect data, including questionnaires, transaction logs, and think-aloud protocols. Subjects were asked to conduct three types of search tasks, namely known-item search, specific information search, and exploratory search, using the Library of Congress Digital Libraries. To explore users' search tactic patterns (RQ1), quantitative analysis was conducted, including descriptive statistics, kernel regression, transition analysis, and clustering analysis. Types of system support were explored by analyzing system features for search tactic application. In addition, users' perceived system support, difficulty, and satisfaction with search tactic application were measured using post-search questionnaires (RQ2). Finally, the study examined the causal relationships between search process and search outputs (RQ3) using multiple regression and structural equation modeling. This study uncovers unique behavior in users' search tactic application and the corresponding system support in the context of digital libraries. First, search tactic selections, changes, and transitions were explored in different task situations: known-item search, specific information search, and exploratory search. Search tactic application patterns differed by task type.
In known-item search tasks, users preferred to apply query creation tactics followed by search result evaluation tactics, with fewer query reformulations or iterative tactic loops. In specific information search tasks, iterative search result evaluation strategies were predominantly used. In exploratory tasks, browsing tactics were frequently selected, as were search result evaluation tactics. Second, this study identified different types of system support for search tactic application. System support, difficulty, and satisfaction were measured in terms of search tactic application, focusing on the search process. Users perceived relatively high system support for accessing and browsing tactics, but less support for query reformulation and item evaluation tactics. Third, the effects of search tactic selections and system support on search outputs were examined using multiple regression. In known-item searches, the frequencies of query creation and accessing forwarding tactics positively affected search efficiency. In specific information searches, time spent applying search result evaluation tactics had a positive impact on success rate. In exploratory searches, browsing tactics were positively associated with aspectual recall and satisfaction with search results. Based on the findings, the author discusses unique patterns of users' search tactic application as well as system design implications for digital library environments.
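Transition analysis is listed among the methods, though the abstract does not say how it was computed. A minimal sketch of one common form, first-order transition probabilities P(next tactic | current tactic) estimated from coded sessions, might look like the following. The tactic labels and sessions are hypothetical and are not the study's coding scheme or data.

```python
from collections import Counter, defaultdict

def transition_probabilities(sessions):
    """Estimate first-order P(next tactic | current tactic) from coded sessions."""
    counts = defaultdict(Counter)
    for tactics in sessions:
        # Count each adjacent (current, next) pair within one session.
        for current, nxt in zip(tactics, tactics[1:]):
            counts[current][nxt] += 1
    probs = {}
    for current, row in counts.items():
        total = sum(row.values())
        probs[current] = {nxt: c / total for nxt, c in row.items()}
    return probs

# Hypothetical coded sessions; labels are illustrative only.
sessions = [
    ["create_query", "evaluate_results", "evaluate_results", "access_item"],
    ["create_query", "evaluate_results", "reformulate_query", "evaluate_results"],
    ["browse", "browse", "evaluate_results", "access_item"],
]
probs = transition_probabilities(sessions)
for current in sorted(probs):
    print(current, probs[current])
```

Tactics that only ever end a session (here `access_item`) get no outgoing row; comparing such matrices across task types is one way patterns like "query creation followed by result evaluation" could be made visible.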