Supporting polyrepresentation in a quantum-inspired geometrical retrieval framework
The relevance of a document has many facets, going beyond the usual topical one, which have to be considered to satisfy a user's information need. Multiple representations of documents, like user-given reviews or the actual document content, can give evidence towards certain facets of relevance. In this respect polyrepresentation of documents, where such evidence is combined, is a crucial concept to estimate the relevance of a document. In this paper, we discuss how a geometrical retrieval framework inspired by quantum mechanics can be extended to support polyrepresentation. We show by example how different representations of a document can be modelled in a Hilbert space, similar to physical systems known from quantum mechanics. We further illustrate how these representations are combined by means of the tensor product to support polyrepresentation, and discuss the case that representations of documents are not independent from a user point of view. Besides giving a principled framework for polyrepresentation, the potential of this approach is to capture and formalise the complex interdependent relationships that the different representations can have between each other
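As a rough illustration of the tensor-product combination described in this abstract, the following minimal Python sketch builds two toy document representations as unit vectors and combines them. The two-dimensional spaces, the vectors, the query directions, and all function names are hypothetical and chosen only for illustration; this is not the authors' actual model. The sketch shows that for a separable (product) state the joint relevance probability factorises, while for a non-separable state it does not, which is one possible way to picture representations that are not independent.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def tensor(u, v):
    """Tensor (Kronecker) product of two vectors."""
    return [a * b for a in u for b in v]

def prob(state, query):
    """Squared projection of a unit state onto a unit query vector."""
    return sum(s * q for s, q in zip(state, query)) ** 2

# Hypothetical representations of one document (assumed toy values).
content = normalize([3.0, 4.0])   # content-based representation
reviews = normalize([1.0, 1.0])   # review-based representation

# Hypothetical "relevant" directions in each representation space.
q_content = [1.0, 0.0]
q_reviews = [1.0, 0.0]

# Combine representations and queries in the tensor product space.
doc = tensor(content, reviews)
query = tensor(q_content, q_reviews)

p_content = prob(content, q_content)
p_reviews = prob(reviews, q_reviews)
p_joint = prob(doc, query)  # equals p_content * p_reviews for a product state

# A non-separable state: it cannot be written as tensor(a, b) for any a, b.
entangled = normalize([1.0, 0.0, 0.0, 1.0])
p_joint_ent = prob(entangled, query)
# Marginal probabilities of the relevant direction in each space.
m_content = entangled[0] ** 2 + entangled[1] ** 2
m_reviews = entangled[0] ** 2 + entangled[2] ** 2

print(p_joint, p_content * p_reviews)
print(p_joint_ent, m_content * m_reviews)
```

In this sketch the first printed pair agrees, whereas the second pair differs, so the joint probability of the non-separable state is not the product of its marginals.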
Classification of Visualization Types and Perspectives in Patents
Due to the swift growth of patent applications each year, information and
multimedia retrieval approaches that facilitate patent exploration and
retrieval are of utmost importance. Different types of visualizations (e.g.,
graphs, technical drawings) and perspectives (e.g., side view, perspective) are
used to visualize details of innovations in patents. The classification of
these images enables a more efficient search and allows for further analysis.
So far, datasets for image type classification miss some important
visualization types for patents. Furthermore, related work does not make use of
recent deep learning approaches including transformers. In this paper, we adopt
state-of-the-art deep learning methods for the classification of visualization
types and perspectives in patent images. We extend the CLEF-IP dataset for
image type classification in patents to ten classes and provide manual ground
truth annotations. In addition, we derive a set of hierarchical classes from a
dataset that provides weakly-labeled data for image perspectives. Experimental
results have demonstrated the feasibility of the proposed approaches. Source
code, models, and dataset will be made publicly available.
Comment: Accepted in International Conference on Theory and Practice of
Digital Libraries (TPDL) 2023 (they have the copyright to publish the
camera-ready version of this work)
What is the influence of genre during the perception of structured text for retrieval and search?
This thesis presents an investigation into the high value of structured text (or form) in the context of genre within Information Retrieval. In particular, how are these structured texts perceived and why are they not more heavily used within Information Retrieval & Search communities? The main motivation is to show the features in which people can exploit genre within Information Search & Retrieval, in particular, categorisation and search tasks. To do this, it was vital to record and analyse how and why this was done during typical tasks. The literature review highlighted two previous studies (Toms & Campbell 1999a; Watt 2009) which have reported pilot studies consisting of genre categorisation and information searching. Both studies and other findings within the literature review inspired the work contained within this thesis. Genre is notoriously hard to define, but a very useful framework of Purpose and Form, developed by Yates & Orlikowski (1992), was utilised to design two user studies for the research reported within the thesis. The two studies consisted of, first, a categorisation task (e-mails), and second, a set of six simulated situations in Wikipedia, both of which collected quantitative data from eye tracking experiments as well as qualitative user data. The results of both studies showed the extent to which the participants utilised the form features of the stimuli presented, in particular, how these were used, which ocular behaviours (skimming or scanning) and actual features were used, and which were the most important. 
The main contributions to research made by this thesis were, first of all, that the task-based user evaluations employing simulated search scenarios revealed how and why users make decisions while interacting with the textual features of structure and layout within a discourse community, and, secondly, an extensive evaluation of the quantitative data revealed the features that were used by the participants in the user studies and the effects of the interpretation of genre in the search and categorisation process as well as the perceptual processes used in the various communities. This will be of benefit for the re-development of information systems. As far as is known, this is the first detailed and systematic investigation into the types of features, value of form, perception of features, and layout of genre using eye tracking in online communities, such as Wikipedia
On Term Selection Techniques for Patent Prior Art Search
A patent is a set of exclusive rights granted to an inventor to protect his invention for a limited period of time. Patent prior art search involves finding previously granted patents, scientific articles, product descriptions, or any other published work that may be relevant to a new patent application. Many well-known information retrieval (IR) techniques (e.g., typical query expansion methods), which are proven effective for ad hoc search, are unsuccessful for patent prior art search. In this thesis, we mainly investigate the reasons that generic IR techniques are not effective for prior art search on the CLEF-IP test collection. First, we analyse the errors caused by data curation and experimental settings, such as applying International Patent Classification codes assigned to the patent topics to filter the search results. Then, we investigate the influence of term selection on retrieval performance on the CLEF-IP prior art test collection, starting with the description section of the reference patent and using language models (LM) and BM25 scoring functions. We find that an oracular relevance feedback system, which extracts terms from the judged relevant documents, far outperforms the baseline (i.e., 0.11 vs. 0.48) and performs twice as well on mean average precision (MAP) as the best participant in CLEF-IP 2010 (i.e., 0.22 vs. 0.48). We find a very clear term selection value threshold for use when choosing terms. We also notice that most of the useful feedback terms are actually present in the original query and hypothesise that the baseline system can be substantially improved by removing negative query terms. We try four simple automated approaches to identify negative terms for query reduction but we are unable to improve on the baseline performance with any of them. However, we show that a simple, minimal feedback interactive approach, where terms are selected from only the first retrieved relevant document, outperforms the best result from CLEF-IP 2010, suggesting the promise of interactive methods for term selection in patent prior art search.
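For readers unfamiliar with the mean average precision (MAP) figures quoted in this abstract, the following is a minimal Python sketch of the standard definition. The function names, document identifiers, and toy rankings are hypothetical; the sketch does not reproduce the thesis's evaluation setup or the CLEF-IP data.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant ids."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document's rank
    return total / len(relevant)

def mean_average_precision(runs):
    """Mean of average precision over (ranked list, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Two hypothetical query topics with toy rankings and relevance judgements.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y"], {"y"}),                 # AP = (1/2) / 1
]
map_score = mean_average_precision(runs)
print(round(map_score, 4))
```

Relevant documents that are never retrieved still count in the denominator, so a system is penalised for missing them.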
Investigating User Search Tactic Patterns and System Support in Using Digital Libraries
This study aims to investigate users' search tactic application and system support in using digital libraries. A user study was conducted with sixty digital library users. The study was designed to answer three research questions: 1) How do users engage in a search process by applying different types of search tactics while conducting different search tasks?; 2) How does the system support users to apply different types of search tactics?; 3) How do users' search tactic application and system support for different types of search tactics affect search outputs? Sixty student subjects were recruited from different disciplines in a state research university. Multiple methods were employed to collect data, including questionnaires, transaction logs and think-aloud protocols. Subjects were asked to conduct three different types of search tasks, namely, known-item search, specific information search and exploratory search, using Library of Congress Digital Libraries. To explore users' search tactic patterns (RQ1), quantitative analysis was conducted, including descriptive statistics, kernel regression, transition analysis, and clustering analysis. Types of system support were explored by analyzing system features for search tactic application. In addition, users' perceived system support, difficulty, and satisfaction with search tactic application were measured using post-search questionnaires (RQ2). Finally, the study examined the causal relationships between search process and search outputs (RQ3) based on multiple regression and structural equation modeling.
This study uncovers unique behavior of users' search tactic application and corresponding system support in the context of digital libraries. First, search tactic selections, changes, and transitions were explored in different task situations - known-item search, specific information search, and exploratory search. Search tactic application patterns differed by task type. In known-item search tasks, users preferred to apply search query creation and following search result evaluation tactics, but fewer query reformulation or iterative tactic loops were observed. In specific information search tasks, iterative search result evaluation strategies were dominantly used. In exploratory tasks, browsing tactics were frequently selected as well as search result evaluation tactics. Second, this study identified different types of system support for search tactic application. System support, difficulty, and satisfaction were measured in terms of search tactic application focusing on search process. Users perceived relatively high system support for accessing and browsing tactics but less support for query reformulation and item evaluation tactics. Third, the effects of search tactic selections and system support on search outputs were examined based on multiple regression. In known-item searches, frequencies of query creation and accessing forwarding tactics would positively affect search efficiency. In specific information searches, time spent on applying search result evaluation tactics would have a positive impact on success rate. In exploratory searches, browsing tactics turned out to be positively associated with aspectual recall and satisfaction with search results. Based on the findings, the author discussed unique patterns of users' search tactic application as well as system design implications in digital library environments.