Simulating Users in Interactive Web Table Retrieval
Considering the multimodal signals of search items is beneficial for
retrieval effectiveness. Especially in web table retrieval (WTR) experiments,
accounting for multimodal properties of tables boosts effectiveness. However,
it still remains an open question how the single modalities affect user
experience in particular. Previous work analyzed WTR performance in ad-hoc
retrieval benchmarks, which neglects interactive search behavior and limits the
conclusion about the implications for real-world user environments.
To this end, this work presents an in-depth evaluation of simulated
interactive WTR search sessions as a more cost-efficient and reproducible
alternative to real user studies. As a first of its kind, we introduce
interactive query reformulation strategies based on Doc2Query, incorporating
cognitive states of simulated user knowledge. Our evaluations include two
perspectives on user effectiveness by considering different cost paradigms,
namely query-wise and time-oriented measures of effort. Our multi-perspective
evaluation scheme reveals new insights about query strategies, the impact of
modalities, and different user types in simulated WTR search sessions.
Comment: 4 pages + references; accepted at CIKM'2
Context-Driven Interactive Query Simulations Based on Generative Large Language Models
Simulating user interactions enables a more user-oriented evaluation of
information retrieval (IR) systems. While user simulations are cost-efficient
and reproducible, many approaches often lack fidelity regarding real user
behavior. Most notably, current user models neglect the user's context, which
is the primary driver of perceived relevance and the interactions with the
search results. To this end, this work introduces the simulation of
context-driven query reformulations. The proposed query generation methods
build upon recent Large Language Model (LLM) approaches and consider the user's
context throughout the simulation of a search session. Compared to simple
context-free query generation approaches, these methods show better
effectiveness and allow the simulation of more efficient IR sessions.
Similarly, our evaluations consider more interaction context than current
session-based measures and reveal interesting complementary insights in
addition to the established evaluation protocols. We conclude with directions
for future work and provide an entirely open experimental setup.
Comment: Accepted at ECIR 2024 (Full Paper
Abstract Images Have Different Levels of Retrievability Per Reverse Image Search Engine
Much computer vision research has focused on natural images, but technical
documents typically consist of abstract images, such as charts, drawings,
diagrams, and schematics. How well do general web search engines discover
abstract images? Recent advancements in computer vision and machine learning
have led to the rise of reverse image search engines. Where conventional search
engines accept a text query and return a set of document results, including
images, a reverse image search accepts an image as a query and returns a set of
images as results. This paper evaluates how well common reverse image search
engines discover abstract images. We conducted an experiment leveraging images
from Wikimedia Commons, a website known to be well indexed by Baidu, Bing,
Google, and Yandex. We measure how difficult an image is to find again
(retrievability), what percentage of images returned are relevant (precision),
and the average number of results a visitor must review before finding the
submitted image (mean reciprocal rank). When trying to discover the same image
again among similar images, Yandex performs best. When searching for pages
containing a specific image, Google and Yandex outperform the others when
discovering photographs with precision scores ranging from 0.8191 to 0.8297,
respectively. In both of these cases, Google and Yandex perform better with
natural images than with abstract ones, achieving a difference in retrievability
as high as 54% between images in these categories. These results affect anyone
applying common web search engines to search for technical documents that use
abstract images.
Comment: 20 pages; 7 figures; to be published in the proceedings of the
Drawings and abstract Imagery: Representation and Analysis (DIRA) Workshop
from ECCV 202
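As a rough illustration of two of the measures named in the abstract above, the sketch below computes precision and mean reciprocal rank using their standard textbook definitions. It is not the paper's evaluation code; the function names and the toy result lists are assumptions made for this example.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def precision(results: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Fraction of returned results that are relevant (0.0 for an empty list)."""
    if not results:
        return 0.0
    return sum(1 for r in results if r in relevant) / len(results)


def reciprocal_rank(results: Sequence[Hashable], target: Hashable) -> float:
    """1 / (1-based rank of the first occurrence of target), or 0.0 if absent."""
    for rank, item in enumerate(results, start=1):
        if item == target:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    runs: Iterable[Tuple[Sequence[Hashable], Hashable]]
) -> float:
    """Average reciprocal rank over (result list, submitted image) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(reciprocal_rank(res, tgt) for res, tgt in runs) / len(runs)


# Hypothetical result lists: each query image is looked up in its own results.
toy_runs = [
    (["a", "b"], "a"),  # found at rank 1 -> 1.0
    (["a", "b"], "b"),  # found at rank 2 -> 0.5
    (["a", "b"], "c"),  # not found       -> 0.0
]
```

Here a reciprocal rank of 1.0 means the submitted image was the first result, and a query whose image never appears contributes 0.0 to the mean.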
Audio Similarity with Siamese Networks
The aim of this thesis was to study the application of Siamese neural networks to the problem of audio similarity measurement. A selection of Siamese networks with different architectures was presented and the results attained by these networks were compared to a group of baseline methods, which consisted of more classic statistical methods as well as non-Siamese networks. The goal was to find out how the Siamese solutions performed in the task of general audio similarity measurement compared to these methods and whether the Siamese solution delivered more generalizable results when dealing with a vast selection of audio samples.
All of the systems were trained and tested on a dataset of 2000 samples from different kinds of environments, such as sound samples from cities, nature and domestic settings. The best performers at audio similarity measurement on the dataset were deep-learning-based methods, including the presented Siamese networks. Siamese networks showed great potential in their ability to generalize to the vast selection of audio classes; however, the best overall results were reached with a non-Siamese network solution using a convolutional neural network. In light of these promising findings, Siamese networks should be studied further for audio processing, measurement, and evaluation tasks, since the amount of existing research was rather limited.
To be or NOT to be: The Impact of Negative Annotation in Biomedical Semantic Similarity
Master's thesis, Bioinformatics and Computational Biology, Universidade de Lisboa, Faculdade de Ciências, 2022
Classical Semantic Similarity Measures did not consider negative annotations in similarity computation, and the impact that these annotations can have in this data mining technique is not well studied.
As such, this work aims to understand how the addition of negative annotations impacts semantic similarity. To do so, two pairwise similarity measures, Best-Match Average and Resnik, were adapted to
create the polar measures PolarBMA and PolarResnik. These were evaluated in two currently relevant
scopes: protein-protein interaction prediction and disease prediction against the original measures. Pairs
of proteins where the proteins were known to interact or not were taken from STRING and enriched with
positive and negative annotations from the Gene Ontology. Synthetic patients were created as sets of
annotations taken from the Mendelian diseases they were designed to have, as well as possible noise or
imprecise annotations. Then semantic similarity was computed with both polar and non-polar measures
between proteins in pairs and between patients and candidate diseases including the Mendelian diseases,
as well as random diseases taken from the Human Phenotype Ontology.
To evaluate if the polar measures performed well in comparison to the baseline, a ranking according
to semantic similarity was made for each measure and scope for evaluation and the rank cumulative
frequencies were plotted. ROC AUC and Precision-Recall curves were also determined for the protein-protein interaction (PPI) prediction, as well as average precision for the disease prediction dataset. In
PPI prediction, polar measures had an increased performance in the Molecular Function branch for both
experiments where negative annotations were added and also in one of the experiments with the Cellular
Component branch. In the disease prediction scope, polar measures had an improved performance of
approximately ten percent. This improvement was verified in all disease prediction experiments, even
with the addition of noise and imprecision. Considering the results obtained, this work concludes that
negative annotations have an impact on semantic similarity, but the amplitude of this impact requires
further study.
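For readers unfamiliar with the Best-Match Average mentioned in the abstract above, the following is a minimal sketch of the standard, non-polar form: each term in one annotation set is matched with its most similar term in the other set, and the two directional averages are combined. The abstract does not specify how PolarBMA handles negative annotations, so that adaptation is not shown. The `exact_match` term similarity is a hypothetical stand-in for an ontology-based measure such as Resnik, and the term labels are invented.

```python
from typing import Callable, Iterable


def best_match_average(
    terms_a: Iterable[str],
    terms_b: Iterable[str],
    sim: Callable[[str, str], float],
) -> float:
    """Standard (non-polar) Best-Match Average between two annotation sets."""
    a, b = list(terms_a), list(terms_b)
    if not a or not b:
        return 0.0
    # For every term in A, take its best match in B, then average.
    best_a = sum(max(sim(x, y) for y in b) for x in a) / len(a)
    # Same in the other direction, so the measure is symmetric.
    best_b = sum(max(sim(x, y) for x in a) for y in b) / len(b)
    return (best_a + best_b) / 2


def exact_match(x: str, y: str) -> float:
    """Toy term similarity (assumption): 1.0 for identical terms, else 0.0."""
    return 1.0 if x == y else 0.0


# Hypothetical annotation sets for two entities.
entity_1 = ["term1", "term2"]
entity_2 = ["term2", "term3"]
score = best_match_average(entity_1, entity_2, exact_match)
```

With a real ontology, `sim` would be replaced by an information-content-based term similarity; the aggregation step stays the same.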
Findings of the 2017 DiscoMT Shared Task on Cross-lingual Pronoun Prediction
We describe the design, the setup, and the
evaluation results of the DiscoMT 2017
shared task on cross-lingual pronoun prediction.
The task asked participants to
predict a target-language pronoun given a
source-language pronoun in the context of
a sentence. We further provided a lemmatized
target-language human-authored
translation of the source sentence, and
automatic word alignments between the
source sentence words and the target-language
lemmata. The aim of the task
was to predict, for each target-language
pronoun placeholder, the word that should
replace it from a small, closed set of
classes, using any type of information that
can be extracted from the entire document.
We offered four subtasks, each for a
different language pair and translation
direction: English-to-French, English-to-German,
German-to-English, and
Spanish-to-English. Five teams participated
in the shared task, making
submissions for all language pairs. The
evaluation results show that all participating
teams outperformed two strong
n-gram-based language model-based
baseline systems by a sizable margin.
Towards a Homomorphic Machine Learning Big Data Pipeline for the Financial Services Sector
Machine learning (ML) is today commonly employed in the Financial Services Sector (FSS) to create various models to predict a variety of conditions ranging from financial transactions fraud to outcomes of investments and also targeted marketing campaigns. The common ML technique used for the modeling is supervised learning using regression algorithms and usually involves large amounts of data that need to be shared and prepared before the actual learning phase. Compliance with privacy laws and confidentiality regulations requires that most, if not all, of the data must be kept in a secure environment, usually in-house, and not outsourced to cloud or multi-tenant shared environments. This paper presents the results of a research collaboration between IBM Research and Banco Bradesco SA to investigate approaches to homomorphically secure a typical ML pipeline commonly employed in the FSS industry.
We investigated and de-constructed a typical ML pipeline used by Banco Bradesco and applied Homomorphic Encryption (HE) to two of the important ML tasks, namely the variable selection phase of the model generation task and the prediction task. Variable selection, which usually precedes the training phase, is very important when working with data sets for which no prior knowledge of the covariate set exists. Our work provides a way to define an initial covariate set for the training phase while preserving the privacy and confidentiality of the input data sets.
Quality metrics, using real financial data, comprising quantitative, qualitative and categorical features, demonstrated that our HE-based pipeline can yield results comparable to state-of-the-art variable selection techniques, and the performance results demonstrated that HE technology has reached the inflection point where it can be useful in batch processing in a financial business setting.