Analysis of Statistical Question Classification for Fact-based Questions
Question classification systems play an important role in question answering systems and can be used in a wide range of other domains. The goal of question classification is to accurately assign labels to questions based on expected answer type. Most approaches in the past have relied on matching questions against hand-crafted rules. However, rules require laborious effort to create and often suffer from being too specific. Statistical question classification methods overcome these issues by employing machine learning techniques. We empirically show that a statistical approach is robust and achieves good performance on three diverse data sets with little or no hand tuning. Furthermore, we examine the role different syntactic and semantic features have on performance. We find that semantic features tend to increase performance more than purely syntactic features. Finally, we analyze common causes of misclassification error and provide insight into ways they may be overcome.
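The statistical approach lends itself to a compact prototype. Below is a minimal sketch, assuming a linear SVM over word n-gram counts; the toy questions and TREC-style answer-type labels (HUM/LOC/NUM) are illustrative stand-ins, not the paper's data or feature set.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy questions labeled with TREC-style expected answer types.
questions = [
    "Who wrote Hamlet?", "Where is the Louvre located?",
    "How many moons does Mars have?", "Who discovered penicillin?",
    "Where was Mozart born?", "How many strings does a violin have?",
]
labels = ["HUM", "LOC", "NUM", "HUM", "LOC", "NUM"]

# Word unigrams and bigrams as a simple stand-in for richer features.
clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(questions, labels)
print(clf.predict(["Who painted the Mona Lisa?"]))  # expected: ['HUM']
```

Semantic features such as named-entity tags or hypernym indicators could be concatenated to the same feature matrix; the abstract reports that such features tend to help more than purely syntactic ones.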
Multi-Task Learning for Email Search Ranking with Auxiliary Query Clustering
User information needs vary significantly across different tasks, and
therefore their queries will also differ considerably in their expressiveness
and semantics. Many approaches have been proposed to model such query diversity by
obtaining query types and building query-dependent ranking models. These
studies typically require either a labeled query dataset or clicks from
multiple users aggregated over the same document. These techniques, however,
are not applicable when manual query labeling is not viable, and aggregated
clicks are unavailable due to the private nature of the document collection,
e.g., in email search scenarios. In this paper, we study how to obtain query
types in an unsupervised fashion and how to incorporate this information into
query-dependent ranking models. We first develop a hierarchical clustering
algorithm based on truncated SVD and varimax rotation to obtain coarse-to-fine
query types. Then, we study three query-dependent ranking models, including two
neural models that leverage query type information as additional features, and
one novel multi-task neural model that views query type as the label for the
auxiliary query cluster prediction task. This multi-task model is trained to
simultaneously rank documents and predict query types. Our experiments on tens
of millions of real-world email search queries demonstrate that the proposed
multi-task model can significantly outperform the baseline neural ranking
models, which either do not incorporate query type information or simply
feed query type as an additional feature.
Comment: CIKM 2018
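To make the unsupervised query-typing step concrete, here is a minimal sketch, assuming TF-IDF query features; the toy queries, component count, cluster counts, and the textbook varimax routine are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

def varimax(A, gamma=1.0, max_iter=100, tol=1e-6):
    """Textbook orthogonal varimax rotation of an n x k matrix."""
    n, k = A.shape
    R = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        L = A @ R
        u, s, vt = np.linalg.svd(
            A.T @ (L**3 - (gamma / n) * L @ np.diag((L**2).sum(axis=0))))
        R = u @ vt
        if s.sum() - var < tol:
            break
        var = s.sum()
    return A @ R

# Toy email-search queries; real inputs would be millions of queries.
queries = ["flight confirmation", "weekly newsletter", "password reset",
           "flight itinerary update", "newsletter digest", "reset my account"]
X = TfidfVectorizer().fit_transform(queries)
Z = TruncatedSVD(n_components=3).fit_transform(X)   # low-rank query embeddings
Z = varimax(Z)                                      # rotate toward interpretable axes

tree = linkage(Z, method="ward")                    # hierarchical clustering
coarse = fcluster(tree, t=2, criterion="maxclust")  # coarse query types
fine = fcluster(tree, t=4, criterion="maxclust")    # finer-grained query types
```

Cutting the same linkage tree at different depths yields the coarse-to-fine hierarchy of query types.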
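The multi-task ranking idea can likewise be sketched as a shared query encoder with two heads: one scores (query, document) pairs and one predicts the query's cluster ID. The layer sizes, pointwise ranking loss, and auxiliary-loss weight below are hypothetical choices, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiTaskRanker(nn.Module):
    def __init__(self, query_dim, doc_dim, hidden=128, n_query_types=50):
        super().__init__()
        self.query_encoder = nn.Sequential(nn.Linear(query_dim, hidden), nn.ReLU())
        self.rank_head = nn.Linear(hidden + doc_dim, 1)    # document relevance score
        self.type_head = nn.Linear(hidden, n_query_types)  # auxiliary cluster prediction

    def forward(self, query_feats, doc_feats):
        q = self.query_encoder(query_feats)
        score = self.rank_head(torch.cat([q, doc_feats], dim=-1)).squeeze(-1)
        return score, self.type_head(q)

model = MultiTaskRanker(query_dim=50, doc_dim=64)
score, type_logits = model(torch.randn(8, 50), torch.randn(8, 64))

clicks = torch.randint(0, 2, (8,)).float()   # pointwise click labels
clusters = torch.randint(0, 50, (8,))        # unsupervised query-type labels
rank_loss = nn.functional.binary_cross_entropy_with_logits(score, clicks)
type_loss = nn.functional.cross_entropy(type_logits, clusters)
loss = rank_loss + 0.1 * type_loss           # joint objective; 0.1 is illustrative
```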
Gen-IR @ SIGIR 2023: The First Workshop on Generative Information Retrieval
Generative information retrieval (IR) has experienced substantial growth
across multiple research communities (e.g., information retrieval, computer
vision, natural language processing, and machine learning), and has been highly
visible in the popular press. Theoretical and empirical results, as well as
actual user-facing products, have been released that retrieve documents (via
generation) or
directly generate answers given an input request. We would like to investigate
whether end-to-end generative models are just another trend or, as some claim,
a paradigm change for IR. This necessitates new metrics, theoretical grounding,
evaluation methods, task definitions, models, user interfaces, etc. The goal of
this workshop (https://coda.io/@sigir/gen-ir) is to focus on previously
explored Generative IR techniques like document retrieval and direct Grounded
Answer Generation, while also offering a venue for the discussion and
exploration of how Generative IR can be applied to new domains like
recommendation systems, summarization, etc. The format of the workshop is
interactive, including roundtable and keynote sessions, and it avoids the
one-sided dialogue of a mini-conference.
Comment: Accepted SIGIR 2023 workshop
Modeling Temporal Evidence from External Collections
Newsworthy events are broadcast through multiple mediums and prompt the
crowds to produce comments on social media. In this paper, we propose to
leverage these behavioral dynamics to estimate the most relevant time periods
for an event (i.e., query). Recent advances have shown how to improve the
estimation of the temporal relevance of such topics. Our approach builds on
two major novelties. First, we mine temporal evidence from hundreds of
external sources into topic-based external collections to improve the
robustness of the detection of relevant time periods. Second, we propose a
formal retrieval model that generalizes the use of the temporal dimension
across different aspects of the retrieval process. In particular, we show that
temporal evidence of external collections can be used to (i) infer a topic's
temporal relevance, (ii) select the query expansion terms, and (iii) re-rank
the final results for improved precision. Experiments with TREC Microblog
collections show that the proposed time-aware retrieval model makes effective
and extensive use of the temporal dimension to improve search results
over the most recent temporal models. Interestingly, we observe a strong
correlation between precision and the temporal distribution of retrieved and
relevant documents.
Comment: To appear in WSDM 2017
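A minimal sketch of the time-aware scoring idea: estimate a temporal prior for the query from the timestamps of matching documents in external collections, then interpolate it with the text retrieval score. The kernel density estimate, interpolation weight, and toy timestamps below are illustrative assumptions, not the paper's formal retrieval model.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Timestamps (e.g., days since epoch) of documents matching the query in
# external collections; toy values standing in for hundreds of sources.
external_times = np.array([100.0, 101.0, 101.0, 102.0, 150.0, 151.0])
temporal_prior = gaussian_kde(external_times)  # density = query's temporal relevance

def time_aware_score(text_score, doc_time, lam=0.7):
    """Interpolate lexical relevance with the query's temporal prior."""
    return lam * text_score + (1 - lam) * temporal_prior([doc_time])[0]

print(time_aware_score(2.3, 101.0))  # boosted: falls inside a comment burst
print(time_aware_score(2.3, 300.0))  # no temporal support from external sources
```

The same prior could also drive the selection of query expansion terms and final re-ranking, the other two uses of temporal evidence the abstract describes.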
Stretching Sentence-pair NLI Models to Reason over Long Documents and Clusters
Natural Language Inference (NLI) has been extensively studied by the NLP
community as a framework for estimating the semantic relation between sentence
pairs. While early work identified certain biases in NLI models, recent
advances in modeling and datasets have demonstrated promising performance. In
this work, we further explore the direct zero-shot applicability of NLI models
to real applications, beyond the sentence-pair setting they were trained on.
First, we analyze the robustness of these models to longer and out-of-domain
inputs. Then, we develop new aggregation methods to allow operating over full
documents, reaching state-of-the-art performance on the ContractNLI dataset.
Interestingly, we find NLI scores to provide strong retrieval signals, leading
to more relevant evidence extractions compared to common similarity-based
methods. Finally, we go further and investigate whole document clusters to
identify both discrepancies and consensus among sources. In a test case, we
find real inconsistencies between Wikipedia pages in different languages about
the same topic.
Comment: Findings of EMNLP 2022
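One simple aggregation in the spirit of this work: score the hypothesis against each sentence of the document and keep the maximum entailment probability, which also surfaces the best evidence sentence. This is a sketch only; the checkpoint name and its label order (index 2 = entailment for this model family) are assumptions to verify for whichever NLI model you use.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "roberta-large-mnli"  # assumed off-the-shelf NLI checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def max_entailment(doc_sentences, hypothesis, entail_idx=2):
    """Max-aggregate per-sentence entailment and return the best evidence."""
    best_prob, best_sent = 0.0, None
    for sent in doc_sentences:
        inputs = tokenizer(sent, hypothesis, return_tensors="pt", truncation=True)
        with torch.no_grad():
            probs = model(**inputs).logits.softmax(dim=-1)[0]
        if probs[entail_idx].item() > best_prob:
            best_prob, best_sent = probs[entail_idx].item(), sent
    return best_prob, best_sent

prob, evidence = max_entailment(
    ["The lease term is five years.", "Rent is due monthly."],
    "The contract has a five-year term.")
```

Returning the top-scoring sentence alongside the score mirrors the abstract's observation that NLI scores double as retrieval signals for evidence extraction.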