Lexical Query Modeling in Session Search
Lexical query modeling has been the leading paradigm for session search. In
this paper, we analyze TREC session query logs and compare the performance of
different lexical matching approaches for session search. Naive methods based
on term frequency weighting perform on par with specialized session models. In
addition, we investigate the viability of lexical query models in the setting
of session search. We give important insights into the potential and
limitations of lexical query modeling for session search and propose future
directions for the field of session search. Comment: ICTIR2016, Proceedings of the 2nd ACM International Conference on the
Theory of Information Retrieval. 201
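A naive term-frequency approach of the kind the abstract mentions could look roughly like the sketch below. It is an illustration only, not the paper's method: the function names, the normalization, and the example session are all assumptions made here.

```python
from collections import Counter


def session_query_model(queries):
    """Build a term-frequency query model from every query in a session.

    A term's weight is its count across all session queries, normalized
    so the weights sum to 1 (an assumed, simple weighting scheme).
    """
    counts = Counter(term for q in queries for term in q.lower().split())
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def score(document, model):
    """Score a document as the model-weighted sum of its term counts."""
    doc_counts = Counter(document.lower().split())
    return sum(weight * doc_counts[term] for term, weight in model.items())


# Hypothetical session: the user reformulates the same information need.
session = [
    "pocono mountains",
    "pocono mountains park",
    "pocono mountains camelback resort",
]
model = session_query_model(session)
```

Terms repeated across reformulations ("pocono", "mountains") receive more weight than terms that appear only once, which is the intuition behind using the whole session rather than the last query alone.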
CoaCor: Code Annotation for Code Retrieval with Reinforcement Learning
To accelerate software development, much research has been performed to help
people understand and reuse the huge amount of available code resources. Two
important tasks have been widely studied: code retrieval, which aims to
retrieve code snippets relevant to a given natural language query from a code
base, and code annotation, where the goal is to annotate a code snippet with a
natural language description. Despite their advancement in recent years, the
two tasks are mostly explored separately. In this work, we investigate a novel
perspective of Code annotation for Code retrieval (hence called `CoaCor'),
where a code annotation model is trained to generate a natural language
annotation that can represent the semantic meaning of a given code snippet and
can be leveraged by a code retrieval model to better distinguish relevant code
snippets from others. To this end, we propose an effective framework based on
reinforcement learning, which explicitly encourages the code annotation model
to generate annotations that can be used for the retrieval task. Through
extensive experiments, we show that code annotations generated by our framework
are much more detailed and more useful for code retrieval, and they can further
improve the performance of existing code retrieval models significantly. Comment: 10 pages, 2 figures. Accepted by The Web Conference (WWW) 201
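One way to picture a retrieval-driven reward for an annotation model is to reward an annotation by how highly a retrieval model ranks the original snippet when the annotation is used as the query. The sketch below uses reciprocal rank for that purpose. The abstract does not specify the reward, so the function name and the choice of reciprocal rank are assumptions made for illustration.

```python
def reciprocal_rank_reward(scores, target_index):
    """Return 1 / rank of the target snippet among the candidates.

    `scores` holds hypothetical retrieval scores of each candidate code
    snippet for one generated annotation; higher means more relevant.
    Ties are resolved in the target's favor.
    """
    target_score = scores[target_index]
    rank = 1 + sum(1 for s in scores if s > target_score)
    return 1.0 / rank
```

An annotation that lets the retriever rank its own snippet first earns a reward of 1.0; a less distinctive annotation earns a smaller reward.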
Entity Query Feature Expansion Using Knowledge Base Links
Recent advances in automatic entity linking and knowledge base
construction have resulted in entity annotations for document and
query collections. Examples include annotations of entities from large
general-purpose knowledge bases, such as Freebase and the Google
Knowledge Graph. Understanding how to leverage these entity
annotations of text to improve ad hoc document retrieval is an open
research area. Query expansion is a commonly used technique to
improve retrieval effectiveness. Most previous query expansion
approaches focus on text, mainly using unigram concepts. In this
paper, we propose a new technique, called entity query feature
expansion (EQFE), which enriches the query with features from
entities and their links to knowledge bases, including structured
attributes and text. We experiment using both explicit query entity
annotations and latent entities. We evaluate our technique on TREC
text collections automatically annotated with knowledge base entity
links, including the Google Freebase Annotations (FACC1) data.
We find that entity-based feature expansion results in significant
improvements in retrieval effectiveness over state-of-the-art text
expansion approaches.
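The general idea of enriching a query with features from linked entities can be sketched as follows. This is not the EQFE model itself: the toy knowledge base, its field names, and the fixed `entity_weight` are hypothetical and chosen only to keep the example small.

```python
from collections import Counter

# Hypothetical toy knowledge base (not real Freebase data).
TOY_KB = {
    "e1": {
        "name": "barack obama",
        "aliases": ["obama"],
        "types": ["politician"],
    },
}


def expand_query(query, linked_entities, kb, entity_weight=0.5):
    """Return weighted term features for a query.

    Original query terms get weight 1.0 per occurrence; every term found
    in a linked entity's name, aliases, or types adds `entity_weight`.
    """
    features = Counter()
    for term in query.lower().split():
        features[term] += 1.0
    for entity_id in linked_entities:
        entry = kb[entity_id]
        for text in [entry["name"], *entry["aliases"], *entry["types"]]:
            for term in text.split():
                features[term] += entity_weight
    return dict(features)


expanded = expand_query("obama family tree", ["e1"], TOY_KB)
```

The expanded representation contains terms such as "barack" and "politician" that never appeared in the query text but are reachable through the entity link.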
Bridging the Semantic Gap in Information Retrieval: Advanced Image Exploration Using Topic Models
Tohoku University徳山豪課
Enhancing Information Retrieval Relevance Using Touch Dynamics on Search Engine
This study explores the use of touch dynamics in search engines: user touch behavior is monitored and several unique features are extracted from it. These features are used to identify users and their traits. The results can be used to define each user's search behavior automatically. Touch dynamics has been discussed in several studies in the context of user authentication and biometric identification for security purposes. This study establishes the possibility of integrating touch dynamics results to identify user search preferences and interests. It investigates a technique that combines personalized search with touch dynamics information to determine user preferences, interest measurement, and context. Keywords: Personalized Search, Information Retrieval, Touch Dynamics, Search Engin
Fully Automated Fact Checking Using External Sources
Given the constantly growing proliferation of false claims online in recent
years, there has also been a growing research interest in automatically
distinguishing false rumors from factually true claims. Here, we propose a
general-purpose framework for fully-automatic fact checking using external
sources, tapping the potential of the entire Web as a knowledge source to
confirm or reject a claim. Our framework uses a deep neural network with LSTM
text encoding to combine semantic kernels with task-specific embeddings that
encode a claim together with pieces of potentially-relevant text fragments from
the Web, taking the source reliability into account. The evaluation results
show good performance on two different tasks and datasets: (i) rumor detection
and (ii) fact checking of the answers to a question in community question
answering forums. Comment: RANLP-201
Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation
Code search aims to retrieve the most semantically relevant code snippet for
a given natural language query. Recently, large-scale code pre-trained models
such as CodeBERT and GraphCodeBERT learn generic representations of source code
and have achieved substantial improvement on the code search task. However, the
high-quality sequence-level representations of code snippets have not been
sufficiently explored. In this paper, we propose a new approach with multimodal
contrastive learning and soft data augmentation for code search. Multimodal
contrastive learning is used to pull together the representations of code-query
pairs and push apart the unpaired code snippets and queries. Moreover, data
augmentation is critical in contrastive learning for learning high-quality
representations. However, only semantic-preserving augmentations for source
code are considered in existing work. In this work, we propose to do soft data
augmentation by dynamically masking and replacing some tokens in code sequences
to generate code snippets that are similar but not necessarily
semantic-preserving as positive samples for paired queries. We conduct
extensive experiments to evaluate the effectiveness of our approach on a
large-scale dataset with six programming languages. The experimental results
show that our approach significantly outperforms the state-of-the-art methods.
We also adapt our techniques to several pre-trained models such as RoBERTa and
CodeBERT, and significantly boost their performance on the code search task.
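The two ingredients named in the abstract can be illustrated with a small sketch: a contrastive loss that pulls paired code and query representations together, and a token-level augmentation that masks or replaces tokens. The InfoNCE form of the loss, the probabilities, the `<mask>` token, and all function names are assumptions made here; the abstract does not give the exact formulation.

```python
import math
import random


def info_nce(similarities):
    """Mean InfoNCE loss over a batch similarity matrix.

    similarities[i][j] is the similarity between query i and code j;
    matching pairs are assumed to lie on the diagonal.
    """
    losses = []
    for i, row in enumerate(similarities):
        m = max(row)  # subtract the max for numerical stability
        log_denominator = m + math.log(sum(math.exp(s - m) for s in row))
        losses.append(log_denominator - row[i])
    return sum(losses) / len(losses)


def soft_augment(tokens, vocab, mask_prob=0.15, replace_prob=0.5,
                 rng=None, mask_token="<mask>"):
    """Randomly mask or replace tokens in a code token sequence.

    Each token is selected with probability `mask_prob`; a selected token
    is replaced by a random vocabulary token with probability
    `replace_prob`, otherwise by `mask_token`. The result is similar to
    the input but not necessarily semantics-preserving.
    """
    rng = rng or random.Random()
    augmented = []
    for token in tokens:
        if rng.random() < mask_prob:
            if rng.random() < replace_prob:
                augmented.append(rng.choice(vocab))
            else:
                augmented.append(mask_token)
        else:
            augmented.append(token)
    return augmented
```

Calling `soft_augment` afresh on every training step yields a different variant each time, which is one reading of "dynamically" in the abstract.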
A quasi-current representation for information needs inspired by Two-State Vector Formalism
Recently, a number of quantum theory (QT)-based information retrieval (IR) models have been proposed for modeling the session search task, in which users issue queries continuously in order to describe their evolving information needs (IN). However, the standard formalism of QT cannot provide a complete description of users’ current IN, in the sense that it does not take ‘future’ information into consideration. Therefore, to seek a more proper and complete representation of users’ IN, we construct a representation of quasi-current IN inspired by the emerging Two-State Vector Formalism (TSVF). Motivated by the completeness of TSVF, a “two-state vector” derived from the ‘future’ (the current query) and the ‘history’ (the previous query) is employed to describe users’ quasi-current IN in a more complete way. Extensive experiments conducted on the session tracks of TREC 2013 & 2014 show that our model outperforms a series of compared IR models.