1,755 research outputs found

Overview of the TREC 2013 federated web search track

Author: Demeester Thomas
Hiemstra D
Nguyen D
Trieschnigg D
Publication date: 01/01/2013

Lexical Query Modeling in Session Search

Author: de Rijke Maarten
Kanoulas Evangelos
Van Gysel Christophe
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016

Lexical query modeling has been the leading paradigm for session search. In this paper, we analyze TREC session query logs and compare the performance of different lexical matching approaches for session search. Naive methods based on term frequency weighting perform on par with specialized session models. In addition, we investigate the viability of lexical query models in the setting of session search. We give important insights into the potential and limitations of lexical query modeling for session search and propose future directions for the field of session search. Comment: ICTIR2016, Proceedings of the 2nd ACM International Conference on the Theory of Information Retrieval. 201

arXiv.org e-Print Archive

Crossref

International Migration, Integration and Social Cohesion online publications

UvA-DARE

CoaCor: Code Annotation for Code Retrieval with Reinforcement Learning

Author: Peddamail Jayavardhan Reddy
Sun Huan
Yao Ziyu
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019

To accelerate software development, much research has been performed to help people understand and reuse the huge amount of available code resources. Two important tasks have been widely studied: code retrieval, which aims to retrieve code snippets relevant to a given natural language query from a code base, and code annotation, where the goal is to annotate a code snippet with a natural language description. Despite their advancement in recent years, the two tasks are mostly explored separately. In this work, we investigate a novel perspective of Code annotation for Code retrieval (hence called 'CoaCor'), where a code annotation model is trained to generate a natural language annotation that can represent the semantic meaning of a given code snippet and can be leveraged by a code retrieval model to better distinguish relevant code snippets from others. To this end, we propose an effective framework based on reinforcement learning, which explicitly encourages the code annotation model to generate annotations that can be used for the retrieval task. Through extensive experiments, we show that code annotations generated by our framework are much more detailed and more useful for code retrieval, and they can further improve the performance of existing code retrieval models significantly. Comment: 10 pages, 2 figures. Accepted by The Web Conference (WWW) 201

arXiv.org e-Print Archive

Crossref

Entity Query Feature Expansion Using Knowledge Base Links

Author: Allan James
Dalton Jeffrey
Dietz Laura
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/07/2014

Recent advances in automatic entity linking and knowledge base construction have resulted in entity annotations for document and query collections, for example annotations of entities from large general-purpose knowledge bases such as Freebase and the Google Knowledge Graph. Understanding how to leverage these entity annotations of text to improve ad hoc document retrieval is an open research area. Query expansion is a commonly used technique to improve retrieval effectiveness. Most previous query expansion approaches focus on text, mainly using unigram concepts. In this paper, we propose a new technique, called entity query feature expansion (EQFE), which enriches the query with features from entities and their links to knowledge bases, including structured attributes and text. We experiment using both explicit query entity annotations and latent entities. We evaluate our technique on TREC text collections automatically annotated with knowledge base entity links, including the Google Freebase Annotations (FACC1) data. We find that entity-based feature expansion results in significant improvements in retrieval effectiveness over state-of-the-art text expansion approaches.

CiteSeerX

Enlighten

Resolving the Semantic Gap in Information Retrieval: Advanced Image Search Using Topic Models

Author: Nguyen Cam Tu
Publication date: 15/09/2011

Tohoku University, 徳山豪 (Tokuyama Takeshi), 課

Tohoku University Repository (TOUR) / 東北大学機関リポジトリ

Institutional Repositories DataBase (IRDB)

Enhancing Information Retrieval Relevance Using Touch Dynamics on Search Engine

Author: Cheruiyot Wilson
Masoud Athman A.
Ogada Kennedy
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 31/08/2015

This study explores the possibility of using touch dynamics on a search engine: user touch behavior is monitored and several unique features are extracted from it. These features are used to identify users and their traits. The results can be used to define each user's unique searching behavior automatically. Touch dynamics has been discussed in several studies in the context of user authentication and biometric identification for security purposes. This study establishes the possibility of using touch dynamics results to identify user search preferences and interests. It investigates a technique that combines personalized search with touch dynamics information to determine user preferences, interest measurement and context. Keywords: Personalized Search, Information Retrieval, Touch Dynamics, Search Engin

International Institute for Science, Technology and Education (IISTE): E-Journals

Fully Automated Fact Checking Using External Sources

Author: Barron-Cedeno Alberto
Karadzhov Georgi
Koychev Ivan
Marquez Lluis
Nakov Preslav
Publication date: 01/01/2017

Given the constantly growing proliferation of false claims online in recent years, there has also been growing research interest in automatically distinguishing false rumors from factually true claims. Here, we propose a general-purpose framework for fully automatic fact checking using external sources, tapping the potential of the entire Web as a knowledge source to confirm or reject a claim. Our framework uses a deep neural network with LSTM text encoding to combine semantic kernels with task-specific embeddings that encode a claim together with potentially relevant text fragments from the Web, taking the source reliability into account. The evaluation results show good performance on two different tasks and datasets: (i) rumor detection and (ii) fact checking of the answers to a question in community question answering forums. Comment: RANLP-201

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation

Author: Du Lun
Gu Wenchao
Han Shi
Shi Ensheng
Sun Hongbin
Wang Yanlin
Zhang Dongmei
Zhang Hongyu
Publication date: 07/04/2022

Code search aims to retrieve the most semantically relevant code snippet for a given natural language query. Recently, large-scale code pre-trained models such as CodeBERT and GraphCodeBERT have learned generic representations of source code and achieved substantial improvement on the code search task. However, the high-quality sequence-level representations of code snippets have not been sufficiently explored. In this paper, we propose a new approach with multimodal contrastive learning and soft data augmentation for code search. Multimodal contrastive learning is used to pull together the representations of code-query pairs and push apart the unpaired code snippets and queries. Moreover, data augmentation is critical in contrastive learning for learning high-quality representations. However, only semantic-preserving augmentations for source code are considered in existing work. In this work, we propose to do soft data augmentation by dynamically masking and replacing some tokens in code sequences to generate code snippets that are similar but not necessarily semantic-preserving as positive samples for paired queries. We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages. The experimental results show that our approach significantly outperforms the state-of-the-art methods. We also adapt our techniques to several pre-trained models such as RoBERTa and CodeBERT, and significantly boost their performance on the code search task.

arXiv.org e-Print Archive

A quasi-current representation for information needs inspired by Two-State Vector Formalism

Author: Hou Yuexian
Li Jingfei
Li Wenjie
Song Dawei
Wang Panpan
Zhang Yazhou
Publication venue: 'Elsevier BV'
Publication date: 01/09/2017

Recently, a number of quantum theory (QT)-based information retrieval (IR) models have been proposed for modeling the session search task, in which users issue queries continuously in order to describe their evolving information needs (IN). However, the standard formalism of QT cannot provide a complete description of users' current IN, in the sense that it does not take 'future' information into consideration. Therefore, to seek a more proper and complete representation of users' IN, we construct a representation of quasi-current IN inspired by an emerging Two-State Vector Formalism (TSVF). Building on the completeness of TSVF, a "two-state vector" derived from the 'future' (the current query) and the 'history' (the previous query) is employed to describe users' quasi-current IN in a more complete way. Extensive experiments are conducted on the session tracks of TREC 2013 & 2014, and show that our model outperforms a series of compared IR models.

Open Research Online (The Open University)