Search CORE

61,670 research outputs found

Recommended from our members

Neural Models for Information Retrieval without Labeled Data

Author: Zamani Hamed
Publication venue: ScholarWorks@UMass Amherst
Publication date: 30/10/2019
Field of study

Recent developments of machine learning models, and in particular deep neural networks, have yielded significant improvements on several computer vision, natural language processing, and speech recognition tasks. Progress with information retrieval (IR) tasks has been slower, however, due to the lack of large-scale training data as well as neural network models specifically designed for effective information retrieval. In this dissertation, we address these two issues by introducing task-specific neural network architectures for a set of IR tasks and proposing novel unsupervised or \emph{weakly supervised} solutions for training the models. The proposed learning solutions do not require labeled training data. Instead, in our weak supervision approach, neural models are trained on a large set of noisy and biased training data obtained from external resources, existing models, or heuristics. We first introduce relevance-based embedding models that learn distributed representations for words and queries. We show that the learned representations can be effectively employed for a set of IR tasks, including query expansion, pseudo-relevance feedback, and query classification. We further propose a standalone learning to rank model based on deep neural networks. Our model learns a sparse representation for queries and documents. This enables us to perform efficient retrieval by constructing an inverted index in the learned semantic space. Our model outperforms state-of-the-art retrieval models, while performing as efficiently as term matching retrieval models. We additionally propose a neural network framework for predicting the performance of a retrieval model for a given query. Inspired by existing query performance prediction models, our framework integrates several information sources, such as retrieval score distribution and term distribution in the top retrieved documents. This leads to state-of-the-art results for the performance prediction task on various standard collections. We finally bridge the gap between retrieval and recommendation models, as the two key components in most information systems. Search and recommendation often share the same goal: helping people get the information they need at the right time. Therefore, joint modeling and optimization of search engines and recommender systems could potentially benefit both systems. In more detail, we introduce a retrieval model that is trained using user-item interaction (e.g., recommendation data), with no need to query-document relevance information for training. Our solutions and findings in this dissertation smooth the path towards learning efficient and effective models for various information retrieval and related tasks, especially when large-scale training data is not available

ScholarWorks@UMass Amherst

Strategies for Searching Video Content with Text Queries or Video Examples

Author: Chang Xiaojun
Du Xingzhong
Gan Chuang
Hauptmann Alexander G.
Jiang Lu
Lan Zhenzhong
Li Huan
Li Xuanchong
Lin Ming
Ma Zhigang
Mao Zexi
Meng Deyu
Xu Shicheng
Xu Zhongwen
Yang Yi
Yu Shoou-I
Publication venue
Publication date: 01/01/2016
Field of study

The large number of user-generated videos uploaded on to the Internet everyday has led to many commercial video search engines, which mainly rely on text metadata for search. However, metadata is often lacking for user-generated videos, thus these videos are unsearchable by current search engines. Therefore, content-based video retrieval (CBVR) tackles this metadata-scarcity problem by directly analyzing the visual and audio streams of each video. CBVR encompasses multiple research topics, including low-level feature design, feature fusion, semantic detector training and video search/reranking. We present novel strategies in these topics to enhance CBVR in both accuracy and speed under different query inputs, including pure textual queries and query by video examples. Our proposed strategies have been incorporated into our submission for the TRECVID 2014 Multimedia Event Detection evaluation, where our system outperformed other submissions in both text queries and video example queries, thus demonstrating the effectiveness of our proposed approaches

arXiv.org e-Print Archive

OPUS - University of Technology Sydney

Multi-Task Learning for Email Search Ranking with Auxiliary Query Clustering

Author: Bendersky Michael
Karimzadehgan Maryam
Metzler Donald
Qin Zhen
Shen Jiaming
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/09/2018
Field of study

User information needs vary significantly across different tasks, and therefore their queries will also differ considerably in their expressiveness and semantics. Many studies have been proposed to model such query diversity by obtaining query types and building query-dependent ranking models. These studies typically require either a labeled query dataset or clicks from multiple users aggregated over the same document. These techniques, however, are not applicable when manual query labeling is not viable, and aggregated clicks are unavailable due to the private nature of the document collection, e.g., in email search scenarios. In this paper, we study how to obtain query type in an unsupervised fashion and how to incorporate this information into query-dependent ranking models. We first develop a hierarchical clustering algorithm based on truncated SVD and varimax rotation to obtain coarse-to-fine query types. Then, we study three query-dependent ranking models, including two neural models that leverage query type information as additional features, and one novel multi-task neural model that views query type as the label for the auxiliary query cluster prediction task. This multi-task model is trained to simultaneously rank documents and predict query types. Our experiments on tens of millions of real-world email search queries demonstrate that the proposed multi-task model can significantly outperform the baseline neural ranking models, which either do not incorporate query type information or just simply feed query type as an additional feature.Comment: CIKM 201

arXiv.org e-Print Archive

Crossref

Exact and efficient top-K inference for multi-target prediction by querying separable linear relational models

Author: De Baets Bernard
Dembczynski Krzysztof
Stock Michiel
Waegeman Willem
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

Many complex multi-target prediction problems that concern large target spaces are characterised by a need for efficient prediction strategies that avoid the computation of predictions for all targets explicitly. Examples of such problems emerge in several subfields of machine learning, such as collaborative filtering, multi-label classification, dyadic prediction and biological network inference. In this article we analyse efficient and exact algorithms for computing the top-

K

predictions in the above problem settings, using a general class of models that we refer to as separable linear relational models. We show how to use those inference algorithms, which are modifications of well-known information retrieval methods, in a variety of machine learning settings. Furthermore, we study the possibility of scoring items incompletely, while still retaining an exact top-K retrieval. Experimental results in several application domains reveal that the so-called threshold algorithm is very scalable, performing often many orders of magnitude more efficiently than the naive approach

arXiv.org e-Print Archive

Ghent University Academic Bibliography

Simulated evaluation of faceted browsing based on feature selection

Author: Bernejo Lopez P.
Hopfgartner F.
Jose J.M.
Urruty T.
Villa R.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010
Field of study

In this paper we explore the limitations of facet based browsing which uses sub-needs of an information need for querying and organising the search process in video retrieval. The underlying assumption of this approach is that the search effectiveness will be enhanced if such an approach is employed for interactive video retrieval using textual and visual features. We explore the performance bounds of a faceted system by carrying out a simulated user evaluation on TRECVid data sets, and also on the logs of a prior user experiment with the system. We first present a methodology to reduce the dimensionality of features by selecting the most important ones. Then, we discuss the simulated evaluation strategies employed in our evaluation and the effect on the use of both textual and visual features. Facets created by users are simulated by clustering video shots using textual and visual features. The experimental results of our study demonstrate that the faceted browser can potentially improve the search effectiveness

Enlighten