Search CORE

6 research outputs found

Multi-Task Learning for Email Search Ranking with Auxiliary Query Clustering

Author: Bendersky Michael
Karimzadehgan Maryam
Metzler Donald
Qin Zhen
Shen Jiaming
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/09/2018

User information needs vary significantly across different tasks, and therefore their queries will also differ considerably in their expressiveness and semantics. Many studies have been proposed to model such query diversity by obtaining query types and building query-dependent ranking models. These studies typically require either a labeled query dataset or clicks from multiple users aggregated over the same document. These techniques, however, are not applicable when manual query labeling is not viable, and aggregated clicks are unavailable due to the private nature of the document collection, e.g., in email search scenarios. In this paper, we study how to obtain query type in an unsupervised fashion and how to incorporate this information into query-dependent ranking models. We first develop a hierarchical clustering algorithm based on truncated SVD and varimax rotation to obtain coarse-to-fine query types. Then, we study three query-dependent ranking models, including two neural models that leverage query type information as additional features, and one novel multi-task neural model that views query type as the label for the auxiliary query cluster prediction task. This multi-task model is trained to simultaneously rank documents and predict query types. Our experiments on tens of millions of real-world email search queries demonstrate that the proposed multi-task model can significantly outperform the baseline neural ranking models, which either do not incorporate query type information or just simply feed query type as an additional feature.
Comment: CIKM 201
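The joint training described in this abstract (rank documents and predict the query cluster at the same time) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the softmax cross-entropy form of both losses, the function names, and the `aux_weight` parameter that balances the two tasks.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Negative log-probability of the target under a softmax over logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_norm - logits[target_index]


def multi_task_loss(doc_scores, clicked_doc, cluster_logits, query_cluster,
                    aux_weight=0.5):
    """Ranking loss plus a weighted auxiliary query-cluster loss.

    aux_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    rank_loss = softmax_cross_entropy(doc_scores, clicked_doc)
    cluster_loss = softmax_cross_entropy(cluster_logits, query_cluster)
    return rank_loss + aux_weight * cluster_loss
```

Setting `aux_weight=0` in this sketch recovers a plain ranking loss, which corresponds to a baseline that ignores query type.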

arXiv.org e-Print Archive

Crossref

Accelerated Convergence for Counterfactual Learning to Rank

Author: Chapelle Olivier
Duchi John
Hazan Elad
Kingma Diederik P
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020

Counterfactual Learning to Rank (LTR) algorithms learn a ranking model from logged user interactions, often collected using a production system. Employing such an offline learning approach has many benefits compared to an online one, but it is challenging as user feedback often contains high levels of bias. Unbiased LTR uses Inverse Propensity Scoring (IPS) to enable unbiased learning from logged user interactions. One of the major difficulties in applying Stochastic Gradient Descent (SGD) approaches to counterfactual learning problems is the large variance introduced by the propensity weights. In this paper we show that the convergence rate of SGD approaches with IPS-weighted gradients suffers from the large variance introduced by the IPS weights: convergence is slow, especially when there are large IPS weights. To overcome this limitation, we propose a novel learning algorithm, called CounterSample, that has provably better convergence than standard IPS-weighted gradient descent methods. We prove that CounterSample converges faster and complement our theoretical findings with empirical results by performing extensive experimentation in a number of biased LTR scenarios -- across optimizers, batch sizes, and different degrees of position bias.
Comment: SIGIR 2020 full conference pape
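The variance issue this abstract refers to can be seen in a small worked sketch. It assumes a simple position-based click model, in which a click occurs with probability `propensity * relevance`; that model and the function name `ips_moments` are illustrative assumptions. The sketch shows only why IPS weights inflate variance and is not the CounterSample algorithm.

```python
def ips_moments(relevance, propensity):
    """Mean and variance of the IPS-weighted click c / propensity.

    Assumes c ~ Bernoulli(propensity * relevance), a position-based
    click model chosen here for illustration.
    """
    click_prob = propensity * relevance
    mean = click_prob / propensity  # equals relevance: the estimate is unbiased
    variance = click_prob * (1.0 - click_prob) / propensity ** 2
    return mean, variance


# Lower observation propensity leaves the mean unchanged but raises the variance.
for p in (1.0, 0.1, 0.01):
    print(p, ips_moments(0.5, p))
```

Under these assumptions the mean stays at the true relevance for every propensity, while the variance grows roughly in proportion to `1 / propensity`, which is the behaviour the abstract links to slow SGD convergence.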

arXiv.org e-Print Archive

Crossref

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Separate and Attend in Personal Email Search

Author: Abadi Martin
Devlin Jacob
Duchi John C.
Jaana Kalervo
Soboroff Ian
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 21/11/2019

In personal email search, user queries often impose different requirements on different aspects of the retrieved emails. For example, the query "my recent flight to the US" requires emails to be ranked based on both textual contents and recency of the email documents, while other queries such as "medical history" do not impose any constraints on the recency of the email. Recent deep learning-to-rank models for personal email search often directly concatenate dense numerical features (e.g., document age) with embedded sparse features (e.g., n-gram embeddings). In this paper, we first show with a set of experiments on synthetic datasets that direct concatenation of dense and sparse features does not lead to the optimal search performance of deep neural ranking models. To effectively incorporate both sparse and dense email features into personal email search ranking, we propose a novel neural model, SepAttn. SepAttn first builds two separate neural models to learn from sparse and dense features respectively, and then applies an attention mechanism at the prediction level to derive the final prediction from these two models. We conduct a comprehensive set of experiments on a large-scale email search dataset, and demonstrate that our SepAttn model consistently improves the search quality over the baseline models.
Comment: WSDM 202
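The idea of combining two separate models at the prediction level can be sketched as a softmax-weighted sum of their scores. This is a hedged illustration only: the abstract does not say how the attention weights are computed, so `attn_logits` is treated as a given input, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Normalize logits into weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_predictions(sparse_score, dense_score, attn_logits):
    """Attention-weighted mix of the sparse-model and dense-model scores.

    attn_logits holds two hypothetical attention logits, one per model.
    """
    w_sparse, w_dense = softmax(attn_logits)
    return w_sparse * sparse_score + w_dense * dense_score
```

With equal logits the sketch averages the two scores; as one logit grows, the final prediction follows the corresponding model more closely.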

arXiv.org e-Print Archive

Crossref

Reducción de dimensionalidad en Machine Learning. Diagnóstico de cáncer de mama basado en datos genómicos y de imagen

Author: Galarza Hernández Javier
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 12/12/2017

The aim of this project is to analyze some of the Machine Learning techniques currently used to extract information from Big Data. It includes a study of the statistical and algebraic tools involved in the calculations, and an application to the diagnosis and classification of breast cancer based on genomic and image data. Big Data must be pre-processed before information can be extracted from it. In this project we present different pre-processing techniques and analyze them and their impact on the resulting prediction models. Two Machine Learning models are presented: the first focuses on the diagnosis of breast cancer based on image data; the second is devoted to the classification of breast cancer subtypes and to the discovery of patterns using genomic and proteomic data. The two databases chosen are particularly well suited to demonstrating the Machine Learning techniques analyzed in the project and the corresponding pre-processing strategies.

Galarza Hernández, J. (2017). Reducción de dimensionalidad en Machine Learning. Diagnóstico de cáncer de mama basado en datos genómicos y de imagen. http://hdl.handle.net/10251/92565
TFG

RiuNet

A Novel Data Mining and Knowledge Discovery Framework for Digital Library Recommendations System based on User’s Feedback and Personalization

Author: Almaghrabi Maram
Publication date: 01/01/2021

University of Canberra Research Repository