Search CORE

61 research outputs found

An Empirical Analysis on Point-wise Machine Learning Techniques using Regression Trees for Web-search Ranking

Author: Mohan Ananth
Publication venue: Washington University Open Scholarship
Publication date: 01/01/2010

Learning how to rank a set of objects relative to a user-defined query has received much interest in the machine learning community during the past decade. In fact, two recent competitions hosted by internationally prominent search companies have encouraged research on ranking website documents. Recent literature on learning to rank has focused on three approaches: point-wise, pair-wise, and list-wise. Many kinds of classifiers, including boosted decision trees, neural networks, and SVMs, have proven successful in the field. This thesis surveys traditional point-wise techniques that use regression trees for web-search ranking. It contains empirical studies of Random Forests and Gradient Boosted Decision Trees, including novel augmentations to both, on real-world data sets. We also analyze how these point-wise techniques perform in two new areas of research for web-search ranking: transfer learning and feature-cost-aware models.
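In the point-wise approach, each query-document pair is treated as an independent regression example whose target is its graded relevance label; documents are then ranked by predicted score. The sketch below is a minimal, assumed illustration using gradient boosting with depth-1 regression trees (stumps) under squared loss. It is not the thesis's implementation, and the function names, hyperparameters, and toy data are hypothetical.

```python
from typing import List, Tuple

# A stump is (feature index, threshold, value if <= threshold, value if > threshold).
Stump = Tuple[int, float, float, float]


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Return the single split that minimises squared error on the residuals r."""
    best_err, best = float("inf"), (0, 0.0, 0.0, 0.0)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[f] <= t]
            right = [ri for row, ri in zip(X, r) if row[f] > t]
            lv = sum(left) / len(left) if left else 0.0
            rv = sum(right) / len(right) if right else 0.0
            err = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if err < best_err:
                best_err, best = err, (f, t, lv, rv)
    return best


def predict_stump(s: Stump, row: List[float]) -> float:
    f, t, lv, rv = s
    return lv if row[f] <= t else rv


def fit_gbdt(X: List[List[float]], y: List[float], n_rounds: int = 50, lr: float = 0.1):
    """Gradient boosting with squared loss: each stump is fit to the current residuals."""
    base = sum(y) / len(y)  # start from the mean relevance label
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared loss
        s = fit_stump(X, residuals)
        stumps.append(s)
        pred = [p + lr * predict_stump(s, row) for p, row in zip(pred, X)]
    return base, lr, stumps


def score(model, row: List[float]) -> float:
    base, lr, stumps = model
    return base + lr * sum(predict_stump(s, row) for s in stumps)


def rank(model, docs: List[List[float]]) -> List[int]:
    """Point-wise ranking: score each document independently, then sort descending."""
    return sorted(range(len(docs)), key=lambda i: score(model, docs[i]), reverse=True)
```

Because every document is scored in isolation, the loss never looks at pairs or whole lists; that independence is what distinguishes point-wise methods from the pair-wise and list-wise approaches mentioned above.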

Washington University St. Louis: Open Scholarship

Living analytics methods for the social web

Author: Diaz-Aviles Ernesto
Publication venue: Gottfried Wilhelm Leibniz Universität Hannover
Publication date: 01/01/2013

[no abstract]

Institutionelles Repositorium der Leibniz Universität Hannover

Combining Word Embedding Interactions and LETOR Feature Evidences for Supervised QPP

Author: Datta Suchana
Ganguly Debasis
Mothe Josiane
Ullah Md Zia
Publication date: 31/03/2023

In information retrieval, query performance prediction aims to predict whether a search engine is likely to succeed in retrieving potentially relevant documents for a user’s query. This problem is usually cast as a regression problem in which a machine should predict the effectiveness (in terms of an information retrieval measure) of the search engine on a given query. The solutions range from simple unsupervised approaches, where a single source of information (e.g., the variance of the retrieval similarity scores in NQC) predicts the search engine's effectiveness for a given query, to more involved ones that rely on supervised machine learning and make use of several sources of information, e.g., learning to rank (LETOR) features, word embedding similarities, etc. In this paper, we investigate combining two different types of evidence in a single neural network model. While our first source of information corresponds to the semantic interaction between the terms in queries and their top-retrieved documents, our second source of information consists of LETOR features.
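As an illustration of the single-source unsupervised predictors mentioned above, the sketch below computes NQC (Normalized Query Commitment) in its common formulation: the standard deviation of the top-k retrieval scores, divided by the absolute retrieval score of the whole corpus treated as one document. The corpus score is assumed to be supplied by the caller, and the default k = 100 is an assumption, not a value taken from the paper.

```python
import math
from typing import Sequence


def nqc(scores: Sequence[float], corpus_score: float, k: int = 100) -> float:
    """NQC: spread of the top-k retrieval scores, normalised by |corpus score|.

    scores       -- retrieval scores of the documents returned for one query
    corpus_score -- score of the corpus treated as a single document (assumed given)
    k            -- number of top documents considered (assumed default)
    """
    top = sorted(scores, reverse=True)[:k]
    mu = sum(top) / len(top)
    std = math.sqrt(sum((s - mu) ** 2 for s in top) / len(top))
    return std / abs(corpus_score)
```

A higher NQC is usually read as predicting a more effective retrieval for that query; in a supervised setup like the one described, such a value could serve as one input feature among many.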

Enlighten

Biomedical information extraction for matching patients to clinical trials

Author: Araújo Gonçalo Carmo de
Publication date: 01/01/2018

Digital medical information has grown enormously over the last decades, driven by an unprecedented number of medical writers, which has led to a complete revolution in what information, and how much of it, is available to health professionals. The problem with this wave of information is that precisely selecting among the results returned by medical information repositories is exhausting and time-consuming for physicians. This is one of the biggest challenges physicians face in the new digital era: how to reduce the time spent finding the best-matching document for a patient (e.g. intervention articles, clinical trials, prescriptions). Precision Medicine (PM) 2017 is a Text REtrieval Conference (TREC) track focused exclusively on this type of challenge in oncology. Because it provides a dataset with a large number of clinical trials, the track is a good real-life example of how information retrieval solutions can be used to solve these types of problems, and a very good starting point for applying information extraction and retrieval methods in a very complex domain. The purpose of this thesis is to improve a system designed by the NovaSearch team for the TREC PM 2017 Clinical Trials task, which ranked among the top-5 systems of 2017. The NovaSearch team also participated in the 2018 track and achieved a 15% increase in precision compared with its 2017 system. Multiple IR techniques were used for information extraction and data processing, including rank fusion, query expansion (e.g. pseudo-relevance feedback, MeSH term expansion), and experiments with Learning to Rank (LETOR) algorithms. Our goal is to retrieve the best possible set of trials for a given patient, using precise document filters to exclude unwanted clinical trials.
This work can open doors to new ways of searching for and interpreting the criteria used to include or exclude trials, helping physicians even with the most complex and difficult information retrieval tasks.
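The abstract lists rank fusion without saying which fusion method NovaSearch used. Reciprocal rank fusion (RRF, Cormack et al., 2009) is one widely used option, so the sketch below assumes RRF with its customary constant k = 60; the trial identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(runs: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists: each document scores sum(1 / (k + rank)) over runs.

    runs -- ranked lists of document ids, best first (rank 1 = first element)
    k    -- damping constant; 60 is the value commonly used for RRF (assumed here)
    """
    scores: Dict[str, float] = defaultdict(float)
    for run in runs:
        for rank, doc_id in enumerate(run, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)
```

Because RRF uses only rank positions, runs from systems with incomparable score scales (for example, an expanded and an unexpanded query) can be merged without score normalisation.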

Repositório da Universidade Nova de Lisboa

Optimizing web search engines with interactions

Author: Grotov A.
Publication date: 01/01/2018

International Migration, Integration and Social Cohesion online publications

Acceleration of ListNet for ranking using reconfigurable architecture

Author: Li Qiang
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/04/2020

Document ranking is used to order query results by relevance with ranking models. ListNet is a well-known ranking approach for constructing and training learning-to-rank models. Compared with traditional learning approaches, ListNet delivers better accuracy, but it is computationally too expensive to learn models with large data sets due to the large number of permutations and documents involved in computing the gradients. Currently, the long training time limits the practicality of ListNet in ranking applications such as breaking news search and stock prediction, and this situation is getting worse as data sets grow. To tackle the challenge of long training time, this thesis optimises the ListNet algorithm and designs hardware accelerators for training ListNet models on Field Programmable Gate Arrays (FPGAs), making the algorithm more practical for real-world applications. The contributions of this thesis include: 1) A novel computation method of the ListNet algorithm for ranking that exposes more fine-grained parallelism for FPGA implementation. 2) A weighted sampling method that takes ranking positions into account, along with an effective quantisation method for FPGA devices. The proposed design achieves a 4.42x speedup over a GPU implementation while preserving accuracy. 3) A fully reconfigurable architecture for ListNet training using multiple bitstream kernels. The proposed method achieves higher model accuracy than pure fixed-point training and better throughput than pure floating-point training. By applying these techniques, this thesis accelerates the ListNet algorithm for ranking on FPGAs, achieving significant speed improvements over CPU and GPU implementations.
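The gradient cost discussed above comes from ListNet's permutation-probability loss. The original ListNet paper uses a top-one approximation: the scores of one query's documents are turned into a softmax distribution, and the loss is the cross-entropy between the distribution induced by the relevance labels, P_y, and the one induced by the model, P_z. For a linear scorer z_j = w·x_j this gives the gradient sum_j (P_z(j) - P_y(j)) x_j. The sketch below assumes that linear scorer and plain gradient descent in Python; it does not reproduce the thesis's FPGA computation method, weighted sampling, or quantisation, and the toy data are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(v: List[float]) -> List[float]:
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def listnet_step(
    w: List[float], X: List[List[float]], y: List[float], lr: float = 0.1
) -> Tuple[List[float], float]:
    """One gradient step of top-one ListNet for a linear scorer on a single query.

    X -- feature vectors of the query's documents; y -- their relevance labels.
    Returns the updated weights and the cross-entropy loss before the update.
    """
    z = [sum(wi * xi for wi, xi in zip(w, x)) for x in X]  # model scores
    p_y = softmax(y)  # target top-one probabilities from the labels
    p_z = softmax(z)  # model top-one probabilities
    loss = -sum(py * math.log(pz) for py, pz in zip(p_y, p_z))
    # dLoss/dz_j = P_z(j) - P_y(j); chain rule through z_j = w . x_j
    grad = [sum((pz - py) * x[f] for pz, py, x in zip(p_z, p_y, X)) for f in range(len(w))]
    return [wi - lr * g for wi, g in zip(w, grad)], loss
```

Every step touches every document of the query through the softmax normaliser, which is the per-query work that grows with data-set size and that hardware acceleration targets.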

Spiral - Imperial College Digital Repository

iQPP: A Benchmark for Image Query Performance Prediction

Author: Ionescu Radu Tudor
Mothe Josiane
Poesina Eduard
Publication date: 10/04/2023

To date, query performance prediction (QPP) in the context of content-based image retrieval remains a largely unexplored task, especially in the query-by-example scenario, where the query is an image. To boost the exploration of the QPP task in image retrieval, we propose the first benchmark for image query performance prediction (iQPP). First, we establish a set of four data sets (PASCAL VOC 2012, Caltech-101, ROxford5k and RParis6k) and estimate the ground-truth difficulty of each query as the average precision or the precision@k, using two state-of-the-art image retrieval models. Next, we propose and evaluate novel pre-retrieval and post-retrieval query performance predictors, comparing them with existing or adapted (from text to image) predictors. The empirical results show that most predictors do not generalize across evaluation scenarios. Our comprehensive experiments indicate that iQPP is a challenging benchmark, revealing an important research gap that needs to be addressed in future work. We release our code and data as open source at https://github.com/Eduard6421/iQPP to foster future research.
Comment: Accepted at SIGIR 202
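The benchmark defines each query's ground-truth difficulty through the average precision or precision@k of a retrieval run. A minimal sketch of both measures over a ranked list with binary relevance follows; AP is normalised by the total number of relevant items, which is the usual convention but an assumption here, and the identifiers are hypothetical.

```python
from typing import List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved items that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of precision@i over the ranks i at which a relevant item appears."""
    hits, total = 0, 0.0
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0
```

For example, with the ranking a, b, c, d and relevant items {a, c}, precision@2 is 0.5 and AP is (1/1 + 2/3) / 2 = 5/6; a QPP model is then evaluated on how well it predicts such per-query values.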

arXiv.org e-Print Archive

Finding related sentence pairs in MEDLINE

Author: C Friedman
CJ Rijsbergen van
DK Milton
EW Sayers
GW Furnas
H Zou
KL Currie
L Smith
Larry H. Smith
P Langley
Q Ma
R Artstein
R Wadden
S Jellali
T Dietterich
V Vapnik
W Wilbur
W. John Wilbur
WG Kim
WJ Wilbur
Z Lu
Publication venue: Springer Netherlands
Publication date: 01/01/2010

We explore the feasibility of automatically identifying sentences in different MEDLINE abstracts that are related in meaning. We compared traditional vector space models with machine learning methods for detecting relatedness, and found that machine learning was superior. The Huber method, a variant of Support Vector Machines that minimizes the modified Huber loss function, achieves 73% precision when the score cutoff is set high enough to identify about one related sentence per abstract on average. We illustrate how an abstract viewed in PubMed might be modified to present the related sentences found in other abstracts by this automatic procedure.
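For reference, the modified Huber loss named above is, for a label y in {-1, +1} and margin m = y·f(x), (max(0, 1 - m))^2 when m >= -1 and -4m otherwise; the two pieces meet with equal value and slope at m = -1. The sketch below pairs it with a plain stochastic-gradient linear classifier. The learning rate, regularisation weight, epoch count, and toy features are assumptions; the paper's actual features and training setup are not reproduced.

```python
from typing import List


def modified_huber(margin: float) -> float:
    """Modified Huber loss as a function of the margin m = y * f(x)."""
    if margin >= -1.0:
        return max(0.0, 1.0 - margin) ** 2
    return -4.0 * margin  # linear tail: less sensitive to badly misclassified outliers


def modified_huber_grad(margin: float) -> float:
    """Derivative of the loss with respect to the margin."""
    if margin >= -1.0:
        return -2.0 * max(0.0, 1.0 - margin)
    return -4.0


def train_linear(
    X: List[List[float]], y: List[int], epochs: int = 100, lr: float = 0.05, lam: float = 1e-3
) -> List[float]:
    """SGD on modified Huber loss plus an L2 penalty (all hyperparameters assumed)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, yi in zip(X, y):
            m = yi * sum(wj * xj for wj, xj in zip(w, x))
            g = modified_huber_grad(m)
            # dLoss/dw_j = g * y * x_j, plus the gradient of (lam / 2) * ||w||^2
            w = [wj - lr * (g * yi * xj + lam * wj) for wj, xj in zip(w, x)]
    return w
```

Ranking candidate sentence pairs by the learned score w·x and keeping only those above a high cutoff corresponds to the precision-oriented operating point described in the abstract.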

Crossref

Springer - Publisher Connector

PubMed Central