Search CORE

4,935 research outputs found

Benchmarking news recommendations: the CLEF NewsREEL use case

Author: Brodt Torben
Hopfgartner Frank
Kille Benjamin
Larson Martha
Lommatzsch Andreas
Seiler Jonas
Serény Andrá
Turrin Roberto
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/12/2015
Field of study

The CLEF NewsREEL challenge is a campaign-style evaluation lab allowing participants to evaluate and optimize news recommender algorithms. The goal is to create an algorithm that is able to generate news items that users would click, respecting a strict time constraint. The lab challenges participants to compete in either a "living lab" (Task 1) or perform an evaluation that replays recorded streams (Task 2). In this report, we discuss the objectives and challenges of the NewsREEL lab, summarize last year's campaign and outline the main research challenges that can be addressed by participating in NewsREEL 2016

Crossref

Enlighten

Query Expansion with Locally-Trained Word Embeddings

Author: Craswell Nick
Diaz Fernando
Mitra Bhaskar
Publication venue
Publication date: 01/01/2016
Field of study

Continuous space word embeddings have received a great deal of attention in the natural language processing and machine learning communities for their ability to model term similarity and other relationships. We study the use of term relatedness in the context of query expansion for ad hoc information retrieval. We demonstrate that word embeddings such as word2vec and GloVe, when trained globally, underperform corpus and query specific embeddings for retrieval tasks. These results suggest that other tasks benefiting from global embeddings may also benefit from local embeddings

arXiv.org e-Print Archive

Crossref

Technology Assisted Reviews: Finding the Last Few Relevant Documents by Asking Yes/No Questions to Reviewers

Author: Gordon
Grossman Maura R.
Kanoulas Evangelos
O'Mara-Eves Alison
Wen Zheng
Publication venue
Publication date: 01/01/2018
Field of study

The goal of a technology-assisted review is to achieve high recall with low human effort. Continuous active learning algorithms have demonstrated good performance in locating the majority of relevant documents in a collection, however their performance is reaching a plateau when 80\%-90\% of them has been found. Finding the last few relevant documents typically requires exhaustively reviewing the collection. In this paper, we propose a novel method to identify these last few, but significant, documents efficiently. Our method makes the hypothesis that entities carry vital information in documents, and that reviewers can answer questions about the presence or absence of an entity in the missing relevance documents. Based on this we devise a sequential Bayesian search method that selects the optimal sequence of questions to ask. The experimental results show that our proposed method can greatly improve performance requiring less reviewing effort.Comment: This paper is accepted by SIGIR 201

arXiv.org e-Print Archive

Crossref

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning

Author: B Mitra
C Carpineto
C Peters
G Amati
KD Onal
M Braschler
M Johnson
P Yang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 30/12/2019
Field of study

While billions of non-English speaking users rely on search engines every day, the problem of ad-hoc information retrieval is rarely studied for non-English languages. This is primarily due to a lack of data set that are suitable to train ranking algorithms. In this paper, we tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on English collections to non-English queries and documents. Our model is evaluated in a zero-shot setting, meaning that we use them to predict relevance scores for query-document pairs in languages never seen during training. Our results show that the proposed approach can significantly outperform unsupervised retrieval techniques for Arabic, Chinese Mandarin, and Spanish. We also show that augmenting the English training collection with some examples from the target language can sometimes improve performance.Comment: ECIR 2020 (short

arXiv.org e-Print Archive

Crossref

REST: A Thread Embedding Approach for Identifying and Classifying User-specified Information in Security Forums

Author: Faloutsos Michalis
Gharibshah Joobin
Papalexakis Evangelos E
Publication venue: eScholarship, University of California
Publication date: 08/01/2020
Field of study

How can we extract useful information from a security forum? We focus on identifying threads of interest to a security professional: (a) alerts of worrisome events, such as attacks, (b) offering of malicious services and products, (c) hacking information to perform malicious acts, and (d) useful security-related experiences. The analysis of security forums is in its infancy despite several promising recent works. Novel approaches are needed to address the challenges in this domain: (a) the difficulty in specifying the "topics" of interest efficiently, and (b) the unstructured and informal nature of the text. We propose, REST, a systematic methodology to: (a) identify threads of interest based on a, possibly incomplete, bag of words, and (b) classify them into one of the four classes above. The key novelty of the work is a multi-step weighted embedding approach: we project words, threads and classes in appropriate embedding spaces and establish relevance and similarity there. We evaluate our method with real data from three security forums with a total of 164k posts and 21K threads. First, REST robustness to initial keyword selection can extend the user-provided keyword set and thus, it can recover from missing keywords. Second, REST categorizes the threads into the classes of interest with superior accuracy compared to five other methods: REST exhibits an accuracy between 63.3-76.9%. We see our approach as a first step for harnessing the wealth of information of online forums in a user-friendly way, since the user can loosely specify her keywords of interest

arXiv.org e-Print Archive

eScholarship - University of California

Association for the Advancement of Artificial Intelligence: AAAI Publications

Recommended from our members

REST: A thread embedding approach for identifying and classifying user-specified information in security forums

Author: Faloutsos Michalis
Publication venue: eScholarship, University of California
Publication date: 01/07/2020
Field of study

eScholarship - University of California

Language Models

Author: Hiemstra D.
Publication venue: Springer Verlag
Publication date: 01/01/2009
Field of study

Contains fulltext : 227630.pdf (preprint version ) (Open Access

Radboud Repository

University of Twente Research Information

Deriving query suggestions for site search

Author: Albakour
Albakour
Albakour
Baeza-Yates
Baeza-Yates
Beitzel
Belkin
Chau
Clark
Di Caro
Dorigo
Dumais
Efthimiadis
Fonseca
Gayo-Avello
Hawking
Jansen
Jansen
Jansen
Jansen
Joachims
Justeson
Kruschwitz
Kruschwitz
Kruschwitz
Kruschwitz
Kruschwitz
Manning
Marchionini
Markey
Martens
Ruthven
Silvestri
Socha
Tunkelang
Wang
White
White
Publication venue: 'Wiley'
Publication date: 01/01/2013
Field of study

Modern search engines have been moving away from simplistic interfaces that aimed at satisfying a user's need with a single-shot query. Interactive features are now integral parts of web search engines. However, generating good query modification suggestions remains a challenging issue. Query log analysis is one of the major strands of work in this direction. Although much research has been performed on query logs collected on the web as a whole, query log analysis to enhance search on smaller and more focused collections has attracted less attention, despite its increasing practical importance. In this article, we report on a systematic study of different query modification methods applied to a substantial query log collected on a local website that already uses an interactive search engine. We conducted experiments in which we asked users to assess the relevance of potential query modification suggestions that have been constructed using a range of log analysis methods and different baseline approaches. The experimental results demonstrate the usefulness of log analysis to extract query modification suggestions. Furthermore, our experiments demonstrate that a more fine-grained approach than grouping search requests into sessions allows for extraction of better refinement terms from query log files. © 2013 ASIS&T

University of Essex Research Repository

CiteSeerX

Crossref

University of Regensburg Publication Server

Open Research Online