13 research outputs found
Contextual compositionality detection with external knowledge bases and word embeddings
When the meaning of a phrase cannot be inferred from the individual meanings of its words (e.g., hot dog), that phrase is said to be non-compositional. Automatic compositionality detection in multiword phrases is critical in any application of semantic processing, such as search engines [9]; failing to detect non-compositional phrases can notably hurt system effectiveness. Existing research treats phrases as either compositional or non-compositional in a deterministic manner. In this paper, we operationalize the viewpoint that compositionality is contextual rather than deterministic, i.e., that whether a phrase is compositional or non-compositional depends on its context. For example, the phrase "green card" is compositional when referring to a green colored card, whereas it is non-compositional when meaning permanent residence authorization. We address the challenge of detecting this type of contextual compositionality as follows: given a multi-word phrase, we enrich the word embedding representing its semantics with evidence about its global context (terms it often collocates with) as well as its local context (narratives where that phrase is used, which we call usage scenarios). We further extend this representation with information extracted from external knowledge bases. The resulting representation incorporates both localized context and more general usage of the phrase, and allows its compositionality to be detected in a non-deterministic and contextual way. Empirical evaluation of our model on a dataset of phrase compositionality, manually collected by crowdsourcing contextual compositionality assessments, shows that our model notably outperforms state-of-the-art baselines at detecting phrase compositionality.
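The intuition behind embedding-based compositionality detection can be sketched as follows: compose the vectors of the individual words and compare the result with the phrase vector observed in context; low similarity suggests non-compositionality. This is a minimal illustrative sketch with made-up toy vectors, not the paper's actual model (which additionally uses collocates, usage scenarios, and knowledge bases).

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy embeddings (hypothetical values, for illustration only).
vec = {
    "green": [0.9, 0.1, 0.0],
    "card":  [0.1, 0.9, 0.0],
    # Phrase vector as used in the "permanent residency" context:
    "green card (residency)": [0.1, 0.2, 0.95],
    # Phrase vector as used in the "green colored card" context:
    "green card (colour)": [0.55, 0.5, 0.05],
}

def compositionality(phrase_vec, word_vecs):
    # Compose word vectors by averaging, then compare with the phrase
    # vector observed in context; a low score suggests the phrase is
    # non-compositional in that context.
    composed = [sum(xs) / len(xs) for xs in zip(*word_vecs)]
    return cosine(phrase_vec, composed)

words = [vec["green"], vec["card"]]
print(compositionality(vec["green card (colour)"], words))     # high similarity
print(compositionality(vec["green card (residency)"], words))  # low similarity
```

A contextual detector would then threshold or classify this score per occurrence of the phrase, rather than assigning one label per phrase type.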
Exploiting the Bipartite Structure of Entity Grids for Document Coherence and Retrieval
Document coherence describes how much sense text makes in terms of its logical organisation and discourse flow. Even though coherence is a relatively difficult notion to quantify precisely, it can be approximated automatically. This type of coherence modelling is not only interesting in itself, but also useful for a number of other text processing tasks, including Information Retrieval (IR), where adjusting the ranking of documents according to both their relevance and their coherence has been shown to increase retrieval effectiveness. The state of the art in unsupervised coherence modelling represents documents as bipartite graphs of sentences and discourse entities, and then projects these bipartite graphs into one-mode undirected graphs. However, one-mode projections may incur significant loss of the information present in the original bipartite structure. To address this we present three novel graph metrics that compute document coherence on the original bipartite graph of sentences and entities. Evaluation on standard settings shows that: (i) one of our coherence metrics beats the state of the art in terms of coherence accuracy; and (ii) all three of our coherence metrics improve retrieval effectiveness because, as closer analysis reveals, they capture aspects of document quality that go undetected by both keyword-based standard ranking and by spam filtering. This work contributes document coherence metrics that are theoretically principled, parameter-free, and useful to IR.
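The bipartite structure in question can be illustrated with a toy document: each sentence is linked to the discourse entities it mentions. The sketch below is illustrative only (the metric shown, bipartite density, is a stand-in and not one of the paper's three metrics); it also shows why a one-mode projection loses information.

```python
# Hypothetical document: sentence index -> discourse entities it mentions.
doc = {
    0: {"Microsoft", "market"},
    1: {"Microsoft", "Netscape"},
    2: {"Netscape", "browser", "market"},
}

def bipartite_density(sent_entities):
    # Density of the bipartite sentence-entity graph: observed edges
    # divided by all possible sentence-entity pairs. Computed directly
    # on the bipartite graph, with no projection step.
    entities = set().union(*sent_entities.values())
    edges = sum(len(es) for es in sent_entities.values())
    return edges / (len(sent_entities) * len(entities))

def one_mode_projection(sent_entities):
    # Project onto sentences: connect two sentences iff they share at
    # least one entity. The projection records only *that* sentences
    # share entities, not *which* or *how many* -- information the
    # original bipartite graph keeps.
    sents = list(sent_entities)
    return {(a, b) for i, a in enumerate(sents) for b in sents[i + 1:]
            if sent_entities[a] & sent_entities[b]}

print(bipartite_density(doc))
print(one_mode_projection(doc))  # all three sentence pairs are connected
```

In this example the projection is a complete graph over the three sentences, so the distinct entity overlaps (Microsoft, market, Netscape) become indistinguishable after projection.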
Investigating the statistical properties of user-generated documents
The importance of the Internet as a communication medium is reflected in the large amount of documents being generated every day by users of the different services that take place online. In this work we analyze the properties of these online user-generated documents for some of the established services over the Internet (Kongregate, Twitter, Myspace and Slashdot) and compare them with a consolidated collection of standard information retrieval documents (from the Wall Street Journal, Associated Press and Financial Times, as part of the TREC ad-hoc collection). We investigate features such as document similarity, term burstiness, emoticons and Part-Of-Speech analysis, highlighting the applicability and limits of traditional content analysis and indexing techniques used in information retrieval to the new online user-generated documents.
On clustering and polyrepresentation
Polyrepresentation is one of the most prominent principles in a cognitive approach to interactive information seeking and retrieval. When it comes to interactive retrieval, clustering is another method for accessing information. While polyrepresentation has been explored and validated in a scenario where a system returns a ranking of documents, so far there are no insights into whether and how polyrepresentation and clustering can be combined. In this paper we discuss how both are related and present an approach to integrate polyrepresentation into clustering. We further report some initial evaluation results.
Personalized social query expansion using social annotations
© 2019, Springer-Verlag GmbH Germany, part of Springer Nature. Query expansion is a query pre-processing technique that adds to a given query terms that are likely to occur in relevant documents, in order to improve information retrieval accuracy. A key problem to solve is "how to identify the terms to be added to a query?" Considering social tagging systems as a data source, we propose an approach that selects terms based on (i) the semantic similarity between tags composing a query, (ii) a social proximity between the query and the user for a personalized expansion, and (iii) a strategy for expanding user queries on the fly. We demonstrate the effectiveness of our approach through an extensive evaluation on three large public datasets crawled from delicious, Flickr, and CiteULike. We show that the expanded queries built by our method provide more accurate results than the initial queries, increasing MAP by between 10% and 16% on the three datasets. We also compare our method to three state-of-the-art baselines, and show that our query expansion method yields significant improvements in MAP, with a boost of between 5% and 18%.
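The core idea of socially personalized expansion can be sketched with toy data: candidate expansion terms are tags that co-occur with the query tag, weighted by how close the tagging user is to the querying user. The data, weighting scheme, and helper names below are all hypothetical simplifications of the paper's approach, which uses proper semantic-similarity and social-proximity measures.

```python
from collections import Counter

# Hypothetical tagging data: (user, document, tags) -- illustration only.
bookmarks = [
    ("u1", "d1", ["python", "programming", "tutorial"]),
    ("u1", "d2", ["python", "scripting"]),
    ("u2", "d3", ["python", "snake", "biology"]),
    ("u2", "d4", ["snake", "reptile"]),
]

def expand(query_tag, user, top_k=2):
    # Score candidate tags by co-occurrence with the query tag, weighting
    # the querying user's own bookmarks double -- a crude stand-in for the
    # paper's social-proximity component, which personalizes the expansion.
    scores = Counter()
    for u, _doc, tags in bookmarks:
        if query_tag in tags:
            weight = 2 if u == user else 1
            for t in tags:
                if t != query_tag:
                    scores[t] += weight
    return [t for t, _ in scores.most_common(top_k)]

print(expand("python", "u1"))  # programming-flavoured expansion
print(expand("python", "u2"))  # biology-flavoured expansion
```

The same ambiguous tag ("python") expands differently for the two users, which is the personalization effect the paper targets.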
Report on ECIR 2016: 38th European Conference on Information Retrieval
The 38th European Conference on Information Retrieval took place from the 20th to the 23rd of March 2016 in Padua, Italy. This report summarizes the conference in terms of the presented keynotes, scientific and social programme, industry day, tutorials, workshops and student support.
A study of factuality, objectivity and relevance: three desiderata in large-scale information retrieval?
Much of the information processed by Information Retrieval (IR) systems is unreliable, biased, and generally untrustworthy [1], [2], [3]. Yet, factuality & objectivity detection is not a standard component of IR systems, even though it has been possible in Natural Language Processing (NLP) for the last decade. Motivated by this, we ask if and how factuality & objectivity detection may benefit IR. We answer this in two parts. First, we use state-of-the-art NLP to compute the probability of document factuality & objectivity in two TREC collections, and analyse its relation to document relevance. We find that factuality is strongly and positively correlated with document relevance, but objectivity is not. Second, we study the impact of factuality & objectivity on retrieval effectiveness by treating them as query-independent features that we combine with a competitive language modelling baseline. Experiments with 450 TREC queries show that factuality improves precision by >10% over strong baselines, especially for uncurated data used in web search; objectivity gives mixed results. An overall clear trend is that document factuality & objectivity is much more beneficial to IR when searching uncurated (e.g. web) documents vs. curated (e.g. state documentation and newswire articles). To our knowledge, this is the first study of factuality & objectivity for back-end IR, contributing novel findings about the relation between relevance and factuality/objectivity, and statistically significant gains to retrieval effectiveness in the competitive web search task.
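Combining a query-independent feature with a retrieval score is commonly done by interpolation. The sketch below is a generic illustration of that idea, not the paper's exact combination method; the mixing weight and scores are hypothetical.

```python
import math

def combined_score(lm_score, p_factual, lam=0.1):
    # Interpolate a (log-domain) language-model relevance score with a
    # query-independent factuality probability. lam is a hypothetical
    # mixing weight; taking the log of the probability keeps both terms
    # on a comparable log scale, so the prior does not dominate.
    return (1 - lam) * lm_score + lam * math.log(max(p_factual, 1e-9))

# All else equal, a higher factuality probability raises the final score.
print(combined_score(-12.0, 0.9))
print(combined_score(-12.0, 0.2))
```

In a reranking setting, documents would be reordered by this combined score instead of the language-model score alone.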
CLEF 2004: Ad Hoc Track Overview and Results Analysis
We describe the objectives and organization of the CLEF 2004 ad hoc track and discuss the main characteristics of the experiments. The results are analyzed and commented on, and their statistical significance is investigated. The paper concludes with some observations on the impact of the CLEF campaign on the state of the art in cross-language information retrieval.