
    Constructing Datasets for Multi-hop Reading Comprehension Across Documents

    Most Reading Comprehension methods limit themselves to queries which can be answered using a single sentence, paragraph, or document. Enabling models to combine disjoint pieces of textual evidence would extend the scope of machine comprehension methods, but currently there exist no resources to train and test this capability. We propose a novel task to encourage the development of models for text understanding across multiple documents and to investigate the limits of existing methods. In our task, a model learns to seek and combine evidence, effectively performing multi-hop (alias multi-step) inference. We devise a methodology to produce datasets for this task, given a collection of query-answer pairs and thematically linked documents. Two datasets from different domains are induced, and we identify potential pitfalls and devise circumvention strategies. We evaluate two previously proposed competitive models and find that one can integrate information across documents. However, both models struggle to select relevant information, as providing documents guaranteed to be relevant greatly improves their performance. While the models outperform several strong baselines, their best accuracy reaches 42.9% compared to human performance at 74.0%, leaving ample room for improvement.
    Comment: This paper directly corresponds to the TACL version (https://transacl.org/ojs/index.php/tacl/article/view/1325) apart from minor changes in wording, additional footnotes, and appendices.
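
    A minimal sketch of the kind of dataset-induction step the abstract describes: given a query entity and a known answer, search a bipartite graph linking documents to the entities they mention, and keep the instance only if the answer is reachable through a chain of documents (chains of length at least two give genuinely multi-hop instances). The graph, entity names, and helper below are invented for illustration and are not the authors' code.

```python
# Hypothetical sketch: induce a multi-hop instance by finding a document chain
# that connects the query entity to the answer via shared entity mentions.
from collections import deque

def find_support_chain(query_entity, answer, docs, mentions):
    """BFS over a bipartite document-entity graph.
    docs: {doc_id: entities mentioned}; mentions: {entity: doc_ids mentioning it}.
    Returns a list of doc_ids connecting query_entity to answer, or None."""
    frontier = deque([(query_entity, [])])
    seen = {query_entity}
    while frontier:
        entity, chain = frontier.popleft()
        for doc in mentions.get(entity, ()):
            if doc in chain:
                continue
            new_chain = chain + [doc]
            if answer in docs[doc]:
                return new_chain          # evidence chain found
            for nxt in docs[doc] - seen:  # hop to co-mentioned entities
                seen.add(nxt)
                frontier.append((nxt, new_chain))
    return None

docs = {"d1": {"Hanging Gardens", "Mumbai"}, "d2": {"Mumbai", "India"}}
mentions = {"Hanging Gardens": {"d1"}, "Mumbai": {"d1", "d2"}, "India": {"d2"}}
print(find_support_chain("Hanging Gardens", "India", docs, mentions))  # ['d1', 'd2']
```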

    Spoken content retrieval: A survey of techniques and technologies

    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.
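
    At its simplest, SCR treats ASR output as noisy text and applies standard IR ranking to it. The toy sketch below indexes two invented transcript strings with a bare-bones TF-IDF ranker; real systems also index lattices or confusion networks to hedge against recognition errors.

```python
# Toy sketch of the SCR core loop: treat ASR transcripts as noisy text and
# rank them with plain TF-IDF. Transcript strings are invented placeholders.
import math
from collections import Counter

transcripts = {
    "ep1": "today we discuss spoken content retrieval and indexing",
    "ep2": "interview about speech recognition errors in broadcasts",
}

def tfidf_rank(query, docs):
    tf = {d: Counter(text.split()) for d, text in docs.items()}
    df = Counter(w for counts in tf.values() for w in counts)  # document frequency
    n = len(docs)
    score = lambda d: sum(tf[d][w] * math.log(1 + n / df[w])
                          for w in query.split() if df[w])
    return sorted(docs, key=score, reverse=True)

print(tfidf_rank("spoken retrieval", transcripts))  # ['ep1', 'ep2']
```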

    The complexity of resolving conflicts on MAC

    We consider the fundamental problem of multiple stations competing to transmit on a multiple access channel (MAC). We are given $n$ stations, out of which at most $d$ are active and intend to transmit a message to other stations using the MAC. All stations are assumed to be synchronized according to a time clock. If $l$ stations transmit in the same round, then the MAC provides feedback indicating whether $l = 0$, $l \geq 2$ (a collision occurred), or $l = 1$. When $l = 1$, the single transmitting station successfully delivers its message, which is received by all other stations. The active stations must therefore schedule their transmissions so that each can transmit singly on the MAC, based only on the feedback received from the MAC in previous rounds. For this problem it was shown in [Greenberg, Winograd, {\em A Lower Bound on the Time Needed in the Worst Case to Resolve Conflicts Deterministically in Multiple Access Channels}, Journal of the ACM, 1985] that every deterministic adaptive algorithm requires $\Omega(d (\lg n)/(\lg d))$ rounds in the worst case. The fastest known deterministic adaptive algorithm requires $O(d \lg n)$ rounds. The gap between the upper and lower bounds is a factor of $O(\lg d)$, which is substantial for most values of $d$: when $d$ is a constant and when $d \in O(n^{\epsilon})$ (for any constant $\epsilon \leq 1$), the lower bound is respectively $O(\lg n)$ and $O(n)$, which is trivial in both cases. Nevertheless, the lower bound is interesting when $d \in \mathrm{poly}(\lg n)$. In this work, we present a novel counting argument to prove a tight lower bound of $\Omega(d \lg n)$ rounds for all deterministic adaptive algorithms, closing this long-standing open question.
    Comment: Xerox internal report, 27th July; 7 pages
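
    For context, the $O(d \lg n)$ upper bound mentioned above is achieved by the classic adaptive tree-splitting scheme, simulated below (a standard textbook algorithm, not code from this paper): the algorithm queries ID ranges, active stations in the queried range transmit, and the ternary feedback decides whether to prune, record a success, or split.

```python
# Simulation of the classic adaptive tree-splitting scheme (the known
# O(d lg n) upper bound; not code from this paper). Ternary feedback:
# l = 0 -> prune the range, l = 1 -> success, l >= 2 -> collision, split.
def resolve(active, n):
    rounds, resolved, stack = 0, [], [(0, n)]
    while stack:
        lo, hi = stack.pop()
        rounds += 1                               # one query round on the MAC
        senders = [s for s in active if lo <= s < hi and s not in resolved]
        if len(senders) == 1:                     # feedback l = 1
            resolved.append(senders[0])
        elif len(senders) >= 2:                   # feedback l >= 2: collision
            mid = (lo + hi) // 2
            stack += [(lo, mid), (mid, hi)]       # split the ID range
    return resolved, rounds

winners, used = resolve(active={3, 12, 13}, n=16)
print(winners, used)                              # all 3 resolved in ~d*lg n rounds
```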

    Monotonic Prefix Consistency in Distributed Systems

    We study the issue of data consistency in distributed systems. Specifically, we consider a distributed system that replicates its data at multiple sites, which is prone to partitions, and which is assumed to be available (in the sense that queries are always eventually answered). In such a setting, strong consistency, where all replicas of the system synchronously apply every operation, is impossible to implement. However, many weaker consistency criteria, which allow a greater number of behaviors than strong consistency, are implementable in available distributed systems. We focus on determining the strongest consistency criterion that can be implemented in a convergent and available distributed system that tolerates partitions, for objects whose set of operations can be split into updates and queries. We show that no criterion stronger than Monotonic Prefix Consistency (MPC) can be implemented.
    Comment: Submitted paper
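
    The sketch below is one illustrative reading of the MPC criterion, not the paper's formal definition: every query observes a prefix of a single global order of updates, and the prefixes observed at any one site only grow.

```python
# Small illustrative checker for (one reading of) Monotonic Prefix Consistency:
# each observation is a prefix of the global update order, and successive
# observations at the same site never shrink.
def is_mpc(global_order, site_observations):
    for obs_seq in site_observations.values():
        prev_len = 0
        for obs in obs_seq:
            if list(obs) != global_order[:len(obs)]:
                return False          # not a prefix of the global order
            if len(obs) < prev_len:
                return False          # site went backwards: not monotonic
            prev_len = len(obs)
    return True

order = ["u1", "u2", "u3"]
ok = {"siteA": [["u1"], ["u1", "u2"]], "siteB": [["u1", "u2", "u3"]]}
bad = {"siteA": [["u2"]]}             # observed u2 without its prefix u1
print(is_mpc(order, ok), is_mpc(order, bad))  # True False
```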

    Statistical structures for internet-scale data management

    Efficient query processing in traditional database management systems relies on statistics on base data. For centralized systems, there is a rich body of research results on such statistics, from simple aggregates to more elaborate synopses such as sketches and histograms. For Internet-scale distributed systems, on the other hand, statistics management still poses major challenges. With the work in this paper we aim to endow peer-to-peer data management over structured overlays with the power associated with such statistical information, with emphasis on meeting the scalability challenge. To this end, we first contribute efficient, accurate, and decentralized algorithms that can compute key aggregates such as Count, CountDistinct, Sum, and Average. We show how to construct several types of histograms, such as simple Equi-Width, Average-Shifted Equi-Width, and Equi-Depth histograms. We present a full-fledged open-source implementation of these tools for distributed statistical synopses, and report on a comprehensive experimental performance evaluation, evaluating our contributions in terms of efficiency, accuracy, and scalability.
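
    As a flavour of decentralized aggregate computation, the sketch below simulates push-sum gossip, a standard primitive for Sum/Average; this is background illustration only, as the paper's own algorithms run over structured (DHT) overlays and also cover CountDistinct and histograms.

```python
# Push-sum gossip for decentralized averaging: every node repeatedly splits its
# (sum, weight) pair, keeping half and sending half to a random peer. All
# local estimates s_i / w_i converge to the global average.
import random

def push_sum_average(values, rounds=50, seed=0):
    rng = random.Random(seed)
    n = len(values)
    s = list(values)          # running sums
    w = [1.0] * n             # running weights
    for _ in range(rounds):
        new_s, new_w = [0.0] * n, [0.0] * n
        for i in range(n):
            j = rng.randrange(n)              # random gossip partner
            for k in (i, j):                  # half to self, half to partner
                new_s[k] += s[i] / 2
                new_w[k] += w[i] / 2
        s, w = new_s, new_w
    return [si / wi for si, wi in zip(s, w)]

print(push_sum_average([10, 20, 30, 40]))     # each estimate -> 25.0
```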

    Fully decentralized computation of aggregates over data streams

    In several emerging applications, data is collected in massive streams at several distributed points of observation. A basic and challenging task is to allow every node to monitor a neighbourhood of interest by issuing continuous aggregate queries on the streams observed in its vicinity. This class of algorithms is fully decentralized and diffusive in nature: collecting all data at a few central nodes of the network is infeasible in networks of low-capability devices or in the presence of massive data sets. The main difficulty in designing diffusive algorithms is to cope with duplicate detections. These arise both from the observation of the same event at several nodes of the network and from receipt of the same aggregated information along multiple paths of diffusion. In this paper, we consider fully decentralized algorithms that answer locally continuous aggregate queries on the number of distinct events, the total number of events, and the second frequency moment in the scenario outlined above. The proposed algorithms use sublinear space at every node, in the worst case or on realistic distributions. We also propose strategies that minimize the communication needed to update the aggregates when new events are observed. We experimentally evaluate the efficiency and accuracy of our algorithms on realistic simulated scenarios.
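
    The standard trick for making duplicate detections harmless is sketched below with a Flajolet-Martin-style bitmap (an assumption about the flavour of synopsis, not the paper's exact construction): sketches merge by bitwise OR, which is idempotent, so seeing the same event or the same partial aggregate twice changes nothing.

```python
# Duplicate-insensitive distinct counting via a Flajolet-Martin-style bitmap.
import hashlib

def rho(event):
    """Position of the lowest set bit of the event's hash."""
    h = int(hashlib.sha256(event.encode()).hexdigest(), 16)
    return (h & -h).bit_length() - 1

def sketch(events):
    bitmap = 0
    for e in events:
        bitmap |= 1 << rho(e)         # re-observing an event sets the same bit
    return bitmap

def estimate(bitmap):
    k = 0
    while bitmap >> k & 1:
        k += 1                        # index of the first zero bit
    return int(2 ** k / 0.77351)      # standard FM correction factor

a = sketch(["e1", "e2", "e3"])        # node A's observations
b = sketch(["e2", "e3", "e4"])        # node B overlaps with A
merged = a | b                        # OR-merge is idempotent: duplicates vanish
print(estimate(merged))               # coarse distinct-count estimate (noisy;
                                      # real use averages many such sketches)
```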

    On User Modelling for Personalised News Video Recommendation

    In this paper, we introduce a novel approach for modelling user interests. Our approach captures users' evolving information needs, identifies aspects of their needs, and recommends relevant news items to the users. We introduce our approach within the context of personalised news video retrieval. A news video data set is used for experimentation, and we employ a simulated user evaluation.
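
    A toy sketch of one way such an evolving interest profile can be realized (my own illustration, not the paper's model): terms from consumed news items are accumulated with exponential decay so that recent interests outweigh stale ones, and candidate items are ranked against the profile.

```python
# Hypothetical evolving interest profile: a decayed weighted bag of terms.
from collections import Counter

def update_profile(profile, clicked_terms, decay=0.8):
    profile = Counter({t: w * decay for t, w in profile.items()})  # age old interests
    profile.update(clicked_terms)                                  # reinforce new ones
    return profile

def rank(profile, items):
    return sorted(items, key=lambda terms: sum(profile[t] for t in terms),
                  reverse=True)

p = Counter()
p = update_profile(p, ["election", "debate"])
p = update_profile(p, ["football", "cup"])        # interest drifts toward sports
items = [["election", "poll"], ["football", "final"]]
print(rank(p, items))                             # sports item now ranks first
```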

    Neural Architecture for Question Answering Using a Knowledge Graph and Web Corpus

    In Web search, entity-seeking queries often trigger a special Question Answering (QA) system. It may use a parser to interpret the question as a structured query, execute that on a knowledge graph (KG), and return direct entity responses. QA systems based on precise parsing tend to be brittle: minor syntax variations may dramatically change the response. Moreover, KG coverage is patchy. At the other extreme, a large corpus may provide broader coverage, but in an unstructured, unreliable form. We present AQQUCN, a QA system that gracefully combines KG and corpus evidence. AQQUCN accepts a broad spectrum of query syntax, from well-formed questions to short `telegraphic' keyword sequences. In the face of inherent query ambiguities, AQQUCN aggregates signals from KGs and large corpora to directly rank KG entities, rather than commit to one semantic interpretation of the query. AQQUCN models the ideal interpretation as an unobservable or latent variable. Interpretations and candidate entity responses are scored as pairs, by combining signals from multiple convolutional networks that operate collectively on the query, KG and corpus. On four public query workloads, amounting to over 8,000 queries with diverse query syntax, we see 5--16% absolute improvement in mean average precision (MAP), compared to the entity ranking performance of recent systems. Our system is also competitive at entity set retrieval, almost doubling F1 scores for challenging short queries.
    Comment: Accepted to Information Retrieval Journal
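
    Schematically, the ranking idea reads as follows (the scores and query below are invented; in AQQUCN they come from convolutional networks over the query, KG, and corpus): score (interpretation, entity) pairs, then rank each entity by aggregating over the latent interpretation instead of committing to one.

```python
# Schematic sketch of latent-interpretation entity ranking (illustrative only).
def rank_entities(pair_scores):
    """pair_scores: {(interpretation, entity): score}. Marginalize out the
    interpretation by taking each entity's best-scoring pair."""
    best = {}
    for (interp, entity), s in pair_scores.items():
        best[entity] = max(best.get(entity, float("-inf")), s)
    return sorted(best, key=best.get, reverse=True)

scores = {  # toy query: "woodrow wilson president university" (ambiguous intent)
    ("head-of-state", "Woodrow Wilson"): 0.4,
    ("university-president", "Woodrow Wilson"): 0.9,
    ("university", "Princeton University"): 0.6,
}
print(rank_entities(scores))  # ['Woodrow Wilson', 'Princeton University']
```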