36,023 research outputs found
On the Feasibility of Automated Detection of Allusive Text Reuse
The detection of allusive text reuse is particularly challenging due to the
sparse evidence on which allusive references rely---commonly based on none or
very few shared words. Arguably, lexical semantics can be resorted to since
uncovering semantic relations between words has the potential to increase the
support underlying the allusion and alleviate the lexical sparsity. A further
obstacle is the lack of evaluation benchmark corpora, largely due to the highly
interpretative character of the annotation process. In the present paper, we
aim to elucidate the feasibility of automated allusion detection. We approach
the matter from an Information Retrieval perspective in which referencing texts
act as queries and referenced texts as relevant documents to be retrieved, and
estimate the difficulty of benchmark corpus compilation by a novel
inter-annotator agreement study on query segmentation. Furthermore, we
investigate to what extent the integration of lexical semantic information
derived from distributional models and ontologies can aid retrieving cases of
allusive reuse. The results show that (i) despite low agreement scores, using
manual queries considerably improves retrieval performance with respect to a
windowing approach, and that (ii) retrieval performance can be moderately
boosted with distributional semantics
NPLDA: A Deep Neural PLDA Model for Speaker Verification
The state-of-art approach for speaker verification consists of a neural
network based embedding extractor along with a backend generative model such as
the Probabilistic Linear Discriminant Analysis (PLDA). In this work, we propose
a neural network approach for backend modeling in speaker recognition. The
likelihood ratio score of the generative PLDA model is posed as a
discriminative similarity function and the learnable parameters of the score
function are optimized using a verification cost. The proposed model, termed as
neural PLDA (NPLDA), is initialized using the generative PLDA model parameters.
The loss function for the NPLDA model is an approximation of the minimum
detection cost function (DCF). The speaker recognition experiments using the
NPLDA model are performed on the speaker verificiation task in the VOiCES
datasets as well as the SITW challenge dataset. In these experiments, the NPLDA
model optimized using the proposed loss function improves significantly over
the state-of-art PLDA based speaker verification system.Comment: Published in Odyssey 2020, the Speaker and Language Recognition
Workshop (VOiCES Special Session). Link to GitHub Implementation:
https://github.com/iiscleap/NeuralPlda. arXiv admin note: substantial text
overlap with arXiv:2001.0703
Explicit diversification of event aspects for temporal summarization
During major events, such as emergencies and disasters, a large volume of information is reported on newswire and social media platforms. Temporal summarization (TS) approaches are used to automatically produce concise overviews of such events by extracting text snippets from related articles over time. Current TS approaches rely on a combination of event relevance and textual novelty for snippet selection. However, for events that span multiple days, textual novelty is often a poor criterion for selecting snippets, since many snippets are textually unique but are semantically redundant or non-informative. In this article, we propose a framework for the diversification of snippets using explicit event aspects, building on recent works in search result diversification. In particular, we first propose two techniques to identify explicit aspects that a user might want to see covered in a summary for different types of event. We then extend a state-of-the-art explicit diversification framework to maximize the coverage of these aspects when selecting summary snippets for unseen events. Through experimentation over the TREC TS 2013, 2014, and 2015 datasets, we show that explicit diversification for temporal summarization significantly outperforms classical novelty-based diversification, as the use of explicit event aspects reduces the amount of redundant and off-topic snippets returned, while also increasing summary timeliness
- …