48 research outputs found
Query-Based Keyphrase Extraction from Long Documents
Transformer-based architectures in natural language processing impose input
size limits that can be problematic when long documents need to be processed.
This paper overcomes this issue for keyphrase extraction by chunking the long
documents while keeping a global context as a query defining the topic for
which relevant keyphrases should be extracted. The developed system employs a
pre-trained BERT model and adapts it to estimate the probability that a given
text span forms a keyphrase. We experimented using various context sizes on two
popular datasets, Inspec and SemEval, and a large novel dataset. The presented
results show that a shorter context with a query outperforms a longer one without
the query on long documents.
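The chunking idea in the abstract above can be illustrated with a minimal sketch. It assumes whitespace tokens, a fixed `[SEP]` separator, and non-overlapping windows; the function name `chunk_with_query` and its parameters are hypothetical and are not taken from the paper, which may build its query and windows differently.

```python
from typing import List


def chunk_with_query(doc_tokens: List[str], query_tokens: List[str],
                     max_len: int, sep: str = "[SEP]") -> List[List[str]]:
    """Split a long document into windows, each prefixed by the same query.

    Assumption: every chunk is `query + [sep] + window` and must fit in
    `max_len` tokens, mimicking a model's input size limit.
    """
    # Tokens left for the document once the query and separator are placed.
    budget = max_len - len(query_tokens) - 1
    if budget <= 0:
        raise ValueError("query leaves no room for document tokens")

    chunks = []
    for start in range(0, len(doc_tokens), budget):
        window = doc_tokens[start:start + budget]
        chunks.append(query_tokens + [sep] + window)
    return chunks


# Hypothetical toy input: a 10-token document and a 1-token query.
doc = "a b c d e f g h i j".split()
query = ["topic"]
chunks = chunk_with_query(doc, query, max_len=6)
for chunk in chunks:
    print(chunk)
```

Each chunk carries the same global query, so a span scorer applied to one window still sees what topic the keyphrases should relate to.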
Claim-Dissector: An Interpretable Fact-Checking System with Joint Re-ranking and Veracity Prediction
We present Claim-Dissector: a novel latent variable model for fact-checking
and analysis, which given a claim and a set of retrieved evidences jointly
learns to identify: (i) the relevant evidences to the given claim, (ii) the
veracity of the claim. We propose to disentangle the per-evidence relevance
probability and its contribution to the final veracity probability in an
interpretable way -- the final veracity probability is proportional to a linear
ensemble of per-evidence relevance probabilities. In this way, the individual
contributions of evidences towards the final predicted probability can be
identified. In per-evidence relevance probability, our model can further
distinguish whether each relevant evidence is supporting (S) or refuting (R)
the claim. This makes it possible to quantify how much the S/R probability contributes to
the final verdict or to detect disagreeing evidence.
Despite its interpretable nature, our system achieves results competitive
with state-of-the-art on the FEVER dataset, as compared to typical two-stage
system pipelines, while using significantly fewer parameters. It also sets a new
state of the art on the FAVIQ and RealFC datasets. Furthermore, our analysis shows
that our model can learn fine-grained relevance cues while using coarse-grained
supervision, and we demonstrate this in two ways. (i) We show that our model can
achieve competitive sentence recall while using only paragraph-level relevance
supervision. (ii) Traversing towards the finest granularity of relevance, we
show that our model is capable of identifying relevance at the token level. To
do this, we present a new benchmark TLR-FEVER focusing on token-level
interpretability -- humans annotate tokens in relevant evidences they
considered essential when making their judgment. We then measure how similar
these annotations are to the tokens our model focuses on.
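As a rough illustration of the "linear ensemble" idea in the Claim-Dissector abstract, the sketch below sums per-evidence supporting and refuting probabilities and normalizes them into a verdict. This is an assumption-laden simplification, not the paper's model: it uses only two classes (no "not enough information" class), uniform weights, and hypothetical function names.

```python
from typing import List, Tuple


def veracity_from_evidence(p_support: List[float],
                           p_refute: List[float]) -> Tuple[float, float]:
    """Return (P(supported), P(refuted)) as a normalized linear sum.

    Assumption: the verdict is proportional to the plain sum of the
    per-evidence probabilities; the real model may weight them differently.
    """
    s, r = sum(p_support), sum(p_refute)
    total = s + r
    if total == 0:
        raise ValueError("no evidence carries any relevance mass")
    return s / total, r / total


def contributions(p_support: List[float],
                  p_refute: List[float]) -> List[Tuple[float, float]]:
    """Per-evidence share of the final (supported, refuted) probabilities."""
    total = sum(p_support) + sum(p_refute)
    return [(ps / total, pr / total) for ps, pr in zip(p_support, p_refute)]


# Hypothetical scores for three retrieved evidences.
p_s = [0.5, 0.25, 0.0]   # supporting relevance per evidence
p_r = [0.0, 0.0, 0.25]   # refuting relevance per evidence

verdict = veracity_from_evidence(p_s, p_r)
shares = contributions(p_s, p_r)
print(verdict)
print(shares)
```

Because the verdict is a sum, each evidence's share can be read off directly, and an evidence whose refuting share is nonzero while the verdict leans "supported" flags a disagreement.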
Multi-mJ, kHz, 2.1-μm OPCPA for high-flux soft X-ray high-harmonic radiation
We report on a multi-mJ 2.1-μm OPCPA system operating at a 1-kHz repetition rate and pumped by a picosecond cryogenic Yb:YAG laser, and on phase-matched, high-flux high-harmonic soft X-ray generation.