48 research outputs found

Query-Based Keyphrase Extraction from Long Documents

Author: Docekal Martin
Smrz Pavel
Publication venue: 'University of Florida George A Smathers Libraries'
Publication date: 11/05/2022

Transformer-based architectures in natural language processing impose input size limits that can be problematic when long documents need to be processed. This paper overcomes this issue for keyphrase extraction by chunking long documents while keeping a global context as a query defining the topic for which relevant keyphrases should be extracted. The developed system employs a pre-trained BERT model and adapts it to estimate the probability that a given text span forms a keyphrase. We experimented with various context sizes on two popular datasets, Inspec and SemEval, and on a large novel dataset. The presented results show that, on long documents, a shorter context with a query outperforms a longer one without the query.
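
The chunking idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `chunk_with_query`, the whitespace-style tokens, the BERT-style `[CLS]`/`[SEP]` layout, and the choice of what serves as the query (for example, a document title) are all assumptions made for illustration.

```python
from typing import List


def chunk_with_query(
    doc_tokens: List[str], query_tokens: List[str], max_len: int
) -> List[List[str]]:
    """Split a long document into model inputs that each carry the same query.

    Hypothetical helper: every input has the form
    [CLS] query [SEP] chunk [SEP] and contains at most max_len tokens.
    """
    # Reserve room for the query and the three special tokens.
    budget = max_len - len(query_tokens) - 3
    if budget <= 0:
        raise ValueError("query leaves no room for document tokens")

    inputs = []
    for start in range(0, len(doc_tokens), budget):
        chunk = doc_tokens[start:start + budget]
        inputs.append(["[CLS]", *query_tokens, "[SEP]", *chunk, "[SEP]"])
    return inputs


doc = [f"t{i}" for i in range(10)]
inputs = chunk_with_query(doc, ["long", "documents"], max_len=9)
```

Each chunk stays within the size limit, and the repeated query gives every chunk the same global topic signal.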

arXiv.org e-Print Archive

Claim-Dissector: An Interpretable Fact-Checking System with Joint Re-ranking and Veracity Prediction

Author: Fajcik Martin
Motlicek Petr
Smrz Pavel
Publication date: 01/06/2023

We present Claim-Dissector: a novel latent variable model for fact-checking and analysis, which, given a claim and a set of retrieved evidences, jointly learns to identify: (i) the relevant evidences to the given claim, (ii) the veracity of the claim. We propose to disentangle the per-evidence relevance probability and its contribution to the final veracity probability in an interpretable way -- the final veracity probability is proportional to a linear ensemble of per-evidence relevance probabilities. In this way, the individual contributions of evidences towards the final predicted probability can be identified. In per-evidence relevance probability, our model can further distinguish whether each relevant evidence is supporting (S) or refuting (R) the claim. This makes it possible to quantify how much the S/R probability contributes to the final verdict or to detect disagreeing evidence. Despite its interpretable nature, our system achieves results competitive with state-of-the-art on the FEVER dataset, as compared to typical two-stage system pipelines, while using significantly fewer parameters. It also sets a new state of the art on the FAVIQ and RealFC datasets. Furthermore, our analysis shows that our model can learn fine-grained relevance cues while using coarse-grained supervision, and we demonstrate it in 2 ways. (i) We show that our model can achieve competitive sentence recall while using only paragraph-level relevance supervision. (ii) Traversing towards the finest granularity of relevance, we show that our model is capable of identifying relevance at the token level. To do this, we present a new benchmark TLR-FEVER focusing on token-level interpretability -- humans annotate tokens in relevant evidences they considered essential when making their judgment. Then we measure how similar these annotations are to the tokens our model focuses on.
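
The "linear ensemble" idea in this abstract can be illustrated with a toy sketch. It is not the Claim-Dissector model: the function name `veracity_from_evidence`, the equal weighting of evidences, the restriction to two labels (S and R), and the example probabilities are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def veracity_from_evidence(
    evidence_probs: List[Dict[str, float]],
) -> Tuple[Dict[str, float], List[Dict[str, float]]]:
    """Combine per-evidence S/R relevance probabilities into a claim-level verdict.

    Hypothetical helper: the verdict for each label is the normalized sum of
    the per-evidence probabilities, so each evidence's share is readable.
    """
    totals = {"S": 0.0, "R": 0.0}
    for probs in evidence_probs:
        for label in totals:
            totals[label] += probs[label]

    norm = sum(totals.values())
    if norm == 0:
        raise ValueError("no relevant evidence to aggregate")

    verdict = {label: total / norm for label, total in totals.items()}
    # Per-evidence contributions sum exactly to the verdict for each label.
    contributions = [
        {label: probs[label] / norm for label in totals} for probs in evidence_probs
    ]
    return verdict, contributions


verdict, contributions = veracity_from_evidence(
    [{"S": 0.6, "R": 0.1}, {"S": 0.2, "R": 0.1}]
)
```

Because the verdict is a plain sum, the entry in `contributions` for each evidence shows how much it moved the final S or R probability, which is the kind of attribution the abstract describes.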

arXiv.org e-Print Archive

Multi-mJ, kHz, 2.1-μm OPCPA for high-flux soft X-ray high-harmonic radiation

Author: Hong Kyung-Han
Kaertner Franz X.
Krogen Peter Ra
Lai Chien-Jen
Moses Jeffrey
Siqueira Jonathas
Smrz Martin
Zapata Luis E.
Publication venue: 'The Optical Society'
Publication date: 01/01/2014

We report on a multi-mJ 2.1-μm OPCPA system operating at a 1-kHz repetition rate, pumped by a picosecond cryogenic Yb:YAG pump laser, and on phase-matched high-flux high-harmonic soft X-ray generation.

DSpace@MIT

Crossref

Kilowatt-class high energy frequency conversion to 95 J at 10 Hz at 515 nm

Author: Chris Edwards
Danielle Clarke
Jan Pilar
John Collier
Jonathan Phillips
Martin Divoky
Martin Hanus
Martin Smrz
Ondrej Denk
Patricie Severova
Petr Navratil
Thomas Butcher
Tomas Mocek
Tomas Paliesek
Publication venue: Cambridge University Press

Directory of Open Access Journals