10 research outputs found
On the Definition of Prescriptive Annotation Guidelines for Language-Agnostic Subjectivity Detection
A Corpus for Sentence-level Subjectivity Detection on English News Articles
We present a novel corpus for subjectivity detection at the sentence level.
We develop new annotation guidelines for the task, which are not limited to
language-specific cues, and apply them to produce a new corpus in English. The
corpus consists of 411 subjective and 638 objective sentences extracted from
ongoing coverage of political affairs from online news outlets. This new
resource paves the way for the development of models for subjectivity detection
in English and across other languages, without relying on language-specific
tools like lexicons or machine translation. We evaluate state-of-the-art
multilingual transformer-based models on the task, both in mono- and
cross-lingual settings, the latter with a similar existing corpus in Italian.
We observe that enriching our corpus with resources in other
languages improves the results on the task.
Argument mining as rapid screening tool of COVID-19 literature quality: Preliminary evidence
Background: The COVID-19 pandemic prompted the scientific community to share timely evidence, including preprints that had not yet been peer reviewed. Purpose: To develop an artificial intelligence system for the analysis of the scientific literature by leveraging recent developments in the field of Argument Mining (AM). Methodology: Scientific quality criteria were borrowed from two selected Cochrane systematic reviews. Four independent reviewers gave a blind evaluation on a 1–5 scale to 40 papers for each review. These scores were matched with the automatic analysis performed by an AM system named MARGOT, which detected claims and supporting evidence for the cited papers. Outcomes were evaluated with inter-rater indices (Cohen's Kappa, Krippendorff's Alpha, s* statistics). Results: MARGOT performs differently on the two selected Cochrane reviews: the inter-rater indices show a fair-to-moderate agreement of the most relevant MARGOT metrics both with Cochrane and the skilled interval scores, with larger values for one of the two reviews. Discussion and conclusions: The noted discrepancy could stem from a limitation of the MARGOT system that can be improved; yet, the level of agreement between human reviewers also suggests a different complexity between the two reviews in debating controversial arguments. These preliminary results encourage expanding and deepening the investigation to other topics and a larger number of highly specialized reviewers, to reduce uncertainty in the evaluation process, thus supporting the retraining of AM systems.
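As an illustration of one of the inter-rater indices named in this abstract, the following is a minimal sketch of unweighted Cohen's Kappa for two raters. It is not the study's evaluation code; the function name `cohen_kappa` and the toy scores are assumptions made for the example.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's Kappa for two equally long lists of labels.

    Hypothetical helper for illustration; not taken from the study.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")

    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy scores (assumed, not from the paper).
scores_a = [1, 1, 1, 0]
scores_b = [1, 1, 0, 0]
kappa = cohen_kappa(scores_a, scores_b)
```

For ordinal scales such as the 1–5 scores described above, a weighted variant of the statistic is often preferred; this sketch treats every disagreement as equally severe.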
Advanced techniques for cross-language annotation projection in legal texts
Nowadays, the majority of the services we benefit from are provided online, and their use is regulated by the users' acceptance of the terms of service. All our data are handled in accordance with the clauses of such documents, and all our behaviours must comply with them. Given this, it would be very useful to find automated techniques to ensure the fairness of the document or to inform users about possible threats.
The focus of this work is to create resources aimed at the development of such tools in languages other than English, which may lack linguistic resources and annotated corpora.
The enormous breakthroughs of recent years in Natural Language Processing techniques have made it possible to create such tools through automated and unsupervised processes. One of the means to achieve this is annotation projection between two parallel corpora.
The difficulty and cost of creating ad hoc resources for every language have made it necessary to find another way of achieving the goal.
This work investigates the cross-language annotation projection technique based on sentence embeddings and similarity metrics to find matches between sentences. Several combinations of methods and algorithms are compared, including monolingual and multilingual neural embedding models. The experiments are conducted on two datasets, where the reference language is always English and the projections are evaluated on Italian, German and Polish.
The results obtained provide a robust and reliable technique for the task and a good starting point to build multilingual tools.
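The matching step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the embeddings are toy two-dimensional vectors standing in for the output of a sentence embedding model, cosine similarity is assumed as the similarity metric, and the names `cosine`, `project_annotations`, the labels, and the `threshold` value are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def project_annotations(source_vecs, source_labels, target_vecs, threshold=0.8):
    """Give each target sentence the label of its most similar source sentence.

    A target sentence stays unlabeled (None) when no source sentence
    reaches the similarity threshold (an assumed cut-off).
    """
    projected = []
    for target in target_vecs:
        best_idx, best_sim = None, -1.0
        for idx, source in enumerate(source_vecs):
            sim = cosine(source, target)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx is not None and best_sim >= threshold:
            projected.append(source_labels[best_idx])
        else:
            projected.append(None)
    return projected


# Toy embeddings (assumed): two annotated English sentences, three target sentences.
english_vecs = [[1.0, 0.0], [0.0, 1.0]]
english_labels = ["unfair", "fair"]
target_vecs = [[0.9, 0.1], [0.1, 0.95], [0.7, 0.7]]

labels = project_annotations(english_vecs, english_labels, target_vecs)
```

In this toy run the third target sentence is equally close to both source sentences and falls below the threshold, so it receives no projected label.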
PM100: A Job Power Consumption Dataset of a Large-Scale HPC System
The dataset is a collection of jobs extracted from the job_table data of M100 (https://doi.org/10.5281/zenodo.7588815), a collection of data extracted from a Tier-0 supercomputer hosted at CINECA (Marconi100, https://www.hpc.cineca.it/hardware/marconi100). The original job data present in M100 are filtered to keep only the jobs that ran exclusively on their resources. Each job entry included in PM100 contains the power consumption of the job recorded at node level, CPU level and memory level. The final dataset contains 231116 jobs, executed on Marconi100 between May and October 2020.
The dataset is stored as a parquet file, where each entry contains the information on a job execution.
The structure of the data, as well as the code to generate them, is contained in the official GitHub repository of the project: https://github.com/francescoantici/PM100-data/.
A Corpus for Sentence-Level Subjectivity Detection on English News Articles
We develop novel annotation guidelines for sentence-level subjectivity detection, which are not limited to language-specific cues. We use our guidelines to collect NewsSD-ENG, a corpus of 638 objective and 411 subjective sentences extracted from English news articles on controversial topics. Our corpus paves the way for subjectivity detection in English and across other languages without relying on language-specific tools, such as lexicons or machine translation. We evaluate state-of-the-art multilingual transformer-based models on the task in mono-, multi-, and cross-language settings. For this purpose, we re-annotate an existing Italian corpus. We observe that models trained in the multilingual setting achieve the best performance on the task
Overview of the CLEF-2023 CheckThat! Lab: Task 2 on Subjectivity in News Articles
We describe the outcome of the 2023 edition of the CheckThat! Lab at CLEF. We focus on subjectivity (Task 2), which has been proposed for the first time. It aims at fostering the technology for the identification of subjective text fragments in news articles. For that, we produced corpora consisting of 9,530 manually-annotated sentences, covering six languages: Arabic, Dutch, English, German, Italian, and Turkish. Task 2 attracted 12 teams, which submitted a total of 40 final runs covering all languages. The most successful approaches addressed the task using state-of-the-art multilingual transformer models, which were fine-tuned on language-specific data. Teams also experimented with a rich set of other neural architectures, including foundation models, zero-shot classifiers, and standard transformers, mainly coupled with data augmentation and multilingual training strategies to address class imbalance. We publicly release all the datasets and evaluation scripts, with the purpose of promoting further research on this topic.
A European proposal for the Compton gamma-ray source of ELI-NP
A European proposal is under preparation for the Compton gamma-ray Source of ELI-NP. In the Romanian pillar of ELI (the European Extreme Light Infrastructure) an advanced gamma-ray beam is foreseen, coupled to two 10 PW laser systems. The photons will be generated by Compton back-scattering in the collision between a high quality electron beam and a high power laser. A European collaboration formed by INFN, Univ. of Roma La Sapienza, Orsay-LAL of IN2P3, Univ. de Paris Sud XI and ASTeC at Daresbury, is preparing a TDR exploring the feasibility of a machine expected to achieve the gamma-ray beam specifications: energy tunable between 1 and 20 MeV, narrow bandwidth (0.3%) and high spectral density, 10^4 photons/sec/eV. We will describe the lay-out of the 720 MeV RF Linac and the collision laser with the associated optical cavity, as well as the optimized beam dynamics to achieve maximum phase space density at the collision. The predicted gamma-ray spectra have been evaluated for the case at 360 MeV. Copyright © 2012 by IEEE