77 research outputs found

Event-based Access to Historical Italian War Memoirs

Author: Nanni Federico
Ponzetto Simone Paolo
Rovera Marco
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2021

The progressive digitization of historical archives provides new, often domain-specific, textual resources that report on past facts and events; among these, memoirs are a very common type of primary source. In this paper, we present an approach for extracting information from Italian historical war memoirs and turning it into structured knowledge. The approach is based on the semantic notions of events, participants and roles. We quantitatively evaluate each of the key steps of our approach and provide a graph-based representation of the extracted knowledge, which allows moving between a Close and a Distant Reading of the collection. Comment: 23 pages, 6 figures

arXiv.org e-Print Archive

MAnnheim DOCument Server

Political Text Scaling Meets Computational Semantics

Author: Glavas Goran
Nanni Federico
Ponzetto Simone Paolo
Rehbein Ines
Stuckenschmidt Heiner
Publication date: 01/01/2021

During the last fifteen years, automatic text scaling has become one of the key tools of the Text as Data community in political science. Prominent text scaling algorithms, however, rely on the assumption that latent positions can be captured just by leveraging information about word frequencies in the documents under study. We challenge this traditional view and present a new, semantically aware text scaling algorithm, SemScale, which combines recent developments in computational linguistics with unsupervised graph-based clustering. We conduct an extensive quantitative analysis over a collection of speeches from the European Parliament in five different languages and from two different legislative terms, and show that a scaling approach relying on semantic document representations is often better at capturing known underlying political dimensions than the established frequency-based (i.e., symbolic) scaling method. We further validate our findings through a series of experiments focused on text preprocessing and feature selection, document representation, scaling of party manifestos, and a supervised extension of our algorithm. To catalyze further research on this new branch of text scaling methods, we release a Python implementation of SemScale with all included data sets and evaluation procedures. Comment: Updated version - accepted for Transactions on Data Science (TDS)

arXiv.org e-Print Archive

MAnnheim DOCument Server

Cross-lingual classification of topics in political texts

Author: Glavaš Goran
Nanni Federico
Ponzetto Simone Paolo
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2017

Crossref

MAnnheim DOCument Server

SLaTE: a system for labeling topics with entities

Author: Lauscher Anne
Nanni Federico
Ponzetto Simone Paolo
Publication venue: McGill Université ; Université de Montréal
Publication date: 01/01/2017

MAnnheim DOCument Server

Entity relatedness for retrospective analyses of global events

Author: Dietz Laura
Nanni Federico
Ponzetto Simone Paolo
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/01/2016

Tracking global events through time would ease many diachronic analyses that are currently carried out manually by social scientists. While entity linking algorithms can be adapted to track events that go by a common name, such a name is often not established in the early stages leading up to the event. This study evaluates the utility of entity relatedness for the task of identifying related entities and textual resources that describe the involvement of the entity in the event. In a small study, we find that simple relatedness methods obtain a MAP score of 0.74, outperforming many advanced baseline systems such as Stics and Wiki2Vec. A small adaptation of this method provides sufficient explanations of entity involvement for 68% of relevant entities.
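
The MAP score cited in the abstract above is mean average precision, a standard ranking metric. The following is a minimal sketch of how it is commonly computed, assuming binary relevance judgments. The function names and the toy rankings are illustrative assumptions, not the paper's evaluation code or data.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average precision of one ranked list against a set of relevant items."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant item appears.
            precision_sum += hits / rank
    # Normalise by the number of relevant items, so missed items lower the score.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Mean of the per-query average precision values."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Hypothetical toy data: two queries, each with a ranking and its relevant set.
toy_runs = [
    (["a", "b", "c", "d"], {"a", "c"}),
    (["x", "y"], {"y"}),
]
toy_map = mean_average_precision(toy_runs)
```

For the first toy query the relevant items appear at ranks 1 and 3, giving an average precision of (1/1 + 2/3) / 2; the second query contributes 1/2, and MAP is the mean of the two.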

MAnnheim DOCument Server

Unsupervised cross-lingual scaling of political texts

Author: Glavaš Goran
Nanni Federico
Ponzetto Simone Paolo
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2017

Crossref

MAnnheim DOCument Server

Domain-specific named entity disambiguation in historical memoirs

Author: Goy Anna
Nanni Federico
Ponzetto Simone Paolo
Rovera Marco
Publication venue: RWTH
Publication date: 01/01/2017

This paper presents the results of extracting named entities from a collection of historical memoirs about the Italian Resistance during World War II. It discusses the methodology followed for the extraction and disambiguation task, as well as its evaluation. For the semantic annotation of the dataset, we developed a lookup-based pipeline built on established practices for extracting and disambiguating named entities. This was necessary given the poor performance of the out-of-the-box Named Entity Recognition and Disambiguation (NERD) tools tested in the initial phase of this work.

Crossref

MAnnheim DOCument Server

OpenEdition

Domain-specific Named Entity Disambiguation in Historical Memoirs

Author: Federico Nanni
Goy Annamaria
Rovera Marco
Simone Paolo Ponzetto
Publication venue: CEUR
Publication date: 01/01/2017

Institutional Research Information System University of Turin

Entities as topic labels : combining entity linking and labeled LDA to improve topic interpretability and evaluability

Author: Lauscher Anne
Nanni Federico
Ponzetto Simone Paolo
Ruiz Fabo Pablo
Publication venue: Accademia University Press
Publication date: 01/01/2016

Digital humanities scholars strongly need a corpus exploration method that provides topics that are easier to interpret than standard LDA topic models. To move towards this goal, we propose a combination of two techniques, Entity Linking and Labeled LDA. Our method identifies in an ontology a series of descriptive labels for each document in a corpus. It then generates a specific topic for each label. Having a direct relation between topics and labels makes interpretation easier; using an ontology as background knowledge limits label ambiguity. As our topics are described with a limited number of clear-cut labels, they promote interpretability and support the quantitative evaluation of the obtained results. We illustrate the potential of the approach by applying it to three datasets, namely the transcription of speeches from the European Parliament's fifth mandate, the Enron Corpus and the Hillary Clinton Email Dataset. While some of these resources have already been adopted by the natural language processing community, they still hold great potential for humanities scholars, part of which could be exploited in studies that adopt the fine-grained exploration method presented in this paper.

Universität Mannheim: MADATA - Mannheim Research Data Repository

MAnnheim DOCument Server

UKParl: A data set for topic detection with semantically annotated text

Author: Cheng Yi-Ru
Dietz Laura
Nanni Federico
Osman Mahmoud
Ponzetto Simone Paolo
Publication venue: LREC
Publication date: 01/01/2018

MAnnheim DOCument Server