
11 research outputs found

Event-based Access to Historical Italian War Memoirs

Author: Nanni Federico
Ponzetto Simone Paolo
Rovera Marco
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2021

The progressive digitization of historical archives provides new, often domain-specific, textual resources that report on past facts and events; among these, memoirs are a very common type of primary source. In this paper, we present an approach for extracting information from Italian historical war memoirs and turning it into structured knowledge. The approach is based on the semantic notions of events, participants and roles. We quantitatively evaluate each of the key steps of our approach and provide a graph-based representation of the extracted knowledge, which allows the reader to move between a Close and a Distant Reading of the collection. Comment: 23 pages, 6 figures

arXiv.org e-Print Archive

MAnnheim DOCument Server

Exploring The Impact of Stemming on Text Topic-Based Classification Accuracy

Author: Ahmed Refat
Publication venue: CV. Rustam
Publication date: 30/06/2024

Text classification attempts to assign written texts to specific group types that share the same linguistic features. One class of features that has been widely employed across a wide range of classification tasks is lexical features. This study explores the impact of stemming on text classification using lexical features. It is based on a corpus of thirty texts written by six authors on topics covering politics, history, science, prose, sport, and food. These texts are stemmed using a light stemming algorithm. To classify the texts by topic by means of lexical features, linear hierarchical clustering and non-linear clustering (SOM) are carried out on the stemmed and unstemmed texts. Both clustering methods are able to classify texts by topic, and both models produce accurate and stable results; however, the results suggest that light stemming has no effect on the accuracy of text classification by topic. The accuracy is neither increased nor decreased on the stemmed texts, although the stemming algorithm helped reduce the dimensionality of the feature vector space model.

Journal of Linguistics, Culture and Communication

experiments with literature in Portuguese

Author: Alves Daniel
Santos Diana
Publication date: 01/01/2023

UIDB/04209/2020 UIDP/04209/2020

In this case study we discuss different approaches to the study of literature in digital humanities and try to join two methodologies, namely distant reading and spatial analysis. We first briefly describe the two projects involved, the Atlas of Literary Landscapes of Mainland Portugal and Literateca, highlighting and quantifying the different ways of dealing with place in literature in Portuguese. Then we describe several paths to compare and harmonize the two approaches, focusing on annotation, extraction and geocoding of place names.

Repositório da Universidade Nova de Lisboa

Event-based Access to Historical Italian War Memoirs

Author: Rovera M
Publication date: 01/01/2021

Institutional Research Information System University of Turin

Futuro risonho: prolegómenos para uma colaboração entre a Linguateca e o NuPILL

Author: Santos Diana
Publication venue: Biblioteca Universitária da UFSC
Publication date: 01/01/2022


Repositório Comum

Evaluating named entity recognition tools for extracting social networks from novels

Author: Marieke van Erp
Niels Dekker
Tobias Kuhn
Publication venue: PeerJ
Publication date: 01/04/2019

The analysis of literary works has experienced a surge in computer-assisted processing. To obtain insights into the community structures and social interactions portrayed in novels, the creation of social networks from novels has gained popularity. Many methods rely on identifying named entities and relations for the construction of these networks, but many of these tools are not specifically created for the literary domain. Furthermore, many of the studies on information extraction from literature typically focus on 19th and early 20th century source material. Because of this, it is unclear if these techniques are as suitable to modern-day literature as they are to those older novels. We present a study in which we evaluate natural language processing tools for the automatic extraction of social networks from novels as well as their network structure. We find that there are no significant differences between old and modern novels but that both are subject to a large amount of variance. Furthermore, we identify several issues that complicate named entity recognition in our set of novels and we present methods to remedy these. We see this work as a step in creating more culturally-aware AI systems

VU Research Portal

Directory of Open Access Journals

Generación de resúmenes audivisuales a partir de obras literarias utilizando análisis de emociones

Author: Milón Flores Daniela Fernanda
Publication venue: Universidad Católica San Pablo
Publication date: 01/01/2019

Reading literary works is an essential activity for human communication and learning. However, several relevant tasks, such as selecting, filtering, or analyzing a large number of works, become complex. To address this need, several strategies have been proposed for quickly inspecting substantial amounts of text, or retrieving previously read information, by exploiting graphical, textual, or auditory data. In this work, we propose a methodology for generating audiovisual summaries by combining an emotion-based musical composition with a graph-based animation. We apply natural language processing algorithms to extract the emotions and characters involved in the literary work. We then use the extracted information to compose a piece of music that accompanies the visual narration of the story, with the aim of conveying the extracted emotion. To do so, we fix important musical features such as chord progression, tempo, scale, and octaves, and assign the set of instruments that best suits each emotion. In addition, we animate a graph to summarize the dialogues between the characters of the work. Finally, to evaluate the quality of our methodology, we conducted two user studies, which reveal that our proposal provides a high level of understanding of the content of the literary work while also offering a pleasant experience to the user.

Registro Nacional de Trabajos de Investigación y Proyectos

Repositorio Institucional Universidad Católica San Pablo

Multimodal representation learning with neural networks

Author: Arevalo Ovalle John Edilson
Publication date: 01/01/2018

Representation learning methods have received a lot of attention from researchers and practitioners because of their successful application to complex problems in areas such as computer vision, speech recognition and text processing [1]. Many of these promising results are due to the development of methods to automatically learn the representation of complex objects directly from large amounts of sample data [2]. These efforts have concentrated on data involving one type of information (images, text, speech, etc.), despite data being naturally multimodal. Multimodality refers to the fact that the same real-world concept can be described by different views or data types. Multimodal automatic analysis faces three main challenges: feature learning and extraction, modeling of relationships between data modalities, and scalability to large multimodal collections [3, 4]. This research considers the problem of leveraging multiple sources of information or data modalities in neural networks. It defines a novel model called the gated multimodal unit (GMU), designed as an internal unit in a neural network architecture whose purpose is to find an intermediate representation based on a combination of data from different modalities. The GMU learns to decide how modalities influence the activation of the unit using multiplicative gates. The GMU can be used as a building block for different kinds of neural networks and can be seen as a form of intermediate fusion. The model was evaluated on four supervised learning tasks in conjunction with fully-connected and convolutional neural networks. We compare the GMU with other early and late fusion methods, and it outperforms them in classification scores on the evaluated datasets. Strategies to understand how the model gives importance to each input were also explored. By measuring the correlation between gate activations and predictions, we were able to associate modalities with classes. It was found that some classes were more correlated with one particular modality. Interesting findings in genre prediction show, for instance, that the model associates visual information with animation movies, while textual information is more associated with drama or romance movies. During the development of this project, three new benchmark datasets were built and publicly released: the BCDR-F03 dataset, which contains 736 mammography images and serves as a benchmark for mass lesion classification; the MM-IMDb dataset, which contains around 27000 movie plots and posters along with 50 metadata annotations and motivates new research in multimodal analysis; and the Goodreads dataset, a collection of 1000 books that encourages research on success prediction based on book content. This research also facilitates reproducibility of the present work by releasing a source code implementation of the proposed methods.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Nacional De Colombia - Repositorio Institucional UN

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

Publication venue: OpenEdition
Publication date: 01/07/2022

On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges

Directory of Open Access Books (DOAB)