13,430 research outputs found
Probing the topological properties of complex networks modeling short written texts
In recent years, graph theory has been widely employed to probe several
language properties. More specifically, the so-called word adjacency model has
been proven useful for tackling several practical problems, especially those
relying on textual stylistic analysis. The most common approach to treat texts
as networks has simply considered either large pieces of texts or entire books.
This approach has certainly worked well -- many informative discoveries have
been made this way -- but it raises an uncomfortable question: could there be
important topological patterns in small pieces of texts? To address this
problem, the topological properties of subtexts sampled from entire books was
probed. Statistical analyzes performed on a dataset comprising 50 novels
revealed that most of the traditional topological measurements are stable for
short subtexts. When the performance of the authorship recognition task was
analyzed, it was found that a proper sampling yields a discriminability similar
to the one found with full texts. Surprisingly, the support vector machine
classification based on the characterization of short texts outperformed the
one performed with entire books. These findings suggest that a local
topological analysis of large documents might improve its global
characterization. Most importantly, it was verified, as a proof of principle,
that short texts can be analyzed with the methods and concepts of complex
networks. As a consequence, the techniques described here can be extended in a
straightforward fashion to analyze texts as time-varying complex networks
A complex network approach to stylometry
Statistical methods have been widely employed to study the fundamental
properties of language. In recent years, methods from complex and dynamical
systems proved useful to create several language models. Despite the large
amount of studies devoted to represent texts with physical models, only a
limited number of studies have shown how the properties of the underlying
physical systems can be employed to improve the performance of natural language
processing tasks. In this paper, I address this problem by devising complex
networks methods that are able to improve the performance of current
statistical methods. Using a fuzzy classification strategy, I show that the
topological properties extracted from texts complement the traditional textual
description. In several cases, the performance obtained with hybrid approaches
outperformed the results obtained when only traditional or networked methods
were used. Because the proposed model is generic, the framework devised here
could be straightforwardly used to study similar textual applications where the
topology plays a pivotal role in the description of the interacting agents.Comment: PLoS ONE, 2015 (to appear
Efficient Invariant Features for Sensor Variability Compensation in Speaker Recognition
In this paper, we investigate the use of invariant features for speaker recognition. Owing to their characteristics, these features are introduced to cope with the difficult and challenging problem of sensor variability and the source of performance degradation inherent in speaker recognition systems. Our experiments show: (1) the effectiveness of these features in match cases; (2) the benefit of combining these features with the mel frequency cepstral coefficients to exploit their discrimination power under uncontrolled conditions (mismatch cases). Consequently, the proposed invariant features result in a performance improvement as demonstrated by a reduction in the equal error rate and the minimum decision cost function compared to the GMM-UBM speaker recognition systems based on MFCC features
Drawing Elena Ferrante's Profile. Workshop Proceedings, Padova, 7 September 2017
Elena Ferrante is an internationally acclaimed Italian novelist whose real identity has been kept secret by E/O publishing house for more than 25 years. Owing to her popularity, major Italian and foreign newspapers have long tried to discover her real identity. However, only a few attempts have been made to foster a scientific debate on her work.
In 2016, Arjuna Tuzzi and Michele Cortelazzo led an Italian research team that conducted a preliminary study and collected a well-founded, large corpus of Italian novels comprising 150 works published in the last 30 years by 40 different authors. Moreover, they shared their data with a select group of international experts on authorship attribution, profiling, and analysis of textual data: Maciej Eder and Jan Rybicki (Poland), Patrick Juola (United States), Vittorio Loreto and his research team, Margherita Lalli and Francesca Tria (Italy), George Mikros (Greece), Pierre Ratinaud (France), and Jacques Savoy (Switzerland).
The chapters of this volume report the results of this endeavour that were first presented during the international workshop Drawing Elena Ferrante's Profile in Padua on 7 September 2017 as part of the 3rd IQLA-GIAT Summer School in Quantitative Analysis of Textual Data. The fascinating research findings suggest that Elena Ferrante\u2019s work definitely deserves \u201cmany hands\u201d as well as an extensive effort to understand her distinct writing style and the reasons for her worldwide success
Stratified decision forests for accurate anatomical landmark localization in cardiac images
Accurate localization of anatomical landmarks is an important step in medical imaging, as it provides useful prior information for subsequent image analysis and acquisition methods. It is particularly useful for initialization of automatic image analysis tools (e.g. segmentation and registration) and detection of scan planes for automated image acquisition. Landmark localization has been commonly performed using learning based approaches, such as classifier and/or regressor models. However, trained models may not generalize well in heterogeneous datasets when the images contain large differences due to size, pose and shape variations of organs. To learn more data-adaptive and patient specific models, we propose a novel stratification based training model, and demonstrate its use in a decision forest. The proposed approach does not require any additional training information compared to the standard model training procedure and can be easily integrated into any decision tree framework. The proposed method is evaluated on 1080 3D highresolution and 90 multi-stack 2D cardiac cine MR images. The experiments show that the proposed method achieves state-of-theart landmark localization accuracy and outperforms standard regression and classification based approaches. Additionally, the proposed method is used in a multi-atlas segmentation to create a fully automatic segmentation pipeline, and the results show that it achieves state-of-the-art segmentation accuracy
- …