A complex network approach to stylometry
Statistical methods have been widely employed to study the fundamental
properties of language. In recent years, methods from complex and dynamical
systems have proved useful for creating several language models. Despite the large
number of studies devoted to representing texts with physical models, only a
limited number of studies have shown how the properties of the underlying
physical systems can be employed to improve the performance of natural language
processing tasks. In this paper, I address this problem by devising complex
network methods that improve the performance of current
statistical methods. Using a fuzzy classification strategy, I show that the
topological properties extracted from texts complement the traditional textual
description. In several cases, hybrid approaches outperformed the results
obtained when only traditional or network-based methods
were used. Because the proposed model is generic, the framework devised here
could be straightforwardly used to study similar textual applications where the
topology plays a pivotal role in the description of the interacting agents.
Comment: PLoS ONE, 2015 (to appear)
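As a rough illustration of what "topological properties extracted from texts" can mean, the sketch below builds a word adjacency network and computes two common node measurements. The abstract does not specify how the network is built or which measurements are used, so the adjacency rule and the function names (`build_cooccurrence_network`, `degree`, `clustering_coefficient`) are assumptions chosen for illustration, not the paper's method.

```python
import re
from collections import defaultdict


def build_cooccurrence_network(text):
    """Build an undirected graph linking each word to its immediate neighbor.

    Assumption: adjacency in the text defines an edge; the abstract does not
    state the actual construction rule.
    """
    words = re.findall(r"[a-z']+", text.lower())
    graph = defaultdict(set)
    for a, b in zip(words, words[1:]):
        if a != b:  # skip self-loops from immediately repeated words
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def degree(graph, node):
    """Number of distinct words adjacent to `node`."""
    return len(graph[node])


def clustering_coefficient(graph, node):
    """Fraction of pairs of neighbors of `node` that are themselves linked."""
    neighbors = list(graph[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in graph[neighbors[i]]
    )
    return 2.0 * links / (k * (k - 1))


graph = build_cooccurrence_network("the cat saw the dog and the cat saw a bird")
print(degree(graph, "the"), clustering_coefficient(graph, "the"))
```

Per-word values like these could be aggregated into a feature vector and combined with traditional features such as word frequencies, which is one way to read the hybrid strategy the abstract describes.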
The State of the Art in Cartograms
Cartograms combine statistical and geographical information in thematic maps,
where areas of geographical regions (e.g., countries, states) are scaled in
proportion to some statistic (e.g., population, income). Cartograms make it
possible to gain insight into patterns and trends in the world around us and
have been very popular visualizations for geo-referenced data for over a
century. This work surveys cartogram research in visualization, cartography and
geometry, covering a broad spectrum of different cartogram types: from the
traditional rectangular and table cartograms, to Dorling and diffusion
cartograms. A particular focus is the study of the major cartogram dimensions:
statistical accuracy, geographical accuracy, and topological accuracy. We
review the history of cartograms, describe the algorithms for generating them,
and consider task taxonomies. We also review quantitative and qualitative
evaluations, and we use these to arrive at design guidelines and research
challenges.
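The core idea of scaling region areas in proportion to a statistic can be sketched in a few lines. The example below computes target areas that preserve the total map area, circle radii for a Dorling-style layout (where each region is drawn as a circle), and a simple relative area error as one possible reading of "statistical accuracy". The function names, the `scale` parameter, and the error formula are illustrative assumptions, not definitions taken from the survey.

```python
import math


def target_areas(areas, stats):
    """Redistribute the total map area in proportion to each region's statistic."""
    total_area = sum(areas.values())
    total_stat = sum(stats.values())
    return {r: total_area * stats[r] / total_stat for r in areas}


def dorling_radii(stats, scale=1.0):
    """Radius of a circle whose area equals scale * statistic."""
    return {r: math.sqrt(scale * v / math.pi) for r, v in stats.items()}


def relative_area_error(actual, target):
    """Per-region |actual - target| / target (assumed accuracy measure)."""
    return {r: abs(actual[r] - target[r]) / target[r] for r in target}


# Hypothetical toy data: two regions with land area and population.
areas = {"A": 30.0, "B": 10.0}
stats = {"A": 1.0, "B": 3.0}

targets = target_areas(areas, stats)
print(targets)
print(dorling_radii(stats))
print(relative_area_error(areas, targets))
```

An undistorted map has a large error for regions whose land area and statistic diverge; a cartogram algorithm deforms the regions to drive this error toward zero, trading it off against geographical and topological accuracy.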
The structure of verbal sequences analyzed with unsupervised learning techniques
Data mining allows the exploration of sequences of phenomena, whereas one
usually tends to focus on isolated phenomena or on the relation between two
phenomena. It offers invaluable tools for theoretical analyses and exploration
of the structure of sentences, texts, dialogues, and speech. We report here the
results of an attempt at using it for inspecting sequences of verbs from French
accounts of road accidents. This analysis relies on an original unsupervised
training approach that discovers the structure of sequential
data. The input to the analyzer consisted only of the verbs appearing in the
sentences. It classified the links between two successive
verbs into four distinct clusters, thus allowing text segmentation. We give
here an interpretation of these clusters by applying a statistical analysis to
independent semantic annotations.