Probing the Statistical Properties of Unknown Texts: Application to the Voynich Manuscript
While statistical physics methods applied to large corpora have helped unveil many patterns in texts, no comprehensive investigation has been performed on the interdependence between syntactic and semantic factors. In this study we propose a framework for determining whether a text (e.g., one written in an unknown alphabet) is compatible with a natural language and, if so, to which language it could belong. The approach is based on three types of statistical measurements, namely those obtained from first-order statistics of word properties in a text, from the topology of complex networks representing texts, and from intermittency concepts in which the text is treated as a time series. Comparative experiments were performed with the New Testament in 15 different languages and with distinct books in English and Portuguese in order to quantify how strongly the different measurements depend on the language and on the story being told in the book. The metrics found to be informative in distinguishing real texts from their shuffled versions include assortativity, degree and selectivity of words. As an illustration, we analyze an undeciphered medieval manuscript known as the Voynich Manuscript. We show that it is mostly compatible with natural languages and incompatible with random texts. We also obtain candidate keywords for the Voynich Manuscript, which could be helpful in the effort to decipher it. Because we were able to identify statistical measurements that depend more on the syntax than on the semantics, the framework may also serve for text analysis in language-dependent applications.
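The network measurements mentioned above can be illustrated with a minimal sketch of word selectivity, compared against shuffled versions of the same text. It assumes an undirected word-adjacency network whose edge weights count how often two words occur next to each other, with selectivity taken as a node's strength (sum of edge weights) divided by its degree; self-loops are dropped. These choices, and the helper names `adjacency_network`, `selectivity`, `mean_selectivity` and `shuffled_mean_selectivity`, are hypothetical and may differ from the paper's exact definitions.

```python
import random
from collections import defaultdict


def adjacency_network(words):
    """Undirected weighted word-adjacency network (assumed model).

    Returns {(word_a, word_b): weight}, where weight counts how many times
    the two words appear next to each other. Self-loops are ignored.
    """
    weights = defaultdict(int)
    for a, b in zip(words, words[1:]):
        if a != b:
            weights[tuple(sorted((a, b)))] += 1
    return dict(weights)


def selectivity(weights):
    """Selectivity of each word: strength divided by degree (assumed definition)."""
    degree = defaultdict(int)
    strength = defaultdict(int)
    for (a, b), w in weights.items():
        degree[a] += 1
        degree[b] += 1
        strength[a] += w
        strength[b] += w
    return {word: strength[word] / degree[word] for word in degree}


def mean_selectivity(words):
    """Average selectivity over all words that have at least one neighbor."""
    sel = selectivity(adjacency_network(words))
    return sum(sel.values()) / len(sel)


def shuffled_mean_selectivity(words, trials=20, seed=0):
    """Average of mean_selectivity over randomly shuffled copies of the text."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        shuffled = list(words)
        rng.shuffle(shuffled)
        values.append(mean_selectivity(shuffled))
    return sum(values) / len(values)
```

For example, `mean_selectivity(words)` and `shuffled_mean_selectivity(words)` can be computed for the same tokenized text; a measurement is informative in the sense used above when the real and shuffled values differ. Shuffling preserves word frequencies while destroying word order, which is why it serves as the random baseline.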
Strong correlations between text quality and complex networks features
Concepts of complex networks have been used to obtain metrics that correlate with text quality, as established by scores assigned by human judges. Texts written in Portuguese by high-school students were represented as scale-free networks (word adjacency model), from which typical network features such as the in/outdegree, clustering coefficient and shortest path were obtained. Another metric was derived from the dynamics of network growth, based on the variation in the number of connected components. The scores assigned by the human judges according to three text quality criteria (coherence and cohesion, adherence to standard writing conventions, and theme adequacy/development) were correlated with the network measurements. Text quality for all three criteria was found to decrease with increasing average outdegree, increasing average clustering coefficient and increasing deviation from the dynamics of network growth. Among the criteria employed, cohesion and coherence showed the strongest correlation, which probably indicates that the network measurements are able to capture how the text is developed in terms of the concepts represented by the nodes in the networks. Although based on a particular set of texts in a specific language, the results presented here point to potential applications in other instances of text analysis.
Comment: 8 pages, 8 figures
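The network features listed above can be sketched in a few lines. This minimal illustration computes the average outdegree, the average clustering coefficient and the number of connected components after each word is added. It assumes that links join consecutive words only within the same sentence (otherwise a plain adjacency network is always a single component), that self-loops are ignored, and that nodes with fewer than two neighbors have a clustering coefficient of 0; any preprocessing used in the study is not reproduced. The function names are hypothetical, and the exact deviation measure applied to the component curve is not specified here.

```python
from collections import defaultdict


def adjacent_pairs(sentences):
    """Yield (word, next word) links; links stop at sentence boundaries (assumption)."""
    for sentence in sentences:
        for a, b in zip(sentence, sentence[1:]):
            if a != b:  # self-loops ignored (assumption)
                yield a, b


def average_outdegree(sentences):
    """Mean number of distinct successors per word in the directed network."""
    successors = defaultdict(set)
    for a, b in adjacent_pairs(sentences):
        successors[a].add(b)
    nodes = {w for s in sentences for w in s}
    return sum(len(successors[n]) for n in nodes) / len(nodes)


def average_clustering(sentences):
    """Mean clustering coefficient of the undirected version of the network."""
    neighbors = defaultdict(set)
    for a, b in adjacent_pairs(sentences):
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = {w for s in sentences for w in s}
    total = 0.0
    for n in nodes:
        ns = sorted(neighbors[n])
        k = len(ns)
        if k < 2:
            continue  # coefficient taken as 0 for nodes with < 2 neighbors
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if ns[j] in neighbors[ns[i]])
        total += links / (k * (k - 1) / 2)
    return total / len(nodes)


def component_counts(sentences):
    """Number of connected components after each word is added (union-find)."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    counts, components = [], 0
    for sentence in sentences:
        prev = None
        for w in sentence:
            if w not in parent:  # a new word starts as its own component
                parent[w] = w
                components += 1
            if prev is not None and prev != w:
                ra, rb = find(prev), find(w)
                if ra != rb:  # the new link merges two components
                    parent[ra] = rb
                    components -= 1
            counts.append(components)
            prev = w
    return counts
```

The sequence returned by `component_counts` is the growth curve from which a deviation metric could be derived; union-find keeps each update close to constant time, so the curve can be tracked word by word even for long essays.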
Energy Transfer in Nanostructured Films Containing Poly(p-phenylene vinylene) and Acceptor Species
The combination of luminescent polymers and suitable energy-accepting materials may lead to molecular-level control of luminescence in nanostructured films. In this study, the properties of layer-by-layer (LbL) films of poly(p-phenylene vinylene) (PPV) were investigated with steady-state and time-resolved fluorescence spectroscopies, where fluorescence quenching was controlled by interposing inert polyelectrolyte layers between the PPV donor layers and the acceptor layers made with either Congo Red (CR) or nickel tetrasulfonated phthalocyanine (NiTsPc). The dynamics of the excited state of PPV were affected by the energy-accepting layers, confirming the presence of resonant energy transfer mechanisms. Owing to the layered structure of both the energy donor and acceptor units, energy transfer varied with the distance between layers, r, according to 1/rⁿ with n = 2 or 3, rather than the 1/r⁶ dependence predicted by the Förster theory for interacting point dipoles.
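To show how an exponent such as n can be extracted from distance-dependent data, here is a minimal sketch that fits a power law by linear least squares on a log-log scale. It assumes a transfer-related quantity k proportional to 1/rⁿ, so that log k = c − n log r; the abstract does not state which quantity or fitting procedure was used, and the function name `fit_power_law_exponent` is hypothetical. The values in the test are synthetic and serve only to check the fit.

```python
import math


def fit_power_law_exponent(distances, rates):
    """Fit rate = A / r**n by least squares on log(rate) = log(A) - n*log(r).

    Returns (n, A). All distances and rates must be positive.
    """
    xs = [math.log(r) for r in distances]
    ys = [math.log(k) for k in rates]
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)
```

A fitted n near 2 or 3 rather than near 6 is the kind of distinction the abstract draws between layered donor/acceptor films and the point-dipole Förster picture. Fitting in log space turns the power law into a straight line whose slope is −n.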