21 research outputs found

    Probing the Statistical Properties of Unknown Texts: Application to the Voynich Manuscript

    While the use of statistical physics methods to analyze large corpora has been useful in unveiling many patterns in texts, no comprehensive investigation has been performed on the interdependence between syntactic and semantic factors. In this study we propose a framework for determining whether a text (e.g., written in an unknown alphabet) is compatible with a natural language and to which language it could belong. The approach is based on three types of statistical measurements, i.e., those obtained from first-order statistics of word properties in a text, from the topology of complex networks representing texts, and from intermittency concepts where text is treated as a time series. Comparative experiments were performed with the New Testament in 15 different languages and with distinct books in English and Portuguese in order to quantify the dependency of the different measurements on the language and on the story being told in the book. The metrics found to be informative in distinguishing real texts from their shuffled versions include assortativity, degree and selectivity of words. As an illustration, we analyze an undeciphered medieval manuscript known as the Voynich Manuscript. We show that it is mostly compatible with natural languages and incompatible with random texts. We also obtain candidates for keywords of the Voynich Manuscript which could be helpful in the effort of deciphering it. Because we were able to identify statistical measurements that are more dependent on the syntax than on the semantics, the framework may also serve for text analysis in language-dependent applications.

    Strong correlations between text quality and complex networks features

    Concepts of complex networks have been used to obtain metrics that were correlated with text quality, as established by scores assigned by human judges. Texts produced by high-school students in Portuguese were represented as scale-free networks (word adjacency model), from which typical network features such as the in/outdegree, clustering coefficient and shortest path were obtained. Another metric was derived from the dynamics of the network growth, based on the variation of the number of connected components. The scores assigned by the human judges according to three text quality criteria (coherence and cohesion, adherence to standard writing conventions and theme adequacy/development) were correlated with the network measurements. Text quality for all three criteria was found to decrease with increasing average values of outdegrees, clustering coefficient and deviation from the dynamics of network growth. Among the criteria employed, cohesion and coherence showed the strongest correlation, which probably indicates that the network measurements are able to capture how the text is developed in terms of the concepts represented by the nodes in the networks. Though based on a particular set of texts and specific language, the results presented here point to potential applications in other instances of text analysis. Comment: 8 pages, 8 figures

    Energy Transfer in Nanostructured Films Containing Poly(p-phenylene vinylene) and Acceptor Species

    The combination of luminescent polymers and suitable energy-accepting materials may lead to a molecular-level control of luminescence in nanostructured films. In this study, the properties of layer-by-layer (LbL) films of poly(p-phenylene vinylene) (PPV) were investigated with steady-state and time-resolved fluorescence spectroscopies, where fluorescence quenching was controlled by interposing inert polyelectrolyte layers between the PPV donor and acceptor layers made with either Congo Red (CR) or nickel tetrasulfonated phthalocyanine (NiTsPc). The dynamics of the excited state of PPV was affected by the energy-accepting layers, thus confirming the presence of resonant energy transfer mechanisms. Owing to the layered structure of both energy donor and acceptor units, energy transfer varied with the distance between layers, r, according to 1/r^n with n = 2 or 3, rather than with 1/r^6 as predicted by the Förster theory for interacting point dipoles.