Language Trees and Zipping

Benedetto, Dario; Caglioti, Emanuele; Loreto, Vittorio

Authors: Dario Benedetto, Emanuele Caglioti, Vittorio Loreto
Publication date: 19 December 2001
Publisher: 'American Physical Society (APS)'

Abstract

In this letter we present a very general method for extracting information from a generic string of characters, e.g. a text, a DNA sequence or a time series. The method is based on data-compression techniques, and its key point is the computation of a suitable measure of the remoteness of two bodies of knowledge. We apply the method to linguistically motivated problems and obtain highly accurate results for language recognition, authorship attribution and language classification.

Comment: 5 pages, RevTeX4, 1 eps figure. In press in Phys. Rev. Lett. (January 2002).
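The abstract names a compression-based measure of remoteness but does not define it here. As a rough illustration only, the sketch below estimates how "far" a short fragment is from a reference text by the extra compressed bytes needed when the fragment is appended to that reference. The choice of zlib, the function names `compressed_size`, `append_cost` and `closest_reference`, and the sample sentences are all assumptions made for this example, not details taken from this record.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (maximum compression level)."""
    return len(zlib.compress(data, 9))


def append_cost(reference: str, fragment: str) -> int:
    """Extra compressed bytes needed to encode `fragment` after `reference`.

    A small value suggests the compressor could reuse patterns already seen
    in the reference, i.e. the two texts are "close". Assumption: zlib's
    default 32 KB window means only the tail of a long reference is used.
    """
    ref = reference.encode("utf-8")
    frag = fragment.encode("utf-8")
    return compressed_size(ref + frag) - compressed_size(ref)


def closest_reference(references: dict[str, str], fragment: str) -> str:
    """Name of the reference text with the lowest append cost for `fragment`."""
    return min(references, key=lambda name: append_cost(references[name], fragment))


# Hypothetical toy corpora, far smaller than anything usable in practice.
references = {
    "english": "the quick brown fox jumps over the lazy dog while "
               "the children play in the garden near the old house ",
    "italian": "la volpe veloce salta sopra il cane pigro mentre "
               "i bambini giocano nel giardino vicino alla vecchia casa ",
}
fragment = "the children play in the garden near the old house"
guess = closest_reference(references, fragment)
```

With realistic inputs the reference texts would be much longer than the fragment, so that the compressor's statistics are dominated by the reference rather than by the fragment itself.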

Available Versions

Archivio della ricerca- Università di Roma La Sapienza

oai:iris.uniroma1.it:11573/251...

Last time updated on 12/11/2016

Crossref

info:doi/10.1103%2Fphysrevlett...

Last time updated on 01/04/2019