
    The "handedness" of language: Directional symmetry breaking of sign usage in words

    Language, which allows complex ideas to be communicated through symbolic sequences, is a characteristic feature of our species and is manifested in a multitude of forms. Using large written corpora for many different languages and scripts, we show that the occurrence probability distributions of signs at the left and right ends of words are distinctly heterogeneous. Characterizing this asymmetry with quantitative inequality measures, viz. information entropy and the Gini index, we show that the beginning of a word is less restrictive in sign usage than the end. This property is not simply attributable to the use of common affixes, as it is seen even when only word roots are considered. We use the existence of this asymmetry to infer the direction of writing in undeciphered inscriptions, and the inferred direction agrees with the archaeological evidence. Unlike traditional investigations of phonotactic constraints, which focus on language-specific patterns, our study reveals a property valid across languages and writing systems. As both language and writing are unique aspects of our species, this universal signature may reflect an innate feature of human cognition.
    Comment: 10 pages, 4 figures + Supplementary Information (15 pages, 8 figures), final corrected version
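The two inequality measures can be illustrated on the first and last letter of each word. This is a minimal sketch under stated assumptions: the toy word list, the function names, and the choice to compare both ends over the same inventory (every sign seen at either end, absent signs counted as zero) are illustrative choices, not the authors' exact procedure.

```python
import math
from collections import Counter

def entropy_bits(counts):
    """Shannon entropy (in bits) of a Counter of sign frequencies."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)

def gini_index(values):
    """Gini index of non-negative frequencies: 0 = perfectly even, larger = more concentrated (max (n - 1) / n)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    # G = 2 * sum_i(i * x_i) / (n * sum x) - (n + 1) / n, ranks i = 1..n over ascending values
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

def word_edge_counts(words):
    """Frequencies of the first (left-end) and last (right-end) sign of each word."""
    first = Counter(w[0] for w in words if w)
    last = Counter(w[-1] for w in words if w)
    return first, last

# Hypothetical toy word list, for illustration only.
words = ["cat", "cot", "bat", "dog", "dig"]
first, last = word_edge_counts(words)

# Assumption: compare both ends over one shared inventory; Counter returns 0 for absent signs.
inventory = sorted(set(first) | set(last))
h_first, h_last = entropy_bits(first), entropy_bits(last)
g_first = gini_index([first[s] for s in inventory])
g_last = gini_index([last[s] for s in inventory])
print(f"word-initial: H = {h_first:.3f} bits, Gini = {g_first:.2f}")
print(f"word-final:   H = {h_last:.3f} bits, Gini = {g_last:.2f}")
```

On this toy list the word-initial distribution has the higher entropy (about 1.522 vs 0.971 bits) and the lower Gini index (0.48 vs 0.64), i.e. the same direction of asymmetry the abstract describes; five words are of course no evidence for it.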

    Network analysis of a corpus of undeciphered Indus civilization inscriptions indicates syntactic organization

    Archaeological excavations at the sites of the Indus Valley civilization (2500-1900 BCE) in Pakistan and northwestern India have unearthed a large number of artifacts with inscriptions made up of hundreds of distinct signs. To date there is no generally accepted decipherment of these sign sequences, and there have been suggestions that the signs could be non-linguistic. Here we apply complex network analysis techniques to a database of available Indus inscriptions, with the aim of detecting patterns indicative of syntactic organization. Our results show the presence of patterns, e.g., recursive structures in the segmentation trees of the sequences, that suggest the existence of a grammar underlying these inscriptions.
    Comment: 17 pages (includes 4-page appendix containing Indus sign list), 14 figures
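Network analysis of a sign corpus typically starts from a graph whose nodes are signs. The following is a minimal sketch, assuming a directed edge a -> b whenever sign b immediately follows sign a in some inscription (weighted by frequency); the toy sequences and helper names are hypothetical, and the segmentation-tree analysis described in the abstract is not reproduced here.

```python
from collections import Counter

def sign_network(inscriptions):
    """Directed weighted network: edges[(a, b)] counts how often sign b directly follows sign a."""
    edges = Counter()
    for signs in inscriptions:
        for a, b in zip(signs, signs[1:]):
            edges[(a, b)] += 1
    return edges

def degrees(edges):
    """Out-degree and in-degree of each sign (number of distinct successors / predecessors)."""
    out_deg, in_deg = Counter(), Counter()
    for a, b in edges:  # iterating a Counter yields its (a, b) keys
        out_deg[a] += 1
        in_deg[b] += 1
    return out_deg, in_deg

# Hypothetical sign sequences (letters stand in for sign IDs), already in reading order.
inscriptions = [["A", "B", "C"], ["A", "B", "D"], ["E", "B", "C"]]
edges = sign_network(inscriptions)
out_deg, in_deg = degrees(edges)

# The sign with the most distinct neighbours acts as a hub of the toy network.
hub = max(set(out_deg) | set(in_deg), key=lambda s: out_deg[s] + in_deg[s])
print(dict(edges))
print("hub sign:", hub)
```

A real analysis would load the edge list into a graph library and look at degree distributions, clustering, or community structure rather than a single hub.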

    Indus Valley Civilization: Enigmatic, Exemplary, and Undeciphered


    Language and Dialect Identification of Cuneiform Texts

    This article introduces a corpus of cuneiform texts, from which the dataset for the Cuneiform Language Identification (CLI) 2019 shared task was derived, and reports some preliminary language identification experiments conducted on that corpus. We also describe the CLI dataset and how it was derived from the corpus. In addition, we provide some baseline language identification results using the CLI dataset. To the best of our knowledge, the experiments detailed here are the first time automatic language identification methods have been applied to cuneiform data.

    Statistical analysis of the tables in Mahadevan’s Concordance of the Indus Valley Script

    The Indus Script originates from the culture known as the Indus Valley Civilization, which flourished from approximately 2600 to 1900 BC. Several thousand objects bearing these signs have been found over a wide area of northern India and Pakistan. In 1977 Iravatham Mahadevan published a concordance of all the inscriptions that had been discovered up to that time. Accompanying the concordance is a set of 9 tables showing the distribution of individual signs by position, archaeological site, object type, field symbol (accompanying image), and direction of writing. Analysis of the frequencies of the signs found so far using Large Numbers of Rare Events (LNRE) models enabled the total vocabulary of the language, including signs not yet found, to be estimated at about 857. All the tables were analysed using Pearson’s residuals, and it was found that the signs were not randomly distributed: some showed statistically significant associations with position, object type, field symbol or direction of writing. A more detailed analysis of the relation between signs and field symbols was made using correspondence analysis, which showed that certain signs were associated with the unicorn symbol, while others were associated with the gharial and dotted circle symbols.
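Pearson's residuals compare each observed count O in a contingency table with the count E expected if rows and columns were independent, E = row total × column total / grand total, giving r = (O - E) / sqrt(E). A minimal sketch follows; the two-sign by three-position table is invented for illustration, and flagging cells with |r| > 2 is a common rule of thumb assumed here, not necessarily the paper's threshold.

```python
import math

def pearson_residuals(table):
    """Pearson residuals (O - E) / sqrt(E) for a contingency table given as a list of rows.

    E = row_total * column_total / grand_total (independence model); all margins must be > 0.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    residuals = []
    for row, r_tot in zip(table, row_totals):
        residuals.append([
            (obs - r_tot * c_tot / grand) / math.sqrt(r_tot * c_tot / grand)
            for obs, c_tot in zip(row, col_totals)
        ])
    return residuals

# Invented counts for two hypothetical signs by text position (initial, medial, final).
table = [
    [30, 10, 10],  # sign A
    [10, 10, 30],  # sign B
]
res = pearson_residuals(table)

# The squared residuals sum to the chi-square statistic of the table.
chi_square = sum(r * r for row in res for r in row)
for name, row in zip("AB", res):
    flags = ["*" if abs(r) > 2 else "" for r in row]  # assumed |r| > 2 cut-off
    print(name, [f"{r:+.2f}{f}" for r, f in zip(row, flags)])
print(f"chi-square = {chi_square:.1f}")
```

Here sign A is over-represented in initial position and sign B in final position (residuals of about ±2.24), while the medial column matches expectation exactly.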

    Statistical analysis of the Indus script using n-grams

    The Indus script is one of the major undeciphered scripts of the ancient world. The small size of the corpus, the absence of bilingual texts, and the lack of definite knowledge of the underlying language have frustrated efforts at decipherment since the discovery of the remains of the Indus civilisation. Recently, some researchers have questioned the premise that the Indus script encodes spoken language. Building on previous statistical approaches, we apply the tools of statistical language processing, specifically n-gram Markov chains, to analyse the Indus script for syntax. Our main results are that the script has well-defined signs which begin and end texts, that there is directionality and strong correlation in the sign order, and that there are groups of signs which appear to have identical syntactic function. None of these results requires a priori suppositions regarding the syntactic or semantic content of the signs; they follow directly from the statistical analysis. Using information-theoretic measures, we find the information in the script to be intermediate between that of a completely random and a completely fixed ordering of signs. Our study reveals that the Indus script is a structured sign system showing features of a formal language but, at present, cannot conclusively establish that it encodes natural language. Our n-gram Markov model is useful for predicting signs which are missing or illegible in a corpus of Indus texts. This work forms the basis for the development of a stochastic grammar which can be used to explore the syntax of the Indus script in greater detail.
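A bigram (n = 2) Markov chain with explicit text-boundary markers already captures the three uses mentioned in the abstract: text-initial signs, text-final signs, and filling an illegible sign from its neighbours. The sketch below is a minimal illustration under assumptions: the toy corpus and helper names are hypothetical, and add-one smoothing is chosen here for simplicity, not taken from the paper.

```python
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # boundary markers added to every text

def train_bigrams(texts):
    """Count sign-to-sign transitions, including START -> first sign and last sign -> END."""
    counts = defaultdict(Counter)
    for text in texts:
        seq = [START] + list(text) + [END]
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def prob(counts, prev, nxt, n_outcomes, alpha=1.0):
    """Add-alpha smoothed P(nxt | prev); n_outcomes = number of possible next symbols."""
    row = counts.get(prev, Counter())
    return (row[nxt] + alpha) / (sum(row.values()) + alpha * n_outcomes)

def fill_gap(counts, left, right, candidates, n_outcomes):
    """Most probable sign s for the pattern left, ?, right: argmax P(s | left) * P(right | s)."""
    return max(candidates,
               key=lambda s: prob(counts, left, s, n_outcomes) * prob(counts, s, right, n_outcomes))

# Hypothetical toy corpus; letters stand in for sign IDs.
texts = [["A", "B", "C"], ["A", "B", "D"], ["A", "E", "C"], ["F", "B", "C"]]
counts = train_bigrams(texts)
signs = sorted({s for t in texts for s in t})
n_outcomes = len(signs) + 1  # every sign, plus END

print("most common initial sign:", counts[START].most_common(1)[0][0])
print("most common final sign:", max(signs, key=lambda s: counts[s][END]))
print("best guess for A ? C:", fill_gap(counts, "A", "C", signs, n_outcomes))
```

In the toy corpus, A is the usual opening sign, C the usual closing sign, and B (score 0.3 × 0.3) is preferred over E (0.2 × 0.25) for the gap in "A ? C".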

    Iravatham Mahadevan’s Reading of Indus Script: A Critical Review

    This paper comprehensively summarizes, analyses, and reviews Iravatham Mahadevan’s attempts to decipher the Indus script. Over a period of more than thirty-five years, Mahadevan made continuous attempts to interpret and decipher the Indus script. He claimed to have adapted the method of parallels between the symbolic representation and the text, between the written object and its designation, between the written symbol itself and its meaning, and the similarity throughout the ancient East of certain portions of the inscriptions, with the assumption that the underlying language of the script is Dravidian. Mahadevan was very flexible in changing his views and finding new interpretations, and he gradually shifted his interpretation of Indus signs from phonetic/logographic (word) readings to ideographic ones, while leaving unshaken his core personal hypothesis and his belief in the Veḷier clan and Tamil cultural settings. While Mahadevan did not succeed in producing a self-consistent system of readings applicable to a large number of the discovered inscriptions, he did make a determined, persistent effort to develop a Dravidian framework for deciphering the Indus script. This study seeks to identify weaknesses in Mahadevan’s methodology and assumptions and searches for possible alternatives within that framework.

    Data Mining Ancient Script Image Data Using Convolutional Neural Networks

    The recent surge of interest in ancient scripts has resulted in large image libraries of ancient texts. Data mining of the collected images enables the study of the evolution of these ancient scripts. In particular, the origin of the Indus Valley script is highly debated. We use convolutional neural networks to test which Phoenician alphabet letters and Brahmi symbols are closest to the Indus Valley script symbols. Surprisingly, our analysis shows that overall the Phoenician alphabet is much closer than the Brahmi script to the Indus Valley script symbols.

    A method of identifying allographs in undeciphered scripts and its application to the Indus Valley Script

    This work describes a general method of testing for redundancies in the sign lists of ancient scripts by data mining the positions of the signs within the inscriptions. Redundant signs are allographs of the same grapheme. The method is applied to the undeciphered Indus Valley Script, which stands out from other ancient scripts in having a large proposed sign list containing dozens of asymmetric signs with mirrored counterparts. Through a statistical analysis of mirrored asymmetric signs, this paper shows that the Indus Valley Script was multi-directional and that the mirroring of signs often denotes only the direction of writing, without any difference in meaning. For this and five other specific reasons listed in the paper, 50 pairs of signs (23 mirrored and 27 non-mirrored) can be grouped together, because each pair consists only of insignificant variations of the same original sign. The reduced sign list may make decipherment easier in the future.
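The core idea of mining sign positions can be sketched by giving each sign a positional profile (the share of its occurrences that are text-initial, medial, final, or a one-sign text) and comparing profiles. This is a minimal sketch under assumptions: the four position classes, the total variation distance, and the toy inscriptions (already normalized to reading order, with "X_mirror" standing in for a hypothetical mirrored variant) are illustrative choices, and near-identical profiles are only a hint of allography; the paper weighs additional criteria.

```python
from collections import Counter, defaultdict

POSITIONS = ("initial", "medial", "final", "solo")

def positional_profiles(inscriptions):
    """For each sign, the fraction of its occurrences in each text position."""
    counts = defaultdict(Counter)
    for signs in inscriptions:
        n = len(signs)
        for i, s in enumerate(signs):
            if n == 1:
                pos = "solo"
            elif i == 0:
                pos = "initial"
            elif i == n - 1:
                pos = "final"
            else:
                pos = "medial"
            counts[s][pos] += 1
    return {
        s: {p: c[p] / sum(c.values()) for p in POSITIONS}
        for s, c in counts.items()
    }

def profile_distance(p, q):
    """Total variation distance between two profiles: 0 = identical, 1 = no overlap."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in POSITIONS)

# Hypothetical inscriptions in normalized reading order.
inscriptions = [["X", "M", "Y"], ["X_mirror", "M", "Y"], ["X", "Y"], ["M", "Y"]]
profiles = positional_profiles(inscriptions)
for a, b in [("X", "X_mirror"), ("X", "M"), ("X", "Y")]:
    print(a, b, round(profile_distance(profiles[a], profiles[b]), 3))
```

X and its mirrored variant share a purely initial profile (distance 0.0), whereas X and the always-final Y do not overlap at all (distance 1.0).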