10,140 research outputs found

The "handedness" of language: Directional symmetry breaking of sign usage in words

Author: Ashraf Md Izhar
Sinha Sitabhra
Publication date: 01/01/2018

Language, which allows complex ideas to be communicated through symbolic sequences, is a characteristic feature of our species and is manifested in a multitude of forms. Using large written corpora for many different languages and scripts, we show that the occurrence probability distributions of signs at the left and right ends of words have a distinct heterogeneous nature. Characterizing this asymmetry using quantitative inequality measures, viz. information entropy and the Gini index, we show that the beginning of a word is less restrictive in sign usage than the end. This property is not simply attributable to the use of common affixes, as it is seen even when only word roots are considered. We use the existence of this asymmetry to infer the direction of writing in undeciphered inscriptions, and the inferred direction agrees with the archaeological evidence. Unlike traditional investigations of phonotactic constraints, which focus on language-specific patterns, our study reveals a property valid across languages and writing systems. As both language and writing are unique aspects of our species, this universal signature may reflect an innate feature of the human cognitive phenomenon.

Comment: 10 pages, 4 figures + Supplementary Information (15 pages, 8 figures), final corrected version
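As a rough illustration of the two inequality measures named in this abstract, the sketch below computes the Shannon entropy and Gini index of the signs found at the first and last positions of words. It is not the authors' code: the function names, the toy word list, and the choice to compute both measures over the full set of signs seen anywhere in the corpus are assumptions made for illustration.

```python
import math
from collections import Counter

def shannon_entropy(counts):
    """Entropy in bits of the distribution given by raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gini_index(counts):
    """Gini index of raw counts (0 = perfectly even, near 1 = very unequal)."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    # Rank-weighted formula over ascending values, with ranks starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

def edge_sign_stats(words):
    """Compare sign usage at word beginnings and word endings.

    Assumption: every sign observed anywhere in the corpus counts as a
    possible sign at both positions, so unused signs contribute zeros.
    """
    words = [w for w in words if w]
    alphabet = sorted(set("".join(words)))
    first = Counter(w[0] for w in words)
    last = Counter(w[-1] for w in words)
    stats = {}
    for name, counter in (("first", first), ("last", last)):
        counts = [counter[s] for s in alphabet]
        stats[name] = {
            "entropy": shannon_entropy(counts),
            "gini": gini_index(counts),
        }
    return stats

# Hypothetical toy corpus: varied beginnings, a single shared ending.
toy_stats = edge_sign_stats(["cat", "bat", "rat", "mat"])
print(toy_stats)
```

In this toy corpus the word-initial position has the higher entropy and the lower Gini index, which is the direction of asymmetry the abstract describes. A real analysis would need large corpora and a script-appropriate definition of a sign.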

arXiv.org e-Print Archive

Directory of Open Access Journals

Network analysis of a corpus of undeciphered Indus civilization inscriptions indicates syntactic organization

Author: Ashraf Md Izhar
Bagley
Baines
Bryan Kenneth Wells
Caldeira
Chadwick
Coe
Cooper
Coulmas
Dorogovtsev
Fairservis
Farmer
Ferrer i Cancho
Garlaschelli
Gelb
Goody
Holme
Mahadevan
Marshall
Mehler
Motter
Newman
Parpola
Possehl
Radev
Raj Kumar Pan
Sampson
Sinha
Sitabhra Sinha
Trigger
Vitevitch
Yadav
Publication venue: 'Elsevier BV'
Publication date: 27/05/2010

Archaeological excavations at the sites of the Indus Valley civilization (2500-1900 BCE) in Pakistan and northwestern India have unearthed a large number of artifacts with inscriptions made up of hundreds of distinct signs. To date there is no generally accepted decipherment of these sign sequences, and there have been suggestions that the signs could be non-linguistic. Here we apply complex network analysis techniques to a database of available Indus inscriptions, with the aim of detecting patterns indicative of syntactic organization. Our results show the presence of patterns, e.g., recursive structures in the segmentation trees of the sequences, that suggest the existence of a grammar underlying these inscriptions.

Comment: 17 pages (includes 4-page appendix containing Indus sign list), 14 figures

arXiv.org e-Print Archive

Crossref

Indus Valley Civilization: Enigmatic, Exemplary, and Undeciphered

Author: Javonillo Charise Joy
Publication venue: DigitalCommons@COD
Publication date: 01/04/2011

Language and Dialect Identification of Cuneiform Texts

Author: Alstola Tero
Jauhiainen Heidi
Jauhiainen Tommi
Lindén Krister
Publication date: 01/01/2019

This article introduces a corpus of cuneiform texts from which the dataset for the Cuneiform Language Identification (CLI) 2019 shared task was derived, as well as some preliminary language identification experiments conducted using that corpus. We also describe the CLI dataset and how it was derived from the corpus. In addition, we provide some baseline language identification results using the CLI dataset. To the best of our knowledge, the experiments detailed here are the first time automatic language identification methods have been used on cuneiform data.

arXiv.org e-Print Archive

Crossref

Statistical analysis of the tables in Mahadevan’s Concordance of the Indus Valley Script

Author: Oakes Michael
Publication venue: 'Informa UK Limited'
Publication date: 06/12/2017

The Indus Script originates from the culture known as the Indus Valley Civilization, which flourished from approximately 2600 to 1900 BC. Several thousand objects bearing these signs have been found over a wide area of Northern India and Pakistan. In 1977 Iravatham Mahadevan published a concordance of all of the scripts that had been discovered so far. Accompanying the concordance is a set of 9 tables showing the distribution of individual signs by position, archaeological site, object type, field symbol (accompanying image), and direction of writing. Analysis of the frequencies of the signs found so far using Large Numbers of Rare Events (LNRE) models enabled the total vocabulary of the language, including signs not yet found, to be estimated at about 857. All the tables were analysed using Pearson’s residuals, and it was found that the signs were not randomly distributed, but some showed statistically significant associations with position, object, field symbol or direction of writing. A more detailed analysis of the relation between signs and field symbols was made using correspondence analysis, which showed that certain signs were associated with the unicorn symbol, while others were associated with the gharial and dotted circle symbols.
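For readers unfamiliar with Pearson's residuals, the sketch below shows the standard calculation on a contingency table: each cell's observed count is compared with the count expected under independence of rows and columns, scaled by the square root of the expected count. The table values and the function name are hypothetical and are not taken from the concordance; the sketch also assumes every row and column total is nonzero.

```python
import math

def pearson_residuals(table):
    """Return (observed - expected) / sqrt(expected) for every cell.

    `table` is a list of rows of counts, e.g. signs (rows) by positions
    (columns). Expected counts assume rows and columns are independent.
    Assumption: all row and column totals are nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        residual_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            residual_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(residual_row)
    return residuals

# Hypothetical counts: two signs (rows) at two positions (columns).
example = pearson_residuals([[30, 10], [10, 30]])
print(example)
```

Cells with large positive residuals occur more often than independence would predict, and cells with large negative residuals occur less often. A common rule of thumb treats absolute values above about 2 as worth a closer look.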

Crossref

Wolverhampton Intellectual Repository and E-theses

Statistical analysis of the Indus script using $n$-grams

Author: Fabio Rapallo
Hrishikesh Joglekar
Iravatham Mahadevan
Mayank N. Vahia
Nisha Yadav
Rajesh P. N. Rao
Ronojoy Adhikari
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 20/01/2009

The Indus script is one of the major undeciphered scripts of the ancient world. The small size of the corpus, the absence of bilingual texts, and the lack of definite knowledge of the underlying language have frustrated efforts at decipherment since the discovery of the remains of the Indus civilisation. Recently, some researchers have questioned the premise that the Indus script encodes spoken language. Building on previous statistical approaches, we apply the tools of statistical language processing, specifically $n$-gram Markov chains, to analyse the Indus script for syntax. Our main results are that the script has well-defined signs which begin and end texts, that there is directionality and strong correlations in the sign order, and that there are groups of signs which appear to have identical syntactic function. All these require no a priori suppositions regarding the syntactic or semantic content of the signs, but follow directly from the statistical analysis. Using information theoretic measures, we find the information in the script to be intermediate between that of a completely random and a completely fixed ordering of signs. Our study reveals that the Indus script is a structured sign system showing features of a formal language, but, at present, cannot conclusively establish that it encodes natural language. Our $n$-gram Markov model is useful for predicting signs which are missing or illegible in a corpus of Indus texts. This work forms the basis for the development of a stochastic grammar which can be used to explore the syntax of the Indus script in greater detail.
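To make the idea of restoring an illegible sign concrete, here is a minimal sketch of a bigram (2-gram) Markov model that fills a single gap from its left and right neighbours. It is an illustration only, not the authors' model: the sign labels, the add-one smoothing, and the function names are assumptions.

```python
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # hypothetical markers for text boundaries

def train_bigrams(texts):
    """Count how often each sign is followed by each other sign."""
    follow = defaultdict(Counter)
    for text in texts:
        padded = [START, *text, END]
        for prev, nxt in zip(padded, padded[1:]):
            follow[prev][nxt] += 1
    return follow

def predict_missing(follow, left, right, vocab):
    """Pick the sign s maximising P(s | left) * P(right | s).

    Probabilities use add-one smoothing (an assumption made here so that
    unseen sign pairs still get a small nonzero probability).
    """
    outcomes = len(vocab) + 1  # any sign in vocab, or the end marker

    def prob(prev, nxt):
        counts = follow.get(prev, Counter())
        return (counts[nxt] + 1) / (sum(counts.values()) + outcomes)

    # sorted() makes tie-breaking deterministic.
    return max(sorted(vocab), key=lambda s: prob(left, s) * prob(s, right))

# Hypothetical toy corpus; letters stand in for sign identifiers.
corpus = [["A", "B", "C"], ["A", "B", "D"], ["E", "B", "C"], ["A", "F", "C"]]
signs = {sign for text in corpus for sign in text}
model = train_bigrams(corpus)
print(predict_missing(model, "A", "C", signs))
```

The boundary markers also let the same counts show which signs tend to begin or end texts, one of the regularities the abstract reports. Higher-order models would condition on longer contexts at the cost of sparser counts.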

arXiv.org e-Print Archive

Public Library of Science (PLOS)

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Iravatham Mahadevan’s Reading of Indus Script: A Critical Review

Author: C Jyothibabu
Publication venue: 'Studia Orientalia Electronica'
Publication date: 02/05/2023

This paper comprehensively summarizes, analyses, and reviews Iravatham Mahadevan’s attempts to decipher the Indus script. Over a period of more than thirty-five years, Mahadevan made continuous attempts to interpret and decipher the script. Mahadevan claimed to have adapted the method of parallels between the symbolic representation and the text, between the written object and its designation, between the written symbol itself and its meaning, and the similarity throughout the ancient East of certain portions of the inscriptions, with the assumption that the underlying language of the script is Dravidian. Mahadevan was very flexible in changing his views and finding new interpretations, and gradually he shifted his interpretation of Indus signs from being phonetic/logographic/word to ideographic, leaving unshaken his core personal hypothesis and belief in the Veḷier clan and Tamil cultural settings. While Mahadevan did not succeed in making a self-consistent system of readings applicable to a large number of discovered pieces of writing, he did make a determined, persistent effort to develop a Dravidian framework for deciphering the Indus script. This study seeks to find weaknesses in the methodology and assumptions of Mahadevan and searches for possible alternatives within that framework.

Journal.fi

Data Mining Ancient Script Image Data Using Convolutional Neural Networks

Author: Daggumati Shruti
Revesz Peter
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/01/2018

The recent surge in ancient scripts has resulted in huge image libraries of ancient texts. Data mining of the collected images enables the study of the evolution of these ancient scripts. In particular, the origin of the Indus Valley script is highly debated. We use convolutional neural networks to test which Phoenician alphabet letters and Brahmi symbols are closest to the Indus Valley script symbols. Surprisingly, our analysis shows that, overall, the Phoenician alphabet is much closer than the Brahmi script to the Indus Valley script symbols.

Crossref

DigitalCommons@University of Nebraska

A method of identifying allographs in undeciphered scripts and its application to the Indus Valley Script

Author: Daggumati Shruti
Revesz Peter
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/01/2021

This work describes a general method of testing for redundancies in the sign lists of ancient scripts by data mining the positions of the signs within the inscriptions. The redundant signs are allographs of the same grapheme. The method is applied to the undeciphered Indus Valley Script, which stands out from other ancient scripts by having a large proposed sign list that contains dozens of asymmetric signs that have mirrored pairs. By a statistical analysis of mirrored asymmetric signs, this paper shows that the Indus Valley Script was multi-directional and that the mirroring of signs often denotes only the direction of writing without any difference in meaning. For this and five other specific reasons listed in the paper, 50 pairs of signs (23 mirrored and 27 non-mirrored) can be grouped together because each pair consists of only insignificant variations of the same original sign. The reduced sign list may make decipherment easier in the future.

DigitalCommons@University of Nebraska