904 research outputs found

    Language and Dialect Identification of Cuneiform Texts

    This article introduces a corpus of cuneiform texts, from which the dataset for the Cuneiform Language Identification (CLI) 2019 shared task was derived, along with some preliminary language identification experiments conducted on that corpus. We also describe the CLI dataset and how it was derived from the corpus, and we provide baseline language identification results on it. To the best of our knowledge, the experiments detailed here are the first use of automatic language identification methods on cuneiform data.

    Seven Dimensions of Portability for Language Documentation and Description

    The process of documenting and describing the world's languages is undergoing radical transformation with the rapid uptake of new digital technologies for capture, storage, annotation and dissemination. However, uncritical adoption of new tools and technologies is leading to resources that are difficult to reuse and which are less portable than the conventional printed resources they replace. We begin by reviewing current uses of software tools and digital technologies for language documentation and description. This sheds light on how digital language documentation and description are created and managed, leading to an analysis of seven portability problems under the following headings: content, format, discovery, access, citation, preservation and rights. After characterizing each problem we provide a series of value statements, and this provides the framework for a broad range of best practice recommendations.

    Towards a linked open data edition of Sumerian corpora

    Linguistic Linked Open Data (LLOD) is a flourishing line of research in the language resource community, so far mostly adopted for selected aspects of linguistics, natural language processing and the semantic web, as well as for practical applications in localization and lexicography. Yet computational philology seems to be somewhat decoupled from the recent progress in this area: even though LOD as a concept is gaining significant popularity in Digital Humanities, existing LLOD standards and vocabularies are not widely used in this community, and philological resources are underrepresented in the LLOD cloud diagram (http://linguistic-lod.org/llod-cloud). In this paper, we present an application of Linguistic Linked Open Data in Assyriology. We describe the LLOD edition of a linguistically annotated corpus of Sumerian, as well as its linking with lexical resources, repositories of annotation terminology, and the museum collections in which the artifacts bearing these texts are kept. The chosen corpus is the Electronic Text Corpus of Sumerian Royal Inscriptions, a well-curated and linguistically annotated archive of Sumerian text. This work prepares for creating and linking other corpora of cuneiform texts, such as the corpus of Ur III administrative and legal Sumerian texts, as part of the Machine Translation and Automated Analysis of Cuneiform Languages project (https://cdli-gh.github.io/mtaac/).

    Word segmentation for Akkadian cuneiform

    We present experiments on word segmentation for Akkadian cuneiform, an ancient writing system and a language used for about 3 millennia in the ancient Near East. To the best of our knowledge, this is the first study of this kind applied to either the Akkadian language or the cuneiform writing system. As a logosyllabic writing system, cuneiform structurally resembles East Asian writing systems, so we employ word segmentation algorithms originally developed for Chinese and Japanese. We describe results of rule-based algorithms, dictionary-based algorithms, and statistical and machine learning approaches. Our results indicate possible promising steps in cuneiform word segmentation that could enable and improve natural language processing in this area.
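
    As an illustration of the dictionary-based family mentioned in this abstract, the sketch below implements greedy forward maximum matching, a classic technique from Chinese word segmentation. It is not taken from the paper: the function name `max_match`, the `max_len` parameter, and the toy lexicon of transliterated sign sequences are all assumptions made for the example.

```python
def max_match(signs, lexicon, max_len=4):
    """Greedy forward maximum matching over a sequence of signs.

    At each position the longest lexicon entry is taken; if none
    matches, a single sign is emitted as its own word.
    """
    words = []
    i = 0
    while i < len(signs):
        # Try the longest candidate first, falling back to one sign.
        for size in range(min(max_len, len(signs) - i), 0, -1):
            candidate = tuple(signs[i:i + size])
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy lexicon (illustrative transliterations, not real data).
LEXICON = {("a", "na"), ("be", "li", "ia"), ("shar", "ru")}

signs = ["a", "na", "be", "li", "ia", "shar", "ru", "um"]
segmented = max_match(signs, LEXICON)
print(segmented)
```

    The unsegmented input mirrors how cuneiform lines appear as a run of signs without word boundaries; the final sign, which is absent from the toy lexicon, falls through as a one-sign word.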

    Complexity as a mechanism to reconstruct the urban pattern of the Iraqi marshes in the ancient city of Ur and marsh villages

    This research deals with the characteristics of life complexity identified by Christopher Alexander, which are used to measure and compare the common physical properties of architectural form in the ancient city of Ur and the marsh cottages. Measuring factors for these characteristics were developed in an accompanying table to create a complex sample, which reflects the natural characteristics of the pattern language shared between Ur as an urban dwelling and the marsh cottages as a natural environment. The underlying assumption is that the physical blocks of Ur and the marsh village cottages are similar in fractal scale because both use material of the same scale. The measurement associated the Sumerian human scale with the scale of the reed plant in triangular and hexagonal fractal styles, using the factor 2.7 mm; according to practical proofs and experiments, the difference in measurements between reed knots is approximately equal to this number. The architectural scales resulting from these measurements are associated with civilizations, including the ancient city of Ur, from the smallest tool to the largest building, as used in the ziggurats. They were essentially a result of the development of complex environmental patterns. Finally, the research presents some conclusions and recommendations.

    Research methodology in montanistic tourism

    Research methodology in montanistic tourism involves archival research and the study of specialist literature, surface and underground field survey, and the analysis of finds such as rock fragments, mineral composition, traces of metallurgical processes, fragments of pottery, etc. A separate problem is the study and evaluation of the development of mining and post-mining landscapes, focusing on the entire supply chain of resource industries and their impact on the cultural development of the country.

    Universal morphologies for the Caucasus region

    The Caucasus region is famed for its rich and diverse array of languages and language families, often challenging European-centered views established in traditional linguistics. In this paper, we describe ongoing efforts to improve the coverage of Universal Morphologies for languages of the Caucasus region. The Universal Morphologies (UniMorph) are a recent community project aiming to complement the Universal Dependencies, which focus on morphosyntax and syntax. We describe the development of UniMorph resources for Nakh-Daghestanian and Kartvelian languages as well as for Classical Armenian. We discuss challenges that the complex morphology of these and related languages poses to the current design of UniMorph, and suggest possibilities to improve the applicability of UniMorph for languages of the Caucasus region in particular and for low-resource languages in general. We also criticize the UniMorph TSV format for its limited expressiveness, and suggest complementing the existing UniMorph workflow with support for additional source formats based on Linked Open Data technology.
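
    For readers unfamiliar with the TSV format discussed in this abstract, the sketch below parses UniMorph-style lines, assuming the usual three tab-separated columns (lemma, inflected form, semicolon-separated feature tags). The `Entry` type, the `parse_unimorph` helper, and the English sample rows are illustrative assumptions, not material from the paper.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    lemma: str
    form: str
    features: frozenset  # unordered bundle of feature tags


def parse_unimorph(text):
    """Parse UniMorph-style TSV text into a list of Entry records."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank separator lines
        lemma, form, tags = line.split("\t")
        entries.append(Entry(lemma, form, frozenset(tags.split(";"))))
    return entries


# Illustrative sample rows (assumed for this example).
SAMPLE = "run\trunning\tV;V.PTCP;PRS\nrun\tran\tV;PST\n"

entries = parse_unimorph(SAMPLE)
for entry in entries:
    print(entry.lemma, entry.form, sorted(entry.features))
```

    Each record carries only a lemma, a form, and a flat tag bundle, which gives a sense of how little room the format leaves for additional structure.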

    Investigating Machine Learning Methods for Language and Dialect Identification of Cuneiform Texts

    Identification of the languages written using cuneiform symbols is a difficult task due to the lack of resources and the problem of tokenization. The Cuneiform Language Identification task in VarDial 2019 addresses the problem of identifying seven languages and dialects written in cuneiform: Sumerian and six dialects of Akkadian (Old Babylonian, Middle Babylonian Peripheral, Standard Babylonian, Neo-Babylonian, Late Babylonian, and Neo-Assyrian). This paper describes the approaches taken by the SharifCL team to this problem in VarDial 2019. The best result belongs to an ensemble of Support Vector Machines and a naive Bayes classifier, both working on character-level features, with a macro-averaged F1-score of 72.10%.
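
    The macro-averaged F1-score reported in this abstract weights every class equally, regardless of how many samples it has. A minimal sketch of the metric follows, using the per-class formula F1 = 2TP / (2TP + FP + FN); the function name `macro_f1` and the label strings are placeholders chosen for the example, and the toy predictions are not results from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Placeholder labels and toy predictions (assumed for illustration).
y_true = ["SUX", "SUX", "NEA", "NEA", "LTB"]
y_pred = ["SUX", "NEA", "NEA", "NEA", "LTB"]

score = macro_f1(y_true, y_pred)
print(f"macro-F1 = {score:.4f}")
```

    Because each class contributes equally to the mean, a classifier cannot reach a high macro-F1 by doing well only on the most frequent language or dialect.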