180 research outputs found

Word segmentation for Akkadian cuneiform

Author: Chiarcos Christian
Homburg Timo
Publication date: 27/04/2023

We present experiments on word segmentation for Akkadian cuneiform, an ancient writing system and a language used for about three millennia in the ancient Near East. To the best of our knowledge, this is the first study of this kind applied to either the Akkadian language or the cuneiform writing system. As a logosyllabic writing system, cuneiform structurally resembles East Asian writing systems, so we employ word segmentation algorithms originally developed for Chinese and Japanese. We describe the results of rule-based algorithms, dictionary-based algorithms, and statistical and machine learning approaches. Our results suggest promising steps for cuneiform word segmentation that could enable and improve natural language processing in this area.
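As a rough illustration of what a dictionary-based segmenter can look like, the sketch below implements greedy forward maximum matching, a generic technique often used for Chinese text. It is not taken from the paper: the function name `forward_max_match`, the representation of words as tuples of sign names, and the toy lexicon are all assumptions made for this example.

```python
def forward_max_match(signs, lexicon, max_len=None):
    """Greedily segment a list of signs into words.

    `lexicon` is a set of tuples of signs (hypothetical dictionary entries).
    At each position the longest matching entry wins; a sign that starts
    no known entry is emitted as a one-sign word.
    """
    if max_len is None:
        # Longest dictionary entry bounds how far ahead we need to look.
        max_len = max((len(entry) for entry in lexicon), default=1)
    words = []
    i = 0
    while i < len(signs):
        # Try the longest candidate first, then shrink towards one sign.
        for size in range(min(max_len, len(signs) - i), 0, -1):
            candidate = tuple(signs[i:i + size])
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Toy data only: placeholder sign names, not real cuneiform readings.
toy_lexicon = {("a", "b"), ("a", "b", "c"), ("d",)}
segmented = forward_max_match(["a", "b", "c", "d"], toy_lexicon)
```

Greedy matching is simple and fast but commits to the longest entry at each step, so it cannot recover from an early wrong choice; statistical approaches are usually brought in to address that.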

OPUS Augsburg

Language and Dialect Identification of Cuneiform Texts

Author: Alstola Tero
Jauhiainen Heidi
Jauhiainen Tommi
Lindén Krister
Publication date: 01/01/2019

This article introduces a corpus of cuneiform texts from which the dataset for the Cuneiform Language Identification (CLI) 2019 shared task was derived, as well as some preliminary language identification experiments conducted using that corpus. We also describe the CLI dataset and how it was derived from the corpus. In addition, we provide some baseline language identification results using the CLI dataset. To the best of our knowledge, the experiments detailed here are the first to apply automatic language identification methods to cuneiform data.

arXiv.org e-Print Archive

Crossref

Language and Dialect Identification of Cuneiform Texts

Author: Alstola Tero
Jauhiainen Heidi Annika
Jauhiainen Tommi Sakari
Linden Bo Krister Johan
Publication venue: The Association for Computational Linguistics
Publication date: 30/04/2019

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Machine learning for ancient languages: a survey

Author: Androutsopoulos Ion
Assael Yannis
Bodel John
Dyer Chris
Freitas Nando de
Pavlopoulos John
Prag Jonathan
Senior Andrew
Sommerschield Thea
Stefanak Vanessa
Publication venue: MIT Press
Publication date: 10/08/2023

Ancient languages preserve the cultures and histories of the past. However, their study is fraught with difficulties, and experts must tackle a range of challenging text-based tasks, from deciphering lost languages to restoring damaged inscriptions, to determining the authorship of works of literature. Technological aids have long supported the study of ancient texts, but in recent years advances in artificial intelligence and machine learning have enabled analyses on a scale and in a detail that are reshaping the field of humanities, similarly to how microscopes and telescopes have contributed to the realm of science. This article aims to provide a comprehensive survey of published research using machine learning for the study of ancient texts written in any language, script, and medium, spanning over three and a half millennia of civilizations around the ancient world. To analyze the relevant literature, we introduce a taxonomy of tasks inspired by the steps involved in the study of ancient documents: digitization, restoration, attribution, linguistic analysis, textual criticism, translation, and decipherment. This work offers three major contributions: first, mapping the interdisciplinary field carved out by the synergy between the humanities and machine learning; second, highlighting how active collaboration between specialists from both fields is key to producing impactful and compelling scholarship; third, highlighting promising directions for future work in this field. Thus, this work promotes and supports the continued collaborative impetus between the humanities and machine learning.

Oxford University Research Archive

Restoration of Fragmentary Babylonian Texts Using Recurrent Neural Networks

Author: Aaron Elad
Fetaya Ethan
Gordin Shai
Lifshitz Yonatan
Publication date: 04/03/2020

The main source of information regarding ancient Mesopotamian history and culture is clay cuneiform tablets. Despite being an invaluable resource, many tablets are fragmented, leading to missing information. Currently these missing parts are manually completed by experts. In this work we investigate the possibility of assisting scholars, and even automatically completing the breaks in ancient Akkadian texts from Achaemenid-period Babylonia, by modelling the language using recurrent neural networks.

arXiv.org e-Print Archive

Text segmentation for analysing different languages

Author: Pak Irina
Teh Phoey Lee
Publication date: 11/11/2016

Over the past several years, researchers have applied different methods of text segmentation. Text segmentation is defined as a method of splitting a document into smaller segments, each with its own relevant meaning. These segments can be tags, words, sentences, topics, phrases, or any other information unit. This study first reviews the different types of text segmentation methods used in different types of documentation, and then discusses the various reasons for utilizing them in opinion mining. The main contribution of this study is a summary of research papers from the past 10 years that applied text segmentation as their main approach to text analysis. Results show that word segmentation was successfully and widely used for processing different languages.

Crossref

Sunway Institutional Repository

Chapter 26 Language technology approach to “seeing” in Akkadian

Author: Sahala Aleksi
Svärd Saana
Publication venue: Informa UK Limited
Publication date: 04/02/2022

One way the meanings of words can be understood is through their distributional properties. Such a methodology offers an interesting quantitative viewpoint on the lexicography of long-extinct languages. This chapter explores the use of Pointwise Mutual Information (PMI), a well-known statistical word association measure used in collocation analysis. PMI is applied to the data in order to gain insight into the semantic nuances of Akkadian verbs of seeing (amāru, naṭālu, palāsu, dagālu, ḫiātu, barû, and subbû). To evaluate the data-driven results, the findings are compared to previous philological work by Ainsley Dicks. The analysis of the top-ranked PMI-extracted collocates provides a good overview of the typical semantic differences between the seven verbs of interest.
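For readers unfamiliar with the measure, PMI compares how often two words co-occur with how often they would co-occur if they were independent: PMI(x, y) = log2( p(x, y) / (p(x) p(y)) ). The sketch below is a minimal illustration of that standard formula, not the chapter's actual pipeline. The function names, the choice of a text line as the co-occurrence unit, and the toy English tokens are all assumptions made for this example.

```python
import math
from collections import Counter


def pmi(joint, count_x, count_y, total):
    """PMI in bits from raw counts over `total` observation units."""
    p_xy = joint / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


def rank_collocates(lines, target):
    """Rank words co-occurring with `target` in the same line by PMI.

    `lines` is a list of token lists; each line counts a word at most once.
    """
    total = len(lines)
    line_freq = Counter()   # number of lines containing each word
    joint_freq = Counter()  # number of lines containing word and target
    for tokens in lines:
        seen = set(tokens)
        line_freq.update(seen)
        if target in seen:
            joint_freq.update(seen - {target})
    scores = {
        word: pmi(joint, line_freq[target], line_freq[word], total)
        for word, joint in joint_freq.items()
    }
    # Highest PMI first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy corpus with placeholder tokens, not real Akkadian data.
toy_lines = [["see", "king"], ["see", "king"], ["look", "god"], ["god", "king"]]
ranked = rank_collocates(toy_lines, "see")
```

Raw PMI is known to favour rare words, so practical collocation studies often add a minimum frequency threshold or use a weighted variant; that choice is left open here.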

Directory of Open Access Books (DOAB)

Language technology approach to “seeing” in Akkadian

Author: Sahala Aleksi
Svärd Saana
Publication venue: Routledge, Taylor & Francis
Publication date: 30/09/2021

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

From Sherds of Pottery to Open Egyptological Data

Author: Jauhiainen Heidi
Publication date: 02/12/2022

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Investigating Machine Learning Methods for Language and Dialect Identification of Cuneiform Texts

Author: Doostmohammadi Ehsan
Nassajian Minoo
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2019

Identifying languages written in cuneiform symbols is a difficult task due to the lack of resources and the problem of tokenization. The Cuneiform Language Identification task in VarDial 2019 addresses the problem of identifying seven languages and dialects written in cuneiform: Sumerian and six dialects of the Akkadian language (Old Babylonian, Middle Babylonian Peripheral, Standard Babylonian, Neo-Babylonian, Late Babylonian, and Neo-Assyrian). This paper describes the approaches taken by the SharifCL team to this problem in VarDial 2019. The best result belongs to an ensemble of Support Vector Machines and a naive Bayes classifier, both working on character-level features, with a macro-averaged F1-score of 72.10%.

arXiv.org e-Print Archive

Crossref