90 research outputs found

New functions and updates of the resource DiACL - Diachronic Atlas of Comparative Linguistics

Author: Carling Gerd
Larsson Filip
Lundgren Olof
Nilsson Linus
Verhoeven Rob
Publication venue: Pavia University Press
Publication date: 01/01/2021

Lund University Publications

Compression-based Parts-of-Speech Tagger for the Arabic Language

Author: Alkhazi Ibrahim
Publication date: 18/12/2019

Bangor University Research Portal

Converting Italian Treebanks: Towards an Italian Stanford Dependency Treebank

Author: Bosco Cristina
Simi Maria
Montemagni Simonetta
Publication venue: The Association for Computational Linguistics
Publication date: 01/01/2013

The paper addresses the challenge of converting MIDT, an existing dependency-based Italian treebank resulting from the harmonization and merging of smaller resources, into the Stanford Dependencies annotation formalism, with the final aim of constructing a standard-compliant resource for the Italian language. Achieved results include a methodology for converting treebank annotations belonging to the same dependency-based family, the Italian Stanford Dependency Treebank (ISDT), and an Italian localization of the Stanford Dependency scheme.

Archivio della Ricerca - Università di Pisa

Institutional Research Information System University of Turin

Color Aesthetics and Social Networks in Complete Tang Poems: Explorations and Discoveries

Author: Cheng Wen-Huei
Chiu Wei-Yun
Hsu Chu-Ting
Liu Chao-Lin
Wang Hongsu
Publication date: 01/01/2015

Waseda University Repository

Building the essential resources for Finnish: the Turku Dependency Treebank

Author: Ginter Filip
Haverinen Katri
Kohonen Samuel
Laippala Veronika
Missilä Anna
Nyblom Jenna
Ojala Stina
Salakoski Tapio
Viljanen Timo
Publication venue: Springer Science and Business Media LLC
Publication date: 28/10/2022

UTUPub

Statistical Parsing by Machine Learning from a Classical Arabic Treebank

Author: Dukes Kais
Publication venue: University of Leeds
Publication date: 01/09/2013

Research into statistical parsing for English has enjoyed over a decade of successful results. However, adapting these models to other languages has met with difficulties. Previous comparative work has shown that Modern Arabic is one of the most difficult languages to parse due to rich morphology and free word order. Classical Arabic is the ancient form of Arabic, and is understudied in computational linguistics, relative to its worldwide reach as the language of the Quran. The thesis is based on seven publications that make significant contributions to knowledge relating to annotating and parsing Classical Arabic. Classical Arabic has been studied in depth by grammarians for over a thousand years using a traditional grammar known as i’rāb (إعراب). Using this grammar to develop a representation for parsing is challenging, as it describes syntax using a hybrid of phrase-structure and dependency relations. This work aims to advance the state-of-the-art for hybrid parsing by introducing a formal representation for annotation and a resource for machine learning. The main contributions are the first treebank for Classical Arabic and the first statistical dependency-based parser in any language for ellipsis, dropped pronouns and hybrid representations. A central argument of this thesis is that using a hybrid representation closely aligned to traditional grammar leads to improved parsing for Arabic. To test this hypothesis, two approaches are compared. As a reference, a pure dependency parser is adapted using graph transformations, resulting in an 87.47% F1-score. This is compared to an integrated parsing model with an F1-score of 89.03%, demonstrating that joint dependency-constituency parsing is better suited to Classical Arabic. The Quran was chosen for annotation as a large body of work exists providing detailed syntactic analysis. Volunteer crowdsourcing is used for annotation in combination with expert supervision.
A practical result of the annotation effort is the corpus website: http://corpus.quran.com, an educational resource with over two million users per year.

White Rose E-theses Online

Digital Classical Philology

Publication venue: Walter de Gruyter GmbH

The buzzwords “Information Society” and “Age of Access” suggest that information is now universally accessible without any form of hindrance. Indeed, the German constitution calls for all citizens to have open access to information. Yet in reality, there are multifarious hurdles to information access – whether physical, economic, intellectual, linguistic, political, or technical. Thus, while new methods and practices for making information accessible arise on a daily basis, we are nevertheless confronted by limitations to information access in various domains. This new book series assembles academics and professionals in various fields in order to illuminate the various dimensions of information's inaccessibility. While the series discusses principles and techniques for transcending the hurdles to information access, it also addresses necessary boundaries to accessibility. This book describes the state of the art of digital philology with a focus on ancient Greek and Latin. It addresses problems such as accessibility of information about Greek and Latin sources, data entry, and the collection and analysis of Classical texts, and describes the fundamental role of libraries in building digital catalogs and developing machine-readable citation systems.

OAPEN Library

Natural Language Processing Resources for Finnish. Corpus Development in the General and Clinical Domains

Author: Haverinen Katri
Publication venue: Turku Centre for Computer Science
Publication date: 04/09/2014

Transferred from Doria

UTUPub