819 research outputs found

Automatic treebank-based acquisition of Arabic LFG dependency structures

Author: Attia Mohammed
Tounsi Lamia
van Genabith Josef
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2009

A number of papers have reported on methods for the automatic acquisition of large-scale, probabilistic LFG-based grammatical resources from treebanks for English (Cahill et al., 2002), (Cahill et al., 2004), German (Cahill et al., 2003), Chinese (Burke, 2004), (Guo et al., 2007), Spanish (O’Donovan, 2004), (Chrupala and van Genabith, 2006) and French (Schluter and van Genabith, 2008). Here, we extend the LFG grammar acquisition approach to Arabic and the Penn Arabic Treebank (ATB) (Maamouri and Bies, 2004), adapting and extending the methodology of (Cahill et al., 2004) originally developed for English. Arabic is challenging because of its morphological richness and syntactic complexity. Currently 98% of ATB trees (without FRAG and X) produce a covering and connected f-structure. We conduct a qualitative evaluation of our annotation against a gold standard and achieve an f-score of 95%.

Irish Universities

DCU Online Research Access Service

ORTHOGRAPHIC ENRICHMENT FOR ARABIC GRAMMATICAL ANALYSIS

Author: Mohamed Emad Soliman
Publication venue: [Bloomington, Ind.] : Indiana University
Publication date: 01/01/2010

Thesis (Ph.D.) - Indiana University, Linguistics, 2010. The Arabic orthography is problematic in two ways: (1) it lacks the short vowels, and this leads to ambiguity as the same orthographic form can be pronounced in many different ways, each of which can have its own grammatical category, and (2) the Arabic word may contain several units like pronouns, conjunctions, articles and prepositions without an intervening white space. These two problems lead to difficulties in the automatic processing of Arabic. The thesis proposes a pre-processing scheme that applies word segmentation and word vocalization for the purpose of grammatical analysis: part of speech tagging and parsing. The thesis examines the impact of human-produced vocalization and segmentation on the grammatical analysis of Arabic, then applies a pipeline of automatic vocalization and segmentation for the purpose of Arabic part of speech tagging. The pipeline is then used, along with the POS tags produced, for the purpose of dependency parsing, which produces grammatical relations between the words in a sentence. The study uses the memory-based algorithm for vocalization, segmentation, and part of speech tagging, and the natural language parser MaltParser for dependency parsing. The thesis represents the first approach to the processing of real-world Arabic, and has found that through the correct choice of features and algorithms, the need for pre-processing for grammatical analysis can be minimized.

IUScholarWorks (University of Indiana)

Hindi CCGbank: CCG Treebank from the Hindi Dependency Treebank

Author: A Bharati
A Joshi
A Mahajan
B Kumari
Bharat Ram Ambati
C Shastri
D Hays
J Hockenmaier
J Nivre
J Robinson
M Kuhlmann
M Lewis
M Palmer
M Steedman
Mark Steedman
MP Marcus
N Xue
S Clark
S Reddy
S Uematsu
T Mohanan
Tejaswini Deoskar
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

Crossref

Springer - Publisher Connector

Edinburgh Research Explorer

Building representations from natural language

Author: Seifter Mark J
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2007

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. Includes bibliographical references (p. 37-38). In this thesis, I describe a system I built that produces instantiated representations from descriptions embedded in natural language. For example, in the sentence 'The girl walked to the table', my system produces a description of movement along a path (the girl moves on a path to the table), instantiating a general purpose trajectory representation that models movement along a path. I demonstrate that descriptions found by my system enable the imagining of an entire inner world, transforming sentences into three-dimensional graphical descriptions of action. By building action descriptions from ordinary language, I illustrate the gains we can make by exploiting the connection between language and thought. I assert that a small set of simple representations should be able to provide powerful coverage of human expression through natural language. In particular, I examine the sorts of representations that are common in the Wall Street Journal from the Penn Treebank, providing a counterpoint for the many other sorts of analyses of the Penn Treebank in other work. Then, I turn to recognized experts in provoking our imaginations with words, using my system to examine the work of four great authors to uncover commonalities and differences in their styles from the perspective of the way they make representational choices in their work. by Mark J. Seifter. M.Eng

DSpace@MIT

Gradient-based Inference for Networks with Output Constraints

Author: Carbonell Jaime
Lee Jay Yoon
Mehta Sanket Vaibhav
Tristan Jean-Baptiste
Wick Michael
Publication venue
Publication date: 22/04/2019

Practitioners apply neural networks to increasingly complex problems in natural language processing, such as syntactic parsing and semantic role labeling that have rich output structures. Many such structured-prediction problems require deterministic constraints on the output values; for example, in sequence-to-sequence syntactic parsing, we require that the sequential outputs encode valid trees. While hidden units might capture such properties, the network is not always able to learn such constraints from the training data alone, and practitioners must then resort to post-processing. In this paper, we present an inference method for neural networks that enforces deterministic constraints on outputs without performing rule-based post-processing or expensive discrete search. Instead, in the spirit of gradient-based training, we enforce constraints with gradient-based inference (GBI): for each input at test-time, we nudge continuous model weights until the network's unconstrained inference procedure generates an output that satisfies the constraints. We study the efficacy of GBI on three tasks with hard constraints: semantic role labeling, syntactic parsing, and sequence transduction. In each case, the algorithm not only satisfies constraints but improves accuracy, even when the underlying network is state-of-the-art. Comment: AAAI 201
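
The test-time loop described in this abstract can be illustrated with a toy example. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the model is a small softmax classifier rather than a sequence model, the constraint is a hypothetical predicate `is_valid`, and the update rule (lowering the log-probability of the violating output) is a simplification chosen for illustration. The names `decode`, `gradient_based_inference`, `lr` and `max_steps` are all invented here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode(weights, x):
    """Unconstrained inference: return the argmax label and the probabilities."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


def gradient_based_inference(weights, x, is_valid, lr=0.5, max_steps=100):
    """Nudge a per-input copy of the weights until the decoded output is valid.

    Assumption: the penalty is the log-probability of the violating output,
    so each step moves the weights down its gradient.
    """
    w = [row[:] for row in weights]  # the trained weights are left untouched
    for step in range(max_steps):
        y_hat, probs = decode(w, x)
        if is_valid(y_hat):
            return y_hat, step
        # d log p(y_hat | x) / d w[k] = (1[k == y_hat] - p_k) * x
        for k, row in enumerate(w):
            coeff = (1.0 if k == y_hat else 0.0) - probs[k]
            for j in range(len(row)):
                row[j] -= lr * coeff * x[j]
    return decode(w, x)[0], max_steps


# Toy usage: label 0 is the unconstrained prediction but violates the constraint.
trained = [[2.0, 0.0], [1.0, 0.5], [0.0, 1.0]]
example = [1.0, 1.0]
label, steps = gradient_based_inference(trained, example, lambda y: y != 0)
```

In this toy setting a single update is enough to move the prediction to a valid label. The sketch also reflects the per-input character of the procedure: the adjusted weights are discarded after each input, so the trained model itself does not change.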

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Statistical Parsing by Machine Learning from a Classical Arabic Treebank

Author: Dukes Kais
Publication venue: University of Leeds
Publication date: 01/09/2013

Research into statistical parsing for English has enjoyed over a decade of successful results. However, adapting these models to other languages has met with difficulties. Previous comparative work has shown that Modern Arabic is one of the most difficult languages to parse due to rich morphology and free word order. Classical Arabic is the ancient form of Arabic, and is understudied in computational linguistics, relative to its worldwide reach as the language of the Quran. The thesis is based on seven publications that make significant contributions to knowledge relating to annotating and parsing Classical Arabic. Classical Arabic has been studied in depth by grammarians for over a thousand years using a traditional grammar known as i’rāb (إعراب). Using this grammar to develop a representation for parsing is challenging, as it describes syntax using a hybrid of phrase-structure and dependency relations. This work aims to advance the state-of-the-art for hybrid parsing by introducing a formal representation for annotation and a resource for machine learning. The main contributions are the first treebank for Classical Arabic and the first statistical dependency-based parser in any language for ellipsis, dropped pronouns and hybrid representations. A central argument of this thesis is that using a hybrid representation closely aligned to traditional grammar leads to improved parsing for Arabic. To test this hypothesis, two approaches are compared. As a reference, a pure dependency parser is adapted using graph transformations, resulting in an 87.47% F1-score. This is compared to an integrated parsing model with an F1-score of 89.03%, demonstrating that joint dependency-constituency parsing is better suited to Classical Arabic. The Quran was chosen for annotation as a large body of work exists providing detailed syntactic analysis. Volunteer crowdsourcing is used for annotation in combination with expert supervision.
A practical result of the annotation effort is the corpus website: http://corpus.quran.com, an educational resource with over two million users per year.

White Rose E-theses Online