Search CORE

2,574 research outputs found

Developing an Architecture for Translation Engine using Ontology

Author: Kharbat Faten
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 01/10/2013

In translation, analyzing the input sequence to determine its grammatical structure with respect to a given formal grammar is called parsing (Bataineh & Bataine, 2009). In this research, the proposed architecture uses the WordNet ontology as a syntactic guide, together with a Transition Network Grammar, to determine the grammatical structure of the text to be translated. Parsing is followed by a mapping process between the source and target languages, which is intended to improve the accuracy of the result and to guarantee that the output is syntactically acceptable according to the target language's rules. This is ongoing research with continuing results and developments. Here, the main architecture is described as a basis for future integration with other techniques and approaches.
Keywords: Translation, WordNet, Transition Network Grammars, mapping engine, parsing procedure.
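To make the parsing step concrete, here is a minimal Python sketch of a recursive transition network recognizer. The networks (`S`, `NP`, `VP`), the toy `LEXICON` and the function names are hypothetical illustrations, not the paper's grammar or code. The lexicon stands in for the WordNet lookups; since WordNet covers only nouns, verbs, adjectives and adverbs, a closed-class word such as a determiner would need a separate list in practice.

```python
# Toy part-of-speech lexicon standing in for WordNet lookups (hypothetical data).
LEXICON = {
    "the": {"DET"},
    "a": {"DET"},
    "old": {"ADJ"},
    "student": {"NOUN"},
    "book": {"NOUN", "VERB"},
    "reads": {"VERB"},
}

# Each network has a start state, a set of final states, and arcs per state.
# An arc label naming another network is a "push" arc (recursion into that network);
# any other label is a word category that consumes one input word.
NETWORKS = {
    "S": {"start": "s0", "final": {"s2"},
          "arcs": {"s0": [("NP", "s1")], "s1": [("VP", "s2")]}},
    "NP": {"start": "n0", "final": {"n2"},
           "arcs": {"n0": [("DET", "n1")], "n1": [("ADJ", "n1"), ("NOUN", "n2")]}},
    "VP": {"start": "v0", "final": {"v1", "v2"},
           "arcs": {"v0": [("VERB", "v1")], "v1": [("NP", "v2")]}},
}


def traverse(net_name, words, pos):
    """Yield every input position at which network net_name can finish, starting at pos."""
    net = NETWORKS[net_name]
    agenda = [(net["start"], pos)]
    seen = set()
    while agenda:
        state, i = agenda.pop()
        if (state, i) in seen:
            continue
        seen.add((state, i))
        if state in net["final"]:
            yield i
        for label, nxt in net["arcs"].get(state, []):
            if label in NETWORKS:
                # Push arc: run the sub-network, then continue from each place it can end.
                for j in traverse(label, words, i):
                    agenda.append((nxt, j))
            elif i < len(words) and label in LEXICON.get(words[i], set()):
                # Category arc: consume one word whose lexicon entry has this category.
                agenda.append((nxt, i + 1))


def accepts(sentence):
    """True if the S network can consume the whole sentence."""
    words = sentence.lower().split()
    return any(end == len(words) for end in traverse("S", words, 0))


print(accepts("The student reads the old book"))  # True
print(accepts("student reads the book"))          # False: NP must start with DET
```

A push arc resumes at every position where its sub-network can finish, which is what lets a transition network grammar describe nested phrases that a plain finite-state automaton cannot.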

International Institute for Science, Technology and Education (IISTE): E-Journals

Statistical Parsing by Machine Learning from a Classical Arabic Treebank

Author: Dukes Kais
Publication venue: University of Leeds
Publication date: 01/09/2013

Research into statistical parsing for English has enjoyed over a decade of successful results. However, adapting these models to other languages has met with difficulties. Previous comparative work has shown that Modern Arabic is one of the most difficult languages to parse due to rich morphology and free word order. Classical Arabic is the ancient form of Arabic, and is understudied in computational linguistics, relative to its worldwide reach as the language of the Quran. The thesis is based on seven publications that make significant contributions to knowledge relating to annotating and parsing Classical Arabic. Classical Arabic has been studied in depth by grammarians for over a thousand years using a traditional grammar known as i’rāb (إعراب). Using this grammar to develop a representation for parsing is challenging, as it describes syntax using a hybrid of phrase-structure and dependency relations. This work aims to advance the state-of-the-art for hybrid parsing by introducing a formal representation for annotation and a resource for machine learning. The main contributions are the first treebank for Classical Arabic and the first statistical dependency-based parser in any language for ellipsis, dropped pronouns and hybrid representations. A central argument of this thesis is that using a hybrid representation closely aligned to traditional grammar leads to improved parsing for Arabic. To test this hypothesis, two approaches are compared. As a reference, a pure dependency parser is adapted using graph transformations, resulting in an 87.47% F1-score. This is compared to an integrated parsing model with an F1-score of 89.03%, demonstrating that joint dependency-constituency parsing is better suited to Classical Arabic. The Quran was chosen for annotation as a large body of work exists providing detailed syntactic analysis. Volunteer crowdsourcing is used for annotation in combination with expert supervision.
A practical result of the annotation effort is the corpus website, http://corpus.quran.com, an educational resource with over two million users per year.
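As a rough illustration of how such scores can be computed, the sketch below measures precision, recall and F1 over sets of labeled dependency edges. It assumes (this is not stated in the abstract) that F1 is reported because gold and predicted analyses can contain different numbers of nodes once elided words and dropped pronouns are restored; the edge tuples, relation labels and the `PRO` node are hypothetical.

```python
def labeled_f1(gold, predicted):
    """Precision, recall and F1 over (head, dependent, label) edges."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical sentence: token 1 is a verb whose subject is a dropped pronoun,
# represented in the gold analysis by an extra node "PRO".
gold = {(0, 1, "root"), (1, "PRO", "subj"), (1, 2, "obj")}
predicted = {(0, 1, "root"), (1, 2, "obj")}  # the parser failed to restore the pronoun

p, r, f1 = labeled_f1(gold, predicted)
print(round(p, 3), round(r, 3), round(f1, 3))  # 1.0 0.667 0.8
```

Missing the dropped pronoun costs recall but not precision, so a plain attachment-accuracy score over the predicted tokens would hide the error.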

White Rose E-theses Online

Arabic open information extraction system using dependency parsing

Author: Hussein Mahmoud
Mohamed Ali El-Morsy Sally
Mousa Hamdy M.
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/02/2022

Arabic is a Semitic language distinguished by its rich morphology and derivation. This complex nature makes information extraction from Arabic text difficult, and existing approaches still need improvement. Open information extraction (OIE) systems have emerged and been applied to many languages, especially English, but have rarely been applied to Arabic. Accordingly, this paper introduces an OIE system that extracts relation tuples from Arabic web text by exploiting Arabic dependency parsing and considering all possible relations in the text. Treating the propositions of each clause type as extractable relations, the system identifies the clause type from the grammatical functions of its constituents. The proposed system, Arabic Open Information Extraction (AOIE), extracts Arabic text relations at scale and is domain-independent. In its implementation, the system addresses the problem with supervised strategies, while the extraction itself relies on unsupervised strategies. The system has also been applied across several domains so that it is not restricted to a single field. The results show that the system extracts clauses efficiently from large amounts of text.
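A minimal sketch of relation extraction from a dependency parse, covering only a single subject-verb-object clause type. The `Token` class, the `extract_triples` function and the Universal Dependencies-style labels (`nsubj`, `obj`) are assumptions for illustration, not the AOIE implementation, which handles several clause types. Because extraction follows dependency relations rather than word positions, the verb-initial and subject-initial orders of كتب الطالب الدرس ("the student wrote the lesson") yield the same triple.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int     # 1-based position in the sentence
    form: str
    head: int    # index of the governing token; 0 marks the root
    deprel: str  # Universal Dependencies-style relation label


def extract_triples(tokens):
    """Return (subject, predicate, object) triples for predicates with nsubj and obj dependents."""
    children = {}
    for tok in tokens:
        children.setdefault(tok.head, []).append(tok)
    triples = []
    for pred in tokens:
        deps = children.get(pred.idx, [])
        subjects = [d.form for d in deps if d.deprel == "nsubj"]
        objects = [d.form for d in deps if d.deprel == "obj"]
        for subj in subjects:
            for obj in objects:
                triples.append((subj, pred.form, obj))
    return triples


# "The student wrote the lesson" in verb-initial (VSO) and subject-initial (SVO) order.
vso = [Token(1, "كتب", 0, "root"), Token(2, "الطالب", 1, "nsubj"), Token(3, "الدرس", 1, "obj")]
svo = [Token(1, "الطالب", 2, "nsubj"), Token(2, "كتب", 0, "root"), Token(3, "الدرس", 2, "obj")]
print(extract_triples(vso) == extract_triples(svo))  # True
```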

ZENODO

Institute of Advanced Engineering and Science

Lexical Selection for Machine Translation

Author: Sabtan Yasser
Publication date: 01/08/2011

The University of Manchester - Institutional Repository

Proceedings

Author: Ahrenberg Lars
Tiedemann Jörg
Volk Martin
Publication date: 30/11/2010

Proceedings of the Workshop on Annotation and Exploitation of Parallel Corpora AEPC 2010. Editors: Lars Ahrenberg, Jörg Tiedemann and Martin Volk. NEALT Proceedings Series, Vol. 10 (2010), 98 pages. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt. Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15893

DSpace at Tartu University Library

Null Element Restoration

Author: Gabbard Ryan
Publication venue: ScholarlyCommons
Publication date: 01/01/2010

Understanding the syntactic structure of a sentence is a necessary preliminary to understanding its semantics, and therefore to many practical applications. The field of natural language processing has achieved a high degree of accuracy in parsing, at least in English. However, the syntactic structures produced by the most commonly used parsers are less detailed than those found in the treebanks the parsers were trained on. In particular, these parsers typically omit the null elements used to indicate wh-movement, control, and other phenomena. This thesis presents a system for inserting these null elements into English parse trees. It then examines the problem in Arabic, which motivates a second, joint-inference system that also improves performance on English. Finally, it examines the use of information derived from the Google Web 1T corpus as a way of reducing certain data sparsity issues related to wh-movement.
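The kind of output such a system produces can be illustrated with a deliberately simple rule-based sketch; the thesis describes a trained system, not this rule. The sketch adds a Penn Treebank-style null subject `(NP-SBJ (-NONE- *))` to infinitival clauses that lack an overt subject, as in control constructions. The tuple tree encoding and the function names are hypothetical, and unlike the treebank the sketch does not coindex the null element with its controller.

```python
# A tree is (label, child, ...); a leaf word is a plain string.
NULL_SUBJECT = ("NP-SBJ", ("-NONE-", "*"))


def is_infinitival_vp(node):
    """True if node is a VP whose first child is the infinitival marker TO."""
    return (isinstance(node, tuple) and node[0] == "VP" and len(node) > 1
            and isinstance(node[1], tuple) and node[1][0] == "TO")


def restore_null_subjects(node):
    """Insert a null subject into every S that has no NP-SBJ and starts with an infinitival VP."""
    if isinstance(node, str):
        return node
    label, *children = node
    children = [restore_null_subjects(c) for c in children]
    has_subject = any(isinstance(c, tuple) and c[0].startswith("NP-SBJ") for c in children)
    if label == "S" and not has_subject and children and is_infinitival_vp(children[0]):
        children = [NULL_SUBJECT] + children
    return (label, *children)


def to_bracketed(node):
    """Render a tuple tree in Penn Treebank bracket notation."""
    if isinstance(node, str):
        return node
    label, *children = node
    return "(" + " ".join([label] + [to_bracketed(c) for c in children]) + ")"


tree = ("S", ("NP-SBJ", ("NNP", "John")),
        ("VP", ("VBZ", "wants"),
         ("S", ("VP", ("TO", "to"), ("VP", ("VB", "leave"))))))
print(to_bracketed(restore_null_subjects(tree)))
# (S (NP-SBJ (NNP John)) (VP (VBZ wants) (S (NP-SBJ (-NONE- *)) (VP (TO to) (VP (VB leave))))))
```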

ScholarlyCommons@Penn

When Simple n-gram Models Outperform Syntactic Approaches: Discriminating between Dutch and Flemish

Author: Kroon Martin
Medvedeva Masha
Plank Barbara
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2018

The IT University of Copenhagen's Repository

Zero-shot Cross-Linguistic Learning of Event Semantics

Author: Alhafni Bashar
Alikhani Malihe
Chen Yue
Inan Mert
Kober Thomas
Nielsen Elizabeth
Raji Shahab
Steedman Mark
Stone Matthew
Publication date: 18/07/2022

Edinburgh Research Explorer