1,883 research outputs found

    A Computational Lexicon and Representational Model for Arabic Multiword Expressions

    Get PDF
    The phenomenon of multiword expressions (MWEs) is increasingly recognised as a serious and challenging issue that has attracted the attention of researchers in various language-related disciplines. Research in these many areas has emphasised the primary role of MWEs in the process of analysing and understanding language, particularly in the computational treatment of natural languages. Ignoring MWE knowledge in any NLP system reduces the possibility of achieving high precision outputs. However, despite the enormous wealth of MWE research and language resources available for English and some other languages, research on Arabic MWEs (AMWEs) still faces multiple challenges, particularly in key computational tasks such as extraction, identification, evaluation, language resource building, and lexical representations. This research aims to remedy this deficiency by extending knowledge of AMWEs and making noteworthy contributions to the existing literature in three related research areas on the way towards building a computational lexicon of AMWEs. First, this study develops a general understanding of AMWEs by establishing a detailed conceptual framework that includes a description of an adopted AMWE concept and its distinctive properties at multiple linguistic levels. Second, in the use of AMWE extraction and discovery tasks, the study employs a hybrid approach that combines knowledge-based and data-driven computational methods for discovering multiple types of AMWEs. Third, this thesis presents a representative system for AMWEs which consists of multilayer encoding of extensive linguistic descriptions. This project also paves the way for further in-depth AMWE-aware studies in NLP and linguistics to gain new insights into this complicated phenomenon in standard Arabic. The implications of this research are related to the vital role of the AMWE lexicon, as a new lexical resource, in the improvement of various ANLP tasks and the potential opportunities this lexicon provides for linguists to analyse and explore AMWE phenomena

    Anaphora resolution for Arabic machine translation :a case study of nafs

    Get PDF
    PhD ThesisIn the age of the internet, email, and social media there is an increasing need for processing online information, for example, to support education and business. This has led to the rapid development of natural language processing technologies such as computational linguistics, information retrieval, and data mining. As a branch of computational linguistics, anaphora resolution has attracted much interest. This is reflected in the large number of papers on the topic published in journals such as Computational Linguistics. Mitkov (2002) and Ji et al. (2005) have argued that the overall quality of anaphora resolution systems remains low, despite practical advances in the area, and that major challenges include dealing with real-world knowledge and accurate parsing. This thesis investigates the following research question: can an algorithm be found for the resolution of the anaphor nafs in Arabic text which is accurate to at least 90%, scales linearly with text size, and requires a minimum of knowledge resources? A resolution algorithm intended to satisfy these criteria is proposed. Testing on a corpus of contemporary Arabic shows that it does indeed satisfy the criteria.Egyptian Government

    Morphosyntactic Linguistic Wavelets for Knowledge Management

    Get PDF

    Abstract syntax as interlingua: Scaling up the grammatical framework from controlled languages to robust pipelines

    Get PDF
    Syntax is an interlingual representation used in compilers. Grammatical Framework (GF) applies the abstract syntax idea to natural languages. The development of GF started in 1998, first as a tool for controlled language implementations, where it has gained an established position in both academic and commercial projects. GF provides grammar resources for over 40 languages, enabling accurate generation and translation, as well as grammar engineering tools and components for mobile and Web applications. On the research side, the focus in the last ten years has been on scaling up GF to wide-coverage language processing. The concept of abstract syntax offers a unified view on many other approaches: Universal Dependencies, WordNets, FrameNets, Construction Grammars, and Abstract Meaning Representations. This makes it possible for GF to utilize data from the other approaches and to build robust pipelines. In return, GF can contribute to data-driven approaches by methods to transfer resources from one language to others, to augment data by rule-based generation, to check the consistency of hand-annotated corpora, and to pipe analyses into high-precision semantic back ends. This article gives an overview of the use of abstract syntax as interlingua through both established and emerging NLP applications involving GF
    • …
    corecore