
    Towards Comprehensive Computational Representations of Arabic Multiword Expressions

    A successful computational treatment of multiword expressions (MWEs) in natural languages leads to more robust NLP systems, because it addresses the long-standing problem of language ambiguity caused primarily by this complex linguistic phenomenon. The first step in addressing this challenge is building an extensive, reliable MWE language resource (LR) with comprehensive computational representations across all linguistic levels. Such a resource forms the cornerstone for understanding the heterogeneous linguistic behaviour of MWEs in their various manifestations. This paper presents a detailed framework for the computational representation of Arabic MWEs (ArMWEs) across all linguistic levels, based on the state-of-the-art Lexical Markup Framework (LMF) with the modifications needed to suit the distinctive properties of Modern Standard Arabic (MSA). This work forms part of a larger project, JOMAL, which aims to develop a comprehensive computational lexicon of ArMWEs for NLP and language pedagogy (LP).
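To make the idea of an LMF-based MWE entry concrete, here is a minimal Python sketch. It is not the ArMWE framework from the paper: the class names loosely echo LMF notions (lexical entry, lemma, sense, and a list of components for multiword entries), but the fields, the `entry_id` values, and the `component_lemmas` helper are hypothetical simplifications. The example encodes كرة القدم ("football", literally "the ball of the foot") as an entry whose components point at the single-word entries for its parts.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    gloss: str


@dataclass
class Component:
    # Reference to the lexical entry of one word in the MWE, plus its position.
    entry_id: str
    rank: int


@dataclass
class LexicalEntry:
    entry_id: str
    lemma: str
    pos: str
    components: list[Component] = field(default_factory=list)
    senses: list[Sense] = field(default_factory=list)

    def is_multiword(self) -> bool:
        # Hypothetical rule: an entry with two or more components is an MWE.
        return len(self.components) > 1


# Single-word entries that the MWE entry points to.
lexicon = {
    "kura": LexicalEntry("kura", "كرة", "noun", senses=[Sense("ball")]),
    "qadam": LexicalEntry("qadam", "قدم", "noun", senses=[Sense("foot")]),
}

# The MWE entry: its own lemma and sense, plus ordered components.
lexicon["kurat_al_qadam"] = LexicalEntry(
    "kurat_al_qadam",
    "كرة القدم",
    "noun",
    components=[Component("kura", 1), Component("qadam", 2)],
    senses=[Sense("football (soccer)")],
)


def component_lemmas(entry: LexicalEntry, lex: dict) -> list[str]:
    """Resolve an MWE's components to the lemmas of their entries, in order."""
    ordered = sorted(entry.components, key=lambda c: c.rank)
    return [lex[c.entry_id].lemma for c in ordered]
```

Keeping the components as references to existing entries, rather than copying strings, means morphological or semantic information attached to a single word is shared with every MWE that contains it.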

    A Computational Lexicon and Representational Model for Arabic Multiword Expressions

    The phenomenon of multiword expressions (MWEs) is increasingly recognised as a serious and challenging issue that has attracted the attention of researchers in various language-related disciplines. Research in these areas has emphasised the primary role of MWEs in analysing and understanding language, particularly in the computational treatment of natural languages. Ignoring MWE knowledge in an NLP system reduces the likelihood of achieving high-precision output. However, despite the wealth of MWE research and language resources available for English and some other languages, research on Arabic MWEs (AMWEs) still faces multiple challenges, particularly in key computational tasks such as extraction, identification, evaluation, language resource building, and lexical representation. This research aims to remedy this deficiency by extending knowledge of AMWEs and contributing to the existing literature in three related research areas on the way towards building a computational lexicon of AMWEs. First, this study develops a general understanding of AMWEs by establishing a detailed conceptual framework that includes a description of the adopted AMWE concept and its distinctive properties at multiple linguistic levels. Second, for AMWE extraction and discovery, the study employs a hybrid approach that combines knowledge-based and data-driven computational methods to discover multiple types of AMWEs. Third, this thesis presents a representational system for AMWEs consisting of multilayer encodings of extensive linguistic descriptions. The project also paves the way for further in-depth AMWE-aware studies in NLP and linguistics to gain new insights into this complicated phenomenon in Standard Arabic.
The implications of this research relate to the vital role of the AMWE lexicon, as a new lexical resource, in improving various Arabic NLP (ANLP) tasks, and to the opportunities this lexicon gives linguists to analyse and explore AMWE phenomena.
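A hybrid extraction step can be illustrated with a common pattern from the MWE literature, assumed here rather than taken from the thesis: a knowledge-based filter keeps only bigrams whose part-of-speech sequence matches a hand-written pattern, and a data-driven score (pointwise mutual information, PMI) ranks the survivors. The tag set, the `PATTERNS` set, the `min_count` threshold, the function name, and the toy tagged corpus are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical knowledge-based filter: POS sequences that often form
# Arabic MWEs (e.g. noun-noun construct state, noun-adjective).
PATTERNS = {("NOUN", "NOUN"), ("NOUN", "ADJ")}


def extract_mwe_candidates(tagged_sents, min_count=2):
    """Return [((w1, w2), pmi), ...] sorted by PMI, highest first."""
    word_counts = Counter()
    bigram_counts = Counter()  # keyed by ((w1, t1), (w2, t2))
    for sent in tagged_sents:
        word_counts.update(w for w, _ in sent)
        bigram_counts.update(zip(sent, sent[1:]))

    n_words = sum(word_counts.values())
    n_bigrams = sum(bigram_counts.values())
    scored = []
    for ((w1, t1), (w2, t2)), count in bigram_counts.items():
        # Frequency threshold, then the knowledge-based POS-pattern filter.
        if count < min_count or (t1, t2) not in PATTERNS:
            continue
        # Data-driven step: PMI = log2(P(w1 w2) / (P(w1) * P(w2))).
        p_pair = count / n_bigrams
        p_w1 = word_counts[w1] / n_words
        p_w2 = word_counts[w2] / n_words
        scored.append(((w1, w2), math.log2(p_pair / (p_w1 * p_w2))))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy POS-tagged corpus (hypothetical tags).
corpus = [
    [("لعب", "VERB"), ("كرة", "NOUN"), ("القدم", "NOUN")],
    [("كرة", "NOUN"), ("القدم", "NOUN"), ("رياضة", "NOUN"), ("شعبية", "ADJ")],
    [("شاهد", "VERB"), ("كرة", "NOUN"), ("القدم", "NOUN")],
]
print(extract_mwe_candidates(corpus))
```

On this corpus only كرة القدم survives: the verb-noun pairs fail the pattern filter, and the remaining noun-noun and noun-adjective pairs occur once, below `min_count`.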

    Abstract Syntax as Interlingua: Scaling Up the Grammatical Framework from Controlled Languages to Robust Pipelines

    Abstract syntax is an interlingual representation used in compilers. Grammatical Framework (GF) applies the abstract syntax idea to natural languages. The development of GF started in 1998, first as a tool for controlled language implementations, where it has gained an established position in both academic and commercial projects. GF provides grammar resources for over 40 languages, enabling accurate generation and translation, as well as grammar engineering tools and components for mobile and Web applications. On the research side, the focus in the last ten years has been on scaling up GF to wide-coverage language processing. The concept of abstract syntax offers a unified view of many other approaches: Universal Dependencies, WordNets, FrameNets, Construction Grammars, and Abstract Meaning Representations. This makes it possible for GF to use data from these approaches and to build robust pipelines. In return, GF can contribute to data-driven approaches with methods to transfer resources from one language to others, to augment data by rule-based generation, to check the consistency of hand-annotated corpora, and to pipe analyses into high-precision semantic back ends. This article gives an overview of the use of abstract syntax as an interlingua through both established and emerging NLP applications involving GF.
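The interlingua idea can be illustrated without GF itself. The Python sketch below is only an analogy, not GF syntax or the GF API: one language-neutral tree type plays the role of the abstract syntax, and two linearization functions play the role of English and French concrete syntaxes. Translation means parsing into a tree and linearizing that tree in the other language; here "parsing" is a brute-force search over the tiny finite tree space. The lexicon, the constants `Fish`/`Pizza`/`Fresh`/`Delicious`, and all function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class This:
    kind: str  # abstract constant, e.g. "Fish"


@dataclass(frozen=True)
class Pred:
    item: This
    quality: str  # abstract constant, e.g. "Fresh"


# Hypothetical English concrete lexicon.
EN_KIND = {"Fish": "fish", "Pizza": "pizza"}
EN_QUAL = {"Fresh": "fresh", "Delicious": "delicious"}

# Hypothetical French concrete lexicon: nouns carry gender, adjectives agree.
FR_KIND = {"Fish": ("poisson", "m"), "Pizza": ("pizza", "f")}
FR_QUAL = {
    "Fresh": {"m": "frais", "f": "fraîche"},
    "Delicious": {"m": "délicieux", "f": "délicieuse"},
}


def lin_en(tree: Pred) -> str:
    return f"this {EN_KIND[tree.item.kind]} is {EN_QUAL[tree.quality]}"


def lin_fr(tree: Pred) -> str:
    noun, gender = FR_KIND[tree.item.kind]
    det = "ce" if gender == "m" else "cette"
    return f"{det} {noun} est {FR_QUAL[tree.quality][gender]}"


def all_trees():
    # The abstract syntax here is finite, so every tree can be enumerated.
    for kind, quality in product(EN_KIND, EN_QUAL):
        yield Pred(This(kind), quality)


def translate_en_fr(sentence: str) -> list[str]:
    # "Parse" English by finding trees that linearize to the input,
    # then linearize the same trees in French.
    return [lin_fr(t) for t in all_trees() if lin_en(t) == sentence]


print(translate_en_fr("this pizza is delicious"))
```

French gender agreement (`cette pizza est délicieuse`) is handled entirely in the French linearization; the tree stays the same for both languages, which is the point of using abstract syntax as the pivot.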

    A preliminary bibliography on focus

    [I]n its present form, the bibliography contains approximately 1100 entries. Bibliographical work is never complete, and the present one is still modest in a number of respects. It is not annotated, and it still contains a lot of mistakes and inconsistencies. It has nevertheless reached a stage which justifies considering the possibility of making it available to the public. The first step towards this is its pre-publication in the form of this working paper. […] The bibliography is less complete for earlier years. For works before 1970, the bibliographies of Firbas and Golkova 1975 and Tyl 1970, whose entries have not been included here, may be consulted.

    SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation

    Semantic Textual Similarity (STS) measures the meaning similarity of sentences. Applications include machine translation (MT), summarization, generation, question answering (QA), short answer grading, semantic search, dialog and conversational systems. The STS shared task is a venue for assessing the current state of the art. The 2017 task focuses on multilingual and cross-lingual pairs, with one sub-track exploring MT quality estimation (MTQE) data. The task obtained strong participation from 31 teams, with 17 participating in all language tracks. We summarize performance and review a selection of well-performing methods. Analysis highlights common errors, providing insight into the limitations of existing models. To support ongoing work on semantic representations, the STS Benchmark is introduced as a new shared training and evaluation set carefully selected from the corpus of English STS shared task data (2012-2017).
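STS systems are scored by how well their predicted similarities correlate with human judgements on a 0 to 5 scale, with Pearson correlation as the official measure. A minimal sketch of that computation follows; the function name is illustrative and the score lists are invented, not task data.

```python
import math


def pearson(gold, predicted):
    """Pearson correlation between gold similarity scores and system scores."""
    if len(gold) != len(predicted) or len(gold) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_g = sum(gold) / len(gold)
    mean_p = sum(predicted) / len(predicted)
    dev_g = [g - mean_g for g in gold]
    dev_p = [p - mean_p for p in predicted]
    # Covariance numerator divided by the product of the deviation norms.
    cov = sum(a * b for a, b in zip(dev_g, dev_p))
    norm = math.sqrt(sum(a * a for a in dev_g)) * math.sqrt(sum(b * b for b in dev_p))
    if norm == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / norm


# Illustrative, made-up scores on the 0-5 STS scale.
gold_scores = [0.0, 1.5, 3.0, 4.2, 5.0]
system_scores = [0.4, 1.0, 3.3, 3.9, 4.8]
print(f"Pearson r = {pearson(gold_scores, system_scores):.3f}")
```

Because Pearson correlation is invariant to linear rescaling of the predictions, a system can output scores on any scale as long as they track the gold ordering and spacing linearly.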