455 research outputs found
Extending the adverbial coverage of a NLP oriented resource for French
This paper presents a work on extending the adverbial entries of LGLex: a NLP
oriented syntactic resource for French. Adverbs were extracted from the
Lexicon-Grammar tables of both simple adverbs ending in -ment '-ly' (Molinier
and Levrier, 2000) and compound adverbs (Gross, 1986; 1990). This work relies
on the exploitation of fine-grained linguistic information provided in existing
resources. Various features are encoded in both LG tables and they haven't been
exploited yet. They describe the relations of deleting, permuting, intensifying
and paraphrasing that associate, on the one hand, the simple and compound
adverbs and, on the other hand, different types of compound adverbs. The
resulting syntactic resource is manually evaluated and freely available under
the LGPL-LR license.Comment: Proceedings of the 5th International Joint Conference on Natural
Language Processing (IJCNLP'11), Chiang Mai : Thailand (2011
Towards Comprehensive Computational Representations of Arabic Multiword Expressions
A successful computational treatment of multiword expressions (MWEs) in natural languages leads to a robust NLP system which considers the long-standing problem of language ambiguity caused primarily by this complex linguistic phenomenon. The first step in addressing
this challenge is building an extensive reliable MWEs language resource LR with comprehensive computational representations across all linguistic levels. This forms the cornerstone in understanding the heterogeneous linguistic behaviour of MWEs in their various manifestations. This paper presents a detailed framework for computational representations of Arabic MWEs (ArMWEs) across all linguistic levels based on the state-of-the-art lexical mark-up framework (LMF) with the necessary modifications to suit the distinctive properties of Modern Standard Arabic (MSA). This work forms part of a larger project that aims to develop a comprehensive computational lexicon of ArMWEs for NLP and language pedagogy LP (JOMAL project)
Towards Comprehensive Computational Representations of Arabic Multiword Expressions
A successful computational treatment of multiword expressions (MWEs) in natural languages leads to a robust NLP system which considers the long-standing problem of language ambiguity caused primarily by this complex linguistic phenomenon. The first step in addressing
this challenge is building an extensive reliable MWEs language resource LR with comprehensive computational representations across all linguistic levels. This forms the cornerstone in understanding the heterogeneous linguistic behaviour of MWEs in their various manifestations. This paper presents a detailed framework for computational representations of Arabic MWEs (ArMWEs) across all linguistic levels based on the state-of-the-art lexical mark-up framework (LMF) with the necessary modifications to suit the distinctive properties of Modern Standard Arabic (MSA). This work forms part of a larger project that aims to develop a comprehensive computational lexicon of ArMWEs for NLP and language pedagogy LP (JOMAL project)
A Computational Lexicon and Representational Model for Arabic Multiword Expressions
The phenomenon of multiword expressions (MWEs) is increasingly recognised as a serious and challenging issue that has attracted the attention of researchers in various language-related disciplines. Research in these many areas has emphasised the primary role of MWEs in the process of analysing and understanding language, particularly in the computational treatment of natural languages. Ignoring MWE knowledge in any NLP system reduces the possibility of achieving high precision outputs. However, despite the enormous wealth of MWE research and language resources available for English and some other languages, research on Arabic MWEs (AMWEs) still faces multiple challenges, particularly in key computational tasks such as extraction, identification, evaluation, language resource building, and lexical representations.
This research aims to remedy this deficiency by extending knowledge of AMWEs and making noteworthy contributions to the existing literature in three related research areas on the way towards building a computational lexicon of AMWEs. First, this study develops a general understanding of AMWEs by establishing a detailed conceptual framework that includes a description of an adopted AMWE concept and its distinctive properties at multiple linguistic levels. Second, in the use of AMWE extraction and discovery tasks, the study employs a hybrid approach that combines knowledge-based and data-driven computational methods for discovering multiple types of AMWEs. Third, this thesis presents a representative system for AMWEs which consists of multilayer encoding of extensive linguistic descriptions.
This project also paves the way for further in-depth AMWE-aware studies in NLP and linguistics to gain new insights into this complicated phenomenon in standard Arabic. The implications of this research are related to the vital role of the AMWE lexicon, as a new lexical resource, in the improvement of various ANLP tasks and the potential opportunities this lexicon provides for linguists to analyse and explore AMWE phenomena
CCG Parsing and Multiword Expressions
This thesis presents a study about the integration of information about
Multiword Expressions (MWEs) into parsing with Combinatory Categorial Grammar
(CCG). We build on previous work which has shown the benefit of adding
information about MWEs to syntactic parsing by implementing a similar pipeline
with CCG parsing. More specifically, we collapse MWEs to one token in training
and test data in CCGbank, a corpus which contains sentences annotated with CCG
derivations. Our collapsing algorithm however can only deal with MWEs when they
form a constituent in the data which is one of the limitations of our approach.
We study the effect of collapsing training and test data. A parsing effect
can be obtained if collapsed data help the parser in its decisions and a
training effect can be obtained if training on the collapsed data improves
results. We also collapse the gold standard and show that our model
significantly outperforms the baseline model on our gold standard, which
indicates that there is a training effect. We show that the baseline model
performs significantly better on our gold standard when the data are collapsed
before parsing than when the data are collapsed after parsing which indicates
that there is a parsing effect. We show that these results can lead to improved
performance on the non-collapsed standard benchmark although we fail to show
that it does so significantly. We conclude that despite the limited settings,
there are noticeable improvements from using MWEs in parsing. We discuss ways
in which the incorporation of MWEs into parsing can be improved and hypothesize
that this will lead to more substantial results.
We finally show that turning the MWE recognition part of the pipeline into an
experimental part is a useful thing to do as we obtain different results with
different recognizers.Comment: MSc thesis, The University of Edinburgh, 2014, School of Informatics,
MSc Artificial Intelligenc
Current trends
Deep parsing is the fundamental process aiming at the representation of the syntactic
structure of phrases and sentences. In the traditional methodology this process is
based on lexicons and grammars representing roughly properties of words and interactions
of words and structures in sentences. Several linguistic frameworks, such as Headdriven
Phrase Structure Grammar (HPSG), Lexical Functional Grammar (LFG), Tree Adjoining
Grammar (TAG), Combinatory Categorial Grammar (CCG), etc., offer different
structures and combining operations for building grammar rules. These already contain
mechanisms for expressing properties of Multiword Expressions (MWE), which, however,
need improvement in how they account for idiosyncrasies of MWEs on the one
hand and their similarities to regular structures on the other hand. This collaborative
book constitutes a survey on various attempts at representing and parsing MWEs in the
context of linguistic theories and applications
Representation and parsing of multiword expressions
This book consists of contributions related to the definition, representation and parsing of MWEs. These reflect current trends in the representation and processing of MWEs. They cover various categories of MWEs such as verbal, adverbial and nominal MWEs, various linguistic frameworks (e.g. tree-based and unification-based grammars), various languages including English, French, Modern Greek, Hebrew, Norwegian), and various applications (namely MWE detection, parsing, automatic translation) using both symbolic and statistical approaches
Abstract syntax as interlingua: Scaling up the grammatical framework from controlled languages to robust pipelines
Syntax is an interlingual representation used in compilers. Grammatical Framework (GF) applies the abstract syntax idea to natural languages. The development of GF started in 1998, first as a tool for controlled language implementations, where it has gained an established position in both academic and commercial projects. GF provides grammar resources for over 40 languages, enabling accurate generation and translation, as well as grammar engineering tools and components for mobile and Web applications. On the research side, the focus in the last ten years has been on scaling up GF to wide-coverage language processing. The concept of abstract syntax offers a unified view on many other approaches: Universal Dependencies, WordNets, FrameNets, Construction Grammars, and Abstract Meaning Representations. This makes it possible for GF to utilize data from the other approaches and to build robust pipelines. In return, GF can contribute to data-driven approaches by methods to transfer resources from one language to others, to augment data by rule-based generation, to check the consistency of hand-annotated corpora, and to pipe analyses into high-precision semantic back ends. This article gives an overview of the use of abstract syntax as interlingua through both established and emerging NLP applications involving GF
- …