434 research outputs found
Representation and parsing of multiword expressions
This book consists of contributions on the definition, representation and parsing of multiword expressions (MWEs), reflecting current trends in the representation and processing of MWEs. They cover various categories of MWEs (such as verbal, adverbial and nominal MWEs), various linguistic frameworks (e.g. tree-based and unification-based grammars), various languages (including English, French, Modern Greek, Hebrew and Norwegian), and various applications (namely MWE detection, parsing and automatic translation), using both symbolic and statistical approaches.
Current trends
Deep parsing is the fundamental process of representing the syntactic structure of phrases and sentences. In the traditional methodology, this process is based on lexicons and grammars that represent, roughly, the properties of words and the interactions of words and structures in sentences. Several linguistic frameworks, such as Head-driven Phrase Structure Grammar (HPSG), Lexical Functional Grammar (LFG), Tree Adjoining Grammar (TAG) and Combinatory Categorial Grammar (CCG), offer different structures and combining operations for building grammar rules. These already contain mechanisms for expressing properties of multiword expressions (MWEs), which, however, need improvement in how they account for the idiosyncrasies of MWEs on the one hand and their similarities to regular structures on the other. This collaborative book constitutes a survey of various attempts at representing and parsing MWEs in the context of linguistic theories and applications.
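As a toy illustration of MWE detection, one of the applications mentioned above, the sketch below does greedy longest-match lookup against an MWE lexicon. Both the lexicon entries and the greedy strategy are assumptions for illustration, not a method from the book.

```python
# Hypothetical MWE lexicon; real systems would use far richer descriptions.
MWE_LEXICON = {
    ("kick", "the", "bucket"),
    ("by", "and", "large"),
    ("take", "off"),
}
MAX_LEN = max(len(m) for m in MWE_LEXICON)

def detect_mwes(tokens):
    """Return (start, end) spans, matching the longest MWE at each position."""
    spans, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in MWE_LEXICON:
                spans.append((i, i + n))
                i += n
                break
        else:
            i += 1  # no MWE starts here; move on
    return spans

spans = detect_mwes("he will kick the bucket by and large".split())
print(spans)  # [(2, 5), (5, 8)]
```

A lookup like this captures fixed MWEs only; the idiosyncrasy/regularity tension the book discusses is exactly what such a flat lexicon cannot express.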
Understanding Hidden Memories of Recurrent Neural Networks
Recurrent neural networks (RNNs) have been successfully applied to various
natural language processing (NLP) tasks and achieved better results than
conventional methods. However, the lack of understanding of the mechanisms
behind their effectiveness limits further improvements on their architectures.
In this paper, we present a visual analytics method for understanding and
comparing RNN models for NLP tasks. We propose a technique to explain the
function of individual hidden state units based on their expected response to
input texts. We then co-cluster hidden state units and words based on the
expected response and visualize co-clustering results as memory chips and word
clouds to provide more structured knowledge on RNNs' hidden states. We also
propose a glyph-based sequence visualization based on aggregate information to
analyze the behavior of an RNN's hidden state at the sentence-level. The
usability and effectiveness of our method are demonstrated through case studies
and reviews from domain experts.
Comment: Published at IEEE Conference on Visual Analytics Science and Technology (IEEE VAST 2017).
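A minimal sketch of the "expected response" idea from the abstract, using my own construction rather than the paper's code: average each hidden unit's activation over the occurrences of each word, then crudely group words by the unit they excite most (the paper's co-clustering is more sophisticated). The token stream and activations are invented.

```python
from collections import defaultdict

def expected_response(tokens, hidden_states):
    """Per-word mean activation vector over all occurrences of the word."""
    sums, counts = {}, defaultdict(int)
    for word, h in zip(tokens, hidden_states):
        if word not in sums:
            sums[word] = [0.0] * len(h)
        sums[word] = [s + x for s, x in zip(sums[word], h)]
        counts[word] += 1
    return {w: [s / counts[w] for s in sums[w]] for w in sums}

def group_by_peak_unit(responses):
    """Crude co-clustering: assign each word to its maximally responsive unit."""
    clusters = defaultdict(list)
    for word, vec in responses.items():
        clusters[vec.index(max(vec))].append(word)
    return dict(clusters)

# Hypothetical 3-unit RNN activations for a short token stream.
tokens = ["the", "cat", "sat", "the", "dog"]
states = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.2], [0.0, 0.1, 0.9],
          [0.7, 0.2, 0.1], [0.2, 0.9, 0.1]]
clusters = group_by_peak_unit(expected_response(tokens, states))
print(clusters)  # {0: ['the'], 1: ['cat', 'dog'], 2: ['sat']}
```

The resulting word groups per unit are what the paper's word clouds visualize; its memory-chip view additionally clusters the units themselves.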
A generic housing grammar for the generation of different housing languages: a generic housing shape grammar for Palladian villas, Prairie and Malagueira houses
Shape grammars have traditionally described a design language and replicated it using a procedure. In the majority of existing studies, one language corresponded to one grammar and vice versa; the generative procedure was univocal and language-specific. Generic grammars, which are capable of describing multiple design languages, potentially allow greater flexibility and help describe not only languages but the relationships between them. This study proposes a generic housing grammar based on a parametric shape grammar, and uses it to investigate relationships between several grammars, or families of designs. A case study of three single housing grammars was selected, using the Palladian villas, Prairie and Malagueira houses. Specific parameterisation confers the sense of style required to define a language. From the generated corpora, two methods were applied to explore two research questions. 1. A qualitative method tested how the parametric space of a shape grammar corresponded with our intuition of similarities and differences amongst designs; this was performed using a set of questionnaires posed to both lay and expert observers. 2. A quantitative method tested how well the parametric space of a shape grammar coincided with the design space expressed by the different corpora; Principal Components Analysis was used to indicate whether the sets of parameters used to design the solutions would group into clusters. Results indicate that the expected relationships between individual designs are captured by the generic grammar: the design solutions it generated were naturally perceived by observers, and clustering was identified amongst language-related design solutions. A tool such as a generic shape grammar captures the principles of design as described by the generative shape rules and their parameterisation, and can be used in academia, practice or analysis to explore design.
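The quantitative test can be sketched as follows: represent each generated design by its parameter vector and project onto the first principal component, checking whether designs from the same language land close together. The parameter vectors below are invented, and this stdlib-only power-iteration PCA is a simplification of the study's analysis.

```python
def first_pc_scores(rows, iters=200):
    """Scores along the top principal component, via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    x = [[r[j] - means[j] for j in range(d)] for r in rows]  # center data
    v = [1.0] * d
    for _ in range(iters):
        # One step of power iteration on the (unnormalised) covariance X^T X.
        xv = [sum(row[j] * v[j] for j in range(d)) for row in x]
        v = [sum(x[i][j] * xv[i] for i in range(n)) for j in range(d)]
        norm = sum(c * c for c in v) ** 0.5
        v = [c / norm for c in v]
    return [sum(row[j] * v[j] for j in range(d)) for row in x]

# Hypothetical 2-parameter design vectors: two tight "languages" of designs.
designs = [[1.0, 1.1], [1.1, 0.9], [5.0, 5.2], [5.1, 4.9]]
scores = first_pc_scores(designs)
print(scores)  # same-language designs receive nearby scores
```

If parameterisation really encodes style, clusters along the principal components should line up with the three housing languages, which is what the study reports.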
An implementation of four of Ledgard's mini-languages
For decades humans have been searching for the best way to communicate with an intelligent machine, the computer. Several programming languages have been written with the idea of a universal language that includes solutions to as many problems as one can think of. But the more universal the languages are, the more complex they are to study. Henry Ledgard tried to reconcile these two ideas. He suggested studying programming languages by dealing with a few key features at a time: he separated the various programming features, grouped the similar ones together, and wrote his own small languages, called mini-languages. 'The programming language landscape' (15), which includes 13 mini-languages, was used as the central reference for this thesis work. Each of the four mini-languages was implemented in two parts: a compiler and an interpreter. One can write a program in any of the four mini-languages, then compile and run (interpret) it to test its correctness.
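The mini-language idea, a few key features in isolation, can be sketched with a toy interpreter. The two-statement syntax below is invented for illustration and is not one of Ledgard's actual mini-languages.

```python
def run(program, out):
    """Interpret a hypothetical assignment-and-print mini-language.
    Statements: 'SET x = expr' and 'PRINT expr'; exprs use +, - and integers."""
    env = {}
    for line in program.strip().splitlines():
        words = line.split()
        if words[0] == "SET":          # SET name = expr
            env[words[1]] = evaluate(words[3:], env)
        elif words[0] == "PRINT":      # PRINT expr
            out.append(evaluate(words[1:], env))
        else:
            raise SyntaxError(f"unknown statement: {line!r}")
    return out

def evaluate(tokens, env):
    """Left-to-right evaluation of 'term (+|-) term ...'."""
    def term(tok):
        return env[tok] if tok in env else int(tok)
    total = term(tokens[0])
    for op, tok in zip(tokens[1::2], tokens[2::2]):
        total = total + term(tok) if op == "+" else total - term(tok)
    return total

output = run("""
SET a = 2 + 3
SET b = a - 1
PRINT a + b
""", [])
print(output)  # [9]
```

A full mini-language implementation would split this into the thesis's two parts, a compiler emitting an intermediate form and an interpreter executing it; here both are collapsed into one pass for brevity.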
Joint RNN-Based Greedy Parsing and Word Composition
This paper introduces a greedy parser based on neural networks, which leverages a new compositional sub-tree representation. The greedy parser and the compositional procedure are jointly trained and tightly depend on each other. The composition procedure outputs a vector representation that summarizes a sub-tree both syntactically (parse tags) and semantically (words). Composition and tagging are performed over continuous (word or tag) representations, using recurrent neural networks. We reach F1 performance on par with well-known existing parsers, while having the advantage of speed, thanks to the greedy nature of the parser. We provide a fully functional implementation of the method described in this paper.
Comment: Published as a conference paper at ICLR 201
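The compositional sub-tree representation can be sketched as a function applied recursively bottom-up: child vectors plus a tag embedding in, one fixed-size vector out. The additive combiner and all embeddings below are simplifying assumptions; the paper's composition is a trained network, not this hand-written rule.

```python
import math

def compose(tag_vec, child_vecs):
    """Combine a tag embedding and child vectors into one sub-tree vector."""
    d = len(tag_vec)
    summed = [tag_vec[j] + sum(c[j] for c in child_vecs) for j in range(d)]
    return [math.tanh(x) for x in summed]  # squashing non-linearity

def embed_tree(tree, word_emb, tag_emb):
    """tree is either a word string or a (tag, [children]) pair."""
    if isinstance(tree, str):
        return word_emb[tree]
    tag, children = tree
    return compose(tag_emb[tag],
                   [embed_tree(c, word_emb, tag_emb) for c in children])

# Hypothetical 2-d embeddings and a tiny parse tree.
words = {"dogs": [0.5, -0.2], "bark": [-0.1, 0.6]}
tags = {"S": [0.0, 0.1], "NP": [0.2, 0.0], "VP": [-0.3, 0.2]}
tree = ("S", [("NP", ["dogs"]), ("VP", ["bark"])])
vec = embed_tree(tree, words, tags)
print(vec)  # one vector summarizing the whole sentence sub-tree
```

Because every sub-tree reduces to the same fixed-size vector, the greedy parser can score its next attachment decision from these vectors alone, which is what makes the joint training tight.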
Structuring and composability issues in Petri nets modeling
Throughout the history of Petri nets, numerous approaches have been proposed to manage model size through the introduction of structuring mechanisms that allow hierarchical representations and model composability. This paper proposes a classification system for Petri net structuring mechanisms and discusses each of them. These include node fusion, node vectors, high-level nets, and object-oriented Petri net extensions, among others. A running example is used to emphasize the application of the presented mechanisms to specific areas, namely automation systems modeling and software engineering, where object-oriented modeling plays a major role.
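One of the structuring mechanisms named above, node fusion, can be sketched by composing two nets and merging places that share a name. The dict-of-sets data model here is a minimal assumption for illustration, not a standard Petri net API.

```python
def fuse(net_a, net_b):
    """Compose two nets {'places', 'transitions', 'arcs'} by set union;
    place names shared between the nets become single fused places."""
    return {
        "places": net_a["places"] | net_b["places"],
        "transitions": net_a["transitions"] | net_b["transitions"],
        "arcs": net_a["arcs"] | net_b["arcs"],
    }

# Two component models that communicate through a shared 'buffer' place.
producer = {"places": {"buffer"}, "transitions": {"produce"},
            "arcs": {("produce", "buffer")}}
consumer = {"places": {"buffer"}, "transitions": {"consume"},
            "arcs": {("buffer", "consume")}}

system = fuse(producer, consumer)
print(sorted(system["places"]))  # ['buffer'] -- the shared place is fused
```

Fusion keeps each component net small and readable while the composed system still has a single well-defined token flow through the shared place, which is the size-management point the paper classifies.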
Complexity of Lexical Descriptions and its Relevance to Partial Parsing
In this dissertation, we have proposed novel methods for robust parsing that integrate the flexibility of linguistically motivated lexical descriptions with the robustness of statistical techniques. Our thesis is that the computation of linguistic structure can be localized if lexical items are associated with rich descriptions (supertags) that impose complex constraints in a local context. However, increasing the complexity of descriptions makes the number of different descriptions for each lexical item much larger and hence increases the local ambiguity for a parser. This local ambiguity can be resolved by using supertag co-occurrence statistics collected from parsed corpora. We have explored these ideas in the context of the Lexicalized Tree-Adjoining Grammar (LTAG) framework, wherein supertag disambiguation provides a representation that is an almost parse. We have used the disambiguated supertag sequence in conjunction with a lightweight dependency analyzer to compute noun groups, verb groups, dependency linkages and even partial parses. We have shown that a trigram-based supertagger achieves an accuracy of 92.1% on Wall Street Journal (WSJ) texts. Furthermore, we have shown that lightweight dependency analysis on the output of the supertagger identifies 83% of the dependency links accurately. We have exploited the representation of supertags with Explanation-Based Learning to improve parsing efficiency. In this approach, parsing in limited domains can be modeled as a finite-state transduction. We have implemented such a system for the ATIS domain which improves parsing efficiency by a factor of 15. We have used the supertagger in a variety of applications to provide lexical descriptions at an appropriate granularity. In an information retrieval application, we show that the supertag-based system performs at higher levels of precision compared to a system based on part-of-speech tags. In an information extraction task, supertags are used in specifying extraction patterns. For language modeling applications, we view supertags as syntactically motivated class labels in a class-based language model. The distinction between recursive and non-recursive supertags is exploited in a sentence simplification application.
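Statistical supertag disambiguation can be sketched with a Viterbi search over tag-sequence probabilities. For brevity this toy uses bigram statistics rather than the dissertation's trigrams, and the supertag inventory, lexicon and counts are all invented.

```python
import math

def viterbi(words, lexicon, trans, init):
    """Best supertag sequence under log P(tags) = log init + sum of bigram logs."""
    best = {t: (math.log(init.get(t, 1e-9)), [t]) for t in lexicon[words[0]]}
    for w in words[1:]:
        new = {}
        for t in lexicon[w]:
            score, path = max(
                (s + math.log(trans.get((p, t), 1e-9)), pth)
                for p, (s, pth) in best.items())
            new[t] = (score, path + [t])
        best = new
    return max(best.values())[1]

# Hypothetical supertags: A_np (noun anchor), B_tv (transitive-verb anchor).
lexicon = {"time": {"A_np", "B_tv"}, "flies": {"A_np", "B_tv"}}
trans = {("A_np", "B_tv"): 0.7, ("B_tv", "A_np"): 0.6,
         ("A_np", "A_np"): 0.2, ("B_tv", "B_tv"): 0.1}
init = {"A_np": 0.8, "B_tv": 0.2}

best_tags = viterbi(["time", "flies"], lexicon, trans, init)
print(best_tags)  # ['A_np', 'B_tv']
```

Because each supertag already fixes the word's local syntactic frame, the winning sequence is the "almost parse" the dissertation describes; the lightweight dependency analyzer then only has to link the anchors.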