182 research outputs found

    Proceedings

    Get PDF
    Proceedings of the NODALIDA 2011 Workshop Constraint Grammar Applications. Editors: Eckhard Bick, Kristin Hagen, Kaili Müürisep, Trond Trosterud. NEALT Proceedings Series, Vol. 14 (2011), vi+69 pp. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/19231

    Taking on new challenges in multi-word unit processing for Machine Translation

    Get PDF
    This paper discusses the qualitative comparative evaluation performed on the results of two machine translation systems with different approaches to the processing of multi-word units. It proposes a solution for overcoming the difficulties multi-word units present to machine translation by adopting a methodology that combines the lexicon grammar approach with OpenLogos ontology and semantico-syntactic rules. The paper also discusses the importance of a qualitative evaluation metrics to correctly evaluate the performance of machine translation engines with regards to multi-word units

    Contributions to the Theory of Finite-State Based Grammars

    Get PDF
    This dissertation is a theoretical study of finite-state based grammars used in natural language processing. The study is concerned with certain varieties of finite-state intersection grammars (FSIG) whose parsers define regular relations between surface strings and annotated surface strings. The study focuses on the following three aspects of FSIGs: (i) Computational complexity of grammars under limiting parameters In the study, the computational complexity in practical natural language processing is approached through performance-motivated parameters on structural complexity. Each parameter splits some grammars in the Chomsky hierarchy into an infinite set of subset approximations. When the approximations are regular, they seem to fall into the logarithmic-time hierarchyand the dot-depth hierarchy of star-free regular languages. This theoretical result is important and possibly relevant to grammar induction. (ii) Linguistically applicable structural representations Related to the linguistically applicable representations of syntactic entities, the study contains new bracketing schemes that cope with dependency links, left- and right branching, crossing dependencies and spurious ambiguity. New grammar representations that resemble the Chomsky-Schützenberger representation of context-free languages are presented in the study, and they include, in particular, representations for mildly context-sensitive non-projective dependency grammars whose performance-motivated approximations are linear time parseable. (iii) Compilation and simplification of linguistic constraints Efficient compilation methods for certain regular operations such as generalized restriction are presented. These include an elegant algorithm that has already been adopted as the approach in a proprietary finite-state tool. In addition to the compilation methods, an approach to on-the-fly simplifications of finite-state representations for parse forests is sketched. These findings are tightly coupled with each other under the theme of locality. I argue that the findings help us to develop better, linguistically oriented formalisms for finite-state parsing and to develop more efficient parsers for natural language processing. Avainsanat: syntactic parsing, finite-state automata, dependency grammar, first-order logic, linguistic performance, star-free regular approximations, mildly context-sensitive grammar

    Part-of-speech Tagging: A Machine Learning Approach based on Decision Trees

    Get PDF
    The study and application of general Machine Learning (ML) algorithms to theclassical ambiguity problems in the area of Natural Language Processing (NLP) isa currently very active area of research. This trend is sometimes called NaturalLanguage Learning. Within this framework, the present work explores the applicationof a concrete machine-learning technique, namely decision-tree induction, toa very basic NLP problem, namely part-of-speech disambiguation (POS tagging).Its main contributions fall in the NLP field, while topics appearing are addressedfrom the artificial intelligence perspective, rather from a linguistic point of view.A relevant property of the system we propose is the clear separation betweenthe acquisition of the language model and its application within a concrete disambiguationalgorithm, with the aim of constructing two components which are asindependent as possible. Such an approach has many advantages. For instance, thelanguage models obtained can be easily adapted into previously existing taggingformalisms; the two modules can be improved and extended separately; etc.As a first step, we have experimentally proven that decision trees (DT) providea flexible (by allowing a rich feature representation), efficient and compact wayfor acquiring, representing and accessing the information about POS ambiguities.In addition to that, DTs provide proper estimations of conditional probabilities fortags and words in their particular contexts. Additional machine learning techniques,based on the combination of classifiers, have been applied to address some particularweaknesses of our tree-based approach, and to further improve the accuracy in themost difficult cases.As a second step, the acquired models have been used to construct simple,accurate and effective taggers, based on diiferent paradigms. In particular, wepresent three different taggers that include the tree-based models: RTT, STT, andRELAX, which have shown different properties regarding speed, flexibility, accuracy,etc. The idea is that the particular user needs and environment will define whichis the most appropriate tagger in each situation. Although we have observed slightdifferences, the accuracy results for the three taggers, tested on the WSJ test benchcorpus, are uniformly very high, and, if not better, they are at least as good asthose of a number of current taggers based on automatic acquisition (a qualitativecomparison with the most relevant current work is also reported.Additionally, our approach has been adapted to annotate a general Spanishcorpus, with the particular limitation of learning from small training sets. A newtechnique, based on tagger combination and bootstrapping, has been proposed toaddress this problem and to improve accuracy. Experimental results showed thatvery high accuracy is possible for Spanish tagging, with a relatively low manualeffort. Additionally, the success in this real application has confirmed the validity of our approach, and the validity of the previously presented portability argumentin favour of automatically acquired taggers

    Mixed up with machine Translation: Multi-word Units Disambiguation Challenge.

    Get PDF
    With the rapid evolution of the Internet, translation has become part of the daily life of ordinary users, not only of professional translators. Machine translation has evolved along with different types of computer-assisted translation tools. Qualitative progress has been made in the field of machine translation, but not all problems have been solved. The current times are auspicious for the development of more sophisticated evaluation tools that measure the performance of specific linguistic phenomena. One problem in particular, namely the poor analysis and translation of multi-word units, is an arena where investment in linguistic knowledge systems with the goal of improving machine translation would be beneficial. This paper addresses the difficulties multi-word units present to machine translation, by comparing translations performed by systems adopting different approaches to machine translation. It proposes a solution for improving the quality of the translation of multi-word units by adopting a methodology that combines Lexicon Grammar resources with OpenLogos lexical resources and semantico-syntactic rules. Finally, it discusses how an ideal machine translation evaluation tool might look to correctly evaluate the performance of machine translation engines with regards to multi-word units and thus to contribute to the improvement of translation quality

    Sense and preference

    Get PDF
    AbstractSemantic networks have shown considerable utility as a knowledge representation for Natural Language Processing (NLP). This paper describes a system for automatically deriving network structures from machine-readable dictionary text. This strategy helps to solve the problem of vocabulary acquisition for large-scale parsing systems, but also introduces an extra level of difficulty in terms of word-sense ambiguity. A Preference Semantics parsing system that operates over this network is discussed, in particular as regards its mechanism for using the network for lexical selection

    Using Corpus-based Linguistic Approaches in Sense Prediction Study

    Get PDF
    corecore