    Optimality Theory as a Framework for Lexical Acquisition

    This paper re-investigates a lexical acquisition system initially developed for French. We show that, interestingly, the architecture of the system reproduces and implements the main components of Optimality Theory. However, we formulate the hypothesis that some of its limitations are mainly due to a poor representation of the constraints it uses. Finally, we show how a better representation of these constraints would yield better results.
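
    A minimal sketch of the Optimality Theory evaluation scheme the paper maps the system onto, with hypothetical constraints and candidates: ranked, violable constraints score each candidate, and the winner is the candidate whose violation profile is lexicographically best.

        # Sketch of OT-style evaluation (constraints and candidates are invented).
        def ot_evaluate(candidates, ranked_constraints):
            """Pick the candidate whose violation profile is lexicographically
            smallest: best on the highest-ranked constraint, with ties broken
            by lower-ranked constraints."""
            def profile(candidate):
                # One violation count per constraint, in ranking order.
                return tuple(c(candidate) for c in ranked_constraints)
            return min(candidates, key=profile)

        # Hypothetical constraints for a lexical-acquisition setting: each maps
        # a candidate lexical entry to a violation count (0 = satisfied).
        faithful_to_corpus = lambda entry: 0 if "attested" in entry else 1
        no_rare_frame = lambda entry: entry.count("rare-frame")

        winner = ot_evaluate(
            candidates=[["attested"], ["rare-frame", "attested"], ["rare-frame"]],
            ranked_constraints=[faithful_to_corpus, no_rare_frame],
        )
        print(winner)  # -> ['attested']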

    A distributional investigation of German verbs

    This dissertation provides an empirical investigation of German verbs conducted on the basis of statistical descriptions acquired from a large corpus of German text. In a brief overview of the linguistic theory pertaining to the lexical semantics of verbs, I outline the idea that verb meaning is composed of argument structure (the number and types of arguments that co-occur with a verb) and aspectual structure (properties describing the temporal progression of an event referenced by the verb). I then produce statistical descriptions of verbs according to these two distinct facets of meaning: in particular, I examine verbal subcategorisation, selectional preferences, and aspectual type. All three of these modelling strategies are evaluated on a common task, automatic verb classification. I demonstrate that automatically acquired features capturing verbal lexical aspect are beneficial for an application that concerns argument structure, namely semantic role labelling. Furthermore, I demonstrate that features capturing verbal argument structure perform well on the task of classifying a verb for its aspectual type. These findings suggest that these two facets of verb meaning are related in an underlying way.
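
    A minimal sketch of the common evaluation task named above, automatic verb classification from subcategorisation-frame distributions; the frame labels, counts, and two-cluster setup are invented for illustration and do not come from the dissertation.

        import numpy as np
        from sklearn.cluster import KMeans

        # Hypothetical corpus counts: verb -> {subcategorisation frame: count}.
        frame_counts = {
            "geben":    {"NP-NP-NP": 80, "NP-NP": 15, "NP": 5},  # 'give'
            "schenken": {"NP-NP-NP": 70, "NP-NP": 25, "NP": 5},  # 'give as a present'
            "laufen":   {"NP": 85, "NP-PP": 15},                 # 'walk'
            "rennen":   {"NP": 90, "NP-PP": 10},                 # 'run'
        }
        frames = sorted({f for counts in frame_counts.values() for f in counts})

        def to_distribution(counts):
            # Represent a verb as relative frequencies over all observed frames.
            total = sum(counts.values())
            return [counts.get(f, 0) / total for f in frames]

        verbs = list(frame_counts)
        X = np.array([to_distribution(frame_counts[v]) for v in verbs])

        # Group verbs into semantic classes by their syntactic behaviour.
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
        for verb, label in zip(verbs, labels):
            print(verb, "-> class", label)  # transfer verbs vs. motion verbs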

    The integration of syntax and semantic plausibility in a wide-coverage model of human sentence processing

    Models of human sentence processing have paid much attention to three key characteristics of the sentence processor: its robust and accurate processing of unseen input (wide coverage), its immediate, incremental interpretation of partial input, and its sensitivity to structural frequencies in previous language experience. In this thesis, we propose a model of human sentence processing that accounts for these three characteristics and also models a fourth key characteristic, namely the influence of semantic plausibility on sentence processing. The precondition for such a sentence processing model is a general model of human plausibility intuitions. We therefore begin by presenting a probabilistic model of the plausibility of verb-argument relations, which we estimate as the probability of encountering a verb-argument pair in the relation specified by a thematic role in a role-annotated training corpus. This model faces a significant sparse data problem, which we alleviate by combining two orthogonal smoothing methods. We show that the smoothed model's predictions are significantly correlated with human plausibility judgements for a range of test sets. We also demonstrate that our semantic plausibility model outperforms selectional preference models and a standard role labeller, which solve tasks from computational linguistics that are related to the prediction of human judgements. We then integrate this semantic plausibility model with an incremental, wide-coverage, probabilistic model of syntactic processing to form the Syntax/Semantics (SynSem) Integration model of sentence processing. The SynSem-Integration model combines preferences for candidate syntactic structures from two sources: syntactic probability estimates from a probabilistic parser and our semantic plausibility model's estimates of the verb-argument relations in each syntactic analysis. The model uses these preferences to determine a globally preferred structure and predicts difficulty in human sentence processing either if syntactic and semantic preferences conflict, or if the interpretation of the preferred analysis changes non-monotonically. In a thorough evaluation against the patterns of processing difficulty found for four ambiguity phenomena in eight reading-time studies, we demonstrate that the SynSem-Integration model reliably predicts human reading-time behaviour.
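
    A minimal sketch of the plausibility estimate and its integration, under illustrative assumptions: the triple counts, the single back-off smoothing step (the thesis combines two orthogonal smoothing methods), and the interpolation weights are placeholders.

        from collections import Counter

        # Hypothetical role-annotated triples: (verb, thematic role, argument head).
        corpus = [
            ("eat", "agent", "man"), ("eat", "patient", "apple"),
            ("eat", "agent", "dog"), ("eat", "patient", "bread"),
        ]
        triples = Counter(corpus)
        pairs = Counter((v, r) for v, r, _ in corpus)
        role_args = Counter((r, a) for _, r, a in corpus)
        roles = Counter(r for _, r, _ in corpus)

        def plausibility(verb, role, arg, lam=0.7):
            """Smoothed P(arg | verb, role): interpolate the maximum-likelihood
            triple estimate with a back-off estimate P(arg | role). One back-off
            step stands in here for the thesis's two smoothing methods."""
            mle = triples[(verb, role, arg)] / pairs[(verb, role)] if pairs[(verb, role)] else 0.0
            backoff = role_args[(role, arg)] / roles[role] if roles[role] else 0.0
            return lam * mle + (1 - lam) * backoff

        def synsem_score(parse_prob, relations, alpha=0.5):
            # SynSem-style integration: interpolate the parser's probability for
            # an analysis with the plausibility of its verb-argument relations.
            sem = 1.0
            for verb, role, arg in relations:
                sem *= plausibility(verb, role, arg)
            return alpha * parse_prob + (1 - alpha) * sem

        print(plausibility("eat", "patient", "apple"))  # -> 0.5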

    D7.4 Third evaluation report. Evaluation of PANACEA v3 and produced resources

    D7.4 reports on the evaluation of the different components integrated in the PANACEA third cycle of development, as well as the final validation of the platform itself. All validation and evaluation experiments follow the evaluation criteria already described in D7.1. The main goal of the WP7 tasks was to test the (technical) functionalities and capabilities of the middleware that allows the integration of the various resource-creation components into an interoperable distributed environment (WP3), and to evaluate the quality of the components developed in WP5 and WP6. The content of this deliverable is thus complementary to D8.2 and D8.3, which address advantages and usability in industrial scenarios. It should be noted that the PANACEA third cycle of development addressed many components that are still under research. The main goal for this evaluation cycle is therefore to assess the methods experimented with and their potential to become actual production tools exploitable outside research labs. For most of the technologies, an attempt was made to re-interpret standard evaluation measures, usually expressed in terms of accuracy, precision and recall, as measures of the reduction of costs (time and human resources) in current practices based on the manual production of resources. To this end, the different tools had to be tuned and adapted to maximize precision, and for some tools we attempted to provide confidence measures that allow the resources still in need of manual revision to be separated out. Furthermore, the extension to languages other than English, also a PANACEA objective, has been evaluated. The main facts about the evaluation results are then summarized.
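
    A minimal sketch of the confidence-based separation mentioned above, with invented entries, scores, and threshold: high-confidence output is accepted directly, the rest is routed to manual revision, and precision is measured on the accepted part.

        # Hypothetical acquired entries: (entry, confidence score, gold correctness).
        entries = [
            ("verb-A:frame-1", 0.95, True), ("verb-B:frame-2", 0.90, True),
            ("verb-C:frame-3", 0.60, False), ("verb-D:frame-4", 0.40, True),
        ]

        threshold = 0.8  # illustrative; real tools are tuned per component
        accepted = [e for e in entries if e[1] >= threshold]
        for_revision = [e for e in entries if e[1] < threshold]

        # Precision on the automatically accepted slice; manual effort is
        # confined to the low-confidence slice, which is the cost-reduction
        # reading of the standard measures.
        precision = sum(ok for _, _, ok in accepted) / len(accepted)
        print(f"accepted: {len(accepted)}, precision: {precision:.2f}")
        print(f"routed to manual revision: {len(for_revision)}")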

    D6.2 Integrated Final Version of the Components for Lexical Acquisition

    The PANACEA project has addressed one of the most critical bottlenecks that threaten the development of technologies to support multilingualism in Europe and to process the huge quantity of multilingual data produced annually. Any attempt at automated language processing, particularly Machine Translation (MT), depends on the availability of language-specific resources. Such Language Resources (LRs) contain information about the language's lexicon, i.e. the words of the language and the characteristics of their use. In Natural Language Processing (NLP), LRs contribute information about the syntactic and semantic behaviour of words, i.e. their grammar and their meaning, which informs downstream applications such as MT. To date, many LRs have been generated by hand, requiring significant manual labour from linguistic experts. However, proceeding manually, it is impossible to supply LRs for every possible pair of European languages, textual domain, and genre needed by MT developers. Moreover, an LR for a given language can never be considered complete or final because of the characteristics of natural language, which continually undergoes change, especially spurred on by the emergence of new knowledge domains and new technologies. PANACEA has addressed this challenge by building a factory of LRs that progressively automates the stages involved in the acquisition, production, updating and maintenance of LRs required by MT systems. The existence of such a factory will significantly cut down the cost, time and human effort required to build LRs. WP6 has addressed the lexical acquisition component of the LR factory, that is, the techniques for automated extraction of key lexical information from texts and the automatic collation of lexical information into LRs in a standardized format. The goal of WP6 has been to take existing techniques capable of acquiring syntactic and semantic information from corpus data, improve upon them, adapt and apply them to multiple languages, and turn them into powerful and flexible techniques capable of supporting massive applications. One focus for improving the scalability and portability of lexical acquisition techniques has been to extend existing techniques with more powerful, less "supervised" methods. In NLP, the amount of supervision refers to the amount of manual annotation which must be applied to a text corpus before machine learning or other techniques are applied to the data to compile a lexicon. More manual annotation means more accurate training data, and thus a more accurate LR. However, given that it is impractical from a cost and time perspective to manually annotate the vast amounts of data required for multilingual MT across domains, it is important to develop techniques which can learn from corpora with less supervision. Less supervised methods are capable of supporting both large-scale acquisition and efficient domain adaptation, even in domains where data is scarce. Another focus of lexical acquisition in PANACEA has been the need of LR users to tune the accuracy level of LRs. Some applications may require increased precision, or accuracy, where the application requires a high degree of confidence in the lexical information used. At other times a greater level of coverage may be required, with information about more words at the expense of some degree of accuracy. Lexical acquisition in PANACEA has therefore investigated confidence thresholds for lexical acquisition, to ensure that the ultimate users of LRs can generate lexical data from the PANACEA factory at the desired level of accuracy.
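
    A minimal sketch of the precision/coverage tuning described above, with invented confidence scores and gold labels: sweeping a confidence threshold trades coverage of the acquired lexicon against its precision.

        # Hypothetical acquisition output: (confidence, correct per gold standard).
        acquired = [
            (0.99, True), (0.95, True), (0.90, True), (0.85, False),
            (0.70, True), (0.60, False), (0.50, True), (0.30, False),
        ]

        for threshold in (0.0, 0.5, 0.8, 0.9):
            kept = [ok for score, ok in acquired if score >= threshold]
            coverage = len(kept) / len(acquired)
            precision = sum(kept) / len(kept) if kept else 0.0
            print(f"threshold={threshold:.1f}  coverage={coverage:.2f}  precision={precision:.2f}")
        # A high threshold yields a small, high-confidence lexicon; a low one
        # yields broad coverage at the expense of some accuracy.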

    German particle verbs: Compositionality at the syntax-semantics interface

    Particle verbs represent a type of multi-word expression composed of a base verb and a particle. The meaning of the particle verb is often, but not always, derived from the meaning of the base verb, sometimes in quite complex ways. In this work, we computationally assess the degree of compositionality of German particle verbs using distributional semantic methods. Our results demonstrate that particle verb compositionality can be predicted with statistical significance. Furthermore, we investigate properties of German particle verbs that are relevant for their compositionality: the particular subcategorization behavior of particle verbs and their corresponding base verbs, and the question of the extent to which the verb particles can be attributed a meaning of their own, which they contribute to the particle verb.
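
    A minimal sketch of a standard distributional approach to this prediction task, with invented co-occurrence vectors: the compositionality of a particle verb is approximated by the cosine similarity between its distributional vector and that of its base verb.

        import numpy as np

        def cosine(u, v):
            return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

        # Invented co-occurrence vectors over a fixed context vocabulary.
        vectors = {
            "essen":    np.array([9.0, 1.0, 0.5, 0.1]),  # 'eat' (base verb)
            "aufessen": np.array([8.0, 1.5, 0.7, 0.2]),  # 'eat up' (compositional)
            "geben":    np.array([0.5, 8.0, 1.0, 0.3]),  # 'give' (base verb)
            "aufgeben": np.array([0.2, 1.0, 0.5, 9.0]),  # 'give up' (opaque)
        }

        for pv, base in [("aufessen", "essen"), ("aufgeben", "geben")]:
            sim = cosine(vectors[pv], vectors[base])
            print(f"{pv} vs. {base}: similarity {sim:.2f}")
        # High similarity suggests a compositional particle verb; low similarity
        # suggests an opaque, lexicalised meaning.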

    A Szintaktikai Lokalitás Minimalista Megközelítése: A szintaktikai lokalitási feltételekért felelős nyelvi alrendszerek munkamegosztásának vizsgálata = A Minimalist Approach to Syntactic Locality: A study of the division of labour of linguistic subsystems underlying syntactic locality effects

    This project studied the division of labour between syntax and its interface subsystems in giving rise to some of the central syntactic locality properties of dependencies such as movement and polarity licensing. Implementing the current Minimalist research program of transformational generative grammar, it explored the radical proposal that natural language syntax itself includes no special syntactic locality conditions per se. Instead, the locality effects under scrutiny are reduced to (i) the elementary properties of the syntactic computational system, including its quest to keep computational complexity to a minimum, which in turn subsumes its cyclic mapping to the interpretive systems of sound and meaning; and (ii) the division of labour between syntax and the interface subsystems, in particular semantics and information structure. The topics investigated include the locality effects involved in noun phrase islands, syntactic head movement, quantifier scope interpretation, focusing, adverbial modification, clausal embedding, weak islands such as presuppositional, negative, and wh-islands, and some apparent intervention effects in polarity licensing. The project established fruitful international co-operations, and its results have appeared in a number of international journal articles and book chapters with prestigious publishers. The project has also yielded a defended DSc dissertation, a completed PhD dissertation, and a further PhD thesis to be submitted later this year.

    D6.1: Technologies and Tools for Lexical Acquisition

    This report describes the technologies and tools to be used for Lexical Acquisition in PANACEA. It includes descriptions of existing technologies and tools which can be built on and improved within PANACEA, as well as of new technologies and tools to be developed and integrated into the PANACEA platform. The report also specifies the Lexical Resources to be produced. Four main areas of lexical acquisition are covered: Subcategorization Frames (SCFs), Selectional Preferences (SPs), Lexical-semantic Classes (LCs) for both nouns and verbs, and Multi-Word Expressions (MWEs).
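
    A minimal sketch of one of the four areas listed above, SCF acquisition, under invented observations and an invented filtering threshold: frames seen with a verb in parsed text are counted, and low-frequency frames are filtered out as likely noise.

        from collections import Counter

        # Hypothetical (verb, frame) observations extracted from parsed text.
        observations = [
            ("give", "NP-NP-NP"), ("give", "NP-NP-NP"), ("give", "NP-NP-NP"),
            ("give", "NP-NP-PP"), ("give", "NP-NP-PP"),
            ("give", "NP"),  # likely a parse error; filtered out below
        ]

        def acquire_scfs(observations, verb, min_rel_freq=0.2):
            # Keep only frames whose relative frequency with the verb clears
            # the threshold, discarding probable parser noise.
            counts = Counter(f for v, f in observations if v == verb)
            total = sum(counts.values())
            return {f: c / total for f, c in counts.items() if c / total >= min_rel_freq}

        print(acquire_scfs(observations, "give"))
        # -> {'NP-NP-NP': 0.5, 'NP-NP-PP': 0.33} (approximately)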

    Proceedings of the Conference on Natural Language Processing 2010

    This book contains state-of-the-art contributions to the 10th Conference on Natural Language Processing, KONVENS 2010 (Konferenz zur Verarbeitung natürlicher Sprache), with a focus on semantic processing. KONVENS in general aims at offering a broad perspective on current research and developments within the interdisciplinary field of natural language processing. The central theme draws specific attention towards addressing linguistic aspects of meaning, covering deep as well as shallow approaches to semantic processing. The contributions address both knowledge-based and data-driven methods for modelling and acquiring semantic information, and discuss the role of semantic information in applications of language technology. The articles demonstrate the importance of semantic processing and present novel and creative approaches to natural language processing in general. Some contributions focus on developing and improving NLP systems for tasks such as Named Entity Recognition or Word Sense Disambiguation, on semantic knowledge acquisition and exploitation with respect to collaboratively built resources, or on harvesting semantic information in virtual games. Others are set within the context of real-world applications, such as Authoring Aids, Text Summarisation and Information Retrieval. The collection highlights the importance of semantic processing for different areas and applications in Natural Language Processing and provides the reader with an overview of current research in this field.