32 research outputs found

    Incremental Coreference Resolution for German

    The main contributions of this thesis are as follows:
    1. We introduce a general model for coreference and explore its application to German.
       • The model features an incremental discourse processing algorithm which allows it to coherently address issues caused by underspecification of mentions, which is an especially pressing problem regarding certain German pronouns.
       • We introduce novel features relevant for the resolution of German pronouns. A subset of these features are made accessible through the incremental architecture of the discourse processing model.
       • In evaluation, we show that the coreference model combined with our features provides new state-of-the-art results for coreference and pronoun resolution for German.
    2. We elaborate on the evaluation of coreference and pronoun resolution.
       • We discuss evaluation from the view of prospective downstream applications that benefit from coreference resolution as a preprocessing component. Addressing the shortcomings of the general evaluation framework in this regard, we introduce an alternative framework, the Application Related Coreference Scores (ARCS).
       • The ARCS framework enables a thorough comparison of different system outputs and the quantification of their similarities and differences beyond the common coreference evaluation. We demonstrate how the framework is applied to state-of-the-art coreference systems. This provides a method to track specific differences in system outputs, which assists researchers in comparing their approaches to related work in detail.
    3. We explore semantics for pronoun resolution.
       • Within the introduced coreference model, we explore distributional approaches to estimate the compatibility of an antecedent candidate and the occurrence context of a pronoun. We compare a state-of-the-art approach for word embeddings to syntactic co-occurrence profiles to this end.
       • In comparison to related work, we extend the notion of context and thereby increase the applicability of our approach. We find that a combination of both compatibility models, coupled with the coreference model, provides a large potential for improving pronoun resolution performance.
    We make available all our resources, including a web demo of the system, at: http://pub.cl.uzh.ch/purl/coreference-resolutio

    Can human association norm evaluate latent semantic analysis?

    This paper compares a word association norm created in a psycholinguistic experiment with association lists generated by algorithms operating on text corpora. We compare lists generated by the Church and Hanks algorithm with lists generated by the LSA algorithm. We present an argument on how these automatically generated lists reflect real semantic relations.
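For orientation, the Church and Hanks association ratio is a pointwise mutual information score over windowed co-occurrence counts. The following is a minimal sketch of that idea, not the paper's implementation: the function names, the toy corpus, the window size, and the simplified probability estimates (count divided by corpus length, without a window-size correction) are all assumptions made for illustration.

```python
import math
from collections import Counter


def association_ratios(tokens, window=5):
    """Score ordered pairs (x, y) where y follows x within `window` tokens.

    Returns {(x, y): log2(P(x, y) / (P(x) * P(y)))}, with every probability
    estimated as count / len(tokens) (a simplifying assumption).
    """
    n = len(tokens)
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, x in enumerate(tokens):
        # Count every token that follows x inside the window.
        for y in tokens[i + 1 : i + 1 + window]:
            pair_counts[(x, y)] += 1
    return {
        (x, y): math.log2(f_xy * n / (word_counts[x] * word_counts[y]))
        for (x, y), f_xy in pair_counts.items()
    }


def top_associates(tokens, cue, k=3, window=5):
    """Return up to k words most strongly associated with `cue`."""
    scores = association_ratios(tokens, window)
    ranked = sorted(
        ((score, y) for (x, y), score in scores.items() if x == cue),
        reverse=True,
    )
    return [y for _, y in ranked[:k]]
```

A list produced by `top_associates` for a cue word could then be set side by side with the responses human participants gave for the same cue, which is the kind of comparison the abstract describes.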

    A definite clause grammatical inversion of extended Montague semantics


    A common semantic space for monolingual and cross-lingual meta-embeddings

    This master’s thesis presents a new technique for creating monolingual and cross-lingual meta-embeddings. Our method integrates multiple word embeddings created from complementary techniques, textual sources, knowledge bases and languages. Existing word vectors are projected to a common semantic space using linear transformations and averaging. With our method, the resulting meta-embeddings maintain the dimensionality of the original embeddings without losing information, while also dealing with the out-of-vocabulary (OOV) problem. Furthermore, empirical evaluation demonstrates the effectiveness of our technique with respect to previous work on various intrinsic and extrinsic multilingual evaluations.
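The projection-and-averaging idea can be illustrated with a small sketch. The abstract does not say which linear transformation the thesis uses, so the orthogonal Procrustes mapping below is a swapped-in assumption, as are the function names, the dict-of-vectors input format, and the requirement that both embedding sets share one dimensionality. NumPy is assumed to be installed.

```python
import numpy as np


def procrustes_map(src, tgt):
    """Orthogonal W minimizing ||src @ W - tgt||_F; rows are aligned words."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def meta_embeddings(emb_a, emb_b):
    """Average two embedding sets after mapping space B into space A.

    emb_a, emb_b: dict mapping word -> 1-D np.ndarray (same dimensionality,
    an assumption of this sketch). Words found in only one set keep their
    single (projected) vector, which is how out-of-vocabulary words are covered.
    """
    shared = sorted(set(emb_a) & set(emb_b))
    a_rows = np.stack([emb_a[w] for w in shared])
    b_rows = np.stack([emb_b[w] for w in shared])
    w_map = procrustes_map(b_rows, a_rows)  # learned on the shared vocabulary

    meta = {}
    for word in set(emb_a) | set(emb_b):
        vectors = []
        if word in emb_a:
            vectors.append(emb_a[word])
        if word in emb_b:
            vectors.append(emb_b[word] @ w_map)
        meta[word] = np.mean(vectors, axis=0)
    return meta
```

Because the result is an average of vectors in the target space, the meta-embedding has the same dimensionality as the inputs, and its vocabulary is the union of both source vocabularies.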

    A grammar of the Pendau language of central Sulawesi, Indonesia


    Computer-aided biomimetics : semi-open relation extraction from scientific biological texts

    Engineering inspired by biology – recently termed biom* – has led to various ground-breaking technological developments. Example areas of application include aerospace engineering and robotics. However, biom* is not always successful and only sporadically applied in industry. The reason is that a systematic approach to biom* remains at large, despite the existence of a plethora of methods and design tools. In recent years computational tools have been proposed as well, which can potentially support a systematic integration of relevant biological knowledge during biom*. However, these so-called Computer-Aided Biom* (CAB) tools have not been able to fill all the gaps in the biom* process. This thesis investigates why existing CAB tools fail, proposes a novel approach – based on Information Extraction – and develops a proof-of-concept for a CAB tool that does enable a systematic approach to biom*. Key contributions include: 1) a disquisition of existing tools guides the selection of a strategy for systematic CAB, 2) a dataset of 1,500 manually-annotated sentences, 3) a novel Information Extraction approach that combines the outputs from a supervised Relation Extraction system and an existing Open Information Extraction system. The implemented exploratory approach indicates that it is possible to extract a focused selection of relations from scientific texts with reasonable accuracy, without imposing limitations on the types of information extracted. Furthermore, the tool developed in this thesis is shown to i) speed up a trade-off analysis by domain-experts, and ii) also improve the access to biology information for non-experts

    Computer-Aided Biomimetics : Semi-Open Relation Extraction from scientific biological texts

    Engineering inspired by biology – recently termed biom* – has led to various groundbreaking technological developments. Example areas of application include aerospace engineering and robotics. However, biom* is not always successful and only sporadically applied in industry. The reason is that a systematic approach to biom* remains at large, despite the existence of a plethora of methods and design tools. In recent years computational tools have been proposed as well, which can potentially support a systematic integration of relevant biological knowledge during biom*. However, these so-called Computer-Aided Biom* (CAB) tools have not been able to fill all the gaps in the biom* process. This thesis investigates why existing CAB tools fail, proposes a novel approach – based on Information Extraction – and develops a proof-of-concept for a CAB tool that does enable a systematic approach to biom*. Key contributions include: 1) a disquisition of existing tools guides the selection of a strategy for systematic CAB, 2) a dataset of 1,500 manually-annotated sentences, 3) a novel Information Extraction approach that combines the outputs from a supervised Relation Extraction system and an existing Open Information Extraction system. The implemented exploratory approach indicates that it is possible to extract a focused selection of relations from scientific texts with reasonable accuracy, without imposing limitations on the types of information extracted. Furthermore, the tool developed in this thesis is shown to i) speed up a trade-off analysis by domain-experts, and ii) also improve the access to biology information for nonexperts

    A grammar of the Pendau language

    No full text
    This dissertation is a basic description of Pendau, a previously undescribed Western Malayo-Polynesian language in the Tomini-Tolitoli group found in Central Sulawesi, Indonesia. This description relies heavily on natural language data for its documentation. Most of the description covers concerns in the typological functional framework which also provides a means to organize the data. Chapter 1 gives a brief ethnographic background and introduces the linguistic context and background to Pendau. Very little had been known about Pendau until this current research. Chapter 2 describes the phonetics and the basic phonology of Pendau in essentially a structuralist framework. However it includes acoustic analyses of stress (non-phonemic pitch-accent with low to high tone), vowel formants, and the glottal stop (which is often manifested as creaky voice). Chapter 3 builds on chapter 2 by examining the phonology in a generative framework and looks at the phonological processes via lexical phonology. The most outstanding feature of phonology in Pendau is the extensive use of vowel harmony in many prefixes. Chapter 4 discusses the morphology of Pendau and the complicated stem forming morphology. At the morphological level I take a non-morphemic view and integrate a kind of word and paradigm approach in connection with lexical phonology and autosegmental phonology. Word classes are introduced in chapter 5, and chapter 6 introduces basic clausal syntax and includes a discussion of grammatical relations. Both of these chapters are fundamental to understanding the description in later parts of the dissertation. Chapter 7 discusses nominal phrases. Chapter 8 discusses prepositional and instrumental phrases. Chapter 9 describes the seven canonical verb classes and miscellaneous verb morphology. Chapter 10 describes transitivity altering operations which include causatives, applicatives, reciprocals, and a special equative construction. 
    Chapter 11 describes directional verbs and their use as directional serial verbs. Chapter 12 describes the importance of voice and introduces the use of inverse voice which contrasts with the other transitive voice construction, active voice. Tense, aspect, and mode are described in chapter 13. Auxiliaries, adverbs, and negation are described in chapter 14. Chapter 15 describes clause combinations and complex sentences, and includes comparatives, complementation, quotation margin formulas, relative clauses, interclausal relators and propositional relations, and discourse connectors. Chapter 16 describes the use of imperatives and interrogatives. Finally, at the discourse level I integrate several discourse methods with the strongest emphases coming from Longacre and Givon. Chapter 17 describes some discourse features of cohesion and prominence. This includes fronting, left-dislocation, repetition, and topic continuity. In chapter 18 I follow Longacre's approach to discourse analysis and describe structures of different genres in Pendau. Three interlinearized texts are included in the appendices. The other appendices provide supporting data, figures, tables and charts.