3,944 research outputs found

    Exploring Metaphorical Senses and Word Representations for Identifying Metonyms

    A metonym is a word with a figurative meaning, similar to a metaphor. Because metonyms are closely related to metaphors, we apply features that are used successfully for metaphor recognition to the task of detecting metonyms. On the ACL SemEval 2007 Task 8 data with gold standard metonym annotations, our system achieved 86.45% accuracy on the location metonyms. Our code can be found on GitHub.

    Towards a balanced named entity corpus for Dutch

    German Perception Verbs: Automatic Classification of Prototypical and Multiple Non-literal Meanings

    This paper presents a token-based automatic classification of German perception verbs into literal vs. multiple non-literal senses. Based on a corpus-based dataset of German perception verbs and their systematic meaning shifts, we identify one verb from each of the four perception classes (optical, acoustic, olfactory, haptic) and use Decision Trees relying on syntactic and semantic corpus-based features to classify the verb uses into 3-4 senses each. Our classifier reaches accuracies between 45.5% and 69.4%, in comparison to baselines between 27.5% and 39.0%. In three out of the four cases analyzed, our classifier's accuracy is significantly higher than the corresponding baseline.
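The comparison of classifier accuracy against a baseline can be sketched as follows. This is a minimal illustration only: the abstract does not say how the baselines were computed, so the most-frequent-sense baseline used here is an assumption, and the sense labels and predictions are hypothetical rather than taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of tokens whose predicted sense matches the gold sense."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def most_frequent_baseline(gold: Sequence[str]) -> List[str]:
    """Assumed baseline: always predict the most frequent gold sense."""
    label, _ = Counter(gold).most_common(1)[0]
    return [label] * len(gold)


# Hypothetical sense labels for five uses of one verb (illustration only).
gold = ["literal", "literal", "cognition", "literal", "other"]
predicted = ["literal", "cognition", "cognition", "literal", "other"]

classifier_acc = accuracy(gold, predicted)
baseline_acc = accuracy(gold, most_frequent_baseline(gold))
print(f"classifier: {classifier_acc:.1%}, baseline: {baseline_acc:.1%}")
```

Whether a gap of this kind is significant would still need a statistical test, which this sketch does not cover.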

    A Rose is a Rose is a Rose

    A pragmatic guide to geoparsing evaluation

    Empirical methods in geoparsing have thus far lacked a standard evaluation framework describing the task, metrics and data used to compare state-of-the-art systems. Evaluation is further made inconsistent, even unrepresentative of real-world usage, by the lack of distinction between the different types of toponyms, which necessitates new guidelines, a consolidation of metrics and a detailed toponym taxonomy with implications for Named Entity Recognition (NER) and beyond. To address these deficiencies, our manuscript introduces a new framework in three parts. (Part 1) Task Definition: clarified via corpus linguistic analysis proposing a fine-grained Pragmatic Taxonomy of Toponyms. (Part 2) Metrics: discussed and reviewed for a rigorous evaluation including recommendations for NER/Geoparsing practitioners. (Part 3) Evaluation data: shared via a new dataset called GeoWebNews to provide test/train examples and enable immediate use of our contributions. In addition to fine-grained Geotagging and Toponym Resolution (Geocoding), this dataset is also suitable for prototyping and evaluating machine learning NLP models.

    An investigation into figurative language in the 'LOLITA' NLP system

    The classical and folk theory view on metaphor and figurative language assumes that metaphor is a rare occurrence, restricted to the realms of poetry and rhetoric. Recent results have, however, unarguably shown that figurative language of various complexity exhibits great systematicity and is pervasive in everyday language and texts. If the ubiquity of figurative language cannot be disputed, then any natural language processing (NLP) system aiming at processing text beyond a restricted scope has to be able to deal with figurative language. This is particularly true if the processing is to be based on deep techniques, where a deep analysis of the input is performed. The LOLITA NLP system employs deep techniques and, therefore, must be capable of dealing with figurative input. The task of natural language (NL) generation is affected by the naturalness of figurative language, too: if metaphors are frequent and natural, NL generation that cannot handle figurative language will seem restricted and its output unnatural. This thesis describes the work undertaken to examine the options for extending the LOLITA system in the direction of figurative language processing and the results of this project. The work critically examines previous approaches and their contribution to the field, before outlining a solution which follows the principles of natural language engineering.

    Fine-grained Dutch named entity recognition

    This paper describes the creation of a fine-grained named entity annotation scheme and corpus for Dutch, and experiments on automatic main type and subtype named entity recognition. We give an overview of existing named entity annotation schemes, and motivate our own, which describes six main types (persons, organizations, locations, products, events and miscellaneous named entities) and finer-grained information on subtypes and metonymic usage. This was applied to a one-million-word subset of the Dutch SoNaR reference corpus. The classifier for main type named entities achieves a micro-averaged F-score of 84.91%, and is publicly available, along with the corpus and annotations.
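A micro-averaged F-score pools the counts of all entity types before computing precision and recall, so frequent types weigh more than rare ones. The sketch below shows the standard calculation under that definition. The function name, the type labels and the counts are hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple


def micro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Micro-averaged F1 from per-type (true pos., false pos., false neg.) counts."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    if tp == 0:
        return 0.0  # no correct predictions, so F1 is defined as 0 here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts per entity type, for illustration only.
example_counts = {"PER": (8, 2, 2), "LOC": (4, 1, 3)}
score = micro_f1(example_counts)
print(f"micro-averaged F1: {score:.2%}")
```

A macro-averaged score would instead compute F1 per type and take the unweighted mean, which treats rare types as equally important.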

    Taxonomy for Humans or Computers? Cognitive Pragmatics for Big Data

    Criticism of big data has focused on showing that more is not necessarily better, in the sense that data may lose their value when taken out of context and aggregated together. The next step is to incorporate an awareness of pitfalls for aggregation into the design of data infrastructure and institutions. A common strategy minimizes aggregation errors by increasing the precision of our conventions for identifying and classifying data. As a counterpoint, we argue that there are pragmatic trade-offs between precision and ambiguity that are key to designing effective solutions for generating big data about biodiversity. We focus on the importance of theory-dependence as a source of ambiguity in taxonomic nomenclature and hence a persistent challenge for implementing a single, long-term solution to storing and accessing meaningful sets of biological specimens. We argue that ambiguity does have a positive role to play in scientific progress as a tool for efficiently symbolizing multiple aspects of taxa and mediating between conflicting hypotheses about their nature. Pursuing a deeper understanding of the trade-offs and synthesis of precision and ambiguity as virtues of scientific language and communication systems then offers a productive next step for realizing sound, big biodiversity data services.

    Generation of a Phaffia rhodozyma strain with enhanced astaxanthin synthesis via targeted genetic modification of chemically mutagenized strains

    The aim of this work was to produce, for the first time, a Phaffia strain whose astaxanthin synthesis is enhanced beyond the level achieved by mutagenesis, by combining chemical mutagenesis with targeted genetic modification (here: "metabolic engineering"). The chemical mutants provided by DSM Nutritional Products were analyzed and optimized for pigment stability and growth through a selection process, because the strains from cryopreserved stock cultures showed strong pigment instability and delayed growth. In an exploratory phase, carotenoid synthesis was analyzed, and it was found that no individual reactions are affected in the mutants that would be responsible for the upregulation of carotenoid synthesis in the mutants. In the process, limitations were identified and removed by transformation with expression plasmids carrying suitable genes, in order to achieve an even more efficient conversion of astaxanthin precursors to astaxanthin. Overexpression of the phytoene synthase/lycopene cyclase crtYB resulted in an increased carotenoid content with an unchanged astaxanthin proportion. A second transformation with an expression cassette for the astaxanthin synthase asy further increased the carotenoid content and also removed a limitation in the metabolism of astaxanthin precursors, so that the transformant was able to convert nearly all intermediates of astaxanthin synthesis to astaxanthin (Gassel et al. 2013). It was shown that limitations known from experiments with the wild type could also be identified and compensated for in the mutants.