Exploring Metaphorical Senses and Word Representations for Identifying Metonyms
A metonym is a word with a figurative meaning, similar to a metaphor. Because
metonyms are closely related to metaphors, we apply features that are used
successfully for metaphor recognition to the task of detecting metonyms. On the
ACL SemEval 2007 Task 8 data with gold standard metonym annotations, our system
achieved 86.45% accuracy on the location metonyms. Our code can be found on
GitHub.
Where are you talking about? Advances and Challenges of Geographic Analysis of Text with Application to Disease Monitoring
The Natural Language Processing task we focus on in this thesis is Geoparsing. Geoparsing is the process of extraction and grounding of toponyms (place names). Consider this sentence: "The victims of the Spanish earthquake off the coast of Malaga were of American and Mexican origin." Four toponyms will be extracted (called Geotagging) and grounded to their geographic coordinates (called Toponym Resolution). However, our research goes further than any previous work by showing how to distinguish the literal place(s) of the event (Spain, Malaga) from other linguistic types/uses such as nationalities (Mexican, American), improving downstream task accuracy. We consolidate and extend the Standard Evaluation Framework, discuss key research problems, then present concrete solutions in order to advance each stage of geoparsing. For geotagging, as well as training a SOTA neural Location-NER tagger, we simplify Metonymy Resolution with a novel minimalist feature extraction combined with an LSTM-based classifier, matching SOTA results. For toponym resolution, we deploy the latest deep learning methods to achieve SOTA performance by augmenting neural models with hitherto unused geographic features called Map Vectors. With each research project, we provide high-quality datasets and system prototypes, further building resources in this field. We then show how these geoparsing advances coupled with our proposed Intra-Document Analysis can be used to associate news articles with locations in order to monitor the spread of public health threats. To this end, we evaluate our research contributions with production data from a real-time downstream application to improve geolocation of news events for disease monitoring. 
The data was made available to us by the Joint Research Centre (JRC), which operates one such system called MediSys that processes incoming news articles in order to monitor threats to public health and make these available to a variety of governmental, business and non-profit organisations. We also discuss steps towards an end-to-end, automated news monitoring system and make actionable recommendations for future work. In summary, the thesis aims are twofold: (1) Generate original geoparsing research aimed at advancing each stage of the pipeline by addressing pertinent challenges with concrete solutions and actionable proposals. (2) Demonstrate how this research can be applied to news event monitoring to increase the efficacy of existing biosurveillance systems, e.g. the European Commission’s MediSys. I was generously funded by DREAM CDT, which was funded by NERC of UKRI.
German Perception Verbs: Automatic Classification of Prototypical and Multiple Non-literal Meanings
This paper presents a token-based automatic classification of German perception verbs into literal vs. multiple non-literal senses. Based on a corpus-based dataset of German perception verbs and their systematic meaning shifts, we identify one verb from each of the four perception classes (optical, acoustic, olfactory, haptic) and use Decision Trees relying on syntactic and semantic corpus-based features to classify the verb uses into 3-4 senses each. Our classifier reaches accuracies between 45.5% and 69.4%, in comparison to baselines between 27.5% and 39.0%. In three out of the four cases analyzed, our classifier’s accuracy is significantly higher than the corresponding baseline.
A pragmatic guide to geoparsing evaluation
Abstract: Empirical methods in geoparsing have thus far lacked a standard evaluation framework describing the task, metrics and data used to compare state-of-the-art systems. Evaluation is further made inconsistent, even unrepresentative of real-world usage, by the lack of distinction between the different types of toponyms, which necessitates new guidelines, a consolidation of metrics and a detailed toponym taxonomy with implications for Named Entity Recognition (NER) and beyond. To address these deficiencies, our manuscript introduces a new framework in three parts. (Part 1) Task Definition: clarified via corpus linguistic analysis proposing a fine-grained Pragmatic Taxonomy of Toponyms. (Part 2) Metrics: discussed and reviewed for a rigorous evaluation including recommendations for NER/Geoparsing practitioners. (Part 3) Evaluation data: shared via a new dataset called GeoWebNews to provide test/train examples and enable immediate use of our contributions. In addition to fine-grained Geotagging and Toponym Resolution (Geocoding), this dataset is also suitable for prototyping and evaluating machine learning NLP models.
An investigation into figurative language in the ‘LOLITA’ NLP system
The classical and folk theory view on metaphor and figurative language assumes that metaphor is a rare occurrence, restricted to the realms of poetry and rhetoric. Recent results have, however, unarguably shown that figurative language of various complexity exhibits great systematicity and is pervasive in everyday language and texts. If the ubiquity of figurative language cannot be disputed, then any natural language processing (NLP) system aiming at processing text beyond a restricted scope has to be able to deal with figurative language. This is particularly true if the processing is to be based on deep techniques, where a deep analysis of the input is performed. The LOLITA NLP system employs deep techniques and, therefore, must be capable of dealing with figurative input. The task of natural language (NL) generation is affected by the naturalness of figurative language, too: if metaphors are frequent and natural, NL generation that is not capable of handling figurative language will seem restricted and its output unnatural. This thesis describes the work undertaken to examine the options for extending the LOLITA system in the direction of figurative language processing and the results of this project. The work critically examines previous approaches and their contribution to the field, before outlining a solution which follows the principles of natural language engineering.
Fine-grained Dutch named entity recognition
This paper describes the creation of a fine-grained named entity annotation scheme and corpus for Dutch, and experiments on automatic main type and subtype named entity recognition. We give an overview of existing named entity annotation schemes, and motivate our own, which describes six main types (persons, organizations, locations, products, events and miscellaneous named entities) and finer-grained information on subtypes and metonymic usage. This scheme was applied to a one-million-word subset of the Dutch SoNaR reference corpus. The classifier for main type named entities achieves a micro-averaged F-score of 84.91%, and is publicly available, along with the corpus and annotations.
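For readers unfamiliar with the metric, the following is a minimal sketch of how a micro-averaged F-score is commonly computed for entity recognition: true positives, false positives and false negatives are pooled across all entity types before precision and recall are taken. The function name `micro_f1`, the span representation and the exact-match criterion are assumptions made for illustration; the abstract does not state how the authors matched predictions to gold annotations.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over entity types.

    gold, pred: dicts mapping an entity type to a set of (start, end) spans.
    Assumption: a prediction counts as correct only on an exact span match.
    """
    tp = fp = fn = 0
    for label in set(gold) | set(pred):
        g = gold.get(label, set())
        p = pred.get(label, set())
        tp += len(g & p)   # spans found in both gold and prediction
        fp += len(p - g)   # predicted spans absent from gold
        fn += len(g - p)   # gold spans the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy, made-up annotations (not data from the paper).
gold = {"LOC": {(0, 1), (5, 6)}, "PER": {(2, 3)}}
pred = {"LOC": {(0, 1)}, "PER": {(2, 3), (8, 9)}}
score = micro_f1(gold, pred)
```

Because the counts are pooled, frequent entity types weigh more heavily in the result than rare ones, which is the main difference from a macro-averaged score.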
Taxonomy for Humans or Computers? Cognitive Pragmatics for Big Data
Criticism of big data has focused on showing that more is not necessarily better, in the sense that data may lose their value when taken out of context and aggregated together. The next step is to incorporate an awareness of pitfalls for aggregation into the design of data infrastructure and institutions. A common strategy minimizes aggregation errors by increasing the precision of our conventions for identifying and classifying data. As a counterpoint, we argue that there are pragmatic trade-offs between precision and ambiguity that are key to designing effective solutions for generating big data about biodiversity. We focus on the importance of theory-dependence as a source of ambiguity in taxonomic nomenclature and hence a persistent challenge for implementing a single, long-term solution to storing and accessing meaningful sets of biological specimens. We argue that ambiguity does have a positive role to play in scientific progress as a tool for efficiently symbolizing multiple aspects of taxa and mediating between conflicting hypotheses about their nature. Pursuing a deeper understanding of the trade-offs and synthesis of precision and ambiguity as virtues of scientific language and communication systems then offers a productive next step for realizing sound, big biodiversity data services
Generation of a Phaffia rhodozyma strain with enhanced astaxanthin synthesis via targeted genetic modification of chemically mutagenized strains
The aim of this work was to generate, for the first time, a Phaffia strain through a combination of chemical mutagenesis and targeted genetic modification (here: "metabolic engineering") whose astaxanthin synthesis is enhanced beyond the level achieved by mutagenesis alone.
The chemical mutants provided by "DSM Nutritional Products" were analyzed and optimized for pigment stability and growth through a selection process, because the strains recovered from cryopreserved stock cultures showed strong pigment instability and delayed growth.
In an exploratory phase, carotenoid synthesis was analyzed, and it was found that no individual reactions are affected in the mutants that would account for the upregulation of carotenoid synthesis in them. Limitations were identified and then relieved by transformation with expression plasmids carrying suitable genes, in order to achieve an even more efficient conversion of astaxanthin precursors into astaxanthin. Overexpression of the phytoene synthase/lycopene cyclase crtYB resulted in an increased carotenoid content with an unchanged proportion of astaxanthin. A second transformation with an expression cassette for the astaxanthin synthase asy further increased the carotenoid content and additionally removed a limitation in the metabolism of astaxanthin precursors, so that the transformant was able to convert nearly all intermediates of astaxanthin synthesis into astaxanthin (Gassel et al. 2013). It was shown that limitations known from experiments with the wild type could also be identified and compensated for in the mutants.