564 research outputs found

    Real-Time Event Analysis and Spatial Information Extraction From Text Using Social Media Data

    Since the advent of websites that enable users to participate and interact with each other by sharing content in different forms, a plethora of possibly relevant information is at scientists' fingertips. Consequently, this thesis elaborates on two distinct approaches to extract valuable information from social media data and sketches out the potential joint use case in the domain of natural disasters.

    Developing a knowledge base for preposition sense disambiguation: A view from Role and Reference Grammar and FunGramKB

    Prepositions are a frequently used grammatical category in many European languages. The way their semantics combines with other lexical categories usually makes them hard to handle computationally. Within natural language processing, some studies have advanced the treatment of prepositions. However, there is still a need for a model that tackles the problems arising from the disambiguation of prepositional semantics. The goal of this paper is to describe a lexico-conceptual model which can store the knowledge required to disambiguate predicate prepositions, and to show how this model can be exploited by a parser to extract the semantic representation of a text. The theoretical foundation of this approach, which is grounded on the premises of Role and Reference Grammar and FunGramKB, is illustrated with temporal adjuncts expressed by prepositional phrases in English. Financial support for this research has been provided by the DGI, Spanish Ministry of Education and Science, grant FFI2011-29798-C02-01. Moreover, much of this work has resulted from the first author's ongoing PhD thesis "La desambiguacion semantica de los sintagmas prepositivos como adjuntos perifericos en el marco de la Gramatica del Papel y la Referencia: un enfoque desde la linguistica computacional y la ingenieria del conocimiento", to be presented at the Universidad Nacional de Educacion a Distancia (UNED). Hernández-Pastor, D.; Periñán Pascual, JC. (2016). Developing a knowledge base for preposition sense disambiguation: A view from Role and Reference Grammar and FunGramKB. Onomázein : Revista de Lingüística, Filología y Traducción. 33:251-288. https://doi.org/10.7764/onomazein.33.16S2512883

    Transfer in the Verbmobil demonstrator

    Context-sensitive interpretation of natural language location descriptions : a thesis submitted in partial fulfilment of the requirements for the award of Doctor of Philosophy in Information Technology at Massey University, Auckland, New Zealand

    People frequently describe the locations of objects using natural language. Location descriptions may be either structured, such as 26 Victoria Street, Auckland, or unstructured. Relative location descriptions (e.g., building near Sky Tower) are a common form of unstructured location description, and use qualitative terms to describe the location of one object relative to another (e.g., near, close to, in, next to). Understanding the meaning of these terms is easy for humans, but much more difficult for machines since the terms are inherently vague and context sensitive. In this thesis, we study the semantics (or meaning) of qualitative, geospatial relation terms, specifically geospatial prepositions. Prepositions are one of the most common forms of geospatial relation term, and they are commonly used to describe the location of objects in the geographic (geospatial) environment, such as rivers, mountains, buildings, and towns. A thorough understanding of the semantics of geospatial relation terms is important because it enables more accurate automated georeferencing of text location descriptions than use of place names only. Location descriptions that use geospatial prepositions are found in social media, web sites, blogs, and academic reports, and georeferencing can allow mapping of health, disaster and biological data that is currently inaccessible to the public. Such descriptions have unstructured format, so, their analysis is not straightforward. The specific research questions that we address are: RQ1. Which geospatial prepositions (or groups of prepositions) and senses are semantically similar? RQ2. Is the role of context important in the interpretation of location descriptions? RQ3. Is the object distance associated with geospatial prepositions across a range of geospatial scenes and scales accurately predictable using machine learning methods? RQ4. Is human annotation a reliable form of annotation for the analysis of location descriptions? 
To address RQ1, we determine the nature and degree of similarity among geospatial prepositions by analysing data collected with a human subjects experiment, using clustering, extensional mapping and t-distributed stochastic neighbour embedding (t-SNE) plots to form a semantic similarity matrix. In addition to calculating similarity scores among prepositions, we identify the senses of three groups of geospatial prepositions using Venn diagrams, t-SNE plots and density-based clustering, and define the relationships between the senses. Furthermore, we use two text mining approaches to identify the degree of similarity among geospatial prepositions: bag of words and GloVe embeddings. By using these methods and further analysis, we identify semantically similar groups of geospatial prepositions, including: (1) beside, close to, near, next to, outside and adjacent to; (2) across, over and through; and (3) beyond, past, by and off. The prepositions within these groups also share senses. Through is recognised as a specialisation of both across and over. Proximity and adjacency prepositions also have similar senses that express orientation and overlapping relations. Past, off and by share a proximal sense, but beyond has a different sense from these, representing "on the other side". Another finding is that close to is used more frequently than near for pairs of linear objects, whereas near is used more frequently for non-linear ones. Also, next to is used to describe proximity more than touching (in contrast to other prepositions like adjacent to). Our application of text mining to identify semantically similar prepositions confirms that a geospatial corpus (NCGL) provides a better representation of the semantics of geospatial prepositions than a general corpus. We also found that GloVe embeddings provide adequate semantic similarity measures for more specialised geospatial prepositions, but less so for those that have more generalised applications and multiple senses.
We explore the role of context (RQ2) by studying three sites in London that vary in size, nature, and context: Trafalgar Square, Buckingham Palace, and Hyde Park. We use the Google search engine to extract location descriptions that contain these three sites with 9 different geospatial prepositions (in, on, at, next to, close to, adjacent to, near, beside, outside) and calculate their acceptance profiles (the profile of the use of a preposition at different distances from the reference object) and acceptance thresholds (the maximum distance from a reference object at which a preposition can acceptably be used). We use these to compare prepositions, and to explore the influence of different contexts. Our results show that near, in and outside are used for larger distances, while beside, adjacent to and at are used for smaller distances. Also, the acceptance threshold for close to is higher than for other proximity/adjacency prepositions such as next to, adjacent to and beside. The acceptance threshold of next to is larger than that of adjacent to, which confirms the findings in Chapter 2, where next to is identified as describing a proximity rather than a touching spatial relation. We also found that relatum characteristics such as image schema affect the use of prepositions such as in, on and at. We address RQ3 by developing a machine learning regression model (using the SMOReg algorithm) to predict the distance associated with the use of geospatial prepositions in specific expressions. We incorporate a wide range of input variables, including the similarity matrix of geospatial prepositions (RQ1); preposition senses; semantic information in the form of embeddings; characteristics of the located and reference objects in the expression, including their liquidity/solidity, scale and geometry type; and contextual factors such as the density of features of different types in the surrounding area.
We evaluate the model on two different datasets, reporting a 25% improvement over the best baseline. Finally, we consider the importance of annotation of geospatial location descriptions (RQ4). As annotated data is essential for the successful study of automated interpretation of natural language descriptions, we study the impact and accuracy of human annotation on different geospatial elements. Agreement scores show that human annotators can annotate geospatial relation terms (e.g., geospatial prepositions) with higher agreement than other geospatial elements. This thesis advances understanding of the semantics of geospatial prepositions, particularly considering their semantic similarity and the impact of context on their interpretation. We quantify the semantic similarity of a set of 24 geospatial prepositions; identify senses and the relationships among them for 13 geospatial prepositions; compare the acceptance thresholds of 9 geospatial prepositions and describe the influence of context on them; and demonstrate that richer semantic and contextual information can be incorporated in predictive models to interpret relative geospatial location descriptions more accurately.
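The acceptance profile and acceptance threshold defined in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the fixed-width distance bins, and the sample observations are all hypothetical, and taking the largest observed distance as the threshold is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical (preposition, distance in metres) observations; made-up values.
observations = [
    ("near", 120.0), ("near", 850.0), ("near", 40.0),
    ("beside", 15.0), ("beside", 30.0),
]

def acceptance_profile(obs, bin_width):
    """For each preposition, the share of its uses falling in each distance bin.

    Keys of the inner dict are the lower edge of each bin.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for prep, dist in obs:
        counts[prep][int(dist // bin_width)] += 1
    profile = {}
    for prep, bins in counts.items():
        total = sum(bins.values())
        profile[prep] = {b * bin_width: n / total for b, n in sorted(bins.items())}
    return profile

def acceptance_threshold(obs):
    """Largest distance at which each preposition was observed (assumed proxy)."""
    thresholds = {}
    for prep, dist in obs:
        thresholds[prep] = max(thresholds.get(prep, 0.0), dist)
    return thresholds

profiles = acceptance_profile(observations, 100)
thresholds = acceptance_threshold(observations)
```

With these made-up observations, near receives a larger threshold than beside, which mirrors the direction of the result reported in the abstract without reproducing its data.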

    Prepositional clusters: investigative-oriented learning and English language teaching

    This thesis attempts to do three things. Firstly, it attempts to bring a new contribution to knowledge about fixed idiomatic expressions in English, by demonstrating that the overall meaning of such expressions need not always be conveyed by the presence of lexical words. Linguistic observation of natural and authentic language use has shown the existence of fixed idiomatic expressions consisting solely of grammatical words and possessing a particular overall meaning. Some of these expressions commonly found in everyday informal communication (written and spoken) are "this and/or that", "either or", "round and round", "ups and downs", "on and off", etc. Secondly, the thesis will seek to illustrate through descriptive analysis that fixed expressions consisting solely of grammatical words can be called "units of meaning", using Sinclair's (1991a) position regarding form and meaning. Thus, a part of the thesis will be devoted to investigating the lexico-grammatical behaviour of such expressions. The analysis focuses solely on prepositional clusters, whose frequent usage in informal spoken and written communication makes them suitable for investigation. These prepositional clusters are composed of prepositions or words that can function as prepositions, and are formed as a result of the common syntactic patterns in which they occur. Besides analysing cluster patterns that are composed solely of prepositions or words that can function as prepositions, other clusters which are composed of prepositions with adjectives/adverbs and nouns are included in the investigation, for purposes of comparison. Hence, the prepositional cluster patterns analysed in this study are: a) Prep+and+Prep (e.g. ins and outs, up and down), b) Prep+Prep (e.g. roundabout, upside down, inside out), c) Prep+Adv/Adj (e.g. at most, at least), d) Adj/Adv+Prep (e.g. excited about, worried about, angry about), e) Noun+Prep (e.g. reason for, request for, excuse for) and f) Prep+Noun (e.g. by mistake, by chance, by coincidence). In examining the lexico-grammatical behaviour of prepositional cluster patterns, I have applied linguistic principles from both Corpus Analysis and Cognitive Semantics. This approach, which combines two fields of linguistics, lends more depth to the analysis. While principles of Corpus Analysis are useful in determining common meaning usages and grammatical functions of prepositional clusters, principles of Cognitive Semantics are able to extend the interpretation of the meaning usages, with regard to metaphoricity. Consequently, I will utilise the principles in both fields to suggest a semantic representation of all the prepositional clusters analysed in the study, based on a superordinate classification rather than on a network one. The third and final part of the thesis seeks to apply the lexico-grammatical findings and the linguistic principles used in the study to pedagogy. More specifically, these findings, together with the linguistic principles of Corpus Analysis and Cognitive Semantics, have been utilised to construct activities which demonstrate a particular ELT methodology, which I have termed Investigative-Oriented Learning (IOL). IOL is meant to address the limitation of Communicative Teaching in developing investigative questioning in language learners. The aim of IOL is thus to empower learners with skills of Conscious Investigation which may enable them to be sensitive to patterns of language, and to their idiomatic and metaphorical meanings and grammatical functions. Prepositional clusters, which illustrate idiomaticity and metaphoricity in authentic language use, have been used as an example of language patterns to illustrate the methodology behind IOL.

    Exploring Linguistic Constraints in NLP Applications

    The key argument of this dissertation is that the success of a Natural Language Processing (NLP) application depends on a proper representation of the corresponding linguistic problem. This theme is raised against the backdrop of recent progress in our field being widely credited to the effective use of strong engineering techniques. However, the intriguing power of highly lexicalized models shown in many NLP applications is not only an achievement of developments in machine learning; it would also have been impossible without the extensive hand-annotated data resources now available, which were originally built with very deep linguistic considerations. More specifically, we explore three linguistic aspects in this dissertation: the distinction between closed-class and open-class words, long-tail distributions in vocabulary study, and determinism in language models. The first two aspects are studied in unsupervised tasks, namely unsupervised part-of-speech (POS) tagging and morphology learning, and the last one is studied in supervised tasks, namely English POS tagging and Chinese word segmentation. Each linguistic aspect under study manifests itself in a different way to help improve performance or efficiency in some NLP application.

    Proceedings of the Conference on Natural Language Processing 2010

    This book contains state-of-the-art contributions to the 10th conference on Natural Language Processing, KONVENS 2010 (Konferenz zur Verarbeitung natürlicher Sprache), with a focus on semantic processing. KONVENS in general aims at offering a broad perspective on current research and developments within the interdisciplinary field of natural language processing. The central theme draws specific attention towards addressing linguistic aspects of meaning, covering deep as well as shallow approaches to semantic processing. The contributions address both knowledge-based and data-driven methods for modelling and acquiring semantic information, and discuss the role of semantic information in applications of language technology. The articles demonstrate the importance of semantic processing, and present novel and creative approaches to natural language processing in general. Some contributions focus on developing and improving NLP systems for tasks like Named Entity Recognition or Word Sense Disambiguation, on semantic knowledge acquisition and exploitation with respect to collaboratively built resources, or on harvesting semantic information in virtual games. Others are set within the context of real-world applications, such as Authoring Aids, Text Summarisation and Information Retrieval. The collection highlights the importance of semantic processing for different areas and applications in Natural Language Processing, and provides the reader with an overview of current research in this field.