534 research outputs found

    An investigation into figurative language in the ‘LOLITA’ NLP system

    The classical and folk theory view on metaphor and figurative language assumes that metaphor is a rare occurrence, restricted to the realms of poetry and rhetoric. Recent results have, however, unarguably shown that figurative language of various complexity exhibits great systematicity and is pervasive in everyday language and texts. If the ubiquity of figurative language cannot be disputed, however, any natural language processing (NLP) system aiming at processing text beyond a restricted scope has to be able to deal with figurative language. This is particularly true if the processing is to be based on deep techniques, where a deep analysis of the input is performed. The LOLITA NLP system employs deep techniques and, therefore, must be capable of dealing with figurative input. The task of natural language (NL) generation is affected by the naturalness of figurative language, too. For if metaphors are frequent and natural, NL generation not capable of handling figurative language will seem restricted and its output unnatural. This thesis describes the work undertaken to examine the options for extending the LOLITA system in the direction of figurative language processing and the results of this project. The work critically examines previous approaches and their contribution to the field, before outlining a solution which follows the principles of natural language engineering

    A pragmatic guide to geoparsing evaluation

    Abstract: Empirical methods in geoparsing have thus far lacked a standard evaluation framework describing the task, metrics and data used to compare state-of-the-art systems. Evaluation is further made inconsistent, even unrepresentative of real world usage by the lack of distinction between the different types of toponyms, which necessitates new guidelines, a consolidation of metrics and a detailed toponym taxonomy with implications for Named Entity Recognition (NER) and beyond. To address these deficiencies, our manuscript introduces a new framework in three parts. (Part 1) Task Definition: clarified via corpus linguistic analysis proposing a fine-grained Pragmatic Taxonomy of Toponyms. (Part 2) Metrics: discussed and reviewed for a rigorous evaluation including recommendations for NER/Geoparsing practitioners. (Part 3) Evaluation data: shared via a new dataset called GeoWebNews to provide test/train examples and enable immediate use of our contributions. In addition to fine-grained Geotagging and Toponym Resolution (Geocoding), this dataset is also suitable for prototyping and evaluating machine learning NLP models

    Mining Meaning from Wikipedia

    Wikipedia is a goldmine of information; not just for its many readers, but also for the growing community of researchers who recognize it as a resource of exceptional scale and utility. It represents a vast investment of manual effort and judgment: a huge, constantly evolving tapestry of concepts and relations that is being applied to a host of tasks. This article provides a comprehensive description of this work. It focuses on research that extracts and makes use of the concepts, relations, facts and descriptions found in Wikipedia, and organizes the work into four broad categories: applying Wikipedia to natural language processing; using it to facilitate information retrieval and information extraction; and as a resource for ontology building. The article addresses how Wikipedia is being used as is, how it is being improved and adapted, and how it is being combined with other structures to create entirely new resources. We identify the research groups and individuals involved, and how their work has developed in the last few years. We provide a comprehensive list of the open-source software they have produced. Comment: An extensive survey of re-using information in Wikipedia in natural language processing, information retrieval and extraction and ontology building. Accepted for publication in International Journal of Human-Computer Studies

    Classifying Relations using Recurrent Neural Network with Ontological-Concept Embedding

    Relation extraction and classification represents a fundamental and challenging aspect of Natural Language Processing (NLP) research which depends on other tasks such as entity detection and word sense disambiguation. Traditional relation extraction methods based on pattern-matching using regular expression grammars and lexico-syntactic pattern rules suffer from several drawbacks, including the labor involved in handcrafting and maintaining a large number of rules that are difficult to reuse. Current research has focused on using Neural Networks to help improve the accuracy of relation extraction tasks using a specific type of Recurrent Neural Network (RNN). A promising approach for relation classification uses an RNN that incorporates an ontology-based concept embedding layer in addition to word embeddings. This dissertation presents several improvements to this approach by addressing its main limitations. First, several different types of semantic relationships between concepts are incorporated into the model; prior work has only considered is-a hierarchical relationships. Second, a significantly larger vocabulary of concepts is used. Third, an improved method for concept matching was devised. Adding these improvements to two state-of-the-art baseline models improved accuracy when evaluated on benchmark data used in prior studies

    The head-modifier principle and multilingual term extraction

    Advances in Language Engineering may be dependent on theoretical principles originating from linguistics since both share a common object of enquiry, natural language structures. We outline an approach to term extraction that rests on theoretical claims about the structure of words. We use the structural properties of compound words to specifically elicit the sets of terms defined by type hierarchies such as hyponymy and meronymy. The theoretical claims revolve around the head-modifier principle which determines the formation of a major class of compounds. Significantly it has been suggested that the principle operates in languages other than English. To demonstrate the extendibility of our approach beyond English, we present a case study of term extraction in Chinese, a language whose written form is the vehicle of communication for over 1.3 billion language users, and therefore has great significance for the development of language engineering technologies

    Representation and Inference for Open-Domain Question Answering: Strength and Limits of two Italian Semantic Lexicons

    The research described in this thesis was devoted to building a prototype Question Answering system for the Italian language. The prototype was used as an environment for evaluating the usefulness of the information encoded in two computational semantic lexicons, ItalWordNet and SIMPLE-CLIPS. The aim is to highlight the strengths and limits of the representation of information proposed by the two lexicons

    A Multi-Modal Incompleteness Ontology model (MMIO) to enhance information fusion for image retrieval

    This research has been supported in part by National Science and Technology Development (NSTDA), Thailand. Project No: SCH-NR2011-851

    Representation and processing of semantic ambiguity

    One of the established findings in the psycholinguistic literature is that semantic ambiguity (e.g., “dog/tree bark”) slows word comprehension in neutral/minimal context, though it is not entirely clear why this happens. Under the “semantic competition” account, this ambiguity disadvantage effect is due to competition between multiple semantic representations in the race for activation. Under the alternative “decision-making” account, it is due to decision-making difficulties in response selection. This thesis tests the two accounts by investigating in detail the ambiguity disadvantage in semantic relatedness decisions. Chapters 2-4 concentrate on homonyms, words with multiple unrelated meanings. The findings show that the ambiguity disadvantage effect arises only when the different meanings of homonyms are of comparable frequency (e.g., “football/electric fan”), and are therefore initially activated in parallel. Critically, homonymy has this effect during semantic activation of the ambiguous word, not during response selection. This finding, in particular, refutes any idea that the ambiguity disadvantage is due to decision making in response selection. Chapters 5 and 6 concentrate on polysemes, words with multiple related senses. The findings show that the ambiguity disadvantage effect arises for polysemes with irregular sense extension (e.g., “restaurant/website menu”), but not for polysemes with regular (e.g., “fluffy/marinated rabbit”) or figurative sense extension (e.g., “wooden/authoritative chair”). The latter two escape competition because they have only one semantic representation for the dominant sense, with rules of sense extension to derive the alternative sense on-line. Taken together, this thesis establishes that the ambiguity disadvantage is due to semantic competition but is restricted to some forms of ambiguity only. This is because ambiguous words differ in how their meanings are represented and processed, as delineated in this work