25 research outputs found

Knowledge-Driven Implicit Information Extraction

Author: Perera Pathirage Dinindu
Publication venue: CORE Scholar
Publication date: 01/01/2016

Natural language is a powerful tool developed by humans over hundreds of thousands of years. Extensive usage, the flexibility of the language, human creativity, and social, cultural, and economic changes in daily life have added new constructs, styles, and features to the language. One such feature is the ability to express ideas, opinions, and facts implicitly. This feature is used extensively in day-to-day communication in situations such as: 1) expressing sarcasm, 2) trying to recall forgotten things, 3) conveying descriptive information, 4) emphasizing the features of an entity, and 5) communicating a common understanding. Consider the tweet "New Sandra Bullock astronaut lost in space movie looks absolutely terrifying" and the text snippet extracted from a clinical narrative, "He is suffering from nausea and severe headaches. Dolasteron was prescribed." The tweet implicitly mentions the entity Gravity, and the clinical text snippet implicitly mentions the relationship between the medication Dolasteron and the clinical condition nausea. Such implicit references to entities and relationships are common in daily communication, and they add value to conversations. However, extracting implicit constructs has not received enough attention in the information extraction literature. This dissertation focuses on extracting implicit entities and relationships from clinical narratives and extracting implicit entities from tweets.
When people use implicit constructs in their daily communication, they assume that they share knowledge with the audience about the subject being discussed. This shared knowledge helps to decode implicitly conveyed information. For example, the Twitter user above assumed that his/her audience knows that the actress Sandra Bullock starred in the movie Gravity and that it is a movie about space exploration. The clinical professional who wrote the clinical narrative above assumed that the reader knows that Dolasteron is an anti-nausea drug. An audience without such domain knowledge may not decode the information conveyed in these examples correctly. This dissertation demonstrates manifestations of implicit constructs in text, studies their characteristics, and develops a software solution that is capable of extracting implicit information from text. The developed solution starts by acquiring the knowledge relevant to the implicit information extraction problem: domain knowledge, contextual knowledge, and linguistic knowledge. The acquired knowledge can take different syntactic forms, such as a text snippet, structured knowledge represented in a standard knowledge representation language such as the Resource Description Framework (RDF), or other custom formats. Hence, the acquired knowledge is pre-processed to create models that can be processed by machines. Such models provide the infrastructure to perform implicit information extraction. This dissertation focuses on three use cases of implicit information and demonstrates the applicability of the developed solution to each: 1) implicit entity linking in clinical narratives, 2) implicit entity linking in Twitter, and 3) implicit relationship extraction from clinical narratives. The evaluations are conducted on relevant annotated datasets for implicit information, and they demonstrate the effectiveness of the developed solution in extracting implicit information from text.
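
The idea of decoding an implicit mention with shared knowledge can be illustrated with a minimal Python sketch. It is not the dissertation's method: the `KNOWLEDGE` table, the term-overlap score, and the function names are assumptions made purely for illustration, with a hand-written dictionary standing in for the pre-processed knowledge models the abstract describes.

```python
# Toy stand-in for a machine-processable knowledge model (hypothetical data):
# each entity is mapped to terms an audience is assumed to associate with it.
KNOWLEDGE = {
    "Gravity": {"sandra", "bullock", "astronaut", "space", "movie"},
    "Speed": {"sandra", "bullock", "bus", "bomb", "movie"},
}


def tokenize(text):
    """Lower-case the text and strip basic punctuation from each word."""
    return {word.strip(".,!?").lower() for word in text.split()}


def link_implicit_entity(text, knowledge=KNOWLEDGE):
    """Return the entity whose associated terms overlap most with the text.

    Returns None when no entity shares any term with the text.
    """
    tokens = tokenize(text)
    scores = {entity: len(tokens & terms) for entity, terms in knowledge.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


tweet = "New Sandra Bullock astronaut lost in space movie looks absolutely terrifying"
print(link_implicit_entity(tweet))
```

The tweet never names the film, yet the overlap with the assumed background terms favours Gravity over another Sandra Bullock film. A real system would need far richer knowledge and a learned ranking model rather than a raw term count.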

CORE

Event extraction from biomedical texts using trimmed dependency graphs

Author: Buyko Ekaterina
Publication date: 03/11/2012

This thesis explores the automatic extraction of information from biomedical publications. Such techniques are urgently needed because the biosciences are publishing continually increasing numbers of texts. The focus of this work is on events. Information about events is currently curated manually from the literature by biocurators. Biocuration, however, is time-consuming and costly, so automatic methods are needed for information extraction from the literature. This thesis is dedicated to modeling, implementing, and evaluating an advanced event extraction approach based on the analysis of syntactic dependency graphs. This work presents the proposed event extraction approach and its implementation, the JReX (Jena Relation eXtraction) system. This system was used by the University of Jena (JULIE Lab) team in the "BioNLP 2009 Shared Task on Event Extraction" competition and was ranked second among 24 competing teams. Thereafter, JReX was the highest scorer on the worldwide shared U-Compare event extraction server, outperforming the competing systems from the challenge. This success was made possible, among other things, by extensive research on event extraction solutions carried out during this thesis, e.g., exploring the effects of syntactic and semantic processing procedures on solving the event extraction task. The evaluations executed on standard, community-wide accepted competition data were complemented by a real-life evaluation of large-scale biomedical database reconstruction. This work showed that considerable parts of manually curated databases can be automatically re-created with the help of the event extraction approach developed. Successful re-creation was possible for parts of RegulonDB, the world's largest database for E. coli. In summary, the event extraction approach justified, developed, and implemented in this thesis meets the needs of a large community of human curators and thus helps in the acquisition of new knowledge in the biosciences.

Digitale Bibliothek Thüringen

Proceedings of the 6th Joint ISO-ACL SIGSEM Workshop on Interoperable Semantic Annotation (ISA-6)

Publication venue: 'Blavatnik School of Government, University of Oxford'
Publication date: 01/01/2011

Tilburg University Repository

Eunomos, a legal document and knowledge management system for the Web to provide relevant, reliable and up-to-date information on the law

Author: Boella Guido
Di Caro Luigi
Humphreys Llio Bryn
Robaldo Livio
Rossi Piercarlo
van
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Institutional Research Information System University of Turin

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

Publication venue: 'OpenEdition'
Publication date: 01/07/2022

On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.

Directory of Open Access Books (DOAB)

Problem-solving recognition in scientific text

Author: Heffernan Kevin
Publication venue: University of Cambridge
Publication date: 01/10/2020

As far back as Aristotle, problems and solutions have been recognised as a core pattern of thought, and in particular of the scientific method. Therefore, they play a significant role in the understanding of academic texts from the scientific domain. Capturing knowledge of such problem-solving utterances would provide a deep insight into text understanding. In this dissertation, I present the task of problem-solving recognition in scientific text. To date, work on problem-solving recognition has received both theoretical and computational treatment. However, theories of problem-solving put forward by applied linguists lack practical adaptation to the domain of scientific text, and computational analyses have been narrow in scope. This dissertation provides a new model of problem-solving. It is an adaptation of Hoey's (2001) model, tailored to the scientific domain. As far as modelling problems is concerned, I divided the text string expressing the statement of a problem into sub-components; this is one of my main contributions. I have mapped these sub-components to functional roles, and thus operationalised the model in such a way that it can be annotated by humans reliably. As far as the problem-solving relationship between problems and solutions is concerned, my model takes into account the local network of relationships existing between problems. In order to validate this new model, a large-scale annotation study was conducted. The annotation study shows significant agreement amongst the annotators. The model is automated in two stages using a blend of classical machine learning and state-of-the-art deep learning methods. The first stage involves the implementation of problem and solution recognisers which operate at the sentence level. The second stage is more complex in that it recognises problems and solutions jointly at the token-level, and also establishes whether there is a problem-solving relationship between each of them. 
One of the best performers at this stage was a Neural Relational Topic Model. The results from automation show that the model is able to recognise problem-solving utterances in text to a high degree of accuracy. My work has already shown a positive impact in both industry and academia. One start-up is currently using the model for representing academic articles, and a Japanese collaborator has received a grant to adapt my model to Japanese text.

Apollo (Cambridge)

(Dis)connections between specific language impairment and dyslexia in Chinese

Author: Au TKF
Ho CSH
Kidd JC
Lam CCC
Wong AMY
Yip LPW
Publication venue: 'Surface Analysis Society of Japan'
Publication date: 01/01/2012

Poster Session: no. 26P.40. Specific language impairment (SLI) and dyslexia describe language-learning impairments that occur in the absence of a sensory, cognitive, or psychosocial impairment. SLI is primarily defined by an impairment in oral language, and dyslexia by a deficit in the reading of written words. SLI and dyslexia co-occur in school-age children learning English, with rates ranging from 17% to 75%. For children learning Chinese, SLI and dyslexia also co-occur. Wong et al. (2010) first reported on the presence of dyslexia in a clinical sample of 6- to 11-year-old school-age children with SLI. The study compared the reading-related cognitive skills of children with SLI and dyslexia (SLI-D) with 2 groups of children …

HKU Scholars Hub

Uncovering the myth of learning to read Chinese characters: phonetic, semantic, and orthographic strategies used by Chinese as foreign language learners

Author: Tong SX
Yip J.
Publication venue: 'Surface Analysis Society of Japan'
Publication date: 01/01/2012

Oral Session - 6A: Lexical modeling: no. 6A.3. Chinese is considered one of the most challenging orthographies for non-native speakers to learn, in particular its characters. The Chinese character is the basic reading unit, in which sound, form, and meaning converge. The predominant type of Chinese character is the semantic-phonetic compound, which is composed of phonetic and semantic radicals that give clues to the sound and the meaning, respectively. Over the last two decades, psycholinguistic research has made significant progress in specifying the roles of phonetic and semantic radicals in character processing among native Chinese speakers …

HKU Scholars Hub

Aspects of Coherence for Entity Analysis

Author: Heinzerling Benjamin
Publication date: 01/01/2019

Natural language understanding is an important topic in natural language processing. Given a text, a computer program should, at the very least, be able to understand what the text is about, and ideally also situate it in its extra-textual context and understand what purpose it serves. What exactly it means to understand what a text is about is an open question, but it is generally accepted that, at a minimum, understanding involves being able to answer questions like “Who did what to whom? Where? When? How? And Why?”. Entity analysis, the computational analysis of entities mentioned in a text, aims to support answering the questions “Who?” and “Whom?” by identifying entities mentioned in a text. If the answers to “Where?” and “When?” are specific, named locations and events, entity analysis can also provide these answers. Entity analysis aims to answer these questions by performing entity linking, that is, linking mentions of entities to their corresponding entry in a knowledge base; coreference resolution, that is, identifying all mentions in a text that refer to the same entity; and entity typing, that is, assigning a label such as Person to mentions of entities. In this thesis, we study how different aspects of coherence can be exploited to improve entity analysis. Our main contribution is a method that allows exploiting knowledge-rich, specific aspects of coherence, namely geographic, temporal, and entity type coherence. Geographic coherence expresses the intuition that entities mentioned in a text tend to be geographically close. Similarly, temporal coherence captures the intuition that entities mentioned in a text tend to be close in the temporal dimension. Entity type coherence is based on the observation that in a text about a certain topic, such as sports, the entities mentioned in it tend to have the same or related entity types, such as sports team or athlete.
We show how to integrate features modeling these aspects of coherence into entity linking systems and establish their utility in extensive experiments covering different datasets and systems. Since entity linking often requires computationally expensive joint, global optimization, we propose a simple but effective rule-based approach that enjoys some of the benefits of joint, global approaches while avoiding some of their drawbacks. To enable convenient error analysis for system developers, we introduce a tool for visual analysis of entity linking system output. Investigating another aspect of coherence, namely the coherence between a predicate and its arguments, we devise a distributed model of selectional preferences and assess its impact on a neural coreference resolution system. Our final contribution examines how multilingual entity typing can be improved by incorporating subword information. We train and make publicly available subword embeddings in 275 languages and show their utility in a multilingual entity typing task.
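
The geographic coherence intuition in this abstract can be sketched in a few lines of Python. This is an illustrative toy, not the thesis's feature set or its rule-based approach: the `CANDIDATES` table, its approximate coordinates, and the exhaustive search over candidate combinations are all assumptions introduced here.

```python
from itertools import product
from math import asin, cos, radians, sin, sqrt

# Hypothetical candidate table: mention -> {knowledge-base entry: (lat, lon)}.
# Coordinates are approximate and for illustration only.
CANDIDATES = {
    "Paris": {"Paris, France": (48.86, 2.35), "Paris, Texas": (33.66, -95.56)},
    "Dallas": {"Dallas, Texas": (32.78, -96.80)},
}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def most_coherent(mentions, candidates=CANDIDATES):
    """Pick one candidate per mention so that total pairwise distance is minimal."""
    best_combo, best_cost = None, float("inf")
    # Brute force over every combination; exponential in the number of mentions.
    for combo in product(*(candidates[m].items() for m in mentions)):
        cost = sum(
            haversine_km(p[1], q[1])
            for i, p in enumerate(combo)
            for q in combo[i + 1:]
        )
        if cost < best_cost:
            best_combo, best_cost = combo, cost
    return {m: name for m, (name, _) in zip(mentions, best_combo)}


print(most_coherent(["Paris", "Dallas"]))
```

With these assumed candidates, the mention "Paris" resolves to Paris, Texas because it lies much closer to Dallas than Paris, France does. The exhaustive search also shows why joint, global optimization becomes expensive as the number of mentions grows, which is the cost the abstract's rule-based alternative is meant to avoid.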

Heidelberger Dokumentenserver