7,495 research outputs found

    Semantics-driven Abstractive Document Summarization

    Get PDF
    The evolution of the Web over the last three decades has led to a deluge of scientific and news articles on the Internet. Harnessing these publications in different fields of study is critical to effective end user information consumption. Similarly, in the domain of healthcare, one of the key challenges with the adoption of Electronic Health Records (EHRs) for clinical practice has been the tremendous amount of clinical notes generated that can be summarized without which clinical decision making and communication will be inefficient and costly. In spite of the rapid advances in information retrieval and deep learning techniques towards abstractive document summarization, the results of these efforts continue to resemble extractive summaries, achieving promising results predominantly on lexical metrics but performing poorly on semantic metrics. Thus, abstractive summarization that is driven by intrinsic and extrinsic semantics of documents is not adequately explored. Resources that can be used for generating semantics-driven abstractive summaries include: • Abstracts of multiple scientific articles published in a given technical field of study to generate an abstractive summary for topically-related abstracts within the field, thus reducing the load of having to read semantically duplicate abstracts on a given topic. • Citation contexts from different authoritative papers citing a reference paper can be used to generate utility-oriented abstractive summary for a scientific article. • Biomedical articles and the named entities characterizing the biomedical articles along with background knowledge bases to generate entity and fact-aware abstractive summaries. • Clinical notes of patients and clinical knowledge bases for abstractive clinical text summarization using knowledge-driven multi-objective optimization. In this dissertation, we develop semantics-driven abstractive models based on intra- document and inter-document semantic analyses along with facts of named entities retrieved from domain-specific knowledge bases to produce summaries. Concretely, we propose a sequence of frameworks leveraging semantics at various granularity (e.g., word, sentence, document, topic, citations, and named entities) levels, by utilizing external resources. The proposed frameworks have been applied to a range of tasks including 1. Abstractive summarization of topic-centric multi-document scientific articles and news articles. 2. Abstractive summarization of scientific articles using crowd-sourced citation contexts. 3. Abstractive summarization of biomedical articles clustered based on entity-relatedness. 4. Abstractive summarization of clinical notes of patients with heart failure and Chest X-Rays recordings. The proposed approaches achieve impressive performance in terms of preserving semantics in abstractive summarization while paraphrasing. For summarization of topic-centric multiple scientific/news articles, we propose a three-stage approach where abstracts of scientific articles or news articles are clustered based on their topical similarity determined from topics generated using Latent Dirichlet Allocation (LDA), followed by extractive phase and abstractive phase. Then, in the next stage, we focus on abstractive summarization of biomedical literature where we leverage named entities in biomedical articles to 1) cluster related articles; and 2) leverage the named entities towards guiding abstractive summarization. Finally, in the last stage, we turn to external resources such as citation contexts pointing to a scientific article to generate a comprehensive and utility-centric abstractive summary of a scientific article, domain-specific knowledge bases to fill gaps in information about entities in a biomedical article to summarize and clinical notes to guide abstractive summarization of clinical text. Thus, the bottom-up progression of exploring semantics towards abstractive summarization in this dissertation starts with (i) Semantic Analysis of Latent Topics; builds on (ii) Internal and External Knowledge-I (gleaned from abstracts and Citation Contexts); and extends it to make it comprehensive using (iii) Internal and External Knowledge-II (Named Entities and Knowledge Bases)

    A Survey on Knowledge Graphs: Representation, Acquisition and Applications

    Full text link
    Human knowledge provides a formal understanding of the world. Knowledge graphs that represent structural relations between entities have become an increasingly popular research direction towards cognition and human-level intelligence. In this survey, we provide a comprehensive review of knowledge graph covering overall research topics about 1) knowledge graph representation learning, 2) knowledge acquisition and completion, 3) temporal knowledge graph, and 4) knowledge-aware applications, and summarize recent breakthroughs and perspective directions to facilitate future research. We propose a full-view categorization and new taxonomies on these topics. Knowledge graph embedding is organized from four aspects of representation space, scoring function, encoding models, and auxiliary information. For knowledge acquisition, especially knowledge graph completion, embedding methods, path inference, and logical rule reasoning, are reviewed. We further explore several emerging topics, including meta relational learning, commonsense reasoning, and temporal knowledge graphs. To facilitate future research on knowledge graphs, we also provide a curated collection of datasets and open-source libraries on different tasks. In the end, we have a thorough outlook on several promising research directions

    A Survey on Semantic Processing Techniques

    Full text link
    Semantic processing is a fundamental research domain in computational linguistics. In the era of powerful pre-trained language models and large language models, the advancement of research in this domain appears to be decelerating. However, the study of semantics is multi-dimensional in linguistics. The research depth and breadth of computational semantic processing can be largely improved with new technologies. In this survey, we analyzed five semantic processing tasks, e.g., word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection. We study relevant theoretical research in these fields, advanced methods, and downstream applications. We connect the surveyed tasks with downstream applications because this may inspire future scholars to fuse these low-level semantic processing tasks with high-level natural language processing tasks. The review of theoretical research may also inspire new tasks and technologies in the semantic processing domain. Finally, we compare the different semantic processing techniques and summarize their technical trends, application trends, and future directions.Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN 1566-2535. The equal contribution mark is missed in the published version due to the publication policies. Please contact Prof. Erik Cambria for detail

    Knowledge-based Biomedical Data Science 2019

    Full text link
    Knowledge-based biomedical data science (KBDS) involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge in computer systems, often in the form of knowledge graphs. Here we survey the progress in the last year in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as on approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing, and the expansion of knowledge-based approaches to novel domains, such as Chinese Traditional Medicine and biodiversity.Comment: Manuscript 43 pages with 3 tables; Supplemental material 43 pages with 3 table

    Entity-centric knowledge discovery for idiosyncratic domains

    Get PDF
    Technical and scientific knowledge is produced at an ever-accelerating pace, leading to increasing issues when trying to automatically organize or process it, e.g., when searching for relevant prior work. Knowledge can today be produced both in unstructured (plain text) and structured (metadata or linked data) forms. However, unstructured content is still themost dominant formused to represent scientific knowledge. In order to facilitate the extraction and discovery of relevant content, new automated and scalable methods for processing, structuring and organizing scientific knowledge are called for. In this context, a number of applications are emerging, ranging fromNamed Entity Recognition (NER) and Entity Linking tools for scientific papers to specific platforms leveraging information extraction techniques to organize scientific knowledge. In this thesis, we tackle the tasks of Entity Recognition, Disambiguation and Linking in idiosyncratic domains with an emphasis on scientific literature. Furthermore, we study the related task of co-reference resolution with a specific focus on named entities. We start by exploring Named Entity Recognition, a task that aims to identify the boundaries of named entities in textual contents. We propose a newmethod to generate candidate named entities based on n-gram collocation statistics and design several entity recognition features to further classify them. In addition, we show how the use of external knowledge bases (either domain-specific like DBLP or generic like DBPedia) can be leveraged to improve the effectiveness of NER for idiosyncratic domains. Subsequently, we move to Entity Disambiguation, which is typically performed after entity recognition in order to link an entity to a knowledge base. We propose novel semi-supervised methods for word disambiguation leveraging the structure of a community-based ontology of scientific concepts. Our approach exploits the graph structure that connects different terms and their definitions to automatically identify the correct sense that was originally picked by the authors of a scientific publication. We then turn to co-reference resolution, a task aiming at identifying entities that appear using various forms throughout the text. We propose an approach to type entities leveraging an inverted index built on top of a knowledge base, and to subsequently re-assign entities based on the semantic relatedness of the introduced types. Finally, we describe an application which goal is to help researchers discover and manage scientific publications. We focus on the problem of selecting relevant tags to organize collections of research papers in that context. We experimentally demonstrate that the use of a community-authored ontology together with information about the position of the concepts in the documents allows to significantly increase the precision of tag selection over standard methods

    Ontology-Based Clinical Information Extraction Using SNOMED CT

    Get PDF
    Extracting and encoding clinical information captured in unstructured clinical documents with standard medical terminologies is vital to enable secondary use of clinical data from practice. SNOMED CT is the most comprehensive medical ontology with broad types of concepts and detailed relationships and it has been widely used for many clinical applications. However, few studies have investigated the use of SNOMED CT in clinical information extraction. In this dissertation research, we developed a fine-grained information model based on the SNOMED CT and built novel information extraction systems to recognize clinical entities and identify their relations, as well as to encode them to SNOMED CT concepts. Our evaluation shows that such ontology-based information extraction systems using SNOMED CT could achieve state-of-the-art performance, indicating its potential in clinical natural language processing
    • …
    corecore