8 research outputs found

    Domain Adaptation Using Domain Similarity- and Domain Complexity-Based Instance Selection for Cross-Domain Sentiment Analysis

    Biomedical Information Extraction Pipelines for Public Health in the Age of Deep Learning

    Abstract: Unstructured texts containing biomedical information from sources such as electronic health records, scientific literature, discussion forums, and social media offer an opportunity to extract information for a wide range of applications in biomedical informatics. Building scalable and efficient pipelines for natural language processing and extraction of biomedical information plays an important role in the implementation and adoption of applications in areas such as public health. Advancements in machine learning and deep learning techniques have enabled rapid development of such pipelines. This dissertation presents entity extraction pipelines for two public health applications: virus phylogeography and pharmacovigilance. For virus phylogeography, geographical locations are extracted from biomedical scientific texts for metadata enrichment in the GenBank database containing 2.9 million virus nucleotide sequences. For pharmacovigilance, tools are developed to extract adverse drug reactions from social media posts to open avenues for post-market drug surveillance from non-traditional sources. Across these pipelines, high variance is observed in extraction performance among the entities of interest while using state-of-the-art neural network architectures. To explain the variation, linguistic measures are proposed to serve as indicators for entity extraction performance and to provide deeper insight into the domain complexity and the challenges associated with entity extraction. For both the phylogeography and pharmacovigilance pipelines presented in this work the annotated datasets and applications are open source and freely available to the public to foster further research in public health.
    Dissertation/Thesis. Doctoral Dissertation, Biomedical Informatics, 201

    A Transfer Learning Framework for Self-Adaptive Intrusion Detection in the Smart Grid based on Transferability Analysis and Domain-Adversarial Training

    Machine learning (ML) is a popular approach to security monitoring and intrusion detection in cyber-physical systems (CPS) like the smart grid. General ML approaches presume that the training and testing data are drawn independently from identical or similar distributions. This assumption may not hold in many real-world systems and applications like CPS, since system and attack dynamics may change the data distribution and thus cause the trained models to fail. Transfer learning (TL) is a promising solution for tackling the data distribution divergence problem and maintaining performance in the face of system and attack variations. However, there are still two challenges in introducing TL into intrusion detection: when to apply TL and how to extract effective features during TL. To address these two challenges, this research proposes a transferability analysis and domain-adversarial training (TADA) framework. This work first proposes a divergence-based transferability analysis to decide whether to apply TL, then develops a spatial-temporal domain-adversarial (DA) training model to reduce the distribution divergence between two domains and improve attack detection performance. The main contributions include: (i) a divergence-based transferability analysis to help evaluate the necessity of TL in security monitoring for CPS, such as intrusion detection in the smart grid; (ii) a spatial-temporal DA training approach that extracts spatial-temporal domain-invariant features to mitigate the impact of distribution divergence and enhance detection performance. Extensive experiments demonstrate that the transferability analysis is capable of predicting the accuracy drop and determining whether to apply TL. Compared to state-of-the-art models, TADA can achieve high and more robust detection performance under system and attack variations.
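
The idea of a divergence-based check for "when to apply TL" can be illustrated with a minimal sketch. The abstract does not say which divergence measure, features, or decision rule TADA uses, so everything below is an assumption made for illustration: the Jensen-Shannon divergence over histograms of a single numeric feature, the function names, the bin count, and the threshold of 0.1 are all hypothetical and are not taken from the paper.

```python
import math


def histogram(values, bins, lo, hi):
    """Normalized equal-width histogram of `values` over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp so that v == hi falls into the last bin.
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def should_apply_transfer_learning(source, target, threshold=0.1, bins=10):
    """Hypothetical rule: apply TL when the domains diverge past a threshold."""
    lo = min(min(source), min(target))
    hi = max(max(source), max(target))
    if hi == lo:
        # Every sample is identical, so the two domains cannot differ.
        return False
    p = histogram(source, bins, lo, hi)
    q = histogram(target, bins, lo, hi)
    return js_divergence(p, q) > threshold
```

In this sketch a small divergence means the source-trained model is reused as-is, while a large one triggers adaptation. A real system would compare multivariate, time-dependent measurements rather than one scalar feature, and would have to calibrate the threshold against observed accuracy drops.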