201 research outputs found

    BIOMEDICAL WORD SENSE DISAMBIGUATION WITH NEURAL WORD AND CONCEPT EMBEDDINGS

    Addressing ambiguity is an important step in natural language processing (NLP) pipelines designed for information extraction and knowledge discovery. This problem is also common in biomedicine, where NLP applications have become indispensable for exploiting latent information in biomedical literature and clinical narratives from electronic medical records. In this thesis, we propose an ensemble model that employs recent advances in neural word embeddings along with knowledge-based approaches to build a biomedical word sense disambiguation (WSD) system. Specifically, our system identifies the correct sense from a given set of candidates for each ambiguous word when presented in its context (surrounding words). We use the MSH WSD dataset, a well-known public dataset consisting of 203 ambiguous terms, each with nearly 200 different instances and an average of two candidate senses represented by concepts in the Unified Medical Language System (UMLS). We employ a popular biomedical concept, Our linear-time (in terms of number of senses and context length) unsupervised and knowledge-based approach improves over the state-of-the-art methods by over 3% in accuracy. A more expensive approach based on the k-nearest neighbor framework improves over prior best results by 5% in accuracy. Our results demonstrate that recent advances in neural dense word vector representations offer excellent potential for solving biomedical WSD.
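
The knowledge-based idea in this abstract (score each candidate sense against the context, in time linear in the number of senses and the context length) can be illustrated with a minimal sketch. It is not the thesis's actual system: the toy vectors, the sense labels (placeholders, not real UMLS identifiers), and the function names are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def average(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def disambiguate(context_words, word_vecs, sense_vecs):
    """Return the candidate sense whose vector is closest to the mean context vector.

    One pass over the context and one pass over the senses, so the cost is
    linear in both. Returns None if no context word has a vector.
    """
    known = [word_vecs[w] for w in context_words if w in word_vecs]
    if not known:
        return None
    context = average(known)
    return max(sense_vecs, key=lambda sense: cosine(context, sense_vecs[sense]))

# Toy 2-dimensional embeddings (hypothetical values, for illustration only).
WORD_VECS = {
    "virus": [0.9, 0.1],
    "infection": [0.8, 0.2],
    "weather": [0.1, 0.9],
    "temperature": [0.2, 0.8],
}
# Placeholder sense labels for the ambiguous word "cold".
SENSE_VECS = {
    "SENSE_common_cold": [1.0, 0.0],
    "SENSE_cold_temperature": [0.0, 1.0],
}

print(disambiguate(["virus", "infection"], WORD_VECS, SENSE_VECS))
```

In practice the word and sense vectors would come from trained embeddings rather than hand-written lists; the sketch only shows the scoring step.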

    Co-occurrence graphs for word sense disambiguation in the biomedical domain

    Word Sense Disambiguation is a key step for many Natural Language Processing tasks (e.g. summarization, text classification, relation extraction) and presents a challenge to any system that aims to process documents from the biomedical domain. In this paper, we present a new graph-based unsupervised technique to address this problem. The knowledge base used in this work is a graph built with co-occurrence information from medical concepts found in scientific abstracts, and hence adapted to the specific domain. Unlike other unsupervised approaches based on static graphs such as UMLS, in this work the knowledge base takes the context of the ambiguous terms into account. Abstracts downloaded from PubMed are used for building the graph, and disambiguation is performed using the Personalized PageRank algorithm. Evaluation is carried out over two test datasets widely explored in the literature. Different parameters of the system are also evaluated to test robustness and scalability. Results show that the system is able to outperform state-of-the-art knowledge-based systems, obtaining an accuracy improvement of more than 10% in some cases, while requiring only minimal external resources.
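
As a rough illustration of how Personalized PageRank can rank candidate senses on a co-occurrence graph, the sketch below runs a plain power iteration with the random walk restarting at the context concepts. The graph, node names, damping factor, and iteration count are assumptions chosen for the example and are not taken from the paper.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power-iteration Personalized PageRank.

    graph: dict mapping every node to a list of its neighbours.
    seeds: non-empty set of nodes (all present in graph) where the walk restarts.
    """
    nodes = list(graph)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        # Restart mass goes only to the seed (context) nodes.
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for node in nodes:
            neighbours = graph[node]
            if not neighbours:
                # Dangling node: hand its mass back to the seeds.
                for n in nodes:
                    new_rank[n] += damping * rank[node] * teleport[n]
                continue
            share = damping * rank[node] / len(neighbours)
            for neighbour in neighbours:
                new_rank[neighbour] += share
        rank = new_rank
    return rank

# Toy undirected co-occurrence graph (hypothetical concepts and edges).
GRAPH = {
    "fever": ["cold_disease", "virus", "weather"],
    "virus": ["cold_disease", "fever"],
    "cold_disease": ["fever", "virus"],
    "weather": ["cold_temperature", "fever"],
    "cold_temperature": ["weather"],
}

# Context concepts act as seeds; candidate senses are then compared by rank.
ranks = personalized_pagerank(GRAPH, seeds={"fever", "virus"})
candidates = ["cold_disease", "cold_temperature"]
best_sense = max(candidates, key=ranks.get)
print(best_sense)
```

Seeding the walk with the context concepts is what makes the ranking depend on the context rather than on global graph centrality alone.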

    Disease Name Extraction from Clinical Text Using Conditional Random Fields

    The aim of the research in this thesis was to extract disease and disorder names from clinical texts. We utilized Conditional Random Fields (CRF) as the main method to label diseases and disorders in clinical sentences. We used other tools, such as MetaMap and Stanford CoreNLP, to extract some crucial features. MetaMap was used to identify names of diseases/disorders that are already in the UMLS Metathesaurus. Other important features, such as lemmatized versions of words and POS tags, were extracted using Stanford CoreNLP. Further features were extracted directly from the UMLS Metathesaurus, including semantic types of words. We participated in the SemEval 2014 competition's Task 7 and used its provided data to train and evaluate our system. Training data contained 199 clinical texts, development data contained 99 clinical texts, and the test data contained 133 clinical texts; these included discharge summaries and echocardiogram, radiology, and ECG reports. We obtained competitive results on the disease/disorder name extraction task. We found through an ablation study that while all features contributed, MetaMap matches, POS tags, and previous and next words were the most effective features.
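
The feature types named in this abstract (the word itself, POS tags, dictionary matches, previous and next words) are typically supplied to a CRF as one feature dictionary per token. The sketch below covers only that feature-extraction step under assumed inputs: the POS tags and the `dictionary_hits` flags are hand-written stand-ins for what tools such as Stanford CoreNLP and MetaMap would produce, and the function names are hypothetical.

```python
def token_features(tokens, pos_tags, dictionary_hits, i):
    """Build the feature dictionary for the token at position i."""
    return {
        "word": tokens[i].lower(),
        "pos": pos_tags[i],
        # Stand-in for "this token was matched by a terminology lookup".
        "in_dictionary": dictionary_hits[i],
        # Sentence-boundary markers replace missing neighbours.
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

def sentence_features(tokens, pos_tags, dictionary_hits):
    """One feature dictionary per token, in sentence order."""
    return [
        token_features(tokens, pos_tags, dictionary_hits, i)
        for i in range(len(tokens))
    ]

# Hand-written example inputs (assumed, not produced by any real tool here).
TOKENS = ["Patient", "denies", "chest", "pain"]
POS_TAGS = ["NN", "VBZ", "NN", "NN"]
DICTIONARY_HITS = [False, False, True, True]

features = sentence_features(TOKENS, POS_TAGS, DICTIONARY_HITS)
print(features[2])
```

A sequence labeller would consume one such list per sentence together with the gold labels; training the CRF itself is outside this sketch.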

    Tailored semantic annotation for semantic search

    This paper presents a novel method for semantic annotation and search of a target corpus using several knowledge resources (KRs). This method relies on a formal statistical framework in which KR concepts and corpus documents are homogeneously represented using statistical language models. Under this framework, we can perform all the necessary operations for an efficient and effective semantic annotation of the corpus. Firstly, we propose a coarse tailoring of the KRs with respect to the target corpus, with the main goal of reducing the ambiguity of the annotations and their computational overhead. Then, we propose the generation of concept profiles, which allow measuring the semantic overlap of the KRs as well as performing a finer tailoring of them. Finally, we propose how to semantically represent documents and queries in terms of the KR concepts and the statistical framework to perform semantic search. Experiments have been carried out with a corpus about web resources which includes several Life Sciences catalogs and Wikipedia pages related to web resources in general (e.g., databases, tools, services, etc.). Results demonstrate that the proposed method is more effective and efficient than state-of-the-art methods relying on either context-free annotation or keyword-based search. We thank anonymous reviewers for their very useful comments and suggestions. The work was supported by the CICYT project TIN2011-24147 from the Spanish Ministry of Economy and Competitiveness (MINECO).

    Representing and Redefining Specialised Knowledge: Medical Discourse

    This volume brings together five selected papers on medical discourse which show how specialised medical corpora provide a framework that helps those engaging with medical discourse to determine how the everyday and the specialised combine to shape the discourse of medical professionals and non-medical communities in relation to both long and short-term factors. The papers contribute, in an exemplary way, to illustrating the shifting boundaries in today’s society between the two major poles making up the medical discourse cline: healthcare discourse at the one end, which records the demand for personalised therapies and individual medical services; and clinical discourse at the other, which documents research into society’s collective medical needs.

    Interactive Machine Learning with Applications in Health Informatics

    Recent years have witnessed unprecedented growth of health data, including millions of biomedical research publications, electronic health records, patient discussions on health forums and social media, fitness tracker trajectories, and genome sequences. Information retrieval and machine learning techniques are powerful tools to unlock invaluable knowledge in these data, yet they need to be guided by human experts. Unlike training machine learning models in other domains, labeling and analyzing health data requires highly specialized expertise, and the time of medical experts is extremely limited. How can we mine big health data with little expert effort? In this dissertation, I develop state-of-the-art interactive machine learning algorithms that bring together human intelligence and machine intelligence in health data mining tasks. By making efficient use of human experts' domain knowledge, we can achieve high-quality solutions with minimal manual effort. I first introduce a high-recall information retrieval framework that helps human users efficiently harvest not just one but as many relevant documents as possible from a searchable corpus. This is a common need in professional search scenarios such as medical search and literature review. Then I develop two interactive machine learning algorithms that leverage human experts' domain knowledge to combat the curse of "cold start" in active learning, with applications in clinical natural language processing. A consistent empirical observation is that the overall learning process can be reliably accelerated by a knowledge-driven "warm start", followed by machine-initiated active learning. As a theoretical contribution, I propose a general framework for interactive machine learning. Under this framework, a unified optimization objective explains many existing algorithms used in practice, and inspires the design of new algorithms. PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/147518/1/raywang_1.pd

    Resting-State Functional Connectivity in Youth With Gender Dysphoria

    Current developmental models of gender identity and gender dysphoria (GD) lack sex-specific profiles of brain function that differentiate between typically-developing and cross-gender identified youth, as postulated by models like the unified theory of the origins of sex differences (Arnold, 2009) and the neurobiological theory of the origins of transsexuality (Swaab & Garcia-Falgueras, 2009). Previously, investigators have used brain imaging modalities such as Resting-State functional Magnetic Resonance Imaging (R-fMRI) to demonstrate differences in resting-state functional connectivity (RSFC) between typically-developing male and female youth, and between typically-developing youth and youth diagnosed with gender identity disorder (GID). In the present pilot study, I used R-fMRI to investigate differences in RSFC between typically-developing and cross-gender identified male and female youth subgroups, with the hypothesis that GID-diagnosed subgroups would demonstrate connectivity patterns in between those of typically-developing males and females. Eleven youth diagnosed with gender identity disorder (four males, ages 9 to 20 years; seven females, ages 12 to 20 years) were matched on age and assigned gender with 11 typically-developing youth. All participants completed written informed consent to undergo the IRB-approved research procedures. R-fMRI data were collected while the participants were lying down and resting, with their eyes closed. Primary analyses focused on 14 brain regions selected because they showed sex differences most frequently or reliably in previous studies of R-fMRI in typically-developing youth. Statistical analysis used a 2 x 2 mixed effects analysis (assigned female versus assigned male x typically-developing versus GID-diagnosed), with individual-level connectivity maps as the dependent variable. Results showed that significant interaction effects of functional connectivity patterns were associated with 6 of the 14 selected brain regions. GID-diagnosed assigned females exhibited connectivity patterns similar to those of typically-developing males associated with the right medial superior frontal gyrus, right supplementary motor area, left lingual gyrus, right lingual gyrus, left middle frontal gyrus, left medial superior frontal gyrus, left cuneus, right thalamus, left dorsolateral superior frontal gyrus, and left inferior frontal gyrus, triangular part. GID-diagnosed assigned males exhibited functional connectivity patterns similar to those of typically-developing females associated with the right medial superior frontal gyrus and right supplementary motor area; in between those of typically-developing females and males associated with left lingual gyrus, right lingual gyrus, left middle frontal gyrus, left medial superior frontal gyrus, right medial superior frontal gyrus, left dorsolateral superior frontal gyrus, and left inferior frontal gyrus, triangular part; and similar to typically-developing males associated with the right lingual gyrus and left middle frontal gyrus. The right precuneus, hypothesized to show robust findings, did not reveal any effects. In the current study, GID-diagnosed assigned males tended toward demasculinized effects (quantitative interactions showing differences of magnitude), whereas GID-diagnosed assigned females tended toward masculinized effects (qualitative interactions showing differences in direction of correlation). The current findings support the view that brain development associated with gender dysphoria proceeds along separate but overlapping sex-related regions for GID-diagnosed assigned females and males and provide further evidence of greater cross-gender brain differentiation in assigned females at an earlier age than in assigned males (possibly due to earlier onset of puberty in females). These data suggest that any future use of patterns of brain function for diagnosing gender dysphoria may require separate criteria (e.g., different sets of brain regions) for assigned males and assigned females but will require replication on larger samples.

    Mining the Medical and Patent Literature to Support Healthcare and Pharmacovigilance

    Recent advancements in healthcare practices and the increasing use of information technology in the medical domain have led to the rapid generation of free-text data in the form of scientific articles, e-health records, patents, and document inventories. This has spurred the development of sophisticated information retrieval and information extraction technologies. A fundamental requirement for the automatic processing of biomedical text is the identification of information-carrying units such as concepts or named entities. In this context, this work focuses on the identification of medical disorders (such as diseases and adverse effects), which denote an important category of concepts in medical text. Two methodologies were investigated in this regard: dictionary-based and machine learning-based approaches. Furthermore, the capabilities of the concept recognition techniques were systematically exploited to build a semantic search platform for the retrieval of e-health records and patents. The system facilitates conventional text search as well as semantic and ontological searches. Performance of the adapted retrieval platform for e-health records and patents was evaluated within open assessment challenges (i.e. TRECMED and TRECCHEM respectively) wherein the system was best rated in comparison to several other competing information retrieval platforms. Finally, from the medico-pharma perspective, a strategy for the identification of adverse drug events from medical case reports was developed. Qualitative evaluation as well as an expert validation of the developed system's performance showed robust results. In conclusion, this thesis presents approaches for efficient information retrieval and information extraction from various biomedical literature sources in support of healthcare and pharmacovigilance. The applied strategies have the potential to enhance the literature searches performed by biomedical, healthcare, and patent professionals. This can promote literature-based knowledge discovery, improve the safety and effectiveness of medical practices, and drive research and development in the medical and healthcare arena.