2,593 research outputs found

    Exploring diseases based biomedical document clustering and visualization using self-organizing maps

    Document clustering is a text mining technique used to provide better document search and browsing in digital libraries or online corpora. In this research, a vector representation of disease concepts and a similarity measure between concepts are proposed. Together they identify the closest disease concepts in the context of a corpus. Each document is represented using the vector space model. A weighting scheme is proposed that considers both local content and associations between concepts. Self-Organizing Maps (SOM) are often used as a document clustering algorithm. The vector projection and visualization features of SOM enable visualization and analysis of the cluster distribution and relationships in two-dimensional space. The Davies-Bouldin index is used to validate the clusters based on the visualized cluster distributions. The results show that the proposed document clustering framework generates meaningful clusters and can facilitate clustering visualization and information retrieval based on disease concepts.
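
    The Davies-Bouldin index mentioned in this abstract can be illustrated with a minimal sketch. It assumes Euclidean distance and clusters given as plain lists of vectors; the function names `centroid` and `davies_bouldin` are illustrative and not taken from the paper. Lower values indicate compact, well-separated clusters.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of vectors).

    Assumes Euclidean distance and that no two cluster centroids coincide
    (coinciding centroids would cause a division by zero).
    """
    k = len(clusters)
    if k < 2:
        raise ValueError("need at least two clusters")
    centers = [centroid(c) for c in clusters]
    # Scatter: average distance of each cluster's points to its own centroid.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    total = 0.0
    for i in range(k):
        # Worst-case (largest) similarity ratio of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Toy data (hypothetical): two tight clusters that are far apart.
toy_clusters = [
    [[0.0, 0.0], [0.0, 2.0]],
    [[10.0, 0.0], [10.0, 2.0]],
]
db_value = davies_bouldin(toy_clusters)
```

    In the toy data each cluster has scatter 1 and the centroids are 10 apart, so the index is (1 + 1) / 10 = 0.2. With a SOM, the clusters would typically be the groups of documents mapped to each map unit or region.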

    Biomedical concept association and clustering using word embeddings

    Indiana University-Purdue University Indianapolis (IUPUI). Biomedical data exists in the form of journal articles, research studies, electronic health records, care guidelines, etc. While text mining and natural language processing tools have been widely employed across various domains, these are just taking off in the healthcare space. A primary hurdle that makes it difficult to build artificial intelligence models that use biomedical data is the limited amount of labelled data available. Since most models rely on supervised or semi-supervised methods, generating large amounts of pre-processed labelled data that can be used for training purposes becomes extremely costly. Even for datasets that are labelled, the lack of normalization of biomedical concepts further affects the quality of results produced and limits the application to a restricted dataset. This affects reproducibility of the results and techniques across datasets, making it difficult to deploy research solutions to improve healthcare services. The research presented in this thesis focuses on reducing the need to create labels for biomedical text mining by using unsupervised recurrent neural networks. The proposed method utilizes word embeddings to generate vector representations of biomedical concepts based on semantics and context. Experiments with unsupervised clustering of these biomedical concepts show that concepts that are similar to each other are clustered together. While this clustering captures different synonyms of the same concept, it also captures the similarities between various diseases and the symptoms associated with them. To test the performance of the concept vectors on corpora of documents, a document vector generation method that utilizes these concept vectors is also proposed. The document vectors thus generated are used as an input to clustering algorithms, and the results show that across multiple corpora, the proposed methods of concept and document vector generation outperform the baselines and provide more meaningful clustering. This document clustering has broad applications, especially in search and retrieval, providing clinicians, researchers and patients with more holistic and comprehensive results than relying only on the exact term that they search for. Finally, a framework is presented for extracting clinical information from preventive care guidelines that can be mapped to electronic health records. The extracted information can be integrated with the clinical decision support system of an electronic health record. A visualization tool to better understand and observe patient trajectories is also explored. Both of these methods have the potential to improve the preventive care services provided to patients.
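
    As a rough illustration of building document vectors from concept vectors, the sketch below averages the vectors of the concepts found in a document and compares documents by cosine similarity. Averaging is an assumption made here for illustration; the abstract does not say how the thesis combines concept vectors. The names `document_vector`, `cosine`, and `TOY_VECTORS`, and the two-dimensional toy embeddings, are all hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def document_vector(concepts, concept_vectors):
    """Average the vectors of the concepts in a document (assumed scheme).

    Concepts missing from `concept_vectors` are skipped.
    """
    known = [concept_vectors[c] for c in concepts if c in concept_vectors]
    if not known:
        raise ValueError("no known concepts in document")
    return mean_vector(known)


# Hypothetical 2-D concept embeddings, for illustration only.
TOY_VECTORS = {
    "influenza": [0.9, 0.1],
    "fever": [0.8, 0.2],
    "fracture": [0.1, 0.9],
}

doc_flu = document_vector(["influenza", "fever"], TOY_VECTORS)
doc_fever = document_vector(["fever", "unknown-term"], TOY_VECTORS)
doc_bone = document_vector(["fracture"], TOY_VECTORS)

sim_related = cosine(doc_flu, doc_fever)
sim_unrelated = cosine(doc_flu, doc_bone)
```

    With these toy embeddings, the two infection-related documents score a higher cosine similarity than the infection and fracture documents, which is the property a clustering algorithm would rely on.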

    Individuals tell a fascinating story: using unsupervised text mining methods to cluster policyholders based on their medical history

    Background and objective: Classifying people according to their health profile is crucial in order to propose appropriate treatment. However, the medical diagnosis is sometimes not available. This is the case, for example, in health insurance, which makes it difficult to propose custom prevention plans. In such cases, an unsupervised clustering method is needed. This article aims to compare three different methods by adapting some text mining methods to the field of health insurance. A new clustering stability measure is also proposed in order to compare the stability of the tested processes. Methods: Nonnegative Matrix Factorization, the word2vec method, and marginalized Stacked Denoising Autoencoders are used and compared in order to create a high-quality input for a clustering method. A self-organizing map is then used to obtain the final clustering. A real health insurance database is used to test the methods. Results: The marginalized Stacked Denoising Autoencoder outperforms the other methods in both stability and result quality on our data. Conclusions: The use of text mining methods offers several possibilities for understanding the context of any medical act. On a medical database, the process could reveal unexpected correlations between treatments and thus between pathologies. Moreover, this kind of method could exploit the refund dates contained in the data, but word2vec, the tested method that uses temporality, still needs to be improved since its results, although satisfactory, are not as good as those of the other methods.

    Infectious Disease Ontology

    Technological developments have resulted in tremendous increases in the volume and diversity of the data and information that must be processed in the course of biomedical and clinical research and practice. Researchers are at the same time under ever greater pressure to share data and to take steps to ensure that data resources are interoperable. The use of ontologies to annotate data has proven successful in supporting these goals and in providing new possibilities for the automated processing of data and information. In this chapter, we describe different types of vocabulary resources and emphasize those features of formal ontologies that make them most useful for computational applications. We describe current uses of ontologies and discuss future goals for ontology-based computing, focusing on its use in the field of infectious diseases. We review the largest and most widely used vocabulary resources relevant to the study of infectious diseases and conclude with a description of the Infectious Disease Ontology (IDO) suite of interoperable ontology modules that together cover the entire infectious disease domain

    An Analysis of the Abstracts Presented at the Annual Meetings of the Society for Neuroscience from 2001 to 2006

    Annual meeting abstracts published by scientific societies often contain rich arrays of information that can be computationally mined and distilled to elucidate the state and dynamics of the subject field. We extracted and processed abstract data from the Society for Neuroscience (SFN) annual meeting abstracts during the period 2001–2006 in order to gain an objective view of contemporary neuroscience. An important first step in the process was the application of data cleaning and disambiguation methods to construct a unified database, since the data were too noisy to be of full utility in the raw form initially available. Using natural language processing, text mining, and other data analysis techniques, we then examined the demographics and structure of the scientific collaboration network, the dynamics of the field over time, major research trends, and the structure of the sources of research funding. Some interesting findings include a high geographical concentration of neuroscience research in the northeastern United States, a surprisingly large transient population (66% of the authors appear in only one of the six studied years), the central role played by the study of neurodegenerative disorders in the neuroscience community, and an apparent growth of behavioral/systems neuroscience with a corresponding shrinkage of cellular/molecular neuroscience over the six-year period. The results from this work will prove useful for scientists, policy makers, and funding agencies seeking to gain a complete and unbiased picture of the community structure and body of knowledge encapsulated by a specific scientific domain.

    Information Systems and Health Care IX: Accessing Tacit Knowledge and Linking It to the Peer-Reviewed Literature

    Clinical decision-making can be improved if healthcare practitioners are able to leverage both the tacit and explicit modalities of healthcare knowledge, yet at present there do not exist knowledge management systems that support any active and direct mapping between these two knowledge modalities. In this paper, we present a healthcare knowledge-mapping framework that maps (a) the tacit knowledge captured in terms of email-based discussions between pediatric pain practitioners through a Pediatric Pain Mailing List (PPML), to (b) explicit knowledge represented in terms of peer-reviewed healthcare literature available at PubMed. We report our knowledge mapping strategy that involves methods to establish discussion threads, organize the discussion threads in terms of topic-specific taxonomy, formulate an optimal search query based on the content of a discussion thread, submit the search query to PubMed and finally to retrieve and present the search results to the user

    Novel neural approaches to data topology analysis and telemedicine

    The abstract is in the attachment. Electrical Engineering. Randazzo, Vincenz

    Data science, analytics and artificial intelligence in e-health : trends, applications and challenges

    Acknowledgments. This work has been partially supported by the Divina Pastora Seguros company. More than ever, healthcare systems can use data, predictive models, and intelligent algorithms to optimize their operations and the service they provide. This paper reviews the existing literature regarding the use of data science/analytics methods and artificial intelligence algorithms in healthcare. The paper also discusses how healthcare organizations can benefit from these tools to efficiently deal with a myriad of new possibilities and strategies. Examples of real applications are discussed to illustrate the potential of these methods. Finally, the paper highlights the main challenges regarding the use of these methods in healthcare, as well as some open research lines.