
    Machine Learning Techniques for Screening and Diagnosis of Diabetes: a Survey

    Diabetes has become one of the major causes of disease and death in most countries. By 2015, diabetes had affected more than 415 million people worldwide, and according to the International Diabetes Federation this figure is expected to rise to more than 642 million by 2040, so early screening and diagnosis of diabetes patients are of great significance for detecting and treating the disease on time. Diabetes is a multifactorial metabolic disease, and its diagnostic criteria cannot easily cover all aspects of etiology, degree of damage, and pathogenesis, so uncertainty and imprecision arise in many parts of the medical diagnosis process. With the development of data mining, researchers have found that machine learning plays an increasingly important role in diabetes research. Machine learning techniques can identify risk factors for diabetes and reasonable thresholds of physiological parameters, unearthing hidden knowledge from large amounts of diabetes-related data, which is of great significance for the diagnosis and treatment of diabetes. This paper therefore surveys machine learning techniques that have been applied to diabetes screening and diagnosis. Conventional machine learning techniques for early screening and diagnosis of diabetes are described, and deep learning techniques of biomedical significance are also covered.
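    As a concrete illustration of the screening task these techniques address, the sketch below trains a simple classifier on synthetic tabular data; the feature names (plasma glucose, BMI, age), the synthetic labels, and the choice of scikit-learn logistic regression are illustrative assumptions, not a method taken from the surveyed studies.

```python
# Hypothetical diabetes-screening sketch: the features, labels, and model
# choice are assumptions for illustration, not from the surveyed papers.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for a screening dataset: columns = [glucose, BMI, age].
X = rng.normal(loc=[120.0, 28.0, 45.0], scale=[30.0, 6.0, 12.0], size=(500, 3))
risk = 0.04 * (X[:, 0] - 120) + 0.10 * (X[:, 1] - 28) + 0.02 * (X[:, 2] - 45)
y = (risk + rng.normal(scale=1.0, size=500) > 0).astype(int)  # binary label

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# A simple baseline; the surveyed work also covers SVMs, tree ensembles,
# and deep neural networks.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```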

    Doctor of Philosophy

    Disease-specific ontologies, designed to structure and represent medical knowledge about disease etiology, diagnosis, treatment, and prognosis, are essential for many advanced applications such as predictive modeling, cohort identification, and clinical decision support. However, manually building disease-specific ontologies is very labor-intensive, especially during knowledge acquisition. On the other hand, medical knowledge has been documented in a variety of biomedical knowledge resources, such as textbooks, clinical guidelines, research articles, and clinical data repositories, which offers a great opportunity for automated knowledge acquisition. In this dissertation, we aim to facilitate the large-scale development of disease-specific ontologies through automated extraction of disease-specific vocabularies from existing biomedical knowledge resources. Three separate studies presented in this dissertation explored both manual and automated vocabulary extraction. The first study addresses the question of whether disease-specific reference vocabularies derived from manual concept acquisition can achieve near-saturated coverage (close to the greatest possible number of disease-pertinent concepts) using a small number of literature sources. Using a general-purpose manual acquisition approach we developed, this study concludes that a small number of expert-curated biomedical literature resources can be sufficient for acquiring near-saturated disease-specific vocabularies. The second and third studies introduce automated techniques for extracting disease-specific vocabularies from MEDLINE citations (titles and abstracts) and from a clinical data repository. In the second study, we developed and assessed a pipeline-based system that extracts disease-specific treatments from PubMed citations; the system achieved a mean precision of 0.8 for the top 100 extracted treatment concepts. In the third study, we applied classification models to reduce irrelevant disease-concept associations extracted from MEDLINE citations and electronic medical records. This study suggested combining measures of relevance from disparate sources to improve the identification of truly relevant concepts through classification, and also demonstrated that the studied classification model generalizes to new diseases. With these studies, we conclude that existing biomedical knowledge resources are valuable sources for extracting disease-concept associations, from which classification based on statistical measures of relevance can assist the semi-automated generation of disease-specific vocabularies.
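    The reported 0.8 mean precision for the top 100 extracted treatment concepts can be read as a precision-at-k style measurement; the sketch below shows that metric on placeholder data, where the ranked concept list and the relevance judgments are illustrative assumptions rather than results from the dissertation.

```python
# Hypothetical precision@k check: the ranked concepts and the relevance
# judgments are placeholders, not data from the dissertation.
def precision_at_k(ranked_concepts, relevant, k):
    """Fraction of the top-k ranked concepts that were judged relevant."""
    top_k = ranked_concepts[:k]
    return sum(1 for concept in top_k if concept in relevant) / k

ranked = ["metformin", "insulin", "statin", "aspirin", "placebo"]
judged_relevant = {"metformin", "insulin", "aspirin"}
print(precision_at_k(ranked, judged_relevant, k=5))  # 0.6 on this toy list
```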

    Causal Pattern Mining in Highly Heterogeneous and Temporal EHRs Data

    University of Minnesota Ph.D. dissertation. March 2017. Major: Computer Science. Advisor: Vipin Kumar. 1 computer file (PDF); ix, 112 pages.

    The World Health Organization (WHO) estimates that total healthcare spending in the U.S. was around 18% of its GDP for the year 2011. Even with such a high per-capita expenditure, the quality of healthcare in the U.S. lags behind that of other industrialized countries. This inefficient state of the U.S. healthcare system is attributed to the current fee-for-service (FFS) model. Under the FFS model, healthcare providers (doctors, hospitals) receive payments for every hospital visit or service rendered. The lack of coordination between service providers and patient outcomes leads to an increase in the costs associated with healthcare management, as healthcare providers often recommend expensive treatments. Several pieces of legislation have been approved in the recent past to improve overall U.S. healthcare management while simultaneously reducing the associated costs. The HITECH Act proposes to spend close to $30 billion on creating a nationwide repository of electronic health records (EHRs). Such a repository would consist of patient attributes such as demographics, laboratory test results, vital signs, and diagnosis codes. It is hoped that this EHR repository will be a platform to improve care coordination between service providers and patients' healthcare outcomes and to reduce health disparities, thereby improving the overall healthcare management system. The data collected and stored in EHRs (HITECH) and the need to improve care efficiency and outcomes (ACT) can together help improve the current state of the U.S. healthcare system. Data mining techniques in conjunction with EHRs can be used to develop novel clinical decision-making tools, to analyze the prevalence and incidence of diseases, and to evaluate the efficacy of existing clinical and surgical interventions. In this thesis we focus on two key aspects of EHR data: temporality and causation. This is all the more important because the temporal nature of EHR data has not been fully exploited, and increasing amounts of clinical evidence suggest that temporality matters for the development of clinical decision-making tools and techniques. Moreover, several research articles hint at the presence of antiquated clinical guidelines that are still in practice. In this dissertation, we first describe EHRs along with the following terminology: temporality, causation, and heterogeneity. Building on this, we describe methodologies for extracting non-causal patterns in the absence of longitudinal data, and then methods for extracting non-causal patterns in the presence of longitudinal data, in the context of Type 2 Diabetes Mellitus (T2DM). Furthermore, we describe techniques for extracting simple and complex causal patterns from longitudinal data in the context of sepsis and T2DM. Finally, we conclude this dissertation with a summary of our work and future directions.
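    One elementary building block for the longitudinal analyses described above is counting how often one coded event is followed by another within a patient's timeline; the sketch below shows that step on toy data. The diagnosis codes, dates, and 90-day window are illustrative assumptions, and the dissertation's actual non-causal and causal pattern-mining methods are considerably more involved.

```python
# Hypothetical temporal co-occurrence count: event codes, timestamps, and the
# 90-day window are illustrative assumptions, not the dissertation's method.
from datetime import date

# Per-patient longitudinal records: (date, diagnosis code).
patients = {
    "p1": [(date(2016, 1, 5), "E11"), (date(2016, 3, 1), "N18")],
    "p2": [(date(2016, 2, 2), "E11"), (date(2016, 8, 9), "I10")],
    "p3": [(date(2016, 5, 1), "I10"), (date(2016, 6, 1), "E11")],
}

def count_followed_by(records, first, second, window_days=90):
    """Count patients where `first` is followed by `second` within the window."""
    n = 0
    for events in records.values():
        events = sorted(events)
        firsts = [d for d, code in events if code == first]
        seconds = [d for d, code in events if code == second]
        if any(0 < (s - f).days <= window_days for f in firsts for s in seconds):
            n += 1
    return n

print(count_followed_by(patients, "E11", "N18"))  # 1 patient in this toy data
```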

    Early Identification and Intervention in Patients with Atrial Fibrillation Using an Implantable Cardiac Monitor to Significantly Improve Guideline-Based Anticoagulation Therapy in an Outpatient Cardiology Clinic

    The purpose of this research was to (a) examine the demographics of patients receiving care in an outpatient cardiology clinic, (b) describe the relationship between atrial fibrillation (AF) and other variables (e.g., BMI), (c) examine the frequency of AF diagnosis and the length of time to diagnosis in patients implanted with an implantable cardiac monitor (ICM), (d) observe provider patterns of treatment with oral anticoagulants (OACs), and (e) investigate documented considerations to either diagnose or rule out obstructive sleep apnea (OSA) in a group of outpatient AF patients in a cardiology clinic. Background: AF is largely undiagnosed but can cause major morbidity and mortality. AF is the most prevalent sustained arrhythmia encountered in the emergency department, is frequently detected in those without a prior diagnosis of AF, and is a major cause of stroke. All relevant guidelines recommend that patients at intermediate to high risk of stroke receive OACs; however, this therapy is prescribed to fewer than 55% of eligible patients. AF accounts for nearly 1 in 7 strokes and affects approximately 5.8 million people in the United States. Objectives: The purpose of this study is to identify and intervene in patients with atrial fibrillation using an implantable cardiac monitor to significantly improve guideline-based anticoagulation therapy. Methods: A retrospective database from an outpatient clinic in southern California was analyzed in this non-experimental study design; it comprised routinely collected data on patients with ICMs implanted between June 1, 2014 and December 31, 2018. The study was designed to establish the incidence of AF using an ICM device (i.e., Medtronic LINQ) in the outpatient setting. Conclusions: As evidenced by this study, most patients would not have been diagnosed with AF using the shorter-duration monitoring devices typically used as the first line of monitoring. Longer monitoring capabilities promise early identification of disease and reduction in morbidity for AF patients. Discussion: The incidence of AF using an ICM (i.e., Medtronic LINQ) device in an outpatient clinic was 23.4% (63 out of 269 patients), consistent with national studies reporting that longer monitoring of cardiac rhythms increases the diagnosis of arrhythmias. Only 12.7% of detected AF cases (11 out of 63 patients) occurred before 14 days, the maximum monitoring time available with the traditional devices in standard use; therefore, an alarming 87.3% of ICM-detected arrhythmias could have remained unidentified. This research points to a reduction in the potential for highly debilitating embolic stroke through detection, as well as a lessened risk of all-cause mortality with OACs for AF patients otherwise lacking stroke prophylaxis; early identification and treatment are possible using an ICM device.

    Relational data clustering algorithms with biomedical applications


    pHealth 2021. Proc. of the 18th Internat. Conf. on Wearable Micro and Nano Technologies for Personalised Health, 8-10 November 2021, Genoa, Italy

    Smart mobile systems – microsystems, smart textiles, smart implants, sensor-controlled medical devices – together with related body, local and wide-area networks up to cloud services, have become important enablers for telemedicine and the next generation of healthcare services. The multilateral benefits of pHealth technologies offer enormous potential for all stakeholder communities, not only in terms of improvements in medical quality and industrial competitiveness, but also for the management of healthcare costs and, last but not least, the improvement of patient experience. This book presents the proceedings of pHealth 2021, the 18th in a series of conferences on wearable micro and nano technologies for personalized health with personal health management systems, hosted by the University of Genoa, Italy, and held as an online event from 8 to 10 November 2021. The conference focused on digital health ecosystems in the transformation of healthcare towards personalized, participative, preventive, predictive precision medicine (5P medicine). The book contains 46 peer-reviewed papers (1 keynote, 5 invited papers, 33 full papers, and 7 poster papers). Subjects covered include the deployment of mobile technologies, micro-nano-bio smart systems, bio-data management and analytics, autonomous and intelligent systems, the Health Internet of Things (HIoT), as well as potential risks for security and privacy, and the motivation and empowerment of patients in care processes. Providing an overview of current advances in personalized health and health management, the book will be of interest to all those working in the field of healthcare today.

    Biomedical Literature Mining and Knowledge Discovery of Phenotyping Definitions

    Indiana University-Purdue University Indianapolis (IUPUI)

    Phenotyping definitions are essential for cohort identification when conducting clinical research, but they become an obstacle when they are not readily available. Developing new definitions manually requires expert involvement that is labor-intensive, time-consuming, and unscalable. Moreover, automated approaches rely mostly on electronic health record data that suffer from bias, confounding, and incompleteness. Limited efforts have been made to utilize text-mining and data-driven approaches to automate the extraction and literature-based knowledge discovery of phenotyping definitions and to support their scalability. In this dissertation, we proposed a text-mining pipeline combining rule-based and machine-learning methods to automate the retrieval, classification, and extraction of phenotyping-definition information from the literature. To achieve this, we first developed an annotation guideline with ten dimensions for annotating sentences with evidence of phenotyping-definition modalities, such as phenotypes and laboratory tests. Two annotators manually annotated a corpus of sentences (n=3,971) extracted from the methods sections of full-text observational studies (n=86). Percent agreement and Kappa statistics showed high inter-annotator agreement on sentence-level annotations. Second, we constructed two validated text classifiers using our annotated corpora: an abstract-level classifier and a full-text sentence-level classifier. We applied the abstract-level classifier to a large-scale biomedical literature corpus of over 20 million abstracts published between 1975 and 2018 to identify positive abstracts (n=459,406). After retrieving their full texts (n=120,868), we extracted sentences from their methods sections and used the full-text sentence-level classifier to extract positive sentences (n=2,745,416). Third, we performed literature-based discovery using the positively classified sentences. Lexicon-based methods were used to recognize medical concepts in these sentences (n=19,423). Co-occurrence and association methods were used to identify and rank phenotype candidates associated with a phenotype of interest. We derived 12,616,465 associations from our large-scale corpus. Our literature-based associations and large-scale corpus contribute to building new data-driven phenotyping definitions and to expanding existing definitions with minimal expert involvement.
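    The co-occurrence and association step can be pictured with a small sketch: given the concepts recognized in each sentence, rank candidates by how strongly they co-occur with a phenotype of interest. The sentence sets, concept names, and the use of pointwise mutual information as the score are illustrative assumptions, not the dissertation's exact measures.

```python
# Hypothetical association ranking: sentence-level concept sets and the PMI
# score are illustrative assumptions, not the dissertation's exact measures.
import math
from collections import Counter
from itertools import combinations

# Concepts recognized in each positively classified sentence.
sentences = [
    {"type 2 diabetes", "hba1c", "metformin"},
    {"type 2 diabetes", "hba1c"},
    {"type 2 diabetes", "fasting glucose"},
    {"hypertension", "systolic blood pressure"},
]

concept_counts = Counter(c for s in sentences for c in s)
pair_counts = Counter(frozenset(p) for s in sentences for p in combinations(sorted(s), 2))
n = len(sentences)

def pmi(a, b):
    """Pointwise mutual information between two concepts over sentences."""
    joint = pair_counts[frozenset((a, b))] / n
    if joint == 0:
        return float("-inf")
    return math.log2(joint / ((concept_counts[a] / n) * (concept_counts[b] / n)))

target = "type 2 diabetes"
candidates = [c for c in concept_counts if c != target]
for c in sorted(candidates, key=lambda c: pmi(target, c), reverse=True):
    print(f"{c}\t{pmi(target, c):.2f}")
```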

    Structured and unstructured data integration with electronic medical records

    In recent years there has been great population and technological growth all over the world. At the same time, areas beyond computing and information technology have also developed, namely medicine, which has led to an increase in average life expectancy and, in turn, to a greater need for healthcare. In order to provide the best possible treatments and healthcare services, hospitals nowadays store in their systems large amounts of data about patients and diseases (in the form of electronic medical records) as well as about the logistics of some departments. Computer science techniques such as data mining and natural language processing have therefore been used to extract knowledge and value from these information-rich sources, not only to develop, for example, new models for disease prediction, but also to improve existing processes in healthcare centres and hospitals. This data can be stored in one of three ways: structured, unstructured, or semi-structured. In this work, the author tested the integration of structured and unstructured data from two different departments of the same Portuguese hospital in order to extract knowledge and improve hospital processes, with the aim of reducing the loss of value from data that is loaded into healthcare providers' systems but never used.
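    A minimal sketch of one way structured records and free-text notes can be brought together is shown below, assuming a shared patient identifier; the column names, the regex-based flagging of a condition, and the pandas merge are illustrative assumptions, not the hospital's actual schema or integration process.

```python
# Hypothetical integration example: column names, the regex extraction, and
# the merge key are illustrative assumptions, not the hospital's actual schema.
import re
import pandas as pd

# Structured data from one department (e.g., an admissions table).
structured = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "age": [67, 54, 71],
    "department": ["cardiology", "nephrology", "cardiology"],
})

# Unstructured clinical notes from another department.
notes = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "note": [
        "Known history of atrial fibrillation, on anticoagulation.",
        "No arrhythmia documented. Chronic kidney disease stage 3.",
        "Atrial fibrillation detected on telemetry.",
    ],
})

# A crude NLP step: flag notes that mention atrial fibrillation.
notes["mentions_af"] = notes["note"].str.contains(r"atrial fibrillation", flags=re.I)

# Integrate both sources on the shared patient identifier.
merged = structured.merge(notes[["patient_id", "mentions_af"]], on="patient_id")
print(merged)
```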