17 research outputs found

    Data-Driven Modeling For Decision Support Systems And Treatment Management In Personalized Healthcare

    The massive volume of electronic medical records (EMRs) accumulating from patients and populations motivates clinicians and data scientists to collaborate on advanced analytics that yield the personalized insights needed by patients, clinicians, providers, scientists, and health policy makers. Learning from large and complex data is used extensively in marketing and commercial enterprises to generate personalized recommendations. Recently, the medical research community has begun to exploit big data analytics and move toward personalized (precision) medicine, which makes this a significant period of transition to a new paradigm in healthcare and medicine. There is a clear opportunity to implement a learning health care system and data-driven healthcare that support better medical decisions, better personalized predictions, and more precise discovery of risk factors and their interactions. In this research we focus on data-driven approaches for personalized medicine. We propose a research framework with three main phases: 1) predictive modeling, 2) patient subgroup analysis, and 3) treatment recommendation. Our goal is to develop novel methods for each phase and apply them in real-world applications. In the first phase, we develop a new predictive approach based on feature representation using deep feature learning and word embedding techniques. Our method uses different deep architectures (stacked autoencoders, deep belief networks, and variational autoencoders) to represent features at higher levels of abstraction, obtaining effective and more robust features from EMRs, and then builds prediction models on top of them. Our approach is particularly useful when unlabeled data is abundant but labeled data is scarce. We investigate the performance of representation learning through a supervised approach and evaluate our method on several small and large datasets. 
Finally, we provide a comparative study and show that our predictive approach yields better results than the alternatives. In the second phase, we propose a novel patient subgroup detection method called Supervised Biclustering (SUBIC), based on convex optimization, and apply it to detect patient subgroups and prioritize risk factors for hypertension (HTN) in a vulnerable demographic subgroup (African-Americans). Our approach not only finds patient subgroups under the guidance of a clinically relevant target variable but also identifies and prioritizes risk factors by pursuing sparsity of the input variables and encouraging similarity among the input variables and between the input and target variables. Finally, in the third phase, we introduce a new survival analysis framework using deep learning and active learning with a novel sampling strategy. Our approach first learns a lower-dimensional representation of clinical features from labeled (time-to-event) and unlabeled (censored) instances, and then actively trains the survival model by labeling the censored data using an oracle. As a clinical assistive tool, we propose a simple yet effective treatment recommendation approach based on our survival model. In the experimental study, we apply our approach to SEER-Medicare data on prostate cancer among African-American and white patients. The results indicate that our approach significantly outperforms baseline models.

    Machine Learning based Early Stage Identification of Liver Tumor using Ultrasound Images

    Liver cancer is one of the most malignant diseases, and its diagnosis requires considerable computational time. This time can be reduced by applying a machine learning algorithm to the diagnosis. Existing machine learning techniques classify images using only color-based methods, which are not efficient, so texture-based classification is proposed instead. The input image is resized and pre-processed with Gaussian filters. Features are extracted from the pre-processed image using the gray-level co-occurrence matrix (GLCM) and the local binary pattern (LBP). LBP is an efficient texture operator that labels the pixels of an image by thresholding the neighborhood of each pixel and treating the result as a binary number. The extracted features are classified by multi-support vector machine (Multi SVM) and K-Nearest Neighbor (K-NN) algorithms. The advantage of combining SVM with KNN is that SVM measures a large number of values whereas KNN accurately measures point values. The proposed techniques achieved higher precision, accuracy, sensitivity, and specificity than the existing method.
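
The LBP operator described in this abstract can be illustrated with a minimal sketch. It assumes the basic 3x3 variant with eight neighbours read clockwise from the top-left pixel; the abstract does not state which LBP variant, neighbour ordering, or radius the authors used, and the names `lbp_code`, `lbp_histogram`, and `patch` are hypothetical.

```python
def lbp_code(image, row, col):
    """Return the 8-bit LBP code of the interior pixel at (row, col)."""
    center = image[row][col]
    # Neighbours visited clockwise, starting at the top-left pixel
    # (an assumed ordering; other orderings give equally valid codes).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for dr, dc in offsets:
        # Threshold each neighbour against the centre pixel.
        bit = 1 if image[row + dr][col + dc] >= center else 0
        # Append the bit, so the first neighbour ends up as the most significant bit.
        code = (code << 1) | bit
    return code


def lbp_histogram(image):
    """Count LBP codes over all interior pixels (256 bins)."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


# Toy 3x3 grayscale patch (made-up values, not data from the paper).
patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
print(lbp_code(patch, 1, 1))  # prints 143 (binary 10001111)
```

A histogram of such codes over an image region is one common form of texture feature vector that could then be passed to a classifier such as SVM or K-NN.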

    Implementing decision tree-based algorithms in medical diagnostic decision support systems

    As a branch of healthcare, medical diagnosis can be defined as identifying a disease from the patient's signs and symptoms. To this end, the required information is gathered from different sources such as the physical examination, the medical history, and general information about the patient. The development of smart classification models for medical diagnosis is of great interest among researchers, mainly because machine learning and data mining algorithms can detect hidden trends among the features of a database. Hence, classifying medical datasets with smart techniques paves the way to more efficient medical diagnostic decision support systems. Several databases have been provided in the literature to investigate different aspects of diseases. As an alternative to the available diagnosis tools and methods, this research uses the machine learning algorithms Classification and Regression Tree (CART), Random Forest (RF), and Extremely Randomized Trees or Extra Trees (ET) to develop classification models that can be implemented in computer-aided diagnosis systems. As a decision tree (DT), CART is fast to create and applies to both quantitative and qualitative data. RF and ET combine a number of weak learners such as CART to build models for classification tasks. We used the Wisconsin Breast Cancer Database (WBCD), the Z-Alizadeh Sani dataset for coronary artery disease (CAD), and the data gathered in Ghaem Hospital’s dermatology clinic on the response of patients with common and/or plantar warts to cryotherapy and/or immunotherapy. To classify the breast cancer type based on the WBCD, the RF and ET methods were employed. The developed RF and ET models predicted the WBCD type with 100% accuracy in all cases. To choose the proper treatment approach for warts, as well as for the CAD diagnosis, the CART methodology was employed. 
The error analysis revealed that the proposed CART models attain the highest precision for the applications of interest, and no model in the literature rivals them. The outcome of this study supports the idea that methods like CART, RF, and ET not only improve diagnostic precision but also reduce the time and expense needed to reach a diagnosis. However, since these strategies are highly sensitive to the quality and quantity of the input data, more extensive databases with a greater number of independent parameters might be required before the developed models can be applied in practice.
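
To make the tree-building step behind CART concrete, here is a minimal sketch of how a single split on one numeric feature can be chosen by minimising weighted Gini impurity, the criterion CART commonly uses for classification. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy feature values, and the labels are hypothetical, and candidate thresholds are taken from observed values rather than midpoints for brevity.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`.

    Returns (None, impurity of the unsplit node) if no split improves on it.
    """
    best = (None, gini(labels))
    # The largest value is skipped because it would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical feature values and diagnoses, not taken from the WBCD.
feature = [1, 2, 3, 7, 8, 9]
diagnosis = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]
print(best_threshold(feature, diagnosis))  # prints (3, 0.0)
```

A full tree repeats this search over every feature and recurses on each side of the chosen split; RF and ET then aggregate many such trees built with added randomness.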


    Development of Artificial Intelligence systems as a prediction tool in ovarian cancer

    PhD Thesis. Ovarian cancer is the 5th most common cancer in females, and the UK has one of the highest incidence rates in Europe. In the UK only 36% of patients will live for at least 5 years after diagnosis. The number of prognostic markers, treatments and treatment sequences in ovarian cancer is rising, so clinical decision making is becoming more difficult for the human brain. There is a need for an expert computer system (e.g. Artificial Intelligence (AI)) that is capable of investigating the possible outcomes for each marker, treatment and sequence of treatment. Such expert systems may provide a tool to help clinicians analyse and predict outcomes for different treatment pathways. Whilst predicting a patient's overall survival is difficult, it may bring benefits: it is not only useful information for the patient but may also determine the treatment modality. In this project a dataset was constructed of 352 patients who had been treated at a single centre. Clinical data were extracted from the health records. Expert systems were then investigated to determine the optimum model to predict overall survival of a patient. The five-year survival period (a standard survival outcome measure in cancer research) was investigated; in addition, the system was developed with the flexibility to predict patient survival rates for many other categories. Comparisons with currently used prognostic models in ovarian cancer demonstrated a significant improvement in performance for the AI model (Area under the Curve (AUC) of 0.72 for AI and AUC of 0.62 for the statistical model). Using various methods, the most important variables in this prediction were identified as FIGO stage, outcome of the surgery and CA125. This research also investigated the effect of increasing the number of cases in prediction models; results indicated that prediction performance improved as the number of cases increased. 
Categorization of continuous data did not improve the prediction performance. The project next investigated the possibility of predicting surgical outcomes in ovarian cancer using AI, based on the variables that are available for clinicians prior to the surgery. Such a tool could have direct clinical relevance. Diverse models that can predict the outcome of the surgery were investigated and developed. The developed AI models were also compared against the standard statistical prediction model, which demonstrated that the AI model outperformed the statistical prediction model: the prediction of all outcomes (complete or optimal or suboptimal) (AUC of AI: 0.71 and AUC of statistical model: 0.51), the prediction of complete or optimal cytoreduction versus suboptimal cytoreduction (AUC of AI: 0.73 and AUC of statistical model: 0.50) and finally the prediction of complete cytoreduction versus optimal or suboptimal cytoreduction (AUC of AI: 0.79 and AUC of statistical model: 0.47). The most important variables for this prediction were identified as: FIGO stage, tumour grade and histology. The application of transcriptomic analysis to cancer research raises the question of which genes are significantly involved in a particular cancer and which genes can accurately predict survival outcomes in a given cancer. Therefore, AI techniques were employed to identify the most important genes for the prediction of Homologous Recombination (HR), an important DNA repair pathway in ovarian cancer, identifying LIG1 and POLD3 as novel prognostic biomarkers. Finally, AI models were used to predict the HR status for any given patient (AUC: 0.87). This project has demonstrated that AI may have an important role in ovarian cancer. AI systems may provide tools to help clinicians and research in ovarian cancer and may allow more informed decisions resulting in better management of this cancer
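
The AUC values quoted in this abstract are easier to interpret with a small worked example. The sketch below assumes AUC means the area under the ROC curve (the abstract only says "Area under the Curve") and computes it as the fraction of positive/negative pairs that the model ranks correctly, counting ties as one half. The function name `auc` and the scores are hypothetical and are not results from the thesis.

```python
def auc(scores, labels):
    """Probability that a random positive case outscores a random negative one.

    Ties count as one half. This pairwise form equals the area under the
    ROC curve, but costs O(n_pos * n_neg), so it suits small examples only.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up predicted risks; label 1 marks the event of interest.
predicted = [0.9, 0.8, 0.4, 0.3]
observed = [1, 0, 1, 0]
print(auc(predicted, observed))  # prints 0.75
```

Under this reading, an AUC near 0.5 (as reported for the statistical model on surgical outcome) means the ranking is close to chance, while 0.79 means roughly four in five positive/negative pairs are ordered correctly.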

    Measuring Confidence in Classification Decisions for Clinical Decision Support Systems: A Gaussian Bayes Optimization Approach

    This thesis investigated various aspects of designing and developing Clinical Decision Support Systems (CDSSs), and in particular exploited machine learning techniques to support medical diagnosis decisions. A review of the fundamental functional components of existing modern CDSSs showed that most such systems lack a trusted decision evaluation module that provides reliable information about decision strength. Therefore, a refined CDSS framework was first proposed, centred on the concept of confidence-based classification, which couples each eventual decision outcome with a level of decision reliability. Based on measure theory, a unified Decision Score measure of decision reliability was introduced, which combines the decision outcome, expressed as a positive or negative sign, with the decision strength, expressed as a percentage. Furthermore, the behaviour of the proposed decision score measure was investigated in more complex and diverse feature spaces of high dimensionality, where the challenges of the “curse of dimensionality” are encountered. This challenge was handled by revisiting the problem under orthogonal projections of the feature space, and a new measure, known as the Decision Sensitivity measure, was developed to evaluate the decision score measure quantitatively. The key factors influencing the sensitivity of decisions were found to include not only the dimensionality of the selected features but also the standard deviation of each feature in the transformed orthogonal space. Having established the basic concept of the decision score measure, the thesis then extended its use to a multiple-classifier setting. It first reviewed the principles and rationales behind various well-established information fusion schemes and tested their strengths and limitations in adapting the proposed decision score measure. 
Moreover, a correlation-based decision fusion scheme was proposed to maximise the potential of the decision score measure in complex scenarios. Evaluation results across different datasets showed that fusion schemes improve the robustness of the decision models while generally maintaining a good level of diagnostic accuracy. Because clinical decision making normally faces new, unseen cases and unpredictable challenges, a CDSS must remain adaptive to stay robust after deployment. Therefore, the last part of the research focused on investigating ways to refine the CDSS decision score model in a time-efficient and spontaneous manner. In particular, the thesis reviewed several commonly used metrics and methods for monitoring and refining prediction models, and adapted these methods to the proposed decision score measure.


    In this era of precision medicine, clinicians and researchers critically need computational models that can accurately predict various clinical events and outcomes (e.g., disease diagnosis, disease staging, or molecular subtyping). Typically, statistics and machine learning are applied to ‘omic’ datasets, yielding computational models that can be used for prediction. In cancer research there is still a critical need for computational models that have high classification performance but are also parsimonious in the number of variables they use. Some models perform their intended classification task very well but use so many variables that they are too complex for human researchers and clinicians to understand. In contrast, other models are specifically built with a small number of variables but may lack excellent predictive performance. This dissertation proposes a novel framework, called Junction to Knowledge (J2K), for the construction of parsimonious computational models. The J2K framework consists of four steps: filtering (discretization and variable selection), Bayesian network generation, junction tree generation, and clique evaluation. Applying J2K to a dataset yields a parsimonious Bayesian network model that has high predictive performance yet is composed of a small number of variables. J2K not only finds parsimonious gene cliques but also makes it possible to create multi-omic models that can further improve classification performance. These multi-omic models have the potential to accelerate biomedical discovery, followed by translation of their results into clinical practice.

    Pattern Recognition

    Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, applications, and others. The signals processed are commonly one-, two-, or three-dimensional; processing may run in real time or take hours or days; some systems look for one narrow object class, while others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms, and encompasses several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented works. The authors of these 25 works present and advocate recent achievements of their research related to the field of pattern recognition.

    Segmentação Automática de Lesões de Esclerose Múltipla em Imagens de Ressonância Magnética (Automatic Segmentation of Multiple Sclerosis Lesions in Magnetic Resonance Images)

    Multiple sclerosis (MS) is the most commonly diagnosed neurological disorder in young adults, with unexplained causes and major repercussions for patients' lives, which drives researchers to search actively for answers. The disease cannot currently be cured or prevented; available treatments only reduce its severity and delay its progression. It is therefore increasingly necessary to use imaging together with image processing and analysis techniques to help doctors make an early diagnosis and start appropriate treatment, giving the patient a better quality of life. Several approaches based on automatic segmentation of MS lesions have been extensively investigated in recent years for this purpose. This project aimed, first, to identify the steps needed to implement and optimize an image processing and analysis methodology for automatic segmentation of MS lesions and, second, to explore pre-processing, segmentation, and classification techniques for objective and quantitative characterization of the lesions. The work also covers fundamental concepts of multiple sclerosis and magnetic resonance imaging (MRI), along with a literature review of some existing methodologies. The methodology developed in this dissertation is based on several pre-processing algorithms for smoothing and noise removal, removal of non-brain tissue, contrast correction, and intensity normalization of the images. Neural networks, a promising and current approach to the problem, were applied for lesion segmentation, and for classification several lesion characteristics were extracted and analyzed based on shape and size. The new methodology is intended to be flexible and to allow testing and analysis of the results. The results show that the pre-processing techniques are essential to the subsequent steps because they improve image quality. Lesion segmentation with neural networks proved appropriate, as shown by the metrics analyzed: a structural similarity index very close to 1, a mean absolute error rate of 3.8%, and a Dice coefficient of 0.58. Finally, the practical applications carried out demonstrated the usefulness and suitability of image processing and analysis techniques for the study and detection of multiple sclerosis lesions in MR images.
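
The Dice coefficient reported in this abstract measures the overlap between a predicted lesion mask and a reference mask. The following is a minimal sketch of the standard definition, 2|A ∩ B| / (|A| + |B|), on binary masks stored as nested lists. The function name and the toy masks are hypothetical, and returning 1.0 when both masks are empty is an assumed convention rather than something the dissertation specifies.

```python
def dice_coefficient(predicted, reference):
    """Dice overlap between two binary masks of identical shape."""
    pred = [v for row in predicted for v in row]
    ref = [v for row in reference for v in row]
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as lesion in both masks.
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# Toy 2x3 masks (1 = lesion pixel); not data from the dissertation.
predicted_mask = [[0, 1, 1],
                  [0, 1, 0]]
reference_mask = [[0, 1, 0],
                  [0, 1, 1]]
print(round(dice_coefficient(predicted_mask, reference_mask), 3))  # prints 0.667
```

Here two of the three lesion pixels in each mask coincide, giving 2·2 / (3 + 3) ≈ 0.667; a value of 0.58 therefore indicates somewhat less overlap than in this toy case.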

    Recuperação de informação multimodal em repositórios de imagem médica (Multimodal information retrieval in medical imaging repositories)

    The proliferation of digital medical imaging modalities in hospitals and other diagnostic facilities has created huge repositories of valuable data that are often not fully explored. Moreover, the past few years show a growing trend in data production. Studying new ways to index, process, and retrieve medical images is therefore an important subject for the wider community of radiologists, scientists, and engineers. Content-based image retrieval, which encompasses various methods, can exploit the visual information of a medical imaging archive and is known to be beneficial to practitioners and researchers. However, the integration of the latest medical image retrieval systems into clinical workflows is still rare, and their effectiveness still shows room for improvement. This thesis proposes solutions and methods for multimodal information retrieval in the context of medical imaging repositories. The major contributions are a search engine for medical imaging studies supporting multimodal queries in an extensible archive; a framework for automated labeling of medical images for content discovery; and an assessment and proposal of feature learning techniques for concept detection from medical images, which exhibit greater potential than the feature extraction algorithms that were formerly pertinent in similar tasks. These contributions, each in its own dimension, seek to narrow the scientific and technical gap towards the development and adoption of novel multimodal medical image retrieval systems, so that they ultimately become part of the workflows of medical practitioners, teachers, and researchers in healthcare. Programa Doutoral em Informátic