
    Histopathological image analysis: a review

    Over the past decade, dramatic increases in computational power and improvements in image analysis algorithms have allowed the development of powerful computer-assisted analytical approaches to radiological data. With the recent advent of whole slide digital scanners, tissue histopathology slides can now be digitized and stored as digital images. Consequently, digitized tissue histopathology has become amenable to computerized image analysis and machine learning techniques. Just as computer-assisted diagnosis (CAD) algorithms in medical imaging complement the opinion of a radiologist, CAD algorithms are now being developed for disease detection, diagnosis, and prognosis prediction to complement the opinion of the pathologist. In this paper, we review the recent state-of-the-art CAD technology for digitized histopathology. This paper also briefly describes the development and application of novel image analysis technology for a few specific histopathology-related problems being pursued in the United States and Europe.

    Multilevel cluster ensembling for histopathological image segmentation

    Ankara: The Department of Computer Engineering and the Graduate School of Engineering and Science of Bilkent University, 2011. Thesis (Master's), Bilkent University, 2011. Includes bibliographical references (leaves 58-67).

    In cancer diagnosis and grading, histopathological examination of tissues by pathologists is accepted as the gold standard. However, this procedure is subject to observer variability, which makes diagnosis subjective. To overcome such problems, computational methods that use quantitative measures have been proposed. These methods extract mathematical features from tissue images, assuming the images are composed of homogeneous regions, and then classify the images. This assumption does not always hold, so the images must be segmented before classification. Existing segmentation methods are mostly designed for generic images and work at the pixel level. Recently, a few algorithms have incorporated medical background knowledge into segmentation, and their high-level feature definitions are very promising. However, their segmentation step relies on region-growing approaches, which are not very stable and may get stuck in local optima. In this thesis, we present an efficient and stable method for segmenting histopathological images that produces high-quality results. We use existing high-level feature definitions to segment tissue images, and our method significantly improves segmentation accuracy and stability compared to existing methods that use the same feature definitions. We treat image segmentation as a clustering problem. To improve the quality and stability of the clustering results, we combine different clustering solutions, an approach known as cluster ensembles. We formulate the clustering problem as a graph partitioning problem, and to obtain diverse, high-quality clustering results quickly, we modified and improved the well-known multilevel graph partitioning scheme. Our method clusters medically meaningful components in tissue images into regions and obtains the final segmentation. Experiments showed that our multilevel cluster ensembling approach performed significantly better than existing segmentation algorithms for generic and tissue images. Although most of the images used in the experiments contain noise and artifacts, the proposed algorithm produced high-quality results.

    Şimşek, Ahmet Çağrı. M.S.
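The abstract describes combining several clustering solutions into one (a cluster ensemble) but does not give the combination rule. The sketch below illustrates the general cluster-ensemble idea with a co-association matrix (evidence accumulation), which is a simpler, different consensus technique than the thesis's multilevel graph partitioning. The function names, the 0.5 threshold, and the toy labelings are hypothetical.

```python
from itertools import combinations


def co_association(labelings: list[list[int]]) -> list[list[float]]:
    """Fraction of base clusterings that place items i and j in the same cluster."""
    n = len(labelings[0])
    m = len(labelings)
    co = [[0.0] * n for _ in range(n)]
    for i in range(n):
        co[i][i] = 1.0
    for i, j in combinations(range(n), 2):
        same = sum(1 for labels in labelings if labels[i] == labels[j])
        co[i][j] = co[j][i] = same / m
    return co


def consensus_clusters(labelings: list[list[int]], threshold: float = 0.5) -> list[int]:
    """Connect items whose co-association exceeds `threshold` and label the
    connected components of that graph as the consensus clusters."""
    co = co_association(labelings)
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first search over the thresholded co-association graph.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical base clusterings of six items; label IDs need not agree across runs.
base = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
print(consensus_clusters(base))  # [0, 0, 0, 1, 1, 1]
```

Because only label equality is compared, the ensemble is insensitive to how each base run numbers its clusters, which is the core difficulty any consensus method has to handle.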

    Ensemble based Clustering of Plasmodium falciparum genes

    Ensemble learning has recently been extended to clustering, the unsupervised data mining technique used to find natural groupings that exist in a dataset. Here, we applied an ensemble-based clustering algorithm, Random Forests with Partition Around Medoids (PAM), to multiple time-series gene expression datasets of Plasmodium falciparum. The Random Forest algorithm is the most common ensemble learning approach that uses decision trees: a Random Forest consists of a large number of classification trees (ranging from hundreds to thousands) built from bootstrap samples of the dataset. To select the optimal number of final clusters, we also applied the following internal cluster validity measures: the Silhouette Width index, the Connectivity index, and the Dunn index. Our results show that ensemble-based clustering is indeed a good alternative for cluster analysis, with the potential for improved performance over traditional clustering algorithms.
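A common way to combine a Random Forest with PAM is to turn the forest's proximity matrix into a dissimilarity (for example, 1 - proximity) and cluster that matrix with k-medoids; the abstract does not spell out this step, so treat it as an assumption. The sketch below implements a simplified PAM (greedy BUILD, then SWAP until no improvement) and the average Silhouette Width on any precomputed dissimilarity matrix. The toy 1-D data stand in for real forest dissimilarities.

```python
def total_cost(d: list[list[float]], medoids: list[int]) -> float:
    """Sum of each item's dissimilarity to its nearest medoid."""
    return sum(min(d[i][m] for m in medoids) for i in range(len(d)))


def pam(d: list[list[float]], k: int) -> tuple[list[int], list[int]]:
    """Simplified PAM (k-medoids) on a precomputed dissimilarity matrix."""
    n = len(d)
    medoids: list[int] = []
    # BUILD: greedily add the item that most reduces the total cost.
    for _ in range(k):
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(d, medoids + [c])))
    # SWAP: replace a medoid with a non-medoid while that lowers the cost.
    cost = total_cost(d, medoids)
    improved = True
    while improved:
        improved = False
        for idx in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:idx] + [c] + medoids[idx + 1:]
                trial_cost = total_cost(d, trial)
                if trial_cost < cost:
                    medoids, cost, improved = trial, trial_cost, True
    labels = [min(range(k), key=lambda j: d[i][medoids[j]]) for i in range(n)]
    return medoids, labels


def silhouette_width(d: list[list[float]], labels: list[int]) -> float:
    """Average silhouette width; assumes at least two clusters."""
    n = len(d)
    scores = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(d[i][j] for j in own) / len(own)
        b = min(
            sum(d[i][j] for j in range(n) if labels[j] == other) / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


# Hypothetical 1-D data; with a Random Forest, d would be 1 - proximity instead.
points = [0, 1, 2, 10, 11, 12]
d = [[abs(p - q) for q in points] for p in points]
medoids, labels = pam(d, 2)
print(medoids, labels)                       # [1, 4] [0, 0, 0, 1, 1, 1]
print(round(silhouette_width(d, labels), 3))  # 0.866
```

To choose the number of clusters as the abstract describes, one would run `pam` for several values of `k` and compare validity indices such as the silhouette width; the Connectivity and Dunn indices are not sketched here.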

    A decision support system to follow up and diagnose primary headache patients using semantically enriched data

    Background: Headache disorders are an important health burden with a large health-economic impact worldwide. Current treatment and follow-up processes are often archaic, creating opportunities for computer-aided and decision support systems to increase their efficiency. Existing systems are mostly purely data-driven, and their underlying models are black boxes, which undermines interpretability and transparency, two key factors for deployment in a clinical setting.

    Methods: In this paper, we propose a decision support system composed of three components: (i) a cross-platform mobile application that captures the data required from patients to formulate a diagnosis, (ii) an automated diagnosis support module that generates an interpretable decision tree, based on data semantically annotated with expert knowledge, to support physicians in formulating the correct diagnosis, and (iii) a web application through which the physician can efficiently interpret the captured data and learned insights by means of visualizations.

    Results: We show that decision tree induction techniques achieve accuracy competitive with other black- and white-box techniques on a publicly available dataset, migbase, which contains aggregated information on headache attacks from 849 patients. Each sample is labeled with one of three possible primary headache disorders. By balancing the dataset using prior expert knowledge, we reduce the classification error by more than 10%, a statistically significant reduction (p ≤ 0.05). Furthermore, we achieve high accuracy using features extracted with the Weisfeiler-Lehman kernel, which is completely unsupervised. This makes it an ideal approach to a potential cold-start problem.

    Conclusion: Decision trees are the perfect candidate for the automated diagnosis support module. They achieve predictive performance competitive with other techniques on the migbase dataset and are, above all, completely interpretable. Moreover, incorporating prior knowledge increases both the predictive performance and the transparency of the resulting predictive model on the studied dataset.
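The Weisfeiler-Lehman (WL) features mentioned in the abstract come from repeatedly relabeling each graph node with its own label plus the sorted multiset of its neighbours' labels, then counting labels; the WL subtree kernel is the dot product of those counts. The abstract does not say how the semantically annotated migbase data are turned into graphs, so the toy graph and node labels below are hypothetical, and the full relabeling string is used where real implementations compress it to a short ID.

```python
from collections import Counter


def wl_features(adjacency: dict[str, list[str]], labels: dict[str, str],
                iterations: int = 2) -> Counter:
    """Weisfeiler-Lehman subtree features: counts of every node label seen
    at iterations 0..iterations. Every node must appear as a key in `adjacency`."""
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        # New label = own label + sorted multiset of neighbour labels. Using the
        # full string instead of a compressed integer ID keeps the sketch short.
        current = {
            node: current[node] + "|" + ",".join(sorted(current[nb] for nb in neighbours))
            for node, neighbours in adjacency.items()
        }
        features.update(current.values())
    return features


def wl_kernel(g1, g2, iterations: int = 2) -> int:
    """WL subtree kernel: dot product of the two graphs' label-count vectors."""
    f1 = wl_features(*g1, iterations)
    f2 = wl_features(*g2, iterations)
    return sum(count * f2[label] for label, count in f1.items())


# Hypothetical path graph a - b - c with node labels x, y, x.
g = ({"a": ["b"], "b": ["a", "c"], "c": ["b"]}, {"a": "x", "b": "y", "c": "x"})
print(wl_features(*g, 1))
print(wl_kernel(g, g, 1))  # 10
```

No class labels are used anywhere in this computation, which is why WL features can be built before any diagnoses are available, matching the abstract's cold-start argument.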

    Gait rehabilitation monitor

    This work presents a simple, wearable, non-intrusive and affordable mobile framework that allows doctors and physiotherapists to remotely monitor and assist patients during gait rehabilitation. The system includes two Bluetooth-compatible Shimmer3 9DoF Inertial Measurement Units (IMUs) from Shimmer and an Android smartphone for data collection, preliminary processing, and persistence in a local database. Low-computational-load algorithms based on Euler angles and on accelerometer, gyroscope and magnetometer signals were developed and used to classify and identify several gait disturbances. These algorithms include the alignment (synchronization) of the IMU sensor data using a common temporal reference, as well as heel-strike and stride detection algorithms that help segment the signals collected remotely by the system's app, identify gait strides, and extract relevant features used to train and test a classifier that predicts gait abnormalities during gait sessions. A set of drivers from the manufacturer, Shimmer, connects the app to the IMUs over Bluetooth. The app allows users to collect data and train a classification model that distinguishes normal from abnormal gait types. The system provides a REST API on a backend server, together with Java and Python libraries and a PostgreSQL database. The classifier is a supervised model based on the Extremely Randomized Trees method, trained on time-, frequency- and time-frequency-domain features extracted from the collected and processed signals. To test the framework, a set of abnormal and normal gait patterns was used to train a model and test the classifier.
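The abstract mentions heel-strike and stride detection but not the detection rule. A minimal sketch, assuming heel strikes show up as peaks in a single IMU channel (for example shank angular velocity): keep local maxima above a threshold, merge peaks closer than a minimum spacing, and measure strides as the time between consecutive heel strikes of the same foot. The threshold, spacing, 100 Hz sampling rate and synthetic signal are all hypothetical.

```python
def detect_peaks(signal: list[float], threshold: float, min_distance: int) -> list[int]:
    """Indices of local maxima above `threshold`, at least `min_distance` samples apart."""
    peaks: list[int] = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
        if not (is_local_max and signal[i] > threshold):
            continue
        if peaks and i - peaks[-1] < min_distance:
            # Two candidates too close together: keep the taller one.
            if signal[i] > signal[peaks[-1]]:
                peaks[-1] = i
        else:
            peaks.append(i)
    return peaks


def stride_times(heel_strikes: list[int], sampling_rate_hz: float) -> list[float]:
    """Stride durations in seconds between consecutive heel strikes of one foot."""
    return [(b - a) / sampling_rate_hz for a, b in zip(heel_strikes, heel_strikes[1:])]


# Hypothetical 3 s recording at 100 Hz: three heel-strike spikes plus one small artefact.
signal = [0.0] * 300
for idx, value in [(50, 5.0), (150, 5.0), (152, 3.0), (250, 5.0)]:
    signal[idx] = value
strikes = detect_peaks(signal, threshold=1.0, min_distance=20)
print(strikes)                       # [50, 150, 250]
print(stride_times(strikes, 100.0))  # [1.0, 1.0]
```

The stride segments delimited this way are what per-stride time- and frequency-domain features would be computed from before training a classifier such as Extremely Randomized Trees.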

    Machine Learning Models for High-dimensional Biomedical Data

    Abstract: Recent technological advances enable the collection of various complex, heterogeneous and high-dimensional data in biomedical domains. The increasing availability of high-dimensional biomedical data creates a need for new machine learning models for effective data analysis and knowledge discovery. This dissertation introduces several unsupervised and supervised methods that help understand the data, discover patterns and improve decision making. All of the proposed methods can be generalized to other industrial fields. The first topic of this dissertation is data clustering, which is often the first step in analyzing a dataset without label information. Clustering high-dimensional data with mixed categorical and numeric attributes remains a challenging yet important task. A clustering algorithm based on tree ensembles, CRAFTER, is proposed to tackle this task in a scalable manner. The second part of this dissertation develops data representation methods for genome sequencing data, a special type of high-dimensional biomedical data. The proposed representation method, Bag-of-Segments, summarizes the key characteristics of a genome sequence into a small number of interpretable features. The third part introduces an end-to-end deep neural network model, GCRNN, for time series classification that emphasizes both accuracy and interpretability. GCRNN contains a convolutional network component to extract high-level features and a recurrent network component to better model temporal characteristics. A feed-forward, fully connected network with sparse group lasso regularization generates the final classification and provides good interpretability. The last topic centers on dimensionality reduction methods for time series data. A good dimensionality reduction method is important for the storage, decision making and pattern visualization of time series data. The CRNN autoencoder is proposed to achieve low reconstruction error while also generating discriminative features. A variational version of this autoencoder has great potential for applications such as anomaly detection and process control.

    Dissertation/Thesis. Doctoral Dissertation, Industrial Engineering 201
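The interpretability of GCRNN's final layer is attributed to sparse group lasso regularization. One common parameterization of that penalty is λ[(1 - α) Σ_g √p_g ‖w_g‖₂ + α‖w‖₁], where p_g is the size of group g: the group term can switch off whole groups of weights, and the ℓ1 term adds sparsity within groups. How GCRNN groups its weights (for instance, all weights leaving one input feature) is not stated in the abstract, so the grouping below is an assumption.

```python
import math


def sparse_group_lasso(weights: list[float], groups: list[list[int]],
                       lam: float, alpha: float) -> float:
    """Sparse group lasso penalty:
    lam * ((1 - alpha) * sum_g sqrt(p_g) * ||w_g||_2 + alpha * ||w||_1),
    where `groups` lists the weight indices of each group and p_g is its size."""
    l1 = sum(abs(w) for w in weights)
    group_term = 0.0
    for g in groups:
        group_norm = math.sqrt(sum(weights[i] ** 2 for i in g))
        group_term += math.sqrt(len(g)) * group_norm
    return lam * ((1 - alpha) * group_term + alpha * l1)


# Hypothetical layer with two input features, each owning a group of two weights.
w = [3.0, 4.0, 0.0, 0.0]
groups = [[0, 1], [2, 3]]
print(round(sparse_group_lasso(w, groups, lam=1.0, alpha=0.5), 4))  # 7.0355
```

The all-zero second group contributes nothing to either term, which is how the penalty rewards dropping an entire input feature and makes the surviving features easy to read off.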

    EGFR and KRAS mutation prediction on lung cancer through medical image processing and artificial intelligence

    Lung cancer causes more deaths globally than any other type of cancer. Detecting EGFR and KRAS mutations is of interest for determining the best treatment; however, non-invasive ways to obtain this information are not available. In this study, an ensemble approach is applied to improve EGFR and KRAS mutation prediction from CT images using a small dataset. A new voting scheme, Selective Class Average Voting (SCAV), is proposed, and its performance is assessed for both machine learning models and Convolutional Neural Networks (CNNs). For the EGFR mutation, the machine learning approach showed an increase in sensitivity from 0.66 to 0.75 and in AUC from 0.68 to 0.70. With the deep learning approach, custom CNNs reached an AUC of 0.846, and SCAV increased the accuracy of the model from 0.80 to 0.857. Finally, combining the best custom and pre-trained CNNs using SCAV yielded an AUC of 0.914. For the KRAS mutation, a significant increase in performance was found for both the machine learning models (AUC from 0.65 to 0.71) and the deep learning models (AUC from 0.739 to 0.778). The increase was even greater with ensembles of pre-trained CNNs (AUC 0.809). These results show how to learn effectively from small image datasets to predict EGFR and KRAS mutations, and that using ensembles with SCAV improves the performance of machine learning classifiers and CNNs.

    Doctoral thesis. Doctor of Systems and Computing Engineering.
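The abstract names Selective Class Average Voting (SCAV) but does not define its selection rule, so the sketch below does not implement SCAV. As a point of reference only, it contrasts two standard ensemble rules SCAV can be compared against: hard (majority) voting over each model's predicted class, and soft voting, which averages the models' class probabilities. The probability values are hypothetical.

```python
def soft_vote(probabilities: list[list[list[float]]]) -> list[int]:
    """Soft voting: probabilities[m][i][c] is model m's probability that sample i
    belongs to class c. Returns the class with the highest mean probability."""
    n_models = len(probabilities)
    n_samples = len(probabilities[0])
    n_classes = len(probabilities[0][0])
    predictions = []
    for i in range(n_samples):
        mean = [sum(probabilities[m][i][c] for m in range(n_models)) / n_models
                for c in range(n_classes)]
        predictions.append(max(range(n_classes), key=lambda c: mean[c]))
    return predictions


def hard_vote(probabilities: list[list[list[float]]]) -> list[int]:
    """Majority vote over each model's own argmax (ties go to the lowest class index)."""
    n_classes = len(probabilities[0][0])
    predictions = []
    for i in range(len(probabilities[0])):
        votes = [0] * n_classes
        for model in probabilities:
            row = model[i]
            votes[max(range(n_classes), key=lambda c: row[c])] += 1
        predictions.append(max(range(n_classes), key=lambda c: votes[c]))
    return predictions


# Hypothetical: three models, one CT case, classes (0 = wild type, 1 = mutant).
probs = [[[0.45, 0.55]], [[0.45, 0.55]], [[0.95, 0.05]]]
print(soft_vote(probs), hard_vote(probs))  # [0] [1]
```

The two rules disagree here because one confident model outweighs two uncertain ones under averaging; how a voting scheme treats such confidence differences is exactly where variants like SCAV can differ from plain averaging.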

    Explainable clinical decision support system: opening black-box meta-learner algorithm expert's based

    Mathematical optimization methods are the basic mathematical tools of all artificial intelligence theory. In machine learning and deep learning, the examples from which algorithms learn (training data) feed cost functions whose optima can be found in closed form or through approximations. The interpretability and transparency of a model, as opposed to the opacity of black boxes, depend on how the algorithm learns, which happens by minimizing the errors the machine makes during the learning process. In particular, this work introduces a new method for determining the weights of an ensemble model, supervised or unsupervised, based on the well-known Analytic Hierarchy Process (AHP). The method rests on the idea that behind the choice among the possible algorithms for a machine learning problem there is an expert who controls the decision-making process. The expert assigns a complexity score to each algorithm (based on the complexity-interpretability trade-off), and these scores determine the weight with which each model contributes to the training and prediction phases. In addition, several methods are presented to evaluate the performance of these algorithms and explain how each feature in the model contributes to the predicted outputs. The interpretability techniques used in machine learning are also combined with the proposed AHP-based method in the context of clinical decision support systems, to make the (black-box) algorithms and their results interpretable and explainable. Clinical decision-makers can then make controlled decisions in line with the "right to explanation" introduced by the legislator, since they bear civil and legal responsibility for choices made in the clinical field with systems that use artificial intelligence. Equally central is the interaction between the expert who controls the algorithm construction process and the domain expert, in this case the clinician. Three applications on real data are implemented with methods from the literature and with those proposed in this work: one concerns cervical cancer, another diabetes, and the last a specific pathology developed by HIV-infected individuals. All applications are supported by plots, tables and explanations of the results, implemented with Python libraries. The main case study of this thesis, regarding HIV-infected individuals, is an unsupervised ensemble problem: a series of clustering algorithms is applied to a set of features, and their outputs are reused as meta-features, with a label provided for each cluster. The meta-features and the labels obtained from the best algorithm are used to train a logistic regression meta-learner, which, combined with explainability methods, quantifies the contribution of each algorithm during the training phase. Logistic regression is used as the meta-learner because it gives good results and its estimated coefficients are easy to explain.
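In standard AHP, an expert fills a reciprocal pairwise comparison matrix A (a_ij says how strongly option i is preferred to option j, and a_ji = 1/a_ij); the priority weights are the normalized principal eigenvector of A, and the consistency ratio CR = ((λmax - n)/(n - 1)) / RI checks the judgments against Saaty's random index RI. The abstract does not say how the expert's complexity scores are turned into pairwise comparisons, so the comparison matrix and model predictions below are hypothetical, and the weighted average is only one way such weights could enter an ensemble.

```python
# Saaty's random consistency index for matrix sizes 3..9.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix: list[list[float]], iterations: int = 100) -> tuple[list[float], float]:
    """Principal-eigenvector priorities of a positive reciprocal matrix via power
    iteration. Returns (weights summing to 1, estimated lambda_max)."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        w = [p / total for p in product]
    # Because w sums to 1, the entries of A·w sum to approximately lambda_max.
    lambda_max = sum(sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n))
    return w, lambda_max


def consistency_ratio(lambda_max: float, n: int) -> float:
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1); CR below 0.1 is the usual bar."""
    return ((lambda_max - n) / (n - 1)) / RANDOM_INDEX[n]


def weighted_prediction(predictions: list[float], weights: list[float]) -> float:
    """Combine per-model scores (e.g. class probabilities) with the AHP weights."""
    return sum(p * w for p, w in zip(predictions, weights))


# Hypothetical comparisons of three models: model 0 preferred 2x over model 1, 4x over model 2.
A = [[1.0, 2.0, 4.0], [0.5, 1.0, 2.0], [0.25, 0.5, 1.0]]
w, lam = ahp_weights(A)
print([round(x, 3) for x in w])  # [0.571, 0.286, 0.143]
print(round(consistency_ratio(lam, 3), 6))
print(weighted_prediction([0.9, 0.6, 0.2], w))
```

A perfectly consistent matrix like this one gives λmax = n and CR = 0; real expert judgments usually yield a small positive CR that should be checked before the weights are used.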

    Artificial Intelligence in Image-Based Screening, Diagnostics, and Clinical Care of Cardiopulmonary Diseases

    Cardiothoracic and pulmonary diseases are a significant cause of mortality and morbidity worldwide. The COVID-19 pandemic has highlighted the lack of access to clinical care, the overburdened medical system, and the potential of artificial intelligence (AI) to improve medicine. A variety of diseases affect the cardiopulmonary system, including lung cancer, heart disease and tuberculosis (TB), in addition to COVID-19-related diseases. Screening, diagnosis, and management of cardiopulmonary diseases have become difficult owing to the limited availability of diagnostic tools and experts, particularly in resource-limited regions. Early screening and accurate diagnosis and staging of these diseases could play a crucial role in treatment and care, and could potentially help reduce mortality. Imaging methods such as computed tomography (CT), chest X-rays (CXRs), and echo ultrasound (US) are widely used in screening and diagnosis. Image-based AI and machine learning (ML) methods can help in rapid assessment, serve as surrogates for expert assessment, and reduce variability in human performance. In this Special Issue, “Artificial Intelligence in Image-Based Screening, Diagnostics, and Clinical Care of Cardiopulmonary Diseases”, we have highlighted exemplary primary research studies and literature reviews focusing on novel AI/ML methods and their application in image-based screening, diagnosis, and clinical management of cardiopulmonary diseases. We hope that these articles will help establish the advancements in AI