693 research outputs found

    Breast cancer diagnosis: a survey of pre-processing, segmentation, feature extraction and classification

    Get PDF
    Machine learning methods have attracted interest in the medical field for many years, and they have achieved successful results in various areas of medical science. This paper examines the effects of using machine learning algorithms for the diagnosis and classification of breast cancer from mammography imaging data. Cancer diagnosis is the identification of images as cancerous or non-cancerous, and it involves image preprocessing, feature extraction, classification, and performance analysis. This article reviews 93 references from recent years in the field of image processing and tries to identify an effective way to diagnose and classify breast cancer. Based on the results of this research, it can be concluded that most of today’s successful methods rely on deep learning. Finding a new method requires an overview of existing deep learning methods in order to enable comparison and case study.
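The survey frames diagnosis as a pipeline of preprocessing, feature extraction, classification, and performance analysis. A minimal sketch of such a pipeline follows; the patch statistics used as features, the nearest-centroid classifier, and the synthetic data are illustrative assumptions, not methods from any surveyed paper.

```python
import numpy as np

def preprocess(img):
    """Clip intensities to [0, 1] and convert to float."""
    return np.clip(img.astype(np.float64), 0.0, 1.0)

def extract_features(img):
    """Toy features: mean intensity, intensity spread, and edge energy."""
    gy, gx = np.gradient(img)
    return np.array([img.mean(), img.std(), np.mean(gx ** 2 + gy ** 2)])

class NearestCentroid:
    """Classify a feature vector by its nearest class centroid."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

# Synthetic demo: "cancerous" patches are brighter and noisier than "normal" ones.
rng = np.random.default_rng(0)
normal = [rng.normal(0.3, 0.05, (32, 32)) for _ in range(20)]
cancer = [rng.normal(0.7, 0.20, (32, 32)) for _ in range(20)]
X = np.array([extract_features(preprocess(p)) for p in normal + cancer])
y = np.array([0] * 20 + [1] * 20)

clf = NearestCentroid().fit(X, y)
acc = (clf.predict(X) == y).mean()   # performance analysis on the training set
```

Real systems would replace the toy features with learned (e.g. CNN) representations, but the four stages remain the same.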

    Enhancing Breast Cancer Prediction Using Unlabeled Data

    Get PDF
    This thesis presents a deep learning (DL) approach for the automatic classification of invasive ductal carcinoma (IDC) tissue regions in whole slide images (WSI) of breast cancer (BC) using unlabeled data. DL methods work similarly to the human brain, operating across multiple levels of interpretation. These techniques have been shown to outperform traditional approaches on highly complex problems such as image classification and object detection. However, DL requires a large set of labeled data that is difficult to obtain, especially in the medical field, as neither hospitals nor patients are willing to reveal such sensitive information. Moreover, machine learning (ML) systems achieve better performance at the cost of becoming increasingly complex. As a result, they become less interpretable, which causes distrust among users. Model interpretability is a way to enhance trust in a system. It is a highly desirable property, especially crucial with the pervasive adoption of ML-based models in critical domains such as medicine. In medical diagnostics, predictions cannot be followed blindly, as doing so may harm the patient. IDC is one of the most common and aggressive subtypes of breast cancer, accounting for nearly 80% of all cases. Assessing the disease is a very time-consuming and challenging task for pathologists, as it involves scanning large swaths of benign regions to identify areas of malignancy. Meanwhile, accurate delineation of IDC in WSI is crucial for grading cancer aggressiveness. In this study, a semi-supervised learning (SSL) scheme is developed using a deep convolutional neural network (CNN) for IDC diagnosis. The proposed framework first augments a small set of labeled data with synthetic medical images generated by a generative adversarial network (GAN); features are then extracted with a network pre-trained on a larger dataset, and a data-labeling algorithm labels a much broader set of unlabeled data. After feeding the newly labeled set into the proposed CNN model, acceptable performance is achieved: an AUC of 0.86 and an F-measure of 0.77. Moreover, the proposed interpretability techniques produce explanations for the medical predictions and build trust in the presented CNN. The study demonstrates that a better understanding of the CNN’s decisions can be enabled by visualizing the areas that are most important for a particular prediction and by finding the elements that drive the network’s IDC and Non-IDC decisions.
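The core of the SSL scheme above is labeling a broad unlabeled set from a small labeled one and retraining on the union. A minimal pseudo-labeling sketch follows; the 2-D synthetic features, nearest-centroid labeler, and confidence margin stand in for the thesis's GAN-augmented, CNN-based pipeline and are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy 2-D "feature vectors" standing in for CNN features of tissue patches.
def make_cluster(center, n):
    return rng.normal(center, 0.5, size=(n, 2))

X_labeled = np.vstack([make_cluster([0, 0], 5), make_cluster([4, 4], 5)])
y_labeled = np.array([0] * 5 + [1] * 5)          # 0 = Non-IDC, 1 = IDC
X_unlabeled = np.vstack([make_cluster([0, 0], 50), make_cluster([4, 4], 50)])

def centroids(X, y):
    return np.array([X[y == c].mean(axis=0) for c in (0, 1)])

def pseudo_label(X, cents, margin=1.0):
    """Label a point only when one centroid is clearly closer (confidence gate)."""
    d = np.linalg.norm(X[:, None, :] - cents[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    confident = np.abs(d[:, 0] - d[:, 1]) > margin
    return labels[confident], confident

cents = centroids(X_labeled, y_labeled)
pl, mask = pseudo_label(X_unlabeled, cents)

# Retrain on the enlarged labeled set (here: just recompute centroids).
X_aug = np.vstack([X_labeled, X_unlabeled[mask]])
y_aug = np.concatenate([y_labeled, pl])
cents_new = centroids(X_aug, y_aug)
```

The confidence margin is the key design choice: it trades the size of the pseudo-labeled set against the risk of feeding wrong labels back into training.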

    Automated segmentation of tissue images for computerized IHC analysis

    Get PDF
    This paper presents two automated methods for the segmentation of immunohistochemical tissue images that overcome the limitations of the manual approach as well as of existing computerized techniques. The first method, based on unsupervised color clustering, automatically recognizes the target cancerous areas in the specimen and disregards the stroma; the second method, based on color separation and morphological processing, performs automated segmentation of the nuclear membranes of the cancerous cells. Extensive experimental results on real tissue images demonstrate the accuracy of our techniques compared to manual segmentations; additional experiments show that our techniques are more effective on immunohistochemical images than popular approaches based on supervised learning or active contours. The proposed procedure can be exploited in any application that requires tissue and cell exploration and to perform reliable and standardized measurements of the activity of specific proteins involved in multi-factorial genetic pathologies.
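The first method above relies on unsupervised color clustering to separate target areas from stroma. A minimal sketch of that idea, using plain k-means on the RGB pixels of a synthetic image, is given below; the colors, image, and deterministic initialization are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def kmeans(pixels, k=2, iters=20):
    """Plain k-means on RGB pixels, initialized at the darkest/brightest pixels (k == 2)."""
    order = pixels.sum(axis=1).argsort()
    centers = pixels[[order[0], order[-1]]].copy()
    for _ in range(iters):
        d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = pixels[assign == j].mean(axis=0)
    return assign, centers

# Synthetic tissue image: a dark "stained" target region on a pale stroma background.
rng = np.random.default_rng(1)
img = np.full((40, 40, 3), [0.9, 0.85, 0.8]) + rng.normal(0, 0.02, (40, 40, 3))
img[10:30, 10:30] = np.array([0.45, 0.30, 0.15]) + rng.normal(0, 0.02, (20, 20, 3))

assign, centers = kmeans(img.reshape(-1, 3))
mask = assign.reshape(40, 40)
stained = int(np.argmin(centers.sum(axis=1)))    # darker cluster = target areas
target_fraction = (mask == stained).mean()
```

In practice the clustering would run in a color space suited to stain separation rather than raw RGB, and morphological post-processing would clean up the mask.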

    Deep weakly-supervised learning methods for classification and localization in histology images: a survey

    Full text link
    Using state-of-the-art deep learning models for cancer diagnosis presents several challenges related to the nature and availability of labeled histology images. In particular, cancer grading and localization in these images normally rely on both image- and pixel-level labels, the latter requiring a costly annotation process. In this survey, deep weakly-supervised learning (WSL) models are investigated to identify and locate diseases in histology images without the need for pixel-level annotations. Given training data with global image-level labels, these models can simultaneously classify histology images and yield pixel-wise localization scores, thereby identifying the corresponding regions of interest (ROI). Since relevant WSL models have mainly been investigated within the computer vision community, and validated on natural scene images, we assess the extent to which they apply to histology images, which have challenging properties, e.g. very large size, similarity between foreground and background, highly unstructured regions, stain heterogeneity, and noisy/ambiguous labels. The most relevant models for deep WSL are compared experimentally in terms of accuracy (classification and pixel-wise localization) on several public benchmark histology datasets for breast and colon cancer -- BACH ICIAR 2018, BreaKHis, CAMELYON16, and GlaS. Furthermore, for large-scale evaluation of WSL models on histology images, we propose a protocol to construct WSL datasets from Whole Slide Imaging. Results indicate that several deep learning models can provide a high level of classification accuracy, although accurate pixel-wise localization of cancer regions remains an issue for such images. Code is publicly available.
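A common mechanism by which such WSL models turn image-level supervision into pixel-wise scores is the class activation map (CAM): the classifier weights over globally average-pooled feature maps are reapplied at every spatial location. A numpy-only sketch with hypothetical feature maps and weights (no real CNN is involved):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical final conv feature maps for one image: K maps of size H x W,
# as produced by a CNN backbone just before global average pooling.
K, H, W = 8, 7, 7
feature_maps = rng.random((K, H, W))
# Hypothetical classifier weights of the "tumor" class over the K pooled features.
w_tumor = rng.normal(size=K)

# Image-level class score: weights applied to globally average-pooled features.
pooled = feature_maps.mean(axis=(1, 2))
score = float(w_tumor @ pooled)

# Class activation map: the same weights applied at every spatial location,
# turning image-level supervision into a pixel-wise localization score.
cam_raw = np.tensordot(w_tumor, feature_maps, axes=1)        # shape (H, W)
cam = (cam_raw - cam_raw.min()) / (cam_raw.max() - cam_raw.min() + 1e-8)
```

By linearity, the spatial mean of the raw CAM equals the image-level score, which is exactly why image-level labels suffice to train the localization signal.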

    Self-Path: self-supervision for classification of pathology images with limited annotations

    Get PDF
    While high-resolution pathology images lend themselves well to ‘data hungry’ deep learning algorithms, obtaining exhaustive annotations on these images for learning is a major challenge. In this paper, we propose a self-supervised convolutional neural network (CNN) framework to leverage unlabeled data for learning generalizable and domain-invariant representations in pathology images. Our proposed framework, termed Self-Path, employs multi-task learning where the main task is tissue classification and the pretext tasks are a variety of self-supervised tasks with labels inherent to the input images. We introduce novel pathology-specific self-supervision tasks that leverage contextual, multi-resolution and semantic features in pathology images for semi-supervised learning and domain adaptation. We investigate the effectiveness of Self-Path on 3 different pathology datasets. Our results show that Self-Path with the pathology-specific pretext tasks achieves state-of-the-art performance for semi-supervised learning when small amounts of labeled data are available. Further, we show that Self-Path improves domain adaptation for histopathology image classification when there is no labeled data available for the target domain. This approach can potentially be employed for other applications in computational pathology, where the annotation budget is often limited or a large amount of unlabeled image data is available.
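A standard example of a pretext task with labels inherent to the input image is rotation prediction. The sketch below only generates the pretext dataset; the patches and helper are illustrative and are not Self-Path's actual pathology-specific tasks.

```python
import numpy as np

def make_rotation_pretext(patches, rng):
    """Self-supervised pretext data: rotate each patch by a random multiple of
    90 degrees; the rotation index is a free label inherent to the image."""
    X, y = [], []
    for p in patches:
        k = int(rng.integers(0, 4))        # 0, 90, 180, 270 degrees
        X.append(np.rot90(p, k))
        y.append(k)
    return np.array(X), np.array(y)

rng = np.random.default_rng(7)
patches = rng.random((16, 32, 32))          # unlabeled image patches
X_pretext, y_pretext = make_rotation_pretext(patches, rng)
```

A network trained to predict `y_pretext` from `X_pretext` must learn orientation-sensitive features, which can then be reused for the main tissue-classification task.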

    Graph Embedding via High Dimensional Model Representation for Hyperspectral Images

    Full text link
    Learning the manifold structure of remote sensing images is of paramount relevance for modeling and understanding processes, as well as for encapsulating the high dimensionality in a reduced set of informative features for subsequent classification, regression, or unmixing. Manifold learning methods have shown excellent performance in hyperspectral image (HSI) analysis but, unless specifically designed, they cannot provide an explicit embedding map readily applicable to out-of-sample data. A common assumption to deal with the problem is that the transformation between the high-dimensional input space and the (typically low-dimensional) latent space is linear. This is a particularly strong assumption, especially when dealing with hyperspectral images, due to the well-known nonlinear nature of the data. To address this problem, a manifold learning method based on High Dimensional Model Representation (HDMR) is proposed, which provides a nonlinear embedding function to project out-of-sample data into the latent space. The proposed method is compared to existing manifold learning methods as well as their linear counterparts and achieves promising performance in terms of classification accuracy on a representative set of hyperspectral images. (Accepted for publication in the IEEE Transactions on Geoscience and Remote Sensing.)
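A first-order HDMR surrogate approximates f(x) ≈ f0 + Σ_i f_i(x_i), which by construction yields an explicit map applicable to out-of-sample points. A minimal numpy sketch follows, with binned conditional means standing in for the paper's actual component estimation; the target function and data are synthetic.

```python
import numpy as np

def fit_first_order_hdmr(X, f, n_bins=10):
    """First-order HDMR surrogate: f(x) ~ f0 + sum_i f_i(x_i), with each
    component f_i estimated as a binned conditional mean over dimension i."""
    f0 = f.mean()
    edges, comps = [], []
    for i in range(X.shape[1]):
        e = np.linspace(X[:, i].min(), X[:, i].max(), n_bins + 1)
        idx = np.clip(np.digitize(X[:, i], e) - 1, 0, n_bins - 1)
        comp = np.array([f[idx == b].mean() - f0 if np.any(idx == b) else 0.0
                         for b in range(n_bins)])
        edges.append(e)
        comps.append(comp)
    return f0, edges, comps

def predict_hdmr(x, f0, edges, comps):
    """Explicit embedding map, directly applicable to out-of-sample points."""
    out = f0
    for i, (e, c) in enumerate(zip(edges, comps)):
        b = int(np.clip(np.digitize(x[i], e) - 1, 0, len(c) - 1))
        out += c[b]
    return out

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, (2000, 2))
f = X[:, 0] ** 2 + np.sin(np.pi * X[:, 1])     # additively separable target
f0, edges, comps = fit_first_order_hdmr(X, f)
pred = np.array([predict_hdmr(x, f0, edges, comps) for x in X])
rmse = float(np.sqrt(np.mean((pred - f) ** 2)))
```

Because the surrogate is additive in the input dimensions, new samples never require re-running the manifold learning step, which is the out-of-sample property the paper targets.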

    A Survey on Deep Learning in Medical Image Analysis

    Full text link
    Deep learning algorithms, in particular convolutional networks, have rapidly become a methodology of choice for analyzing medical images. This paper reviews the major deep learning concepts pertinent to medical image analysis and summarizes over 300 contributions to the field, most of which appeared in the last year. We survey the use of deep learning for image classification, object detection, segmentation, registration, and other tasks and provide concise overviews of studies per application area. Open challenges and directions for future research are discussed.

    Data Reduction Algorithms in Machine Learning and Data Science

    Get PDF
    Raw data usually need to be pre-processed for better representation or discrimination of classes. This pre-processing can be done by data reduction, i.e., reduction in either dimensionality or numerosity (cardinality). Dimensionality reduction can be used for feature extraction or data visualization. Numerosity reduction is useful for ranking data points or finding the most and least important data points. This thesis proposes several algorithms for data reduction, i.e., dimensionality and numerosity reduction, in machine learning and data science. Dimensionality reduction comprises feature extraction and feature selection methods, while numerosity reduction includes prototype selection and prototype generation approaches. This thesis focuses on feature extraction and prototype selection for data reduction. Dimensionality reduction methods can be divided into three categories, i.e., spectral, probabilistic, and neural network-based methods. The spectral methods have a geometrical point of view and mostly reduce to the generalized eigenvalue problem. Probabilistic and network-based methods have stochastic and information-theoretic foundations, respectively. Numerosity reduction methods can be divided into methods based on variance, geometry, and isolation. For dimensionality reduction, under the spectral category, I propose weighted Fisher discriminant analysis, Roweis discriminant analysis, and image quality aware embedding. I also propose quantile-quantile embedding as a probabilistic method where the distribution of the embedding is chosen by the user. Backprojection, Fisher losses, and dynamic triplet sampling using Bayesian updating are other proposed methods in the neural network-based category. Backprojection is for training shallow networks with a projection-based perspective in manifold learning. Two Fisher losses are proposed for training Siamese triplet networks for increasing and decreasing the inter- and intra-class variances, respectively. 
Two dynamic triplet mining methods, which are based on Bayesian updating to draw triplet samples stochastically, are proposed. For numerosity reduction, principal sample analysis and instance ranking by matrix decomposition are the proposed variance-based methods; these methods rank instances using inter-/intra-class variances and matrix factorization, respectively. Curvature anomaly detection, in which the points are assumed to be the vertices of a polyhedron, and isolation Mondrian forest are the proposed methods based on geometry and isolation, respectively. To assess the proposed tools developed for data reduction, I apply them to applications in medical image analysis, image processing, and computer vision. Data reduction, used as a pre-processing tool, has different applications because it provides various ways of feature extraction and prototype selection for application to different types of data. Dimensionality reduction extracts informative features, and prototype selection selects the most informative data instances. For example, for medical image analysis, I use Fisher losses and dynamic triplet sampling for embedding histopathology image patches and demonstrating how different the tumorous cancer tissue types are from the normal ones. I also propose offline/online triplet mining using extreme distances for this embedding. In image processing and computer vision applications, I propose Roweisfaces and Roweisposes for face recognition and 3D action recognition, respectively, using my proposed Roweis discriminant analysis method. I also introduce the concepts of anomaly landscape and anomaly path using the proposed curvature anomaly detection and use them to denoise images and video frames. I report extensive experiments, on different datasets, to show the effectiveness of the proposed algorithms. 
Through these experiments, I demonstrate that the proposed methods are useful for extracting informative features and instances for better accuracy, representation, prediction, class separation, data reduction, and embedding. I show that the proposed dimensionality reduction methods can extract informative features for better separation of classes. An example is obtaining an embedding space that separates cancerous histopathology patches from normal patches, which can help hospitals diagnose cancer more easily and automatically. I also show that the proposed numerosity reduction methods are useful for ranking data instances based on their importance and reducing data volumes without a significant drop in the performance of machine learning and data science algorithms.
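The spectral methods mentioned above mostly reduce to generalized eigenvalue problems; the classical two-class Fisher discriminant illustrates the underlying idea. The sketch below is the textbook formulation on synthetic data, not the weighted or Roweis variants proposed in the thesis.

```python
import numpy as np

def fisher_direction(X0, X1):
    """Two-class Fisher discriminant: w = Sw^{-1} (mu1 - mu0), maximizing
    between-class separation over within-class spread along the projection."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    Sw += 1e-6 * np.eye(X0.shape[1])        # small ridge for numerical stability
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)

# Synthetic two-class data: classes differ slightly in a low-variance direction.
rng = np.random.default_rng(5)
X0 = rng.normal([0.0, 0.0], [1.0, 0.2], (200, 2))
X1 = rng.normal([2.0, 0.4], [1.0, 0.2], (200, 2))
w = fisher_direction(X0, X1)
p0, p1 = X0 @ w, X1 @ w
# Fisher criterion along w: distance between projected means vs projected spread.
separation = abs(p1.mean() - p0.mean()) / np.sqrt(p0.var() + p1.var())
```

Because `Sw` down-weights high-variance directions, the learned `w` leans toward the low-variance axis where the small mean shift is most reliable, which is the essence of all the Fisher-type criteria in the thesis.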