
    Modular Machine Learning Methods for Computer-Aided Diagnosis of Breast Cancer

    The purpose of this study was to improve breast cancer diagnosis by reducing the number of benign biopsies performed. To this end, we investigated modular and ensemble systems of machine learning methods for computer-aided diagnosis (CAD) of breast cancer. A modular system partitions the input space into smaller domains, each of which is handled by a local model. An ensemble system applies multiple models to the same cases and combines their predictions. Five supervised machine learning techniques (LDA, SVM, BP-ANN, CBR, CART) were trained to predict the biopsy outcome from mammographic findings (BIRADS™) and patient age, using a database of 2258 cases pooled from multiple institutions. The generalization of the models was tested on a second set of 2177 cases. Clusters were identified in the database using a priori knowledge and unsupervised learning methods (agglomerative hierarchical clustering followed by K-Means, SOM, AutoClass). The performance of the global models was examined on each cluster, and local models were trained for the clusters. While some local models were superior to some global models, we were unable to build a modular CAD system that outperformed the global BP-ANN model. Ensemble systems based on simplistic combination schemes did not yield significant improvements, and more complicated combination schemes were found to be unduly optimistic. One of the most striking results of this dissertation was that CAD systems trained on a mixture of lesion types performed much better on masses than on calcifications. Our study of institutional effects suggests that models built on cases pooled across institutions may overcome some of the weaknesses of models built on cases from a single institution. Suggestively, each of the unsupervised methods identified a cluster of younger women with well-circumscribed or obscured, oval-shaped masses that accounted for the majority of the BP-ANN's recommendations for follow-up.
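    To make the distinction between the two system designs concrete, here is a minimal sketch. It assumes each model is a callable returning a probability of malignancy, and it routes a new case to a cluster by nearest centroid as a stand-in for the K-Means step; the abstract does not say how cases were routed or which combination schemes were tried. Plain averaging is shown as one example of a simple combination scheme. All function names are hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical model type: maps one case's feature vector to P(malignant).
Model = Callable[[Sequence[float]], float]

def nearest_cluster(x: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Index of the centroid closest to x (squared Euclidean distance)."""
    def dist2(c: Sequence[float]) -> float:
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return min(range(len(centroids)), key=lambda k: dist2(centroids[k]))

def modular_predict(x: Sequence[float], centroids: Sequence[Sequence[float]],
                    local_models: Sequence[Model]) -> float:
    """Modular system: only the local model of the case's cluster answers."""
    return local_models[nearest_cluster(x, centroids)](x)

def ensemble_predict(x: Sequence[float], models: Sequence[Model]) -> float:
    """Ensemble system: every model sees the case; outputs are averaged."""
    return sum(m(x) for m in models) / len(models)

# Toy usage with hand-written constant models (illustrative only).
centroids = [[0.0, 0.0], [10.0, 10.0]]
local = [lambda x: 0.1, lambda x: 0.9]
print(modular_predict([9.0, 11.0], centroids, local))  # 0.9
print(ensemble_predict([9.0, 11.0], local))            # 0.5
```

The key difference is that the modular system consults one model per case, while the ensemble consults all of them.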
From the cluster analysis and the CART models, we derived a simple diagnostic rule that performed comparably to the global BP-ANN: approximately 98% sensitivity could be maintained while providing approximately 26% specificity. This compares with the clinical status quo of 100% sensitivity and 0% specificity on this database of indeterminate cases already referred to biopsy.
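    Why the status quo scores 0% specificity follows directly from the definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). If every indeterminate case is sent to biopsy, no benign case is ever a true negative. The sketch below uses a small hypothetical cohort, not data from the study, to show how a rule can gain specificity while keeping sensitivity.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of malignant cases correctly referred to biopsy."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Fraction of benign cases correctly spared a biopsy."""
    return tn / (tn + fp)

def confusion(y_true, y_pred):
    """Count (tp, fn, tn, fp) for binary labels; 1 = malignant / refer to biopsy."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp

# Hypothetical cohort: 2 malignant, 4 benign (illustrative, not study data).
y_true = [1, 1, 0, 0, 0, 0]

# Clinical status quo: every indeterminate case goes to biopsy.
status_quo = [1] * len(y_true)
tp, fn, tn, fp = confusion(y_true, status_quo)
print(sensitivity(tp, fn), specificity(tn, fp))  # 1.0 0.0

# A rule that spares one benign case while still catching both cancers.
rule = [1, 1, 0, 1, 1, 1]
tp, fn, tn, fp = confusion(y_true, rule)
print(sensitivity(tp, fn), specificity(tn, fp))  # 1.0 0.25
```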

    Breast Cancer Classification using Deep Learned Features Boosted with Handcrafted Features

    Breast cancer is one of the leading causes of death among women across the globe. It is difficult to treat if detected at advanced stages; however, early detection can significantly increase the chances of survival and improve the lives of millions of women. Given the widespread prevalence of breast cancer, it is of utmost importance for the research community to develop frameworks for early detection, classification and diagnosis. The artificial intelligence research community, in coordination with medical practitioners, is developing such frameworks to automate the task of detection. With the surge in research activity, coupled with the availability of large datasets and greater computational power, AI frameworks are expected to help even more clinicians make correct predictions. In this article, a novel framework for the classification of breast cancer from mammograms is proposed. The proposed framework combines robust features extracted by a novel Convolutional Neural Network (CNN) with handcrafted features, including HOG (Histogram of Oriented Gradients) and LBP (Local Binary Pattern). The results obtained on the CBIS-DDSM dataset exceed the state of the art.
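    The abstract does not say how the deep and handcrafted features are fused. One common option is to concatenate separately normalised vectors, and the sketch below takes that approach. It is built on several assumptions: a basic 8-neighbour LBP histogram, a simplified global gradient-orientation histogram standing in for HOG (real HOG adds cells and block normalisation), and a random vector as a placeholder for the CNN embedding. The names `lbp_histogram`, `orientation_histogram` and `fuse` are hypothetical.

```python
import numpy as np

def lbp_histogram(img: np.ndarray) -> np.ndarray:
    """Basic 8-neighbour LBP codes on interior pixels, as a 256-bin histogram."""
    c = img[1:-1, 1:-1]
    h, w = img.shape
    # Fixed clockwise neighbour order; each neighbour contributes one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(c.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neigh >= c).astype(np.int64) << bit
    return np.bincount(codes.ravel(), minlength=256).astype(float)

def orientation_histogram(img: np.ndarray, bins: int = 9) -> np.ndarray:
    """Simplified HOG stand-in: one magnitude-weighted histogram of unsigned
    gradient orientations over the whole image (no cells or blocks)."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, 180.0), weights=mag)
    return hist

def l2_normalise(v: np.ndarray) -> np.ndarray:
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def fuse(cnn_feat: np.ndarray, img: np.ndarray) -> np.ndarray:
    """Concatenate normalised deep and handcrafted feature vectors."""
    parts = [cnn_feat, lbp_histogram(img), orientation_histogram(img)]
    return np.concatenate([l2_normalise(np.asarray(p, dtype=float)) for p in parts])

rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(64, 64))  # placeholder mammogram patch
cnn_feat = rng.normal(size=128)              # placeholder CNN embedding
fused = fuse(cnn_feat, patch)
print(fused.shape)  # (393,) = 128 + 256 + 9
```

Normalising each part keeps one feature family from dominating the classifier just because its vector is longer or has larger values.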

    Automatic BIRAD scoring of breast cancer mammograms

    A computer-aided diagnosis (CAD) system is developed to fully characterize masses, classify them as benign or malignant, and predict BIRAD (Breast Imaging Reporting and Data System) scores from mammographic image data. The CAD includes a preprocessing step to de-noise mammograms. This is followed by active contour segmentation, which deforms an initial curve annotated by a radiologist to separate the boundary of a mass from the background. A feature extraction scheme was then used to characterize each mass by extracting the most relevant features, those with a large impact on the outcome of a patient biopsy. For this, thirty-five medical and mathematical features based on the intensity, shape and texture of the mass were extracted. Several feature selection schemes were then applied to select the most dominant features for the next step, classification. Finally, a hierarchical classification scheme was applied to the selected feature subsets, first to classify masses as benign (BIRAD score 2) or malignant (BIRAD score over 4), and second to sub-classify masses with a BIRAD score over 4 into three classes (BIRAD 4a, 4b and 4c). Segmentation accuracy was evaluated by calculating the degree of overlap between the active contour segmentation and the manual segmentation; the result was 98.5%. The reproducibility of the active contour under different manual initializations by three radiologists was also assessed; the result was 99.5%. Classification performance was evaluated using one hundred sixty masses (80 masses with BIRAD score 2 and 80 masses with BIRAD score over 4).
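    The abstract reports a "degree of overlap" between automatic and manual segmentations without naming the measure. Two common choices are the Dice coefficient, 2|A∩B| / (|A| + |B|), and the Jaccard index, |A∩B| / |A∪B|. This is a minimal sketch of both on binary masks; which one the authors used is an open assumption.

```python
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity 2|A∩B| / (|A| + |B|) between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    total = int(a.sum() + b.sum())
    if total == 0:
        return 1.0  # two empty masks agree perfectly
    return float(2.0 * np.logical_and(a, b).sum() / total)

def jaccard(a: np.ndarray, b: np.ndarray) -> float:
    """Jaccard index |A∩B| / |A∪B| between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    union = int(np.logical_or(a, b).sum())
    if union == 0:
        return 1.0
    return float(np.logical_and(a, b).sum() / union)

# Two 6x6 square "masses", the manual one shifted right by one pixel.
auto = np.zeros((10, 10), dtype=bool)
auto[2:8, 2:8] = True
manual = np.zeros((10, 10), dtype=bool)
manual[2:8, 3:9] = True
print(round(dice(auto, manual), 4), round(jaccard(auto, manual), 4))  # 0.8333 0.7143
```

For the same pair of masks, Dice is always at least as large as Jaccard, so a reported percentage is only comparable across studies when the measure is stated.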
The best benign-versus-malignant classification was obtained by combining sequential forward floating selection (SFFS) of features with a boosted-tree hybrid classifier (AdaBoost ensemble method, decision-tree learner type, 100 regression-tree learners), achieving 100% sensitivity and specificity with the hold-out method, 99.4% with the cross-validation method and 98.62% average accuracy with the cross-validation method. For further sub-classification of the eighty malignant masses with a BIRAD score over 4 (30 masses with BIRAD score 4a, 30 with 4b and 20 with 4c), the best result was achieved using a boosted tree with the bagging ensemble method, decision-tree learner type and 200 learners, achieving 100% sensitivity and specificity with the hold-out method, and 98.8% accuracy and 98.41% average accuracy over ten runs of the cross-validation method. Besides those 160 masses (BIRAD score 2 and over 4), 13 masses with a BIRAD score of 3 were gathered; this score means the patient is recommended for examination with another medical imaging technique and for a follow-up in six months. The CAD system, trained on masses with BIRAD scores of 2 and over 4, was further tested on these 13 BIRAD 3 masses, and its results agreed with the radiologist's classification as confirmed at the six-month follow-up. The present results demonstrate high sensitivity and specificity for the proposed CAD system compared with prior research. This research therefore aims to contribute to the field by proposing a novel CAD system, consisting of a series of well-selected image processing algorithms, that first classifies masses as benign or malignant, then sub-classifies BIRAD 4 into three groups, and finally interprets BIRAD 3 as BIRAD 2 without the need for a follow-up study.
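    The abstract names SFFS but gives no implementation details. Below is a generic sketch of the textbook floating search (Pudil et al.). It assumes a user-supplied `score` function that rates a feature subset, for example the cross-validated accuracy of the boosted-tree classifier on those features. The toy additive score in the usage lines is purely illustrative, and `sffs` is a hypothetical name.

```python
from typing import Callable, Dict, FrozenSet, Tuple

def sffs(n_features: int, score: Callable[[FrozenSet[int]], float],
         k: int) -> FrozenSet[int]:
    """Sequential forward floating selection of k features (higher score is better)."""
    best: Dict[int, Tuple[float, FrozenSet[int]]] = {}  # size -> best (score, subset)

    def best_score(size: int) -> float:
        return best[size][0] if size in best else float("-inf")

    current: FrozenSet[int] = frozenset()
    while len(current) < k:
        # Forward step: add the remaining feature that helps most.
        remaining = set(range(n_features)) - current
        f = max(remaining, key=lambda j: score(current | {j}))
        current = current | {f}
        s = score(current)
        if s > best_score(len(current)):
            best[len(current)] = (s, current)
        # Floating step: keep dropping the least useful feature as long as the
        # smaller subset beats the best subset already recorded at that size.
        while len(current) > 2:
            g = max(current, key=lambda j: score(current - {j}))
            reduced = current - {g}
            rs = score(reduced)
            if rs > best_score(len(reduced)):
                current = reduced
                best[len(reduced)] = (rs, reduced)
            else:
                break
    return best[k][1]

# Toy usage: an additive score, so SFFS should pick the k heaviest features.
w = [0.1, 0.5, 0.3, 0.9, 0.2]
print(sorted(sffs(len(w), lambda s: sum(w[j] for j in s), 3)))  # [1, 2, 3]
```

The conditional backward step is what distinguishes SFFS from plain forward selection: a feature added early can be dropped later if a smaller subset without it scores better than anything seen at that size.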

    Application of data mining techniques to support breast cancer diagnosis

    The number of decision support methods and computer-aided diagnosis systems applied to various areas of medicine is growing faster than ever. In breast cancer research, much work has been done on systems used as a second (double) reading to reduce false positives. In this study, we present a set of data mining techniques applied to build a decision support system for breast cancer diagnosis. The approach is designed to help clinicians identify mammographic findings such as microcalcifications, masses and normal tissue, in order to avoid misdiagnosis. A reliable database was used, with 410 images from about 115 patients, containing prior radiologist assessments of microcalcifications, masses and normal tissue. Two feature extraction techniques were used throughout this work: the gray-level co-occurrence matrix (GLCM) and the gray-level run-length matrix (GLRLM). For classification, we considered various scenarios based on different patterns of lesions and compared several classifiers to identify the best performer in each scenario. The classifiers used were Naïve Bayes, Support Vector Machines, k-nearest Neighbors and Decision Trees (J48 and Random Forests). The results in distinguishing mammographic findings showed high positive predictive values (PPV) and very good accuracy. Results are also presented for the classification of breast density and of the BI-RADS® scale. The best predictive method across all tested groups was the Random Forest classifier, and the best performance was achieved in distinguishing microcalcifications.
The conclusions based on the several tested scenarios represent a new perspective in breast cancer diagnosis using data mining techniques.
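    Both texture descriptors named in the abstract are short to compute. This is a minimal sketch under stated assumptions: the image is already quantised to `levels` gray levels, only one horizontal offset/direction is used, and Haralick contrast and short-run emphasis serve as example features. The abstract does not state the offsets, quantisation or statistics actually used, and the function names are hypothetical. The resulting feature values would then be fed to classifiers such as those listed above.

```python
import numpy as np

def glcm(img: np.ndarray, levels: int, dy: int = 0, dx: int = 1) -> np.ndarray:
    """Normalised gray-level co-occurrence matrix for one offset (dy, dx >= 0)."""
    h, w = img.shape
    ref = img[0:h - dy, 0:w - dx]  # reference pixels
    nbr = img[dy:h, dx:w]          # neighbours at the given offset
    m = np.zeros((levels, levels), dtype=float)
    np.add.at(m, (ref.ravel(), nbr.ravel()), 1.0)
    return m / m.sum()

def contrast(p: np.ndarray) -> float:
    """Haralick contrast: sum over (i, j) of P(i, j) * (i - j)^2."""
    i, j = np.indices(p.shape)
    return float((p * (i - j) ** 2).sum())

def glrlm_horizontal(img: np.ndarray, levels: int) -> np.ndarray:
    """Run-length matrix for horizontal runs: entry [g, r - 1] counts runs of
    exactly r consecutive pixels with gray level g."""
    w = img.shape[1]
    m = np.zeros((levels, w), dtype=float)
    for row in img:
        run = 1
        for x in range(1, w + 1):
            if x < w and row[x] == row[x - 1]:
                run += 1
            else:  # run ended (or row ended): record it
                m[row[x - 1], run - 1] += 1
                run = 1
    return m

def short_run_emphasis(rl: np.ndarray) -> float:
    """SRE: run counts weighted by 1 / r^2, divided by the total number of runs."""
    r = np.arange(1, rl.shape[1] + 1)
    return float((rl / r ** 2).sum() / rl.sum())

img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
P = glcm(img, levels=4)
rl = glrlm_horizontal(img, levels=4)
print(round(contrast(P), 4), round(short_run_emphasis(rl), 4))  # 0.5833 0.3264
```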

    Various Deep Learning Techniques Involved In Breast Cancer Mammogram Classification – A Survey

    Breast cancer is one of the most common and rapidly spreading diseases in the world, and most cases occur in women. Breast cancer can be controlled with early detection: early discovery helps manage many cases and lowers the death rate. Numerous studies have been conducted on breast cancer, and machine learning is the approach used most frequently in this research. Many earlier studies relied on machine learning algorithms such as decision trees, KNN, SVM and naive Bayes, each of which performs well in its respective setting. More recently, deep learning has been applied to classify breast cancer. Deep learning addresses some of the limitations of conventional machine learning, and deep learning techniques such as convolutional neural networks, recurrent neural networks and deep belief networks are frequently used in data science. Deep learning algorithms often perform better than conventional machine learning algorithms, extracting the most informative features from the images automatically. Our study employs a CNN to classify the images, since CNNs are the most widely used technique for image classification.
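    The survey does not specify a CNN architecture. To show what one convolutional stage computes, here is a minimal, untrained forward pass: convolution, ReLU, 2x2 max pooling, then a dense softmax layer over two classes. The kernels and weights are random placeholders, and the function names are hypothetical. Real systems use a deep-learning framework, many layers and trained weights.

```python
import numpy as np

def conv2d_valid(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """'Valid' 2-D cross-correlation of one channel (what CNN layers compute)."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def max_pool2(x: np.ndarray) -> np.ndarray:
    """2x2 max pooling with stride 2 (an odd trailing row/column is dropped)."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def tiny_cnn(img, kernels, w_fc, b_fc):
    """conv -> ReLU -> max-pool per kernel, flatten, then a dense softmax layer."""
    maps = [max_pool2(np.maximum(conv2d_valid(img, k), 0.0)) for k in kernels]
    feats = np.concatenate([m.ravel() for m in maps])
    return softmax(w_fc @ feats + b_fc)

rng = np.random.default_rng(0)
img = rng.random((28, 28))                   # placeholder image patch
kernels = rng.normal(size=(4, 3, 3))         # 4 untrained 3x3 filters
n_feats = 4 * 13 * 13                        # 28 - 3 + 1 = 26, pooled to 13
w_fc = rng.normal(size=(2, n_feats)) * 0.01  # 2 classes, e.g. benign/malignant
b_fc = np.zeros(2)
probs = tiny_cnn(img, kernels, w_fc, b_fc)
print(probs.shape, round(float(probs.sum()), 6))  # (2,) 1.0
```

Training would adjust `kernels`, `w_fc` and `b_fc` by backpropagation, so the filters learn which image features to extract rather than relying on handcrafted ones.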