    Applying novel machine learning technology to optimize computer-aided detection and diagnosis of medical images

    The purpose of developing Computer-Aided Detection (CAD) schemes is to assist physicians (i.e., radiologists) in interpreting medical imaging findings more accurately and in reducing inter-reader variability. In developing CAD schemes, Machine Learning (ML) plays an essential role because it is widely used to identify effective image features from complex datasets and to integrate them optimally with classifiers, with the aim of helping clinicians detect early disease, classify disease types, and predict treatment outcomes more accurately. In my dissertation, I assess in several studies the feasibility of developing novel CAD systems for different medical imaging purposes. The first study aims to develop and evaluate a new computer-aided diagnosis (CADx) scheme based on analysis of global mammographic image features to predict the likelihood of cases being malignant. The CADx scheme pre-processes mammograms, generates two image maps in the frequency domain using the discrete cosine transform and the fast Fourier transform, computes bilateral image feature differences between the left and right breasts, and applies a support vector machine (SVM) to predict the likelihood of the case being malignant. This study demonstrates the feasibility of developing a new high-performance CADx scheme for mammograms based on global image feature analysis. This new CADx approach is more efficient to develop and potentially more robust in future applications because it avoids the difficulty of, and possible errors in, breast lesion segmentation. In the second study, to automatically identify a set of effective mammographic image features and build an optimal breast cancer risk stratification model, I investigate the advantages of applying a machine learning approach embedded with a locality preserving projection (LPP) based feature combination and regeneration algorithm to predict short-term breast cancer risk. 
To this end, a computer-aided image processing scheme is applied to segment fibro-glandular tissue depicted on mammograms and to initially compute 44 features related to the bilateral asymmetry of mammographic tissue density distribution between the left and right breasts. Next, an embedded LPP algorithm optimizes the feature space and regenerates a new operational vector with 4 features using a maximal variance approach. This study demonstrates that applying the LPP algorithm effectively reduces feature dimensionality and yields higher and potentially more robust performance in predicting short-term breast cancer risk. In the third study, to classify malignant lesions more precisely, I investigate the feasibility of applying a random projection algorithm to build an optimal feature vector from the large feature pool initially generated by the CAD scheme and thereby improve the performance of the machine learning model. In this process, a CAD scheme is first applied to segment mass regions and initially compute 181 features. An SVM model embedded with the feature dimensionality reduction method is then built to predict the likelihood of lesions being malignant. This study demonstrates that the random projection algorithm is a promising method for generating optimal feature vectors that improve the performance of machine learning models of medical images. The last study aims to develop and test a new CAD scheme of chest X-ray images to detect coronavirus (COVID-19) infected pneumonia. To this end, the CAD scheme first applies two image preprocessing steps: it removes the majority of the diaphragm regions, and it processes the original image using a histogram equalization algorithm and a bilateral low-pass filter. Then, the original image and the two filtered images are used to form a pseudo color image. 
This image is fed into the three input channels of a transfer learning-based convolutional neural network (CNN) model to classify chest X-ray images into 3 classes: COVID-19 infected pneumonia, other community-acquired non-COVID-19 infected pneumonia, and normal (non-pneumonia) cases. This study demonstrates that adding two image preprocessing steps and generating a pseudo color image play an essential role in developing a deep learning CAD scheme of chest X-ray images that improves accuracy in detecting COVID-19 infected pneumonia. In summary, across these studies I developed and presented several image pre-processing algorithms, feature extraction methods, and data optimization techniques as innovative approaches to building quantitative imaging markers based on machine learning systems. The simulations and results of these studies show the discriminative performance of the proposed CAD schemes in different application fields, which may help radiologists in their diagnostic assessments and improve their overall performance.
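
    The pseudo color image described in the last study can be illustrated with a minimal sketch. It assumes an 8-bit grayscale image stored as a nested Python list, and it makes two simplifications that are not taken from the dissertation: a 3x3 mean filter stands in for the bilateral low-pass filter, and the channel order (original, equalized, smoothed) is chosen arbitrarily. The function names are hypothetical, and diaphragm removal is not shown.

```python
def equalize_histogram(image, levels=256):
    """Classic histogram equalization for a 2-D list of integer gray levels."""
    flat = [v for row in image for v in row]
    hist = [0] * levels
    for v in flat:
        hist[v] += 1
    # Cumulative distribution of the gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(flat)
    if total == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in image]
    scale = (levels - 1) / (total - cdf_min)
    return [[round((cdf[v] - cdf_min) * scale) for v in row] for row in image]


def mean_filter_3x3(image):
    """Simple low-pass filter (assumed stand-in for the bilateral filter).

    Border pixels are averaged over the part of the window inside the image.
    """
    h, w = len(image), len(image[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [image[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            row.append(round(sum(window) / len(window)))
        out.append(row)
    return out


def make_pseudo_color(image):
    """Stack original, equalized and smoothed images into 3-channel pixels."""
    equalized = equalize_histogram(image)
    smoothed = mean_filter_3x3(image)
    return [[(image[r][c], equalized[r][c], smoothed[r][c])
             for c in range(len(image[0]))]
            for r in range(len(image))]
```

    Each output pixel is a 3-tuple, so the result has the three-channel layout that a CNN pretrained on color images expects as input.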


    In this volume, the topics are drawn from a variety of areas: the basics of mammography systems, optimization of screening mammography with reference to evidence-based research, new technologies for image acquisition and its surrounding systems, and case reports with reference to up-to-date multimodality images of breast cancer. Mammography has lagged in the transition to digital imaging systems because of the high resolution required for diagnosis. However, in the past ten years, technical improvements have resolved these difficulties and given rise to new diagnostic systems. We hope that the reader will learn the essentials of mammography and look forward to the new technologies. We want to express our sincere gratitude and appreciation to all the co-authors who have contributed their work to this volume.

    Matching of Mammographic Lesions in Different Breast Projections

    Of all cancers, breast cancer is the most lethal among women. It has been shown that breast cancer screening programs can decrease mortality, since early detection increases the chances of survival. Usually, a pair of radiologists interpret the screening mammograms; however, the process is long and exhausting. This has encouraged the development of computer-aided diagnosis (CADx) systems to replace the second radiologist, making better use of human experts' time. But CADx systems are associated with high false positive rates, since most of them use only one view (craniocaudal or mediolateral oblique) of the screening mammogram. Radiologists, on the other hand, use both views, frequently basing their diagnosis on noticeable differences between the two views. When considering both projections of a mammogram, lesion matching is a necessary step to perform diagnosis. However, this is a complex task, since there might be several lesion candidates on each projection to match. In this work, a matching system is proposed. The system is a cascade of three blocks: candidate detector, feature extraction, and lesion matching. 
The first is a replication of Ribli et al.'s Faster R-CNN, and its purpose is to find possible lesion candidates. The second extracts a feature vector for each candidate, using either the candidate detector's backbone, handcrafted features, or a siamese network model trained with the triplet loss to distinguish lesions. The third computes the distance between feature vectors, using some heuristics to exclude likely incorrect candidate pairs, and ranks the distances to match the lesions. This work provides several options for feature extractors and heuristics that can be incorporated into a CADx system based on object detectors. The fact that the models trained with the triplet loss obtained results competitive with the other feature extractors is valuable, since it offers some independence between the detection and matching tasks. "Hard" heuristics and "soft" heuristics are introduced as methods to constrain matching. The system is able to detect matches satisfactorily, since its accuracy (70%-85%) is significantly higher than chance level (30%-40%) for the data used. The "hard" heuristics achieved encouraging results on precision@k because their match and candidate exclusion methods reject a significant number of false positives generated by the object detector.
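
    The third block (distance computation, heuristic filtering, and ranking) can be sketched as follows. This is a minimal illustration under stated assumptions, not the system from the thesis: it uses Euclidean distance, the candidate dictionaries and their `nipple_distance_mm` field are hypothetical, and the nipple-distance rule is only one example of what a "hard" heuristic might look like.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_matches(cc_candidates, mlo_candidates, hard_rule=None):
    """For each CC candidate, rank the MLO candidates by feature distance.

    hard_rule, if given, is a predicate (cc, mlo) -> bool; pairs for which
    it returns False are excluded before ranking (a "hard" heuristic).
    """
    ranking = {}
    for cc in cc_candidates:
        scored = []
        for mlo in mlo_candidates:
            if hard_rule is not None and not hard_rule(cc, mlo):
                continue
            distance = euclidean(cc["features"], mlo["features"])
            scored.append((mlo["id"], distance))
        scored.sort(key=lambda pair: pair[1])  # smallest distance first
        ranking[cc["id"]] = scored
    return ranking


def similar_nipple_distance(cc, mlo, tolerance_mm=10.0):
    """Hypothetical hard heuristic: both candidates lie at a similar
    distance from the nipple in their respective views."""
    gap = abs(cc["nipple_distance_mm"] - mlo["nipple_distance_mm"])
    return gap <= tolerance_mm
```

    The first entry of each ranked list is the proposed match; keeping the whole list makes it possible to compute a metric such as precision@k over the top k candidates.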

    Deep-Learning-Based Computer-Aided Systems for Breast Cancer Imaging: A Critical Review

    This paper provides a critical review of the literature on deep learning applications in breast tumor diagnosis using ultrasound and mammography images. It also summarizes recent advances in computer-aided diagnosis/detection (CAD) systems, which make use of new deep learning methods to automatically recognize breast images and improve the accuracy of diagnoses made by radiologists. This review is based on literature published in the past decade (January 2010-January 2020): we obtained around 250 research articles and, after an eligibility process, present 59 of them in more detail. The main findings in the classification process revealed that new DL-CAD methods are useful and effective screening tools for breast cancer, thus reducing the need for manual feature extraction.

    Developing Novel Computer Aided Diagnosis Schemes for Improved Classification of Mammography Detected Masses

    Mammography imaging is a population-based breast cancer screening tool that has greatly aided in the decrease in breast cancer mortality over time. Although mammography is the most frequently employed breast imaging modality, its performance is often unsatisfactory, with low sensitivity and high false positive rates. This is because reading and interpreting mammography images remains difficult owing to the heterogeneity of breast tumors and dense overlapping fibroglandular tissue. To help overcome these clinical challenges, researchers have made great efforts to develop computer-aided detection and/or diagnosis (CAD) schemes to provide radiologists with decision-making support tools. In this dissertation, I investigate several novel methods for improving the performance of a CAD system in distinguishing between malignant and benign masses. In the first study, we test the hypothesis that handcrafted radiomics features and deep learning features contain complementary information, and that the fusion of these two types of features will therefore increase the feature representation of each mass and improve the performance of the CAD system in distinguishing malignant and benign masses. Regions of interest (ROI) surrounding suspicious masses are extracted and two types of features are computed. The first set consists of 40 radiomic features and the second set includes deep learning (DL) features computed from a pretrained VGG16 network. DL features are extracted from two pseudo color image sets, producing a total of three feature vectors after feature extraction, namely handcrafted, DL-stacked, and DL-pseudo. Linear support vector machines (SVM) are trained using each feature set alone and in combination. Results show that the fusion CAD system significantly outperforms the systems using either feature type alone (AUC = 0.756±0.042, p<0.05). 
This study demonstrates that both handcrafted and DL features contain useful complementary information and that fusion of these two types of features increases the CAD classification performance. In the second study, we expand upon our first study and develop a novel CAD framework that fuses information extracted from ipsilateral views of bilateral mammograms using both DL and radiomics feature extraction methods. Each case in this study is represented by four images, which include the craniocaudal (CC) and mediolateral oblique (MLO) views of the left and right breasts. First, we extract matching ROIs from each of the four views using an ipsilateral matching and bilateral registration scheme to ensure masses are appropriately matched. Next, the handcrafted radiomics features and VGG16 model-generated features are extracted from each ROI, resulting in eight feature vectors. Then, after reducing feature dimensionality and quantifying the bilateral asymmetry, we test four fusion methods. Results show that multi-view CAD systems significantly outperform single-view systems (AUC = 0.876±0.031 vs AUC = 0.817±0.026 for the CC view and 0.792±0.026 for the MLO view, p<0.001). The study demonstrates that the shift from single-view CAD to four-view CAD and the inclusion of both deep transfer learning and radiomics features increase the feature representation of the mass and thus improve CAD performance in distinguishing between malignant and benign breast lesions. In the third study, we build upon the first and second studies and investigate the effects of pseudo color image generation in classifying suspicious mammography-detected breast lesions as malignant or benign using deep transfer learning in a multi-view CAD scheme. Seven pseudo color image sets are created through a combination of the original grayscale image, a histogram equalized image, a bilaterally filtered image, and a segmented mass image. 
Using the multi-view CAD framework developed in the previous study, we observe that the two pseudo color sets created using a segmented mass in one of the three image channels performed significantly better than all other pseudo color sets (AUC = 0.882 and AUC = 0.889, p<0.05 for all comparisons in both cases). The results of this study support our hypothesis that pseudo color images generated with a segmented mass optimize the mammogram image feature representation by providing increased complementary information to the CADx scheme, which results in better performance in classifying suspicious mammography-detected breast lesions as malignant or benign. In summary, each of the studies presented in this dissertation aims to increase the accuracy of a CAD system in classifying suspicious mammography-detected masses. Each of these studies takes a novel approach to increasing the feature representation of the mass that needs to be classified. The results of each study demonstrate the potential utility of these CAD schemes as an aid to radiologists in the clinical workflow.
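
    The AUC values quoted in this abstract can be read as the probability that a randomly chosen malignant case receives a higher classifier score than a randomly chosen benign case, with ties counted as one half. The sketch below computes the AUC directly from that definition; the function name and the example scores are illustrative assumptions, not data from the dissertation.

```python
def auc_from_scores(malignant_scores, benign_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (malignant, benign) pairs in which the malignant
    case scores higher; a tie contributes half a pair.
    """
    if not malignant_scores or not benign_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for m in malignant_scores:
        for b in benign_scores:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant_scores) * len(benign_scores))
```

    An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the single-view and multi-view results above are compared.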

    A Decision Support System (DSS) for Breast Cancer Detection Based on Invariant Feature Extraction, Classification, and Retrieval of Masses of Mammographic Images

    This paper presents an integrated system for breast cancer detection from mammograms based on automated mass detection, classification, and retrieval, with the goal of supporting decision-making by retrieving and displaying relevant past cases as well as predicting whether images are benign or malignant. It is hypothesized that the proposed diagnostic aid would refresh the radiologist's memory and guide them to a precise diagnosis with concrete visualizations, instead of only suggesting a second diagnosis like many other CAD systems. To achieve this goal, a Graph-Based Visual Saliency (GBVS) method is used for automatic mass detection; invariant features are extracted using the Non-Subsampled Contourlet transform (NSCT) and eigenvalues of the Hessian matrix in a histogram of oriented gradients (HOG); and finally classification and retrieval are performed using Support Vector Machines (SVM), Extreme Learning Machines (ELM), and a linear combination-based similarity fusion approach. The image retrieval and classification performances are evaluated and compared on the benchmark Digital Database for Screening Mammography (DDSM) of 2604 cases using both precision-recall and classification accuracy. Experimental results demonstrate the effectiveness of the proposed system and show the viability of a real-time clinical application.
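
    A linear combination-based similarity fusion for retrieval can be sketched as a weighted sum of per-feature similarity scores followed by a ranking of the database cases. The feature names, weights, and scores below are hypothetical; the abstract does not state how the weights are chosen.

```python
def fuse_similarities(similarities, weights):
    """Weighted linear combination of similarity scores.

    similarities maps a feature name to a list of similarity scores, one per
    database case; weights maps the same names to non-negative weights.
    Returns one fused score per database case.
    """
    names = list(similarities)
    n_cases = len(similarities[names[0]])
    if any(len(similarities[name]) != n_cases for name in names):
        raise ValueError("all similarity lists must have the same length")
    return [sum(weights[name] * similarities[name][j] for name in names)
            for j in range(n_cases)]


def rank_cases(fused_scores):
    """Indices of database cases, most similar first."""
    return sorted(range(len(fused_scores)),
                  key=lambda j: fused_scores[j], reverse=True)


# Hypothetical example: similarities of one query to three past cases.
example_similarities = {"nsct": [0.9, 0.2, 0.5], "hog": [0.1, 0.8, 0.6]}
example_weights = {"nsct": 0.7, "hog": 0.3}
example_ranking = rank_cases(
    fuse_similarities(example_similarities, example_weights))
```

    The top-ranked indices would be the past cases retrieved and displayed to the radiologist alongside the predicted class.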

    A Bottom-Up Review of Image Analysis Methods for Suspicious Region Detection in Mammograms

    Breast cancer is one of the most common causes of death among women all over the world. Early detection of breast cancer plays a critical role in increasing the survival rate. Various imaging modalities, such as mammography, breast MRI, ultrasound, and thermography, are used to detect breast cancer. Though mammography has had considerable success in biomedical imaging, detecting suspicious areas remains a challenge: examination is manual, masses vary in shape, size, and other morphological features, and mammography accuracy changes with the density of the breast. Furthermore, analyzing many mammograms per day can be a tedious task for radiologists and practitioners. One of the main objectives of biomedical imaging is to provide radiologists and practitioners with tools to help them identify all suspicious regions in a given image. Computer-aided mass detection in mammograms can serve as a second-opinion tool to help radiologists avoid oversight errors. The scientific community has made much progress on this topic, and several approaches have been proposed along the way. Following a bottom-up narrative, this paper surveys different scientific methodologies and techniques for detecting suspicious regions in mammograms, spanning from methods based on low-level image features to the most recent AI-based approaches. Both theoretical and practical grounds are provided across the paper's sections to highlight the pros and cons of the different methodologies. The paper's main aim is to give readers a comprehensive description of techniques, strategies, and datasets on the topic.