459 research outputs found

    "Breast Cancer Prediction using Machine Learning Models"

    Breast cancer is a type of cancer that develops in the cells of the breast. Treatment for breast cancer usually involves X-ray, chemotherapy, or a combination of both treatments. Detecting cancer at an early stage can save a person's life. Artificial intelligence (AI) plays a very important role in this area. Nevertheless, predicting breast cancer remains a very challenging issue for clinicians and researchers. This work aims to predict the probability of breast cancer in patients using machine learning (ML) models: Multilayer Perceptron (MLP), K-Nearest Neighbors (KNN), AdaBoost (AB), Bagging, Gradient Boosting (GB), and Random Forest (RF). The breast cancer diagnostic medical dataset from the Wisconsin repository was used. The dataset includes 569 observations and 32 features. Following the data analysis methodology, data cleaning, exploratory analysis, training, testing, and validation were performed. The performance of the models was evaluated with the following metrics: classification accuracy, specificity, sensitivity, F1 score, and precision. The training and results indicate that the six trained models can provide optimal classification and prediction results. The RF, GB, and AB models achieved 100% accuracy, outperforming the other models. Therefore, the suggested models for breast cancer identification, classification, and prediction are RF, GB, and AB. The Bagging, KNN, and MLP models achieved 99.56%, 95.82%, and 96.92%, respectively, which is also close to 100%. Finally, the results show a clear advantage of the RF, GB, and AB models, as they achieve more accurate results in breast cancer prediction.
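The five evaluation metrics named in this abstract can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch, not the authors' code: the function name `binary_metrics`, the label encoding (1 for the positive class), and the toy labels are assumptions made for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, specificity, precision, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Return 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


if __name__ == "__main__":
    # Hypothetical labels, chosen only to exercise every confusion-matrix cell.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    for name, value in binary_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters on a dataset like this one, because accuracy alone can hide errors on the less frequent class.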

    Class prediction for high-dimensional class-imbalanced data

    Background: The goal of class prediction studies is to develop rules to accurately predict the class membership of new samples. The rules are derived using the values of the variables available for each subject: the main characteristic of high-dimensional data is that the number of variables greatly exceeds the number of samples. Frequently the classifiers are developed using class-imbalanced data, i.e., data sets where the number of samples in each class is not equal. Standard classification methods used on class-imbalanced data often produce classifiers that do not accurately predict the minority class; the prediction is biased towards the majority class. In this paper we investigate whether high dimensionality poses additional challenges when dealing with class-imbalanced prediction. We evaluate the performance of six types of classifiers on class-imbalanced data, using simulated data and a publicly available data set from a breast cancer gene-expression microarray study. We also investigate the effectiveness of some strategies that are available to overcome the effect of class imbalance. Results: Our results show that the evaluated classifiers are highly sensitive to class imbalance and that variable selection introduces an additional bias towards classification into the majority class. Most new samples are assigned to the majority class from the training set, unless the difference between the classes is very large. As a consequence, the class-specific predictive accuracies differ considerably. When the class imbalance is not too severe, down-sizing and asymmetric bagging embedding variable selection work well, while over-sampling does not. Variable normalization can further worsen the performance of the classifiers. Conclusions: Our results show that matching the prevalence of the classes in training and test set does not guarantee good performance of classifiers and that the problems related to classification with class-imbalanced data are exacerbated when dealing with high-dimensional data. Researchers using class-imbalanced data should be careful in assessing the predictive accuracy of the classifiers and, unless the class imbalance is mild, they should always use an appropriate method for dealing with the class imbalance problem.
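Down-sizing, one of the strategies mentioned in this abstract, is commonly understood as randomly discarding majority-class samples until the classes are balanced. The sketch below illustrates that reading only; it is not the paper's implementation, and the function name `downsize` and the fixed seed are assumptions.

```python
import random
from collections import defaultdict


def downsize(samples, labels, seed=0):
    """Randomly drop samples so every class is as large as the smallest class."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # The smallest class sets the number of samples kept per class.
    target = min(len(indices) for indices in by_class.values())

    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, target))
    kept.sort()  # preserve the original sample order

    return [samples[i] for i in kept], [labels[i] for i in kept]


if __name__ == "__main__":
    # Hypothetical data: eight majority-class samples and two minority-class samples.
    samples = list(range(10))
    labels = [0] * 8 + [1] * 2
    balanced_samples, balanced_labels = downsize(samples, labels)
    print(balanced_samples, balanced_labels)
```

Under this reading, down-sizing must be applied to the training set only; the test set keeps its original class proportions so that class-specific accuracies stay meaningful.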

    How Dual-Energy Contrast-Enhanced Spectral Mammography Can Provide Useful Clinical Information About Prognostic Factors in Breast Cancer Patients: A Systematic Review of Literature

    Introduction: In the past decade, a new technique derived from full-field digital mammography has been developed, named contrast-enhanced spectral mammography (CESM). The aim of this study was to define the association between CESM findings and usual prognostic factors, such as estrogen receptors, progesterone receptors, HER2, and Ki67, in order to offer an updated overview of the state of the art for the early differential diagnosis of breast cancer and following personalized treatments. Materials and methods: According to the PRISMA guidelines, two electronic databases (PubMed and Scopus) were investigated, using the following keywords: breast cancer AND (CESM OR contrast enhanced spectral mammography OR contrast enhanced dual energy mammography) AND (receptors OR prognostic factors OR HER2 OR progesterone OR estrogen OR Ki67). The search was concluded in August 2021. No restriction was applied to publication dates. Results: We obtained 28 articles from the research in PubMed and 114 articles from Scopus. After the removal of six replicas that were counted only once, out of 136 articles, 37 articles were reviews. Eight articles alone have tackled the relation between CESM imaging and ER, PR, HER2, and Ki67. When comparing radiological characterization of the lesions obtained by either CESM or contrast-enhanced MRI, they have a similar association with the proliferation of tumoral cells, as expressed by Ki-67. In CESM-enhanced lesions, the expression was found to be 100% for ER and 77.4% for PR, while moderate or high HER2 positivity was found in lesions with non-mass enhancement and with mass closely associated with a non-mass enhancement component. Conversely, the non-enhancing breast cancer lesions were not associated with any prognostic factor, such as ER, PR, HER2, and Ki67, which may be associated with the probability of showing enhancement. 
Radiomics on CESM images has the potential for non-invasive characterization of potentially heterogeneous tumors with different hormone receptor status. Conclusions: CESM enhancement is associated with the proliferation of tumoral cells, as well as with the expression of estrogen and progesterone receptors. As CESM is a relatively young imaging technique, only a few related works were found; this may be due to the "off-label" modality. In the next few years, the role of CESM in breast cancer diagnostics will be more thoroughly investigated.

    Benchmarking weakly-supervised deep learning pipelines for whole slide classification in computational pathology.

    Artificial intelligence (AI) can extract visual information from histopathological slides and yield biological insight and clinical biomarkers. Whole slide images are cut into thousands of tiles and classification problems are often weakly-supervised: the ground truth is only known for the slide, not for every single tile. In classical weakly-supervised analysis pipelines, all tiles inherit the slide label while in multiple-instance learning (MIL), only bags of tiles inherit the label. However, it is still unclear how these widely used but markedly different approaches perform relative to each other. We implemented and systematically compared six methods in six clinically relevant end-to-end prediction tasks using data from N=2980 patients for training with rigorous external validation. We tested three classical weakly-supervised approaches with convolutional neural networks and vision transformers (ViT) and three MIL-based approaches with and without an additional attention module. Our results empirically demonstrate that histological tumor subtyping of renal cell carcinoma is an easy task in which all approaches achieve an area under the receiver operating curve (AUROC) of above 0.9. In contrast, we report significant performance differences for clinically relevant tasks of mutation prediction in colorectal, gastric, and bladder cancer. In these mutation prediction tasks, classical weakly-supervised workflows outperformed MIL-based weakly-supervised methods for mutation prediction, which is surprising given their simplicity. This shows that new end-to-end image analysis pipelines in computational pathology should be compared to classical weakly-supervised methods. Also, these findings motivate the development of new methods which combine the elegant assumptions of MIL with the empirically observed higher performance of classical weakly-supervised approaches. 
We make all source code publicly available at https://github.com/KatherLab/HIA, allowing easy application of all methods to any similar task.
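The difference between the two families of approaches described in this abstract lies in how the slide-level label reaches the training examples. The sketch below illustrates only that labeling step and is not taken from the HIA repository: the function names, the `(tiles, label)` slide representation, and the fixed-size random bags are all assumptions.

```python
import random


def tile_level_pairs(slides):
    """Classical weak supervision: every tile inherits the label of its slide."""
    return [(tile, label) for tiles, label in slides for tile in tiles]


def bag_level_pairs(slides, bag_size, seed=0):
    """Multiple-instance learning: tiles are grouped into bags, and only the bag is labeled."""
    rng = random.Random(seed)
    bags = []
    for tiles, label in slides:
        shuffled = list(tiles)
        rng.shuffle(shuffled)
        # Split the shuffled tiles of one slide into consecutive bags.
        for start in range(0, len(shuffled), bag_size):
            bags.append((shuffled[start:start + bag_size], label))
    return bags


if __name__ == "__main__":
    # Hypothetical slides: tile identifiers stand in for image tiles.
    slides = [(["a1", "a2", "a3"], 1), (["b1", "b2"], 0)]
    print(tile_level_pairs(slides))
    print(bag_level_pairs(slides, bag_size=2))
```

In the tile-level variant a model is trained on individual tiles, some of which may not show the labeled property at all; in the bag-level variant a model sees a set of tiles at once and only the set as a whole is assumed to carry the label.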

    Analyzing the breast tissue in mammograms using deep learning

    Mammographic breast density (MBD) reflects the amount of fibroglandular breast tissue area that appears white and bright on mammograms, commonly referred to as breast percent density (PD%). MBD is a risk factor for breast cancer and a risk factor for masking tumors. However, accurate MBD estimation by visual assessment remains a challenge because of faint contrast and significant variations in background fatty tissue in mammograms. In addition, correctly interpreting mammograms requires highly trained medical experts: it is difficult, time-consuming, expensive, and error-prone. Moreover, dense breast tissue can make breast cancer harder to identify and is associated with an increased risk of breast cancer. For example, it has been reported that women with high breast density have a four- to six-fold greater risk of developing the disease than women with low breast density. The key to breast density computation and breast density classification is to correctly detect the dense tissue in mammographic images. Many methods have been proposed for breast density estimation; however, most are not automated. They have also been badly affected by low signal-to-noise ratio and by variability of density in appearance and texture. A computer-aided diagnosis (CAD) system that assists the doctor in analyzing and diagnosing it automatically would be more helpful. Current developments in deep learning methods motivate us to improve current breast density analysis systems. The main focus of the present thesis is to develop a system that automates breast density analysis, namely breast density segmentation (BDS), breast density percentage (BDP), and breast density classification (BDC), using deep learning techniques, and to apply it to temporal mammograms after treatment in order to analyze breast density changes and identify at-risk and suspicious patients.
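Once the dense tissue has been segmented, a percent density value follows from a simple area ratio. The sketch below assumes the usual definition, dense area divided by total breast area, and assumes both segmentations are available as binary masks of equal shape; the function name `percent_density` and the toy masks are illustrative and do not come from the thesis.

```python
def percent_density(breast_mask, dense_mask):
    """Return the share of breast pixels that are dense, as a percentage.

    Both masks are 2D lists of 0/1 values with the same shape. Dense pixels
    that fall outside the breast mask are ignored.
    """
    breast_pixels = 0
    dense_pixels = 0
    for breast_row, dense_row in zip(breast_mask, dense_mask):
        for in_breast, is_dense in zip(breast_row, dense_row):
            if in_breast:
                breast_pixels += 1
                if is_dense:
                    dense_pixels += 1
    if breast_pixels == 0:
        raise ValueError("breast mask is empty")
    return 100.0 * dense_pixels / breast_pixels


if __name__ == "__main__":
    # Hypothetical 2x4 masks: the first column is background.
    breast_mask = [[0, 1, 1, 1],
                   [0, 1, 1, 1]]
    dense_mask = [[0, 0, 1, 1],
                  [1, 0, 0, 1]]
    print(f"PD% = {percent_density(breast_mask, dense_mask):.1f}")
```

A density class could then be assigned by thresholding this percentage, although the thresholds themselves would need to come from a clinical standard rather than from this sketch.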

    Data-Driven Deep Learning-Based Analysis on THz Imaging

    Breast cancer affects about 12.5% of women in the United States. Surgery is often needed after diagnosis. Breast-conserving surgery can remove malignant tumors while preserving as much healthy tissue as possible. Because effective real-time tumor analysis tools and a unified operating standard are lacking, the re-excision rate can exceed 30% among breast-conserving surgery patients. This places significant physical, physiological, and financial burdens on those patients. This work designs deep learning-based segmentation algorithms that detect tissue type in excised tissue using pulsed THz technology, and evaluates the algorithms on a tissue type classification task among freshly excised tumor samples. Freshly excised tumor samples are more challenging than their formalin-fixed, paraffin-embedded (FFPE) block counterparts because of excess fluid, image registration difficulties, and the lack of trustworthy pixelwise labels for each tissue sample. Evaluating freshly excised tumor samples is also important because it points towards applying pulsed THz scanning to breast-conserving cancer surgery in the operating room. Deep learning techniques have been heavily researched in recent years as GPU-based computing power has become cheaper and stronger. This dissertation revisits problems related to breast cancer tissue segmentation using pulsed terahertz wave scanning of murine samples and applies recent deep learning frameworks to improve performance on various tasks. This study first performs pixelwise classification on terahertz scans with CNN-based neural networks and time-frequency feature tensors obtained by wavelet transformation. It then explores a neural network-based semantic segmentation strategy for terahertz scans that takes spatial information into account and handles noisy labels with label correction techniques. Additionally, this study performs resolution restoration for visual enhancement of terahertz scans using an unsupervised, generative image-to-image translation methodology. This work also proposes a novel data processing pipeline that trains a semantic segmentation network using only neurally generated synthetic terahertz scans. Performance is evaluated with various metrics across the different tasks.

    Multi-Objective Optimization Based Image Segmentation: Method and Applications

    Master's, Master of Engineering

    Added benefits of computer-assisted analysis of Hematoxylin-Eosin stained breast histopathological digital slides

    This thesis aims to determine whether computer-assisted analysis can be used to better understand pathologists' perception of mitotic figures on Hematoxylin-Eosin (HE) stained breast histopathological digital slides. It also explores the feasibility of reproducible histologic nuclear atypia scoring by combining computer-assisted analysis with the cytological scores given by a pathologist. In addition, this thesis investigates the possibility of computer-assisted diagnosis for categorizing HE breast images into different subtypes of cancer or benign masses. In the first study, a data set of 453 mitoses and 265 miscounted non-mitoses within breast cancer digital slides was considered. Different features were extracted from the objects in different channels of eight colour spaces. The findings of the first study suggested that computer-aided image analysis can provide a better understanding of the image features related to discrepancies among pathologists in recognizing mitoses. Two tasks performed routinely by pathologists are making a diagnosis and grading the breast cancer. In the second study, a new tool for reproducible nuclear atypia scoring in breast cancer histological images was proposed. The third study proposed and tested MuDeRN (MUlti-category classification of breast histopathological image using DEep Residual Networks), a framework that classifies hematoxylin-eosin stained breast digital slides as either benign or cancer and then categorizes the cancer and benign cases into four different subtypes each. The studies indicated that computer-assisted analysis can aid in both nuclear grading (COMPASS) and breast cancer diagnosis (MuDeRN). The results could be used to improve breast cancer prognosis estimation by reducing inter-pathologist disagreement in counting mitotic figures and by making nuclear grading reproducible. They could also provide the pathologist with a second opinion when making a diagnosis.