148 research outputs found

    An Evaluation of the Wisconsin Breast Cancer Dataset using Ensemble Classifiers and RFE Feature Selection Technique

    Breast cancer is one of the deadliest diseases, causing a high number of deaths annually. It is the most common type of cancer and the main cause of death among women worldwide. Machine learning (ML) is an effective way to classify data, especially in the medical field. It is widely used for classification and analysis to support decisions. In this paper, a performance comparison between two ensemble ML classifiers, Random Forest (RF) and eXtreme Gradient Boosting (XGBoost), on the Wisconsin Breast Cancer Dataset (WBCD) is conducted. The main objective of this study is to assess the correctness of the classifiers with respect to their efficiency and effectiveness in classifying the dataset. This was done using both the full feature set and reduced feature sets generated with the Recursive Feature Elimination (RFE) feature selection technique. Four metrics were used to evaluate the classifiers: Accuracy, Precision, Recall and F1-Score. All experiments were conducted in Python, using Jupyter Notebook within the Anaconda environment. Experimental results show that XGBoost with 5 features selected by RFE gives the highest accuracy (99.02%) with lowest error rate
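
The four metrics named in the abstract above all derive from the counts of a binary confusion matrix. The following is a minimal sketch in plain Python, not the authors' code: the function names and the toy labels are illustrative assumptions, and the positive class (for example, malignant) is assumed to be encoded as `1`.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class ConfusionCounts:
    tp: int  # positives predicted as positive
    fp: int  # negatives predicted as positive
    fn: int  # positives predicted as negative
    tn: int  # negatives predicted as negative


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> ConfusionCounts:
    """Count the four outcomes of a binary classifier."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp, fp, fn, tn)


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Accuracy, precision, recall and F1; undefined ratios are reported as 0.0."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(classification_metrics(confusion_counts(y_true, y_pred)))
```

The error rate mentioned in the abstract is simply one minus the accuracy. Reporting 0.0 when a denominator is zero is a design choice of this sketch; other tools may raise a warning or return NaN instead.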

    Hierarchical Classification System for Breast Cancer Specimen Report (HCSBC) -- an end-to-end model for characterizing severity and diagnosis

    Automated classification of cancer pathology reports can extract information from unstructured reports and categorize each report into structured diagnosis and severity categories. Such a system can reduce the burden of populating tumor registries, help with registration for clinical trials, and support the development of large datasets for deep learning model development using true pathologic ground truth. However, the content of breast pathology reports can be difficult to categorize due to the high linguistic variability in content and the wide variety of potential diagnoses (>50). Existing NLP models primarily focus on developing classifiers for primary breast cancer types (e.g. IDC, DCIS, ILC) and tumor characteristics, and ignore the rare diagnoses of cancer subtypes. We therefore developed a hierarchical hybrid transformer-based pipeline (59 labels), the Hierarchical Classification System for Breast Cancer Specimen Report (HCSBC), which utilizes the potential of the transformer context-preserving NLP technique, and compared our model to several state-of-the-art ML and DL models. We trained the model on the EUH data and evaluated our model's performance on two external datasets, MGH and Mayo Clinic. We publicly release the code and a live application under Huggingface spaces repositor
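
The abstract above does not describe the internals of the HCSBC pipeline, so the following is only a generic sketch of hierarchical routing, the idea that a coarse prediction selects which fine-grained classifier runs next. The class name, the group names and the keyword rules are all hypothetical stand-ins; the published system uses transformer models, not keyword matching.

```python
from typing import Callable, Dict, Tuple

# A classifier here is any function from report text to a label.
Classifier = Callable[[str], str]


class HierarchicalClassifier:
    """Two-stage routing: a coarse model picks a group, then a
    group-specific model picks the final label."""

    def __init__(self, coarse: Classifier, fine: Dict[str, Classifier]):
        self.coarse = coarse
        self.fine = fine

    def predict(self, text: str) -> Tuple[str, str]:
        group = self.coarse(text)
        fine_model = self.fine.get(group)
        # Fall back to the coarse group when no fine model is registered.
        label = fine_model(text) if fine_model else group
        return group, label


# Hypothetical keyword rules standing in for trained models.
def toy_coarse(text: str) -> str:
    return "carcinoma" if "carcinoma" in text.lower() else "other"


def toy_carcinoma_subtype(text: str) -> str:
    t = text.lower()
    if "invasive ductal" in t:
        return "IDC"
    if "ductal carcinoma in situ" in t:
        return "DCIS"
    if "invasive lobular" in t:
        return "ILC"
    return "carcinoma (unspecified)"


clf = HierarchicalClassifier(toy_coarse, {"carcinoma": toy_carcinoma_subtype})

if __name__ == "__main__":
    print(clf.predict("Invasive ductal carcinoma, grade 2"))
    print(clf.predict("Fibroadenoma, no atypia"))
```

One motivation for this shape is that each fine model only has to separate labels within its own group, which can help when some labels are rare.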

    Applications of machine and deep learning to thyroid cytology and histopathology: a review.

    This review synthesises past research into how machine and deep learning can improve the cyto- and histopathology processing pipelines for thyroid cancer diagnosis. The current gold-standard preoperative technique of fine-needle aspiration cytology has high interobserver variability, often returns indeterminate samples and cannot reliably identify some pathologies; histopathology analysis addresses these issues to an extent, but it requires surgical resection of the suspicious lesions so cannot influence preoperative decisions. Motivated by these issues, as well as by the chronic shortage of trained pathologists, much research has been conducted into how artificial intelligence could improve current pipelines and reduce the pressure on clinicians. Many past studies have indicated the significant potential of automated image analysis in classifying thyroid lesions, particularly for those of papillary thyroid carcinoma, but these have generally been retrospective, so questions remain about both the practical efficacy of these automated tools and the realities of integrating them into clinical workflows. Furthermore, the nature of thyroid lesion classification is significantly more nuanced in practice than many current studies have addressed, and this, along with the heterogeneous nature of processing pipelines in different laboratories, means that no solution has proven itself robust enough for clinical adoption. There are, therefore, multiple avenues for future research: examine the practical implementation of these algorithms as pathologist decision-support systems; improve interpretability, which is necessary for developing trust with clinicians and regulators; and investigate multiclassification on diverse multicentre datasets, aiming for methods that demonstrate high performance in a process- and equipment-agnostic manner

    Implementing decision tree-based algorithms in medical diagnostic decision support systems

    As a branch of healthcare, medical diagnosis can be defined as identifying a disease based on the signs and symptoms of the patient. To this end, the required information is gathered from different sources such as physical examination, medical history and general information about the patient. The development of smart classification models for medical diagnosis is of great interest among researchers, mainly because machine learning and data mining algorithms are capable of detecting hidden trends between the features of a database. Hence, classifying medical datasets using smart techniques paves the way to designing more efficient medical diagnostic decision support systems. Several databases have been provided in the literature to investigate different aspects of diseases. As an alternative to the available diagnosis tools and methods, this research uses the machine learning algorithms Classification and Regression Tree (CART), Random Forest (RF) and Extremely Randomized Trees or Extra Trees (ET) to develop classification models that can be implemented in computer-aided diagnosis systems. As a decision tree (DT), CART is fast to create, and it applies to both quantitative and qualitative data. For classification problems, RF and ET combine a number of weak learners like CART. We employed the Wisconsin Breast Cancer Database (WBCD), the Z-Alizadeh Sani dataset for coronary artery disease (CAD) and the databanks gathered in Ghaem Hospital’s dermatology clinic on the response of patients with common and/or plantar warts to cryotherapy and/or immunotherapy. To classify the breast cancer type based on the WBCD, the RF and ET methods were employed. It was found that the developed RF and ET models predict the WBCD type with 100% accuracy in all cases. To choose the proper treatment approach for warts, as well as for CAD diagnosis, the CART methodology was employed. 
The findings of the error analysis revealed that the proposed CART models for the applications of interest attain the highest precision, and no model in the literature rivals them. The outcome of this study supports the idea that methods like CART, RF and ET not only improve diagnostic precision, but also reduce the time and expense needed to reach a diagnosis. However, since these strategies are highly sensitive to the quality and quantity of the input data, more extensive databases with a greater number of independent parameters might be required for further practical application of the developed models
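
For readers unfamiliar with CART, the core step when growing a classification tree is choosing the split that minimises an impurity measure, commonly the Gini index. The sketch below shows that single step for one numeric feature in plain Python. It is not the authors' implementation: the function names and the toy measurements are illustrative assumptions.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values: Sequence[float],
               labels: Sequence[Hashable]) -> Tuple[Optional[float], float]:
    """Scan midpoints between sorted feature values and return the
    (threshold, weighted Gini) of the best split 'value <= threshold'.
    The threshold is None when no split improves on the unsplit node."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best_threshold, best_score = None, gini(labels)
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold can separate identical values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = (low + high) / 2, score
    return best_threshold, best_score


if __name__ == "__main__":
    # Toy feature values and labels, invented for illustration only.
    sizes = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    classes = ["benign"] * 3 + ["malignant"] * 3
    print(best_split(sizes, classes))
```

A full CART learner repeats this search over every feature and recurses on the two resulting subsets. Random Forest trains many such trees on bootstrap samples with random feature subsets, and Extra Trees additionally draws split thresholds at random rather than scanning for the best one.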

    FTIR Spectroscopy for cancer diagnosis. How can glass substrates be used to bring it closer to clinical practice?

    Cancer incidence rates are increasing worldwide, including in the UK. An increase in cancer cases puts further pressure on pathology departments that are often already struggling to meet targets to diagnose cancers in a timely manner. Delays in diagnosis delay treatment and lead to worse patient outcomes. Current diagnostic methods for cancer rely on cytological/histological staining of biopsies, and a diagnosis is made in a subjective manner by a pathologist. These methods are time-consuming and require great expertise. New diagnostic methods are needed to help relieve pressures on pathology departments. There is a consensus that vibrational spectroscopy techniques have the potential to aid in cancer diagnostics. Despite a growing body of research demonstrating how vibrational spectroscopy methods could be utilised for clinical diagnostics, there have been several barriers to the translation of such methods. The research in this thesis aims to investigate and demonstrate methodologies to utilise modes of infrared spectroscopy with glass substrates for lung and breast cancer diagnostics. One of the major barriers to the use of infrared spectroscopy in cancer diagnostics is the expense and difficulty of procuring conventional substrates. This thesis aimed to investigate a methodology that uses glass coverslip substrates for the classification of lung and breast cancer cells using IR spectroscopy. Glass coverslips were used because of their affordability and accessibility, an important consideration for the translation of diagnostic methods. In vitro cancer cell lines and cell lines derived from healthy tissue were used as models to test the feasibility of the proposed methods. This research first investigated a sample preparation method for cytology samples to be analysed with FTIR spectroscopy. 
The next sections demonstrated that the proposed method could be used to classify lung and breast cancer cells from non-malignant cells in vitro using FTIR spectroscopy and a random forest classifier. The methodology was next used to demonstrate how FTIR spectroscopy could be used to distinguish individual lung cancer cells from leukocytes in mixed samples. This is the first time this has been demonstrated. Finally, a related IR spectroscopy technique, O-PTIR spectroscopy, was investigated for how it could be used with glass slides for the classification of lung cancer cells from non-malignant cells. The research in this thesis has demonstrated that glass substrates are viable for the classification of lung and breast cancer cells with high accuracy using sample preparation methods that are commonplace in pathology laboratories for current diagnostic procedures

    Breast Cancer Detection by Extracting and Selecting Features Using Machine Learning

    Breast cancer is a significant cause of female death worldwide, especially in developing countries. For better results and higher survival rates, early diagnosis and screening are crucial. Machine learning (ML) methods can aid in the initial discovery and diagnosis of breast cancer by choosing the most informative elements from medical data and eliminating irrelevant ones. Feature extraction involves taking unstructured data and extracting a representative set of characteristics that may be used to classify or forecast data. The aim is to decrease the dimensionality of the feature space while maintaining or even improving the accuracy of the ML model. An artificial intelligence model is developed on the given features to categorize mammography images into benign and malignant groups. Different supervised learning techniques, including support vector machines, random forests, and artificial neural networks, are employed and compared in order to select the best-performing model. This research offers a comprehensive framework for utilizing machine learning methods to detect breast cancer. It demonstrates how the technique might assist radiologists in the early detection of breast cancer by effectively extracting and selecting critical characteristics, which could improve patient outcomes and potentially save lives