11,269 research outputs found

    Machine learning methods for histopathological image analysis

    Full text link
    Abundant accumulation of digital histopathological images has led to the increased demand for their analysis, such as computer-aided diagnosis using machine learning techniques. However, digital pathological images and related tasks have some issues to be considered. In this mini-review, we introduce the application of digital pathological image analysis using machine learning algorithms, address some problems specific to such analysis, and propose possible solutions.Comment: 23 pages, 4 figure

    Decision Tree Algorithm for Breast Cancer Detection

    Get PDF
    A major form of cancer affecting women around the world is breast cancer. This underscores the importance of early detection for optimal treatment outcomes. This paper addresses the challenge of correctly classifying tumors as malignant or benign in light of the fact that breast cancer is a significant component of cancer cases around the world. As a breast cancer detection algorithm, there are several advantages to using this decision tree algorithm. Decision trees provide insight into the importance of features, which in turn allows for the identification of key factors that contribute to the classification of breast cancer. In addition to that, decision trees are able to deal with both numerical and categorical features, so they are suitable for a variety of breast cancer data sets. It is also important to note that decision trees are less sensitive than other algorithms when it comes to outliers and missing data. To begin with, decision trees provide insight into the importance of features, which allows for the identification of key factors that contribute to the classification of breast cancers. A decision tree can also be used to analyze both numerical and categorical features, making it more versatile for the analysis of breast cancer data in general. The decision tree algorithm, on the other hand, has a lower sensitivity to outliers and missing data than some other algorithms. As a result of utilizing performance metrics to assess the effectiveness of algorithms, it was found that the Decision Tree Algorithm was more effective at detecting breast cancer than other algorithms

    Stratification bias in low signal microarray studies

    Get PDF
    BACKGROUND: When analysing microarray and other small sample size biological datasets, care is needed to avoid various biases. We analyse a form of bias, stratification bias, that can substantially affect analyses using sample-reuse validation techniques and lead to inaccurate results. This bias is due to imperfect stratification of samples in the training and test sets and the dependency between these stratification errors, i.e. the variations in class proportions in the training and test sets are negatively correlated. RESULTS: We show that when estimating the performance of classifiers on low signal datasets (i.e. those which are difficult to classify), which are typical of many prognostic microarray studies, commonly used performance measures can suffer from a substantial negative bias. For error rate this bias is only severe in quite restricted situations, but can be much larger and more frequent when using ranking measures such as the receiver operating characteristic (ROC) curve and area under the ROC (AUC). Substantial biases are shown in simulations and on the van 't Veer breast cancer dataset. The classification error rate can have large negative biases for balanced datasets, whereas the AUC shows substantial pessimistic biases even for imbalanced datasets. In simulation studies using 10-fold cross-validation, AUC values of less than 0.3 can be observed on random datasets rather than the expected 0.5. Further experiments on the van 't Veer breast cancer dataset show these biases exist in practice. CONCLUSION: Stratification bias can substantially affect several performance measures. In computing the AUC, the strategy of pooling the test samples from the various folds of cross-validation can lead to large biases; computing it as the average of per-fold estimates avoids this bias and is thus the recommended approach. As a more general solution applicable to other performance measures, we show that stratified repeated holdout and a modified version of k-fold cross-validation, balanced, stratified cross-validation and balanced leave-one-out cross-validation, avoids the bias. Therefore for model selection and evaluation of microarray and other small biological datasets, these methods should be used and unstratified versions avoided. In particular, the commonly used (unbalanced) leave-one-out cross-validation should not be used to estimate AUC for small datasets

    Artificial Intelligence and Patient-Centered Decision-Making

    Get PDF
    Advanced AI systems are rapidly making their way into medical research and practice, and, arguably, it is only a matter of time before they will surpass human practitioners in terms of accuracy, reliability, and knowledge. If this is true, practitioners will have a prima facie epistemic and professional obligation to align their medical verdicts with those of advanced AI systems. However, in light of their complexity, these AI systems will often function as black boxes: the details of their contents, calculations, and procedures cannot be meaningfully understood by human practitioners. When AI systems reach this level of complexity, we can also speak of black-box medicine. In this paper, we want to argue that black-box medicine conflicts with core ideals of patient-centered medicine. In particular, we claim, black-box medicine is not conducive for supporting informed decision-making based on shared information, shared deliberation, and shared mind between practitioner and patient

    Doubly Optimized Calibrated Support Vector Machine (DOC-SVM): an algorithm for joint optimization of discrimination and calibration.

    Get PDF
    Historically, probabilistic models for decision support have focused on discrimination, e.g., minimizing the ranking error of predicted outcomes. Unfortunately, these models ignore another important aspect, calibration, which indicates the magnitude of correctness of model predictions. Using discrimination and calibration simultaneously can be helpful for many clinical decisions. We investigated tradeoffs between these goals, and developed a unified maximum-margin method to handle them jointly. Our approach called, Doubly Optimized Calibrated Support Vector Machine (DOC-SVM), concurrently optimizes two loss functions: the ridge regression loss and the hinge loss. Experiments using three breast cancer gene-expression datasets (i.e., GSE2034, GSE2990, and Chanrion's datasets) showed that our model generated more calibrated outputs when compared to other state-of-the-art models like Support Vector Machine (p=0.03, p=0.13, and p<0.001) and Logistic Regression (p=0.006, p=0.008, and p<0.001). DOC-SVM also demonstrated better discrimination (i.e., higher AUCs) when compared to Support Vector Machine (p=0.38, p=0.29, and p=0.047) and Logistic Regression (p=0.38, p=0.04, and p<0.0001). DOC-SVM produced a model that was better calibrated without sacrificing discrimination, and hence may be helpful in clinical decision making
    • …
    corecore