
    Human-computer collaboration for skin cancer recognition

    The rapid increase in telemedicine coupled with recent advances in diagnostic artificial intelligence (AI) creates the imperative to consider the opportunities and risks of inserting AI-based support into new paradigms of care. Here we build on recent achievements in the accuracy of image-based AI for skin cancer diagnosis to address the effects of varied representations of AI-based support across different levels of clinical expertise and multiple clinical workflows. We find that good quality AI-based support of clinical decision-making improves diagnostic accuracy over that of either AI or physicians alone, and that the least experienced clinicians gain the most from AI-based support. We further find that AI-based multiclass probabilities outperformed content-based image retrieval (CBIR) representations of AI in the mobile technology environment, and AI-based support had utility in simulations of second opinions and of telemedicine triage. In addition to demonstrating the potential benefits associated with good quality AI in the hands of non-expert clinicians, we find that faulty AI can mislead the entire spectrum of clinicians, including experts. Lastly, we show that insights derived from AI class-activation maps can inform improvements in human diagnosis. Together, our approach and findings offer a framework for future studies across the spectrum of image-based diagnostics to improve human-computer collaboration in clinical practice.
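
    The contrast between the two forms of AI support compared in the study can be made concrete in code. Below is a minimal sketch, not the paper's implementation, of how the same network output can be surfaced either as multiclass probabilities or as CBIR-style retrieval of similar labelled reference cases; the class list, embedding size, and function names are illustrative assumptions.

```python
# Two ways to present AI support to a clinician, per the study's framing:
# (1) multiclass probabilities, (2) content-based image retrieval (CBIR).
# Everything here (class list, embedding size) is a hypothetical example.
import numpy as np

CLASSES = ["melanoma", "nevus", "bcc", "akiec", "bkl", "df", "vasc"]

def probability_support(logits, k=3):
    """Top-k diagnoses with softmax probabilities (representation 1)."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    top = np.argsort(probs)[::-1][:k]
    return [(CLASSES[i], round(float(probs[i]), 3)) for i in top]

def cbir_support(query_emb, gallery_embs, gallery_labels, k=3):
    """k most similar labelled reference images by cosine similarity
    (representation 2)."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    top = np.argsort(g @ q)[::-1][:k]
    return [gallery_labels[i] for i in top]

rng = np.random.default_rng(0)
print(probability_support(rng.normal(size=len(CLASSES))))
gallery = rng.normal(size=(100, 128))          # stand-in image embeddings
labels = rng.choice(CLASSES, size=100).tolist()
print(cbir_support(rng.normal(size=128), gallery, labels))
```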

    Skin Lesion Analyser: An Efficient Seven-Way Multi-Class Skin Cancer Classification Using MobileNet

    Skin cancer, a major form of cancer, is a critical public health problem, with 123,000 newly diagnosed melanoma cases and between 2 and 3 million non-melanoma cases worldwide each year. The leading cause of skin cancer is high exposure to UV radiation, which can damage the DNA inside skin cells and trigger uncontrolled cell growth. Skin cancer is primarily diagnosed visually, starting with clinical screening and potentially followed by dermoscopic analysis, a biopsy, and histopathological examination. Dermoscopic analysis in the hands of inexperienced dermatologists has been shown to reduce diagnostic accuracy. Early detection and screening of skin cancer have the potential to reduce mortality and morbidity. Previous studies have shown that deep learning can outperform human experts in several visual recognition tasks. In this paper, we propose an efficient seven-way automated multi-class skin cancer classification system with performance comparable to that of expert dermatologists. We trained a pretrained MobileNet model on the HAM10000 dataset using transfer learning. The model classifies skin lesion images with a categorical accuracy of 83.1 percent, a top-2 accuracy of 91.36 percent, and a top-3 accuracy of 95.34 percent. The weighted averages of precision, recall, and F1-score were 0.89, 0.83, and 0.83, respectively. The model has been deployed as a web application for public use at https://saketchaturvedi.github.io. This fast, expansible method holds the potential for substantial clinical impact, including broadening the scope of primary care practice and augmenting clinical decision-making for dermatology specialists.
    Comment: This is a pre-copyedited version of a contribution by Chaturvedi S.S., Gupta K., and Prasad P.S., published in Advances in Intelligent Systems and Computing, Hassanien A., Bhatnagar R., Darwish A. (eds). The definitive authenticated version is available online via https://doi.org/10.1007/978-981-15-3383-9_1
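
    The training recipe described here is standard transfer learning. The sketch below shows one plausible Keras setup with a frozen ImageNet-pretrained MobileNet backbone and a new seven-way softmax head, reporting the same metrics the abstract cites; the classification head, hyperparameters, and dataset pipelines are assumptions, not the authors' exact configuration.

```python
# A minimal MobileNet transfer-learning sketch in the spirit of the paper.
# The head architecture and hyperparameters are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 7  # the seven HAM10000 lesion categories

base = tf.keras.applications.MobileNet(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the pretrained ImageNet features

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.25),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=[
        "categorical_accuracy",  # the abstract's headline metric
        tf.keras.metrics.TopKCategoricalAccuracy(k=2, name="top2_accuracy"),
        tf.keras.metrics.TopKCategoricalAccuracy(k=3, name="top3_accuracy"),
    ],
)

# train_ds / val_ds would be tf.data pipelines over HAM10000 images
# resized to 224x224 with one-hot labels (hypothetical placeholders):
# model.fit(train_ds, validation_data=val_ds, epochs=20)
```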

    Deep neural network or dermatologist?

    Deep learning techniques have proven highly accurate for identifying melanoma in digitised dermoscopic images. A strength is that these methods are not constrained by features that are pre-defined by human semantics; a downside is that it is difficult to understand the rationale of the model predictions and to identify potential failure modes. This is a major barrier to the adoption of deep learning in clinical practice. In this paper we ask whether two existing local interpretability methods, Grad-CAM and Kernel SHAP, can shed light on convolutional neural networks trained in the context of melanoma detection. Our contributions are: (i) we first explore the domain space via a reproducible, end-to-end learning framework that creates a suite of 30 models, all trained on a publicly available data set (HAM10000); (ii) we next explore the reliability of Grad-CAM and Kernel SHAP in this context via basic sanity-check experiments; and (iii) finally, we investigate a random selection of models from our suite using Grad-CAM and Kernel SHAP. We show that despite high accuracy, the models will occasionally assign importance to features that are not relevant to the diagnostic task. We also show that models of similar accuracy will produce different explanations as measured by these methods. This work represents first steps in bridging the gap between model accuracy and interpretability in the domain of skin cancer classification.
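
    Of the two interpretability methods the paper examines, Grad-CAM is compact enough to sketch directly. The following is a minimal implementation for a Keras CNN, not the authors' code: gradients of the class score are pooled into channel weights, which reweight the final convolutional feature maps into a heatmap. The model and the convolutional layer name in the usage comment are assumptions.

```python
# A minimal Grad-CAM sketch for a Keras CNN.
import numpy as np
import tensorflow as tf

def grad_cam(model, image, conv_layer_name, class_index=None):
    """Return an HxW heatmap of class-discriminative regions for one image."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))  # explain the top prediction
        score = preds[:, class_index]
    grads = tape.gradient(score, conv_out)           # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))  # global-average-pool the grads
    cam = tf.reduce_sum(conv_out[0] * weights, axis=-1)
    cam = tf.nn.relu(cam)                            # keep positive evidence only
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()  # normalise to [0, 1]

# Hypothetical usage with a MobileNet-style model:
#   heat = grad_cam(model, img, "conv_pw_13_relu")
# Upsample `heat` to the input resolution and overlay it on the lesion image.
```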

    Validation of artificial intelligence prediction models for skin cancer diagnosis using dermoscopy images: the 2019 International Skin Imaging Collaboration Grand Challenge

    Previous studies of artificial intelligence (AI) applied to dermatology have shown AI to have higher diagnostic classification accuracy than expert dermatologists; however, these studies did not adequately assess clinically realistic scenarios, such as how AI systems behave when presented with images of disease categories that are not included in the training dataset or images drawn from statistical distributions with significant shifts from training distributions. We aimed to simulate these real-world scenarios and evaluate the effects of image source institution, diagnoses outside of the training set, and other image artifacts on classification accuracy, with the goal of informing clinicians and regulatory agencies about safety and real-world accuracy.

    We designed a large dermoscopic image classification challenge to quantify the performance of machine learning algorithms for the task of skin cancer classification from dermoscopic images, and how this performance is affected by shifts in statistical distributions of data, disease categories not represented in training datasets, and imaging or lesion artifacts. Factors that might be beneficial to performance, such as clinical metadata and external training data collected by challenge participants, were also evaluated. 25,331 training images collected from two datasets (in Vienna [HAM10000] and Barcelona [BCN20000]) between Jan 1, 2000, and Dec 31, 2018, across eight skin diseases, were provided to challenge participants to design appropriate algorithms. The trained algorithms were then tested for balanced accuracy against the HAM10000 and BCN20000 test datasets and data from countries not included in the training dataset (Turkey, New Zealand, Sweden, and Argentina). Test datasets contained images of all diagnostic categories available in training plus other diagnoses not included in training data (the "not trained" category). We compared the performance of the algorithms against that of 18 dermatologists in a simulated setting that reflected intended clinical use.

    64 teams submitted 129 state-of-the-art algorithm predictions on a test set of 8238 images. The best-performing algorithm achieved 58.8% balanced accuracy on the BCN20000 data, which was designed to better reflect realistic clinical scenarios, compared with 82.0% balanced accuracy on HAM10000, which was used in a previously published benchmark. Shifted statistical distributions and disease categories not included in training data contributed to decreases in accuracy. Image artifacts, including hair, pen markings, ulceration, and imaging source institution, decreased accuracy in a complex manner that varied based on the underlying diagnosis. When comparing algorithms to expert dermatologists (2460 ratings on 1269 images), algorithms performed better than experts in most categories, except for actinic keratoses (similar accuracy on average) and images from categories not included in training data (26% correct for experts vs 6% correct for algorithms, p<0.0001). For the top 25 submitted algorithms, 47.1% of the images from categories not included in training data were misclassified as malignant diagnoses, which would lead to a substantial number of unnecessary biopsies if current state-of-the-art AI technologies were clinically deployed.

    We have identified specific deficiencies and safety issues in AI diagnostic systems for skin cancer that should be addressed in future diagnostic evaluation protocols to improve safety and reliability in clinical practice. Funding: Melanoma Research Alliance and La Marató de TV3.
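
    Balanced accuracy, the challenge's headline metric, is the unweighted mean of per-class recall, so rare diagnoses count as much as common ones. A minimal sketch with scikit-learn, using made-up labels in which "unk" stands in for the not-trained category:

```python
# Balanced accuracy = unweighted mean of per-class recall.
# The label lists below are hypothetical examples.
from sklearn.metrics import balanced_accuracy_score, recall_score

y_true = ["mel", "nv", "bcc", "mel", "unk", "nv"]
y_pred = ["mel", "nv", "mel", "mel", "mel", "nv"]

# Every class counts equally, so out-of-distribution ("unk") images
# weigh as much as the common diagnoses.
print(balanced_accuracy_score(y_true, y_pred))        # 0.5

# Equivalent by hand: the macro-average of per-class recalls.
print(recall_score(y_true, y_pred, average="macro"))  # 0.5
```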

    Human surface anatomy terminology for dermatology: a Delphi consensus from the International Skin Imaging Collaboration

    Background: There is no internationally vetted set of anatomic terms to describe human surface anatomy.
    Objective: To establish expert consensus on a standardized set of terms that describe clinically relevant human surface anatomy.
    Methods: We conducted a Delphi consensus on surface anatomy terminology between July 2017 and July 2019. The initial survey included 385 anatomic terms, organized in seven levels of hierarchy. If agreement exceeded the established 75% threshold, the term was considered accepted and included in the final list. Terms added by the participants were passed on to the next round of consensus. Terms with <75% agreement were included in subsequent surveys, along with alternative terms proposed by participants, until agreement was reached on all terms.
    Results: The Delphi included 21 participants. We found consensus (≥75% agreement) on 361/385 (93.8%) terms and eliminated one term in the first round. Of 49 new terms suggested by participants, 45 were added via consensus. To adjust for a recently published International Classification of Diseases Surface Topography list of terms, a third survey including 111 discrepant terms was sent to participants. Finally, a total of 513 terms reached agreement via the Delphi method.
    Conclusions: We have established a set of 513 clinically relevant terms for denoting human surface anatomy, towards the use of standardized terminology in dermatologic documentation.
    Linked Commentary: R.J.G. Chalmers. J Eur Acad Dermatol Venereol 2020; 34: 2456-2457. https://doi.org/10.1111/jdv.16978
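
    The acceptance rule in the Methods reduces to a simple per-term tally, sketched below with hypothetical votes; the 75% threshold and the carry-over of unresolved terms to the next round follow the description above.

```python
# A toy tally of the Delphi acceptance rule: a term is accepted when at
# least 75% of participants agree, otherwise it is carried into the next
# survey round. The terms and ballots are hypothetical.
def tally_round(votes, threshold=0.75):
    """votes: dict mapping each term to one boolean ballot per participant."""
    accepted, carried = [], []
    for term, ballots in votes.items():
        agreement = sum(ballots) / len(ballots)
        (accepted if agreement >= threshold else carried).append(term)
    return accepted, carried

round1 = {
    "dorsal forearm": [True] * 20 + [False],      # 20/21 ~ 95% -> accepted
    "mid back":       [True] * 12 + [False] * 9,  # 12/21 ~ 57% -> next round
}
accepted, carried = tally_round(round1)
print(accepted, carried)  # ['dorsal forearm'] ['mid back']
```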