3 research outputs found
Validation of artificial intelligence prediction models for skin cancer diagnosis using dermoscopy images: the 2019 International Skin Imaging Collaboration Grand Challenge
Previous studies of artificial intelligence (AI) applied to dermatology have shown AI to have higher diagnostic classification accuracy than expert dermatologists; however, these studies did not adequately assess clinically realistic scenarios, such as how AI systems behave when presented with images of disease categories that are not included in the training dataset or images drawn from statistical distributions with significant shifts from training distributions. We aimed to simulate these real-world scenarios and evaluate the effects of image source institution, diagnoses outside of the training set, and other image artifacts on classification accuracy, with the goal of informing clinicians and regulatory agencies about safety and real-world accuracy. We designed a large dermoscopic image classification challenge to quantify the performance of machine learning algorithms for the task of skin cancer classification from dermoscopic images, and how this performance is affected by shifts in statistical distributions of data, disease categories not represented in training datasets, and imaging or lesion artifacts. Factors that might be beneficial to performance, such as clinical metadata and external training data collected by challenge participants, were also evaluated. 25 331 training images collected from two datasets (in Vienna [HAM10000] and Barcelona [BCN20000]) between Jan 1, 2000, and Dec 31, 2018, across eight skin diseases, were provided to challenge participants to design appropriate algorithms. The trained algorithms were then tested for balanced accuracy against the HAM10000 and BCN20000 test datasets and data from countries not included in the training dataset (Turkey, New Zealand, Sweden, and Argentina). Test datasets contained images of all diagnostic categories available in training plus other diagnoses not included in training data (not trained category).
We compared the performance of the algorithms against that of 18 dermatologists in a simulated setting that reflected intended clinical use. 64 teams submitted 129 state-of-the-art algorithm predictions on a test set of 8238 images. The best performing algorithm achieved 58·8% balanced accuracy on the BCN20000 data, which was designed to better reflect realistic clinical scenarios, compared with 82·0% balanced accuracy on HAM10000, which was used in a previously published benchmark. Shifted statistical distributions and disease categories not included in training data contributed to decreases in accuracy. Image artifacts, including hair, pen markings, ulceration, and imaging source institution, decreased accuracy in a complex manner that varied based on the underlying diagnosis. When comparing algorithms to expert dermatologists (2460 ratings on 1269 images), algorithms performed better than experts in most categories, except for actinic keratoses (similar accuracy on average) and images from categories not included in training data (26% correct for experts vs 6% correct for algorithms, p<0·0001). For the top 25 submitted algorithms, 47·1% of the images from categories not included in training data were misclassified as malignant diagnoses, which would lead to a substantial number of unnecessary biopsies if current state-of-the-art AI technologies were clinically deployed. We have identified specific deficiencies and safety issues in AI diagnostic systems for skin cancer that should be addressed in future diagnostic evaluation protocols to improve safety and reliability in clinical practice.
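The abstract above reports balanced accuracy but does not define it. As a minimal sketch, assuming the common definition (the unweighted mean of per-class recall), the metric can be computed as follows. The function name and the toy labels are hypothetical illustrations, not challenge code or challenge data.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true.

    Assumption: this is the common definition of balanced accuracy; the
    abstract itself does not spell out the formula used by the challenge.
    """
    correct = defaultdict(int)  # correctly classified images per true class
    total = defaultdict(int)    # number of images per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy, imbalanced labels (illustrative only, not challenge data).
y_true = ["nevus"] * 8 + ["melanoma"] * 2
y_pred = ["nevus"] * 8 + ["nevus", "melanoma"]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 9 of 10 images correct
print(balanced_accuracy(y_true, y_pred))  # mean of recall 1.0 and 0.5
```

In this toy case the plain accuracy is 0.9 while the balanced accuracy is 0.75, because the missed case in the rare class carries as much weight as the whole of the common class. That weighting is why the metric suits test sets with uneven diagnostic categories.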
Nasal skin reconstruction: Time to rethink the reconstructive ladder?
BACKGROUND: Nasal scarring can compromise aesthetics and function given the nose's complex three-dimensional structure and central location. This study aimed to measure patient satisfaction after reconstruction of nasal defects following Mohs micrographic surgery. METHODS: Patients presenting with nasal nonmelanoma skin cancer at Memorial Sloan Kettering Cancer Center, New York, USA, and Catharina Hospital, Eindhoven, Netherlands, from April 2017 to November 2019 were asked to participate. Reconstruction type, complications, and patient satisfaction were assessed. Patients completed the FACE-Q Skin Cancer - Satisfaction with Facial Appearance scale (preoperative and 1-year postoperative) and the Appraisal of Scars scale (1-year postoperative). RESULTS: A total of 128 patients completed the pre- and postoperative scales. There were 35 (27%) surgical defects repaired with primary closures, 71 (55.5%) with flaps, and 22 (17.2%) with full-thickness skin grafts (FTSG). Patients who underwent a flap or FTSG reconstruction had higher scar satisfaction scores than those with primary closures (p = 0.03). A trend was seen, with patients following flap reconstructions scoring 7.8 points higher than those with primary closures and patients with upper nose defects scoring 6.4 points higher than those with lower nose defects. Males were significantly more satisfied than females. No significant difference was observed in the preoperative and postoperative facial appearance scores between the three groups (p = 0.39). CONCLUSION: Patients are more satisfied in the long term with their scars after flap reconstructions compared to primary closures. Therefore, nasal skin reconstruction may not follow the traditional reconstructive ladder, and more complex approaches may lead to higher long-term scar satisfaction.
Combined PARP1-targeted nuclear contrast and reflectance contrast enhances confocal microscopic detection of basal cell carcinoma
Reflectance confocal microscopy (RCM) with endogenous backscattered contrast can noninvasively image basal cell carcinomas (BCCs) in skin. However, BCCs present with high nuclear density, and the relatively weak backscattering from nuclei imposes a fundamental limit on contrast, detectability, and diagnostic accuracy. We investigated PARPi-FL, an exogenous nuclear poly(ADP-ribose) polymerase (PARP1)-targeted fluorescent contrast agent, and fluorescence confocal microscopy (FCM) towards improving BCC diagnosis. Methods: We tested PARP1 expression in 95 BCC tissues using immunohistochemistry, followed by PARPi-FL staining in 32 fresh surgical BCC specimens. Diagnostic accuracy of PARPi-FL contrast was evaluated in 83 surgical specimens. Optimal parameters for trans-epidermal permeability of PARPi-FL through intact skin were tested ex vivo on 5 human skin specimens and in vivo in 3 adult Yorkshire pigs. Results: We found significantly higher PARP1 expression and PARPi-FL binding in BCCs, as compared to normal skin structures. Blinded reading of RCM-and-FCM images by two experts demonstrated a higher diagnostic accuracy for BCCs with combined fluorescence and reflectance contrast, as compared to RCM alone. Optimal parameters (time and concentration) for PARPi-FL trans-epidermal permeation through intact skin were successfully determined. Conclusion: Combined fluorescence and reflectance contrast may improve noninvasive BCC diagnosis with confocal microscopy.