81 research outputs found
Effectiveness of the “Timed Up and Go” (TUG) and the Chair Test as Screening Tools for Geriatric Fall Risk Assessment in the ED
OBJECTIVE: We sought to evaluate the effectiveness of the Timed Up and Go (TUG) and the Chair test as screening tools in the Emergency Department (ED), stratified by sex.
METHODS: This prospective cohort study was conducted at a Level 1 Trauma center. After consent, subjects performed the TUG and the Chair test. Subjects were contacted for phone follow-up and asked to self-report interim falling.
RESULTS: Data from 192 subjects were analyzed. At baseline, 71.4% (n = 137) screened positive for increased falls risk based on the TUG evaluation, and 77.1% (n = 148) scored below average on the Chair test. There were no differences by patient sex. By the six-month evaluation 51 (26.6%) study participants reported at least one fall. Females reported a non-significant higher prevalence of falls compared to males (29.7% versus 22.2%, p = 0.24). TUG test had a sensitivity of 70.6% (95% CI: 56.2%-82.5%), a specificity of 28.4% (95% CI: 21.1%-36.6%), a positive predictive (PP) value 26.3% (95% CI: 19.1%-34.5%) and a negative predictive (NP) value of 72.7% (95% CI: 59.0%-83.9%). Similar results were observed with the Chair test. It had a sensitivity of 78.4% (95% CI: 64.7%-88.7%), a specificity of 23.4% (95% CI: 16.7%-31.3%), a PP value 27.0% (95% CI: 20.1%-34.9%) and a NP value of 75.0% (95% CI: 59.7%-86.8%). No significant differences were observed between sexes.
CONCLUSIONS: There were no sex specific significant differences in TUG or Chair test screening performance. Neither test performed well as a screening tool for future falls in the elderly in the ED setting
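The reported screening statistics follow from a standard 2x2 table of test result against six-month fall status. The sketch below is illustrative only: the cell counts are not given in the abstract and are back-calculated here (an assumption) from the reported totals (192 subjects, 51 fallers, 137 TUG-positive, 148 Chair-positive) and percentages. The function name `screening_metrics` is hypothetical.

```python
def screening_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from 2x2 cell counts."""
    return {
        "sensitivity": tp / (tp + fn),  # fallers who screened positive
        "specificity": tn / (tn + fp),  # non-fallers who screened negative
        "ppv": tp / (tp + fp),          # screen-positives who went on to fall
        "npv": tn / (tn + fn),          # screen-negatives who did not fall
    }

# Cell counts are NOT reported in the abstract; they are reconstructed
# (assumption) so that they reproduce the published totals and percentages.
tug = screening_metrics(tp=36, fp=101, fn=15, tn=40)
chair = screening_metrics(tp=40, fp=108, fn=11, tn=33)

for name, result in (("TUG", tug), ("Chair", chair)):
    summary = ", ".join(f"{k} {v * 100:.1f}%" for k, v in result.items())
    print(f"{name}: {summary}")
```

Under these reconstructed counts, both tests catch most fallers but also flag most non-fallers, which is why the positive predictive values stay near the 26.6% baseline fall rate.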
Validation of artificial intelligence prediction models for skin cancer diagnosis using dermoscopy images: the 2019 International Skin Imaging Collaboration Grand Challenge
Previous studies of artificial intelligence (AI) applied to dermatology have shown AI to have higher diagnostic classification accuracy than expert dermatologists; however, these studies did not adequately assess clinically realistic scenarios, such as how AI systems behave when presented with images of disease categories that are not included in the training dataset or images drawn from statistical distributions with significant shifts from training distributions. We aimed to simulate these real-world scenarios and evaluate the effects of image source institution, diagnoses outside of the training set, and other image artifacts on classification accuracy, with the goal of informing clinicians and regulatory agencies about safety and real-world accuracy. We designed a large dermoscopic image classification challenge to quantify the performance of machine learning algorithms for the task of skin cancer classification from dermoscopic images, and how this performance is affected by shifts in statistical distributions of data, disease categories not represented in training datasets, and imaging or lesion artifacts. Factors that might be beneficial to performance, such as clinical metadata and external training data collected by challenge participants, were also evaluated. 25 331 training images collected from two datasets (in Vienna [HAM10000] and Barcelona [BCN20000]) between Jan 1, 2000, and Dec 31, 2018, across eight skin diseases, were provided to challenge participants to design appropriate algorithms. The trained algorithms were then tested for balanced accuracy against the HAM10000 and BCN20000 test datasets and data from countries not included in the training dataset (Turkey, New Zealand, Sweden, and Argentina). Test datasets contained images of all diagnostic categories available in training plus other diagnoses not included in training data (not trained category).
We compared the performance of the algorithms against that of 18 dermatologists in a simulated setting that reflected intended clinical use. 64 teams submitted 129 state-of-the-art algorithm predictions on a test set of 8238 images. The best performing algorithm achieved 58·8% balanced accuracy on the BCN20000 data, which was designed to better reflect realistic clinical scenarios, compared with 82·0% balanced accuracy on HAM10000, which was used in a previously published benchmark. Shifted statistical distributions and disease categories not included in training data contributed to decreases in accuracy. Image artifacts, including hair, pen markings, ulceration, and imaging source institution, decreased accuracy in a complex manner that varied based on the underlying diagnosis. When comparing algorithms to expert dermatologists (2460 ratings on 1269 images), algorithms performed better than experts in most categories, except for actinic keratoses (similar accuracy on average) and images from categories not included in training data (26% correct for experts vs 6% correct for algorithms, p<0·0001). For the top 25 submitted algorithms, 47·1% of the images from categories not included in training data were misclassified as malignant diagnoses, which would lead to a substantial number of unnecessary biopsies if current state-of-the-art AI technologies were clinically deployed. We have identified specific deficiencies and safety issues in AI diagnostic systems for skin cancer that should be addressed in future diagnostic evaluation protocols to improve safety and reliability in clinical practice
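Balanced accuracy, the metric named above, is conventionally defined as the mean of per-class recall, so rare diagnoses weigh as much as common ones. The sketch below assumes that standard definition; the abstract does not describe the challenge's actual scoring code. The labels are toy data invented for illustration and are not challenge results.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Hypothetical toy labels: 8 nevi, 2 melanomas, one melanoma missed.
y_true = ["nevus"] * 8 + ["melanoma"] * 2
y_pred = ["nevus"] * 8 + ["nevus", "melanoma"]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
print(f"plain accuracy {plain:.2f}, balanced accuracy {balanced:.2f}")
```

In this toy case plain accuracy is 0.90 while balanced accuracy is 0.75, because the missed melanoma halves the recall of the minority class.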
Comparison of Airway Intubation Devices When Using a Biohazard Suit: A Feasibility Study
OBJECTIVES: We set out to compare emergency medicine residents' intubating times and success rates for direct laryngoscopy (DL), GlideScope-assisted intubation (GS), and the Supraglottic Airway Laryngopharyngeal Tube (SALT) airway with and without biohazard gear.
METHODS: Each resident passed through 2 sets of 3 testing stations (DL, GS, SALT) in succession, intubating Laerdal mannequin heads with the 3 modalities after randomization to start with or without biohazard gear.
RESULTS: Thirty-seven residents participated, and 27 were male (73%); 14 (37.8%) had prior experience intubating in biohazard suits. There was a statistically significant difference in those who had prior intubation experience between DL (37, 100%), GS (32, 86.5%), and SALT (12, 32.4%) (P < .001) and in median time to intubation (48 seconds, no suit; 57 seconds, with suits) (P = .03). There was no statistically significant difference between the overall times to intubate for the 3 devices. First-pass success was highest for DL (91.2%, no suit; 83.7%, suit) followed by GS (89%, no suit; 78.3%, suit) and SALT (51%, no suit; 67.6%, suit).
CONCLUSION: A minority of participants had prior experience intubating in biohazard suits. Use of biohazard suits extends time to successful intubation. There was no difference in time to intubation for the 3 devices, but first-pass success was highest for DL (with or without biohazard gear)
Validity and reliability of dermoscopic criteria used to differentiate nevi from melanoma: a web-based International Dermoscopy Society study
IMPORTANCE: The comparative diagnostic performance of dermoscopic algorithms and their individual criteria are not well studied. OBJECTIVES: To analyze the discriminatory power and reliability of dermoscopic criteria used in melanoma detection and compare the diagnostic accuracy of existing algorithms. DESIGN, SETTING, AND PARTICIPANTS: This was a retrospective, observational study of 477 lesions (119 melanomas [24.9%] and 358 nevi [75.1%]), which were divided into 12 image sets that consisted of 39 or 40 images per set. A link on the International Dermoscopy Society website from January 1, 2011, through December 31, 2011, directed participants to the study website. Data analysis was performed from June 1, 2013, through May 31, 2015. Participants included physicians, residents, and medical students, and there were no specialty-type or experience-level restrictions. Participants were randomly assigned to evaluate 1 of the 12 image sets. MAIN OUTCOMES AND MEASURES: Associations with melanoma and intraclass correlation coefficients (ICCs) were evaluated for the presence of dermoscopic criteria. Diagnostic accuracy measures were estimated for the following algorithms: the ABCD rule, the Menzies method, the 7-point checklist, the 3-point checklist, chaos and clues, and CASH (color, architecture, symmetry, and homogeneity). RESULTS: A total of 240 participants registered, and 103 (42.9%) evaluated all images. The 110 participants (45.8%) who evaluated fewer than 20 lesions were excluded, resulting in data from 130 participants (54.2%), 121 (93.1%) of whom were regular dermoscopy users. Criteria associated with melanoma included marked architectural disorder (odds ratio [OR], 6.6; 95% CI, 5.6-7.8), pattern asymmetry (OR, 4.9; 95% CI, 4.1-5.8), nonorganized pattern (OR, 3.3; 95% CI, 2.9-3.7), border score of 6 (OR, 3.3; 95% CI, 2.5-4.3), and contour asymmetry (OR, 3.2; 95% CI, 2.7-3.7) (P < .001 for all). Most dermoscopic criteria had poor to fair interobserver agreement.
Criteria that reached moderate levels of agreement included comma vessels (ICC, 0.44; 95% CI, 0.40-0.49), absence of vessels (ICC, 0.46; 95% CI, 0.42-0.51), dark brown color (ICC, 0.40; 95% CI, 0.35-0.44), and architectural disorder (ICC, 0.43; 95% CI, 0.39-0.48). The Menzies method had the highest sensitivity for melanoma diagnosis (95.1%) but the lowest specificity (24.8%) compared with any other method (P < .001). The ABCD rule had the highest specificity (59.4%). All methods had similar areas under the receiver operating characteristic curves. CONCLUSIONS AND RELEVANCE: Important dermoscopic criteria for melanoma recognition were revalidated by participants with varied experience. Six algorithms tested had similar but modest levels of diagnostic accuracy, and the interobserver agreement of most individual criteria was poor
Validity and Reliability of Dermoscopic Criteria Used to Differentiate Nevi From Melanoma: A Web-Based International Dermoscopy Society Study.
IMPORTANCE: The comparative diagnostic performance of dermoscopic algorithms and their individual criteria are not well studied. OBJECTIVES: To analyze the discriminatory power and reliability of dermoscopic criteria used in melanoma detection and compare the diagnostic accuracy of existing algorithms. DESIGN, SETTING, AND PARTICIPANTS: This was a retrospective, observational study of 477 lesions (119 melanomas [24.9%] and 358 nevi [75.1%]), which were divided into 12 image sets that consisted of 39 or 40 images per set. A link on the International Dermoscopy Society website from January 1, 2011, through December 31, 2011, directed participants to the study website. Data analysis was performed from June 1, 2013, through May 31, 2015. Participants included physicians, residents, and medical students, and there were no specialty-type or experience-level restrictions. Participants were randomly assigned to evaluate 1 of the 12 image sets. MAIN OUTCOMES AND MEASURES: Associations with melanoma and intraclass correlation coefficients (ICCs) were evaluated for the presence of dermoscopic criteria. Diagnostic accuracy measures were estimated for the following algorithms: the ABCD rule, the Menzies method, the 7-point checklist, the 3-point checklist, chaos and clues, and CASH (color, architecture, symmetry, and homogeneity). RESULTS: A total of 240 participants registered, and 103 (42.9%) evaluated all images. The 110 participants (45.8%) who evaluated fewer than 20 lesions were excluded, resulting in data from 130 participants (54.2%), 121 (93.1%) of whom were regular dermoscopy users. Criteria associated with melanoma included marked architectural disorder (odds ratio [OR], 6.6; 95% CI, 5.6-7.8), pattern asymmetry (OR, 4.9; 95% CI, 4.1-5.8), nonorganized pattern (OR, 3.3; 95% CI, 2.9-3.7), border score of 6 (OR, 3.3; 95% CI, 2.5-4.3), and contour asymmetry (OR, 3.2; 95% CI, 2.7-3.7) (P < .001 for all). Most dermoscopic criteria had poor to fair interobserver agreement. 
Criteria that reached moderate levels of agreement included comma vessels (ICC, 0.44; 95% CI, 0.40-0.49), absence of vessels (ICC, 0.46; 95% CI, 0.42-0.51), dark brown color (ICC, 0.40; 95% CI, 0.35-0.44), and architectural disorder (ICC, 0.43; 95% CI, 0.39-0.48). The Menzies method had the highest sensitivity for melanoma diagnosis (95.1%) but the lowest specificity (24.8%) compared with any other method (P < .001). The ABCD rule had the highest specificity (59.4%). All methods had similar areas under the receiver operating characteristic curves. CONCLUSIONS AND RELEVANCE: Important dermoscopic criteria for melanoma recognition were revalidated by participants with varied experience. Six algorithms tested had similar but modest levels of diagnostic accuracy, and the interobserver agreement of most individual criteria was poor
- …