Interobserver agreement of various thyroid imaging reporting and data systems
Ultrasonography is the best available tool for the initial work-up of thyroid nodules. Substantial interobserver variability has been documented in the recognition and reporting of some lesion characteristics. A number of classification systems have been developed to estimate the likelihood of malignancy; several of them have been endorsed by scientific societies, but their reproducibility has yet to be assessed. We evaluated the interobserver variability of the AACE/ACE/AME, ACR, ATA, EU-TIRADS, and K-TIRADS classification systems and the interobserver concordance in the indication for FNA biopsy. Two raters independently evaluated 1055 ultrasound images of thyroid nodules identified in 265 patients at multiple time points, in two separate sets (501 and 554 images). After the first set of nodules, a joint reading was performed to reach a consensus on the feature definitions. The interobserver agreement (Krippendorff alpha) in the first set of nodules was 0.47, 0.49, 0.49, 0.61, and 0.53 for the AACE/ACE/AME, ACR, ATA, EU-TIRADS, and K-TIRADS systems, respectively. The agreement for the indication for biopsy was substantial to near-perfect, being 0.73, 0.61, 0.75, 0.68, and 0.82, respectively (Cohen's kappa). For all systems, agreement increased on the nodules of the second set. Despite the wide variability in the description of single ultrasonographic features, the classification systems may improve interobserver agreement, which improves further after specific training. When selecting nodules to be submitted to FNA biopsy, which is the main purpose of these classifications, the interobserver agreement is substantial to almost perfect
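As an illustration of the Cohen's kappa statistic reported above for the biopsy indication, the following is a minimal sketch for two raters. The function name `cohen_kappa` and the ratings are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical biopsy indications (1 = FNA indicated, 0 = not) for ten nodules.
rater_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
rater_2 = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
print(round(cohen_kappa(rater_1, rater_2), 3))
```

Kappa subtracts the agreement expected by chance, so it is lower than the raw percentage of matching labels whenever the raters' marginal frequencies make chance agreement likely.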
Observer variability of absolute and relative thrombus density measurements in patients with acute ischemic stroke
Introduction: Thrombus density may be a predictor for acute ischemic stroke treatment success. However, only limited data on observer variability for thrombus density measurements exist. This study assesses the variability and bias of four common thrombus density measurement methods by expert and non-expert observers. Methods: For 132 consecutive patients with acute ischemic stroke, three experts and two trained observers determined thrombus density by placing three standardized regions of interest (ROIs) in the thrombus and corresponding contralateral arterial segment. Subsequently, absolute and relative thrombus densities were determined using either one or three ROIs. Intraclass correlation coefficient (ICC) was determined, and Bland–Altman analysis was performed to evaluate interobserver and intermethod agreement. Accuracy of the trained observer was evaluated with a reference expert observer using the same statistical analysis. Results: The highest interobserver agreement was obtained for absolute thrombus measurements using three ROIs (ICCs ranging from 0.54 to 0.91). In general, interobserver agreement was lower for relative measurements, and for using one instead of three ROIs. Interobserver agreement of trained non-experts and experts was similar. Accuracy of the trained observer measurements was comparable to the expert interobserver agreement and was better for absolute measurements and with three ROIs. The agreement between the one ROI and three ROI methods was good. Conclusion: Absolute thrombus density measurement has superior interobserver agreement compared to relative density measurement. Interobserver variation is smaller when multiple ROIs are used. Trained non-expert observers can accurately and reproducibly assess absolute thrombus densities using three ROIs
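The Bland–Altman analysis mentioned above can be sketched as follows, assuming the common formulation of mean difference (bias) plus or minus 1.96 standard deviations of the paired differences. The function name `bland_altman` and the density values are hypothetical and do not come from the study.

```python
from statistics import mean, stdev


def bland_altman(measure_a, measure_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(measure_a) != len(measure_b) or len(measure_a) < 2:
        raise ValueError("need at least two paired measurements")
    # Paired differences between the two observers (or methods).
    diffs = [a - b for a, b in zip(measure_a, measure_b)]
    bias = mean(diffs)
    # 95% limits of agreement, assuming roughly normal differences.
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Hypothetical thrombus densities (Hounsfield units) from two observers.
observer_1 = [52.0, 48.0, 61.0, 55.0, 47.0]
observer_2 = [50.0, 49.0, 58.0, 56.0, 45.0]
bias, lower, upper = bland_altman(observer_1, observer_2)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

The bias shows whether one observer measures systematically higher than the other, while the width of the limits reflects random disagreement; the ICC summarises both in a single coefficient.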
Interobserver variability of ultrasound parameters in portal hypertension
The aim of this study was to assess interobserver agreement of ultrasound parameters for portal hypertension in hepatosplenic mansonic schistosomiasis. Spleen size, diameter of the portal, splenic and superior mesenteric veins and presence of thrombosis and cavernous transformation were determined by three radiologists in blinded and independent fashion in 30 patients. Interobserver agreement was measured by the kappa index and intraclass correlation coefficient. Interobserver agreement was considered substantial (κ = 0.714-0.795) for portal vein thrombosis and perfect (κ = 1) for cavernous transformation. Interobserver agreement measured by the intraclass correlation coefficient was excellent for longitudinal diameter of the spleen (r = 0.828-0.869) and splenic index (r = 0.816-0.905) and varied from fair to almost perfect for diameter of the portal (r = 0.622-0.675), splenic (r = 0.573-0.913) and superior mesenteric (r = 0.525-0.607) veins. According to the results, ultrasound is a highly reproducible method for the main morphological parameters of portal hypertension in schistosomiasis patients. Affiliations: Universidade Federal de São Paulo (UNIFESP), Departamento de Diagnóstico por Imagens; Universidade de Brasília, Faculdade de Medicina
Chest CT scoring for evaluation of lung sequelae in congenital diaphragmatic hernia survivors
Objectives Data on long-term structural lung abnormalities in survivors of congenital diaphragmatic hernia (CDH) are scarce. The purpose of this study was to develop a chest computed tomography (CT) score to assess the structural lung sequelae in CDH survivors and to study the correlation between the CT scoring and clinical parameters in the neonatal period and at 1 year of follow-up. Methods A prospective clinical follow-up program is organised for CDH survivors at the University Hospital of Leuven, including a chest CT at the age of 1 year. The CT scoring used and evaluated, named the CDH-CT score, was adapted from the revised Aukland score for chronic lung disease of prematurity. Results Thirty-five patients were included. All CT scans showed some pulmonary abnormalities, ranging from very mild to severe. The mean total CT score was 16 (IQR: 9-23), with the greatest contribution from the subscores for decreased attenuation (5; IQR: 2-8), subpleural linear and triangular opacities (4; IQR: 3-5), and atelectasis/consolidation (2; IQR: 1-3). Interobserver and intraobserver agreement was very good for the total score (ICC > 0.9). Total CT score correlated with the number of neonatal days ventilated/on oxygen as well as with respiratory symptoms and feeding problems at 1 year of age. Conclusion The CDH-CT scoring tool has good intraobserver and interobserver repeatability and correlates with relevant clinical parameters. This holds promise for its use in clinical follow-up and as an outcome parameter in clinical interventional studies
Interobserver Reliability in Describing Radiographic Lung Changes After Stereotactic Body Radiation Therapy
Purpose Radiographic lung changes after stereotactic body radiation therapy (SBRT) vary widely between patients. Standardized descriptions of acute (≤6 months after treatment) and late (>6 months after treatment) benign lung changes have been proposed but the reliable application of these classification systems has not been demonstrated. Herein, we examine the interobserver reliability of classifying acute and late lung changes after SBRT.
Methods and materials A total of 280 follow-up computed tomography scans at 3, 6, and 12 months post-treatment were analyzed in 100 patients undergoing thoracic SBRT. Standardized descriptions of acute lung changes (3- and 6-month scans) include diffuse consolidation, patchy consolidation and ground glass opacity (GGO), diffuse GGO, patchy GGO, and no change. Late lung change classifications (12-month scans) include modified conventional pattern, mass-like pattern, scar-like pattern, and no change. Five physicians scored the images independently in a blinded fashion. Fleiss' kappa scores quantified the interobserver agreement.
Results The kappa scores were 0.30 at 3 months, 0.20 at 6 months, and 0.25 at 12 months. The proportion of patients in each category at 3 and 6 months was as follows: diffuse consolidation 11% and 21%; patchy consolidation and GGO 15% and 28%; diffuse GGO 10% and 11%; patchy GGO 15% and 15%; and no change 49% and 25%, respectively. The percentage of patients in each category at 12 months was as follows: modified conventional 46%; mass-like 16%; scar-like 26%; and no change 12%. Uniform scoring between the observers occurred in 26, 8, and 14 cases at 3, 6, and 12 months, respectively.
Conclusions Interobserver reliability scores indicate fair agreement in classifying radiographic lung changes after SBRT. Qualitative descriptions are insufficient to categorize these findings because most patient scans do not fit clearly into a single classification. Categorization at 6 months may be the most difficult because both late and acute lung changes can arise at that time
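To make the Fleiss' kappa used above concrete, here is a minimal sketch for a fixed number of raters per item. The function name `fleiss_kappa` and the count table are hypothetical: each row is one scan, each column one category, and each cell the number of the five raters who chose that category.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from an items x categories table of rating counts."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    total = n_items * n_raters
    # Share of all ratings that fell into each category.
    p_category = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Per-item agreement: proportion of rater pairs that chose the same category.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items
    p_expected = sum(p * p for p in p_category)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical table: four scans, three categories, five raters per scan.
ratings = [
    [5, 0, 0],
    [0, 5, 0],
    [3, 2, 0],
    [1, 2, 2],
]
print(round(fleiss_kappa(ratings), 3))
```

Unlike Cohen's kappa, this formulation does not track which rater gave which label, only how many raters chose each category per item, which suits designs with several independent readers.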
Impact of 3 Tesla MRI on interobserver agreement in clinically isolated syndrome: A MAGNIMS multicentre study
Compared to 1.5 T, 3 T magnetic resonance imaging (MRI) increases signal-to-noise ratio leading to improved image quality. However, its clinical relevance in clinically isolated syndrome suggestive of multiple sclerosis remains uncertain
Variability in modified Rankin scoring across a large cohort of observers
Background and Purpose— The modified Rankin scale (mRS) is the most commonly used outcome measure in stroke trials. However, substantial interobserver variability in mRS scoring has been reported. These studies likely underestimate the variability present in multicenter clinical trials, because exploratory work has only been undertaken in single centers by a few observers, all of similar training. We examined mRS variability across a large cohort of international observers using data from a video training resource.
Methods— The mRS training package includes a series of “real-life” patient interviews for grading. Training data were collected centrally and analyzed for variability using kappa statistics. We examined variability against a standard of “correct” mRS grades; examined variability by country; and for UK assessors, examined variability by center and by professional background of the observer.
Results— To date, 2942 assessments from 30 countries have been submitted. Overall reliability for mRS grading has been moderate to good with substantial heterogeneity across countries. Native English language has had little effect on reliability. Within the United Kingdom, there was no significant variation by profession.
Conclusion— Our results confirm interobserver variability in mRS assessment. The heterogeneity across countries is intriguing because it appears not to be related solely to language. These data highlight the need for novel strategies to improve reliability.
How does a cadaver model work for testing ultrasound diagnostic capability for rheumatic-like tendon damage?
The aim was to establish whether a cadaver model can serve as an effective surrogate for the detection of tendon damage characteristic of rheumatoid arthritis (RA). In addition, we evaluated intraobserver and interobserver agreement in the grading of RA-like tendon tears shown by ultrasound (US), as well as the concordance between the US findings and the surgically induced lesions in the cadaver model. RA-like tendon damage was surgically induced in the tibialis anterior tendon (TAT) and tibialis posterior tendon (TPT) of ten ankle/foot fresh-frozen cadaveric specimens. Of the 20 tendons examined, six were randomly assigned a surgically induced partial tear, six a complete tear, and eight were left undamaged. Three rheumatologists, experts in musculoskeletal US, rated the quality of US imaging of the cadaveric models on a Likert scale from 1 to 5. Tendons were then categorized as having no damage (0), a partial tear (1), or a complete tear (2). All 20 tendons were blindly and independently evaluated twice, over two rounds, by each of the three observers. Overall, technical performance was satisfactory for all items in the two rounds (all values over 2.9 on a 1-5 Likert scale). Intraobserver and interobserver agreement for US grading of tendon damage was good (mean κ values 0.62 and 0.71, respectively), with greater reliability found in the TAT than in the TPT. Concordance between US findings and experimental tendon lesions was acceptable (70-100%), again greater for the TAT than for the TPT. A cadaver model with surgically created tendon damage can be useful in evaluating US metric properties of RA tendon lesions
Exploring the reliability of the modified Rankin Scale
Background and Purpose: The modified Rankin Scale (mRS) is the most prevalent outcome measure in stroke trials. Use of the mRS may be hampered by variability in grading. Previous estimates of the properties of the mRS have used diverse methodologies and may not apply to contemporary trial populations. We used a mock clinical trial design to explore inter- and intraobserver variability of the mRS.
Methods: Consenting patients with stroke attending for outpatient review had the mRS performed by 2 independent assessors with pairs of assessors selected from a team of 3 research nurses and 4 stroke physicians. Before formal assessment, interviewers estimated disability based only on initial patient observation. Each patient was then randomized to undergo the mRS using standard assessment or a prespecified structured interview. The second interviewer in the pair reassessed the patient using the same method blinded to the colleague’s score. For each patient assessed, one rater was randomly assigned to video record their interview. After 3 months, this interviewer reviewed and regraded their original video assessment.
Results: Across 100 paired assessments, interobserver agreement was moderate (k=0.57). Intraobserver variability was good (k=0.72) but less than would be expected from previous literature. Forty-nine assessments were performed using the structured interview approach with no significant difference between structured and standard mRS. Researchers were unable to reliably predict mRS from initial limited patient assessment (k=0.16).
Conclusions: Despite availability of training and structured interview, there remains substantial interobserver variability in mRS grades awarded even by experienced researchers. Additional methods to improve mRS reliability are required.
Reliability of the modified Rankin Scale: a systematic review
Background and Purpose: A perceived weakness of the modified Rankin Scale is potential for interobserver variability. We undertook a systematic review of modified Rankin Scale reliability studies.
Methods: Two researchers independently reviewed the literature. Crossdisciplinary electronic databases were interrogated using the following key words: Stroke*; Cerebrovasc*; Modified Rankin*; Rankin Scale*; Oxford Handicap*; Observer variation*. Data were extracted according to prespecified criteria with decisions on inclusion by consensus.
Results: From 3461 titles, 10 studies (587 patients) were included. Reliability of modified Rankin Scale varied from weighted κ=0.95 to κ=0.25. Overall reliability of mRS was κ=0.46; weighted κ=0.90 (traditional modified Rankin Scale) and κ=0.62; weighted κ=0.87 (structured interview).
Conclusion: There remains uncertainty regarding modified Rankin Scale reliability. Interobserver studies closest in design to large-scale clinical trials demonstrate potentially significant interobserver variability.
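The gap between κ and weighted κ reported above arises because weighted kappa gives partial credit when two raters differ by only one or two grades on an ordinal scale. The sketch below is a minimal illustration; the function name `weighted_kappa`, the `power` parameter (1 for linear weights, 2 for quadratic weights) and the grades are assumptions for this example, and the abstract does not state which weighting scheme the reviewed studies used.

```python
def weighted_kappa(rater_a, rater_b, n_levels, power=2):
    """Weighted kappa for ordinal grades coded 0 .. n_levels - 1."""
    if len(rater_a) != len(rater_b) or not rater_a or n_levels < 2:
        raise ValueError("need paired ratings and at least two levels")
    n = len(rater_a)
    # Joint proportion table: rows are rater A's grades, columns rater B's.
    observed = [[0.0] * n_levels for _ in range(n_levels)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1 / n
    marginal_a = [sum(row) for row in observed]
    marginal_b = [sum(observed[i][j] for i in range(n_levels)) for j in range(n_levels)]
    disagreement_observed = 0.0
    disagreement_expected = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            # Penalty grows with the distance between the two grades.
            weight = (abs(i - j) / (n_levels - 1)) ** power
            disagreement_observed += weight * observed[i][j]
            disagreement_expected += weight * marginal_a[i] * marginal_b[j]
    if disagreement_expected == 0.0:
        raise ValueError("kappa is undefined when both raters use a single grade")
    return 1 - disagreement_observed / disagreement_expected


# Hypothetical grades on a three-level ordinal scale for four patients.
grades_a = [0, 1, 2, 2]
grades_b = [0, 1, 2, 1]
print(round(weighted_kappa(grades_a, grades_b, n_levels=3, power=2), 3))
print(round(weighted_kappa(grades_a, grades_b, n_levels=3, power=1), 3))
```

With quadratic weights, a one-grade disagreement is penalised much less than a disagreement across the whole scale, so weighted values tend to be higher than unweighted ones when most disagreements are near misses.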
