
    Parameters influencing interobserver agreement and observer accuracy in a vigilance analogue to naturalistic observation

    Interobserver agreement (reliability) is the usual method of estimating observer accuracy in naturalistic and contrived observations. Despite warnings by early researchers and growing interest in the methodological problems of the observation process, no research has explicated the relationship between interobserver agreement and observer accuracy. In addition, there has been little research into the environmental and organismic variables that influence interobserver agreement and observer accuracy. To address these problems, a situation analogous to naturalistic observation, namely a vigilance paradigm, was used. Experimental assistants performed two arbitrary behaviors (lifting and/or moving the index finger of each hand) at a preprogrammed rate; the behaviors were automatically recorded by electromechanical equipment. In one-hour sessions, the subjects, 36 female college undergraduates, recorded the assistant's behaviors by pressing buttons; the subjects' responses were also electromechanically recorded. The experimental design was a two by two by three factorial design with repeated measures across a 60-minute experimental session

    Intra- and interobserver variation in lung sound classification. Effect of training.

    This study explores how final-year medical students at the University of Tromsø, the Arctic University of Norway, interpret and describe different lung sounds. It measures intra- and interobserver variation in agreement among 16 students reporting abnormal lung sounds after listening to audio recordings. Agreement with a reference standard is also assessed, as is the effect of training on these agreements. To test the training effect, the students were separated into two groups, one of which received an intervention, a 3-hour course. The results serve to inform the medical community about the inconsistency in reporting lung sounds in this particular population, and will hopefully also help in finding measures to obtain better agreement. Cohen's kappa was used to measure intraobserver agreement and agreement with the reference standard, and Fleiss' kappa to measure interobserver agreement. An "exact" Mann-Whitney U test was used to test the effect of the course. The kappa level set to define acceptable agreement is "moderate", with a lower limit of 0.41. The results indicate highly acceptable intraobserver agreement, and the agreement tended to improve in both the intervention and the control group. The agreement with the reference standard was also highly acceptable for the category wheezes and acceptable for crackles and the abnormal category. A tendency toward positive change in the intervention group compared with the control group was found, but the difference was statistically significant only for the abnormal category in the agreement with the reference standard. The interobserver agreement did not reach the acceptable limit, except for wheezes. In summary, a weak effect of the intervention was observed
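
As an illustration of the statistic named in this abstract, the following is a minimal sketch of Cohen's kappa for two sets of categorical ratings, together with the 0.41 "moderate" cut-off the abstract mentions. The function names and the ten example ratings are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length sequences of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of items given the same label both times.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the two marginal label frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # only one label was ever used, by both ratings
    return (observed - expected) / (1 - expected)


def is_acceptable(kappa, threshold=0.41):
    """True if kappa reaches the 'moderate' lower limit quoted in the abstract."""
    return kappa >= threshold


# Hypothetical data: one listener classifying the same ten recordings twice.
first = ["wheeze", "wheeze", "crackle", "normal", "crackle",
         "normal", "wheeze", "normal", "crackle", "normal"]
second = ["wheeze", "wheeze", "crackle", "normal", "normal",
          "normal", "wheeze", "crackle", "crackle", "normal"]

kappa = cohens_kappa(first, second)
print(round(kappa, 3), is_acceptable(kappa))
```

The two listening sessions agree on 8 of 10 recordings, but kappa is lower than 0.8 because part of that agreement would be expected by chance from the label frequencies alone.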

    Exploring the reliability of the modified Rankin Scale

    Background and Purpose: The modified Rankin Scale (mRS) is the most prevalent outcome measure in stroke trials. Use of the mRS may be hampered by variability in grading. Previous estimates of the properties of the mRS have used diverse methodologies and may not apply to contemporary trial populations. We used a mock clinical trial design to explore inter- and intraobserver variability of the mRS. Methods: Consenting patients with stroke attending for outpatient review had the mRS performed by 2 independent assessors with pairs of assessors selected from a team of 3 research nurses and 4 stroke physicians. Before formal assessment, interviewers estimated disability based only on initial patient observation. Each patient was then randomized to undergo the mRS using standard assessment or a prespecified structured interview. The second interviewer in the pair reassessed the patient using the same method blinded to the colleague’s score. For each patient assessed, one rater was randomly assigned to video record their interview. After 3 months, this interviewer reviewed and regraded their original video assessment. Results: Across 100 paired assessments, interobserver agreement was moderate (k=0.57). Intraobserver variability was good (k=0.72) but less than would be expected from previous literature. Forty-nine assessments were performed using the structured interview approach with no significant difference between structured and standard mRS. Researchers were unable to reliably predict mRS from initial limited patient assessment (k=0.16). Conclusions: Despite availability of training and structured interview, there remains substantial interobserver variability in mRS grades awarded even by experienced researchers. Additional methods to improve mRS reliability are required.

    Interobserver Reliability in Describing Radiographic Lung Changes After Stereotactic Body Radiation Therapy

    Purpose: Radiographic lung changes after stereotactic body radiation therapy (SBRT) vary widely between patients. Standardized descriptions of acute (≤6 months after treatment) and late (>6 months after treatment) benign lung changes have been proposed but the reliable application of these classification systems has not been demonstrated. Herein, we examine the interobserver reliability of classifying acute and late lung changes after SBRT. Methods and materials: A total of 280 follow-up computed tomography scans at 3, 6, and 12 months post-treatment were analyzed in 100 patients undergoing thoracic SBRT. Standardized descriptions of acute lung changes (3- and 6-month scans) include diffuse consolidation, patchy consolidation and ground glass opacity (GGO), diffuse GGO, patchy GGO, and no change. Late lung change classifications (12-month scans) include modified conventional pattern, mass-like pattern, scar-like pattern, and no change. Five physicians scored the images independently in a blinded fashion. Fleiss' kappa scores quantified the interobserver agreement. Results: The kappa scores were 0.30 at 3 months, 0.20 at 6 months, and 0.25 at 12 months. The proportion of patients in each category at 3 and 6 months was as follows: diffuse consolidation 11% and 21%; patchy consolidation and GGO 15% and 28%; diffuse GGO 10% and 11%; patchy GGO 15% and 15%; and no change 49% and 25%, respectively. The percentage of patients in each category at 12 months was as follows: modified conventional 46%; mass-like 16%; scar-like 26%; and no change 12%. Uniform scoring between the observers occurred in 26, 8, and 14 cases at 3, 6, and 12 months, respectively. Conclusions: Interobserver reliability scores indicate fair agreement in classifying radiographic lung changes after SBRT. Qualitative descriptions are insufficient to categorize these findings because most patient scans do not fit clearly into a single classification. Categorization at 6 months may be the most difficult because late and acute lung changes can arise at that time
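
To make the multi-rater statistic in this abstract concrete, here is a minimal sketch of Fleiss' kappa computed from a table of per-scan category counts, plus a count of scans on which all raters agreed ("uniform scoring"). The function names and the four example rows are hypothetical; only the design of five raters comes from the abstract.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of raters placing subject i in category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    # Overall share of all ratings that fell into each category.
    p_j = [sum(row[j] for row in counts) / (n_subjects * n_raters)
           for j in range(n_categories)]
    # Per-subject agreement: share of rater pairs that agree on that subject.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        return 1.0  # every rating fell into one category
    return (p_bar - p_e) / (1 - p_e)


def uniform_cases(counts):
    """Number of subjects on which every rater chose the same category."""
    return sum(1 for row in counts if max(row) == sum(row))


# Hypothetical counts: 4 scans, 5 raters, 3 categories.
scan_counts = [
    [5, 0, 0],
    [0, 5, 0],
    [3, 2, 0],
    [1, 1, 3],
]

kappa = fleiss_kappa(scan_counts)
uniform = uniform_cases(scan_counts)
print(round(kappa, 3), uniform)
```

Unlike Cohen's kappa, this form does not need to know which rater gave which label, only how many raters chose each category for each scan.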

    Interobserver agreement of various thyroid imaging reporting and data systems

    Ultrasonography is the best available tool for the initial work-up of thyroid nodules. Substantial interobserver variability has been documented in the recognition and reporting of some of the lesion characteristics. A number of classification systems have been developed to estimate the likelihood of malignancy: several of them have been endorsed by scientific societies, but their reproducibility has yet to be assessed. We evaluated the interobserver variability of the AACE/ACE/AME, ACR, ATA, EU-TIRADS, and K-TIRADS classification systems and the interobserver concordance in the indication for FNA biopsy. Two raters independently evaluated 1055 ultrasound images of thyroid nodules identified in 265 patients at multiple time points, in two separate sets (501 and 554 images). After the first set of nodules, a joint reading was performed to reach a consensus on the feature definitions. The interobserver agreement (Krippendorff alpha) in the first set of nodules was 0.47, 0.49, 0.49, 0.61, and 0.53 for the AACE/ACE/AME, ACR, ATA, EU-TIRADS, and K-TIRADS systems, respectively. The agreement for the indication for biopsy was substantial to near-perfect, being 0.73, 0.61, 0.75, 0.68, and 0.82, respectively (Cohen's kappa). For all systems, agreement on the nodules of the second set increased. Despite the wide variability in the description of single ultrasonographic features, the classification systems may improve interobserver agreement, which improves further after specific training. When selecting nodules to be submitted to FNA biopsy, which is the main purpose of these classifications, the interobserver agreement is substantial to almost perfect

    Collagen bundle morphometry in skin & scar tissue: a novel distance mapping method provides superior measurements compared to Fourier analysis

    Histopathological evaluations of fibrotic processes require the characterization of collagen morphology in terms of geometrical features such as bundle orientation, thickness, and spacing. However, there are currently no reliable and valid techniques for measuring bundle thickness and spacing. Hence, two objective methods of quantifying collagen bundle thickness and spacing were tested for their reliability and validity: Fourier first-order maximum analysis and Distance Mapping, the latter being a newly developed morphometric technique. Histological slides were constructed and imaged from 50 scar and 50 healthy human skin biopsies and subsequently analyzed by two observers to determine the interobserver reliability via the intraclass correlation coefficient. An intraclass correlation coefficient larger than 0.7 is considered to represent good reliability. The interobserver reliability for the Fourier first-order maximum and for the Distance Mapping algorithms showed an intraclass correlation coefficient above 0.72 and 0.89, respectively. Additionally, we assessed validity in the form of responsiveness, demonstrating medium to excellent results via a calculation of the effect size, which indicates that both methods are sensitive enough to measure a treatment effect in clinical practice. In summary, two reliable and valid measurement methods were demonstrated for collagen bundle morphometry for the first time. Due to its superior reliability and more useful measures (bundle thickness and bundle spacing), Distance Mapping emerges as the preferred and more practical method. Nevertheless, in the future, both methods can be used for reliable and valid collagen morphometry of skin and scars, and further applications evaluating the quantitative microscopy of other fibrotic processes are anticipated
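
The abstract does not say which form of the intraclass correlation coefficient was used. As a sketch under that caveat, the following computes one common form, ICC(2,1) (two-way random effects, absolute agreement, single rater), and compares it with the 0.7 threshold quoted above. The function name and the five pairs of measurements are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1); scores[i][j] is the measurement of subject i by observer j.

    Assumption: this particular ICC form is chosen for illustration only.
    The result is undefined (division by zero) if all scores are identical.
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of observers
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular table with >= 2 subjects and observers")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, observers, residual.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical bundle-thickness readings (arbitrary units), two observers.
thickness = [
    [10.0, 11.0],
    [14.0, 13.0],
    [20.0, 21.0],
    [25.0, 25.0],
    [31.0, 30.0],
]

icc = icc_2_1(thickness)
good_reliability = icc > 0.7
print(round(icc, 3), good_reliability)
```

Because this form measures absolute agreement, a constant offset between the two observers would lower the coefficient even if their readings were perfectly correlated.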

    Does 4D transperineal ultrasound have additional value over 2D transperineal ultrasound for diagnosing posterior pelvic floor disorders in women with obstructed defecation syndrome?

    Objective: To establish the diagnostic test accuracy of two-dimensional (2D) and four-dimensional (4D) transperineal ultrasound (TPUS) for diagnosis of posterior pelvic floor disorders in women with obstructed defecation syndrome (ODS), in order to assess if 4D ultrasound imaging provides additional value. Methods: This was a prospective cohort study of 121 consecutive women with ODS. Symptoms of ODS and pelvic organ prolapse on clinical examination were assessed using validated methods. All women underwent both 2D- and 4D-TPUS. Imaging analysis was performed by two blinded observers. Posterior pelvic floor disorders were dichotomized into presence or absence, according to predefined cut-off values. In the absence of a reference standard, a composite reference standard was created from a combination of results of evacuation proctography, magnetic resonance imaging and endovaginal ultrasound. Primary outcome measures were diagnostic test characteristics of 2D- and 4D-TPUS for rectocele, enterocele, intussusception and anismus. Secondary outcome measures were interobserver agreement, agreement between the two imaging techniques, and association of severity of ODS symptoms and degree of posterior vaginal wall prolapse with conditions observed on imaging. Results: For diagnosis of all four posterior pelvic floor disorders, there was no difference in sensitivity or specificity between 2D- and 4D-TPUS (P = 0.131–1.000). Good agreement between 2D- and 4D-TPUS was found for diagnosis of rectocele (κ = 0.675) and moderate agreement for diagnoses of enterocele, intussusception and anismus (κ = 0.465–0.545). There was no difference in rectocele depth measurements between the techniques (19.9 mm for 2D vs 19.0 mm for 4D, P = 0.802). Interobserver agreement was comparable for both techniques, although 2D-TPUS had excellent interobserver agreement for diagnosis of enterocele and rectocele depth measurements, while this was only moderate and good, respectively, for 4D-TPUS. Diagnoses of rectocele and enterocele on both 2D- and 4D-TPUS were significantly associated with degree of posterior vaginal wall prolapse on clinical examination (odds ratio (OR) = 1.89–2.72). The conditions observed using either imaging technique were not associated with severity of ODS symptoms (OR = 0.82–1.13). Conclusions: There is no evidence of superiority of 4D ultrasound acquisition to dynamic 2D ultrasound acquisition for the diagnosis of posterior pelvic floor disorders. 2D- and 4D-TPUS could be used interchangeably to screen women with symptoms of ODS

    Variability in modified rankin scoring across a large cohort of observers

    Background and Purpose: The modified Rankin scale (mRS) is the most commonly used outcome measure in stroke trials. However, substantial interobserver variability in mRS scoring has been reported. These studies likely underestimate the variability present in multicenter clinical trials, because exploratory work has only been undertaken in single centers by a few observers, all of similar training. We examined mRS variability across a large cohort of international observers using data from a video training resource. Methods: The mRS training package includes a series of “real-life” patient interviews for grading. Training data were collected centrally and analyzed for variability using kappa statistics. We examined variability against a standard of “correct” mRS grades; examined variability by country; and for UK assessors, examined variability by center and by professional background of the observer. Results: To date, 2942 assessments from 30 countries have been submitted. Overall reliability for mRS grading has been moderate to good with substantial heterogeneity across countries. Native English language has had little effect on reliability. Within the United Kingdom, there was no significant variation by profession. Conclusion: Our results confirm interobserver variability in mRS assessment. The heterogeneity across countries is intriguing because it appears not to be related solely to language. These data highlight the need for novel strategies to improve reliability.

    Assessment scales in stroke: clinimetric and clinical considerations

    As stroke care has developed, there has been a need to robustly assess the efficacy of interventions both at the level of the individual stroke survivor and in the context of clinical trials. To describe stroke-survivor recovery meaningfully, more sophisticated measures are required than simple dichotomous end points, such as mortality or stroke recurrence. As stroke is an exemplar disabling long-term condition, measures of function are well suited to outcome assessment. In this review, we will describe functional assessment scales in stroke, concentrating on three of the more commonly used tools: the National Institutes of Health Stroke Scale, the modified Rankin Scale, and the Barthel Index. We will discuss the strengths, limitations, and application of these scales and use the scales to highlight important properties that are relevant to all assessment tools. We will frame much of this discussion in the context of "clinimetric" analysis. As they are increasingly used to inform stroke-survivor assessments, we will also discuss some of the commonly used quality-of-life measures. A recurring theme when considering functional assessment is that no tool suits all situations. Clinicians and researchers should choose their assessment tool based on the question of interest and the evidence base around clinimetric properties
