
    Estimating the number needed to treat from continuous outcomes in randomised controlled trials: methodological challenges and worked example using data from the UK Back Pain Exercise and Manipulation (BEAM) trial

    Background Reporting numbers needed to treat (NNT) improves the interpretability of trial results. It is unusual for continuous outcomes to be converted to numbers of individual responders to treatment (i.e., those who reach a particular threshold of change), and deteriorations prevented are only rarely considered. We consider how numbers needed to treat can be derived from continuous outcomes, illustrated with a worked example showing the methods and challenges. Methods We used data from the UK BEAM trial (n = 1,334) of physical treatments for back pain, originally reported as showing, at best, small to moderate benefits. Participants were randomised to receive 'best care' in general practice (the comparator treatment) or one of three manual and/or exercise treatments: 'best care' plus manipulation, exercise, or manipulation followed by exercise. We used established consensus thresholds for improvement in Roland-Morris disability questionnaire scores at three and twelve months to derive NNTs for improvements and for benefits (improvements gained + deteriorations prevented). Results At three months, NNT estimates ranged from 5.1 (95% CI 3.4 to 10.7) to 9.0 (5.0 to 45.5) for exercise, 5.0 (3.4 to 9.8) to 5.4 (3.8 to 9.9) for manipulation, and 3.3 (2.5 to 4.9) to 4.8 (3.5 to 7.8) for manipulation followed by exercise. Corresponding between-group mean differences in the Roland-Morris disability questionnaire were 1.6 (0.8 to 2.3), 1.4 (0.6 to 2.1), and 1.9 (1.2 to 2.6) points. Conclusion In contrast to the small mean differences originally reported, NNTs were small and could be attractive to clinicians, patients, and purchasers. NNTs can aid the interpretation of results of trials using continuous outcomes. Where possible, they should be reported alongside mean differences. Challenges remain in calculating NNTs for some continuous outcomes.
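    The NNT described above is the reciprocal of the difference in responder proportions between trial arms. A minimal sketch of that calculation, with a 95% CI obtained by inverting a Wald interval for the risk difference; the responder rates and arm sizes below are hypothetical, not the BEAM trial's actual figures:

```python
import math

def nnt_from_responders(p_treat, p_control, n_treat, n_control):
    """NNT from responder proportions, with an approximate 95% CI
    obtained by inverting the Wald interval for the risk difference."""
    rd = p_treat - p_control                      # absolute benefit
    se = math.sqrt(p_treat * (1 - p_treat) / n_treat
                   + p_control * (1 - p_control) / n_control)
    lo, hi = rd - 1.96 * se, rd + 1.96 * se
    # NNT is the reciprocal of the risk difference; CI limits swap on inversion
    return 1 / rd, 1 / hi, 1 / lo

# hypothetical responder rates (not the trial's reported figures)
nnt, ci_low, ci_high = nnt_from_responders(0.50, 0.30, 330, 330)
```

    Note that when the risk-difference CI crosses zero, the inverted NNT interval is discontinuous, which is one of the reporting challenges the abstract alludes to.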

    STARD for Abstracts: Essential items for reporting diagnostic accuracy studies in journal or conference abstracts

    Many abstracts of diagnostic accuracy studies are currently insufficiently informative. We extended the STARD (Standards for Reporting Diagnostic Accuracy) statement by developing a list of essential items that authors should consider when reporting diagnostic accuracy studies in journal or conference abstracts. After a literature review of published guidance for reporting biomedical studies, we identified 39 items potentially relevant to report in an abstract. We then selected essential items through a two-round web-based survey among the 85 members of the STARD Group, followed by discussions within an executive committee. Seventy-three STARD Group members responded (86%), with a 100% completion rate. STARD for Abstracts is a list of 11 essential items to be reported in every abstract of a diagnostic accuracy study. We provide examples of complete reporting and have developed template text for writing informative abstracts.

    Smallest detectable change in volume differs between mass flow sensor and pneumotachograph

    Background To assess pulmonary function change over time, the mass flow sensor and the pneumotachograph are widely used in commercially available instruments. However, the smallest detectable change for the two devices has never been compared. Therefore, the aim of this study was to determine the smallest detectable change in vital capacity (VC) and single-breath diffusion parameters measured by mass flow sensor or pneumotachograph. Method In 28 healthy pulmonary function technicians, VC, transfer factor for carbon monoxide (DLCO) and alveolar volume (VA) were measured repeatedly (10 times). The smallest detectable change was calculated as 1.96 × standard error of measurement × √2. Findings The mean (range) of the smallest detectable change measured by mass flow sensor and pneumotachograph, respectively, was: for VC (in litres) 0.53 (0.46-0.65) vs 0.25 (0.17-0.36) (p = 0.04); for DLCO (in mmol·kPa⁻¹·min⁻¹) 1.53 (1.26-1.7) vs 1.18 (0.84-1.39) (p = 0.07); for VA (in litres) 0.66 (0.53-0.82) vs 0.43 (0.34-0.53) (p = 0.04); and for DLCO/VA (in mmol·kPa⁻¹·min⁻¹·L⁻¹) 0.22 (0.19-0.28) vs 0.19 (0.14-0.22) (p = 0.79). Conclusions The smallest detectable changes in VC and VA measured by pneumotachograph are smaller than those measured by mass flow sensor. Therefore, the pneumotachograph is the preferred instrument to estimate lung volume change over time in individual patients.
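    The smallest detectable change formula quoted above (1.96 × SEM × √2) can be sketched directly, with the SEM estimated from the within-subject variance of the repeated measurements. The vital-capacity readings below are synthetic, not the study's data:

```python
import math
from statistics import mean, variance

def sem_from_repeats(subjects):
    """SEM as the square root of the mean within-subject variance
    over repeated measurements (one list of repeats per subject)."""
    return math.sqrt(mean(variance(reps) for reps in subjects))

def sdc(sem):
    """Smallest detectable change: 1.96 * SEM * sqrt(2)."""
    return 1.96 * sem * math.sqrt(2)

# synthetic VC readings (litres): ten repeats for three technicians
vc = [
    [4.10, 4.20, 4.00, 4.15, 4.10, 4.05, 4.20, 4.10, 4.00, 4.10],
    [3.60, 3.70, 3.65, 3.60, 3.75, 3.70, 3.60, 3.65, 3.70, 3.60],
    [5.00, 5.10, 4.95, 5.05, 5.00, 5.10, 4.90, 5.00, 5.05, 5.00],
]
sdc_vc = sdc(sem_from_repeats(vc))
```

    The √2 factor reflects that a change score involves two measurements, each carrying its own measurement error.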

    Understanding Ferguson's delta: time to say good-bye?

    A critique of Hankins, M: 'How discriminating are discriminative instruments?' Health and Quality of Life Outcomes 2008, 6:3

    Reproducibility of goniometric measurement of the knee in the in-hospital phase following total knee arthroplasty

    Background The objective of the present study was to assess the interobserver reproducibility (in terms of reliability and agreement) of active and passive measurements of knee range of motion (RoM) using a long-arm goniometer, performed by trained physical therapists in a clinical setting in total knee arthroplasty patients within the first four days after surgery. Methods Test-retest analysis. Setting: university hospital departments of orthopaedics and physical therapy. Participants: two experienced physical therapists assessed 30 patients three days after total knee arthroplasty. Main outcome measure: RoM measurement using a long-arm (50 cm) goniometer. Agreement was calculated as the mean difference between observers ± 95% CI of this mean difference. The intraclass correlation coefficient (ICC) was calculated as a measure of reliability, based on two-way random effects analysis of variance. Results The lowest level of agreement was that for measurement of passive flexion with the patient in supine position (mean difference 1.4°; limits of agreement -16.2° to 19.0° for the difference between the two observers). The highest levels of agreement were found for measurement of passive flexion with the patient in sitting position and for measurement of passive extension (mean difference 2.7°; limits of agreement -6.7° to 12.1°, and mean difference 2.2°; limits of agreement -6.2° to 10.6°, respectively). The ability to differentiate between subjects ranged from 0.62 for measurement of passive extension to 0.89 for measurement of active flexion (ICC values). Conclusion Interobserver agreement for flexion as well as extension was only fair. When two different observers assess the same patients in the acute phase after total knee arthroplasty using a long-arm goniometer, differences in RoM of less than eight degrees cannot be distinguished from measurement error. Reliability was found to be acceptable for comparisons at group level, but poor for individual comparisons over time.
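    The agreement figures above are Bland-Altman 95% limits of agreement: the mean inter-observer difference ± 1.96 × the standard deviation of the paired differences. A minimal sketch, with illustrative paired flexion readings rather than the study's data:

```python
from statistics import mean, stdev

def limits_of_agreement(obs1, obs2):
    """Bland-Altman 95% limits of agreement between two observers:
    mean difference +/- 1.96 * SD of the paired differences."""
    diffs = [a - b for a, b in zip(obs1, obs2)]
    m, s = mean(diffs), stdev(diffs)
    return m, m - 1.96 * s, m + 1.96 * s

# illustrative paired flexion readings (degrees), not the study's data
m, lo, hi = limits_of_agreement([95.0, 102.0, 88.0, 110.0],
                                [93.0, 104.0, 90.0, 105.0])
```

    A small mean difference with wide limits, as in the supine-flexion result above, signals low bias but poor precision between observers.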

    The FDA guidance for industry on PROs: the point of view of a pharmaceutical company

    The importance of patients' point of view on their health status is widely recognised. Patient-reported outcomes is a broad term encompassing a large variety of health data reported by patients, such as symptoms, functional status, Quality of Life and Health-Related Quality of Life. Measurements of Health-Related Quality of Life have been developed over many years of research, and many validated questionnaires exist. However, few attempts have been made to standardise the evaluation of instrument characteristics, and no recommendations exist on the interpretation of Health-Related Quality of Life results, especially regarding the clinical significance of a change that should guide a therapeutic approach. Moreover, the true value of Health-Related Quality of Life evaluations in clinical trials has not yet been completely defined. An important step towards a more structured and frequent use of Patient-Reported Outcomes in drug development is the FDA Guidance issued in February 2006. In this paper we report some considerations on this Guidance. Our comments focus especially on the characteristics of the instruments to be used, the Minimal Important Difference, and the methods to calculate it. Furthermore, we present the advantages and opportunities of using Patient-Reported Outcomes in drug development, as seen by a pharmaceutical company. Patient-Reported Outcomes can provide additional data to make a drug more competitive than others of the same pharmacological class, and a well-demonstrated positive impact on patients' health status and daily life might allow a higher price and/or inclusion in a reimbursement list. Applying the FDA Guidance extensively in future trials could lead to a wider culture of subjective measurement and to greater consideration of patients' opinions on their care. Moreover, prescribing doctors and payers could benefit from subjective information to better define the value of drugs.

    Reproducibility and responsiveness of the Symptom Severity Scale and the hand and finger function subscale of the Dutch arthritis impact measurement scales (Dutch-AIMS2-HFF) in primary care patients with wrist or hand problems

    BACKGROUND: To determine the clinimetric properties of two questionnaires assessing symptoms (Symptom Severity Scale) and physical functioning (hand and finger function subscale of the AIMS2) in a Dutch primary care population. METHODS: The first 84 participants in a 1-year follow-up study on the diagnosis and prognosis of hand and wrist problems completed the Symptom Severity Scale and the hand and finger function subscale of the Dutch-AIMS2 twice within 1 to 2 weeks. The data were used to assess test-retest reliability (ICC) and smallest detectable change (SDC, based on the standard error of measurement (SEM)). To assess responsiveness, changes in scores between baseline and the 3-month follow-up were related to an external criterion to estimate the minimal important change (MIC). We calculated the group size needed to detect the MIC beyond measurement error. RESULTS: The ICC for the Symptom Severity Scale was 0.68 (95% CI: 0.54–0.78). The SDC was 1.00 at individual level and 0.11 at group level, both on a 5-point scale. The MIC was 0.23, exceeding the SDC at group level. The group size required to detect the MIC beyond measurement error was 19 for the Symptom Severity Scale. The ICC for the hand and finger function subscale of the Dutch-AIMS2 was 0.62 (95% CI: 0.47–0.74). The SDC was 3.80 at individual level and 0.42 at group level, both on an 11-point scale. The MIC was 0.31, which was less than the SDC at group level. The group size required to detect the MIC beyond measurement error was 150. CONCLUSION: In our heterogeneous primary care population the Symptom Severity Scale was found to be a suitable instrument to assess the severity of symptoms, whereas the hand and finger function subscale of the Dutch-AIMS2 was less suitable for the measurement of physical functioning in patients with hand and wrist problems.
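    The group-size calculation above follows from the group-level SDC shrinking with √n: the required n is the first integer for which SDC_individual/√n falls below the MIC, i.e. the next integer above (SDC_individual/MIC)². A sketch using the Symptom Severity Scale figures quoted in the abstract:

```python
import math

def group_size_for_mic(sdc_individual, mic):
    """Smallest group size n at which the group-level SDC
    (SDC_individual / sqrt(n)) falls below the minimal important
    change, i.e. the first integer above (SDC/MIC)**2."""
    return math.ceil((sdc_individual / mic) ** 2)

# Symptom Severity Scale figures from the abstract: SDC 1.00, MIC 0.23
n_sss = group_size_for_mic(1.00, 0.23)  # 19, matching the abstract
```

    The same formula applied to the Dutch-AIMS2 subscale (SDC 3.80, MIC 0.31) yields a group size of roughly 150, consistent with the abstract's figure.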

    The COSMIN checklist for evaluating the methodological quality of studies on measurement properties: A clarification of its content

    Background The COSMIN checklist (COnsensus-based Standards for the selection of health status Measurement INstruments) was developed in an international Delphi study to evaluate the methodological quality of studies on the measurement properties of health-related patient-reported outcomes (HR-PROs). In this paper, we explain our choices for the design requirements and preferred statistical methods for which no evidence is available in the literature or on which the Delphi panel members had substantial discussion. Methods The issues described in this paper are a reflection of the Delphi process, in which 43 panel members participated. Results The topics discussed are internal consistency (relevance for reflective and formative models, and distinction from unidimensionality), content validity (judging relevance and comprehensiveness), hypotheses testing as an aspect of construct validity (specificity of hypotheses), criterion validity (relevance for PROs), and responsiveness (concept and relation to validity, and (in)appropriate measures). Conclusions We expect that this paper will contribute to a better understanding of the rationale behind the items, thereby enhancing the acceptance and use of the COSMIN checklist.

    Inexperienced clinicians can extract pathoanatomic information from MRI narrative reports with high reproducibility for use in research/quality assurance

    Background Although reproducibility in reading MRI images amongst radiologists and clinicians has been studied previously, no studies have examined the reproducibility of inexperienced clinicians in extracting pathoanatomic information from magnetic resonance imaging (MRI) narrative reports and transforming that information into quantitative data. However, this process is frequently required in research and quality assurance contexts. The purpose of this study was to examine inter-rater reproducibility (agreement and reliability) among an inexperienced group of clinicians in extracting spinal pathoanatomic information from radiologist-generated MRI narrative reports. Methods Twenty MRI narrative reports were randomly extracted from an institutional database. A group of three physiotherapy students independently reviewed the reports and coded the presence of 14 common pathoanatomic findings using a categorical electronic coding matrix. Decision rules were developed after initial coding in an effort to resolve ambiguities in narrative reports. This process was repeated a further three times using separate samples of 20 MRI reports until no further ambiguities were identified (total n = 80). Reproducibility between trainee clinicians and two highly trained raters was examined in an arbitrary coding round, with agreement measured using percentage agreement and reliability measured using unweighted kappa (k). Reproducibility was then examined in another group of three trainee clinicians who had not participated in the production of the decision rules, using another sample of 20 MRI reports. Results The mean percentage agreement for paired comparisons between the initial trainee clinicians improved over the four coding rounds (97.9-99.4%), although the greatest improvement was observed after the first introduction of coding rules. High inter-rater reproducibility was observed between trainee clinicians across 14 pathoanatomic categories over the four coding rounds (agreement range: 80.8-100%; reliability range k = 0.63-1.00). Concurrent validity was high in paired comparisons between trainee clinicians and highly trained raters (agreement 97.8-98.1%, reliability k = 0.83-0.91). Reproducibility was also high in the second sample of trainee clinicians (inter-rater agreement 96.7-100.0% and reliability k = 0.76-1.00; intra-rater agreement 94.3-100.0% and reliability k = 0.61-1.00). Conclusions A high level of radiological training is not required in order to transform MRI-derived pathoanatomic information from a narrative format to a quantitative format with high reproducibility for research or quality assurance purposes.
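    The unweighted kappa used above corrects the observed percentage agreement for the agreement expected by chance from each rater's marginal frequencies. A minimal sketch with hypothetical coding data for a single pathoanatomic finding:

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Unweighted Cohen's kappa: chance-corrected agreement between
    two raters' categorical codes."""
    n = len(rater1)
    p_obs = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # chance agreement from the product of each category's marginals
    p_exp = sum(c1[cat] * c2[cat] for cat in c1) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)

# hypothetical presence/absence codes for one finding across 4 reports
k = cohens_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"])
```

    Because kappa discounts chance agreement, high percentage agreement (as in the 97.9-99.4% above) can coexist with more modest kappa values when one category dominates.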