
    Inter-rater agreement and reliability of the COSMIN (COnsensus-based Standards for the selection of health status Measurement Instruments) Checklist

    Background: The COSMIN checklist is a tool for evaluating the methodological quality of studies on measurement properties of health-related patient-reported outcomes. The aim of this study was to determine the inter-rater agreement and reliability of each item score of the COSMIN checklist (n = 114). Methods: 75 articles evaluating measurement properties were randomly selected from the bibliographic database compiled by the Patient-Reported Outcome Measurement Group, Oxford, UK. Raters were asked to assess the methodological quality of three articles each, using the COSMIN checklist. In a one-way design, percentage agreement and intraclass kappa coefficients or quadratic-weighted kappa coefficients were calculated for each item. Results: 88 raters participated. Of the 75 selected articles, 26 were rated by four to six participants, and 49 by two or three participants. Overall, percentage agreement was appropriate (68% of items reached above 80% agreement), whereas the kappa coefficients for the COSMIN items were low (61% were below 0.40; 6% were above 0.75). Reasons for low inter-rater agreement were the need for subjective judgement and raters being accustomed to different standards, terminology and definitions. Conclusions: The results indicate that raters often chose the same response option, but that at item level it is difficult to distinguish between articles. When using the COSMIN checklist in a systematic review, we recommend obtaining some training and experience, having the checklist completed by two independent raters, and reaching consensus on one final rating. The instructions for using the checklist have been improved.
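    The quadratic-weighted kappa mentioned in this abstract penalizes rater disagreements by the squared distance between ordinal categories. A minimal sketch of the computation for two raters follows; the example data are illustrative, not the study's, and a library routine such as scikit-learn's `cohen_kappa_score` with `weights="quadratic"` should give the same value.

```python
from collections import Counter

def quadratic_weighted_kappa(rater_a, rater_b, n_categories):
    """Cohen's kappa with quadratic weights for two raters scoring
    the same items on an ordinal scale 0..n_categories-1."""
    n = len(rater_a)
    obs = Counter(zip(rater_a, rater_b))  # observed joint counts
    pa = Counter(rater_a)                 # marginal counts, rater A
    pb = Counter(rater_b)                 # marginal counts, rater B
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(n_categories):
        for j in range(n_categories):
            w = (i - j) ** 2  # quadratic disagreement weight
            num += w * obs.get((i, j), 0) / n
            den += w * (pa.get(i, 0) / n) * (pb.get(j, 0) / n)
    return 1.0 - num / den

# two raters scoring six articles on a hypothetical 4-point item
a = [0, 1, 2, 3, 3, 1]
b = [0, 1, 2, 3, 2, 0]
print(round(quadratic_weighted_kappa(a, b, 4), 3))  # → 0.87
```

Because near-misses are down-weighted, quadratic-weighted kappa is more forgiving of adjacent-category disagreement than unweighted kappa, which suits ordinal checklist items.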

    Rating the methodological quality in systematic reviews of studies on measurement properties: a scoring system for the COSMIN checklist

    Background: The COSMIN checklist is a standardized tool for assessing the methodological quality of studies on measurement properties. It contains 9 boxes, each dealing with one measurement property, with 5 to 18 items per box about design aspects and statistical methods. Our aim was to develop a scoring system for the COSMIN checklist to calculate quality scores per measurement property when using the checklist in systematic reviews of measurement properties. Methods: The scoring system was developed based on discussions among experts and on testing of the scoring system on 46 articles from a systematic review. Four response options were defined for each COSMIN item (excellent, good, fair, and poor). A quality score per measurement property is obtained by taking the lowest rating of any item in a box ("worst score counts"). Results: Specific criteria for excellent, good, fair, and poor quality are described for each COSMIN item. In defining the criteria, the "worst score counts" algorithm was taken into consideration; this means that only fatal flaws were defined as poor quality. The scores of the 46 articles show how the scoring system can be used to provide an overview of the methodological quality of studies included in a systematic review of measurement properties. Conclusions: Based on the experience of testing this scoring system on 46 articles, the COSMIN checklist with the proposed scoring system appears to be a useful tool for assessing the methodological quality of studies included in systematic reviews of measurement properties.
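    The "worst score counts" rule described above amounts to taking the minimum over the ordinal item ratings within a box. A minimal sketch, with illustrative box contents rather than actual checklist items:

```python
# Ordinal ranks for the four COSMIN response options
RANK = {"poor": 0, "fair": 1, "good": 2, "excellent": 3}

def box_quality(item_ratings):
    """Apply the 'worst score counts' rule: the quality score of a
    measurement-property box is the lowest rating among its items."""
    return min(item_ratings, key=lambda rating: RANK[rating])

# hypothetical ratings for the items of one box
reliability_box = ["excellent", "good", "excellent", "fair"]
print(box_quality(reliability_box))  # → fair
```

The rule is deliberately conservative: one fatal flaw ("poor") caps the whole box at poor quality, regardless of how the remaining items score.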

    The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study

    BACKGROUND: The aim of the COSMIN study (COnsensus-based Standards for the selection of health status Measurement INstruments) was to develop a consensus-based checklist to evaluate the methodological quality of studies on measurement properties. We present the COSMIN checklist and the agreement of the panel on the items of the checklist. METHODS: A four-round Delphi study was performed with international experts (psychologists, epidemiologists, statisticians and clinicians). Of the 91 invited experts, 57 agreed to participate (63%). Panel members were asked to rate their (dis)agreement with each proposal on a five-point scale. Consensus was considered to be reached when at least 67% of the panel members indicated 'agree' or 'strongly agree'. RESULTS: Consensus was reached on the inclusion of the following measurement properties: internal consistency, reliability, measurement error, content validity (including face validity), construct validity (including structural validity, hypotheses testing and cross-cultural validity), criterion validity, responsiveness, and interpretability. The latter was not considered a measurement property. The panel also reached consensus on how these properties should be assessed. CONCLUSIONS: The resulting COSMIN checklist could be useful when selecting a measurement instrument, peer-reviewing a manuscript, designing or reporting a study on measurement properties, or for educational purposes. This study was financially supported by the EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, and the Anna Foundation, Leiden, The Netherlands.

    The COSMIN checklist for evaluating the methodological quality of studies on measurement properties: A clarification of its content

    Background: The COSMIN checklist (COnsensus-based Standards for the selection of health status Measurement INstruments) was developed in an international Delphi study to evaluate the methodological quality of studies on measurement properties of health-related patient-reported outcomes (HR-PROs). In this paper, we explain our choices for the design requirements and preferred statistical methods for which no evidence is available in the literature or on which the Delphi panel members had substantial discussion. Methods: The issues described in this paper are a reflection of the Delphi process, in which 43 panel members participated. Results: The topics discussed are internal consistency (relevance for reflective and formative models, and the distinction from unidimensionality), content validity (judging relevance and comprehensiveness), hypotheses testing as an aspect of construct validity (specificity of hypotheses), criterion validity (relevance for PROs), and responsiveness (the concept, its relation to validity, and (in)appropriate measures). Conclusions: We expect that this paper will contribute to a better understanding of the rationale behind the items, thereby enhancing the acceptance and use of the COSMIN checklist.

    Methodological quality of 100 recent systematic reviews of health-related outcome measurement instruments: an overview of reviews

    PURPOSE: Systematic reviews evaluating and comparing the measurement properties of outcome measurement instruments (OMIs) play an important role in OMI selection. Earlier overviews of review quality (2007, 2014) evidenced substantial concerns with regard to alignment with scientific standards. This overview aimed to investigate whether the quality of recent systematic reviews of OMIs lives up to current scientific standards. METHODS: One hundred systematic reviews of OMIs published from June 1, 2021 onwards were randomly selected through a systematic literature search performed on March 17, 2022 in MEDLINE and EMBASE. The quality of the systematic reviews was appraised by two independent reviewers. An updated data extraction form was informed by the earlier studies, and results were compared to those studies' findings. RESULTS: A quarter of the reviews had an unclear research question or aim, and in 22% of the reviews the search strategy did not match the aim. Half of the reviews had a search strategy that was not comprehensive, because relevant search terms were not included. In 63% of the reviews (compared to 41% in 2014 and 30% in 2007) a risk of bias assessment was conducted. In 73% of the reviews (some) measurement properties were evaluated (58% in 2014 and 55% in 2007). In 60% of the reviews the data were (partly) synthesized (42% in 2014 and 7% in 2007); in the majority of reviews, evaluation of measurement properties and data synthesis were not conducted separately for subscales. Certainty assessments of the quality of the total body of evidence were conducted in only 33% of reviews (not assessed in 2014 and 2007). The majority (58%) did not make any recommendations on which OMI (not) to use. CONCLUSION: Despite clear improvements in risk of bias assessment, measurement property evaluation, and data synthesis, specifying the research question, conducting the search strategy, and performing a certainty assessment remain poor. To ensure that systematic reviews of OMIs meet current scientific standards, more consistent conduct and reporting of systematic reviews of OMIs is needed.

    Development of the Social Participation Restrictions Questionnaire (SPaRQ) through consultation with adults with hearing loss, researchers, and clinicians: a content evaluation study

    Objective - This research aimed to evaluate the content of the Social Participation Restrictions Questionnaire (SPaRQ) in terms of its relevance, clarity, comprehensiveness, acceptability to adults with hearing loss, and responsiveness. Design - Cognitive interviews and a subject matter expert survey were conducted. The interview data were analysed using thematic analysis and a taxonomy of questionnaire clarity problems. Descriptive statistics were calculated for the survey data. Study sample - Fourteen adults with hearing loss participated in the cognitive interviews. Twenty clinicians and academics completed the subject matter expert survey. Results - The majority of the SPaRQ content was found to be relevant, clear, comprehensive, and acceptable. However, an important clarity problem was identified: many adults with hearing loss struggled to switch from answering positively worded items (e.g. 'I can attend social gatherings') to answering negatively worded items (e.g. 'I feel isolated'). Several subject matter experts found responsiveness difficult to assess. The SPaRQ was amended where necessary. Conclusions - Few hearing-specific questionnaires have undergone content evaluation. This study highlights the value of content evaluation as a means of identifying important flaws and improving the quality of a measure. The next stage of this research is a psychometric evaluation of the measure.

    The translation, validity and reliability of the German version of the Fremantle Back Awareness Questionnaire

    Background: The Fremantle Back Awareness Questionnaire (FreBAQ) claims to assess disrupted self-perception of the back. The aim of this study was to develop a German version of the FreBAQ (FreBAQ-G) and assess its test-retest reliability, its known-groups validity, and its convergent validity with another purported measure of back perception. Methods: The FreBAQ-G was translated following international guidelines for the transcultural adaptation of questionnaires. Thirty-five patients with non-specific chronic low back pain (CLBP) and 48 healthy participants were recruited. Assessor one administered the FreBAQ-G to each patient with CLBP on two separate days to quantify intra-observer reliability. Assessor two administered the FreBAQ-G to each patient on day 1; these scores were compared to those obtained by assessor one on day 1 to assess inter-observer reliability. Known-groups validity was quantified by comparing FreBAQ-G scores between patients and healthy controls. To assess convergent validity, patients' FreBAQ-G scores were correlated with their two-point discrimination (TPD) scores. Results: Intra- and inter-observer reliability were both moderate, with ICC(3,1) = 0.88 (95% CI: 0.77 to 0.94) and 0.89 (95% CI: 0.79 to 0.94), respectively. Intra- and inter-observer limits of agreement (LoA) were 6.2 (95% CI: 5.0 to 8.1) and 6.0 (95% CI: 4.8 to 7.8), respectively. The adjusted mean difference between patients and controls was 5.4 (95% CI: 3.0 to 7.8, p < 0.01). Patients' FreBAQ-G scores were not associated with TPD thresholds (Pearson's r = -0.05, p = 0.79). Conclusions: The FreBAQ-G demonstrated a degree of reliability and known-groups validity, but interpretation of patient-level data should be performed with caution because the LoA were substantial. It did not demonstrate convergent validity against TPD. Floor effects on some items of the FreBAQ-G may have influenced the validity and reliability results. The clinimetric properties of the FreBAQ-G require further investigation before firm recommendations on its use as a simple measure of disrupted self-perception of the back can be made.
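    Limits of agreement of the kind reported in this abstract typically follow the Bland-Altman approach: the mean test-retest difference plus or minus 1.96 times the standard deviation of the differences. A minimal sketch with made-up scores (not the study's data):

```python
import statistics

def limits_of_agreement(scores_day1, scores_day2):
    """Bland-Altman 95% limits of agreement for a test-retest design:
    mean difference +/- 1.96 * SD of the paired differences."""
    diffs = [a - b for a, b in zip(scores_day1, scores_day2)]
    bias = statistics.mean(diffs)        # systematic difference
    sd = statistics.stdev(diffs)         # spread of differences
    return bias - 1.96 * sd, bias + 1.96 * sd

# illustrative FreBAQ-G total scores on two days (invented values)
day1 = [12, 18, 7, 22, 15, 9]
day2 = [10, 19, 8, 20, 16, 11]
lo, hi = limits_of_agreement(day1, day2)
print(round(lo, 2), round(hi, 2))  # → -3.54 3.21
```

Wide limits, as the study found, mean that an individual patient's retest score can differ substantially from the first measurement even when the group-level ICC looks acceptable, which is why the authors caution against interpreting patient-level change.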