93 research outputs found
Chapter 9: Options for Summarizing Medical Test Performance in the Absence of a “Gold Standard”
The classical paradigm for evaluating test performance compares the results of an index test with a reference test. When the reference test does not mirror the “truth” adequately well (i.e., is an “imperfect” reference standard), the typical (“naïve”) estimates of sensitivity and specificity are biased. One has at least four options when performing a systematic review of test performance when the reference standard is “imperfect”: (a) to forgo the classical paradigm and assess the index test’s ability to predict patient-relevant outcomes instead of test accuracy (i.e., treat the index test as a predictive instrument); (b) to assess whether the results of the two tests (index and reference) agree or disagree (i.e., treat them as two alternative measurement methods); (c) to calculate “naïve” estimates of the index test’s sensitivity and specificity from each study included in the review and discuss in which direction they are biased; (d) to mathematically adjust the “naïve” estimates of sensitivity and specificity of the index test to account for the imperfect reference standard. We discuss these options and illustrate some of them through examples.
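Option (d) can be illustrated with a minimal Python sketch. It is not taken from the chapter: it assumes the reference standard's own sensitivity and specificity are known, and that index and reference test errors are conditionally independent given true disease status (a classical algebraic correction under those assumptions). The function names and the 2x2 cell labels `a`, `b`, `c`, `d` are hypothetical.

```python
def naive_accuracy(a, b, c, d):
    """Naive sensitivity and specificity, treating the reference as truth.

    Cells: a = index+/ref+, b = index+/ref-, c = index-/ref+, d = index-/ref-.
    """
    return a / (a + c), d / (b + d)


def corrected_accuracy(a, b, c, d, ref_sens, ref_spec):
    """Adjust for an imperfect reference standard.

    ASSUMPTIONS (not from the source): ref_sens and ref_spec are known, and
    the two tests err independently conditional on true disease status.
    """
    n = a + b + c + d
    if ref_sens + ref_spec <= 1:
        raise ValueError("reference standard must be better than chance")
    sens_den = n * (ref_spec - 1) + (a + c)  # n * prevalence * (ref_sens + ref_spec - 1)
    spec_den = n * ref_sens - (a + c)        # n * (1 - prevalence) * (ref_sens + ref_spec - 1)
    if sens_den <= 0 or spec_den <= 0:
        raise ValueError("table is incompatible with the stated reference accuracy")
    sens = ((a + b) * ref_spec - b) / sens_den
    spec = ((c + d) * ref_sens - c) / spec_den
    return sens, spec


def expected_table(n, prev, sens, spec, ref_sens, ref_spec):
    """Expected 2x2 counts under conditional independence (for checking only)."""
    a = n * (prev * sens * ref_sens + (1 - prev) * (1 - spec) * (1 - ref_spec))
    b = n * (prev * sens * (1 - ref_sens) + (1 - prev) * (1 - spec) * ref_spec)
    c = n * (prev * (1 - sens) * ref_sens + (1 - prev) * spec * (1 - ref_spec))
    d = n * (prev * (1 - sens) * (1 - ref_sens) + (1 - prev) * spec * ref_spec)
    return a, b, c, d


# Hypothetical inputs: true index accuracy 0.90 / 0.80, reference 0.95 / 0.90.
table = expected_table(1000, 0.3, 0.9, 0.8, 0.95, 0.9)
naive = naive_accuracy(*table)
corrected = corrected_accuracy(*table, ref_sens=0.95, ref_spec=0.9)
```

With these made-up inputs the naive estimates fall below the true values while the correction recovers them. If the tests' errors are correlated, the conditional independence assumption fails and this simple adjustment can itself be biased.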
Family-Based versus Unrelated Case-Control Designs for Genetic Associations
The simplest and most commonly used approach for genetic associations is the case-control study design of unrelated people. This design is susceptible to population stratification. This problem is obviated in family-based studies, but it is usually difficult to accumulate large enough samples of well-characterized families. We addressed empirically whether the two designs give similar estimates of association in 93 investigations where both unrelated case-control and family-based designs had been employed. Estimated odds ratios differed beyond chance between the two designs in only four instances (4%). The summary relative odds ratio (ROR) (the ratio of odds ratios obtained from unrelated case-control and family-based studies) was close to unity (0.96 [95% confidence interval, 0.91–1.01]). There was no heterogeneity in the ROR across studies (amount of heterogeneity beyond chance I² = 0%). Differences on whether results were nominally statistically significant (p < 0.05) or not with the two designs were common (opposite classification rates 14% and 17%); this reflected largely differences in power. Conclusions were largely similar in diverse subgroup analyses. Unrelated case-control and family-based designs give overall similar estimates of association. We cannot rule out rare large biases or common small biases.
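As a rough illustration of how an ROR and its heterogeneity could be computed, the Python sketch below works on the log scale. It is an assumption-laden sketch, not the authors' analysis: it treats the two designs' estimates as independent, uses fixed-effect inverse-variance pooling (the abstract does not say which model was used), and all function names are hypothetical.

```python
import math

Z95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper):
    """Standard error of a log odds ratio recovered from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def relative_odds_ratio(or_cc, se_cc, or_fam, se_fam):
    """Log ROR (case-control vs family-based) and its standard error.

    ASSUMPTION: the two estimates are independent, so variances add.
    """
    log_ror = math.log(or_cc) - math.log(or_fam)
    return log_ror, math.sqrt(se_cc ** 2 + se_fam ** 2)


def pool_fixed(log_rors, ses):
    """Fixed-effect inverse-variance summary with Cochran's Q based I² (in %)."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, log_rors)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_rors))
    df = len(log_rors) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, pooled_se, i2


def to_ratio_ci(log_est, se):
    """Back-transform a log-scale estimate to a ratio with its 95% CI."""
    return (math.exp(log_est),
            math.exp(log_est - Z95 * se),
            math.exp(log_est + Z95 * se))
```

A per-topic ROR whose 95% CI excludes 1 would correspond to the two designs differing "beyond chance" in that topic; an I² of 0% arises whenever Q does not exceed its degrees of freedom.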
Electrocardiogram-gated single-photon emission computed tomography versus cardiac magnetic resonance imaging for the assessment of left ventricular volumes and ejection fraction: A meta-analysis
OBJECTIVES: The purpose of this study was to evaluate the accuracy of electrocardiogram (ECG)-gated single-photon emission computed tomography (SPECT) for assessment of left ventricular (LV) end-diastolic volume (EDV), end-systolic volume (ESV) and ejection fraction (EF) compared with the gold standard of cardiac magnetic resonance imaging (MRI). BACKGROUND: Several comparisons of ECG-gated SPECT with cardiac MRI have been performed for evaluation of LV volumes and EF, but each has considered few subjects, thus leaving uncertainty about the frequency of discrepancies between the two methods. METHODS: We performed a meta-analysis of data on 164 subjects from nine studies comparing ECG-gated SPECT versus cardiac MRI. Data were pooled in correlation and regression analyses relating ECG-gated SPECT and cardiac MRI measurements. The frequency of discrepancies of at least 30 ml in EDV, 20 ml in ESV and 5% or 10% in EF and concordance for EF ≤40% versus >40% were determined. RESULTS: There was an overall excellent correlation between ECG-gated SPECT and cardiac MRI for EDV (r = 0.89), ESV (r = 0.92) and EF (r = 0.87). However, rates of discrepancies for individual subjects were considerable (37% [95% confidence interval {CI}, 26% to 50%] for at least 30 ml in EDV; 35% [95% CI, 23% to 49%] for at least 20 ml in ESV; 52% [95% CI, 37% to 63%] for at least 5% in EF; and 23% [95% CI, 11% to 42%] for at least 10% in EF). The misclassification rate for the 40% EF cutoff was 11%. CONCLUSIONS: Electrocardiogram-gated SPECT measurements of EDV, ESV and EF show high correlation with cardiac MRI measurements, but substantial errors may occur in individual patients. Electrocardiogram-gated SPECT offers useful functional information, but cardiac MRI should be used when accurate measurement is required.
Appraising the Potential Uses and Harms of LLMs for Medical Systematic Reviews
Medical systematic reviews play a vital role in healthcare decision making and policy. However, their production is time-consuming, limiting the availability of high-quality and up-to-date evidence summaries. Recent advancements in large language models (LLMs) offer the potential to automatically generate literature reviews on demand, addressing this issue. However, LLMs sometimes generate inaccurate (and potentially misleading) texts by hallucination or omission. In healthcare, this can make LLMs unusable at best and dangerous at worst. We conducted 16 interviews with international systematic review experts to characterize the perceived utility and risks of LLMs in the specific context of medical evidence reviews. Experts indicated that LLMs can assist in the writing process by drafting summaries, generating templates, distilling information, and crosschecking information. They also raised concerns regarding confidently composed but inaccurate LLM outputs and other potential downstream harms, including decreased accountability and proliferation of low-quality reviews. Informed by this qualitative analysis, we identify criteria for rigorous evaluation of biomedical LLMs aligned with domain expert views. Comment: 18 pages, 2 figures, 8 tables. Accepted as an EMNLP 2023 main paper.
Closing the Gap between Methodologists and End-Users: R as a Computational Back-End
The R environment provides a natural platform for developing new statistical methods due to the mathematical expressiveness of the language, the large number of existing libraries, and the active developer community. One drawback to R, however, is the learning curve; programming is a deterrent to non-technical users, who typically prefer graphical user interfaces (GUIs) to command line environments. Thus, while statisticians develop new methods in R, practitioners often lag behind in the statistical techniques they use because they rely on GUI applications. Meta-analysis is an instructive example; cutting-edge meta-analysis methods are often ignored by the overwhelming majority of practitioners, in part because they have no easy way of applying them. This paper proposes a strategy to close the gap between the statistical state-of-the-science and what is applied in practice. We present open-source meta-analysis software that uses R as the underlying statistical engine, and Python for the GUI. We present a framework that allows methodologists to implement new methods in R that are then automatically integrated into the GUI for use by end-users, so long as the programmer conforms to our interface. Such an approach allows an intuitive interface for non-technical users while leveraging the latest advanced statistical methods implemented by methodologists.
Chapter 10: Deciding Whether to Complement a Systematic Review of Medical Tests with Decision Modeling
Limited by what is reported in the literature, most systematic reviews of medical tests focus on “test accuracy” (or better, test performance), rather than on the impact of testing on patient outcomes. The link between testing, test results and patient outcomes is typically complex: even when testing has high accuracy, there is no guarantee that physicians will act according to test results, that patients will follow their orders, or that the intervention will yield a beneficial endpoint. Therefore, test performance is typically not sufficient for assessing the usefulness of medical tests. Modeling (in the form of decision or economic analysis) is a natural framework for linking test performance data to clinical outcomes. We propose that (some) modeling should be considered to facilitate the interpretation of summary test performance measures by connecting testing and patient outcomes. We discuss a simple algorithm for helping systematic reviewers think through this possibility, and illustrate it by means of an example.
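The idea of linking test performance to outcomes can be sketched as a toy decision tree in Python. This is not the chapter's algorithm or example; it is a minimal illustration in which every parameter (prevalence, probability that a positive result leads to treatment, and the four outcome utilities) is a hypothetical input, and the function names are made up.

```python
def expected_utility_test_and_treat(prev, sens, spec, p_act, utilities):
    """Expected utility of a 'test, then treat positives' strategy.

    p_act is the (hypothetical) probability that a positive result actually
    leads to treatment, combining physician action and patient adherence.
    utilities maps (treated, diseased) -> utility of that outcome.
    """
    p_treat_diseased = sens * p_act        # true positives who get treated
    p_treat_healthy = (1 - spec) * p_act   # false positives who get treated
    diseased = (p_treat_diseased * utilities[(True, True)]
                + (1 - p_treat_diseased) * utilities[(False, True)])
    healthy = (p_treat_healthy * utilities[(True, False)]
               + (1 - p_treat_healthy) * utilities[(False, False)])
    return prev * diseased + (1 - prev) * healthy


def expected_utility_treat_none(prev, utilities):
    """Expected utility when nobody is tested or treated."""
    return prev * utilities[(False, True)] + (1 - prev) * utilities[(False, False)]


# Hypothetical utilities on a 0-1 scale, keyed by (treated, diseased).
UTILITIES = {
    (True, True): 0.90,    # diseased and treated
    (False, True): 0.60,   # diseased and untreated
    (True, False): 0.95,   # healthy but treated (treatment harm)
    (False, False): 1.00,  # healthy and untreated
}

accurate_but_ignored = expected_utility_test_and_treat(0.2, 0.95, 0.95, 0.0, UTILITIES)
accurate_and_acted_on = expected_utility_test_and_treat(0.2, 0.95, 0.95, 1.0, UTILITIES)
no_testing = expected_utility_treat_none(0.2, UTILITIES)
```

In this toy setup, a highly accurate test whose results are never acted on (`p_act = 0`) yields exactly the same expected utility as not testing at all, which is the sense in which test performance alone does not establish usefulness.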
Local Literature Bias in Genetic Epidemiology: An Empirical Evaluation of the Chinese Literature
BACKGROUND: Postulated epidemiological associations are subject to several biases. We evaluated whether the Chinese literature on human genome epidemiology may offer insights on the operation of selective reporting and language biases. METHODS AND FINDINGS: We targeted 13 gene-disease associations, each already assessed by meta-analyses, including at least 15 non-Chinese studies. We searched the Chinese Journal Full-Text Database for additional Chinese studies on the same topics. We identified 161 Chinese studies on 12 of these gene-disease associations; only 20 were PubMed-indexed (seven English full-text). Many studies (14–35 per topic) were available for six topics, covering diseases common in China. With one exception, the first Chinese study appeared with a time lag (2–21 y) after the first non-Chinese study on the topic. Chinese studies showed significantly more prominent genetic effects than non-Chinese studies, and 48% were statistically significant per se, despite their smaller sample size (median sample size 146 versus 268, p < 0.001). The largest genetic effects were often seen in PubMed-indexed Chinese studies (65% statistically significant per se). Non-Chinese studies of Asian-descent populations (27% significant per se) also tended to show somewhat more prominent genetic effects than studies of non-Asian descent (17% significant per se). CONCLUSION: Our data provide evidence for the interplay of selective reporting and language biases in human genome epidemiology. These biases may not be limited to the Chinese literature and point to the need for a global, transparent, comprehensive outlook in molecular population genetics and epidemiologic studies in general.
Accelerated diagnostic protocols using high-sensitivity troponin assays to rule in or out myocardial infarction: A systematic review
The authors are grateful to Gaelen Adam, MLIS for literature searching. Peer reviewed.