27 research outputs found
Costs and causes of oncology drug attrition with the example of insulin-like growth factor-1 receptor inhibitors
Importance: The development of oncology drugs is expensive and beset by a high attrition rate. Analysis of the costs and causes of translational failure may help to reduce attrition and permit the more appropriate use of resources to reduce mortality from cancer. Objective: To analyze the causes of failure and expenses incurred in clinical trials of novel oncology drugs, with the example of insulin-like growth factor-1 receptor (IGF-1R) inhibitors, none of which was approved for use in oncology practice. Design, Setting, and Participants: In this cross-sectional study, inhibitors of IGF-1R and their clinical trials for use in oncology practice between January 1, 2000, and July 31, 2021, were identified by searching PubMed and ClinicalTrials.gov. A proprietary commercial database was interrogated to provide the expenses incurred in these trials. Where data were not available, expenses were estimated using mean values from the proprietary database. A search revealed studies of the effects of IGF-1R inhibitors in preclinical in vivo assays, permitting calculation of the percentage of tumor growth inhibition (TGI). Archival data on the clinical trials of IGF-1R inhibitors and proprietary estimates of their expenses were examined, together with an analysis of preclinical data on IGF-1R inhibitors obtained from the published literature. Main Outcomes and Measures: Expenses associated with research and development of IGF-1R inhibitors. Results: Sixteen inhibitors of IGF-1R studied in 183 clinical trials were found. None of the trials, in a wide range of tumor types, showed efficacy permitting drug approval. More than 12,000 patients entered trials of IGF-1R inhibitors in oncology indications from 2003 to 2021. These trials incurred aggregate research and development expenses estimated at between $1.6 billion and $2.3 billion. Analysis of the results of preclinical in vivo assays of IGF-1R inhibitors that supported subsequent clinical investigations showed mixed activity and protocols that poorly reflected the treatment of advanced metastatic tumors in humans. Conclusions and Relevance: Failed drug development in oncology incurs substantial expense. At an industry level, an estimated $50 billion to $60 billion is spent annually on failed oncology trials. Improved target validation and more appropriate preclinical models are required to reduce attrition, with more attention to decision-making before launching clinical trials. A more appropriate use of resources may better reduce cancer mortality.
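The abstract refers to calculating the percentage of tumor growth inhibition from preclinical in vivo assays but does not give a formula. The sketch below shows one common treated-versus-control definition of %TGI; this is an assumption for illustration, since the paper's exact method is not stated in the abstract.

```python
# Hypothetical helper illustrating a standard %TGI definition; the paper's
# exact formula is not stated in this abstract.
def percent_tgi(treated_volume: float, control_volume: float) -> float:
    """%TGI = (1 - T/C) * 100, where T and C are mean final tumor volumes
    (or volume changes) in the treated and control groups."""
    return (1.0 - treated_volume / control_volume) * 100.0

# Example: treated tumors reach 350 mm^3 vs 1000 mm^3 in controls -> 65% TGI
print(percent_tgi(treated_volume=350.0, control_volume=1000.0))
```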
Global patient outcomes after elective surgery: prospective cohort study in 27 low-, middle- and high-income countries.
BACKGROUND: As global initiatives increase patient access to surgical treatments, there remains a need to understand the adverse effects of surgery and define appropriate levels of perioperative care. METHODS: We designed a prospective international 7-day cohort study of outcomes following elective adult inpatient surgery in 27 countries. The primary outcome was in-hospital complications. Secondary outcomes were death following a complication (failure to rescue) and death in hospital. Process measures were admission to critical care immediately after surgery or to treat a complication, and duration of hospital stay. A single definition of critical care was used for all countries. RESULTS: A total of 474 hospitals in 19 high-, 7 middle- and 1 low-income country were included in the primary analysis. Data included 44,814 patients with a median hospital stay of 4 (IQR 2-7) days. A total of 7,508 patients (16.8%) developed one or more postoperative complications and 207 died (0.5%). The overall mortality among patients who developed complications was 2.8%. Mortality following complications ranged from 2.4% for pulmonary embolism to 43.9% for cardiac arrest. A total of 4,360 (9.7%) patients were admitted to a critical care unit as routine immediately after surgery, of whom 2,198 (50.4%) developed a complication, with 105 (2.4%) deaths. A total of 1,233 patients (16.4% of those with complications) were admitted to a critical care unit to treat complications, with 119 (9.7%) deaths. Despite lower baseline risk, outcomes were similar in low- and middle-income countries compared with high-income countries. CONCLUSIONS: Poor patient outcomes are common after inpatient surgery. Global initiatives to increase access to surgical treatments should also address the need for safe perioperative care. STUDY REGISTRATION: ISRCTN5181700
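The headline rates follow directly from the counts reported above; a quick arithmetic check, using only numbers stated in the abstract:

```python
# Recomputing the reported rates from the counts in the abstract.
patients = 44814
complications = 7508   # patients with >= 1 postoperative complication
deaths = 207           # in-hospital deaths

print(f"complication rate: {complications / patients:.1%}")  # ~16.8%
print(f"crude mortality:   {deaths / patients:.1%}")         # ~0.5%
# "Failure to rescue": death among patients who developed a complication
print(f"failure to rescue: {deaths / complications:.1%}")    # ~2.8%
```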
When Quality Beats Quantity: Decision Theory, Drug Discovery, and the Reproducibility Crisis.
A striking contrast runs through the last 60 years of biopharmaceutical discovery, research, and development. Huge scientific and technological gains should have increased the quality of academic science and raised industrial R&D efficiency. However, academia faces a "reproducibility crisis"; inflation-adjusted industrial R&D costs per novel drug increased nearly 100-fold between 1950 and 2010; and drugs are more likely to fail in clinical development today than in the 1970s. The contrast is explicable only if powerful headwinds reversed the gains and/or if many "gains" have proved illusory. However, discussions of reproducibility and R&D productivity rarely address this point explicitly. The main objectives of the primary research in this paper are: (a) to provide quantitatively and historically plausible explanations of the contrast; and (b) to identify factors to which R&D efficiency is sensitive. We present a quantitative decision-theoretic model of the R&D process. The model represents therapeutic candidates (e.g., putative drug targets, molecules in a screening library, etc.) within a "measurement space", with candidates' positions determined by their performance on a variety of assays (e.g., binding affinity, toxicity, in vivo efficacy, etc.) whose results correlate to a greater or lesser degree. We apply decision rules to segment the space, and assess the probability of correct R&D decisions. We find that when searching for rare positives (e.g., candidates that will successfully complete clinical development), changes in the predictive validity of screening and disease models that many people working in drug discovery would regard as small and/or unknowable (e.g., a 0.1 absolute change in the correlation coefficient between model output and clinical outcomes in man) can offset large (e.g., 10-fold, even 100-fold) changes in models' brute-force efficiency. We also show how validity and reproducibility correlate across a population of simulated screening and disease models. We hypothesize that screening and disease models with high predictive validity are more likely to yield good answers and good treatments, so tend to render themselves and their diseases academically and commercially redundant. Perhaps there has also been too much enthusiasm for reductionist molecular models, which have insufficient predictive validity. Thus we hypothesize that the average predictive validity of the stock of academically and industrially "interesting" screening and disease models has declined over time, with even small falls able to offset large gains in scientific knowledge and brute-force efficiency. The rate of creation of valid screening and disease models may be the major constraint on R&D productivity.
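To make the central claim concrete, here is a minimal sketch (not the authors' model code) of the positive predictive value of a bivariate-normal classifier. It compares a 0.1 absolute gain in predictive validity (the correlation rho) against a 10-fold tightening of the decision threshold, for rare positives. The parameter values are illustrative assumptions.

```python
# Minimal sketch of the paper's core quantity: PPV of a classifier whose
# decision variable Y and reference variable R are standard bivariate normal
# with correlation rho. Parameter values below are illustrative only.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def ppv(rho, p_decide, p_positive):
    """rho: predictive validity; p_decide = P(Y >= y_t); p_positive = P(R >= r_t)."""
    y_t, r_t = norm.isf(p_decide), norm.isf(p_positive)
    s = np.sqrt(1.0 - rho**2)
    # P(Y >= y_t, R >= r_t) via the conditional law Y | R=r ~ N(rho*r, 1-rho^2),
    # which stays numerically accurate in the far tails.
    tp = quad(lambda r: norm.pdf(r) * norm.sf((y_t - rho * r) / s),
              r_t, np.inf, epsabs=1e-15)[0]
    return tp / p_decide  # PPV = TP / (TP + FP)

base = ppv(rho=0.5, p_decide=1e-5, p_positive=1e-5)
validity_gain = ppv(rho=0.6, p_decide=1e-5, p_positive=1e-5)    # rho + 0.1
throughput_gain = ppv(rho=0.5, p_decide=1e-6, p_positive=1e-5)  # 10x stricter cut
print(f"baseline PPV:   {base:.4f}")
print(f"+0.1 rho:       {validity_gain:.4f}")
print(f"10x throughput: {throughput_gain:.4f}")
```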
Evidence for the effectiveness of minimum pricing of alcohol: a systematic review and assessment using the Bradford Hill criteria for causality.
Objectives: To assess the evidence for price-based alcohol policy interventions to determine whether minimum unit pricing (MUP) is likely to be effective.
Design: Systematic review and assessment of studies according to Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, against the Bradford Hill criteria for causality. Three electronic databases were searched from inception to February 2017. Additional articles were found through hand searching and grey literature searches.
Criteria for selecting studies: We included any study design that reported on the effect of price-based interventions on alcohol consumption or alcohol-related morbidity, mortality and wider harms. Studies reporting on the effects of taxation or affordability, and studies that only investigated price elasticity of demand, were beyond the scope of this review. Studies with any conflict of interest were excluded. All studies were appraised for methodological quality.
Results: Of 517 studies assessed, 33 studies were included: 26 peer-reviewed research studies and seven from the grey literature. All nine of the Bradford Hill criteria were met, although different types of study satisfied different criteria. For example, modelling studies complied with the consistency and specificity criteria, time series analyses demonstrated the temporality and experiment criteria, and the analogy criterion was fulfilled by comparing the findings with the wider literature on taxation and affordability.
Conclusions: Overall, the Bradford Hill criteria for causality were satisfied. There was very little evidence that minimum alcohol prices are not associated with consumption or subsequent harms. However, the overall quality of the evidence was variable, a large proportion of the evidence base has been produced by a small number of research teams, and the quantitative uncertainty in many estimates or forecasts is often poorly communicated outside the academic literature. Nonetheless, price-based alcohol policy interventions such as MUP are likely to reduce alcohol consumption, alcohol-related morbidity and mortality.
Quantitative classifier model.
Bivariate normal probability density function determined by the correlation, ρ_{Y,R}, between decision variable, Y, and reference variable, R. Lighter colours indicate high probability density (candidate molecules more likely to lie here), and darker colours indicate low probability density (molecules less likely to lie here). The units on the horizontal and vertical axes are one standard deviation. We apply a decision threshold, y_t (vertical dotted line), to the decision variable, and then apply a reference test and a reference threshold, r_t (horizontal dotted line), to molecules that exceed the decision threshold y_t. In the sensitivity analyses (see later), the decision and reference thresholds are varied, as is ρ_{Y,R}. True positives (TP) and false positives (FP) correspond to the probability mass in the upper right and lower right quadrants, respectively. (A) When ρ_{Y,R} is high, PPV is high. (B) When ρ_{Y,R} is low, PPV tends to be low.
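A short sketch of the quantities in this figure: the TP and FP probability masses on either side of the reference threshold among candidates passing the decision threshold, and the resulting PPV. The thresholds and the two rho values (mimicking panels A and B) are illustrative choices.

```python
# Quadrant probability masses of a standard bivariate normal split by a
# decision threshold y_t and a reference threshold r_t (illustrative values).
from scipy.stats import norm, multivariate_normal

def quadrants(rho, y_t=1.0, r_t=1.0):
    bvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
    # P(Y >= y_t, R >= r_t) by inclusion-exclusion on the joint CDF
    tp = 1 - norm.cdf(y_t) - norm.cdf(r_t) + bvn.cdf([y_t, r_t])  # upper right
    fp = norm.sf(y_t) - tp                                        # lower right
    return tp, fp, tp / (tp + fp)  # PPV among candidates passing y_t

for rho in (0.9, 0.2):  # panel (A) high correlation vs panel (B) low correlation
    tp, fp, ppv = quadrants(rho)
    print(f"rho={rho}: TP={tp:.3f} FP={fp:.3f} PPV={ppv:.2f}")
```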
Predictive validity and classifier performance.
(A) The bivariate normal probability density function for decision variable Y (horizontal axis) and reference variable R (vertical axis). The correlation between Y and R is high (ρ_{Y,R} = 0.95), so the decision variable has high PV. The graph shows only the positive quadrant of the distribution. The reference threshold, expressed here in units of standard deviation, is r_t = 0.5 (dotted line), so positives are common, accounting for P(R ≥ r_t) ≈ 30% of the probability mass. (B) shows TPR (solid line) and FPR (dotted line) as the decision threshold, y_t, varies. At some thresholds, the spread between the TPR and FPR is wide. (C) shows PPV vs. decision threshold, y_t. (D) to (F) repeat the analyses with a decision variable with lower PV (ρ_{Y,R} = 0.4). PPV declines vs. panel (C) but remains high because positives are common. (G) to (I) repeat the analysis at ρ_{Y,R} = 0.95 but with a high reference threshold (2.5 standard deviation units) and rare positives (P(R ≥ r_t) ≈ 0.6% of the probability mass). It is possible to achieve a high PPV, but only at a high decision threshold where the TPR is low, which would require screening a large number of items per positive detected. (J) to (L) show the situation with the same high reference threshold (i.e., rare positives) but with a decision variable with low PV. In this case, PPV is low, even with a very high decision threshold and a very low TPR.
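The following sketch reproduces the panel (B)-(C) style calculation for the high-PV, common-positive case (ρ_{Y,R} = 0.95, r_t = 0.5); the threshold sweep values are illustrative.

```python
# TPR, FPR, and PPV as the decision threshold y_t sweeps, for a given
# predictive validity rho and reference threshold r_t.
import numpy as np
from scipy.stats import norm, multivariate_normal

def rates(rho, r_t, y_t):
    bvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
    tp = 1 - norm.cdf(y_t) - norm.cdf(r_t) + bvn.cdf([y_t, r_t])
    fp = norm.sf(y_t) - tp
    tpr = tp / norm.sf(r_t)   # sensitivity
    fpr = fp / norm.cdf(r_t)  # 1 - specificity
    return tpr, fpr, tp / (tp + fp)

# High PV, common positives, as in panels (A)-(C)
for y_t in np.linspace(-1.0, 3.0, 5):
    tpr, fpr, ppv = rates(rho=0.95, r_t=0.5, y_t=y_t)
    print(f"y_t={y_t:+.1f}: TPR={tpr:.2f} FPR={fpr:.2f} PPV={ppv:.2f}")
```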
Decision performance as y_t (throughput) and ρ_{Y,R} (predictive validity) vary.
Shading shows the PPV of the classifier (log10 units, with lighter shades showing better performance). The vertical axis represents both decision threshold and screening throughput. The scale is in log10 units: 7 represents a throughput of 10^7 and a decision threshold that accepts only the top 10^-7 fraction of candidates (P(Y ≥ y_t) = 10^-7, Eq 6); 6 represents a throughput of 10^6 and a decision threshold that accepts only the top 10^-6 fraction of candidates (P(Y ≥ y_t) = 10^-6, Eq 6); etc. The horizontal axis represents PV as the correlation coefficient, ρ_{Y,R}, between Y and R, with the right-hand end of each axis representing high PV (ρ_{Y,R} = 0.98) and the left-hand end representing low PV (ρ_{Y,R} = 0). Our choice of scale for each axis is discussed in the main text. In (A), positives are relatively common: P(R ≥ r_t) = 0.01, or one percent of the candidates entering the classifier. In (B), positives are relatively rare: P(R ≥ r_t) = 10^-5, or one hundred-thousandth of the candidates entering the classifier. The spacing and orientation of the contours show the degree to which PPV changes with throughput and with ρ_{Y,R}. PPV is relatively sensitive to throughput when ρ_{Y,R} is high and positives are very rare (lower right-hand side of panel B). However, PPV is relatively insensitive to throughput when ρ_{Y,R} is low (left-hand side of both panels). For much of the parameter space illustrated, an absolute 0.1 change in ρ_{Y,R} (e.g., from 0.4 to 0.5, or 0.5 to 0.6 on the horizontal axis) has a larger effect on PPV than a 10-fold change in throughput (e.g., from 4 log10 units to 5 log10 units on the vertical axis).
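A sketch of the grid underlying this figure for the rare-positive case of panel (B). It prints PPV across a few illustrative rho values and threshold exponents, so the row-versus-column comparison described in the caption can be read off directly; the specific rho and exponent values are assumptions for illustration.

```python
# PPV over a small (throughput exponent) x (predictive validity) grid, with
# the rare-positive base rate of panel (B): P(R >= r_t) = 1e-5.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def ppv(rho, p_decide, p_positive=1e-5):
    y_t, r_t = norm.isf(p_decide), norm.isf(p_positive)
    s = np.sqrt(1.0 - rho**2)
    # Far-tail joint survival via the conditional law Y | R=r ~ N(rho*r, 1-rho^2)
    tp = quad(lambda r: norm.pdf(r) * norm.sf((y_t - rho * r) / s),
              r_t, np.inf, epsabs=1e-15)[0]
    return tp / p_decide

print("throughput  rho=0.4   rho=0.5   rho=0.6")
for n in (4, 5, 6, 7):  # accept the top 10^-n fraction of candidates
    row = [ppv(rho, 10.0**-n) for rho in (0.4, 0.5, 0.6)]
    print(f"10^{n}:      " + "  ".join(f"{v:.2e}" for v in row))
```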
Decision theoretic view of biopharma discovery, research, and development.
(A) The process starts with a large set of therapeutic possibilities (light blue oval). These could be putative disease mechanisms or candidate drug targets, in either an academic or commercial setting. However, we discuss them as if they are molecules in a commercial R&D campaign (e.g., compounds in a screening library and the analogues that could reasonably be synthesized to create leads). There are A candidates that, with perfect R&D decision making and an unlimited R&D budget, would eventually be approved by the drug regulator for the indication or indications. There are U candidates that would not succeed given similar skill and investment. In general, U >> A. The Discovery (D), Preclinical (P), and Clinical Trial (C) diamonds are "classifiers" (Table 1). Each takes decision variables (X, Y, Z) from predictive models for some or all of the candidates and tests the variables against a decision threshold, yielding yeses, which receive further scrutiny, or noes, which are abandoned. The unit cost per surviving candidate increases through the process [21]. Given serial decisions, only yeses from C face the gold-standard reference test: the drug regulator (e.g., the Food and Drug Administration, or FDA). The other decisions face "imperfect" reference tests [33][34][27], the next steps in the process, which are mere proxies for the gold standard. The imperfect reference test for yeses from D is provided by P. The imperfect reference test for yeses from P is provided by C. (B) Decision variables X, Y, and Z will correlate to a greater or lesser extent with each other and with the gold-standard reference variable R. The correlation coefficient between X and Y is ρ_{X,Y}, the correlation coefficient between Y and Z is ρ_{Y,Z}, etc. Most of these correlations will never be measured directly during the R&D process. If ρ_{X,R} is very low, the Discovery stage will not enrich the Preclinical stage for approvable candidates, even if ρ_{X,Y} is high and decisions from D initially appear to have been successful.
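A Monte Carlo sketch of the caption's final point: if ρ_{X,R} is very low, the Discovery step barely enriches for approvable candidates even when ρ_{X,Y} is high. The correlation matrix below is an illustrative assumption, not an estimate from the paper.

```python
# Simulated pipeline: decision variables X (Discovery), Y (Preclinical),
# Z (Clinical) and gold-standard R drawn jointly normal. Illustrative
# correlations: X agrees strongly with Y (0.8) but barely tracks R (0.1).
import numpy as np

rng = np.random.default_rng(0)
corr = np.array([[1.0, 0.8, 0.1, 0.1],   # X
                 [0.8, 1.0, 0.3, 0.3],   # Y
                 [0.1, 0.3, 1.0, 0.6],   # Z
                 [0.1, 0.3, 0.6, 1.0]])  # R
X, Y, Z, R = rng.multivariate_normal(np.zeros(4), corr, size=1_000_000).T

def top(v, frac=0.1):
    """Boolean mask selecting the top `frac` of the starting population."""
    return v >= np.quantile(v, 1 - frac)

approvable = top(R)  # the A candidates: R in its top decile
surviving = np.ones_like(approvable)
print(f"base rate of approvable candidates: {approvable.mean():.3f}")
for stage, passes in (("D", top(X)), ("P", top(Y)), ("C", top(Z))):
    surviving = surviving & passes
    # With rho_XR = 0.1, the jump after D is small despite rho_XY = 0.8
    print(f"after {stage}: approvable fraction among survivors = "
          f"{approvable[surviving].mean():.3f}")
```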
Link between validity and reproducibility across a set of screening and disease models.
The figure shows the results of a Monte Carlo simulation (see S1 File for code). (A) Each small point represents one simulated screening or disease model (PM). When testing therapeutic candidates, each PM yields an expected signal which is the sum of two components. The first component is the signal from the reference test multiplied by a gain parameter (horizontal axis). The second component is a model-specific signal, whose gain is shown on the vertical axis. This component can also be thought of as systematic model-specific bias. It is real, but it tells us nothing about the reference test. (B) Each model's PV is determined by the relative strength of the reference component versus the model-specific component of the signal. PV is high when the reference component is much larger than the model-specific component, because the output of the PM then correlates with the reference test. (C) Each PM's signal-to-noise ratio increases with the sum of the reference component and the model-specific component. (D) Each point represents the performance of one of the models in panel (A) in two simulated experiments that include sampling and measurement noise. The horizontal axis shows the result of the first experiment: sample predictive validity (the correlation coefficient between the output of the model and the output of the reference test for a sample of therapeutic candidates). The vertical axis shows the result of the second experiment: test-retest reliability using the same sample of therapeutic candidates (calculated as the correlation coefficient between the results of the test and retest). The symbols (star, diamond, triangle, and cross) show how the space in (A) maps onto the space in (D). The line in (D) shows the best fit for the linear regression between sample PV and test-retest reliability. For the simulation shown, we sampled 400 therapeutic candidates for each PM. Both the reference and model-specific components of each PM's signal were drawn from a normally distributed random variable whose mean was zero and whose standard deviation was equal to the respective gain on the horizontal or vertical axis of (A) to (C).
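A compact re-implementation in the spirit of this simulation; the authors' S1 File code is not reproduced here, and the gains, noise level, and sample size below are illustrative assumptions. It shows the key asymmetry: a high reference gain yields both high sample PV and high reliability, while a high model-specific gain yields a model that is reproducible yet invalid.

```python
# One simulated predictive model (PM): expected output = gain_ref * reference
# signal + gain_model * model-specific bias; two noisy experiments give
# sample PV (corr with reference) and test-retest reliability.
import numpy as np

rng = np.random.default_rng(1)
n_candidates, noise_sd = 400, 1.0

def one_model(gain_ref, gain_model):
    r = rng.standard_normal(n_candidates)  # reference-test signal
    m = rng.standard_normal(n_candidates)  # model-specific bias signal
    expected = gain_ref * r + gain_model * m
    test = expected + noise_sd * rng.standard_normal(n_candidates)
    retest = expected + noise_sd * rng.standard_normal(n_candidates)
    sample_pv = np.corrcoef(test, r)[0, 1]         # panel (D), horizontal axis
    reliability = np.corrcoef(test, retest)[0, 1]  # panel (D), vertical axis
    return sample_pv, reliability

for gain_ref, gain_model in [(2.0, 0.2), (0.2, 2.0), (1.0, 1.0), (0.2, 0.2)]:
    pv, rel = one_model(gain_ref, gain_model)
    print(f"gain_ref={gain_ref}, gain_model={gain_model}: "
          f"PV={pv:+.2f}, reliability={rel:.2f}")
```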
Effect of multiple classification steps.
(A) Each point represents decision performance with one, two, three, or four similar classifiers applied in series. Each line represents the same value of the correlation coefficient, ρ, applied to all pairwise relationships between decision variables, and between decision variables and R. Thus, within each line, all decision variables are equally correlated with each other and with R. The correlation coefficients between the decision variables (X, Y, W, Z) and R vary from 0.9 (high PV, top right line) to 0.3 (low PV, bottom left line). The top left point on each line shows a single classifier applied to X, with each additional point towards the bottom and right of each line showing the effect of adding a further classifier, up to a maximum of 4 classifiers. The top decile of candidates in the starting set exceeds each decision threshold and the reference threshold (i.e., P(X ≥ x_t) = P(Y ≥ y_t) = P(W ≥ w_t) = P(Z ≥ z_t) = P(R ≥ r_t) = 0.1). In general, adding more steps increases PPV but at the cost of a lower TPR. There are diminishing returns from each additional classifier, particularly when the decision variables are highly correlated with one another. Furthermore, a single classifier that is highly correlated with R (e.g., the uppermost points on the lines with high correlation coefficients) often outperforms a combination of several classifiers with lower correlations with R, in terms of both PPV and TPR. Note the logarithmic vertical axis. (B) is exactly as (A) but shows on the vertical axis the number of candidates screened per TP (Table 1). The number of candidates that must be screened per true positive identified increases as ρ (PV) declines, because positives are wrongly rejected. Increasing ρ (PV) increases search efficiency. Note the logarithmic vertical axis.
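A Monte Carlo sketch of this serial-classifier setup, with equicorrelated decision variables and population top-decile thresholds as described in the caption. It prints PPV, TPR, and candidates screened per true positive as classifiers are added; the two rho values match the caption's extremes, and the sample size is an illustrative choice.

```python
# Serial classifiers: decision variables X, Y, W, Z and reference R are
# equicorrelated at rho; each stage keeps the population top decile.
import numpy as np

rng = np.random.default_rng(2)
n, n_vars, frac = 1_000_000, 5, 0.1  # 4 decision variables + R

for rho in (0.9, 0.3):  # high vs low predictive validity
    cov = np.full((n_vars, n_vars), rho) + (1 - rho) * np.eye(n_vars)
    draws = rng.multivariate_normal(np.zeros(n_vars), cov, size=n)
    passes = draws >= np.quantile(draws, 1 - frac, axis=0)  # per-variable top decile
    positive = passes[:, -1]          # R exceeds the reference threshold
    surviving = np.ones(n, dtype=bool)
    for k in range(n_vars - 1):       # add one classifier at a time
        surviving &= passes[:, k]
        tp = (surviving & positive).sum()
        print(f"rho={rho} stages={k + 1}: PPV={tp / surviving.sum():.2f} "
              f"TPR={tp / positive.sum():.3f} screened/TP={n / tp:.0f}")
    print()
```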