
    Identifying genes that contribute most to good classification in microarrays

    BACKGROUND: The goal of most microarray studies is either the identification of genes that are most differentially expressed or the creation of a good classification rule. The disadvantage of the former is that it ignores the importance of gene interactions; the disadvantage of the latter is that it often does not provide a sufficient focus for further investigation because many genes may be included by chance. Our strategy is to search for classification rules that perform well with few genes and, if they are found, identify genes that occur relatively frequently under multiple random validation (random splits into training and test samples). RESULTS: We analyzed data from four published studies related to cancer. For classification we used a filter with a nearest centroid rule that is easy to implement and has been previously shown to perform well. To comprehensively measure classification performance we used receiver operating characteristic curves. In the three data sets with good classification performance, the classification rules for 5 genes were only slightly worse than for 20 or 50 genes and somewhat better than for 1 gene. In two of these data sets, one or two genes had relatively high frequencies not noticeable with rules involving 20 or 50 genes: desmin for classifying colon cancer versus normal tissue; and zyxin and secretory granule proteoglycan genes for classifying two types of leukemia. CONCLUSION: Using multiple random validation, investigators should look for classification rules that perform well with few genes and select, for further study, genes with relatively high frequencies of occurrence in these classification rules.
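
    Below is a minimal sketch of the multiple random validation strategy described in the abstract, assuming an expression matrix X (samples by genes) and binary labels y. The t-like filter, the distance-based centroid score, and the split settings are illustrative choices, not the authors' exact implementation.

```python
# Minimal sketch: multiple random validation with a filtered nearest-centroid rule.
import numpy as np
from collections import Counter
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

def top_genes_by_t_stat(X, y, k):
    """Filter: rank genes by a simple two-sample t-like statistic (illustrative)."""
    m1, m0 = X[y == 1].mean(0), X[y == 0].mean(0)
    s = X[y == 1].std(0) + X[y == 0].std(0) + 1e-9
    return np.argsort(-np.abs((m1 - m0) / s))[:k]

def nearest_centroid_scores(X_tr, y_tr, X_te, genes):
    """Score = distance to class-0 centroid minus distance to class-1 centroid."""
    c1 = X_tr[y_tr == 1][:, genes].mean(0)
    c0 = X_tr[y_tr == 0][:, genes].mean(0)
    d1 = np.linalg.norm(X_te[:, genes] - c1, axis=1)
    d0 = np.linalg.norm(X_te[:, genes] - c0, axis=1)
    return d0 - d1  # larger => closer to class 1

def random_validation(X, y, n_genes=5, n_splits=200, seed=0):
    """Repeated random train/test splits; track ROC AUC and gene selection counts."""
    rng = np.random.RandomState(seed)
    gene_counts, aucs = Counter(), []
    for _ in range(n_splits):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, stratify=y, random_state=rng.randint(1 << 30))
        genes = top_genes_by_t_stat(X_tr, y_tr, n_genes)  # filter fit on training data only
        gene_counts.update(genes.tolist())
        aucs.append(roc_auc_score(y_te, nearest_centroid_scores(X_tr, y_tr, X_te, genes)))
    return np.mean(aucs), gene_counts
```

    Genes that appear with relatively high frequency in gene_counts across the random splits are the candidates the abstract suggests selecting for further study.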

    Improved Moving Puncture Gauge Conditions for Compact Binary Evolutions

    Robust gauge conditions are critically important to the stability and accuracy of numerical relativity (NR) simulations involving compact objects. Most of the NR community uses the highly robust, though decade-old, moving-puncture (MP) gauge conditions for such simulations. It has been argued that in binary black hole (BBH) evolutions adopting this gauge, noise generated near adaptive-mesh-refinement (AMR) boundaries does not converge away cleanly with increasing resolution, severely limiting gravitational waveform accuracy at computationally feasible resolutions. We link this noise to a sharp (short-wavelength), initial outgoing gauge wave crossing into progressively lower resolution AMR grids, and present improvements to the standard MP gauge conditions that focus on stretching, smoothing, and more rapidly settling this outgoing wave. Our best gauge choice greatly reduces gravitational waveform noise during inspiral, yielding less fluctuation in convergence order and ~40% lower waveform phase and amplitude errors at typical resolutions. Noise in other physical quantities of interest is also reduced, and constraint violations drop by more than an order of magnitude. We expect these improvements will carry over to simulations of all types of compact binary systems, as well as other N+1 formulations of gravity for which MP-like gauge conditions can be chosen. Comment: 25 pages, 16 figures, 2 tables. Matches published version.
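
    For context, the standard moving-puncture gauge that these improvements modify is usually the 1+log slicing condition together with a Gamma-driver shift. One common convention (the advection terms and the damping parameter eta differ between codes, and this is background only, not the paper's modification) is:

```latex
% Standard moving-puncture gauge: 1+log slicing plus Gamma-driver shift
% (one common convention; not the paper's improved conditions).
\begin{align}
  \partial_t \alpha  &= \beta^i \partial_i \alpha - 2\alpha K, \\
  \partial_t \beta^i &= \beta^j \partial_j \beta^i + \tfrac{3}{4} B^i, \\
  \partial_t B^i     &= \beta^j \partial_j B^i
                        + \partial_t \tilde{\Gamma}^i
                        - \beta^j \partial_j \tilde{\Gamma}^i
                        - \eta B^i .
\end{align}
```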

    Estimating the cumulative risk of false positive cancer screenings

    BACKGROUND: When evaluating cancer screening it is important to estimate the cumulative risk of false positives from periodic screening. Because the data typically come from studies in which the number of screenings varies by subject, estimation must take into account dropouts. A previous approach to estimate the probability of at least one false positive in n screenings unrealistically assumed that the probability of dropout does not depend on prior false positives. METHOD: By redefining the random variables, we obviate the unrealistic dropout assumption. We also propose a relatively simple logistic regression and extend estimation to the expected number of false positives in n screenings. RESULTS: We illustrate our methodology using data from women ages 40 to 64 who received up to four annual breast cancer screenings in the Health Insurance Program of Greater New York study, which began in 1963. Covariates were age, time since previous screening, screening number, and whether or not a previous false positive occurred. Defining a false positive as an unnecessary biopsy, the only statistically significant covariate was whether or not a previous false positive occurred. Because the effect of screening number was not statistically significant, extrapolation beyond 4 screenings was reasonable. The estimated mean number of unnecessary biopsies in 10 years per woman screened is 0.11 with 95% confidence interval (0.10, 0.12). Defining a false positive as an unnecessary work-up, all the covariates were statistically significant and the estimated mean number of unnecessary work-ups in 4 years per woman screened is 0.34 with 95% confidence interval (0.32, 0.36). CONCLUSION: Using data from multiple cancer screenings with dropouts, and allowing dropout to depend on previous history of false positives, we propose a logistic regression model to estimate both the probability of at least one false positive and the expected number of false positives associated with n cancer screenings. The methodology can be used both for informed decision making at the individual level and for planning of health services.
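
    A minimal sketch of the estimation idea, assuming one row per subject per attended screening with a false-positive indicator fp and the covariates named above. The column names and the representative-subject cumulative calculation are illustrative simplifications, not the paper's redefined random variables.

```python
# Sketch: per-screening logistic model for false positives, then the cumulative
# quantities described above.  Column names ("age", "time_since_prev",
# "screen_number", "prior_fp", "fp") are illustrative, not the paper's names.
import pandas as pd
import statsmodels.api as sm

COVS = ["age", "time_since_prev", "screen_number", "prior_fp"]

def fit_fp_model(df):
    """df: one row per subject per attended screening, fp in {0, 1}."""
    return sm.Logit(df["fp"], sm.add_constant(df[COVS])).fit(disp=0)

def per_round_p(model, cov, prior_fp):
    """Predicted false-positive probability for one screening round."""
    X = pd.DataFrame([{**cov, "prior_fp": prior_fp}])[COVS]
    return float(model.predict(sm.add_constant(X, has_constant="add"))[0])

def cumulative_fp(model, rounds):
    """rounds: list of covariate dicts (without prior_fp), one per planned
    screening, for a representative subject.  Returns
    (P(at least one false positive in n rounds), E[# false positives])."""
    p_none, expected = 1.0, 0.0             # P(no FP yet), running expectation
    for cov in rounds:
        p0 = per_round_p(model, cov, 0)     # FP prob given no prior FP
        p1 = per_round_p(model, cov, 1)     # FP prob given a prior FP
        expected += p_none * p0 + (1.0 - p_none) * p1
        p_none *= 1.0 - p0                  # stay in the "no FP yet" branch
    return 1.0 - p_none, expected
```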

    The fallacy of enrolling only high-risk subjects in cancer prevention trials: Is there a "free lunch"?

    BACKGROUND: There is a common belief that most cancer prevention trials should be restricted to high-risk subjects in order to increase statistical power. This strategy is appropriate if the ultimate target population is subjects at the same high risk. However, if the target population is the general population, three assumptions may underlie the decision to enroll high-risk subjects instead of average-risk subjects from the general population: higher statistical power for the same sample size, lower costs for the same power and type I error, and a correct ratio of benefits to harms. We critically investigate the plausibility of these assumptions. METHODS: We considered each assumption in the context of a simple example. We investigated statistical power for fixed sample size when the investigators assume that relative risk is invariant over risk groups, but when, in reality, risk difference is invariant over risk groups. We investigated possible costs when a trial of high-risk subjects has the same power and type I error as a larger trial of average-risk subjects from the general population. We investigated the ratios of benefit to harms when extrapolating from high-risk to average-risk subjects. RESULTS: Appearances here are misleading. First, the increase in statistical power with a trial of high-risk subjects rather than the same number of average-risk subjects from the general population assumes that the relative risk is the same for high-risk and average-risk subjects. However, if the absolute risk difference rather than the relative risk is the same, the power can be lower with the high-risk subjects. In the analysis of data from a cancer prevention trial, we found that invariance of the absolute risk difference over risk groups was nearly as plausible as invariance of the relative risk over risk groups. Therefore a priori assumptions of constant relative risk across risk groups are not robust, limiting extrapolation of estimates of benefit to the general population. Second, a trial of high-risk subjects may cost more than a larger trial of average-risk subjects with the same power and type I error because of additional recruitment and diagnostic testing to identify high-risk subjects. Third, the ratio of benefits to harms may be more favorable in high-risk persons than in average-risk persons in the general population, which means that extrapolating this ratio to the general population would be misleading. Thus there is no free lunch when using a trial of high-risk subjects to extrapolate results to the general population. CONCLUSION: Unless the intervention is targeted to only high-risk subjects, cancer prevention trials should be implemented in the general population.
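
    The power argument can be made concrete with a normal-approximation calculation. In the sketch below the baseline risks, relative risk, risk difference, and sample size are made-up illustrative numbers, chosen so that the two invariance assumptions agree for average-risk subjects and diverge for high-risk subjects.

```python
# Sketch of the power comparison: a two-arm prevention trial analysed with a
# normal-approximation two-proportion test.  All numbers are illustrative.
from scipy.stats import norm

def power_two_proportions(p_control, p_treat, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-sample test of proportions."""
    se = ((p_control * (1 - p_control) + p_treat * (1 - p_treat)) / n_per_arm) ** 0.5
    z = abs(p_control - p_treat) / se
    return norm.cdf(z - norm.ppf(1 - alpha / 2))

n = 5000                      # subjects per arm (illustrative)
p_avg, p_high = 0.02, 0.10    # baseline risks: average-risk vs high-risk subjects
rr, rd = 0.75, -0.005         # invariant relative risk vs invariant risk difference

# If the RELATIVE RISK is invariant, the high-risk trial has more power:
print(power_two_proportions(p_avg,  p_avg * rr,  n))   # average-risk trial
print(power_two_proportions(p_high, p_high * rr, n))   # high-risk trial

# If the RISK DIFFERENCE is invariant, the high-risk trial can have LESS power,
# because the same absolute difference sits on a larger binomial variance:
print(power_two_proportions(p_avg,  p_avg + rd,  n))   # average-risk trial
print(power_two_proportions(p_high, p_high + rd, n))   # high-risk trial
```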

    Selecting patients for randomized trials: a systematic approach based on risk group

    BACKGROUND: A key aspect of randomized trial design is the choice of risk group. Some trials include patients from the entire at-risk population; others accrue only patients deemed to be at increased risk. We present a simple statistical approach for choosing between these approaches. The method is easily adapted to determine which of several competing definitions of high risk is optimal. METHOD: We treat eligibility criteria for a trial, such as a smoking history, as a prediction rule associated with a certain sensitivity (the number of patients who have the event and who are classified as high risk divided by the total number of patients who have an event) and specificity (the number of patients who do not have an event and who do not meet criteria for high risk divided by the total number of patients who do not have an event). We then derive simple formulae to determine the proportion of patients receiving intervention, and the proportion who experience an event, where either all patients or only those at high risk are treated. We assume that the relative risk associated with intervention is the same over all choices of risk group. The proportions of events and interventions are combined using a net benefit approach, and net benefit is compared between strategies. RESULTS: We applied our method to design a trial of adjuvant therapy after prostatectomy. We were able to demonstrate that treating a high-risk group was superior to treating all patients, to choose the optimal definition of high risk, and to test the robustness of our results by sensitivity analysis. Our results had a ready clinical interpretation that could immediately aid trial design. CONCLUSION: The choice of risk group in randomized trials is usually based on rather informal methods. Our simple method demonstrates that this decision can be informed by simple statistical analyses.
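
    A minimal sketch of the comparison, assuming an event rate without intervention, the sensitivity and specificity of the high-risk definition, a constant relative risk under intervention, and a harm weight w per intervention. The exact net-benefit weighting used in the paper may differ.

```python
# Sketch of the risk-group comparison described above.  The event rate,
# sensitivity, specificity, relative risk, and harm weight w are illustrative
# assumptions, not values from the paper.
def strategy_summary(pi, se, sp, rr):
    """pi: event rate untreated; se/sp: accuracy of the high-risk definition;
    rr: relative risk of the event under intervention (assumed constant).
    Returns {strategy: (event rate, proportion treated)}."""
    treat_all  = (pi * rr, 1.0)
    treat_high = (pi * (se * rr + (1 - se)),            # events among treated + untreated cases
                  pi * se + (1 - pi) * (1 - sp))        # true positives + false positives treated
    treat_none = (pi, 0.0)
    return {"all": treat_all, "high-risk": treat_high, "none": treat_none}

def net_benefit(event_rate, treated, pi, w):
    """Events averted relative to treating no one, minus w per intervention."""
    return (pi - event_rate) - w * treated

summary = strategy_summary(pi=0.15, se=0.60, sp=0.85, rr=0.70)
for name, (ev, tr) in summary.items():
    print(name, round(net_benefit(ev, tr, pi=0.15, w=0.01), 4))
```

    Varying the sensitivity and specificity inputs over the competing definitions of high risk gives the kind of comparison, and the sensitivity analysis, that the abstract describes.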

    Use of in vivo phage display to engineer novel adenoviruses for targeted delivery to the cardiac vasculature

    We performed in vivo phage display in the stroke-prone spontaneously hypertensive rat, a cardiovascular disease model, and the normotensive Wistar Kyoto rat to identify cardiac-targeting peptides, and then assessed each in the context of viral gene delivery. We identified both common and strain-selective peptides, potentially indicating ubiquitous markers and those found selectively in dysfunctional microvasculature of the heart. We show the utility of the peptide DDTRHWG for targeted gene delivery in human cells and rats in vivo when cloned into the fiber protein of subgroup D adenovirus 19p. This study therefore identifies cardiac-targeting peptides by in vivo phage display and demonstrates the potential of a candidate peptide for vector targeting strategies.

    Comparing breast cancer mortality rates before-and-after a change in availability of screening in different regions: Extension of the paired availability design

    BACKGROUND: In recent years there has been increased interest in evaluating breast cancer screening using data from before-and-after studies in multiple geographic regions. One approach, not previously mentioned, is the paired availability design. The paired availability design was developed to evaluate the effect of medical interventions by comparing changes in outcomes before and after a change in the availability of an intervention in various locations. A simple potential outcomes model yields estimates of efficacy, the effect of receiving the intervention, as opposed to effectiveness, the effect of changing the availability of the intervention. By combining estimates of efficacy rather than effectiveness, the paired availability design avoids confounding due to different fractions of subjects receiving the interventions at different locations. The original formulation involved short-term outcomes; the challenge here is accommodating long-term outcomes. METHODS: The outcome is incident breast cancer deaths in a time period, that is, deaths from breast cancers diagnosed in that same time period. We considered the plausibility of the five basic assumptions of the paired availability design and proposed a novel analysis to accommodate likely violations of the assumption of stable screening effects. RESULTS: We applied the paired availability design to data on breast cancer screening from six counties in Sweden. The estimated yearly change in incident breast cancer deaths per 100,000 persons ages 40–69 (in most counties) due to receipt of screening (among the relevant type of subject in the potential outcomes model) was -9, with 95% confidence interval (-14, -4) or (-14, -5), depending on the sensitivity analysis. CONCLUSION: In a realistic application, the extended paired availability design yielded reasonably precise confidence intervals for the effect of receiving screening on the rate of incident breast cancer death. Although the assumption of stable preferences may be questionable, its impact will be small if there is little screening in the first time period. However, estimates may be substantially confounded by improvements in systemic therapy over time. Therefore the results should be interpreted with care.
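
    A minimal sketch of the paired availability estimator under the potential outcomes model: within each region the change in the death rate is compared to the change in the fraction receiving screening, and regions are pooled before taking the ratio. The size weights and the handling of unstable screening effects are simplified assumptions here, not the paper's extended analysis.

```python
# Sketch of a paired-availability-style estimate: change in the outcome rate
# divided by change in the fraction receiving screening, pooled over regions.
# The weighting scheme is an assumed simplification.
def paired_availability_estimate(regions):
    """regions: list of dicts with keys
         n_before, n_after           persons (or person-years) in each period
         deaths_before, deaths_after incident breast cancer deaths
         f_before, f_after           fraction receiving screening
    Returns the estimated effect of *receiving* screening on the death rate."""
    num, den = 0.0, 0.0
    for r in regions:
        w = r["n_before"] + r["n_after"]                 # simple size weight (assumed)
        d_outcome = (r["deaths_after"] / r["n_after"]
                     - r["deaths_before"] / r["n_before"])
        d_receipt = r["f_after"] - r["f_before"]
        num += w * d_outcome
        den += w * d_receipt
    return num / den   # change in death rate per unit change in receipt

# Illustrative (made-up) numbers for two regions:
example = [
    dict(n_before=200_000, n_after=210_000, deaths_before=90, deaths_after=80,
         f_before=0.05, f_after=0.70),
    dict(n_before=150_000, n_after=155_000, deaths_before=70, deaths_after=66,
         f_before=0.10, f_after=0.65),
]
print(paired_availability_estimate(example))
```

    Dividing the pooled change in outcomes by the pooled change in receipt keeps the estimate interpretable as the effect of receiving screening, which is the efficacy quantity the design targets.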