
    Identifying genes that contribute most to good classification in microarrays

    BACKGROUND: The goal of most microarray studies is either the identification of genes that are most differentially expressed or the creation of a good classification rule. The disadvantage of the former is that it ignores the importance of gene interactions; the disadvantage of the latter is that it often does not provide a sufficient focus for further investigation because many genes may be included by chance. Our strategy is to search for classification rules that perform well with few genes and, if they are found, identify genes that occur relatively frequently under multiple random validation (random splits into training and test samples). RESULTS: We analyzed data from four published studies related to cancer. For classification we used a filter with a nearest centroid rule that is easy to implement and has been previously shown to perform well. To comprehensively measure classification performance we used receiver operating characteristic curves. In the three data sets with good classification performance, the classification rules for 5 genes were only slightly worse than for 20 or 50 genes and somewhat better than for 1 gene. In two of these data sets, one or two genes had relatively high frequencies not noticeable with rules involving 20 or 50 genes: desmin for classifying colon cancer versus normal tissue; and zyxin and secretory granule proteoglycan genes for classifying two types of leukemia. CONCLUSION: Using multiple random validation, investigators should look for classification rules that perform well with few genes and select, for further study, genes with relatively high frequencies of occurrence in these classification rules.
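
    The strategy described above lends itself to a short sketch. The Python code below is a minimal illustration of multiple random validation with a filter plus nearest-centroid rule, assuming an expression matrix X (samples x genes) and 0/1 labels y; the number of splits, the number of retained genes, the t-statistic filter, and the split fraction are illustrative choices, not the authors' exact settings.

# Minimal sketch: multiple random validation with a t-statistic filter and a
# nearest-centroid rule. X is a (samples x genes) expression matrix, y holds 0/1 labels.
import numpy as np
from collections import Counter
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestCentroid
from sklearn.metrics import roc_auc_score

def top_genes(X, y, k):
    # Rank genes by an absolute two-sample t-like statistic computed on the training data.
    g0, g1 = X[y == 0], X[y == 1]
    t = (g1.mean(0) - g0.mean(0)) / np.sqrt(g1.var(0) / len(g1) + g0.var(0) / len(g0) + 1e-12)
    return np.argsort(-np.abs(t))[:k]

def random_validation(X, y, n_splits=200, k=5, test_size=0.3, seed=0):
    rng = np.random.RandomState(seed)
    counts, aucs = Counter(), []
    for _ in range(n_splits):
        Xtr, Xte, ytr, yte = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=rng.randint(1 << 30))
        genes = top_genes(Xtr, ytr, k)                  # filter chosen on training data only
        clf = NearestCentroid().fit(Xtr[:, genes], ytr)
        d0 = np.linalg.norm(Xte[:, genes] - clf.centroids_[0], axis=1)  # distance to class-0 centroid
        d1 = np.linalg.norm(Xte[:, genes] - clf.centroids_[1], axis=1)  # distance to class-1 centroid
        aucs.append(roc_auc_score(yte, d0 - d1))        # ROC performance on the held-out split
        counts.update(int(g) for g in genes)
    return counts, float(np.mean(aucs))                 # gene selection frequencies, mean AUC

    Genes returned with high counts across the random splits are the candidates the abstract suggests selecting for further study.
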

    Estimating the cumulative risk of false positive cancer screenings

    BACKGROUND: When evaluating cancer screening it is important to estimate the cumulative risk of false positives from periodic screening. Because the data typically come from studies in which the number of screenings varies by subject, estimation must take into account dropouts. A previous approach to estimate the probability of at least one false positive in n screenings unrealistically assumed that the probability of dropout does not depend on prior false positives. METHOD: By redefining the random variables, we obviate the unrealistic dropout assumption. We also propose a relatively simple logistic regression and extend estimation to the expected number of false positives in n screenings. RESULTS: We illustrate our methodology using data from women ages 40 to 64 who received up to four annual breast cancer screenings in the Health Insurance Program of Greater New York study, which began in 1963. Covariates were age, time since previous screening, screening number, and whether or not a previous false positive occurred. Defining a false positive as an unnecessary biopsy, the only statistically significant covariate was whether or not a previous false positive occurred. Because the effect of screening number was not statistically significant, extrapolation beyond 4 screenings was reasonable. The estimated mean number of unnecessary biopsies in 10 years per woman screened is 0.11 with a 95% confidence interval of (0.10, 0.12). Defining a false positive as an unnecessary work-up, all the covariates were statistically significant and the estimated mean number of unnecessary work-ups in 4 years per woman screened is 0.34 with a 95% confidence interval of (0.32, 0.36). CONCLUSION: Using data from multiple cancer screenings with dropouts, and allowing dropout to depend on previous history of false positives, we propose a logistic regression model to estimate both the probability of at least one false positive and the expected number of false positives associated with n cancer screenings. The methodology can be used both for informed decision making at the individual level and for planning of health services.
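
    As a rough illustration of the kind of per-screening logistic model described above, the sketch below fits a logistic regression that includes a previous-false-positive indicator and then accumulates both the probability of at least one false positive and the expected number of false positives over n screenings. Column names (fp, age, time_since_prev, screen_no, prev_fp) are hypothetical placeholders, and the recursion is a simplified reading of the approach, not the authors' exact estimator.

# Sketch of a per-screening logistic model with a previous-false-positive covariate.
# Column names (fp, age, time_since_prev, screen_no, prev_fp) are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

def fit_fp_model(df):
    # fp = 1 if this screening produced a false positive (e.g. an unnecessary biopsy).
    return smf.logit("fp ~ age + time_since_prev + screen_no + prev_fp", data=df).fit()

def cumulative_fp(model, covariates, n_screens):
    """Accumulate P(at least one false positive) and E[number of false positives]
    over n_screens screenings by propagating whether a previous FP has occurred."""
    p_any, expected = 0.0, 0.0
    for k in range(1, n_screens + 1):
        row = dict(covariates, screen_no=k)
        p_no_prev = model.predict(pd.DataFrame([dict(row, prev_fp=0)]))[0]
        p_prev = model.predict(pd.DataFrame([dict(row, prev_fp=1)]))[0]
        expected += (1 - p_any) * p_no_prev + p_any * p_prev   # marginal P(FP at screen k)
        p_any = 1 - (1 - p_any) * (1 - p_no_prev)              # update P(any FP so far)
    return p_any, expected

# Example use: cumulative_fp(model, {"age": 50, "time_since_prev": 1.0}, n_screens=10)
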

    Principles of Cancer Screening: Lessons From History and Study Design Issues

    Early detection of cancer has held great promise and intuitive appeal in the medical community for well over a century. Its history developed in tandem with that of the periodic health examination, in which any deviations, subtle or glaring, from a clearly demarcated “normal” were to be rooted out, given the underlying hypothesis that diseases develop along progressive linear paths of increasing abnormalities. This model of disease development drove the logical deduction that early detection, by “breaking the chain” of cancer development, must be of benefit to affected individuals. In the latter half of the 20th century, researchers and guidelines organizations began to explicitly challenge the core assumptions underpinning many clinical practices. A move away from intuitive thinking began with the development of evidence-based medicine. One key method developed to explicitly quantify the overall risk-benefit profile of a given procedure was the analytic framework. The shift away from pure deductive reasoning and reliance on personal observation was driven, in part, by a rising awareness of critical biases in cancer screening that can mislead clinicians, including healthy volunteer bias, length-biased sampling, lead-time bias, and overdiagnosis. A new focus on the net balance of both benefits and harms when determining the overall worth of an intervention also arose: it was recognized that the potential downsides of early detection were frequently overlooked or discounted because screening is performed on basically healthy persons and initially involves relatively noninvasive methods. Although still inconsistently applied to early detection programs, policies, and belief systems in the United States, an evidence-based approach is essential to counteract the misleading, even potentially harmful, allure of intuition and individual observation.

    The fallacy of enrolling only high-risk subjects in cancer prevention trials: Is there a "free lunch"?

    BACKGROUND: There is a common belief that most cancer prevention trials should be restricted to high-risk subjects in order to increase statistical power. This strategy is appropriate if the ultimate target population is subjects at the same high risk. However, if the target population is the general population, three assumptions may underlie the decision to enroll high-risk subjects instead of average-risk subjects from the general population: higher statistical power for the same sample size, lower costs for the same power and type I error, and a correct ratio of benefits to harms. We critically investigate the plausibility of these assumptions. METHODS: We considered each assumption in the context of a simple example. We investigated statistical power for fixed sample size when the investigators assume that relative risk is invariant over risk groups, but when, in reality, risk difference is invariant over risk groups. We investigated possible costs when a trial of high-risk subjects has the same power and type I error as a larger trial of average-risk subjects from the general population. We investigated the ratios of benefits to harms when extrapolating from high-risk to average-risk subjects. RESULTS: Appearances here are misleading. First, the increase in statistical power with a trial of high-risk subjects rather than the same number of average-risk subjects from the general population assumes that the relative risk is the same for high-risk and average-risk subjects. However, if the absolute risk difference rather than the relative risk were the same, the power can be lower with the high-risk subjects. In the analysis of data from a cancer prevention trial, we found that invariance of absolute risk difference over risk groups was nearly as plausible as invariance of relative risk over risk groups. Therefore a priori assumptions of constant relative risk across risk groups are not robust, limiting extrapolation of estimates of benefit to the general population. Second, a trial of high-risk subjects may cost more than a larger trial of average-risk subjects with the same power and type I error because of additional recruitment and diagnostic testing to identify high-risk subjects. Third, the ratio of benefits to harms may be more favorable in high-risk persons than in average-risk persons in the general population, which means that extrapolating this ratio to the general population would be misleading. Thus there is no free lunch when using a trial of high-risk subjects to extrapolate results to the general population. CONCLUSION: Unless the intervention is targeted to only high-risk subjects, cancer prevention trials should be implemented in the general population.
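
    The power argument in the RESULTS can be made concrete with a small numerical sketch. The code below compares the power of a two-arm trial in average-risk versus high-risk subjects under the two extrapolation assumptions named above; the baseline risks, effect sizes, and sample size are illustrative, and the normal-approximation power formula is a generic one, not the calculation used in the paper.

# Illustrative power comparison: high-risk vs average-risk subjects under two assumptions.
# Baseline risks, effect sizes, and sample size are made up for the example.
import numpy as np
from scipy.stats import norm

def power_two_proportions(p_control, p_treated, n_per_arm, alpha=0.05):
    # Normal-approximation power for a two-sided two-sample test of proportions.
    p_bar = (p_control + p_treated) / 2
    se0 = np.sqrt(2 * p_bar * (1 - p_bar) / n_per_arm)
    se1 = np.sqrt(p_control * (1 - p_control) / n_per_arm
                  + p_treated * (1 - p_treated) / n_per_arm)
    delta = abs(p_control - p_treated)
    return norm.cdf((delta - norm.ppf(1 - alpha / 2) * se0) / se1)

n = 2000
p_avg, p_high = 0.02, 0.10   # baseline risk in average-risk vs high-risk subjects
rr, rd = 0.5, -0.01          # the same intervention described two different ways

# If the RELATIVE risk is invariant, the high-risk trial is far more powerful:
print(power_two_proportions(p_avg, p_avg * rr, n),    # average-risk trial
      power_two_proportions(p_high, p_high * rr, n))  # high-risk trial
# If the absolute RISK DIFFERENCE is invariant, the high-risk trial can be LESS powerful,
# because the same absolute difference sits on top of a larger binomial variance:
print(power_two_proportions(p_avg, p_avg + rd, n),
      power_two_proportions(p_high, p_high + rd, n))
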

    Leptoquark production in ultrahigh-energy neutrino interactions revisited

    The prospects for producing leptoquarks (LQs) in ultrahigh-energy (UHE) neutrino-nucleon collisions are re-examined in the light of recent interpretations of HERA data in terms of leptoquark production. We update predictions for cross-sections for the production of first- and second-generation leptoquarks in UHE nu-N and nubar-N collisions including (i) recent experimental limits on masses and couplings from the LEP and TEVATRON colliders as well as rare processes, (ii) modern parton distributions, and (iii) radiative corrections to single leptoquark production. If the HERA events are due to an SU(2) doublet leptoquark which couples mainly to (e+,q) states, we argue that there are likely to be other LQ states, close in mass, which couple to neutrinos, because of constraints from precision electroweak measurements. Comment: 12 pages, LaTeX, 3 separate postscript figures. Added 1 reference plus discussion, updated another reference.
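
    For orientation, the leading-order rate for resonant production of a scalar leptoquark off a quark in the nucleon is often quoted in the narrow-width approximation as

        \sigma_{\rm NWA}(\nu N \to {\rm LQ}\,X) \;\simeq\; \frac{\pi \lambda^2}{4 s}\; q\!\left(x = \frac{M_{\rm LQ}^2}{s},\ \mu^2 = M_{\rm LQ}^2\right), \qquad s \simeq 2\, m_N E_\nu ,

    where \lambda is the leptoquark Yukawa coupling and q is the relevant (anti)quark density. This is a textbook estimate of the scale of the effect, not the full calculation with updated parton distributions and radiative corrections described in the abstract.
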

    CP Asymmetry in B_d --> phi K_S: Standard Model Pollution

    The difference in the time-dependent CP asymmetries between the modes B --> psi K_S and B --> phi K_S is a clean signal for physics beyond the Standard Model. This interpretation could fail if there is a large enhancement of the matrix element of the b --> u ubar s operator between the B_d initial state and the phi K_S final state. We argue against this possibility and propose some experimental tests that could shed light on the situation. Comment: 9 pages, RevTeX
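
    In a common convention, the time-dependent CP asymmetry referred to above is

        a_{CP}(t) \;\equiv\; \frac{\Gamma(\bar B^0(t) \to f) - \Gamma(B^0(t) \to f)}{\Gamma(\bar B^0(t) \to f) + \Gamma(B^0(t) \to f)} \;=\; S_f \sin(\Delta m_d\, t) - C_f \cos(\Delta m_d\, t) ,

    and in the Standard Model, up to small doubly-Cabibbo-suppressed contributions, S_{\psi K_S} \simeq S_{\phi K_S} \simeq \sin 2\beta with C_f \simeq 0. This is why a sizable difference between the two modes would point to new physics, unless the b --> u ubar s matrix element is anomalously enhanced. (Sign and normalization conventions vary between papers.)
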

    Leptoproduction of Heavy Quarks II -- A Unified QCD Formulation of Charged and Neutral Current Processes from Fixed-target to Collider Energies

    A unified QCD formulation of leptoproduction of massive quarks in charged current and neutral current processes is described. This involves adopting consistent factorization and renormalization schemes which encompass both vector-boson-gluon-fusion (flavor creation) and vector-boson-massive-quark-scattering (flavor excitation) production mechanisms. It provides a framework which is valid from the threshold for producing the massive quark (where gluon fusion is dominant) to the very high energy regime where the typical energy scale mu is much larger than the quark mass m_Q (where the quark-scattering mechanism should be prevalent). This approach effectively resums all large logarithms of the type (alpha_s(mu) log(mu^2/m_Q^2))^n which limit the validity of existing fixed-order calculations to the region mu ~ O(m_Q). We show that the (massive) quark-scattering contribution (after subtraction of overlaps) is important in most parts of the (x, Q) plane except near the threshold region. We demonstrate that the factorization scale dependence of the structure functions calculated in this approach is substantially less than that obtained in the fixed-order calculations, as one would expect from a more consistent formulation. Comment: LaTeX format, 29 pages, 11 figures. Revised to make auto-TeX-able
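
    Schematically, a unified (variable-flavor-number) treatment of this kind combines the two production mechanisms and removes their overlap,

        F(x, Q^2) \;=\; \omega_{\rm FE} \otimes f_Q \;+\; \omega_{\rm FC} \otimes f_g \;-\; \omega_{\rm FE} \otimes \frac{\alpha_s}{2\pi} \ln\!\frac{\mu^2}{m_Q^2}\, P_{qg} \otimes f_g ,

    where \omega_{\rm FE} and \omega_{\rm FC} are the flavor-excitation and flavor-creation (gluon-fusion) coefficient functions and the last term subtracts, at lowest order, the collinear logarithm already resummed into the heavy-quark density f_Q. This is only the generic structure of such a scheme; the precise subtraction terms, mass treatment, and higher orders are as defined in the paper.
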

    Complete Order alpha_s^3 Results for e^+ e^- to (gamma,Z) to Four Jets

    We present the next-to-leading order (O(alpha_s^3)) perturbative QCD predictions for e^+e^- annihilation into four jets. A previous calculation omitted the O(alpha_s^3) terms suppressed by one or more powers of 1/N_c^2, where N_c is the number of colors, and the `light-by-glue scattering' contributions. We find that all such terms are uniformly small, constituting less than 10% of the correction. For the Durham clustering algorithm, the leading and next-to-leading logarithms in the limit of small jet resolution parameter y_{cut} can be resummed. We match the resummed results to our fixed-order calculation in order to improve the small y_{cut} prediction. Comment: LaTeX2e, 17 pages with 5 encapsulated figures. Note added regarding subsequent related work. To appear in Phys. Rev.
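
    For reference, the Durham (k_T) clustering algorithm mentioned above defines the resolution between particles i and j as

        y_{ij} \;=\; \frac{2 \min(E_i^2, E_j^2)\,\bigl(1 - \cos\theta_{ij}\bigr)}{E_{\rm vis}^2} ,

    and pairs are recombined as long as the smallest y_{ij} lies below y_{cut}. At small y_{cut} the four-jet rate develops large logarithms of 1/y_{cut}; matching, schematically R_4 = R_4^{\rm resummed} + R_4^{\rm fixed\ order} - ({\rm overlap}), is what extends the prediction into that region. The matching formula here is only a schematic description of the procedure, not the paper's exact prescription.
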

    An analysis of two-body non-leptonic B decays involving light mesons in the standard model

    We report a theoretical analysis of the exclusive non-leptonic decays of B mesons into two light mesons, some of which have been measured recently by the CLEO collaboration. Our analysis is carried out in the context of an effective Hamiltonian based on the Standard Model using next-to-leading order perturbative QCD calculations. Using a factorization ansatz for the hadronic matrix elements, we show that existing data are accounted for in this approach. Thus, theoretical scenarios with a substantially enhanced Wilson coefficient of the chromomagnetic dipole operator (as compared to the SM) and/or those with a substantial color-singlet c\bar{c} component in the wave function of eta' are not required by these data. Implications of some of these measurements for the parameters of the CKM matrix are presented. Comment: 42 pages including 21 postscript figures; uses epsfig, some references added; improved mixing scheme for (eta,eta') system implemented; error corrected in (3.61) as explained below eq. (3.64) in the present version. Figs. 17-20 updated
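
    The factorization ansatz referred to above replaces the hadronic matrix element of each operator by a product of simpler current matrix elements, schematically

        \langle M_1 M_2 | O_i | \bar B \rangle \;\approx\; \langle M_2 | \bar q_2 \Gamma q_3 | 0 \rangle \, \langle M_1 | \bar q_1 \Gamma' b | \bar B \rangle \;\propto\; f_{M_2}\, F^{B \to M_1}(m_{M_2}^2) ,

    i.e. a decay constant times a B --> M_1 transition form factor, with effective coefficients of the form a_i = c_i + c_{i\pm 1}/N_c multiplying the factorized amplitudes. This is the generic form of the ansatz; the operator basis, effective coefficients, and form-factor conventions used in the analysis are those defined in the paper.
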