
    Stratification bias in low signal microarray studies

    BACKGROUND: When analysing microarray and other small-sample-size biological datasets, care is needed to avoid various biases. We analyse a form of bias, stratification bias, that can substantially affect analyses using sample-reuse validation techniques and lead to inaccurate results. This bias is due to imperfect stratification of samples in the training and test sets and the dependency between these stratification errors, i.e. the variations in class proportions in the training and test sets are negatively correlated.
    RESULTS: We show that when estimating the performance of classifiers on low-signal datasets (i.e. those which are difficult to classify), which are typical of many prognostic microarray studies, commonly used performance measures can suffer from a substantial negative bias. For the error rate this bias is severe only in quite restricted situations, but it can be much larger and more frequent when using ranking measures such as the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC). Substantial biases are shown in simulations and on the van 't Veer breast cancer dataset. The classification error rate can have large negative biases for balanced datasets, whereas the AUC shows substantial pessimistic biases even for imbalanced datasets. In simulation studies using 10-fold cross-validation, AUC values of less than 0.3 can be observed on random datasets rather than the expected 0.5. Further experiments on the van 't Veer breast cancer dataset show that these biases exist in practice.
    CONCLUSION: Stratification bias can substantially affect several performance measures. In computing the AUC, the strategy of pooling the test samples from the various folds of cross-validation can lead to large biases; computing it as the average of per-fold estimates avoids this bias and is thus the recommended approach. As a more general solution applicable to other performance measures, we show that stratified repeated holdout and modified versions of k-fold cross-validation (balanced, stratified cross-validation and balanced leave-one-out cross-validation) avoid the bias. Therefore, for model selection and evaluation on microarray and other small biological datasets, these methods should be used and unstratified versions avoided. In particular, the commonly used (unbalanced) leave-one-out cross-validation should not be used to estimate the AUC for small datasets.
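The pooled-versus-per-fold AUC effect this abstract describes can be sketched in a few lines of NumPy (a toy illustration, not the paper's code; the dataset sizes and the simplified classifier are our own choices). On zero-signal balanced data, a calibrated classifier's scores drift toward the training-set class prior, and because unstratified folds anti-correlate the training and test class proportions, pooling scores across folds ranks the positive samples systematically low:

```python
import numpy as np

def auc(scores, labels):
    """Mann-Whitney AUC with ties counted as 1/2; None if one class is absent."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        return None
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

rng = np.random.default_rng(0)
n, k, reps = 50, 10, 100
y = np.repeat([0, 1], n // 2)        # balanced labels, zero-signal data

pooled_aucs, per_fold_aucs = [], []
for _ in range(reps):
    folds = np.array_split(rng.permutation(n), k)   # unstratified 10-fold CV
    scores = np.empty(n)
    fold_vals = []
    for f in folds:
        train = np.setdiff1d(np.arange(n), f)
        # On pure noise a well-calibrated classifier's score approaches the
        # training-set class-1 prior; we use that prior directly as the score.
        scores[f] = y[train].mean()
        a = auc(scores[f], y[f])
        if a is not None:
            fold_vals.append(a)
    pooled_aucs.append(auc(scores, y))
    per_fold_aucs.append(np.mean(fold_vals))

print("pooled AUC  :", round(float(np.mean(pooled_aucs)), 3))   # well below 0.5
print("per-fold AUC:", round(float(np.mean(per_fold_aucs)), 3)) # ~0.5
```

Within each fold the common score offset cancels out of the ranking, so averaging per-fold AUCs stays near the chance level of 0.5, while the pooled estimate is pessimistically biased, in line with the paper's recommendation.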

    Poncelet's Theorem, Paraorthogonal Polynomials and the Numerical Range of Compressed Multiplication Operators

    There has been considerable recent literature connecting Poncelet's theorem to ellipses, Blaschke products and numerical ranges, summarized, for example, in the recent book [11]. We show how those results can be understood using ideas from the theory of orthogonal polynomials on the unit circle (OPUC) and, in turn, can provide new insights into the theory of OPUC.
    Comment: 46 pages, 4 figures; minor revisions from v1; accepted for publication in Adv. Math.
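For readers unfamiliar with the objects in the title: the monic OPUC are generated from the Verblunsky coefficients by the Szegő recurrence, and the paraorthogonal polynomials arise by pushing the final coefficient out to the unit circle (these are the standard definitions from the OPUC literature, not formulas quoted from this paper):

```latex
% Szego recurrence: monic OPUC \Phi_n from Verblunsky coefficients \alpha_n
\Phi_{n+1}(z) = z\,\Phi_n(z) - \overline{\alpha_n}\,\Phi_n^{*}(z),
\qquad \Phi_n^{*}(z) := z^{n}\,\overline{\Phi_n(1/\bar z)},
\qquad \alpha_n \in \mathbb{D}.
% Paraorthogonal polynomials: replace the last coefficient by a unimodular
% parameter; all n+1 zeros then lie on the unit circle
\Phi_{n+1}(z;\beta) = z\,\Phi_n(z) - \bar{\beta}\,\Phi_n^{*}(z),
\qquad |\beta| = 1.
```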

    Variance Reduced Stochastic Gradient Descent with Neighbors

    Stochastic Gradient Descent (SGD) is a workhorse in machine learning, yet its slow convergence can be a computational bottleneck. Variance reduction techniques such as SAG, SVRG and SAGA have been proposed to overcome this weakness, achieving linear convergence. However, these methods are either based on computations of full gradients at pivot points, or on keeping per-data-point corrections in memory. Therefore speed-ups relative to SGD may need a minimal number of epochs in order to materialize. This paper investigates algorithms that can exploit neighborhood structure in the training data to share and re-use information about past stochastic gradients across data points, which offers advantages in the transient optimization phase. As a side product we provide a unified convergence analysis for a family of variance reduction algorithms, which we call memorization algorithms. We provide experimental results supporting our theory.
    Comment: Appears in: Advances in Neural Information Processing Systems 28 (NIPS 2015). 13 pages
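As a concrete anchor for the "memorization" idea the abstract refers to, here is a minimal SAGA-style loop in NumPy for a least-squares problem (a sketch with our own choice of problem, data sizes and step size, not the authors' code). Each data point's last-seen gradient is stored, and every stochastic gradient is corrected by that stored value and the running average of all stored gradients, which removes the variance that keeps plain SGD from converging linearly:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 10
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.01 * rng.standard_normal(n)

def grad_i(w, i):
    """Gradient of the i-th squared loss 0.5 * (x_i . w - y_i)^2."""
    return (X[i] @ w - y[i]) * X[i]

# SAGA keeps one stored gradient per data point ("memorization")
w = np.zeros(d)
memory = np.array([grad_i(w, i) for i in range(n)])
mem_avg = memory.mean(axis=0)

L_max = (X ** 2).sum(axis=1).max()   # smoothness constant of the worst f_i
step = 1.0 / (3 * L_max)             # the theoretically safe SAGA step size

for t in range(20000):
    i = rng.integers(n)
    g = grad_i(w, i)
    w -= step * (g - memory[i] + mem_avg)   # variance-reduced update
    mem_avg += (g - memory[i]) / n          # maintain the running average
    memory[i] = g

w_star = np.linalg.lstsq(X, y, rcond=None)[0]   # closed-form optimum
print("distance to optimum:", np.linalg.norm(w - w_star))
```

Unlike SVRG, no full-gradient passes at pivot points are needed; the price is the O(n) gradient memory, which is exactly the trade-off the abstract describes.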

    Smooth Initial Conditions from Weak Gravity

    CMB measurements reveal an unnaturally smooth early universe. We propose a mechanism to make this smoothness natural by weakening the strength of gravity at early times, and therefore altering which initial conditions have low entropy.
    Comment: 14 pages, 5 figures. Minor changes, version appearing in PL

    Cosmological Moduli Dynamics

    Low energy effective actions arising from string theory typically contain many scalar fields, some with a very complicated potential and others with no potential at all. The evolution of these scalars is of great interest. Their late-time values have a direct impact on low energy observables, while their early universe dynamics can potentially source inflation or adversely affect big bang nucleosynthesis. Recently, classical and quantum methods for fixing the values of these scalars have been introduced. The purpose of this work is to explore moduli dynamics in light of these stabilization mechanisms. In particular, we explore a truncated low energy effective action that models the neighborhood of special points (or more generally loci) in moduli space, such as conifold points, where extra massless degrees of freedom arise. We find that the dynamics has a surprisingly rich structure, including the appearance of chaos, and we find a viable mechanism for trapping some of the moduli.
    Comment: 35 pages, 14 figures, references added

    Accelerating vaccine development and deployment: report of a Royal Society satellite meeting.

    The Royal Society convened a meeting on the 17th and 18th November 2010 to review the current ways in which vaccines are developed and deployed, and to make recommendations as to how each of these processes might be accelerated. The meeting brought together academics, industry representatives, research sponsors, regulators, government advisors and representatives of international public health agencies from a broad geographical background. Discussions were held under the Chatham House Rule. High-throughput screening of new vaccine antigens and candidates was seen as a driving force for vaccine discovery. Multi-stakeholder, small-scale manufacturing facilities capable of rapid production of clinical-grade vaccines are currently too few and need to be expanded. In both the human and veterinary areas, there is a need for tiered regulatory standards, differentially tailored for experimental and commercial vaccines, to allow accelerated vaccine efficacy testing. Improved cross-fertilization of knowledge between industry and academia, and between human and veterinary vaccine developers, could lead to more rapid application of promising approaches and technologies to new product development. Identification of best practices and development of checklists for product development plans and implementation programmes were seen as low-cost opportunities to shorten the timeline for vaccine progression from the laboratory bench to the people who need it.