
    DNA expression microarrays may be the wrong tool to identify biological pathways

    DNA microarray expression signatures are expected to provide new insights into pathophysiological pathways. Numerous variant statistical methods have been described for each step of the signal analysis. We applied five similar statistical tests to the same data set at the gene selection step. Inter-test agreement for the identification of biological pathways in BioCarta, KEGG and Reactome was calculated using Cohen's κ score. The identification of specific biological pathways showed only moderate agreement (0.30 < κ < 0.79) between the analysis methods used. Pathways identified by microarrays must be treated cautiously, as they vary according to the statistical method used.
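As a minimal sketch of the agreement measure named in this abstract, the following computes Cohen's κ for two methods that each flag a list of pathways as identified (1) or not (0). The function name and the example vectors are illustrative assumptions, not data or code from the study.

```python
def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(calls_a) != len(calls_b) or len(calls_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(calls_a)
    # Observed agreement: fraction of items on which both methods give the same label.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    labels = set(calls_a) | set(calls_b)
    expected = sum(
        (calls_a.count(label) / n) * (calls_b.count(label) / n) for label in labels
    )
    if expected == 1.0:
        # Both methods used one identical label throughout; kappa is undefined,
        # and returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical calls for eight pathways from two statistical tests.
test_a = [1, 1, 0, 0, 1, 0, 1, 0]
test_b = [1, 0, 0, 0, 1, 0, 1, 1]
print(cohens_kappa(test_a, test_b))
```

Here the two tests agree on six of eight pathways (observed agreement 0.75) while chance agreement is 0.5, so κ is 0.5, which falls inside the range the abstract reports.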

    The Effective Prepotential of N=2 Supersymmetric SU(N_c) Gauge Theories

    We determine the effective prepotential for N=2 supersymmetric SU(N_c) gauge theories with an arbitrary number of flavors N_f < 2N_c, from the exact solution constructed out of spectral curves. The prepotential is the same for the several models of spectral curves proposed in the literature. It has to all orders the logarithmic singularities of the one-loop perturbative corrections, thus confirming the non-renormalization theorems from supersymmetry. In particular, the renormalized order parameters and their duals have all the correct monodromy transformations prescribed at weak coupling. We evaluate explicitly the contributions of one- and two-instanton processes. Comment: 34 pages, Plain TeX, no macros needed, no figure

    Topics in statistical inference for massive data and high-dimensional data

    This dissertation consists of three research papers that address three different problems in statistics concerning high-volume datasets. The first paper studies distributed statistical inference for massive data. As data size increases, computational complexity and feasibility must be taken into consideration in statistical analyses. We investigate the statistical efficiency of the distributed version of a general class of statistics. Distributed bootstrap algorithms are proposed to approximate the distribution of the distributed statistics. These approaches relieve the computational burden of conventional methods while preserving adequate statistical efficiency. The second paper deals with the problem of testing the identity and sphericity hypotheses for high-dimensional covariance matrices, with a focus on improving the power of existing methods. By taking advantage of the sparsity in the underlying covariance matrices, the power improvement is accomplished by using the banding estimator for the covariance matrices, which leads to a significant reduction in the variance of the test statistics. The last paper considers variable selection for high-dimensional data. Distance-based variable importance measures are proposed to rank and select variables while taking dependence structures into consideration. The importance measures are inspired by the multi-response permutation procedure (MRPP) and the energy distance. A backward selection algorithm is developed to discover important variables and to improve the power of the original MRPP for high-dimensional data.
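For readers unfamiliar with the energy distance mentioned in this abstract, the sketch below computes the standard two-sample energy statistic, 2·E|X−Y| − E|X−X'| − E|Y−Y'|, using sample averages of Euclidean distances. It is not the dissertation's importance measure, which the abstract describes only as inspired by this quantity; the function names are assumptions, and conventions differ on whether a square root is taken.

```python
import math
from itertools import product


def mean_pairwise_distance(xs, ys):
    """Average Euclidean distance over all pairs (x, y) with x in xs and y in ys."""
    total = sum(math.dist(x, y) for x, y in product(xs, ys))
    return total / (len(xs) * len(ys))


def energy_statistic(xs, ys):
    """Sample energy statistic between two collections of equal-dimension points.

    Uses the V-statistic form, so within-sample averages include the zero
    distances of each point to itself.
    """
    between = mean_pairwise_distance(xs, ys)
    within_x = mean_pairwise_distance(xs, xs)
    within_y = mean_pairwise_distance(ys, ys)
    return 2.0 * between - within_x - within_y


# Hypothetical two-dimensional samples.
group_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
group_b = [(3.0, 3.0), (4.0, 3.0), (3.0, 4.0)]
print(energy_statistic(group_a, group_b))  # larger values indicate more separated samples
print(energy_statistic(group_a, group_a))  # identical samples give 0.0
```

The statistic is zero when the two samples coincide and grows as they separate, which is the property that makes it usable as a distance-based measure of how much a set of variables distinguishes groups.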