18 research outputs found

    On a new class of data depths for measuring representativeness

    Theme: Big Data and Statistical Computing. Session SS1R1 - Data Depth.
    Data depth provides a natural means to rank multivariate vectors with respect to an underlying multivariate distribution. The conventional notion of a depth function emphasizes a centre-outward ordering of data points. While useful for certain statistical applications, such emphasis has rendered most classical data depths insensitive to some distributional features, such as multimodality, of concern to other statistical applications. To get around the problem we introduce a new notion of data depth which seeks to rank data points according to their representativeness, rather than centrality, with respect to an underlying distribution of interest. We propose a general device for defining such depth functions, based essentially on a choice of goodness-of-fit test statistic. Our device calls for a new interpretation of depth more akin to the concept of density than location. It copes particularly well with multivariate data exhibiting multimodality. In addition to providing depth values for individual data points, the new class of depth functions derived from goodness-of-fit tests also extends naturally to provide depth values for subsets of data points, a concept new to the data-depth literature. Applications of the new depth functions are demonstrated with both simulated and real data.

    Graph ranking for exploratory gene data analysis

    Background: Microarray technology has made it possible to simultaneously monitor the expression levels of thousands of genes in a single experiment. However, the large number of genes greatly increases the challenges of analyzing, comprehending and interpreting the resulting mass of data. Selecting a subset of important genes is necessary to address this challenge. Gene selection has been investigated extensively over the last decade. Most selection procedures, however, are not sufficient for accurate inference of the underlying biology, because biological significance does not necessarily coincide with statistical significance. Additional biological knowledge needs to be integrated into the gene selection procedure.

    Results: We propose a general framework for gene ranking. We construct a bipartite graph from the Gene Ontology (GO) and gene expression data. The graph describes the relationship between genes and their associated molecular functions. Under a given species condition, the edge weights of the graph are set to the gene expression levels. Such a graph provides a mathematical means to represent both species-independent and species-dependent biological information. We also develop a new ranking algorithm to analyze the weighted graph via a kernelized spatial depth (KSD) approach. Consequently, the importance of genes and molecular functions can be simultaneously ranked by a real-valued measure, KSD, which incorporates the global and local structure of the graph. Over-expressed and under-regulated genes can also be ranked separately.

    Conclusion: The gene-function bigraph integrates molecular function annotations into gene expression data. The relevance of genes is described in the graph (through a common function). The proposed method provides an exploratory framework for gene data analysis.

    Depth functions as measures of representativeness

    Data depth provides a natural means to rank multivariate vectors with respect to an underlying multivariate distribution. Most existing depth functions emphasize a centre-outward ordering of data points, which may not provide a useful geometric representation of certain distributional features, such as multimodality, of concern to some statistical applications. Such inadequacy motivates us to develop a device for ranking data points according to their “representativeness” rather than “centrality” with respect to an underlying distribution of interest. Derived essentially from a choice of goodness-of-fit test statistic, our device calls for a new interpretation of “depth” more akin to the concept of density than location. It copes particularly well with multivariate data exhibiting multimodality. In addition to providing depth values for individual data points, depth functions derived from goodness-of-fit tests also extend naturally to provide depth values for subsets of data points, a concept new to the data-depth literature.
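
The contrast between "centrality" and "representativeness" can be made concrete with a small sketch. It is not the goodness-of-fit construction described in the abstract: the classical Mahalanobis depth stands in for a centre-outward depth, and a plain Gaussian kernel density estimate stands in for a density-like score. The function names, the bandwidth, and the simulated two-cluster data are all assumptions made for illustration.

```python
import numpy as np


def mahalanobis_depth(x, data):
    """Classical centre-outward depth: 1 / (1 + squared Mahalanobis distance)."""
    mu = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    diff = x - mu
    return 1.0 / (1.0 + diff @ np.linalg.solve(cov, diff))


def kde_score(x, data, bandwidth=0.5):
    """Gaussian kernel density estimate at x (illustrative density-like score)."""
    d = data.shape[1]
    sq_dist = np.sum((data - x) ** 2, axis=1)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    return np.mean(np.exp(-sq_dist / (2.0 * bandwidth ** 2))) / norm


# Simulated bimodal sample: two well-separated clusters (hypothetical data).
rng = np.random.default_rng(0)
left = rng.normal(loc=[-3.0, 0.0], scale=1.0, size=(500, 2))
right = rng.normal(loc=[3.0, 0.0], scale=1.0, size=(500, 2))
data = np.vstack([left, right])

origin = np.array([0.0, 0.0])  # midpoint between the clusters, few points nearby
mode = np.array([3.0, 0.0])    # centre of the right-hand cluster

depth_origin = mahalanobis_depth(origin, data)
depth_mode = mahalanobis_depth(mode, data)
score_origin = kde_score(origin, data)
score_mode = kde_score(mode, data)
```

On data of this kind the centre-outward depth ranks the sparsely populated midpoint above the cluster centre, while the density-like score reverses that ordering, which is the behaviour the abstract attributes to a representativeness-based depth.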

    Monge-Kantorovich Depth, Quantiles, Ranks, and Signs

    We propose new concepts of statistical depth, multivariate quantiles, ranks and signs, based on canonical transportation maps between a distribution of interest on R^d and a reference distribution on the d-dimensional unit ball. The new depth concept, called Monge-Kantorovich depth, specializes to halfspace depth in the case of elliptical distributions, but, for more general distributions, differs from the latter in the ability for its contours to account for non-convex features of the distribution of interest. We propose empirical counterparts to the population versions of those Monge-Kantorovich depth contours, quantiles, ranks and signs, and show their consistency by establishing a uniform convergence property for empirical transport maps, which is of independent interest.