934 research outputs found

    A new method for identifying bivariate differential expression in high dimensional microarray data using quadratic discriminant analysis

    Background: One of the drawbacks we face when analyzing gene-to-phenotype associations in genomic data is the poor performance of the designed classifier caused by the small-sample, high-dimensional data structure (n ≪ p) at hand. This is known as the peaking phenomenon, a common situation in the analysis of gene expression data. Highly predictive bivariate gene interactions whose marginals are useless for discrimination are also affected by this phenomenon, so they are commonly discarded by state-of-the-art sequential search algorithms. Such patterns are known as weak marginal/strong bivariate interactions. This paper addresses the problem of uncovering them in high-dimensional settings.
    Results: We propose a new approach that uses quadratic discriminant analysis (QDA) as a search engine to detect such signals. The choice of QDA is justified by a simulation study on a benchmark of classifiers, which reveals its appealing properties. The procedure rests on an exhaustive search that explores the feature space blockwise: the features are divided into blocks, and the accuracy of the QDA is assessed for the predictors within each pair of blocks. The block size is determined by the resistance of the QDA to peaking. This search highlights chunks of features that are expected to contain the type of subtle interactions we are concerned with. A closer look at this smaller subset of features, by means of an exhaustive search guided by the QDA error rate for all pairwise input combinations within the subset, enables their final detection. The proposed method is applied both to synthetic data and to a public-domain microarray data set. When applied to gene expression data, it leads to pairs of genes which are not univariately differentially expressed but exhibit subtle patterns of bivariate differential expression.
    Conclusions: We have proposed a novel approach for identifying weak marginal/strong bivariate interactions. Unlike standard approaches such as the top scoring pair (TSP) and the CorScor, our procedure does not assume a specified shape of phenotype separation and may enrich the type of bivariate differential expression patterns that can be uncovered in high-dimensional data.
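The final step described in this abstract, an exhaustive search over feature pairs guided by the QDA error rate, can be illustrated with a minimal sketch. This is not the authors' implementation: the blockwise pre-screening is omitted, the error rate is estimated on a simple hold-out split (an assumption, since the abstract does not name an estimator), and all function names and the synthetic data generator are hypothetical. The synthetic classes share identical N(0, 1) marginals on features 0 and 1 and differ only in the sign of their correlation, so neither feature discriminates on its own.

```python
import itertools
import math
import random


def fit_gaussian_2d(points):
    """Return the mean and covariance entries (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def qda_score(point, mean, cov, prior):
    """Quadratic discriminant: log prior - 0.5*log|S| - 0.5*Mahalanobis^2."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.log(prior) - 0.5 * math.log(det) - 0.5 * maha


def pair_error(X, y, i, j, train_idx, test_idx):
    """Hold-out QDA error rate using only features i and j."""
    params = {}
    for label in set(y):
        pts = [(X[k][i], X[k][j]) for k in train_idx if y[k] == label]
        mean, cov = fit_gaussian_2d(pts)
        params[label] = (mean, cov, len(pts) / len(train_idx))
    errors = 0
    for k in test_idx:
        pt = (X[k][i], X[k][j])
        pred = max(params, key=lambda c: qda_score(pt, *params[c]))
        errors += pred != y[k]
    return errors / len(test_idx)


def rank_pairs(X, y, train_idx, test_idx):
    """Score every feature pair and sort by increasing error rate."""
    p = len(X[0])
    scored = [(pair_error(X, y, i, j, train_idx, test_idx), (i, j))
              for i, j in itertools.combinations(range(p), 2)]
    return sorted(scored)


def make_data(n_per_class=100, p=6, rho=0.9, seed=0):
    """Hypothetical data: features 0 and 1 differ between classes only in correlation sign."""
    rng = random.Random(seed)
    X, y = [], []
    for label, r in ((0, rho), (1, -rho)):
        for _ in range(n_per_class):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            row = [z1, r * z1 + math.sqrt(1 - r * r) * z2]
            row += [rng.gauss(0, 1) for _ in range(p - 2)]  # pure noise features
            X.append(row)
            y.append(label)
    return X, y


X, y = make_data()
train_idx = list(range(0, len(y), 2))  # even rows for fitting
test_idx = list(range(1, len(y), 2))   # odd rows for error estimation
ranking = rank_pairs(X, y, train_idx, test_idx)
best_error, best_pair = ranking[0]
print(best_pair, best_error)
```

The exhaustive loop costs one QDA fit per pair, which grows quadratically with the number of features; this is why the abstract restricts it to the smaller subset highlighted by the blockwise stage.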

    Sparse Interaction Additive Networks via Feature Interaction Detection and Sparse Selection

    There is currently a large gap in performance between the statistically rigorous methods like linear regression or additive splines and the powerful deep methods using neural networks. Previous works attempting to close this gap have failed to fully investigate the exponentially growing number of feature combinations which deep networks consider automatically during training. In this work, we develop a tractable selection algorithm to efficiently identify the necessary feature combinations by leveraging techniques in feature interaction detection. Our proposed Sparse Interaction Additive Networks (SIAN) construct a bridge from these simple and interpretable models to fully connected neural networks. SIAN achieves competitive performance against state-of-the-art methods across multiple large-scale tabular datasets and consistently finds an optimal tradeoff between the modeling capacity of neural networks and the generalizability of simpler methods

    Uncovering latent structure in valued graphs: A variational approach

    As more and more network-structured data sets become available, the statistical analysis of valued graphs has become commonplace. Looking for a latent structure is one of the many strategies used to better understand the behavior of a network. Several methods already exist for the binary case. We present a model-based strategy to uncover groups of nodes in valued graphs. This framework can be used for a wide span of parametric random graph models and allows covariates to be included. Variational tools allow us to achieve approximate maximum likelihood estimation of the parameters of these models. We provide a simulation study showing that our estimation method performs well over a broad range of situations. We apply this method to analyze host-parasite interaction networks in forest ecosystems. Comment: Published at http://dx.doi.org/10.1214/10-AOAS361 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Detecting multivariate interactions in spatial point patterns with Gibbs models and variable selection

    We propose a method for detecting significant interactions in very large multivariate spatial point patterns. This methodology develops high dimensional data understanding in the point process setting. The method is based on modelling the patterns using a flexible Gibbs point process model to directly characterise point-to-point interactions at different spatial scales. By using the Gibbs framework significant interactions can also be captured at small scales. Subsequently, the Gibbs point process is fitted using a pseudo-likelihood approximation, and we select significant interactions automatically using the group lasso penalty with this likelihood approximation. Thus we estimate the multivariate interactions stably even in this setting. We demonstrate the feasibility of the method with a simulation study and show its power by applying it to a large and complex rainforest plant population data set of 83 species
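The group lasso penalty mentioned in this abstract selects or discards whole groups of coefficients at once. A minimal sketch of the standard block soft-thresholding operator behind that behaviour follows. It assumes a proximal-type solver, which the abstract does not specify, and the function name, grouping, and numbers are hypothetical; in the paper's setting a group would presumably collect the coefficients of one interaction, but that mapping is also an assumption.

```python
import math


def group_soft_threshold(beta, groups, lam):
    """Shrink each coefficient group toward zero by lam in Euclidean norm.

    A group whose norm is at most lam is set to zero as a whole.
    """
    out = list(beta)
    for idx in groups:
        norm = math.sqrt(sum(beta[i] ** 2 for i in idx))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in idx:
            out[i] = scale * beta[i]
    return out


# Hypothetical coefficients: one strong group and one weak group.
beta = [3.0, 4.0, 0.1, -0.2]
groups = [[0, 1], [2, 3]]
shrunk = group_soft_threshold(beta, groups, lam=1.0)
print(shrunk)
```

The strong group is shrunk but kept, while the weak group is removed entirely, which is the mechanism that yields automatic selection of interactions.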

    The role of heterogeneity in spatial plant population dynamics

    Ecological theory names interacting mechanisms that allow competing species to coexist in limited available space, some of which are perceived as antagonistic. Most prominent are niche differentiation, heterogeneity and neutrality (ecological equivalence). Species similarity is also influenced by two mechanisms: habitat filtering selects for ecologically similar species, while niche differentiation reduces competitive pressure and thus favours ecologically different species. The spatial arrangement of abiotic resources can determine the spatial pattern and competition framework for a pre-selected tree species ensemble. Spatial occurrence patterns of trees are formed by dispersal, growth and mortality, which are influenced by the interacting biotic and abiotic conditions. The relative impact of these mechanisms is under-researched in temperate forest trees, especially in Europe. We analysed a data set of a temperate old-growth forest with spatially explicit information about more than 15 000 individual trees of six tree species (90 % beech admixed with Ash, Hornbeam, Sycamore, Norway Maple, and Wych Elm) located in the central region of the Hainich National Park in central Germany. We tested space-related coexistence mechanisms under heterogeneous conditions. For this, we employed Point Pattern Analysis to test several ecological hypotheses on inter- and intraspecific interactions of the species, varying from randomness to strict ecological niche. In order to identify the critical components of possible niches, we collected field data on abiotic conditions such as the availability of water and light, and considered topography using a Digital Elevation Model. These field data were used for fitting suitability surfaces depending on tree species identity using spatial interpolation methods such as Kriging and Generalised Additive Models. 
We used Spatial Point Process Models to reconstruct the spatial distribution processes composed of purely biotic, abiotic or mixed covariates of the tree species. We found that spatial heterogeneity was important in all aspects we studied. Both tree density and the distribution of the abiotic habitat components varied in space. Especially when species interacted with beech, abiotic heterogeneity played an important role: beech outcompeted the admixed species under most prevailing abiotic conditions. In this way, beech influenced the spatial pattern of the six studied species by limiting available (niche) space via inter- and intraspecific competition. Here, beech proved to be the superior competitor with no pronounced abiotic niche, but is mostly excluded from slopes. The remaining available niche space was often occupied by ecologically similar species, which formed typical associations in subregions of the study area less suitable for beech. We found spatial segregation between the three most abundant species, Beech, Ash, and Hornbeam; coexistence by niches seems to be trait-based rather than based on abiotic preferences. Habitat suitability and spatial distribution of Ash, Sycamore, and Norway Maple were more affected by the abiotic environmental conditions than those of Beech, Hornbeam, and Elm. This indicates that the coexistence of rare species seems to be mediated by heterogeneity. Our study revealed that differences in abiotic conditions, such as soil depth and plant-available water, were relevant for habitat suitability at small spatial and temporal scales. When simulating the distribution pattern of the surveyed species, it became apparent that biotic interactions play an important part in shaping the scales at which aggregation or segregation happens in the abiotic environment. Beech and Sycamore both showed endogenous heterogeneity. For both species, point process models incorporated several different scales of intraspecific interaction. 
The interspecific interaction played only a minor role compared to the intraspecific one. All results together seem to underline that niche differentiation happens at the level of the individual, allowing ecologically similar species to interact de facto neutrally within their niche space and thus to coexist in the presence of a strong competitor.

    NATURAL AND ANTHROPOGENIC DRIVERS OF TREE EVOLUTIONARY DYNAMICS

    Species of trees inhabit diverse and heterogeneous environments, and often play important ecological roles in such communities. As a result of their vast ecological breadth, trees have become adapted to various environmental pressures. In this dissertation I examine various environmental factors that drive evolutionary dynamics in three Pinus species in California and Nevada, USA. In chapter two, I assess the influence of management by thinning, fire, and their interaction on fine-scale gene flow within fire-suppressed populations of Pinus lambertiana, a historically dominant and ecologically important member of mixed-conifer forests of the Sierra Nevada, California. Here, I find evidence that treatment prescription differentially affects fine-scale genetic structure and effective gene flow in this species. In my third chapter, I describe the development of a dense linkage map for Pinus balfouriana, which I use in chapter four to assess the quantitative trait locus (QTL) landscape of water-use efficiency across two isolated ranges of the species. I find evidence that precipitation-related variables structure the geographical range of P. balfouriana, that traits related to water-use efficiency are heritable and differentiated across populations, and that associated QTLs underlying this phenotypic variation explain large proportions of total variation. In chapter five, I assess evidence for local adaptation to the eastern Sierra Nevada rain shadow within P. albicaulis across fine spatial scales of the Lake Tahoe Basin, USA. Here, genetic variation in traits related to water availability was structured across populations more strongly than neutral variation, and loci identified by genome-wide association methods show elevated signals of local adaptation that track soil water availability. In chapter six, I review theory related to polygenic local adaptation and the literature on genotype-phenotype associations in trees. 
I find that the evidence suggests a polygenic basis for many traits important to conservation and industry, and I suggest paths forward for best describing such genetic bases in tree species. Overall, my results show that the spatial and genetic structure of trees is often driven by their environment, and that ongoing selective pressures driven by environmental change will continue to be important in these systems.