297 research outputs found

    Spatially-constrained clustering of ecological networks

    Spatial ecological networks are widely used to model interactions between georeferenced biological entities (e.g., populations or communities). Such data are often analyzed in two steps: groups of similar biological entities are identified first, and the spatial information is used afterwards to improve the ecological interpretation. We develop an integrative approach to retrieve groups of nodes that are geographically close and ecologically similar. Our model-based, spatially constrained method embeds the geographical information within a regularization framework by adding constraints to the maximum likelihood estimation of parameters. A simulation study and the analysis of real data demonstrate that our approach is able to detect complex spatial patterns that are ecologically meaningful. The model-based framework allows us to consider external information (e.g., geographic proximities, covariates) in the analysis of ecological networks and appears to be an appealing alternative for analyzing such data.

    Continuous testing for Poisson process intensities: A new perspective on scanning statistics

    We propose a novel continuous testing framework to test the intensities of Poisson processes. This framework allows a rigorous definition of the complete testing procedure, from an infinite number of hypotheses to joint error rates. Our work extends traditional procedures based on scanning windows by controlling the family-wise error rate and the false discovery rate in a non-asymptotic manner and in a continuous way. The decision rule is based on a p-value process that can be estimated by a Monte Carlo procedure. We also propose new test statistics based on kernels. Our method is applied in neuroscience and genomics through the standard test of homogeneity and the two-sample test.
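
For orientation, the traditional scanning-window test of homogeneity that this framework extends can be sketched with a Monte Carlo p-value. This is a minimal illustration under stated assumptions, not the authors' continuous procedure: it assumes points observed on [0, 1], a single fixed window width, and conditioning on the number of points, so that the null hypothesis reduces to a uniform sample. The function names are hypothetical.

```python
import random


def scan_statistic(points, width):
    """Largest number of points in any window [t, t + width] that starts at a point."""
    pts = sorted(points)
    best, j = 0, 0
    for i in range(len(pts)):
        # Move j forward until pts[j] is the first point beyond the window.
        while j < len(pts) and pts[j] <= pts[i] + width:
            j += 1
        best = max(best, j - i)
    return best


def monte_carlo_pvalue(points, width, n_sim=999, seed=0):
    """Monte Carlo p-value of the scan statistic under homogeneity on [0, 1].

    Conditional on the number of points, a homogeneous Poisson process is a
    uniform sample, so the null distribution is simulated by uniform draws.
    """
    rng = random.Random(seed)
    observed = scan_statistic(points, width)
    exceed = 0
    for _ in range(n_sim):
        simulated = [rng.random() for _ in range(len(points))]
        if scan_statistic(simulated, width) >= observed:
            exceed += 1
    # The +1 terms count the observed sample itself, so the p-value is never 0.
    return (1 + exceed) / (1 + n_sim)


# Hypothetical data: an even background plus a tight cluster near 0.5.
background = [i / 20 for i in range(20)]
cluster = [0.5 + 0.001 * k for k in range(15)]
p_clustered = monte_carlo_pvalue(background + cluster, width=0.05)
p_even = monte_carlo_pvalue(background, width=0.05)
```

A single window width gives one test; the framework described above instead treats the whole family of windows as a continuum of hypotheses and controls joint error rates over it.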

    Strategies for online inference of model-based clustering in large and growing networks

    In this paper we adapt online estimation strategies to perform model-based clustering on large networks. Our work focuses on two algorithms, the first based on the SAEM algorithm and the second on variational methods. These two strategies are compared with existing approaches on simulated and real data. We use the method to decipher the connection structure of the political websphere during the US political campaign in 2008. We show that our online EM-based algorithms offer a good trade-off between precision and speed when estimating parameters for mixture distributions in the context of random graphs. Comment: Published at http://dx.doi.org/10.1214/10-AOAS359 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    A statistical approach for array CGH data analysis

    BACKGROUND: Microarray-CGH experiments are used to detect and map chromosomal imbalances by hybridizing targets of genomic DNA from a test and a reference sample to sequences immobilized on a slide. These probes are genomic DNA sequences (BACs) that are mapped on the genome. The signal has a spatial coherence that can be handled by specific statistical tools. Segmentation methods seem to be a natural framework for this purpose. A CGH profile can be viewed as a succession of segments that represent homogeneous regions in the genome whose BACs share the same relative copy number on average. We model a CGH profile by a random Gaussian process whose distribution parameters are affected by abrupt changes at unknown coordinates. Two major problems arise: determining which parameters are affected by the abrupt changes (the mean and the variance, or the mean only), and selecting the number of segments in the profile. RESULTS: We demonstrate that existing methods for estimating the number of segments are not well suited to array CGH data, and we propose an adaptive criterion that detects previously mapped chromosomal aberrations. The performance of this method is discussed based on simulations and publicly available data sets. Then we discuss the choice of modeling for array CGH data and show that the model with a homogeneous variance is well suited to this context. CONCLUSIONS: Array CGH data analysis is an emerging field that needs appropriate statistical tools. Process segmentation and model selection provide a theoretical framework that allows precise biological interpretations. Adaptive methods for model selection give promising results concerning the estimation of the number of altered regions on the genome.
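
The segmentation step described in this abstract can be illustrated with a standard dynamic program. The sketch below assumes the mean-only model with homogeneous variance, for which maximizing the Gaussian likelihood amounts to minimizing the residual sum of squares, and it takes the number of segments as given. It does not implement the adaptive criterion the authors propose for choosing that number, and the function name is hypothetical.

```python
def segment_mean(signal, n_segments):
    """Least-squares segmentation of a signal into n_segments constant-mean pieces.

    Returns (starts, cost): the start index of each segment and the total
    residual sum of squares. Assumes len(signal) >= n_segments.
    """
    n = len(signal)
    # Prefix sums of x and x^2 give the cost of any segment in constant time.
    s1, s2 = [0.0], [0.0]
    for x in signal:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(i, j):
        """Residual sum of squares of signal[i:j] around its own mean (j > i)."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: minimal cost of splitting signal[:j] into k segments.
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = best[k - 1][i] + cost(i, j)
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    back[k][j] = i

    # Walk the back-pointers from the end to recover the segment starts.
    starts, j = [], n
    for k in range(n_segments, 0, -1):
        j = back[k][j]
        starts.append(j)
    return starts[::-1], best[n_segments][n]


# Hypothetical noiseless profile with two change points.
profile = [0.0] * 5 + [2.0] * 5 + [-1.0] * 4
starts, rss = segment_mean(profile, 3)
```

The triple loop costs on the order of K n^2 operations for n probes and K segments, which is workable for a single chromosome profile but motivates faster methods for long signals.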

    Multi-CPU and multi-GPU hybrid computations of multi-scale scalar transport

    The aim of this work is to propose a hybrid implementation of a semi-Lagrangian particle method on a multi-CPU and multi-GPU architecture. The applications we have in view deal with the transport of a passive scalar in a turbulent flow. For high Schmidt numbers (ratio of flow viscosity to scalar diffusivity), these problems exhibit two different scales: one related to the flow and the other, a smaller scale, related to the scalar fluctuations. This scale separation motivates the use of hybrid methods where scalar and flow dynamics can be solved with different algorithms and at different resolutions. The coupling between these scales is done through the velocity field.

    Hydrogen Energy Storage: New Techno-Economic Emergence Solution Analysis

    The integration of various renewable energy sources and the liberalization of electricity markets are established facts in modern electrical power systems. The increased share of renewable sources within power systems intensifies supply variability and intermittency. Energy storage is therefore regarded as one of the solutions for stabilizing the supply of electricity, maintaining the generation-demand balance and guaranteeing an uninterrupted supply of energy to users. In the context of sustainable development and the depletion of energy resources, the growth of electricity production from renewable energy is closely linked to the ability to propose new and suitable energy storage solutions. The purpose of this multidisciplinary paper is to highlight the new hydrogen production and storage technology, its efficiency and the impact of the policy context on its development. A comprehensive techno-socio-economic study of long-term hydrogen-based storage systems in electrical networks is presented. The European policy concerning the different energy storage systems and hydrogen production is explicitly discussed. The state of the art of the techno-economic features of hydrogen production and storage is introduced. In a Matlab-Simulink study of a power system with a generator rated at 70 kW, the excess hydrogen produced during periods of high generation or low demand can be sold either directly to the grid owners or as filled hydrogen bottles. The affordable use of hydrogen-based technologies for long-term electricity storage is verified.

    Adaptive Lasso and group-Lasso for functional Poisson regression

    High-dimensional Poisson regression has become a standard framework for the analysis of massive count datasets. In this work we estimate the intensity function of the Poisson regression model by using a dictionary approach, which generalizes the classical basis approach, combined with a Lasso or a group-Lasso procedure. Selection depends on penalty weights that need to be calibrated. Standard methodologies developed in the Gaussian framework cannot be directly applied to Poisson models due to heteroscedasticity. Here we provide data-driven weights for the Lasso and the group-Lasso derived from concentration inequalities adapted to the Poisson case. We show that the associated Lasso and group-Lasso procedures satisfy fast and slow oracle inequalities. Simulations are used to assess the empirical performance of our procedure, and an original application to the analysis of Next Generation Sequencing data is provided.

    A mixture model for random graphs

    The Erdős-Rényi model of a network is simple and possesses many explicit expressions for average and asymptotic properties, but it does not fit real-world networks well. The vertices of these networks are often structured in a priori unknown clusters (functionally related proteins or social communities) with different connectivity properties. We define a generalization of the Erdős-Rényi model called ERMG, for Erdős-Rényi Mixtures for Graphs. This new model is based on mixture distributions. We give some of its properties and an algorithm to estimate its parameters, and we apply this method to uncover the modular structure of a network of enzymatic reactions.
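
To make the generative idea concrete, the sketch below draws an undirected graph from a mixture of this kind: each vertex receives a hidden class, and an edge appears with a probability that depends only on the classes of its two endpoints. This is an illustration under assumptions rather than the authors' code; the function and parameter names are hypothetical, and the estimation algorithm mentioned in the abstract is not shown.

```python
import random


def sample_mixture_graph(n, proportions, connectivity, seed=0):
    """Draw an undirected graph on n vertices from a class-based mixture.

    proportions[q] is the probability that a vertex falls in class q.
    connectivity[q][l] is the probability of an edge between a vertex of
    class q and a vertex of class l (assumed symmetric in q and l).
    Returns (classes, adjacency) with adjacency as a 0/1 matrix.
    """
    rng = random.Random(seed)
    classes = rng.choices(range(len(proportions)), weights=proportions, k=n)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Edges are independent given the hidden classes; no self-loops.
            if rng.random() < connectivity[classes[i]][classes[j]]:
                adjacency[i][j] = adjacency[j][i] = 1
    return classes, adjacency


# Hypothetical extreme case: two classes, edges only within a class.
classes, adjacency = sample_mixture_graph(
    30, proportions=[0.5, 0.5], connectivity=[[1.0, 0.0], [0.0, 1.0]]
)
```

With a single class, the sampler reduces to the ordinary Erdős-Rényi model with one edge probability, which is the sense in which the mixture generalizes it.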