
    How Many Subpopulations is Too Many? Exponential Lower Bounds for Inferring Population Histories

    Reconstruction of population histories is a central problem in population genetics. Existing coalescent-based methods, like the seminal work of Li and Durbin (Nature, 2011), attempt to solve this problem using sequence data but have no rigorous guarantees. Determining the amount of data needed to correctly reconstruct population histories is a major challenge. Using a variety of tools from information theory, the theory of extremal polynomials, and approximation theory, we prove new sharp information-theoretic lower bounds on the problem of reconstructing population structure -- the history of multiple subpopulations that merge, split and change sizes over time. Our lower bounds are exponential in the number of subpopulations, even when reconstructing recent histories. We demonstrate the sharpness of our lower bounds by providing algorithms for distinguishing and learning population histories with matching dependence on the number of subpopulations. Along the way and of independent interest, we essentially determine the optimal number of samples needed to learn an exponential mixture distribution information-theoretically, proving the upper bound by analyzing natural (and efficient) algorithms for this problem. Comment: 38 pages, Appeared in RECOMB 201

    Comparing the effectiveness of recent algorithms to fill and smooth incomplete and noisy time series

    Geophysical time series often feature missing data or data acquired at irregular times. Procedures are needed either to resample these series at systematic time intervals or to generate reasonable estimates at specified times in order to meet specific user requirements or to facilitate subsequent analyses. Interpolation methods have long been used to address this problem, taking into account the fact that available measurements also include errors of measurement or uncertainties. This paper inspects some of the currently used approaches to fill gaps and smooth time series (smoothing splines, Singular Spectrum Analysis and Lomb-Scargle) by comparing their performance in either reconstructing the original record or minimizing the Mean Absolute Error (MAE) between the underlying model and the available data, using both artificially generated series and well-known publicly available records. Some methods make no assumption on the type of variability in the data while others hypothesize the presence of at least some dominant frequencies. It will be seen that each method exhibits advantages and drawbacks, and that the choice of an approach largely depends on the properties of the underlying time series and the objective of the research. JRC.H.5-Land Resources Management
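
The MAE-based comparison described in this abstract can be illustrated with a minimal sketch. It does not implement any of the three methods the paper compares: plain linear interpolation is swapped in as a stand-in gap filler, and the synthetic sine series, the gap positions, and the function names `fill_linear` and `mean_absolute_error` are all assumptions made for illustration.

```python
import math


def fill_linear(times, values):
    """Fill None entries by linear interpolation between the nearest observed
    neighbours; gaps at the edges copy the nearest observed value."""
    known = [(t, v) for t, v in zip(times, values) if v is not None]
    if len(known) < 2:
        raise ValueError("need at least two observed points")
    filled = []
    for t, v in zip(times, values):
        if v is not None:
            filled.append(v)
            continue
        # Closest observed points to the left and right of the gap.
        left = max((p for p in known if p[0] < t), key=lambda p: p[0], default=None)
        right = min((p for p in known if p[0] > t), key=lambda p: p[0], default=None)
        if left is None:
            filled.append(right[1])
        elif right is None:
            filled.append(left[1])
        else:
            w = (t - left[0]) / (right[0] - left[0])
            filled.append(left[1] + w * (right[1] - left[1]))
    return filled


def mean_absolute_error(a, b):
    """Mean Absolute Error between two equally long sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical test series: one sine wave sampled at 20 regular times.
times = [float(i) for i in range(20)]
truth = [math.sin(2 * math.pi * t / 10.0) for t in times]

# Knock out a few samples to imitate missing data.
gaps = {3, 4, 11, 15}
observed = [None if i in gaps else v for i, v in enumerate(truth)]

filled = fill_linear(times, observed)
mae = mean_absolute_error(filled, truth)
print(f"MAE of linear gap filling: {mae:.4f}")
```

Swapping `fill_linear` for another gap filler and comparing the resulting `mae` values on the same series gives the kind of side-by-side score the abstract refers to.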

    Semiparametric Bayesian models for human brain mapping

    Functional magnetic resonance imaging (fMRI) has led to enormous progress in human brain mapping. Adequate analysis of the massive spatiotemporal data sets generated by this imaging technique, combining parametric and non-parametric components, imposes challenging problems in statistical modelling. Complex hierarchical Bayesian models in combination with computer-intensive Markov chain Monte Carlo inference are promising tools. The purpose of this paper is twofold. First, it provides a review of general semiparametric Bayesian models for the analysis of fMRI data. Most approaches focus on important but separate temporal or spatial aspects of the overall problem, or they proceed by stepwise procedures. Therefore, as a second aim, we suggest a complete spatiotemporal model for analysing fMRI data within a unified semiparametric Bayesian framework. An application to data from a visual stimulation experiment illustrates our approach and demonstrates its computational feasibility.

    Machine learning for the performance assessment of high-speed links

    This paper investigates the application of support vector machines to the modeling of high-speed interconnects with largely varying and/or highly uncertain design parameters. The proposed method relies on a robust and well-established mathematical framework, yielding accurate surrogates of complex dynamical systems. An identification procedure based on the observation of a small set of system responses allows generating compact parametric relations, which can be used for design optimization and/or stochastic analysis. The feasibility and strength of the method are demonstrated based on a benchmark function and on the statistical assessment of a realistic printed circuit board interconnect, highlighting the main features and benefits of this technique over state-of-the-art solutions. Emphasis is given to the effects of the initial sample size and of input noise on the model estimation.

    Robust reconstruction of sparse network dynamics

    Reconstruction of the network interaction structure from multivariate time series is an important problem in multiple fields of science. This problem is ill-posed for large networks, leading to the reconstruction of false interactions. We put forward the Ergodic Basis Pursuit (EBP) method that uses the network dynamics' statistical properties to ensure the exact reconstruction of sparse networks when a minimum length of time series is attained. We show that this minimum time series length scales quadratically with the node degree being probed and logarithmically with the network size. Our approach is robust against noise and allows us to treat the noise level as a parameter. We show the reconstruction power of the EBP in experimental multivariate time series from optoelectronic networks. Comment: 48 pages, 6 figures
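
The scaling stated in this abstract can be written schematically as below. The symbols are assumptions introduced here, not the paper's notation: $k$ is the degree of the node being probed, $N$ the network size, $n_{\min}$ the minimum time series length, and $C$ an unspecified constant that the abstract does not give.

```latex
n_{\min}(k, N) \;\approx\; C \, k^{2} \log N
```

Under this form, doubling the node degree $k$ quadruples the required length, whereas squaring the network size $N$ only doubles it, since $\log N^{2} = 2 \log N$.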

    Identifying Biological Network Structure, Predicting Network Behavior, and Classifying Network State With High Dimensional Model Representation (HDMR)

    This work presents an adapted Random Sampling - High Dimensional Model Representation (RS-HDMR) algorithm for synergistically addressing three key problems in network biology: (1) identifying the structure of biological networks from multivariate data, (2) predicting network response under previously unsampled conditions, and (3) inferring experimental perturbations based on the observed network state. RS-HDMR is a multivariate regression method that decomposes network interactions into a hierarchy of non-linear component functions. Sensitivity analysis based on these functions provides a clear physical and statistical interpretation of the underlying network structure. The advantages of RS-HDMR include efficient extraction of nonlinear and cooperative network relationships without resorting to discretization, prediction of network behavior without mechanistic modeling, robustness to data noise, and favorable scalability of the sampling requirement with respect to network size. As a proof-of-principle study, RS-HDMR was applied to experimental data measuring the single-cell response of a protein-protein signaling network to various experimental perturbations. A comparison to network structure identified in the literature and through other inference methods, including Bayesian and mutual-information based algorithms, suggests that RS-HDMR can successfully reveal a network structure with a low false positive rate while still capturing non-linear and cooperative interactions. RS-HDMR identified several higher-order network interactions that correspond to known feedback regulations among multiple network species and that were unidentified by other network inference methods. Furthermore, RS-HDMR has a better ability to predict network response under unsampled conditions in this application than the best statistical inference algorithm presented in the recent DREAM3 signaling-prediction competition. 
    RS-HDMR can discern and predict differences in network state that arise from sources ranging from intrinsic cell-cell variability to altered experimental conditions, such as when drug perturbations are introduced. This ability ultimately allows RS-HDMR to accurately classify the experimental conditions of a given sample based on its observed network state.
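
The "hierarchy of non-linear component functions" mentioned in this abstract presumably refers to the standard HDMR expansion, sketched below in its general form. The notation ($f$ for the modelled output, $x_1,\dots,x_n$ for the inputs) is an assumption and is not taken from the abstract.

```latex
f(x_1, \dots, x_n) \;=\; f_0 \;+\; \sum_{i=1}^{n} f_i(x_i) \;+\; \sum_{1 \le i < j \le n} f_{ij}(x_i, x_j) \;+\; \cdots \;+\; f_{12 \cdots n}(x_1, \dots, x_n)
```

Here $f_0$ is a constant, each $f_i$ captures the independent effect of one input, and each $f_{ij}$ captures a cooperative (second-order) effect of a pair of inputs.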