    Data-driven coarse graining in action: Modeling and prediction of complex systems

    In many physical, technological, social, and economic applications, one is commonly faced with the task of estimating statistical properties, such as mean first passage times of a continuous temporal process, from empirical data (experimental observations). Typically, however, an accurate and reliable estimation of such properties directly from the data alone is not possible, as the time series is often too short or the particular phenomenon of interest is only rarely observed. We propose here a theoretical-computational framework which provides a systematic and rational estimation of statistical quantities of a given temporal process, such as waiting times between subsequent bursts of activity in intermittent signals. Our framework is illustrated with applications to real-world data sets, ranging from marine biology to paleoclimatic data.
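
    A minimal sketch of one ingredient of such a coarse-graining approach (not the paper's specific framework): discretize the signal into a small number of states, fit a Markov transition matrix by counting, and compute mean first passage times from the fitted model rather than from rare direct observations. The two-state thresholding, the threshold value, and the toy signal below are illustrative assumptions.

```python
import numpy as np

def fit_transition_matrix(states, n_states):
    """Count-based estimate of the transition matrix of a discretized time series."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.where(row_sums == 0, 1, row_sums)

def mean_first_passage_time(P, target):
    """Mean first passage times into `target`: solve m_i = 1 + sum_{j != target} P[i, j] m_j."""
    n = P.shape[0]
    keep = [i for i in range(n) if i != target]
    A = np.eye(len(keep)) - P[np.ix_(keep, keep)]
    m = np.linalg.solve(A, np.ones(len(keep)))
    out = np.zeros(n)
    out[keep] = m
    return out

# Toy usage (illustrative only): threshold a noisy signal into quiet/burst states
# and estimate the mean waiting time, in samples, before the next burst.
rng = np.random.default_rng(0)
signal = 0.05 * rng.normal(size=10_000).cumsum() + rng.normal(size=10_000)
states = (signal > 1.5).astype(int)                 # crude 2-state coarse graining
P = fit_transition_matrix(states, n_states=2)
print(mean_first_passage_time(P, target=1)[0])      # expected time from "quiet" to "burst"
```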

    Efficient Density Estimation via Piecewise Polynomial Approximation

    We give a highly efficient "semi-agnostic" algorithm for learning univariate probability distributions that are well approximated by piecewise polynomial density functions. Let $p$ be an arbitrary distribution over an interval $I$ which is $\tau$-close (in total variation distance) to an unknown probability distribution $q$ that is defined by an unknown partition of $I$ into $t$ intervals and $t$ unknown degree-$d$ polynomials specifying $q$ over each of the intervals. We give an algorithm that draws $\tilde{O}(t(d+1)/\epsilon^2)$ samples from $p$, runs in time $\mathrm{poly}(t, d, 1/\epsilon)$, and with high probability outputs a piecewise polynomial hypothesis distribution $h$ that is $(O(\tau) + \epsilon)$-close (in total variation distance) to $p$. This sample complexity is essentially optimal; we show that even for $\tau = 0$, any algorithm that learns an unknown $t$-piecewise degree-$d$ probability distribution over $I$ to accuracy $\epsilon$ must use $\Omega\left(\frac{t(d+1)}{\mathrm{poly}(1 + \log(d+1))} \cdot \frac{1}{\epsilon^2}\right)$ samples from the distribution, regardless of its running time. Our algorithm combines tools from approximation theory, uniform convergence, linear programming, and dynamic programming. We apply this general algorithm to obtain a wide range of results for many natural problems in density estimation over both continuous and discrete domains. These include state-of-the-art results for learning mixtures of log-concave distributions; mixtures of $t$-modal distributions; mixtures of Monotone Hazard Rate distributions; mixtures of Poisson Binomial Distributions; mixtures of Gaussians; and mixtures of $k$-monotone densities. Our general technique yields computationally efficient algorithms for all these problems, in many cases with provably optimal sample complexities (up to logarithmic factors) in all parameters.
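
    As an illustration of the hypothesis class only (not the paper's algorithm, which selects the partition adaptively and carries the stated guarantees), the hedged sketch below fits a degree-$d$ polynomial to the empirical histogram on each of $t$ fixed, equal-width intervals by least squares and then renormalizes; the fixed partition, bin counts, and toy mixture data are assumptions made for brevity.

```python
import numpy as np

def piecewise_poly_density(samples, t=4, d=3, bins_per_piece=20):
    """Fit a t-piece, degree-d piecewise polynomial to the empirical histogram.

    Illustrates the hypothesis class only; the partition here is fixed and
    equal-width, whereas the paper chooses it adaptively with guarantees.
    """
    lo, hi = samples.min(), samples.max()
    edges = np.linspace(lo, hi, t + 1)
    pieces = []
    for a, b in zip(edges[:-1], edges[1:]):
        hist, bin_edges = np.histogram(samples, bins=bins_per_piece, range=(a, b))
        width = bin_edges[1] - bin_edges[0]
        centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
        emp = hist / (len(samples) * width)          # empirical density on this piece
        pieces.append((a, b, np.poly1d(np.polyfit(centers, emp, deg=d))))

    def raw(x):
        y = np.zeros_like(x, dtype=float)
        for a, b, poly in pieces:
            mask = (x >= a) & (x <= b)
            y[mask] = np.clip(poly(x[mask]), 0.0, None)   # clip negative fit values
        return y

    grid = np.linspace(lo, hi, 2000)
    z = np.trapz(raw(grid), grid)                    # renormalize after clipping
    return lambda x: raw(np.asarray(x, dtype=float)) / z

# Toy usage on a two-component Gaussian mixture
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2, 0.5, 5000), rng.normal(1, 1.0, 5000)])
density = piecewise_poly_density(data, t=6, d=3)
print(density([-2.0, 0.0, 1.0]))
```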

    Efficiently Learning Structured Distributions from Untrusted Batches

    We study the problem, introduced by Qiao and Valiant, of learning from untrusted batches. Here, we assume $m$ users, all of whom have samples from some underlying distribution $p$ over $\{1, \ldots, n\}$. Each user sends a batch of $k$ i.i.d. samples from this distribution; however, an $\epsilon$-fraction of users are untrustworthy and can send adversarially chosen responses. The goal is then to learn $p$ in total variation distance. When $k = 1$ this is the standard robust univariate density estimation setting, and it is well understood that $\Omega(\epsilon)$ error is unavoidable. Surprisingly, Qiao and Valiant gave an estimator which improves upon this rate when $k$ is large. Unfortunately, their algorithms run in time exponential in either $n$ or $k$. We first give a sequence of polynomial-time algorithms whose estimation error approaches the information-theoretically optimal bound for this problem. Our approach is based on recent algorithms derived from the sum-of-squares hierarchy, in the context of high-dimensional robust estimation. We show that algorithms for learning from untrusted batches can also be cast in this framework, but by working with a more complicated set of test functions. It turns out this abstraction is quite powerful and can be generalized to incorporate additional problem-specific constraints. Our second and main result is to show that this technology can be leveraged to build in prior knowledge about the shape of the distribution. Crucially, this allows us to reduce the sample complexity of learning from untrusted batches to polylogarithmic in $n$ for most natural classes of distributions, which is important in many applications. To do so, we demonstrate that these sum-of-squares algorithms for robust mean estimation can be made to handle complex combinatorial constraints (e.g., those arising from VC theory), which may be of independent technical interest.
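
    The sum-of-squares machinery in the paper does not fit in a short snippet; the hedged sketch below only sets up the untrusted-batches model and a naive baseline, a coordinate-wise median of per-batch empirical distributions, which tolerates corrupted batches but does not attain the improved error rates the paper establishes. The crude adversary and all parameter values are illustrative assumptions.

```python
import numpy as np

def simulate_batches(p, m, k, eps, rng):
    """m batches of k i.i.d. samples from p over {0, ..., n-1}; an eps-fraction are corrupted."""
    n = len(p)
    batches = [rng.choice(n, size=k, p=p) for _ in range(m)]
    for i in rng.choice(m, size=int(eps * m), replace=False):
        batches[i] = np.full(k, n - 1)               # crude adversary: put all mass on one point
    return batches

def median_of_batches(batches, n):
    """Coordinate-wise median of per-batch empirical distributions, renormalized.

    A naive robust baseline only: corrupted batches cannot drag it far, but it
    does not achieve the improved rates of Qiao-Valiant or the sum-of-squares
    algorithms discussed in the paper.
    """
    emp = np.stack([np.bincount(b, minlength=n) / len(b) for b in batches])
    med = np.median(emp, axis=0)
    return med / med.sum()

rng = np.random.default_rng(2)
n, m, k, eps = 20, 500, 50, 0.1
p_true = np.ones(n) / n                              # unknown distribution (uniform toy case)
p_hat = median_of_batches(simulate_batches(p_true, m, k, eps, rng), n)
print(0.5 * np.abs(p_hat - p_true).sum())            # total variation error
```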

    PAC learning using Nadaraya-Watson estimator based on orthonormal systems

    Regression or function classes of Euclidean type with compact support and certain smoothness properties are shown to be PAC learnable by the Nadaraya-Watson estimator based on complete orthonormal systems. While requiring more smoothness properties than typical PAC formulations, this estimator is computationally efficient, easy to implement, and known to perform well in a number of practical applications. The sample sizes necessary for PAC learning of regressions or functions under sup-norm cost are derived for a general orthonormal system. The result covers the widely used estimators based on Haar wavelets, trigonometric functions, and Daubechies wavelets.
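
    A hedged sketch of the estimator class being analyzed (not the paper's PAC sample-size bounds): a Nadaraya-Watson-type estimate whose kernel is a truncated series $K_N(x, u) = \sum_j \varphi_j(x)\varphi_j(u)$ built from the trigonometric orthonormal system on $[0, 1]$. The truncation level and the toy regression function below are assumptions.

```python
import numpy as np

def trig_basis(x, n_terms):
    """Trigonometric orthonormal system on [0, 1]: 1, sqrt(2)cos(2*pi*j*x), sqrt(2)sin(2*pi*j*x)."""
    cols = [np.ones_like(x)]
    for j in range(1, n_terms + 1):
        cols.append(np.sqrt(2.0) * np.cos(2 * np.pi * j * x))
        cols.append(np.sqrt(2.0) * np.sin(2 * np.pi * j * x))
    return np.column_stack(cols)

def nadaraya_watson_series(x_train, y_train, x_eval, n_terms=6):
    """Nadaraya-Watson estimate with a truncated orthonormal-series kernel.

    K_N(x, u) = sum_j phi_j(x) phi_j(u); the estimate at x is a ratio of
    K-weighted sums of the responses. A sketch of the estimator class only,
    not the paper's PAC analysis.
    """
    B_train = trig_basis(x_train, n_terms)           # shape (n, 2*n_terms + 1)
    B_eval = trig_basis(x_eval, n_terms)             # shape (m, 2*n_terms + 1)
    K = B_eval @ B_train.T                           # K[i, l] = K_N(x_eval[i], x_train[l])
    num = K @ y_train
    den = K.sum(axis=1)
    den = np.where(np.abs(den) < 1e-12, 1e-12, den)  # guard against near-zero denominators
    return num / den

# Toy usage: recover a smooth periodic regression function from noisy samples
rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, 500)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=500)
print(nadaraya_watson_series(x, y, np.linspace(0.0, 1.0, 5)))
```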

    Global Chronic Total Occlusion Crossing Algorithm: JACC State-of-the-Art Review

    The authors developed a global chronic total occlusion crossing algorithm following 10 steps: 1) dual angiography; 2) careful angiographic review focusing on proximal cap morphology, occlusion segment, distal vessel quality, and collateral circulation; 3) approaching proximal cap ambiguity using intravascular ultrasound, retrograde, and move-the-cap techniques; 4) approaching poor distal vessel quality using the retrograde approach and bifurcation at the distal cap by use of a dual-lumen catheter and intravascular ultrasound; 5) feasibility of retrograde crossing through grafts and septal and epicardial collateral vessels; 6) antegrade wiring strategies; 7) retrograde approach; 8) changing strategy when failing to achieve progress; 9) considering an investment procedure if crossing attempts fail; and 10) stopping when reaching a high radiation or contrast dose, or in case of long procedural time, occurrence of a serious complication, operator and patient fatigue, or lack of expertise or equipment. This algorithm can improve outcomes and expand discussion, research, and collaboration.
