
    Robust Inference of Trees

    This paper is concerned with the reliable inference of optimal tree-approximations to the dependency structure of an unknown distribution generating data. The traditional approach to the problem measures the dependency strength between random variables by the index called mutual information. In this paper reliability is achieved by Walley's imprecise Dirichlet model, which generalizes Bayesian learning with Dirichlet priors. Adopting the imprecise Dirichlet model results in posterior interval expectation for mutual information, and in a set of plausible trees consistent with the data. Reliable inference about the actual tree is achieved by focusing on the substructure common to all the plausible trees. We develop an exact algorithm that infers the substructure in time O(m^4), m being the number of random variables. The new algorithm is applied to a set of data sampled from a known distribution. The method is shown to reliably infer edges of the actual tree even when the data are very scarce, unlike the traditional approach. Finally, we provide lower and upper credibility limits for mutual information under the imprecise Dirichlet model. These enable the previous developments to be extended to a full inferential method for trees. Comment: 26 pages, 7 figure
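
    The "traditional approach" mentioned in this abstract can be illustrated with a minimal sketch: estimate pairwise mutual information from empirical frequencies, then keep a maximum-weight spanning tree (the classical Chow-Liu construction). This is not the imprecise-Dirichlet algorithm of the paper; the function names `mutual_information` and `chow_liu_tree` and the column-wise data layout are assumptions made for illustration only.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Plug-in (empirical) mutual information, in nats, of two discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def chow_liu_tree(columns):
    """Maximum-weight spanning tree over variables, weighted by pairwise MI.

    `columns` is a list of equally long sequences, one per variable
    (hypothetical layout). Uses Kruskal's algorithm with a small union-find.
    """
    m = len(columns)
    edges = sorted(
        ((mutual_information(columns[i], columns[j]), i, j)
         for i, j in combinations(range(m), 2)),
        reverse=True,
    )
    parent = list(range(m))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding the edge does not create a cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

    With scarce data the plug-in estimates above are noisy, so the selected edges can change from sample to sample; that instability is the problem the abstract says the interval-valued approach addresses.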

    Functional Bipartite Ranking: a Wavelet-Based Filtering Approach

    It is the main goal of this article to address the bipartite ranking issue from the perspective of functional data analysis (FDA). Given a training set of independent realizations of a (possibly sampled) second-order random function with a (locally) smooth autocorrelation structure and to which a binary label is randomly assigned, the objective is to learn a scoring function s with optimal ROC curve. Based on linear/nonlinear wavelet-based approximations, it is shown how to select compact finite dimensional representations of the input curves adaptively, in order to build accurate ranking rules, using recent advances in the ranking problem for multivariate data with binary feedback. Beyond theoretical considerations, the performance of the learning methods for functional bipartite ranking proposed in this paper is illustrated by numerical experiments.
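
    As a small illustration of what it means to evaluate a scoring function s in bipartite ranking, the sketch below computes the empirical area under the ROC curve as the fraction of positive-negative pairs that s orders correctly (ties counted as one half). It says nothing about the wavelet-based representation in the paper; the name `empirical_auc` and the 0/1 label encoding are assumptions for illustration.

```python
def empirical_auc(scores, labels):
    """Empirical AUC: share of (positive, negative) pairs ranked correctly.

    `scores` are the values s(x) assigned to each observation and `labels`
    are 1 for positives and 0 for negatives (assumed encoding).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))
```

    The double loop is quadratic in the sample size; it is kept here because it mirrors the pairwise definition directly.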

    A survey of max-type recursive distributional equations

    In certain problems in a variety of applied probability settings (from probabilistic analysis of algorithms to statistical physics), the central requirement is to solve a recursive distributional equation of the form X =^d g((\xi_i,X_i),i\geq 1). Here (\xi_i) and g(\cdot) are given and the X_i are independent copies of the unknown distribution X. We survey this area, emphasizing examples where the function g(\cdot) is essentially a "maximum" or "minimum" function. We draw attention to the theoretical question of endogeny: in the associated recursive tree process X_i, are the X_i measurable functions of the innovations process (\xi_i)? Comment: Published at http://dx.doi.org/10.1214/105051605000000142 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org

    copulaedas: An R Package for Estimation of Distribution Algorithms Based on Copulas

    The use of copula-based models in EDAs (estimation of distribution algorithms) is currently an active area of research. In this context, the copulaedas package for R provides a platform where EDAs based on copulas can be implemented and studied. The package offers complete implementations of various EDAs based on copulas and vines, a group of well-known optimization problems, and utility functions to study the performance of the algorithms. Newly developed EDAs can be easily integrated into the package by extending an S4 class with generic functions for their main components. This paper presents copulaedas by providing an overview of EDAs based on copulas, a description of the implementation of the package, and an illustration of its use through examples. The examples include running the EDAs defined in the package, implementing new algorithms, and performing an empirical study to compare the behavior of different algorithms on benchmark functions and a real-world problem

    Tools to Characterize Point Patterns: dbmss for R

    The dbmss package for R provides an easy-to-use toolbox to characterize the spatial structure of point patterns. Our contribution presents the state of the art of distance-based methods employed in economic geography and also used in ecology. Topographic functions such as Ripley's K, absolute functions such as Duranton and Overman's Kd and relative functions such as Marcon and Puech's M are implemented. Their confidence envelopes (including global ones) and tests against counterfactuals are included in the package.
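
    To make the idea of a distance-based function concrete, here is a minimal sketch of a naive estimator of Ripley's K: the window area times the number of ordered point pairs within distance r, divided by n(n-1). It applies no edge correction and is not the dbmss API (dbmss is an R package); the function name `ripley_k` and its arguments are hypothetical.

```python
import math


def ripley_k(points, r, area):
    """Naive Ripley's K at distance r, without edge correction.

    `points` is a list of (x, y) tuples and `area` is the area of the
    observation window (both assumed inputs for this sketch).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    count = 0
    for i in range(n):
        xi, yi = points[i]
        for j in range(n):
            if i == j:
                continue
            xj, yj = points[j]
            # count ordered pairs (i, j) lying within distance r
            if math.hypot(xi - xj, yi - yj) <= r:
                count += 1
    return area * count / (n * (n - 1))
```

    For a completely random (homogeneous Poisson) pattern the theoretical value is pi * r^2, so estimates well above that suggest clustering at scale r and estimates well below suggest dispersion.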

    Graph analysis of functional brain networks: practical issues in translational neuroscience

    The brain can be regarded as a network: a connected system where nodes, or units, represent different specialized regions and links, or connections, represent communication pathways. From a functional perspective, communication is coded by temporal dependence between the activities of different brain areas. In the last decade, the abstract representation of the brain as a graph has made it possible to visualize functional brain networks and describe their non-trivial topological properties in a compact and objective way. Nowadays, the use of graph analysis in translational neuroscience has become essential to quantify brain dysfunctions in terms of aberrant reconfiguration of functional brain networks. Despite its evident impact, graph analysis of functional brain networks is not a simple toolbox that can be blindly applied to brain signals. On the one hand, it requires know-how in all the methodological steps of the processing pipeline that manipulates the input brain signals and extracts the functional network properties. On the other hand, knowledge of the neural phenomenon under study is required to perform physiologically relevant analyses. The aim of this review is to provide practical indications to make sense of brain network analysis and to counteract counterproductive attitudes.

    Analytic urns

    This article describes a purely analytic approach to urn models of the generalized or extended Pólya-Eggenberger type, in the case of two types of balls and constant "balance," that is, constant row sum. The treatment starts from a quasilinear first-order partial differential equation associated with a combinatorial renormalization of the model and bases itself on elementary conformal mapping arguments coupled with singularity analysis techniques. Probabilistic consequences in the case of "subtractive" urns are new representations for the probability distribution of the urn's composition at any time n, structural information on the shape of moments of all orders, estimates of the speed of convergence to the Gaussian limit and an explicit determination of the associated large deviation function. In the general case, analytic solutions involve Abelian integrals over the Fermat curve x^h+y^h=1. Several urn models, including a classical one associated with balanced trees (2-3 trees and fringe-balanced search trees) and related to a previous study of Panholzer and Prodinger, as well as all urns of balance 1 or 2 and a sporadic urn of balance 3, are shown to admit of explicit representations in terms of Weierstraß elliptic functions: these elliptic models appear precisely to correspond to regular tessellations of the Euclidean plane. Comment: Published at http://dx.doi.org/10.1214/009117905000000026 in the Annals of Probability (http://www.imstat.org/aop/) by the Institute of Mathematical Statistics (http://www.imstat.org