Search CORE

3,035 research outputs found

Rediscovery of Good-Turing estimators via Bayesian nonparametrics

Author: Favaro Stefano
Nipoti Bernardo
Teh Yee Whye
Publication date: 16/06/2015

The problem of estimating discovery probabilities originated in the context of statistical ecology, and in recent years it has become popular due to its frequent appearance in challenging applications arising in genetics, bioinformatics, linguistics, design of experiments, machine learning, etc. A full range of statistical approaches, parametric and nonparametric as well as frequentist and Bayesian, has been proposed for estimating discovery probabilities. In this paper we investigate the relationships between the celebrated Good-Turing approach, a frequentist nonparametric approach developed in the 1940s, and a Bayesian nonparametric approach recently introduced in the literature. Specifically, under the assumption of a two-parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for a large sample size, to suitably smoothed Good-Turing estimators. As a by-product of this result, we introduce and investigate a methodology for deriving exact and asymptotic credible intervals to be associated with the Bayesian nonparametric estimators of discovery probabilities. The proposed methodology is illustrated through a comprehensive simulation study and the analysis of Expressed Sequence Tags data generated by sequencing a benchmark complementary DNA library.
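To make the two kinds of estimator compared in this abstract concrete, here is a minimal Python sketch that is not taken from the paper. It assumes the classic unsmoothed Good-Turing formula (l + 1) m_{l+1} / n and the standard two-parameter Poisson-Dirichlet predictive rule, where n is the sample size, k the number of distinct species and m_l the number of species observed exactly l times. The parameters (sigma, theta) are supplied by the user rather than estimated, the paper's smoothing step and credible intervals are not reproduced, and the function names are hypothetical.

```python
from collections import Counter


def frequency_counts(sample):
    """Return m, where m[l] is the number of distinct species seen exactly l times."""
    species_counts = Counter(sample)
    return Counter(species_counts.values())


def good_turing(sample, l):
    """Classic (unsmoothed) Good-Turing estimate of the probability that the next
    observation is a species seen exactly l times; l = 0 means a new species."""
    n = len(sample)
    m = frequency_counts(sample)
    return (l + 1) * m[l + 1] / n  # Counter returns 0 for missing frequencies


def pitman_yor(sample, l, sigma, theta):
    """Same discovery probability under a two-parameter Poisson-Dirichlet prior
    with given sigma in [0, 1) and theta > -sigma (assumed known here)."""
    n = len(sample)
    m = frequency_counts(sample)
    k = sum(m.values())  # number of distinct species observed
    if l == 0:
        return (theta + sigma * k) / (theta + n)
    return (l - sigma) * m[l] / (theta + n)


sample = list("aabbbcde")  # species counts: a=2, b=3, c=d=e=1
print(good_turing(sample, 0), pitman_yor(sample, 0, sigma=0.5, theta=1.0))
```

With sigma = 0 the prior reduces to a Dirichlet process and the new-species probability becomes theta / (theta + n). Unlike the raw Good-Turing values, the Poisson-Dirichlet probabilities over l = 0, 1, 2, ... always sum to one.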

arXiv.org e-Print Archive

CiteSeerX

Institutional Research Information System University of Turin

A new specification of generalized linear models for categorical data

Author: Guédon Yann
Peyhardi Jean
Trottier Catherine
Publication date: 01/01/2014

Regression models for categorical data are specified in heterogeneous ways. We propose to unify the specification of such models. This allows us to define the family of reference models for nominal data. We introduce the notion of reversible models for ordinal data that distinguishes adjacent and cumulative models from sequential ones. The combination of the proposed specification with the definition of reference and reversible models and various invariance properties leads to a new view of regression models for categorical data. Comment: 31 pages, 13 figures
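The reference, adjacent, cumulative and sequential models mentioned above differ in which ratio of category probabilities is linked to the linear predictor through a cdf F. The sketch below assumes the usual definitions for J categories with probabilities pi_1, ..., pi_J and r_j = F(eta_j): reference pi_j / (pi_j + pi_J), adjacent pi_j / (pi_j + pi_{j+1}), cumulative pi_1 + ... + pi_j, sequential pi_j / (pi_j + ... + pi_J). It only maps given predictors eta_1, ..., eta_{J-1} to probabilities; the design matrix, model fitting and the name category_probs are illustrative assumptions, not the paper's code.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def category_probs(eta, ratio, F=logistic):
    """Map J-1 linear predictors to J category probabilities for a given ratio."""
    r = [F(e) for e in eta]  # r_j = F(eta_j), j = 1..J-1
    J = len(r) + 1
    if ratio == "cumulative":
        # r_j = pi_1 + ... + pi_j; requires non-decreasing eta
        cum = [0.0] + r + [1.0]
        return [cum[j + 1] - cum[j] for j in range(J)]
    if ratio == "sequential":
        # r_j = pi_j / (pi_j + ... + pi_J): "stop at j, given Y >= j"
        probs, remaining = [], 1.0
        for rj in r:
            probs.append(remaining * rj)
            remaining *= 1.0 - rj
        return probs + [remaining]
    if ratio == "adjacent":
        # r_j = pi_j / (pi_j + pi_{j+1}), so pi_{j+1} = pi_j * (1 - r_j) / r_j
        w = [1.0]
        for rj in r:
            w.append(w[-1] * (1.0 - rj) / rj)
    elif ratio == "reference":
        # r_j = pi_j / (pi_j + pi_J), so pi_j = pi_J * r_j / (1 - r_j)
        w = [rj / (1.0 - rj) for rj in r] + [1.0]
    else:
        raise ValueError(f"unknown ratio: {ratio}")
    total = sum(w)  # normalize unnormalized weights to probabilities
    return [x / total for x in w]


for name in ("reference", "adjacent", "cumulative", "sequential"):
    print(name, category_probs([-0.5, 0.3, 1.2], name))
```

With the logistic cdf, the reference ratio recovers the multinomial logit model, since r_j / (1 - r_j) = exp(eta_j).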

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Agritrop

HAL-CIRAD

struc2vec: Learning Node Representations from Structural Identity

Author: Mikolov Tomas
Narayanan A
Pizarro Narciso
Salvador S
Publication venue: Association for Computing Machinery (ACM)
Publication date: 03/07/2017

Structural identity is a concept of symmetry in which network nodes are identified according to the network structure and their relationship to other nodes. Structural identity has been studied in theory and practice over the past decades, but only recently has it been addressed with representation learning techniques. This work presents struc2vec, a novel and flexible framework for learning latent representations for the structural identity of nodes. struc2vec uses a hierarchy to measure node similarity at different scales, and constructs a multilayer graph to encode structural similarities and generate structural context for nodes. Numerical experiments indicate that state-of-the-art techniques for learning node representations fail to capture stronger notions of structural identity, while struc2vec performs substantially better at this task, as it overcomes limitations of prior approaches. As a consequence, struc2vec also improves performance on classification tasks that depend more on structural identity. Comment: 10 pages, KDD2017, Research Track
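A key ingredient of the hierarchy is a structural distance that compares the degrees found in each node's k-hop neighborhood. As we understand the KDD 2017 paper, f_k(u, v) adds, for each hop distance 0..k, a dynamic time warping (DTW) distance between the sorted degree sequences of the nodes at exactly that distance, using the element cost max(a, b) / min(a, b) - 1. The sketch below implements only this distance under those assumptions (no multilayer graph, random walks or skip-gram training). Returning infinity when a ring is empty and requiring every node to have degree at least 1 are our own choices; the function names are hypothetical.

```python
from collections import deque


def rings(adj, root, max_k):
    """R_0..R_max_k: nodes at exactly hop distance k from root (BFS)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == max_k:
            continue  # do not expand beyond the last ring
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    out = [[] for _ in range(max_k + 1)]
    for node, d in dist.items():
        out[d].append(node)
    return out


def dtw(a, b):
    """DTW distance between degree sequences; cost max/min - 1 (degrees >= 1)."""
    inf = float("inf")
    D = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    D[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = max(a[i - 1], b[j - 1]) / min(a[i - 1], b[j - 1]) - 1.0
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[len(a)][len(b)]


def structural_distance(adj, u, v, k):
    """f_k(u, v): sum over hops 0..k of DTW distances of sorted ring degrees."""
    total = 0.0
    for ring_u, ring_v in zip(rings(adj, u, k), rings(adj, v, k)):
        if not ring_u or not ring_v:
            return float("inf")  # assumption: f_k undefined, treat as infinitely far
        su = sorted(len(adj[x]) for x in ring_u)
        sv = sorted(len(adj[x]) for x in ring_v)
        total += dtw(su, sv)
    return total


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(structural_distance(path, 0, 4, 2), structural_distance(path, 0, 2, 1))
```

On the path 0-1-2-3-4, the end nodes 0 and 4 are structurally identical, so their distance is 0, while nodes 0 and 2 already differ at hop 0 (degrees 1 and 2).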

arXiv.org e-Print Archive

Crossref

Increasing Interdependence of Multivariate Distributions

Author: Bruno Strulovici
Margaret Meyer

Orderings of interdependence among random variables are useful in many economic contexts, for example, in assessing ex post inequality under uncertainty; in comparing multidimensional inequality; in valuing portfolios of assets or insurance policies; and in assessing systemic risk. We explore five orderings of interdependence for multivariate distributions: greater weak association, the supermodular ordering, the convex-modular ordering, the dispersion ordering, and the concordance ordering. For two dimensions, all five orderings are equivalent, whereas for an arbitrary number of dimensions n > 2, the five orderings are strictly ranked. For the special case of binary random variables, we establish some equivalences among the orderings.
Keywords: dependence ordering; stochastic orders; supermodularity; weak association; concordance
JEL Classification Numbers: D63, D81, G11, G22
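For binary random variables these orderings can be checked by enumeration. The sketch below covers only the concordance ordering, assuming its standard definition: G is more concordant than F when both its lower-orthant probabilities P(X <= x) and its upper-orthant probabilities P(X >= x) are at least as large as those of F at every point. For distributions on {0,1}^n it suffices to check the 2^n support points, and the two conditions together force equal one-dimensional marginals. Distributions are dictionaries mapping binary tuples to probabilities; the function names are hypothetical and the other four orderings are not implemented.

```python
from itertools import product


def lower_orthant(pmf, x):
    """P(X <= x) componentwise, for a pmf over binary tuples."""
    return sum(p for v, p in pmf.items() if all(vi <= xi for vi, xi in zip(v, x)))


def upper_orthant(pmf, x):
    """P(X >= x) componentwise, for a pmf over binary tuples."""
    return sum(p for v, p in pmf.items() if all(vi >= xi for vi, xi in zip(v, x)))


def more_concordant(pmf_g, pmf_f, n, tol=1e-12):
    """True if G dominates F in the concordance ordering on {0,1}^n."""
    points = list(product((0, 1), repeat=n))
    return all(
        lower_orthant(pmf_g, x) >= lower_orthant(pmf_f, x) - tol
        and upper_orthant(pmf_g, x) >= upper_orthant(pmf_f, x) - tol
        for x in points
    )


independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
comonotone = {(0, 0): 0.5, (1, 1): 0.5}  # same uniform marginals
print(more_concordant(comonotone, independent, 2))
```

In two dimensions the comonotone pair is more concordant than the independent pair with the same marginals, but not the reverse.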

Research Papers in Economics

Accelerated hardware video object segmentation: From foreground detection to connected components labelling

Author: Andrew Hunter
Batlle
Boykov
Cucchiara
Haralick
Hongying Meng
Kofi Appiah
Patrick Dickinson
Rosenfeld
Stauffer
Zhou
Zhou
Publication venue: Elsevier BV
Publication date: 01/01/2010

This is the preprint version of the article. Copyright © 2010 Elsevier.

This paper demonstrates the use of a single-chip FPGA for the segmentation of moving objects in a video sequence. The system maintains highly accurate background models, and integrates the detection of foreground pixels with the labelling of objects using a connected components algorithm. The background models are based on 24-bit RGB values and 8-bit gray scale intensity values. A multimodal background differencing algorithm is presented, using a single FPGA chip and four blocks of RAM. The real-time connected component labelling algorithm, also designed for FPGA implementation, run-length encodes the output of the background subtraction, and performs connected component analysis on this representation. The run-length encoding, together with other parts of the algorithm, is performed in parallel; sequential operations are minimized because the number of runs is typically smaller than the number of pixels. The two algorithms are pipelined together for maximum efficiency.
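The labelling stage described above works on run-length encoded rows rather than on individual pixels. The following is a software sketch of that general idea in Python, not the authors' FPGA design: each row of a binary foreground mask is split into runs of 1s, runs that touch a run in the previous row are merged with a union-find structure, and a final pass assigns compact labels. The 4/8-connectivity switch and all function names are our assumptions.

```python
def run_length_encode(row):
    """Return (start, end) column pairs, end inclusive, for each run of 1s."""
    runs, start = [], None
    for col, px in enumerate(row):
        if px and start is None:
            start = col
        elif not px and start is not None:
            runs.append((start, col - 1))
            start = None
    if start is not None:
        runs.append((start, len(row) - 1))
    return runs


def label_runs(image, connectivity=8):
    """Label connected components over run-length encoded rows (union-find).
    Returns (runs, labels): labels[r][i] is the component of runs[r][i]."""
    parent = []

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    slack = 1 if connectivity == 8 else 0  # diagonal contact counts for 8-connectivity
    runs = [run_length_encode(row) for row in image]
    ids = []
    for r, row_runs in enumerate(runs):
        row_ids = []
        for start, end in row_runs:
            rid = len(parent)
            parent.append(rid)  # provisional label for this run
            if r > 0:
                for (ps, pe), pid in zip(runs[r - 1], ids[r - 1]):
                    if ps <= end + slack and pe >= start - slack:
                        union(rid, pid)  # runs overlap across rows: same object
            row_ids.append(rid)
        ids.append(row_ids)
    # Second pass: resolve each run to its root and compact labels to 1..K.
    compact = {}
    labels = [[compact.setdefault(find(i), len(compact) + 1) for i in row_ids]
              for row_ids in ids]
    return runs, labels


mask = [[1, 1, 0, 0, 1],
        [0, 1, 0, 0, 1],
        [0, 0, 0, 1, 0],
        [1, 0, 0, 0, 0]]
print(label_runs(mask, connectivity=8)[1])
```

Because neighboring runs are found by comparing run endpoints, the work per row depends on the number of runs rather than the number of pixels, which is the property the abstract relies on.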

University of Lincoln Institutional Repository

Crossref

Nottingham Trent Institutional Repository (IRep)

Sheffield Hallam University Research Archive

Brunel University Research Archive