Rediscovery of Good-Turing estimators via Bayesian nonparametrics
The problem of estimating discovery probabilities originated in the context
of statistical ecology, and in recent years it has become popular due to its
frequent appearance in challenging applications arising in genetics,
bioinformatics, linguistics, design of experiments, machine learning, etc. A
full range of statistical approaches, parametric and nonparametric as well as
frequentist and Bayesian, has been proposed for estimating discovery
probabilities. In this paper we investigate the relationships between the
celebrated Good-Turing approach, which is a frequentist nonparametric approach
developed in the 1940s, and a Bayesian nonparametric approach recently
introduced in the literature. Specifically, under the assumption of a
two-parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric
estimators of discovery probabilities are asymptotically equivalent, for a
large sample size, to suitably smoothed Good-Turing estimators. As a by-product
of this result, we introduce and investigate a methodology for deriving exact
and asymptotic credible intervals to be associated with the Bayesian
nonparametric estimators of discovery probabilities. The proposed methodology
is illustrated through a comprehensive simulation study and the analysis of
Expressed Sequence Tags data generated by sequencing a benchmark complementary
DNA library.
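The two kinds of estimator compared in this abstract can be illustrated with a minimal Python sketch. It assumes the textbook forms of both: the Good-Turing estimate (l + 1) * m_{l+1} / n for the total probability of species seen exactly l times, where m_j is the number of species observed j times, and the two-parameter Poisson-Dirichlet (Pitman-Yor) predictive probability (theta + k * sigma) / (theta + n) of a new species after observing k distinct species in n draws. The function names, the toy sample, and the parameter values are hypothetical and are not taken from the paper; the smoothing step the paper discusses is not shown.

```python
from collections import Counter


def good_turing_discovery(sample, l=0):
    """Good-Turing estimate of the probability that the next observation
    belongs to a species seen exactly l times in the sample (l=0: unseen)."""
    n = len(sample)
    species_counts = Counter(sample)                   # species -> frequency
    freq_of_freq = Counter(species_counts.values())    # frequency -> number of species
    return (l + 1) * freq_of_freq.get(l + 1, 0) / n


def py_new_species_probability(sample, sigma, theta):
    """Predictive probability of a new species under a two-parameter
    Poisson-Dirichlet prior with 0 <= sigma < 1 and theta > -sigma."""
    n = len(sample)
    k = len(set(sample))                               # distinct species observed
    return (theta + k * sigma) / (theta + n)


# Toy sample: 8 observations, 5 species, 3 of them seen exactly once.
sample = list("aabbbcde")
gt_unseen = good_turing_discovery(sample, l=0)
py_unseen = py_new_species_probability(sample, sigma=0.5, theta=1.0)
```

For this toy sample the Good-Turing estimate uses only the number of singletons, whereas the Bayesian estimate depends on the sample through n and k alone and on the chosen prior parameters.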
A new specification of generalized linear models for categorical data
Regression models for categorical data are specified in heterogeneous ways.
We propose to unify the specification of such models. This allows us to define
the family of reference models for nominal data. We introduce the notion of
reversible models for ordinal data that distinguishes adjacent and cumulative
models from sequential ones. The combination of the proposed specification with
the definition of reference and reversible models and various invariance
properties leads to a new view of regression models for categorical data.
Comment: 31 pages, 13 figures
struc2vec: Learning Node Representations from Structural Identity
Structural identity is a concept of symmetry in which network nodes are
identified according to the network structure and their relationship to other
nodes. Structural identity has been studied in theory and practice over the
past decades, but only recently has it been addressed with representational
learning techniques. This work presents struc2vec, a novel and flexible
framework for learning latent representations for the structural identity of
nodes. struc2vec uses a hierarchy to measure node similarity at different
scales, and constructs a multilayer graph to encode structural similarities and
generate structural context for nodes. Numerical experiments indicate that
state-of-the-art techniques for learning node representations fail to capture
stronger notions of structural identity, while struc2vec performs far better
on this task, as it overcomes limitations of prior approaches. As a
consequence, numerical experiments indicate that struc2vec improves performance
on classification tasks that depend more on structural identity.
Comment: 10 pages, KDD2017, Research Track
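The idea of measuring node similarity at different scales can be made concrete with a small Python sketch. It rests on an assumption not stated in the abstract: that two nodes are structurally similar when the sorted degree sequences of the nodes at distance 0, 1, ..., k from each of them are close. The padded absolute-difference distance used here is a deliberately simple stand-in chosen for illustration, and all function names are hypothetical; this is not the struc2vec implementation.

```python
from collections import deque


def ring(adj, source, k):
    """Return the nodes at exactly k hops from source, found by BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue                      # no need to expand beyond k hops
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return [v for v, d in dist.items() if d == k]


def degree_sequence(adj, source, k):
    """Sorted degrees of the nodes at exactly k hops from source."""
    return sorted(len(adj[v]) for v in ring(adj, source, k))


def structural_distance(adj, u, v, max_k):
    """Sum, over scales 0..max_k, of the differences between the
    zero-padded sorted degree sequences around u and around v."""
    total = 0
    for k in range(max_k + 1):
        a = degree_sequence(adj, u, k)
        b = degree_sequence(adj, v, k)
        size = max(len(a), len(b))
        a = a + [0] * (size - len(a))
        b = b + [0] * (size - len(b))
        total += sum(abs(x - y) for x, y in zip(a, b))
    return total


# Path graph 0-1-2-3-4: the two end nodes play the same structural role.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
ends = structural_distance(path, 0, 4, max_k=2)
end_vs_middle = structural_distance(path, 0, 2, max_k=1)
```

In the path graph the two end nodes are far apart yet get distance 0, which is the kind of similarity that proximity-based embeddings would miss.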
Increasing Interdependence of Multivariate Distributions
Orderings of interdependence among random variables are useful in many economic contexts, for example, in assessing ex post inequality under uncertainty; in comparing multidimensional inequality; in valuing portfolios of assets or insurance policies; and in assessing systemic risk. We explore five orderings of interdependence for multivariate distributions: greater weak association, the supermodular ordering, the convex-modular ordering, the dispersion ordering, and the concordance ordering. For two dimensions, all five orderings are equivalent, whereas for an arbitrary number of dimensions n > 2, the five orderings are strictly ranked. For the special case of binary random variables, we establish some equivalences among the orderings.
Keywords: dependence ordering; stochastic orders; supermodularity; weak association; concordance
JEL Classification Numbers: D63, D81, G11, G22
Accelerated hardware video object segmentation: From foreground detection to connected components labelling
This is the preprint version of the article. Copyright © 2010 Elsevier.
This paper demonstrates the use of a single-chip FPGA for the segmentation of moving objects in a video sequence. The system maintains highly accurate background models, and integrates the detection of foreground pixels with the labelling of objects using a connected components algorithm. The background models are based on 24-bit RGB values and 8-bit gray scale intensity values. A multimodal background differencing algorithm is presented, using a single FPGA chip and four blocks of RAM. The real-time connected component labelling algorithm, also designed for FPGA implementation, run-length encodes the output of the background subtraction, and performs connected component analysis on this representation. The run-length encoding, together with other parts of the algorithm, is performed in parallel; sequential operations are minimized as the number of run-lengths is typically smaller than the number of pixels. The two algorithms are pipelined together for maximum efficiency.
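The run-length approach to connected component labelling described in this abstract can be sketched in software. The Python example below is a sequential illustration only, not the paper's FPGA design: it assumes a binary foreground mask as input, 4-connectivity between runs in adjacent rows, and a union-find structure for merging labels. All of these choices, and the function names, are assumptions made for the sketch.

```python
def run_length_encode(mask):
    """Encode each row of a binary mask as (row, start_col, end_col) runs."""
    runs = []
    for r, row in enumerate(mask):
        c = 0
        while c < len(row):
            if row[c]:
                start = c
                while c < len(row) and row[c]:
                    c += 1
                runs.append((r, start, c - 1))
            else:
                c += 1
    return runs


def label_runs(runs):
    """Assign a component label to every run, merging runs that overlap
    column-wise with a run in the previous row (4-connectivity)."""
    parent = list(range(len(runs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    by_row = {}
    for idx, (r, _, _) in enumerate(runs):
        by_row.setdefault(r, []).append(idx)

    for idx, (r, s, e) in enumerate(runs):
        for j in by_row.get(r - 1, []):
            _, s2, e2 = runs[j]
            if s <= e2 and s2 <= e:         # column ranges overlap
                union(idx, j)

    # Relabel roots as consecutive integers 1, 2, 3, ...
    root_label = {}
    return [root_label.setdefault(find(i), len(root_label) + 1)
            for i in range(len(runs))]


mask = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0],
]
runs = run_length_encode(mask)
labels = label_runs(runs)
```

The merging loop iterates over runs rather than pixels, which reflects the abstract's point that there are typically fewer runs than pixels.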