19,208 research outputs found

Bayesian Dependence Tests for Continuous, Binary and Mixed Continuous-Binary Variables

Author: Benavoli Alessio
de Campos Cassio P.
Publication venue: 'MDPI AG'
Publication date: 01/01/2016
Tests for dependence of continuous, discrete and mixed continuous-discrete variables are ubiquitous in science. The goal of this paper is to derive Bayesian alternatives to frequentist null hypothesis significance tests for dependence. In particular, we will present three Bayesian tests for dependence of binary, continuous and mixed variables. These tests are nonparametric and based on the Dirichlet Process, which allows us to use the same prior model for all of them. Therefore, the tests are “consistent” among each other, in the sense that the probabilities that variables are dependent computed with these tests are commensurable across the different types of variables being tested. By means of simulations with artificial data, we show the effectiveness of the new tests.

Queen's University Belfast Research Portal

Repository TU/e

Directory of Open Access Journals

Penalized EM algorithm and copula skeptic graphical models for inferring networks for mixed variables

Author: Abegaz Fentaw
Wit Ernst
Publication date: 01/01/2014
In this article, we consider the problem of reconstructing networks for continuous, binary, count and discrete ordinal variables by estimating a sparse precision matrix in Gaussian copula graphical models. We propose two approaches: $\ell_1$-penalized extended rank likelihood with a Monte Carlo Expectation-Maximization algorithm (copula EM glasso), and copula skeptic with pair-wise copula estimation for copula Gaussian graphical models. The proposed approaches help to infer networks arising from nonnormal and mixed variables. We demonstrate the performance of our methods through simulation studies and analysis of breast cancer genomic and clinical data and maize genetics data.

arXiv.org e-Print Archive

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Clustering South African households based on their asset status using latent variable models

Author: Clark Samuel J.
Collinson Mark A.
Gormley Isobel Claire
Kabudula Chodziwadziwa Whiteson
McCormick Tyler H.
McParland Damien
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 31/07/2014
The Agincourt Health and Demographic Surveillance System has since 2001 conducted a biannual household asset survey in order to quantify household socio-economic status (SES) in a rural population living in northeast South Africa. The survey contains binary, ordinal and nominal items. In the absence of income or expenditure data, the SES landscape in the study population is explored and described by clustering the households into homogeneous groups based on their asset status. A model-based approach to clustering the Agincourt households, based on latent variable models, is proposed. In the case of modeling binary or ordinal items, item response theory models are employed. For nominal survey items, a factor analysis model, similar in nature to a multinomial probit model, is used. Both model types have an underlying latent variable structure; this similarity is exploited and the models are combined to produce a hybrid model capable of handling mixed data types. Further, a mixture of the hybrid models is considered to provide clustering capabilities within the context of mixed binary, ordinal and nominal response data. The proposed model is termed a mixture of factor analyzers for mixed data (MFA-MD). The MFA-MD model is applied to the survey data to cluster the Agincourt households into homogeneous groups. The model is estimated within the Bayesian paradigm, using a Markov chain Monte Carlo algorithm. Intuitive groupings result, providing insight into the different socio-economic strata within the Agincourt region. Comment: Published at http://dx.doi.org/10.1214/14-AOAS726 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

CiteSeerX

Massively-Parallel Feature Selection for Big Data

Author: Borboudakis Giorgos
Christophides Vassilis
Katsogridakis Pavlos
Pratikakis Polyvios
Tsamardinos Ioannis
Publication date: 23/08/2017
We present the Parallel, Forward-Backward with Pruning (PFBP) algorithm for feature selection (FS) in Big Data settings (high dimensionality and/or sample size). To tackle the challenges of Big Data FS, PFBP partitions the data matrix both in terms of rows (samples, training examples) and columns (features). By employing the concepts of $p$-values of conditional independence tests and meta-analysis techniques, PFBP manages to rely only on computations local to a partition while minimizing communication costs. Then, it employs powerful and safe (asymptotically sound) heuristics to make early, approximate decisions, such as Early Dropping of features from consideration in subsequent iterations, Early Stopping of consideration of features within the same iteration, or Early Return of the winner in each iteration. PFBP provides asymptotic guarantees of optimality for data distributions faithfully representable by a causal network (Bayesian network or maximal ancestral graph). Our empirical analysis confirms a super-linear speedup of the algorithm with increasing sample size and linear scalability with respect to the number of features and processing cores, while dominating other competitive algorithms in its class.
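As a rough illustration of the meta-analysis idea mentioned in the abstract above, the sketch below combines p-values computed independently on separate data partitions into a single p-value. It is not the PFBP implementation: the choice of Fisher's method and the function name `fisher_combine` are assumptions made for this example only.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method (illustrative helper)."""
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    # Under the null, -2 * sum(log p_i) follows a chi-square law with 2k
    # degrees of freedom when the k p-values are independent.
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # For even degrees of freedom the chi-square survival function has a
    # closed form: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

For example, `fisher_combine([0.05, 0.05])` gives a combined p-value of roughly 0.0175, so two partitions that each show only borderline evidence jointly show stronger evidence. Each partition needs to send only its local p-value, which is the kind of low-communication computation the abstract describes.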

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Hal-Diderot

Mixed Cumulative Distribution Networks

Author: Blundell Charles
Silva Ricardo
Teh Yee Whye
Publication date: 31/08/2010
Directed acyclic graphs (DAGs) are a popular framework to express multivariate probability distributions. Acyclic directed mixed graphs (ADMGs) are generalizations of DAGs that can succinctly capture much richer sets of conditional independencies, and are especially useful in modeling the effects of latent variables implicitly. Unfortunately, there are currently no good parameterizations of general ADMGs. In this paper, we apply recent work on cumulative distribution networks and copulas to propose one general construction for ADMG models. We consider a simple parameter estimation approach, and report some encouraging experimental results. Comment: 11 pages, 4 figures

arXiv.org e-Print Archive

CiteSeerX

UCL Discovery

Oxford University Research Archive