
    A low variance consistent test of relative dependency

    We describe a novel non-parametric statistical hypothesis test of relative dependence between a source variable and two candidate target variables. Such a test enables us to determine whether one source variable is significantly more dependent on the first target variable than on the second. Dependence is measured via the Hilbert-Schmidt Independence Criterion (HSIC), resulting in a pair of empirical dependence measures (source-target 1, source-target 2). We test whether the first dependence measure is significantly larger than the second. Modeling the covariance between these HSIC statistics leads to a provably more powerful test than the construction of independent HSIC statistics by sub-sampling. The resulting test is consistent and unbiased, and (being based on U-statistics) has favorable convergence properties. The test can be computed in quadratic time, matching the computational complexity of standard empirical HSIC estimators. The effectiveness of the test is demonstrated on several real-world problems: we identify language groups from a multilingual corpus, and we show that tumor location is more dependent on gene expression than on chromosomal imbalances. Source code is available for download at https://github.com/wbounliphone/reldep.
    Comment: International Conference on Machine Learning, Jul 2015, Lille, France
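    The building block of the test is easy to sketch numerically. Below is a minimal illustration (not the paper's test itself, which additionally models the covariance between the two statistics to form a significance test): a biased empirical HSIC estimate for each (source, target) pair, showing that the more dependent pair yields the larger value. The variable names and the fixed kernel bandwidth are illustrative choices.

```python
import numpy as np

def rbf_gram(x, sigma=1.0):
    # Pairwise squared distances -> Gaussian (RBF) Gram matrix.
    sq = np.sum(x**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def hsic(x, y, sigma=1.0):
    # Biased empirical HSIC: tr(K H L H) / m^2, with H the centering matrix.
    m = x.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    return np.trace(rbf_gram(x, sigma) @ H @ rbf_gram(y, sigma) @ H) / m**2

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y1 = x + 0.1 * rng.normal(size=(200, 1))  # strongly dependent target
y2 = rng.normal(size=(200, 1))            # independent target

print(hsic(x, y1) > hsic(x, y2))  # the more dependent pair scores higher
```

    A quadratic-time estimator like this is what the abstract means by "matching the computational complexity of standard empirical HSIC estimators": each Gram matrix costs O(m^2).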

    Robust Unit Root and Cointegration Rank Tests for Panels and Large Systems

    This study develops new tests for unit roots and cointegration rank in heterogeneous time series panels using methods that are robust to the presence of both incidental trends and cross-sectional dependency of unknown form. Furthermore, the procedures do not require a choice of lag truncation or bandwidth to accommodate higher-order serial correlation. The cointegration rank tests can also be implemented in relatively large-dimensioned systems of equations for which conventional VECM-based tests become infeasible. Monte Carlo simulations demonstrate that the procedures have high power and good size properties even in panels with relatively small dimensions.
    Keywords: Panel Unit Roots, Cointegration Rank Tests, Robust Autocovariance Estimation
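    For intuition about what a panel unit-root test measures, here is a deliberately naive sketch (not the robust procedure this study develops): per-unit Dickey-Fuller t-statistics pooled by averaging across a simulated panel. A panel of unit-root series and a panel of stationary AR(1) series separate cleanly.

```python
import numpy as np

rng = np.random.default_rng(1)

def df_tstat(y):
    # Dickey-Fuller regression without drift: dy_t = rho * y_{t-1} + e_t;
    # returns the t-statistic on rho (negative under mean reversion).
    dy, ylag = np.diff(y), y[:-1]
    rho = ylag @ dy / (ylag @ ylag)
    resid = dy - rho * ylag
    se = np.sqrt(resid @ resid / (len(dy) - 1) / (ylag @ ylag))
    return rho / se

def ar1(phi, T):
    # Simulate y_t = phi * y_{t-1} + e_t (phi = 1 gives a unit root).
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = phi * y[t - 1] + rng.normal()
    return y

def pooled(panel):
    # Naive pooled statistic: average of per-unit DF t-stats.
    return float(np.mean([df_tstat(y) for y in panel]))

T, N = 200, 20
unit_root = [ar1(1.0, T) for _ in range(N)]   # null: unit root in every unit
stationary = [ar1(0.5, T) for _ in range(N)]  # alternative: mean reversion

print(pooled(unit_root), pooled(stationary))
```

    This toy version ignores exactly the complications the study addresses: incidental trends, cross-sectional dependence, and higher-order serial correlation that would normally force a lag-truncation or bandwidth choice.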

    Asymptotic Analysis of Generative Semi-Supervised Learning

    Semi-supervised learning has emerged as a popular framework for improving modeling accuracy while controlling labeling cost. Based on an extension of stochastic composite likelihood, we quantify the asymptotic accuracy of generative semi-supervised learning. In doing so, we complement distribution-free analysis by providing an alternative framework to measure the value associated with different labeling policies and resolve the fundamental question of how much data to label and in what manner. We demonstrate our approach with both simulation studies and real-world experiments using naive Bayes for text classification and MRFs and CRFs for structured prediction in NLP.
    Comment: 12 pages, 9 figures
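    As a concrete stand-in for the generative setting analyzed here (not the paper's composite-likelihood machinery), the sketch below runs semi-supervised EM on a two-component Gaussian mixture where a simple labeling policy labels only the first 50 of 1000 points; the labeled subset anchors the component identities while the unlabeled data sharpen the estimates.

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_lab = 1000, 50
y = rng.integers(0, 2, size=n)                    # latent class labels
x = rng.normal(loc=np.where(y == 0, -2.0, 2.0))   # true means -2 and +2
labeled = np.arange(n_lab)                        # policy: label the first 50

mu = np.array([-1.0, 1.0])  # deliberately poor initial means
for _ in range(50):
    # E-step: responsibilities under unit-variance Gaussians, equal priors;
    # labeled points keep their hard labels.
    dens = np.stack([np.exp(-0.5 * (x - m) ** 2) for m in mu])
    resp = dens / dens.sum(axis=0)
    resp[:, labeled] = np.eye(2)[:, y[labeled]]
    # M-step: responsibility-weighted means (variances held fixed at 1).
    mu = (resp * x).sum(axis=1) / resp.sum(axis=1)

print(mu)  # close to the true means (-2, 2)
```

    Varying `n_lab` in a sketch like this is the experimental analogue of the question the paper answers analytically: how much data to label, and in what manner.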

    Robust Inference of Trees

    This paper is concerned with the reliable inference of optimal tree approximations to the dependency structure of an unknown distribution generating data. The traditional approach to the problem measures the dependency strength between random variables by the index called mutual information. In this paper, reliability is achieved by Walley's imprecise Dirichlet model, which generalizes Bayesian learning with Dirichlet priors. Adopting the imprecise Dirichlet model results in a posterior interval expectation for mutual information, and in a set of plausible trees consistent with the data. Reliable inference about the actual tree is achieved by focusing on the substructure common to all the plausible trees. We develop an exact algorithm that infers the substructure in time O(m^4), m being the number of random variables. The new algorithm is applied to a set of data sampled from a known distribution. The method is shown to reliably infer edges of the actual tree even when the data are very scarce, unlike the traditional approach. Finally, we provide lower and upper credibility limits for mutual information under the imprecise Dirichlet model. These enable the previous developments to be extended to a full inferential method for trees.
    Comment: 26 pages, 7 figures
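    The "traditional approach" this paper makes robust is essentially the Chow-Liu algorithm: estimate pairwise mutual information from data, then take a maximum-weight spanning tree. A minimal sketch of that baseline (point estimates of MI, with none of the imprecise-Dirichlet interval machinery):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)

def mutual_info(a, b):
    # Empirical mutual information (in nats) between two discrete columns.
    mi = 0.0
    for va in np.unique(a):
        for vb in np.unique(b):
            pab = np.mean((a == va) & (b == vb))
            if pab > 0:
                mi += pab * np.log(pab / (np.mean(a == va) * np.mean(b == vb)))
    return mi

def chow_liu_edges(data):
    # Maximum-weight spanning tree over pairwise MI (Kruskal + union-find).
    m = data.shape[1]
    weights = sorted(((mutual_info(data[:, i], data[:, j]), i, j)
                      for i, j in combinations(range(m), 2)), reverse=True)
    parent = list(range(m))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    tree = []
    for _, i, j in weights:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Markov chain X0 -> X1 -> X2 with 10% flip noise on each edge.
n = 2000
x0 = rng.integers(0, 2, size=n)
x1 = x0 ^ (rng.random(n) < 0.1)
x2 = x1 ^ (rng.random(n) < 0.1)
data = np.column_stack([x0, x1, x2])
print(chow_liu_edges(data))  # recovers the chain edges (0,1) and (1,2)
```

    The paper's criticism is that with very scarce data these MI point estimates are unreliable; replacing them with posterior intervals yields a set of plausible trees, and only edges common to all of them are asserted.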

    Why has (reasonably accurate) Automatic Speech Recognition been so hard to achieve?

    Hidden Markov models (HMMs) have been successfully applied to automatic speech recognition for more than 35 years, in spite of the fact that a key HMM assumption -- the statistical independence of frames -- is obviously violated by speech data. In fact, this data/model mismatch has inspired many attempts to modify or replace HMMs with alternative models that are better able to take into account the statistical dependence of frames. However, it is fair to say that in 2010 the HMM is the consensus model of choice for speech recognition, and that HMMs are at the heart of both commercially available products and contemporary research systems. In this paper we present a preliminary exploration aimed at understanding how speech data depart from HMMs and what effect this departure has on the accuracy of HMM-based speech recognition. Our analysis uses standard diagnostic tools from the field of statistics -- hypothesis testing, simulation and resampling -- which are rarely used in the field of speech recognition. Our main result, obtained by novel manipulations of real and resampled data, demonstrates that real data have statistical dependency and that this dependency is responsible for significant numbers of recognition errors. We also demonstrate, using simulation and resampling, that if we `remove' the statistical dependency from the data, then the resulting recognition error rates become negligible. Taken together, these results suggest that a better understanding of the structure of the statistical dependency in speech data is a crucial first step towards improving HMM-based speech recognition.
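    The resampling trick at the heart of the diagnostic is simple to illustrate on synthetic data (a toy stand-in, not the paper's experiments): frames drawn from a single state with AR(1) temporal dependence violate the HMM independence assumption, and shuffling the frames "removes" the dependency while preserving the per-frame marginal distribution.

```python
import numpy as np

rng = np.random.default_rng(5)

# "Frames" emitted from one state with AR(1) dependence, violating the
# HMM assumption that frames are conditionally independent given the state.
n, phi = 5000, 0.8
frames = np.zeros(n)
for t in range(1, n):
    frames[t] = phi * frames[t - 1] + rng.normal()

def lag1_autocorr(x):
    # Sample lag-1 autocorrelation.
    x = x - x.mean()
    return float(x[:-1] @ x[1:] / (x @ x))

# Resampling frames i.i.d. destroys the temporal dependency but keeps the marginal.
shuffled = rng.permutation(frames)
print(lag1_autocorr(frames), lag1_autocorr(shuffled))
```

    The first number is close to phi = 0.8; the second is near zero. In the paper's setting, recognizing the shuffled (dependency-free) analogue of real speech is what drives the error rate toward zero.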

    Development of filtered Euler–Euler two-phase model for circulating fluidised bed: High resolution simulation, formulation and a priori analyses

    Euler–Euler two-phase model simulations are usually performed with mesh sizes larger than the small-scale structure size of gas–solid flows in industrial fluidised beds because of computational resource limitations. Thus, these simulations do not fully account for the particle segregation effect at the small scale, and this causes poor prediction of bed hydrodynamics. An appropriate modelling approach accounting for the influence of unresolved structures needs to be proposed for practical simulations. For this purpose, computational grids are refined to a cell size of a few particle diameters to obtain mesh-independent results, requiring up to 17 million cells in a 3D periodic circulating fluidised bed. These mesh-independent results are filtered by volume averaging and used to perform a priori analyses on the filtered phase balance equations. Results show that filtered momentum equations can be used for practical simulations, but must account for a drift velocity due to the sub-grid correlation between the local fluid velocity and the local particle volume fraction, and for particle sub-grid stresses due to the filtering of the non-linear convection term. This paper proposes models for the sub-grid drift velocity and particle sub-grid stresses and assesses these models by a priori tests.
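    The a priori test for the drift velocity can be sketched in one dimension on hypothetical synthetic fields (not the paper's CFD data): apply a top-hat volume-average filter to resolved fields, then measure the gap between the particle-weighted filtered gas velocity and the plainly filtered gas velocity.

```python
import numpy as np

rng = np.random.default_rng(6)

def box_filter(f, w):
    # Top-hat (volume-average) filter of width w on a periodic 1-D field.
    padded = np.concatenate([f[-w:], f, f[:w]])
    return np.convolve(padded, np.ones(w) / w, mode="same")[w:-w]

# Hypothetical resolved fields: particle volume fraction anti-correlated
# with gas velocity at the sub-filter scale.
n = 512
s = np.sin(np.linspace(0.0, 8.0 * np.pi, n))
alpha_p = 0.3 + 0.1 * s + 0.02 * rng.normal(size=n)  # particle volume fraction
u_g = 1.0 - 0.5 * s                                  # gas velocity

# Drift velocity: particle-weighted filtered gas velocity minus the
# filtered gas velocity; non-zero only through sub-filter correlation.
w = 32
v_drift = box_filter(alpha_p * u_g, w) / box_filter(alpha_p, w) - box_filter(u_g, w)
print(v_drift.mean())
```

    Because alpha_p and u_g are anti-correlated below the filter width, the drift velocity here is systematically negative; on an unfiltered (fully resolved) field it would vanish, which is why coarse-grid simulations must model it explicitly.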