The correlation space of Gaussian latent tree models and model selection without fitting
We provide a complete description of possible covariance matrices consistent
with a Gaussian latent tree model for any tree. We then present techniques for
utilising these constraints to assess whether observed data is compatible with
that Gaussian latent tree model. Our method does not require us first to fit
such a tree. We demonstrate the usefulness of the inverse-Wishart distribution
for performing preliminary assessments of tree-compatibility using
semialgebraic constraints. Using results from Drton et al. (2008), we then
provide the moments required by test statistics that assess adherence to the
equality constraints. These tests are shown to be effective even
for small sample sizes and can easily be adjusted to test either the entire
model or only certain macrostructures hypothesized within the tree. We
illustrate our exploratory tetrad analysis using a linguistic application and
our confirmatory tetrad analysis using a biological application.
Comment: 15 pages
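The tetrad constraints behind this exploratory and confirmatory analysis can be illustrated on the smallest interesting case. The sketch below assumes a hypothetical four-leaf tree 0,1 - a - b - 2,3 with hidden nodes a and b, and uses the standard fact that, on a Gaussian tree, the correlation between two leaves is the product of the edge correlations along the path joining them. For the split {0,1}|{2,3} this forces one tetrad difference to vanish and one inequality to hold, while the two other splits violate both. The function names `quartet_tree_corr` and `tetrad_check` and the edge values are illustrative assumptions; the formal test statistics built from the Drton et al. (2008) moments are not reproduced here.

```python
import itertools

import numpy as np


def quartet_tree_corr(leaf_corrs, mid_corr):
    """Correlation matrix of leaves 0..3 on the tree 0,1 - a - b - 2,3
    (a, b hidden). Leaf s hangs off a (s < 2) or b (s >= 2) with edge
    correlation leaf_corrs[s]; mid_corr is the a-b edge correlation.
    corr(leaf s, leaf t) = product of edge correlations on the path."""
    side = [0, 0, 1, 1]
    R = np.eye(4)
    for s, t in itertools.combinations(range(4), 2):
        r = leaf_corrs[s] * leaf_corrs[t]
        if side[s] != side[t]:
            r *= mid_corr  # the path crosses the hidden a-b edge
        R[s, t] = R[t, s] = r
    return R


def tetrad_check(R, split):
    """For a quartet split ((i, j), (k, l)) return (tetrad, gap):
    tetrad = r_ik r_jl - r_il r_jk, which the split forces to be zero;
    gap = |r_ij r_kl| - |r_ik r_jl|, which the split forces to be >= 0."""
    (i, j), (k, l) = split
    tetrad = R[i, k] * R[j, l] - R[i, l] * R[j, k]
    gap = abs(R[i, j] * R[k, l]) - abs(R[i, k] * R[j, l])
    return tetrad, gap


splits = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
R_true = quartet_tree_corr([0.9, 0.8, 0.7, 0.85], 0.6)

# With sample correlations from n = 200 draws the tetrad is only near zero,
# which is why a formal test allowing for sampling error is needed.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(4), R_true, size=200)
R_hat = np.corrcoef(X, rowvar=False)
for split in splits:
    print(split, tetrad_check(R_true, split), tetrad_check(R_hat, split))
```

Only the split that actually occurs in the tree gives a zero tetrad and a non-negative gap in the population matrix; the sample version shows the same pattern up to noise.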
Qualitative inequalities for squared partial correlations of a Gaussian random vector
We describe various sets of conditional independence relationships,
sufficient for qualitatively comparing non-vanishing squared partial
correlations of a Gaussian random vector. These sufficient conditions are
satisfied by several graphical Markov models. Rules for comparing the degree of
association among the vertices of such Gaussian graphical models are also
developed. We apply these rules to compare conditional dependencies on Gaussian
trees. In particular, for trees we show that such dependence can be completely
characterized by the length of the paths joining the dependent vertices to each
other and to the vertices conditioned on. We also apply our results to
postulate rules for model selection for polytree models. Our rules apply to
mutual information of Gaussian random vectors as well.
Comment: 21 pages, 13 figures
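The quantities compared in this abstract can be computed directly from a covariance matrix. The sketch below is an illustration only, not the paper's comparison rules: it computes a squared partial correlation from the conditional covariance of a pair given a set, and applies it to the simplest tree, a hypothetical Gaussian chain X0 - X1 - X2 - X3 - X4 with assumed edge correlations. Longer paths give weaker squared correlation, conditioning on a vertex on the path removes the dependence, and the bivariate Gaussian mutual information -1/2 log(1 - rho^2) is increasing in rho^2, which is why comparisons of squared partial correlations carry over to mutual information. The names `partial_corr`, `gaussian_mi` and `chain_corr` are hypothetical.

```python
import numpy as np


def partial_corr(Sigma, i, j, cond=()):
    """Partial correlation of X_i and X_j given X_cond, from the conditional
    covariance Sigma_AA - Sigma_AC Sigma_CC^{-1} Sigma_CA with A = [i, j]."""
    A, C = [i, j], list(cond)
    S = Sigma[np.ix_(A, A)]
    if C:
        S = S - Sigma[np.ix_(A, C)] @ np.linalg.solve(
            Sigma[np.ix_(C, C)], Sigma[np.ix_(C, A)]
        )
    return S[0, 1] / np.sqrt(S[0, 0] * S[1, 1])


def gaussian_mi(rho):
    """Mutual information (nats) of a bivariate Gaussian with correlation rho."""
    return -0.5 * np.log(1.0 - rho**2)


def chain_corr(edge_corrs):
    """Correlation matrix of a Gaussian Markov chain X_0 - X_1 - ...;
    corr(X_s, X_t) is the product of the edge correlations between them."""
    n = len(edge_corrs) + 1
    R = np.eye(n)
    for s in range(n):
        for t in range(s + 1, n):
            R[s, t] = R[t, s] = np.prod(edge_corrs[s:t])
    return R


R = chain_corr([0.8, 0.7, 0.9, 0.6])
print([partial_corr(R, 0, t) ** 2 for t in (2, 3, 4)])  # decreases with path length
print(partial_corr(R, 0, 4, (2,)))  # X2 lies on the path: (numerically) zero
print(partial_corr(R, 0, 3, (4,)) ** 2, partial_corr(R, 0, 3) ** 2)
print(gaussian_mi(partial_corr(R, 0, 2)), gaussian_mi(partial_corr(R, 0, 3)))
```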
Gaussian latent tree model constraints for linguistics and other applications
The relationships between languages are often modelled as phylogenetic trees, in which a single shared ancestral language sits at the root and contemporary languages appear as leaves. These can be thought of as directed acyclic graphs with hidden variables, specifically Bayesian networks. However, from a statistical perspective there is often no formal assessment of the suitability of these latent tree models. Much of the work that seeks to address this has focused on discrete-variable models. However, when observations are instead considered as functional data, their high-dimensional approximations are often better treated in a Gaussian setting. High-dimensional data are often stored inefficiently, so the first challenge is to project them to a low dimension while retaining the information of interest. One approach is to form a basis using the newly developed separable-canonical variate analysis.
In extending the techniques for assessing latent tree model compatibility beyond discrete variables, the complete set of Gaussian tree constraints is derived for the first time. This set comprises equations and inequalities in the correlations of the observed variables. In theory, these statements must hold for a Gaussian latent tree model to be appropriate for a given data set. With a truncated representation obtained from the separable-canonical variate analysis basis, the suitability of a phylogenetic tree can then be assessed directly. In practice, however, some allowance for sampling error is desirable, so probabilistic tools are developed alongside the theoretical derivation of the Gaussian tree constraints.
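One way to allow for sampling error when checking the inequality constraints, in the spirit of the inverse-Wishart preliminary assessment mentioned in the first abstract, is to draw covariance matrices from a posterior and record how often the inequalities hold. The sketch below is a minimal version under stated assumptions, not the authors' procedure: zero-mean Gaussian data, a conjugate IW(p + 2, I) prior (so the posterior is IW(p + 2 + n, I + X^T X)), and only the quartet inequalities for one split. The functions `sample_inv_wishart` and `split_support`, the prior, and the simulated tree are all hypothetical.

```python
import numpy as np


def sample_inv_wishart(df, scale, size, rng):
    """Draw `size` matrices Sigma ~ IW(df, scale) by inverting
    W ~ Wishart(df, scale^{-1}), built as a sum of df Gaussian outer
    products (so df must be an integer >= the dimension)."""
    p = scale.shape[0]
    L = np.linalg.cholesky(np.linalg.inv(scale))
    Z = rng.standard_normal((size, df, p)) @ L.T  # rows ~ N(0, scale^{-1})
    W = np.einsum("sdi,sdj->sij", Z, Z)
    return np.linalg.inv(W)


def split_support(X, split, n_draws=2000, seed=0):
    """Posterior fraction of correlation matrices satisfying the quartet
    inequalities for split ((i, j), (k, l)):
    |r_ij r_kl| >= |r_ik r_jl| and |r_ij r_kl| >= |r_il r_jk|.
    Assumes zero-mean data and an IW(p + 2, I) prior on the covariance."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Sig = sample_inv_wishart(p + 2 + n, np.eye(p) + X.T @ X, n_draws, rng)
    d = np.sqrt(np.einsum("sii->si", Sig))     # per-draw standard deviations
    R = Sig / (d[:, :, None] * d[:, None, :])  # per-draw correlation matrices
    (i, j), (k, l) = split
    lhs = np.abs(R[:, i, j] * R[:, k, l])
    ok = (lhs >= np.abs(R[:, i, k] * R[:, j, l])) & (
        lhs >= np.abs(R[:, i, l] * R[:, j, k])
    )
    return ok.mean()


# Hypothetical data from the quartet tree 0,1 - a - b - 2,3 (a, b hidden).
r = np.array([0.9, 0.8, 0.7, 0.85])
side = np.array([0, 0, 1, 1])
R_true = np.outer(r, r) * np.where(side[:, None] == side[None, :], 1.0, 0.6)
np.fill_diagonal(R_true, 1.0)
X = np.random.default_rng(1).multivariate_normal(np.zeros(4), R_true, size=200)

print(split_support(X, ((0, 1), (2, 3))))  # split present in the tree
print(split_support(X, ((0, 2), (1, 3))))  # split absent from the tree
```

Equality constraints hold with probability zero under any continuous posterior, so a sketch like this only screens the inequalities; the equalities need a separate test such as the tetrad statistics described in the first abstract.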
The proposed methodology is implemented in an in-depth study of a real linguistic data set to assess the phylogenies of five Romance languages. This application is distinctive because the data set consists of acoustic recordings, which are treated as functional data and then used to compare languages in a phylogenetic context. As a consequence, a wide range of theory and tools from the multivariate and functional domains is called upon, including the powerful new separable-canonical function analysis and separable-canonical variate analysis. Utilising the newly derived Gaussian tree constraints for hidden variable models provides a first insight into features of spoken languages that appear to be tree-compatible.