
    Bayesian Inference of Networks Across Multiple Sample Groups and Data Types

    In this paper, we develop a graphical modeling framework for the inference of networks across multiple sample groups and data types. In medical studies, this setting arises whenever a set of subjects, which may be heterogeneous due to differing disease stage or subtype, is profiled across multiple platforms, such as metabolomics, proteomics, or transcriptomics. Our proposed Bayesian hierarchical model first links the network structures within each platform using a Markov random field prior to relate edge selection across sample groups, and then links the network similarity parameters across platforms. This enables joint estimation in a flexible manner, as we make no assumptions about the directionality of influence across the data types or the extent of network similarity across the sample groups and platforms. In addition, our model formulation allows the number of variables and the number of subjects to differ across the data types, and only requires that we have data for the same set of groups. We illustrate the proposed approach through both simulation studies and an application to gene expression levels and metabolite abundances in subjects with varying severity levels of chronic obstructive pulmonary disease (COPD).

    Modeling and Estimating Multi-Block Interactions for High-Dimensional Stationary Time Series

    Modeling and estimating interactions amongst multiple groups of variables is an important task for understanding the structure of complex systems. In particular, for time series, the interdependence structure can be based either on contemporaneous correlations or on lead-lag cross-relations. This thesis addresses a number of topics related to such interdependence structures under high-dimensional scaling. The first part of the thesis considers modeling and estimating interactions between observable blocks of variables, as well as their respective within-block dependence structures, in high-dimensional independent and identically distributed (iid) settings as well as temporally dependent settings. In the iid case, we model the blocks of variables of interest through a multi-layered Gaussian graphical model and introduce a penalized maximum likelihood estimation (MLE) procedure that provides both statistical and algorithmic guarantees, leveraging the structure of the log-likelihood function and its bi-convex nature. For the case where the data exhibit temporal dependence, the blocks are modeled through a stable Vector Autoregressive (VAR) system with group Granger-causal ordering. Building upon the work for the iid case, we estimate their lead-lag relationships, as well as the contemporaneous dependence structure, using a penalized MLE criterion under different structural assumptions on the transition matrices (sparse or low rank). We establish theoretical properties for the estimates analogous to the iid case, modulo an additional cost due to the temporal dependence in the data. Moreover, we devise a testing procedure for the presence of such group Granger causality, tailoring it to the posited structural assumptions on the transition matrix that couples the blocks.
The devised estimation and testing procedures are assessed via numerical experiments and further illustrated on a real data example from economics that examines the impact of the stock market on major macroeconomic indicators. However, large stable VAR systems have the inherent limitation that the transition matrix needs to be very sparse or to have small average magnitude in order to satisfy the stationarity constraint. This raises the question of whether the VAR model is the appropriate modeling framework for an ultra-large number of time series. To this end, we consider systems of time series that can be summarized by a small set of latent factors. In the second part of this thesis, we focus on estimating the interaction between an observable process and a dynamically evolving latent factor process. Specifically, we extend the factor-augmented vector autoregressive (FAVAR) model, which is popular in applied economics, to high dimensions and study estimation of the model parameters by formulating an optimization problem that involves a low-rank-plus-sparse type decomposition. Moreover, we investigate model identifiability issues and establish theoretical properties for the proposed estimator. The performance of the proposed method is evaluated on synthetic data, and the model is further illustrated on an economic data set that examines interlinkages between commodity prices and macroeconomic variables. Along a slightly different line of inquiry, where the contemporaneous dependence is of prime interest rather than lead-lag relationships, we extend the approximate factor model, in which correlations amongst the idiosyncratic (error) components are assumed to be weak, to the case where moderate-to-strong correlations are allowed. Using a formulation similar to the FAVAR problem, we propose an algorithm to estimate the model parameters and investigate its statistical and algorithmic properties.
The model and the quality of the resulting estimates are illustrated on log-returns of stock prices of large financial institutions.
PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/145976/1/jiahelin_1.pd
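
The stability limitation mentioned in the thesis abstract can be illustrated numerically. The sketch below is not taken from the thesis; it only uses the standard fact that a VAR(1) process x_t = A x_{t-1} + e_t is stable exactly when the spectral radius of A is below 1. The helper names (`spectral_radius`, `is_stable_var1`, `random_transition`) and the chosen density and scale values are assumptions made for illustration.

```python
import numpy as np


def spectral_radius(A):
    """Return the largest absolute eigenvalue of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))


def is_stable_var1(A):
    """A VAR(1) process x_t = A x_{t-1} + e_t is stable iff rho(A) < 1."""
    return spectral_radius(A) < 1.0


def random_transition(p, density, scale, rng):
    """Hypothetical generator (illustration only): each entry is nonzero
    with probability `density` and drawn from N(0, scale^2)."""
    mask = rng.random((p, p)) < density
    return mask * rng.normal(0.0, scale, size=(p, p))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for p in (10, 50, 200):
        # Dense matrix with fixed entry scale versus a matrix whose
        # expected number of nonzeros per row stays near 2 as p grows.
        dense = random_transition(p, density=1.0, scale=0.2, rng=rng)
        sparse = random_transition(p, density=2.0 / p, scale=0.2, rng=rng)
        print(
            f"p={p:4d}  dense rho={spectral_radius(dense):.2f} "
            f"stable={is_stable_var1(dense)}  "
            f"sparse rho={spectral_radius(sparse):.2f} "
            f"stable={is_stable_var1(sparse)}"
        )
```

With a fixed entry scale, the spectral radius of a dense random matrix tends to grow with the dimension, so the entries must shrink or most of them must be zero for the system to remain stable. This is one way to read the abstract's motivation for turning to latent factor models.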