Copula Density Estimation by Total Variation Penalized Likelihood with Linear Equality Constraints
A copula density is the joint probability density function (PDF) of a random vector with uniform marginals. An approach to bivariate copula density estimation is introduced that is based on maximum penalized likelihood estimation (MPLE) with a total variation (TV) penalty term. The marginal unity and symmetry constraints of the copula density are enforced by linear equality constraints. The TV-MPLE subject to these linear equality constraints is solved by an augmented Lagrangian and operator-splitting algorithm. It offers an order-of-magnitude improvement in computational efficiency over an existing TV-MPLE method that omits these constraints and is solved by a log-barrier method for a second-order cone program. The regularization parameter is selected in a data-driven way by K-fold cross-validation (CV). Simulations and a real-data application show the effectiveness of the proposed approach. The MATLAB code implementing the methodology is available online.
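To make the constraint structure concrete, here is a minimal Python sketch. It assumes a piecewise-constant discretization of the copula density on a uniform m × m grid over [0,1]² (the abstract does not state the discretization). The hypothetical helper `copula_constraints` builds the matrix A and vector b of the linear equality constraints A·vec(c) = b for marginal unity and symmetry; it does not implement the TV penalty or the augmented Lagrangian solver.

```python
import numpy as np

def copula_constraints(m, symmetric=True):
    """Return (A, b) with A @ c.ravel() == b encoding copula constraints for an
    m x m grid of cell values c[i, j] (piecewise-constant density, cell width 1/m).
    Hypothetical helper for illustration only."""
    h = 1.0 / m
    rows, rhs = [], []
    # Marginal unity in u: integrating over v gives 1, i.e. h * sum_j c[i, j] = 1.
    for i in range(m):
        a = np.zeros((m, m))
        a[i, :] = h
        rows.append(a.ravel())
        rhs.append(1.0)
    # Marginal unity in v: h * sum_i c[i, j] = 1.
    for j in range(m):
        a = np.zeros((m, m))
        a[:, j] = h
        rows.append(a.ravel())
        rhs.append(1.0)
    # Symmetry c(u, v) = c(v, u): c[i, j] - c[j, i] = 0 for i < j.
    if symmetric:
        for i in range(m):
            for j in range(i + 1, m):
                a = np.zeros((m, m))
                a[i, j], a[j, i] = 1.0, -1.0
                rows.append(a.ravel())
                rhs.append(0.0)
    return np.array(rows), np.array(rhs)

A, b = copula_constraints(4)
independence = np.ones((4, 4))                     # independence copula c(u, v) = 1
print(A.shape)                                     # (14, 16)
print(np.allclose(A @ independence.ravel(), b))    # True
```

For m = 4 there are 2m marginal rows plus m(m-1)/2 symmetry rows. The two sets of marginal rows are jointly redundant (both imply a total mass of 1), which a constrained solver has to tolerate.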
Factorial graphical lasso for dynamic networks
Dynamic network models describe a growing number of important scientific
processes, from cell biology and epidemiology to sociology and finance. There
are many aspects of dynamical networks that require statistical considerations.
In this paper we focus on determining network structure. Estimating dynamic
networks is a difficult task since the number of components involved in the
system is very large. As a result, the number of parameters to be estimated is
bigger than the number of observations. However, a characteristic of many
networks is that they are sparse. For example, the molecular structure of genes
makes interactions with other components a highly structured and therefore
sparse process.
Penalized Gaussian graphical models have been used to estimate sparse
networks. However, the literature has focussed on static networks, which lack
specific temporal constraints. We propose a structured Gaussian dynamical
graphical model, where structures can consist of specific time dynamics, known
presence or absence of links and block equality constraints on the parameters.
Thus, the number of parameters to be estimated is reduced and the accuracy of the
estimates, including the identification of the network, can be improved. Here,
we show that the constrained optimization problem can be solved by taking
advantage of an efficient solver, logdetPPA, developed in convex optimization.
Moreover, model selection methods for checking the sensitivity of the inferred
networks are described. Finally, synthetic and real data illustrate the
proposed methodologies. Comment: 30 pp, 5 figures
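The parameter reduction from block equality constraints can be illustrated with a hypothetical stationary structure. Assume p nodes observed at T time points, links only within a time point or between consecutive time points, and the same within-time block Θ0 and lag-1 block Θ1 at every time. This structure and the helper names are assumptions for illustration, not the paper's model, and nothing here calls logdetPPA.

```python
import numpy as np

def stationary_block_precision(theta0, theta1, T):
    """Assemble a (pT x pT) precision matrix with theta0 on every diagonal block
    (within-time links), theta1 / theta1.T on the first off-diagonal blocks
    (lag-1 links), and zero blocks elsewhere (no links beyond lag 1).
    Positive definiteness is NOT checked here."""
    p = theta0.shape[0]
    K = np.zeros((p * T, p * T))
    for t in range(T):
        cur = slice(t * p, (t + 1) * p)
        K[cur, cur] = theta0
        if t + 1 < T:
            nxt = slice((t + 1) * p, (t + 2) * p)
            K[cur, nxt] = theta1
            K[nxt, cur] = theta1.T
    return K

def n_free_params(p, T):
    """Free parameters: unrestricted symmetric pT x pT matrix vs. the constrained structure."""
    unconstrained = (p * T) * (p * T + 1) // 2
    constrained = p * (p + 1) // 2 + p * p  # symmetric theta0 plus unrestricted theta1
    return unconstrained, constrained

print(n_free_params(10, 20))  # (20100, 155)
```

With 10 nodes over 20 time points, the equality constraints cut the count from 20,100 to 155 free parameters. A valid precision matrix must also be positive definite, which an estimator has to enforce separately.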
A joint regression modeling framework for analyzing bivariate binary data in R
We discuss some of the features of the R add-on package GJRM, which implements a flexible joint modeling framework for fitting a number of multivariate response regression models under various sampling schemes. In particular, we focus on the case in which the user wishes to fit bivariate binary regression models in the presence of several forms of selection bias. The framework allows for Gaussian and non-Gaussian dependencies through the use of copulae, and for the association and mean parameters to depend on flexible functions of covariates. We describe some of the methodological details underpinning the bivariate binary models implemented in the package and illustrate them by fitting interpretable models of different complexity on three datasets.
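The basic building block is a bivariate binary model: probit margins linked by a copula. Here is a hedged Python sketch of it. GJRM itself is an R package, and these functions are hypothetical. The Frank copula is one illustrative non-Gaussian choice, and eta1, eta2 stand in for the linear predictors. The sketch ignores selection bias and the flexible covariate effects the package supports.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function (probit inverse link)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def frank_cdf(u, v, theta):
    """Frank copula C(u, v; theta), theta != 0 (theta > 0: positive dependence)."""
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta

def cell_probs(eta1, eta2, theta):
    """Joint probabilities of (Y1, Y2) with probit margins linked by a Frank copula,
    where Y_j = 1 corresponds to the latent uniform U_j falling below p_j."""
    p1, p2 = norm_cdf(eta1), norm_cdf(eta2)
    p11 = frank_cdf(p1, p2, theta)
    return {(1, 1): p11, (1, 0): p1 - p11, (0, 1): p2 - p11, (0, 0): 1.0 - p1 - p2 + p11}

def loglik(y1, y2, eta1, eta2, theta):
    """Bivariate binary log-likelihood summed over observations."""
    return sum(math.log(cell_probs(e1, e2, theta)[(a, b)])
               for a, b, e1, e2 in zip(y1, y2, eta1, eta2))

probs = cell_probs(0.0, 0.0, 5.0)
```

With both marginal probabilities at 0.5 and theta = 5, P(Y1 = 1, Y2 = 1) comes out well above the independence value 0.25. That is the positive association the copula parameter encodes.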
Distributional Regression for Data Analysis
Flexible modeling of how an entire distribution changes with covariates is an
important yet challenging generalization of mean-based regression that has seen
growing interest over the past decades in both the statistics and machine
learning literature. This review outlines selected state-of-the-art statistical
approaches to distributional regression, complemented with alternatives from
machine learning. Topics covered include the similarities and differences
between these approaches, extensions, properties and limitations, estimation
procedures, and the availability of software. In view of the increasing
complexity and availability of large-scale data, this review also discusses the
scalability of traditional estimation methods, current trends, and open
challenges. Illustrations are provided using data on childhood malnutrition in
Nigeria and Australian electricity prices. Comment: Accepted for publication in Annual Review of Statistics and Its
Application
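Distributional regression lets every distribution parameter depend on covariates. Here is a minimal Python sketch of a Gaussian response whose mean and log standard deviation are both linear in one covariate, a GAMLSS-type model with linear rather than smooth predictors. The function name and the simulated data are hypothetical and are not taken from the review's illustrations.

```python
import numpy as np

def fit_gaussian_location_scale(x, y, n_iter=50):
    """Fit y ~ N(mu, sigma^2) with mu = b0 + b1*x and log(sigma) = g0 + g1*x by
    maximum likelihood, alternating a weighted least-squares step for the mean
    with a Fisher-scoring step for the log standard deviation."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS start for the mean
    gamma = np.zeros(X.shape[1])                  # start with sigma = 1
    for _ in range(n_iter):
        sigma2 = np.exp(2.0 * (X @ gamma))
        w = 1.0 / sigma2
        # Mean step: weighted least squares with weights 1 / sigma^2.
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        r2 = (y - X @ beta) ** 2
        # Scale step: score of log(sigma) is r^2/sigma^2 - 1; expected information is 2 X'X.
        gamma = gamma + np.linalg.solve(2.0 * (X.T @ X), X.T @ (r2 / sigma2 - 1.0))
    return beta, gamma

# Hypothetical simulated data: both the mean and the spread grow with x.
rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(0.0, 1.0, n)
y = 1.0 + 2.0 * x + np.exp(-1.0 + 1.0 * x) * rng.standard_normal(n)
beta_hat, gamma_hat = fit_gaussian_location_scale(x, y)
```

A mean-only regression would recover only beta. The distributional fit also recovers how the spread changes with x, through gamma.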
Generalized Additive Modeling For Multivariate Distributions
In this thesis, we develop tools to study the influence of predictors on multivariate distributions. We tackle the issue of conditional dependence modeling using generalized additive models, a natural extension of linear and generalized linear models allowing for smooth functions of the covariates. Compared to existing methods, the framework that we develop has two main advantages. First, it is completely flexible, in the sense that the dependence structure can vary with an arbitrary set of covariates in a parametric, nonparametric or semiparametric way. Second, it is both quick and numerically stable, which means that it is suitable for exploratory data analysis and stepwise model building. Starting from the bivariate case, we extend our framework to pair-copula constructions, opening new possibilities for further applied and methodological work. Our regression-like theory of dependence, built on conditional copulas and generalized additive models, is at the same time theoretically sound and practically useful.
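The sketch below lets the dependence parameter vary with a covariate, using a deliberately simple stand-in for the thesis's generalized additive calibration. It is a Gaussian copula whose correlation is rho(x) = tanh(alpha + beta*x), fitted by grid-search maximum likelihood on normal scores. The tanh (Fisher z) link, the linear predictor in place of a smooth function, the grid search, and all names are assumptions made for illustration.

```python
import numpy as np

def gauss_copula_logdens(z1, z2, rho):
    """Gaussian copula log-density evaluated at normal scores z = Phi^{-1}(u)."""
    one_m = 1.0 - rho ** 2
    return -0.5 * np.log(one_m) - (rho ** 2 * (z1 ** 2 + z2 ** 2) - 2.0 * rho * z1 * z2) / (2.0 * one_m)

def fit_conditional_rho(x, z1, z2, alphas, betas):
    """Grid-search maximum likelihood for the calibration rho(x) = tanh(alpha + beta * x)."""
    best_ll, best = -np.inf, (None, None)
    for a in alphas:
        for b in betas:
            ll = gauss_copula_logdens(z1, z2, np.tanh(a + b * x)).sum()
            if ll > best_ll:
                best_ll, best = ll, (a, b)
    return best

# Hypothetical simulated normal scores whose correlation increases with x.
rng = np.random.default_rng(1)
n = 4000
x = rng.uniform(0.0, 1.0, n)
rho_true = np.tanh(0.2 + 1.0 * x)
z1 = rng.standard_normal(n)
z2 = rho_true * z1 + np.sqrt(1.0 - rho_true ** 2) * rng.standard_normal(n)
alpha_hat, beta_hat = fit_conditional_rho(x, z1, z2, np.linspace(-1, 1, 41), np.linspace(-2, 2, 81))
```

The tanh link keeps rho(x) inside (-1, 1) for any predictor value. That is why some transformation of this kind is needed before a linear or additive predictor can drive the dependence parameter.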
Bivariate copula additive models for location, scale and shape
In generalized additive models for location, scale and shape (GAMLSS), the response distribution is not restricted to belong to the exponential family and all the model’s parameters can be made dependent on additive predictors that allow for several types of covariate effects (such as linear, non-linear, random and spatial effects). In many empirical situations, however, modeling two or more responses simultaneously conditional on some covariates can be of considerable relevance. The scope of GAMLSS is extended by introducing bivariate copula models with continuous margins for the GAMLSS class. The proposed computational tool permits the copula dependence and marginal distribution parameters to be estimated simultaneously, and each parameter to be modeled using an additive predictor. Simultaneous parameter estimation is achieved within a penalized likelihood framework using a trust region algorithm with integrated automatic multiple smoothing parameter selection. The introduced approach allows for straightforward inclusion of potentially any parametric marginal distribution and copula function. The models can be easily used via the copulaReg() function in the R package SemiParBIVProbit. The proposal is illustrated through two case studies and simulated data.
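Simultaneous estimation rests on Sklar's decomposition of the joint log-likelihood, log f(y1, y2) = log c(F1(y1), F2(y2); θ) + log f1(y1) + log f2(y2). A minimal Python sketch follows, assuming Gaussian margins, a Clayton copula, and scalar parameters. In the GAMLSS setting, each parameter would instead be an observation-specific value driven by its own additive predictor. The margins and copula are illustrative choices, and the function names are hypothetical and unrelated to `copulaReg()`.

```python
import math

def norm_cdf(y, mu, sigma):
    """Normal CDF F(y; mu, sigma) via the error function."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))

def norm_logpdf(y, mu, sigma):
    """Normal log-density log f(y; mu, sigma)."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def clayton_logpdf(u, v, theta):
    """Clayton copula log-density, theta > 0:
    c(u, v) = (1 + theta) (uv)^(-theta-1) (u^-theta + v^-theta - 1)^(-2 - 1/theta)."""
    s = u ** -theta + v ** -theta - 1.0
    return (math.log1p(theta) - (theta + 1.0) * (math.log(u) + math.log(v))
            - (2.0 + 1.0 / theta) * math.log(s))

def joint_loglik(y1, y2, mu1, s1, mu2, s2, theta):
    """Sklar decomposition: log c(F1(y1), F2(y2)) + log f1(y1) + log f2(y2), summed."""
    total = 0.0
    for a, b in zip(y1, y2):
        u, v = norm_cdf(a, mu1, s1), norm_cdf(b, mu2, s2)
        total += clayton_logpdf(u, v, theta) + norm_logpdf(a, mu1, s1) + norm_logpdf(b, mu2, s2)
    return total
```

The marginal and copula parameters enter one objective, so they can be maximized jointly rather than in two stages. As theta approaches 0, the Clayton term vanishes and the log-likelihood reduces to the sum of the marginal terms.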
On Graphical Models via Univariate Exponential Family Distributions
Undirected graphical models, or Markov networks, are a popular class of
statistical models, used in a wide variety of applications. Popular instances
of this class include Gaussian graphical models and Ising models. In many
settings, however, it might not be clear which subclass of graphical models to
use, particularly for non-Gaussian and non-categorical data. In this paper, we
consider a general sub-class of graphical models where the node-wise
conditional distributions arise from exponential families. This allows us to
derive multivariate graphical model distributions from univariate exponential
family distributions, such as the Poisson, negative binomial, and exponential
distributions. Our key contributions include a class of M-estimators to fit
these graphical model distributions and a rigorous statistical analysis showing
that these M-estimators recover the true graphical model structure exactly,
with high probability. We provide examples of genomic and proteomic networks
learned via instances of our class of graphical models derived from Poisson and
exponential distributions. Comment: Journal of Machine Learning Research
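In the spirit of the node-wise M-estimators described above, the sketch below fits one node-conditional distribution. It is an ℓ1-penalized Poisson regression of a count node on the remaining nodes, solved by proximal gradient descent with backtracking (our choice of solver). Repeating this for every node and combining the nonzero coefficients (for example, by an AND or OR rule) gives an estimated graph. Function names, the penalty level, and the simulated data are assumptions for illustration, not the authors' code. The data mimic a single node-conditional regression rather than a draw from a joint Poisson graphical model.

```python
import numpy as np

def poisson_nll(b0, w, y, X):
    """Average negative log-likelihood (up to constants) of a Poisson node-conditional."""
    eta = b0 + X @ w
    return np.mean(np.exp(eta) - y * eta)

def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def node_neighborhood(y, X, lam, n_iter=1000):
    """l1-penalized Poisson regression of one node (y) on the others (X) by proximal
    gradient with backtracking; the intercept b0 is not penalized."""
    n = len(y)
    b0, w, step = 0.0, np.zeros(X.shape[1]), 1.0
    for _ in range(n_iter):
        r = np.exp(b0 + X @ w) - y               # d(NLL)/d(eta), per observation
        f, g0, gw = poisson_nll(b0, w, y, X), r.mean(), X.T @ r / n
        while True:
            b0_new = b0 - step * g0
            w_new = soft_threshold(w - step * gw, step * lam)
            d0, dw = b0_new - b0, w_new - w
            # Accept once the quadratic upper bound holds (sufficient decrease).
            if poisson_nll(b0_new, w_new, y, X) <= f + g0 * d0 + gw @ dw + (d0 ** 2 + dw @ dw) / (2 * step):
                break
            step *= 0.5
        b0, w = b0_new, w_new
    return b0, w

# Hypothetical data: y is generated as a Poisson regression on node 0 only.
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(2000, 3)).astype(float)
y = rng.poisson(np.exp(0.2 + 0.5 * X[:, 0]))
b0_hat, w_hat = node_neighborhood(y, X, lam=0.2)
```

The ℓ1 penalty sets the coefficients of unrelated nodes exactly to zero. The support of `w_hat` is therefore read directly as the estimated neighborhood of the node.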
Dependence Modelling and Testing: Copula and Varying Coefficient Model with Missing Data
This thesis investigates three topics in theoretical econometrics: goodness-of-fit tests for copulas,
copula density estimators which preserve the copula property, and bias-correction for the naive
kernel local linear estimators in the two-sample varying coefficient model with missing data.
In the first topic a family of goodness-of-fit tests for copulas is proposed. The tests use generalizations
of the information matrix equality of White (1982). The asymptotic distribution of the
generalized tests is derived. In Monte Carlo simulations, the behavior of the new tests is compared
with several Cramér-von Mises-type tests, and the desired properties of the new tests are confirmed
in high dimensions. In the second topic, a semi-parametric copula density estimation procedure
that guarantees that the estimator is a genuine copula density is outlined. A simulation-based study
is constructed to examine the performance of the proposed copula density estimation method and
compare it with the leading copula density estimators in the literature. The method is also applied to
estimate copula densities in two empirical cases. The third topic shows that the naive kernel estimator
using matching data is not consistent in the two-sample varying coefficient model with missing
data. A bias-corrected consistent estimator is proposed and the asymptotic theory is discussed. A
simulation study is conducted to support the theoretical results.
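The goodness-of-fit tests in the first topic build on White's (1982) information matrix equality. Under correct specification, the expected Hessian of the log-likelihood and the expected outer product of the score sum to zero. Below is a minimal Python check for a one-parameter Gaussian copula, assuming finite-difference derivatives and simulated normal scores (both our choices). It illustrates the equality itself, not the thesis's generalized test statistics or their asymptotic distribution.

```python
import numpy as np

def gauss_copula_logdens(z1, z2, rho):
    """Gaussian copula log-density at normal scores z = Phi^{-1}(u)."""
    one_m = 1.0 - rho ** 2
    return -0.5 * np.log(one_m) - (rho ** 2 * (z1 ** 2 + z2 ** 2) - 2.0 * rho * z1 * z2) / (2.0 * one_m)

def information_matrix_terms(z1, z2, rho, h=1e-4):
    """Sample means of the Hessian and of the squared score for rho, by central differences."""
    f0 = gauss_copula_logdens(z1, z2, rho)
    fp = gauss_copula_logdens(z1, z2, rho + h)
    fm = gauss_copula_logdens(z1, z2, rho - h)
    score = (fp - fm) / (2.0 * h)
    hessian = (fp - 2.0 * f0 + fm) / h ** 2
    return hessian.mean(), (score ** 2).mean()

# Hypothetical correctly specified data: normal scores with correlation rho.
rng = np.random.default_rng(2)
rho = 0.5
n = 100_000
z1 = rng.standard_normal(n)
z2 = rho * z1 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
mean_hess, mean_score_sq = information_matrix_terms(z1, z2, rho)
# Under correct specification mean_hess + mean_score_sq should be close to 0, and
# mean_score_sq close to the Fisher information (1 + rho^2) / (1 - rho^2)^2.
```

An information-matrix-type test asks whether the sample version of this sum is too far from zero to be explained by sampling noise. A large discrepancy points to a misspecified copula.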