Sublinear expectation linear regression
Nonlinear expectation, which includes sublinear expectation as a special case,
is a new and original framework of probability theory with potential
applications in several scientific fields, especially financial risk measurement
and management. Under the nonlinear expectation framework, however, the related
statistical models and statistical inference have not yet been well
established. The goal of this paper is to construct sublinear expectation
regression and investigate its statistical inference. First, a sublinear
expectation linear regression is defined and its identifiability is established.
Then, based on the representation theorem of sublinear expectation and the
newly defined model, several parameter estimators and model predictions are
proposed, and the asymptotic normality of the estimators and the minimax property
of the predictions are established. Furthermore, new methods are developed for
variable selection in high-dimensional models. Finally, simulation studies and
a real-life example are presented to illustrate the new models and
methodologies. All notions and methodologies developed here are essentially
different from classical ones and can be regarded as a foundation for general
nonlinear expectation statistics.
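The representation theorem mentioned above says that a sublinear expectation can be written as the supremum of a family of linear expectations, roughly E[X] = sup over theta of E_theta[X]. A minimal sketch of that idea, assuming (hypothetically) that the data arrive in groups and each group is generated by one unknown member of the family, estimates the upper expectation by the largest group mean. This "max-mean" estimate only illustrates the representation theorem; it is not the regression estimator proposed in the paper, and the function names are hypothetical.

```python
import numpy as np

def upper_expectation(groups):
    """Max-mean estimate of a sublinear expectation sup_theta E_theta[X].

    Hypothetical setup: each group of samples comes from one unknown
    distribution in the family, so each group mean estimates one E_theta[X].
    """
    return max(float(np.mean(g)) for g in groups)

def lower_expectation(groups):
    """Conjugate lower expectation -E[-X], i.e. the smallest group mean."""
    return -upper_expectation([-np.asarray(g, dtype=float) for g in groups])
```

Because the estimate is a maximum of means, it is subadditive (the upper expectation of X + Y never exceeds the sum of the individual upper expectations) and positively homogeneous, the two properties that separate a sublinear expectation from a linear one.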
Penalized Likelihood and Bayesian Function Selection in Regression Models
Challenging research in various fields has driven a wide range of
methodological advances in variable selection for regression models with
high-dimensional predictors. In comparison, selection of nonlinear functions in
models with additive predictors has been considered only more recently. Several
competing suggestions have been developed at about the same time and often do
not refer to each other. This article provides a state-of-the-art review of
function selection, focusing on penalized likelihood and Bayesian concepts,
and relates the various approaches to each other in a unified framework. In an
empirical comparison that also includes boosting, we evaluate several methods
through applications to simulated and real data, thereby providing some
guidance on their performance in practice.
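One widely used penalized-likelihood device for function selection is the group lasso: each nonlinear function is expanded in a basis, its coefficients form one block, and the penalty lam * sum_j ||beta_j||_2 can set an entire block, and hence the whole function, to zero. The sketch below shows only the block soft-thresholding (proximal) step used inside such fits; the basis expansion and the value of lam are assumptions, and the methods reviewed in the article differ in their details.

```python
import numpy as np

def group_soft_threshold(blocks, lam):
    """Proximal map of lam * sum_j ||b_j||_2: shrink each coefficient block as a unit.

    Blocks whose Euclidean norm is at most lam are set exactly to zero,
    which removes the corresponding function from the additive predictor.
    """
    out = []
    for b in blocks:
        b = np.asarray(b, dtype=float)
        norm = np.linalg.norm(b)
        # Shrink the whole block toward zero; never flip its direction.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append(scale * b)
    return out
```

For example, with lam = 1.0 the block [3, 4] (norm 5) is shrunk to [2.4, 3.2], while the block [0.3, 0.4] (norm 0.5) is removed entirely.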
Data-driven discovery of coordinates and governing equations
The discovery of governing equations from scientific data has the potential
to transform data-rich fields that lack well-characterized quantitative
descriptions. Advances in sparse regression are currently enabling the
tractable identification of both the structure and parameters of a nonlinear
dynamical system from data. The resulting models have the fewest terms
necessary to describe the dynamics, balancing model complexity with descriptive
ability, and thus promoting interpretability and generalizability. This
provides an algorithmic approach to Occam's razor for model discovery. However,
this approach fundamentally relies on an effective coordinate system in which
the dynamics have a simple representation. In this work, we design a custom
autoencoder to discover a coordinate transformation into a reduced space where
the dynamics may be sparsely represented. Thus, we simultaneously learn the
governing equations and the associated coordinate system. We demonstrate this
approach on several example high-dimensional dynamical systems with
low-dimensional behavior. The resulting modeling framework combines the
strengths of deep neural networks for flexible representation and sparse
identification of nonlinear dynamics (SINDy) for parsimonious models. It is the
first method of its kind to place the discovery of coordinates and models on an
equal footing. Comment: 25 pages, 6 figures; added acknowledgment
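The sparse regression step in SINDy is commonly solved by sequentially thresholded least squares: fit all candidate terms, zero out coefficients below a threshold, and refit on the surviving terms. A minimal NumPy sketch, assuming the coordinates are already known (the autoencoder that learns them is omitted) and that exact derivatives are available; the polynomial candidate library and the threshold value are illustrative choices.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares: sparse xi with dxdt ~= theta @ xi."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0  # prune negligible terms
        for k in range(dxdt.shape[1]):
            big = ~small[:, k]
            if big.any():
                # Refit each state equation using only its surviving terms.
                xi[big, k] = np.linalg.lstsq(theta[:, big], dxdt[:, k], rcond=None)[0]
    return xi

# Example: recover dx/dt = -2x from samples of x(t) = exp(-2t).
t = np.linspace(0.0, 2.0, 200)
x = np.exp(-2.0 * t)
dxdt = (-2.0 * x).reshape(-1, 1)
theta = np.column_stack([np.ones_like(x), x, x**2, x**3])  # candidate terms 1, x, x^2, x^3
xi = stlsq(theta, dxdt)
```

Only the coefficient of x survives, so the recovered model has the single term needed to describe the dynamics.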
Structured additive regression for multicategorical space-time data: A mixed model approach
In many practical situations, simple regression models suffer from the fact that the dependence of responses on covariates cannot be adequately described by a purely parametric predictor. For example, effects of continuous covariates may be nonlinear, or complex interactions between covariates may be present. A specific problem of space-time data is that observations are in general spatially and/or temporally correlated. Moreover, unobserved heterogeneity between individuals or units may be present. While there has been considerable recent work in this area on univariate response models, only limited attention has been given to models for multicategorical space-time data. We propose a general class of structured additive regression models (STAR) for multicategorical responses, allowing for a flexible semiparametric predictor. This class includes models for multinomial responses with unordered categories as well as models for ordinal responses. Nonlinear effects of continuous covariates, time trends and interactions between continuous covariates are modelled through Bayesian versions of penalized splines and flexible seasonal components. Spatial effects can be estimated based on Markov random fields, stationary Gaussian random fields or two-dimensional penalized splines. We present our approach from a Bayesian perspective, which allows all functions and effects to be treated within a unified general framework by assigning appropriate priors with different forms and degrees of smoothness. Inference is performed on the basis of a multicategorical linear mixed model representation. This can be viewed as posterior mode estimation and is closely related to penalized likelihood estimation in a frequentist setting. Variance components, corresponding to inverse smoothing parameters, are then estimated using restricted maximum likelihood. Numerically efficient algorithms allow computations even for fairly large data sets.
As a typical example, we present results from an analysis of data from a forest health survey.
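To make the mixed-model view concrete in the simplest Gaussian case: a penalized spline minimizes ||y - B b||^2 + lam * ||D b||^2, where D takes differences of adjacent coefficients, and in the mixed-model representation lam equals the error variance divided by the random-effect variance, which is why the variance components act as inverse smoothing parameters. The sketch below assumes an identity basis B (one coefficient per observation, a Whittaker-type smoother) and a fixed lam instead of estimating it by restricted maximum likelihood; the model described above uses spline bases and multicategorical responses, which are not reproduced here.

```python
import numpy as np

def penalized_smoother(y, lam, order=2):
    """Minimize ||y - f||^2 + lam * ||D f||^2, with D the order-th difference matrix.

    Assumes an identity basis (one coefficient per observation), so the
    solution is f = (I + lam * D'D)^(-1) y.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), n=order, axis=0)  # (n - order) x n difference operator
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)
```

Because second differences vanish on straight lines, a linear trend passes through unchanged for any lam; in the mixed-model decomposition this unpenalized part corresponds to the fixed effects, while the penalized remainder becomes the random effects.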
Geoadditive hazard regression for interval censored survival times
The Cox proportional hazards model is the most commonly used method for analyzing the impact of covariates on continuous survival times. In its classical form, the Cox model was introduced in the setting of right-censored observations. In practice, however, other sampling schemes are frequently encountered, so extensions allowing for interval and left censoring or left truncation are clearly desirable. Furthermore, many applications require more flexible modeling of covariate information than the usual linear predictor. For example, effects of continuous covariates are likely to be nonlinear, and spatial information should be included appropriately. Further extensions should allow for time-varying effects of covariates or for covariates that are themselves time-varying. Such models relax the assumption of proportional hazards. We propose a regression model for the hazard rate that combines and extends the above-mentioned features on the basis of a unifying Bayesian model formulation. Nonlinear and time-varying effects as well as the baseline hazard rate are modeled by penalized splines. Spatial effects can be included based on either Markov random fields or stationary Gaussian random fields. The model allows for arbitrary combinations of left, right and interval censoring as well as left truncation. Estimation is based on a reparameterisation of the model as a variance components mixed model. The variance parameters, corresponding to inverse smoothing parameters, can then be estimated with an approximate marginal likelihood approach. As an application, we present an analysis of childhood mortality in Nigeria, where the interval censoring framework also makes it possible to deal with heaped survival times caused by memory effects. In a simulation study, we investigate the effect of ignoring the impact of interval-censored observations.
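All the censoring schemes listed above fit one likelihood formula. If S(t) = exp(-H(t)) is the survivor function with cumulative hazard H, an observation known to fall in (L, R] contributes S(L) - S(R); right censoring at L is the case R = infinity, left censoring at R is the case L = 0, and left truncation at time T divides the contribution by S(T). The sketch below evaluates that log-contribution for a user-supplied cumulative hazard. In the geoadditive model, H would be built from the penalized-spline baseline and the structured predictor, which are not reproduced here; the constant hazard in the example is a hypothetical choice, and exactly observed event times (which need the density) are not handled.

```python
import math

def interval_loglik(cum_hazard, left, right=None, trunc=0.0):
    """Log-likelihood contribution of one survival time T with left < T <= right.

    right=None means right-censored at `left`; left=0 gives left censoring at `right`.
    A left-truncation time `trunc` conditions on survival beyond `trunc`.
    """
    if right is not None and right <= left:
        raise ValueError("need left < right; exact event times need the density")
    s_left = math.exp(-cum_hazard(left))
    s_right = 0.0 if right is None else math.exp(-cum_hazard(right))
    s_trunc = math.exp(-cum_hazard(trunc))
    return math.log(s_left - s_right) - math.log(s_trunc)

def H(t):
    """Hypothetical cumulative hazard for a constant hazard of 1 per unit time."""
    return t
```

With this constant hazard, an interval (1, 2] observation truncated at 1 contributes the same as an interval (0, 1] observation without truncation, reflecting the memoryless property of the exponential distribution.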
Probabilistic Inference from Arbitrary Uncertainty using Mixtures of Factorized Generalized Gaussians
This paper presents a general and efficient framework for probabilistic
inference and learning from arbitrary uncertain information. It exploits the
computational properties of finite mixture models, conjugate families and
factorization. Both the joint probability density of the variables and the
likelihood function of the (objective or subjective) observation are
approximated by a special mixture model, in such a way that any desired
conditional distribution can be directly obtained without numerical
integration. We have developed an extended version of the expectation
maximization (EM) algorithm to estimate the parameters of mixture models from
uncertain training examples (indirect observations). As a consequence, any
piece of exact or uncertain information about both input and output values is
consistently handled in the inference and learning stages. This ability,
extremely useful in certain situations, is not found in most alternative
methods. The proposed framework is formally justified from standard
probabilistic principles and illustrative examples are provided in the fields
of nonparametric pattern classification, nonlinear regression and pattern
completion. Finally, experiments on a real application and comparative results
on standard databases provide empirical evidence of the utility of the method
in a wide range of applications.
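Because each mixture component factorizes over the variables, conditioning on observed inputs x only reweights the components: the conditional distribution of y is again a mixture whose weights are proportional to pi_k times the component density at x, so its mean is a weighted average of the component means and no numerical integration is needed. A minimal sketch with ordinary diagonal Gaussians (a simplification of the generalized Gaussians in the paper) and hypothetical parameter names; it covers exact inputs only, not the uncertain observations handled by the extended EM algorithm.

```python
import numpy as np

def conditional_mean(x, weights, mu_x, sd_x, mu_y):
    """Mean of y given x under a mixture of factorized (diagonal) Gaussians.

    weights: (K,) mixing proportions; mu_x, sd_x: (K, d) input means/std devs;
    mu_y: (K,) output means. Returns (E[y | x], posterior component weights).
    """
    x = np.asarray(x, dtype=float)
    # Log density of x under each component: a sum of independent 1-D normals.
    log_px = -0.5 * np.sum(((x - mu_x) / sd_x) ** 2 + np.log(2 * np.pi * sd_x ** 2), axis=1)
    log_w = np.log(weights) + log_px
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()
    # Within a component y is independent of x, so E[y | x] = sum_k w_k * mu_y[k].
    return float(w @ mu_y), w

# Two hypothetical components centred at x = -1 and x = +1.
weights = np.array([0.5, 0.5])
mu_x = np.array([[-1.0], [1.0]])
sd_x = np.ones((2, 1))
mu_y = np.array([0.0, 10.0])
m0, w0 = conditional_mean([0.0], weights, mu_x, sd_x, mu_y)
m1, w1 = conditional_mean([1.0], weights, mu_x, sd_x, mu_y)
```

At x = 0 both components are equally plausible and the prediction is their average; at x = 1 the second component dominates and the prediction moves toward its mean.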