Learning mixtures of structured distributions over discrete domains
Let C be a class of probability distributions over the discrete
domain [n] = {1, ..., n}. We show that if C satisfies a rather
general condition -- essentially, that each distribution in C can
be well-approximated by a variable-width histogram with few bins -- then there
is a highly efficient (both in terms of running time and sample complexity)
algorithm that can learn any mixture of k unknown distributions from C.
We analyze several natural types of distributions over [n], including
log-concave, monotone hazard rate and unimodal distributions, and show that
they have the required structural property of being well-approximated by a
histogram with few bins. Applying our general algorithm, we obtain
near-optimally efficient algorithms for all these mixture learning problems.
Comment: preliminary full version of SODA'13 paper
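The structural property in this abstract (approximation by a variable-width histogram with few bins) can be illustrated with a small sketch. This is not the paper's learning algorithm: the helper names `flatten` and `total_variation`, the geometric-style example distribution, and the bin boundaries are all assumptions chosen for illustration. The sketch only shows that, for a monotone distribution, a few well-placed bins of unequal width can approximate it better than the same number of equal-width bins.

```python
def flatten(pmf, breakpoints):
    """Replace pmf by its average value on each bin [b_i, b_{i+1})."""
    flat = [0.0] * len(pmf)
    for lo, hi in zip(breakpoints[:-1], breakpoints[1:]):
        mass = sum(pmf[lo:hi])
        for i in range(lo, hi):
            flat[i] = mass / (hi - lo)
    return flat


def total_variation(p, q):
    """Total variation distance between two pmfs on the same domain."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical example: a monotone decreasing pmf on 16 points.
n = 16
weights = [0.5 ** i for i in range(n)]
total = sum(weights)
pmf = [w / total for w in weights]

# Four bins each: narrow bins where the pmf changes quickly vs. equal widths.
variable_bins = [0, 1, 2, 4, 16]
equal_bins = [0, 4, 8, 12, 16]

tv_variable = total_variation(pmf, flatten(pmf, variable_bins))
tv_equal = total_variation(pmf, flatten(pmf, equal_bins))
print(f"variable-width TV: {tv_variable:.4f}, equal-width TV: {tv_equal:.4f}")
```

Flattening preserves the total mass of each bin, so the histogram is itself a valid distribution over the same domain.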
Semiparametric posterior limits
We review the Bayesian theory of semiparametric inference following Bickel
and Kleijn (2012) and Kleijn and Knapik (2013). After an overview of efficiency
in parametric and semiparametric estimation problems, we consider the
Bernstein-von Mises theorem (see, e.g., Le Cam and Yang (1990)) and generalize
it to (LAN) regular and (LAE) irregular semiparametric estimation problems. We
formulate a version of the semiparametric Bernstein-von Mises theorem that does
not depend on least-favourable submodels, thus bypassing the most restrictive
condition in the presentation of Bickel and Kleijn (2012). The results are
applied to the (regular) estimation of the linear coefficient in partial linear
regression (with a Gaussian nuisance prior) and of the kernel bandwidth in a
model of normal location mixtures (with a Dirichlet nuisance prior), as well as
the (irregular) estimation of the boundary of the support of a monotone family
of densities (with a Gaussian nuisance prior).
Comment: 47 pp., 1 figure, submitted for publication. arXiv admin note:
substantial text overlap with arXiv:1007.017
Change-point Problem and Regression: An Annotated Bibliography
The problems of identifying changes at unknown times and of estimating the location of changes in stochastic processes are referred to as the change-point problem or, in the Eastern literature, as "disorder".
The change-point problem, first introduced in the quality control context, has since developed into a fundamental problem in the areas of statistical control theory, stationarity of a stochastic process, estimation of the current position of a time series, testing and estimation of change in the patterns of a regression model, and most recently in the comparison and matching of DNA sequences in microarray data analysis.
Numerous methodological approaches have been implemented in examining change-point models. Maximum-likelihood estimation, Bayesian estimation, isotonic regression, piecewise regression, quasi-likelihood and non-parametric regression are among the methods which have been applied to resolving challenges in change-point problems. Grid-searching approaches have also been used to examine the change-point problem.
Statistical analysis of change-point problems depends on the method of data collection. If the data collection is ongoing until some random time, then the appropriate statistical procedure is called sequential. If, however, a large finite set of data is collected with the purpose of determining if at least one change-point occurred, then this may be referred to as non-sequential. Not surprisingly, both the former and the latter have a rich literature with much of the earlier work focusing on sequential methods inspired by applications in quality control for industrial processes. In the regression literature, the change-point model is also referred to as two- or multiple-phase regression, switching regression, segmented regression, two-stage least squares (Shaban, 1980), or broken-line regression.
The change-point problem has been the subject of intensive research in the past half-century. The subject has evolved considerably and found applications in many different areas. It would be nearly impossible to summarize all of the research carried out over the past 50 years on the change-point problem. We have therefore confined ourselves to those articles on change-point problems which pertain to regression.
The important branch of sequential procedures in change-point problems has been left out entirely. We refer the reader to the seminal review papers by Lai (1995, 2001). The so-called structural change models, which occupy a considerable portion of the research in the area of change-point, particularly among econometricians, have not been fully considered. We refer the reader to Perron (2005) for an updated review in this area. Articles on change-point in time series are considered only if the methodologies presented in the paper pertain to regression analysis.
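The grid-searching approach mentioned in this abstract can be sketched for the simplest non-sequential case, a two-phase linear regression with a single change-point. This is a minimal illustration and not a method taken from any article in the bibliography: the function names, the `min_seg` parameter, and the synthetic data are assumptions. Each candidate split is scored by the combined residual sum of squares of two separately fitted least-squares lines, and the split with the smallest total is returned.

```python
def fit_line_sse(xs, ys):
    """Fit y = a + b*x by ordinary least squares; return the residual sum of squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def grid_search_change_point(xs, ys, min_seg=3):
    """Try every split index k and keep the one with the smallest total SSE.

    Observations 0..k-1 form the first phase and k..n-1 the second.
    min_seg (a hypothetical tuning parameter) keeps both phases estimable.
    """
    best_k, best_sse = None, float("inf")
    for k in range(min_seg, len(xs) - min_seg + 1):
        sse = fit_line_sse(xs[:k], ys[:k]) + fit_line_sse(xs[k:], ys[k:])
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k, best_sse


# Synthetic data (assumed for illustration): the regime changes at index 10.
xs = [float(i) for i in range(20)]
noise = [0.05 * ((7 * i) % 5 - 2) for i in range(20)]  # small deterministic noise
ys = [(1.0 + 0.5 * x if x < 10 else 20.0 - x) + e for x, e in zip(xs, noise)]

k_hat, sse_hat = grid_search_change_point(xs, ys)
print(f"estimated change-point index: {k_hat}, total SSE: {sse_hat:.4f}")
```

The search costs one pair of regressions per candidate split, which is acceptable for small samples; for several change-points the grid grows combinatorially and other strategies are normally preferred.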
Estimation of extended mixed models using latent classes and latent processes: the R package lcmm
The R package lcmm provides a series of functions to estimate statistical
models based on linear mixed model theory. It includes the estimation of mixed
models and latent class mixed models for Gaussian longitudinal outcomes (hlme),
curvilinear and ordinal univariate longitudinal outcomes (lcmm) and curvilinear
multivariate outcomes (multlcmm), as well as joint latent class mixed models
(Jointlcmm) for a (Gaussian or curvilinear) longitudinal outcome and a
time-to-event that can be possibly left-truncated, right-censored and defined in
a competing risks setting. Maximum likelihood estimators are obtained using a modified
Marquardt algorithm with strict convergence criteria based on the parameters
and likelihood stability, and on the negativity of the second derivatives. The
package also provides various post-fit functions including goodness-of-fit
analyses, classification, plots, predicted trajectories, individual dynamic
prediction of the event and predictive accuracy assessment. This paper
constitutes a companion paper to the package by introducing each family of
models, the estimation technique, some implementation details and giving
examples through a dataset on cognitive aging.
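The estimation strategy described in this abstract, a Marquardt-type iteration stopped only when the parameters, the likelihood, and the derivatives all indicate convergence, can be illustrated with a toy sketch. This is not the lcmm implementation and does not use the package: the one-parameter Poisson model, the function names, the damping rule, and the thresholds `eps_par`, `eps_lik` and `eps_deriv` are all assumptions made for illustration. The derivative criterion here requires a negative second derivative and a small curvature-scaled gradient.

```python
import math


def loglik(theta, ys):
    """Poisson log-likelihood in theta = log(rate), up to an additive constant."""
    return sum(ys) * theta - len(ys) * math.exp(theta)


def derivs(theta, ys):
    """First and second derivatives of the log-likelihood with respect to theta."""
    grad = sum(ys) - len(ys) * math.exp(theta)
    hess = -len(ys) * math.exp(theta)
    return grad, hess


def fit(ys, theta=0.0, eps_par=1e-8, eps_lik=1e-8, eps_deriv=1e-8, max_iter=100):
    ll = loglik(theta, ys)
    for _ in range(max_iter):
        grad, hess = derivs(theta, ys)
        damping = 0.0
        while True:
            # Damped Newton step; damping is inflated until the step does not
            # decrease the log-likelihood.
            step = grad / (-hess + damping)
            new_ll = loglik(theta + step, ys)
            if new_ll >= ll:
                break
            damping = max(1.0, 10.0 * damping)
        par_stable = abs(step) < eps_par
        lik_stable = abs(new_ll - ll) < eps_lik
        theta, ll = theta + step, new_ll
        grad, hess = derivs(theta, ys)
        # Derivative criterion: negative curvature and a small scaled gradient.
        deriv_ok = hess < 0 and grad * grad / -hess < eps_deriv
        if par_stable and lik_stable and deriv_ok:
            return theta, True
    return theta, False


counts = [2, 3, 1, 4, 0, 2, 5, 3]  # hypothetical data
theta_hat, converged = fit(counts)
print(f"estimated rate: {math.exp(theta_hat):.4f}, converged: {converged}")
```

Requiring all three criteria at once guards against stopping on a plateau where the parameters barely move but the gradient is still far from zero.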