
Learning mixtures of structured distributions over discrete domains

Author: Chan Siu-on
Diakonikolas Ilias
Servedio Rocco A.
Sun Xiaorui
Publication date: 02/10/2012

Let $\mathfrak{C}$ be a class of probability distributions over the discrete domain $[n] = \{1,...,n\}$. We show that if $\mathfrak{C}$ satisfies a rather general condition -- essentially, that each distribution in $\mathfrak{C}$ can be well-approximated by a variable-width histogram with few bins -- then there is a highly efficient (both in terms of running time and sample complexity) algorithm that can learn any mixture of $k$ unknown distributions from $\mathfrak{C}$. We analyze several natural types of distributions over $[n]$, including log-concave, monotone hazard rate and unimodal distributions, and show that they have the required structural property of being well-approximated by a histogram with few bins. Applying our general algorithm, we obtain near-optimally efficient algorithms for all these mixture learning problems. Comment: preliminary full version of SODA'13 paper

arXiv.org e-Print Archive

CiteSeerX

Crossref

Semiparametric posterior limits

Author: Kleijn B. J. K.
Publication date: 21/05/2013

We review the Bayesian theory of semiparametric inference following Bickel and Kleijn (2012) and Kleijn and Knapik (2013). After an overview of efficiency in parametric and semiparametric estimation problems, we consider the Bernstein-von Mises theorem (see, e.g., Le Cam and Yang (1990)) and generalize it to (LAN) regular and (LAE) irregular semiparametric estimation problems. We formulate a version of the semiparametric Bernstein-von Mises theorem that does not depend on least-favourable submodels, thus bypassing the most restrictive condition in the presentation of Bickel and Kleijn (2012). The results are applied to the (regular) estimation of the linear coefficient in partial linear regression (with a Gaussian nuisance prior) and of the kernel bandwidth in a model of normal location mixtures (with a Dirichlet nuisance prior), as well as the (irregular) estimation of the boundary of the support of a monotone family of densities (with a Gaussian nuisance prior). Comment: 47 pp., 1 figure, submitted for publication. arXiv admin note: substantial text overlap with arXiv:1007.017

arXiv.org e-Print Archive

CiteSeerX

Change-point Problem and Regression: An Annotated Bibliography

Author: Asgharian Masoud
Khodadadi Ahmad
Publication venue: Collection of Biostatistics Research Archive
Publication date: 12/11/2008

The problems of identifying changes at unknown times and of estimating the location of changes in stochastic processes are referred to as the change-point problem or, in the Eastern literature, as disorder. The change-point problem, first introduced in the quality control context, has since developed into a fundamental problem in the areas of statistical control theory, stationarity of a stochastic process, estimation of the current position of a time series, testing and estimation of change in the patterns of a regression model, and most recently in the comparison and matching of DNA sequences in microarray data analysis. Numerous methodological approaches have been implemented in examining change-point models. Maximum-likelihood estimation, Bayesian estimation, isotonic regression, piecewise regression, quasi-likelihood and non-parametric regression are among the methods which have been applied to resolving challenges in change-point problems. Grid-searching approaches have also been used to examine the change-point problem. Statistical analysis of change-point problems depends on the method of data collection. If the data collection is ongoing until some random time, then the appropriate statistical procedure is called sequential. If, however, a large finite set of data is collected with the purpose of determining if at least one change-point occurred, then this may be referred to as non-sequential. Not surprisingly, both the former and the latter have a rich literature, with much of the earlier work focusing on sequential methods inspired by applications in quality control for industrial processes. In the regression literature, the change-point model is also referred to as two- or multiple-phase regression, switching regression, segmented regression, two-stage least squares (Shaban, 1980), or broken-line regression. The area of the change-point problem has been the subject of intensive research in the past half-century.
The subject has evolved considerably and found applications in many different areas. It seems practically impossible to summarize all of the research carried out over the past 50 years on the change-point problem. We have therefore confined ourselves to those articles on change-point problems which pertain to regression. The important branch of sequential procedures in change-point problems has been left out entirely. We refer the readers to the seminal review papers by Lai (1995, 2001). The so-called structural change models, which occupy a considerable portion of the research in the area of change-point, particularly among econometricians, have not been fully considered. We refer the reader to Perron (2005) for an updated review in this area. Articles on change-point in time series are considered only if the methodologies presented in the paper pertain to regression analysis.

Collection Of Biostatistics Research Archive

Estimation of extended mixed models using latent classes and latent processes: the R package lcmm

Author: Liquet Benoit
Philipps Viviane
Proust-Lima Cécile
Publication venue: 'Foundation for Open Access Statistic'
Publication date: 24/01/2016

The R package lcmm provides a series of functions to estimate statistical models based on linear mixed model theory. It includes the estimation of mixed models and latent class mixed models for Gaussian longitudinal outcomes (hlme), curvilinear and ordinal univariate longitudinal outcomes (lcmm) and curvilinear multivariate outcomes (multlcmm), as well as joint latent class mixed models (Jointlcmm) for a (Gaussian or curvilinear) longitudinal outcome and a time-to-event that can be possibly left-truncated, right-censored and defined in a competing setting. Maximum likelihood estimators are obtained using a modified Marquardt algorithm with strict convergence criteria based on the parameters and likelihood stability, and on the negativity of the second derivatives. The package also provides various post-fit functions including goodness-of-fit analyses, classification, plots, predicted trajectories, individual dynamic prediction of the event and predictive accuracy assessment. This paper constitutes a companion paper to the package by introducing each family of models, the estimation technique, some implementation details and giving examples through a dataset on cognitive aging.

arXiv.org e-Print Archive

Directory of Open Access Journals

Queensland University of Technology ePrints Archive

Journal of Statistical Software