31,049 research outputs found

    Population Synthesis via k-Nearest Neighbor Crossover Kernel

    The recent development of multi-agent simulations brings about a need for population synthesis. It is a task of reconstructing the entire population from a sampling survey of limited size (1% or so), supplying the initial conditions from which simulations begin. This paper presents a new kernel density estimator for this task. Our method is an analogue of the classical Breiman-Meisel-Purcell estimator, but employs novel techniques that harness the huge degree of freedom which is required to model high-dimensional nonlinearly correlated datasets: the crossover kernel, the k-nearest neighbor restriction of the kernel construction set and the bagging of kernels. The performance as a statistical estimator is examined through real and synthetic datasets. We provide an "optimization-free" parameter selection rule for our method, a theory of how our method works and a computational cost analysis. To demonstrate the usefulness as a population synthesizer, our method is applied to a household synthesis task for an urban micro-simulator.
    Comment: 10 pages, 4 figures, IEEE International Conference on Data Mining (ICDM) 201

    Improvements in Maximum Likelihood Estimators of Truncated Normal Samples with Prior Knowledge of σ

    Researchers analyzing historical data on human stature have long sought an estimator that performs well in truncated-normal samples. This paper reviews that search, focusing on two currently widespread procedures: truncated least squares (TLS) and truncated maximum likelihood (TML). The first suffers from bias. The second suffers in practical application from excessive variability. A simple procedure is developed to convert TLS truncated means into estimates of the underlying population means, assuming the contemporary population standard deviation. This procedure is shown to be equivalent to restricted TML estimation. Simulation methods are used to establish the mean squared error performance characteristics of the restricted and unconstrained TML estimators in relation to several population and sample parameters. The results provide general insight into the bias-precision tradeoff in restricted estimation and a specific practical guide to optimal estimator choice for researchers in anthropometrics.
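    The abstract does not spell out the conversion procedure, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes left truncation at a known threshold `tau` (for example a minimum height requirement) and a known standard deviation `sigma`, and it inverts the standard formula for the mean of a left-truncated normal, E[X | X > tau] = mu + sigma * phi(alpha) / (1 - Phi(alpha)) with alpha = (tau - mu) / sigma, by bisection. All function and parameter names are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_sf(z):
    """Standard normal upper-tail probability 1 - Phi(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def truncated_mean(mu, sigma, tau):
    """Mean of N(mu, sigma^2) conditional on X > tau (left truncation assumed)."""
    alpha = (tau - mu) / sigma
    return mu + sigma * normal_pdf(alpha) / normal_sf(alpha)


def population_mean_from_truncated(observed_mean, sigma, tau, tol=1e-10):
    """Solve truncated_mean(mu, sigma, tau) = observed_mean for mu by bisection.

    The truncated mean is increasing in mu and always exceeds mu, so
    observed_mean itself is a valid upper bracket for the root.
    """
    if observed_mean <= tau:
        raise ValueError("a left-truncated sample mean must exceed tau")
    hi = observed_mean
    lo = observed_mean - sigma
    # Walk the lower bracket down until the implied truncated mean is too small.
    while truncated_mean(lo, sigma, tau) > observed_mean:
        lo -= sigma
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if truncated_mean(mid, sigma, tau) > observed_mean:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

    Given a truncated sample mean, an assumed `sigma`, and the truncation point, `population_mean_from_truncated` returns the implied untruncated mean. Fixing `sigma` in advance is what trades some bias (if the assumed value is off) for lower variance, which is the tradeoff the abstract describes.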

    A likelihood-based analysis for relaxing the exclusion restriction in randomized experiments with imperfect compliance

    This paper examines the problem of relaxing the exclusion restriction for the evaluation of causal effects in randomized experiments with imperfect compliance. The exclusion restriction is a relevant assumption for identifying causal effects by the nonparametric instrumental variables technique, in which the template of a randomized experiment with imperfect compliance represents a natural parametric extension. However, the full relaxation of the exclusion restriction yields likelihood functions characterized by the presence of mixtures of distributions. This complicates a likelihood-based analysis because it implies partially identified models and more than one maximum likelihood point. We consider the model identifiability when the outcome distributions of various compliance states are in the same parametric class. A two-step estimation procedure based on detecting the root closest to the method of moments estimate of the parameter vector is proposed and analyzed in detail under normally distributed outcomes. An economic example with real data on return to schooling concludes the paper.
    Keywords: compliers, exclusion restriction, mixture distributions, return to schooling.

    Model Selection for Gaussian Mixture Models

    This paper is concerned with an important issue in finite mixture modelling, the selection of the number of mixing components. We propose a new penalized likelihood method for model selection of finite multivariate Gaussian mixture models. The proposed method is shown to be statistically consistent in determining the number of components. A modified EM algorithm is developed to simultaneously select the number of components and to estimate the mixing weights, i.e. the mixing probabilities, and unknown parameters of Gaussian distributions. Simulations and a real data analysis are presented to illustrate the performance of the proposed method.

    Intervention analysis with state-space models to estimate discontinuities due to a survey redesign

    An important quality aspect of official statistics produced by national statistical institutes is comparability over time. To maintain uninterrupted time series, surveys conducted by national statistical institutes are often kept unchanged as long as possible. To improve the quality or efficiency of a survey process, however, it remains inevitable to adjust methods or redesign this process from time to time. Adjustments in the survey process generally affect survey characteristics such as response bias and therefore have a systematic effect on the parameter estimates of a sample survey. Therefore, it is important that the effects of a survey redesign on the estimated series are explained and quantified. In this paper a structural time series model is applied to estimate discontinuities in series of the Dutch survey on social participation and environmental consciousness due to a redesign of the underlying survey process.
    Comment: Published in at http://dx.doi.org/10.1214/09-AOAS305 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org