
    Small area estimation of general parameters with application to poverty indicators: A hierarchical Bayes approach

    Poverty maps are used to aid important political decisions such as the allocation of development funds by governments and international organizations. Those decisions should be based on the most accurate poverty figures. However, reliable poverty figures are often not available at fine geographical levels or for particular at-risk population subgroups, due to the sample size limitations of current national surveys. These surveys cannot adequately cover all the desired areas or population subgroups, and therefore models relating the different areas are needed to "borrow strength" from area to area. In particular, the Spanish Survey on Income and Living Conditions (SILC) produces national poverty estimates but cannot provide poverty estimates by Spanish provinces due to the poor precision of direct estimates, which use only the province-specific data. This also raises the ethical question of whether poverty is more severe for women than for men in a given province. We develop a hierarchical Bayes (HB) approach for poverty mapping in Spanish provinces by gender that overcomes the small province sample size problem of the SILC. The proposed approach has a wide scope of application because it can be used to estimate general nonlinear parameters. We use a Bayesian version of the nested error regression model in which Markov chain Monte Carlo procedures and the convergence monitoring therein are avoided. A simulation study reveals good frequentist properties of the HB approach. The resulting poverty maps indicate that poverty, both in frequency and intensity, is localized mostly in the southern and western provinces and is more acute for women than for men in most of the provinces.
    Comment: Published at http://dx.doi.org/10.1214/13-AOAS702 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
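The abstract's central idea of "borrowing strength" across areas can be illustrated without the authors' full HB nested error model. The sketch below uses a simple area-level shrinkage estimator with known variances (all parameters and the simulation setup are illustrative assumptions, not taken from the paper): each direct estimate is pulled toward the overall mean in proportion to how noisy it is.

```python
import numpy as np

rng = np.random.default_rng(0)

m = 200                          # number of small areas (e.g. province-by-gender cells)
sigma_u2 = 0.5                   # between-area variance
psi = rng.uniform(0.5, 2.0, m)   # sampling variances of the direct estimators

theta = rng.normal(0.0, np.sqrt(sigma_u2), m)     # true area effects
direct = theta + rng.normal(0.0, np.sqrt(psi))    # noisy direct estimates

# Shrink each direct estimate toward the overall mean; the weight depends
# on how reliable the direct estimate is relative to between-area variation.
B = sigma_u2 / (sigma_u2 + psi)
shrunk = B * direct + (1.0 - B) * direct.mean()

mse_direct = np.mean((direct - theta) ** 2)
mse_shrunk = np.mean((shrunk - theta) ** 2)
print(mse_direct, mse_shrunk)  # the shrinkage estimator typically has smaller MSE
```

Areas with imprecise direct estimates (large `psi`) get shrunk harder, which is exactly why model-based estimates remain usable where province-level sample sizes are too small for direct estimation.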

    Empirical Likelihood for Regression Discontinuity Design

    This paper proposes empirical likelihood based inference methods for causal effects identified from regression discontinuity designs. We consider both the sharp and fuzzy regression discontinuity designs and treat the regression functions as nonparametric. The proposed inference procedures do not require asymptotic variance estimation and the confidence sets have natural shapes, unlike the conventional Wald-type method. These features are illustrated by simulations and an empirical example which evaluates the effect of class size on pupils' scholastic achievements. Bandwidth selection methods, higher-order properties, and extensions to incorporate additional covariates and parametric functional forms are also discussed.
    Keywords: empirical likelihood; nonparametric methods; regression discontinuity design; treatment effect
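Empirical likelihood in its simplest form, for a scalar mean rather than the paper's RDD estimands, shows where the "natural shape" of the confidence set comes from: the set is defined directly by the likelihood ratio, with no variance estimate. A minimal sketch (the profiling over the Lagrange multiplier is standard; the data are synthetic):

```python
import numpy as np
from scipy.optimize import brentq

def el_log_ratio(x, mu):
    """-2 log empirical likelihood ratio for H0: E[X] = mu."""
    z = x - mu
    if z.min() >= 0 or z.max() <= 0:
        return np.inf  # mu lies outside the convex hull of the data
    # Solve sum z_i / (1 + lam * z_i) = 0 for the Lagrange multiplier lam.
    lo = -1.0 / z.max() + 1e-10
    hi = -1.0 / z.min() - 1e-10
    lam = brentq(lambda l: np.sum(z / (1.0 + l * z)), lo, hi)
    return 2.0 * np.sum(np.log1p(lam * z))

rng = np.random.default_rng(1)
x = rng.exponential(2.0, size=200)

# -2 log R(mu) is asymptotically chi^2_1 under H0, so the 95% confidence
# set is {mu : el_log_ratio(x, mu) <= 3.84}; no variance estimate needed.
print(el_log_ratio(x, x.mean()))  # zero at the sample mean
print(el_log_ratio(x, 3.0))
```

The confidence set inherits asymmetry from the data (here, skewed exponential draws), which is the "natural shape" property the abstract contrasts with symmetric Wald intervals.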

    Identifying Real Estate Opportunities using Machine Learning

    The real estate market is exposed to many fluctuations in prices because of existing correlations with many variables, some of which cannot be controlled or might even be unknown. Housing prices can increase rapidly (or, in some cases, also drop very fast), yet the numerous listings available online where houses are sold or rented are not likely to be updated that often. In some cases, individuals interested in selling a house (or apartment) might include it in some online listing and forget to update the price. In other cases, some individuals might deliberately set a price below the market price in order to sell the home faster, for various reasons. In this paper, we aim at developing a machine learning application that identifies opportunities in the real estate market in real time, i.e., houses that are listed with a price substantially below the market price. This program can be useful for investors interested in the housing market. We focus on a use case considering real estate assets located in the Salamanca district in Madrid (Spain) and listed in the most relevant Spanish online site for home sales and rentals. The application is formally implemented as a regression problem that tries to estimate the market price of a house given features retrieved from public online listings. For building this application, we have performed a feature engineering stage in order to discover relevant features that allow for attaining a high predictive performance. Several machine learning algorithms have been tested, including regression trees, k-nearest neighbors, support vector machines and neural networks, identifying advantages and drawbacks of each of them.
    Comment: 24 pages, 13 figures, 5 tables
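The model-comparison setup the abstract describes (regression trees, k-nearest neighbors, support vector machines and neural networks competing on a price-regression task) can be sketched with scikit-learn. The features and data here are synthetic stand-ins, not the paper's engineered listing features:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor

# Stand-in for engineered listing features (size, rooms, floor, ...).
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)

models = {
    "tree": DecisionTreeRegressor(max_depth=6, random_state=0),
    "knn": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
    "svm": make_pipeline(StandardScaler(), SVR(C=10.0)),
    "mlp": make_pipeline(StandardScaler(),
                         MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                                      random_state=0)),
}

# Cross-validated R^2 for each candidate regressor.
scores = {name: cross_val_score(m, X, y, cv=5, scoring="r2").mean()
          for name, m in models.items()}

for name, r2 in scores.items():
    print(f"{name}: R^2 = {r2:.3f}")
```

Once a regressor is chosen, a listing whose asking price sits far below the model's estimated market price is flagged as a candidate "opportunity", which is the application the paper targets.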

    Assessing the Number of Components in Mixture Models: a Review.

    Despite the widespread application of finite mixture models, the decision of how many classes are required to adequately represent the data is, according to many authors, an important but unsolved issue. This work aims to review, describe and organize the available approaches designed to help the selection of the adequate number of mixture components (including Monte Carlo test procedures, information criteria and classification-based criteria); we also provide some published simulation results about their relative performance, with the purpose of identifying the scenarios where each criterion is more effective.
    Keywords: finite mixture; number of mixture components; information criteria; simulation studies
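Of the families of criteria the review covers, information criteria are the easiest to demonstrate. A minimal sketch of BIC-based selection of the number of Gaussian mixture components, on synthetic one-dimensional data with two well-separated clusters:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two well-separated Gaussian clusters in 1D.
x = np.concatenate([rng.normal(-3, 1, 300),
                    rng.normal(3, 1, 300)]).reshape(-1, 1)

# Fit mixtures with 1..5 components and record the BIC of each;
# n_init restarts guard against poor local optima of EM.
bics = []
for k in range(1, 6):
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(x)
    bics.append(gm.bic(x))

best_k = int(np.argmin(bics)) + 1
print(best_k)  # BIC should favor 2 components here
```

BIC penalizes the extra parameters of each additional component, so it stops adding components once the likelihood gain no longer justifies the added complexity; the review's point is that how well this works depends on the scenario (separation, sample size, component shapes).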

    Automated design of robust discriminant analysis classifier for foot pressure lesions using kinematic data

    In recent years, the use of motion tracking systems for acquisition of functional biomechanical gait data has received increasing interest due to the richness and accuracy of the measured kinematic information. However, costs frequently restrict the number of subjects employed, and this makes the dimensionality of the collected data far higher than the available samples. This paper applies discriminant analysis algorithms to the classification of patients with different types of foot lesions, in order to establish an association between foot motion and lesion formation. With primary attention to small sample size situations, we compare different types of Bayesian classifiers and evaluate their performance with various dimensionality reduction techniques for feature extraction, as well as search methods for selection of raw kinematic variables. Finally, we propose a novel integrated method which fine-tunes the classifier parameters and selects the most relevant kinematic variables simultaneously. Performance comparisons are carried out using robust resampling techniques such as Bootstrap .632+ and k-fold cross-validation. Results from experiments with subjects suffering from pathological plantar hyperkeratosis show that the proposed method can lead to ~96% correct classification rates with less than 10% of the original features.
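The core difficulty the abstract describes, far more features than samples, and the remedy, dimensionality reduction before a discriminant classifier evaluated with k-fold cross-validation, can be sketched as follows. The data are synthetic and the pipeline is illustrative (PCA + LDA stands in for the paper's compared techniques):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Small-sample, high-dimensional setting: more features than samples,
# as with kinematic gait variables and few subjects.
X, y = make_classification(n_samples=60, n_features=100, n_informative=10,
                           class_sep=2.0, random_state=0)

# Reduce dimensionality before the discriminant classifier so its
# covariance estimate is well conditioned; fitting LDA directly on
# 100 features with 60 samples would be degenerate.
clf = make_pipeline(PCA(n_components=10), LinearDiscriminantAnalysis())
acc = cross_val_score(clf, X, y, cv=5).mean()
print(f"5-fold CV accuracy: {acc:.2f}")
```

Placing PCA inside the pipeline matters: it is refit on each training fold, so the cross-validated accuracy is not optimistically biased by reducing dimensionality on the full data.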

    Introduction

    The task "Digitalizacja i udostępnienie w Cyfrowym Repozytorium Uniwersytetu Ɓódzkiego kolekcji czasopism naukowych wydawanych przez Uniwersytet Ɓódzki" (digitization and publication in the Digital Repository of the University of Ɓódź of a collection of scholarly journals published by the University of Ɓódź), no. 885/P-DUN/2014, was co-financed from the funds of the Ministry of Science and Higher Education (MNiSW) as part of activities disseminating science.

    Fuzzy Supernova Templates II: Parameter Estimation

    Wide field surveys will soon be discovering Type Ia supernovae (SNe) at rates of several thousand per year. Spectroscopic follow-up can only scratch the surface for such enormous samples, so these extensive data sets will only be useful to the extent that they can be characterized by the survey photometry alone. In a companion paper (Rodney and Tonry, 2009) we introduced the SOFT method for analyzing SNe using direct comparison to template light curves, and demonstrated its application for photometric SN classification. In this work we extend the SOFT method to derive estimates of redshift and luminosity distance for Type Ia SNe, using light curves from the SDSS and SNLS surveys as a validation set. Redshifts determined by SOFT using light curves alone are consistent with spectroscopic redshifts, showing a root-mean-square scatter in the residuals of RMS_z = 0.051. SOFT can also derive simultaneous redshift and distance estimates, yielding results that are consistent with the currently favored Lambda-CDM cosmological model. When SOFT is given spectroscopic information for SN classification and redshift priors, the RMS scatter in Hubble diagram residuals is 0.18 mags for the SDSS data and 0.28 mags for the SNLS objects. Without access to any spectroscopic information, and even without any redshift priors from host galaxy photometry, SOFT can still measure reliable redshifts and distances, with an increase in the Hubble residuals to 0.37 mags for the combined SDSS and SNLS data set. Using Monte Carlo simulations we predict that SOFT will be able to improve constraints on time-variable dark energy models by a factor of 2-3 with each new generation of large-scale SN surveys.
    Comment: 20 pages, 7 figures, accepted to ApJ; paper 1 is arXiv:0910.370
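The template-fitting idea behind SOFT, comparing an observed light curve against parameterized template light curves and keeping the best match, can be sketched as a toy chi-square grid search. Everything here (the Gaussian-shaped template, the stretch/amplitude parameterization, the grids) is an illustrative assumption, not the actual SOFT implementation:

```python
import numpy as np

def chi2(obs_t, obs_f, obs_err, template, stretch, amp):
    """Chi-square of observed fluxes against a stretched, scaled template."""
    model = amp * template(obs_t / stretch)
    return np.sum(((obs_f - model) / obs_err) ** 2)

# Toy template: a smooth rise-and-fall light curve (flux vs. days from peak).
template = lambda t: np.exp(-0.5 * (t / 15.0) ** 2)

rng = np.random.default_rng(2)
t = np.linspace(-20, 40, 25)
true_stretch, true_amp = 1.2, 3.0
err = np.full_like(t, 0.05)
flux = true_amp * template(t / true_stretch) + rng.normal(0, 0.05, t.size)

# Grid search over template parameters; the best-fit parameters carry
# the physical information (in SOFT, redshift and distance enter here).
stretches = np.linspace(0.8, 1.6, 41)
amps = np.linspace(1.0, 5.0, 41)
grid = [(s, a, chi2(t, flux, err, template, s, a))
        for s in stretches for a in amps]
best = min(grid, key=lambda g: g[2])
print(best[:2])  # should land near (1.2, 3.0)
```

In the real method the template library and the mapping from fit parameters to redshift and luminosity distance do the heavy lifting; the sketch only shows the comparison-and-minimize loop.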
    • 

    corecore