Small area estimation of general parameters with application to poverty indicators: A hierarchical Bayes approach
Poverty maps are used to aid important political decisions such as allocation
of development funds by governments and international organizations. Those
decisions should be based on the most accurate poverty figures. However, often
reliable poverty figures are not available at fine geographical levels or for
particular risk population subgroups due to the sample size limitation of
current national surveys. These surveys cannot adequately cover all the desired
areas or population subgroups and, therefore, models relating the different
areas are needed to "borrow strength" from area to area. In particular, the
Spanish Survey on Income and Living Conditions (SILC) produces national poverty
estimates but cannot provide poverty estimates by Spanish provinces due to the
poor precision of direct estimates, which use only the province-specific data.
It also raises the ethical question of whether poverty is more severe for women
than for men in a given province. We develop a hierarchical Bayes (HB) approach
for poverty mapping in Spanish provinces by gender that overcomes the small
province sample size problem of the SILC. The proposed approach has a wide
scope of application because it can be used to estimate general nonlinear
parameters. We use a Bayesian version of the nested error regression model in
which Markov chain Monte Carlo procedures and the convergence monitoring
therein are avoided. A simulation study reveals good frequentist properties of
the HB approach. The resulting poverty maps indicate that poverty, both in
frequency and intensity, is localized mostly in the southern and western
provinces and is more acute for women than for men in most of the provinces.
Comment: Published at http://dx.doi.org/10.1214/13-AOAS702 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
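The "borrowing strength" idea behind small area models can be illustrated with a minimal sketch. This is not the authors' hierarchical Bayes method: it simulates a nested error regression model y_ij = beta*x_ij + u_i + e_ij and applies a classical EBLUP-style composite estimator, with the variance components assumed known for simplicity.

```python
import numpy as np

# Sketch only (not the paper's HB approach): simulate a nested error
# regression model and shrink direct area means toward the regression
# prediction, which is how small area models "borrow strength".
rng = np.random.default_rng(0)
n_areas, n_per = 100, 5
beta, sigma_u, sigma_e = 2.0, 1.0, 2.0   # variance components assumed known

u = rng.normal(0.0, sigma_u, n_areas)               # area random effects
x = rng.normal(0.0, 1.0, (n_areas, n_per))          # unit-level covariate
e = rng.normal(0.0, sigma_e, (n_areas, n_per))      # unit-level errors
y = beta * x + u[:, None] + e
theta = beta * x.mean(axis=1) + u                   # true area means

beta_hat = np.sum(x * y) / np.sum(x ** 2)           # pooled OLS slope
gamma = sigma_u**2 / (sigma_u**2 + sigma_e**2 / n_per)  # shrinkage weight

direct = y.mean(axis=1)                             # survey-only estimate
synthetic = beta_hat * x.mean(axis=1)               # model-only estimate
eblup = synthetic + gamma * (direct - synthetic)    # composite estimate

mse_direct = np.mean((direct - theta) ** 2)
mse_eblup = np.mean((eblup - theta) ** 2)
```

With only 5 observations per area the direct means are noisy, and the composite estimator typically has a visibly smaller mean squared error, which is the motivation for model-based poverty mapping.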
Empirical Likelihood for Regression Discontinuity Design
This paper proposes empirical likelihood-based inference methods for causal effects identified from regression discontinuity designs. We consider both the sharp and fuzzy regression discontinuity designs and treat the regression functions as nonparametric. The proposed inference procedures do not require asymptotic variance estimation and the confidence sets have natural shapes, unlike the conventional Wald-type method. These features are illustrated by simulations and an empirical example which evaluates the effect of class size on pupils' scholastic achievements. Bandwidth selection methods, higher-order properties, and extensions to incorporate additional covariates and parametric functional forms are also discussed.
Keywords: empirical likelihood, nonparametric methods, regression discontinuity design, treatment effect
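The key feature claimed above, confidence sets without variance estimation, is easiest to see in the simplest empirical likelihood problem: inference for a population mean. The sketch below is not the paper's regression discontinuity procedure, just the standard profile computation via Newton's method on the Lagrange multiplier.

```python
import numpy as np

def el_stat(x, mu, iters=100):
    """Empirical likelihood ratio statistic -2 log R(mu) for a mean.

    Maximizes sum(log(n * p_i)) subject to sum(p_i) = 1 and
    sum(p_i * (x_i - mu)) = 0, where p_i = 1 / (n * (1 + lam * (x_i - mu)));
    the Lagrange multiplier lam is found by Newton's method.
    """
    z = x - mu
    lam = 0.0
    for _ in range(iters):
        denom = 1.0 + lam * z
        grad = np.sum(z / denom)
        hess = -np.sum(z ** 2 / denom ** 2)
        step = grad / hess
        # damp the step so that all implied weights stay positive
        while np.any(1.0 + (lam - step) * z <= 0):
            step *= 0.5
        lam -= step
        if abs(step) < 1e-12:
            break
    return 2.0 * np.sum(np.log(1.0 + lam * z))
```

A 95% confidence set for the mean is the set of mu values with el_stat(x, mu) below the chi-square(1) quantile 3.84; its shape is driven by the data rather than by a symmetric Wald interval.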
Identifying Real Estate Opportunities using Machine Learning
The real estate market is exposed to many fluctuations in prices because of
existing correlations with many variables, some of which cannot be controlled
or might even be unknown. Housing prices can increase rapidly (or in some
cases, also drop very fast), yet the numerous listings available online where
houses are sold or rented are not likely to be updated that often. In some
cases, individuals interested in selling a house (or apartment) might include
it in some online listing, and forget about updating the price. In other cases,
some individuals might be interested in deliberately setting a price below the
market price in order to sell the home faster, for various reasons. In this
paper, we aim at developing a machine learning application that identifies
opportunities in the real estate market in real time, i.e., houses that are
listed with a price substantially below the market price. This program can be
useful for investors interested in the housing market. We have focused on a use
case considering real estate assets located in the Salamanca district in Madrid
(Spain) and listed in the most relevant Spanish online site for home sales and
rentals. The application is formally implemented as a regression problem that
tries to estimate the market price of a house given features retrieved from
public online listings. For building this application, we have performed a
feature engineering stage in order to discover relevant features that allow
for attaining high predictive performance. Several machine learning
algorithms have been tested, including regression trees, k-nearest neighbors,
support vector machines and neural networks, identifying the advantages and
drawbacks of each.
Comment: 24 pages, 13 figures, 5 tables
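The model comparison described above can be sketched with scikit-learn. The listing data from the Spanish site is not public here, so a synthetic regression problem stands in for the engineered features; the model families (regression tree, k-nearest neighbors, SVM) are the ones named in the abstract, but all hyperparameters below are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Synthetic stand-in for engineered listing features, with price as target.
X, y = make_regression(n_samples=500, n_features=10, n_informative=6,
                       noise=5.0, random_state=0)

models = {
    "tree": DecisionTreeRegressor(max_depth=6, random_state=0),
    "knn": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=7)),
    "svr": make_pipeline(StandardScaler(), SVR(C=100.0)),
}

# 5-fold cross-validated R^2 for each model family.
cv_r2 = {name: cross_val_score(m, X, y, cv=5, scoring="r2").mean()
         for name, m in models.items()}
for name, score in sorted(cv_r2.items(), key=lambda kv: -kv[1]):
    print(f"{name}: mean CV R^2 = {score:.3f}")
```

In the application described in the abstract, a listing whose asking price falls far below the model's predicted market price (for example, beyond some residual threshold) would be flagged as a potential opportunity.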
Assessing the Number of Components in Mixture Models: a Review.
Despite the widespread application of finite mixture models, the decision of how many classes are required to adequately represent the data is, according to many authors, an important but unsolved issue. This work aims to review, describe and organize the available approaches designed to help select an adequate number of mixture components (including Monte Carlo test procedures, information criteria and classification-based criteria); we also provide some published simulation results on their relative performance, with the purpose of identifying the scenarios where each criterion is most effective.
Keywords: finite mixtures; number of mixture components; information criteria; simulation studies
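Of the families of criteria the review covers, the information-criterion approach is the most mechanical and is easy to sketch: fit mixtures with increasing numbers of components and keep the one minimizing BIC. The data below are simulated, not from the review.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Simulate three well-separated univariate Gaussian components.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 0.5, 150),
                       rng.normal(5.0, 0.5, 150),
                       rng.normal(10.0, 0.5, 150)]).reshape(-1, 1)

# Fit k = 1..6 component mixtures and record BIC for each.
bic = {k: GaussianMixture(n_components=k, n_init=3, random_state=0)
          .fit(data).bic(data)
       for k in range(1, 7)}
best_k = min(bic, key=bic.get)  # expected to recover the simulated components
```

As the review notes, criteria disagree most when components overlap or have unequal sizes; with separation this clean, most criteria agree.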
Automated design of robust discriminant analysis classifier for foot pressure lesions using kinematic data
In recent years, the use of motion tracking systems for the acquisition of functional biomechanical gait data has received increasing interest due to the richness and accuracy of the measured kinematic information. However, costs frequently restrict the number of subjects employed, and this makes the dimensionality of the collected data far higher than the number of available samples. This paper applies discriminant analysis algorithms to the classification of patients with different types of foot lesions, in order to establish an association between foot motion and lesion formation. With primary attention to small sample size situations, we compare different types of Bayesian classifiers and evaluate their performance with various dimensionality reduction techniques for feature extraction, as well as search methods for the selection of raw kinematic variables. Finally, we propose a novel integrated method which fine-tunes the classifier parameters and selects the most relevant kinematic variables simultaneously. Performance comparisons are made using robust resampling techniques such as the bootstrap and k-fold cross-validation. Results from experiments with subjects suffering from pathological plantar hyperkeratosis show that the proposed method can lead to correct classification rates with less than 10% of the original features.
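The small-sample, high-dimensional regime described above (far more kinematic variables than subjects) can be sketched as a dimensionality-reduction-plus-discriminant pipeline. This is a generic illustration, not the paper's integrated tuning method, and the synthetic data merely mimics the shape of the problem: 40 subjects, 100 variables.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Fewer samples (40) than features (100): the within-class scatter
# matrix is singular, so project with PCA before the discriminant step.
X, y = make_classification(n_samples=40, n_features=100, n_informative=8,
                           n_redundant=12, class_sep=3.0, random_state=0)

clf = make_pipeline(PCA(n_components=10), LinearDiscriminantAnalysis())

# k-fold cross-validation, as in the paper's performance comparisons.
acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
```

Reducing to a handful of components before the classifier is the standard workaround when the sample covariance cannot be inverted; the paper's contribution is to couple that reduction with simultaneous variable selection and classifier tuning.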
Introduction
The task "Digitization and making available in the Digital Repository of the University of Łódź of the collection of scientific journals published by the University of Łódź", no. 885/P-DUN/2014, was co-financed from the funds of the Ministry of Science and Higher Education (MNiSW) as part of activities disseminating science.
Fuzzy Supernova Templates II: Parameter Estimation
Wide field surveys will soon be discovering Type Ia supernovae (SNe) at rates
of several thousand per year. Spectroscopic follow-up can only scratch the
surface for such enormous samples, so these extensive data sets will only be
useful to the extent that they can be characterized by the survey photometry
alone. In a companion paper (Rodney and Tonry, 2009) we introduced the SOFT
method for analyzing SNe using direct comparison to template light curves, and
demonstrated its application for photometric SN classification. In this work we
extend the SOFT method to derive estimates of redshift and luminosity distance
for Type Ia SNe, using light curves from the SDSS and SNLS surveys as a
validation set. Redshifts determined by SOFT using light curves alone are
consistent with spectroscopic redshifts, showing a root-mean-square scatter in
the residuals of RMS_z=0.051. SOFT can also derive simultaneous redshift and
distance estimates, yielding results that are consistent with the currently
favored Lambda-CDM cosmological model. When SOFT is given spectroscopic
information for SN classification and redshift priors, the RMS scatter in
Hubble diagram residuals is 0.18 mags for the SDSS data and 0.28 mags for the
SNLS objects. Without access to any spectroscopic information, and even without
any redshift priors from host galaxy photometry, SOFT can still measure
reliable redshifts and distances, with an increase in the Hubble residuals to
0.37 mags for the combined SDSS and SNLS data set. Using Monte Carlo
simulations we predict that SOFT will be able to improve constraints on
time-variable dark energy models by a factor of 2-3 with each new generation of
large-scale SN surveys.
Comment: 20 pages, 7 figures, accepted to ApJ; paper 1 is arXiv:0910.370
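The direct template comparison underlying SOFT can be illustrated in toy form: stretch and scale a template light curve, compare it to the observed photometry by chi-square, and grid-search the stretch (the scale has an analytic weighted least-squares solution at each stretch). The template shape below is an invented smooth function, not a real SN Ia template, and this sketch omits SOFT's fuzzy membership weighting across a full template library.

```python
import numpy as np

def template(t):
    # Hypothetical smooth light-curve shape (flux vs. days from peak);
    # a real analysis would use observed SN template light curves.
    return np.exp(-0.5 * (t / 10.0) ** 2) * (1.0 + 0.3 * np.tanh(-t / 15.0))

rng = np.random.default_rng(1)
t_obs = np.linspace(-15.0, 40.0, 25)
true_stretch, true_scale = 1.2, 2.0
sigma = 0.02  # photometric uncertainty per point
flux = true_scale * template(t_obs / true_stretch) \
       + rng.normal(0.0, sigma, t_obs.size)

best = None
for s in np.linspace(0.7, 1.7, 101):
    model = template(t_obs / s)
    # Best-fit scale for this stretch, in closed form (least squares).
    scale = np.sum(flux * model) / np.sum(model ** 2)
    chi2 = np.sum(((flux - scale * model) / sigma) ** 2)
    if best is None or chi2 < best[0]:
        best = (chi2, s, scale)

chi2_min, s_hat, scale_hat = best
```

In the real method, the stretch-like and scale-like parameters map onto physical quantities such as redshift and luminosity distance, which is how light curves alone yield the distance estimates quoted in the abstract.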