Learning From Labeled And Unlabeled Data: An Empirical Study Across Techniques And Domains
There has been increased interest in devising learning techniques that
combine unlabeled data with labeled data, i.e. semi-supervised learning.
However, to the best of our knowledge, no study has been performed across
various techniques and different types and amounts of labeled and unlabeled
data. Moreover, most of the published work on semi-supervised learning
techniques assumes that the labeled and unlabeled data come from the same
distribution. It is possible for the labeling process to be associated with a
selection bias such that the distributions of data points in the labeled and
unlabeled sets are different. Not correcting for such bias can result in biased
function approximation with potentially poor performance. In this paper, we
present an empirical study of various semi-supervised learning techniques on a
variety of datasets. We attempt to answer various questions such as the effect
of independence or relevance amongst features, the effect of the size of the
labeled and unlabeled sets and the effect of noise. We also investigate the
impact of sample-selection bias on the semi-supervised learning techniques
under study and implement a bivariate probit technique particularly designed to
correct for such bias.
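As a rough illustration of how unlabeled data can be combined with labeled data, the sketch below implements self-training with a one-dimensional nearest-centroid classifier. Self-training is chosen here only because it is compact; it is not necessarily one of the techniques the study evaluates. The function names, the `margin` confidence rule, and the toy data are all assumptions made for this example.

```python
def centroids(points, labels):
    """Return the mean of the points in each class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def self_train(labeled_x, labeled_y, unlabeled_x, margin=1.0, max_rounds=10):
    """Hypothetical self-training loop (illustrative, not the paper's method).

    Each round fits class centroids on the current labeled set, then moves
    every unlabeled point whose nearest centroid beats the runner-up by at
    least `margin` into the labeled set with the predicted label.
    """
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        cents = centroids(xs, ys)
        remaining, added = [], False
        for x in pool:
            ranked = sorted(cents, key=lambda c: abs(x - cents[c]))
            best = ranked[0]
            if len(ranked) > 1:
                # Confidence: how much closer the best centroid is than the next.
                gap = abs(x - cents[ranked[1]]) - abs(x - cents[best])
            else:
                gap = float("inf")
            if gap >= margin:
                xs.append(x)
                ys.append(best)
                added = True
            else:
                remaining.append(x)
        pool = remaining
        if not added or not pool:
            break
    return centroids(xs, ys), pool


if __name__ == "__main__":
    model, leftover = self_train([0.0, 10.0], ["a", "b"], [1.0, 2.0, 8.0, 9.0, 5.0])
    print(model, leftover)
```

Points that never clear the margin stay unlabeled, which is one simple way to limit the damage from confidently wrong pseudo-labels. This sketch does nothing to correct for sample-selection bias between the labeled and unlabeled sets.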
Monte Carlo modified profile likelihood in models for clustered data
The main focus of the analysts who deal with clustered data is usually not on
the clustering variables, and hence the group-specific parameters are treated
as nuisance. If a fixed effects formulation is preferred and the total number
of clusters is large relative to the single-group sizes, classical frequentist
techniques relying on the profile likelihood are often misleading. The use of
alternative tools, such as modifications to the profile likelihood or
integrated likelihoods, for making accurate inference on a parameter of
interest can be complicated by the presence of nonstandard modelling and/or
sampling assumptions. We show here how to employ Monte Carlo simulation in
order to approximate the modified profile likelihood in some of these
unconventional frameworks. The proposed solution is widely applicable and is
shown to retain the usual properties of the modified profile likelihood. The
approach is examined in two instances particularly relevant in applications,
i.e. missing-data models and survival models with unspecified censoring
distribution. The effectiveness of the proposed solution is validated via
simulation studies and two clinical trial applications.
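For orientation, the quantities involved can be written out as follows. This is a sketch under assumed notation, using one widely used form of the modified profile likelihood (Severini's approximation); the abstract does not confirm that this exact form is the one adopted. Here $\psi$ is the parameter of interest, $\lambda$ the nuisance (group-specific) parameters, $\theta = (\psi, \lambda)$, $\hat\theta$ the full maximum likelihood estimate, $\hat\theta_\psi = (\psi, \hat\lambda_\psi)$ the constrained estimate, $\ell_\lambda$ the score for $\lambda$, $j_{\lambda\lambda}$ the observed information for $\lambda$, and $R$ the number of Monte Carlo samples $y^{(r)}$ simulated from the model at $\hat\theta$.

```latex
\begin{align}
\ell_P(\psi) &= \ell(\psi, \hat\lambda_\psi), \\
\ell_M(\psi) &= \ell_P(\psi)
  + \tfrac{1}{2}\log\bigl|j_{\lambda\lambda}(\hat\theta_\psi)\bigr|
  - \log\bigl|I(\hat\theta_\psi;\hat\theta)\bigr|, \\
I(\hat\theta_\psi;\hat\theta)
  &= E_{\hat\theta}\bigl[\ell_\lambda(\hat\theta_\psi)\,
     \ell_\lambda(\hat\theta)^{\top}\bigr]
  \;\approx\; \frac{1}{R}\sum_{r=1}^{R}
     \ell_\lambda\bigl(\hat\theta_\psi; y^{(r)}\bigr)\,
     \ell_\lambda\bigl(\hat\theta; y^{(r)}\bigr)^{\top}.
\end{align}
```

The expectation in the last line is the term that may lack a closed form under nonstandard modelling or sampling assumptions, which is where a simulation-based average can stand in for it.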
A Spatial Quantile Regression Hedonic Model of Agricultural Land Prices
Land price studies typically employ hedonic analysis to identify the impact of land characteristics on price. Owing to the spatial fixity of land, however, the question of possible spatial dependence in agricultural land prices arises. The presence of spatial dependence in agricultural land prices can have serious consequences for the hedonic model analysis. Ignoring spatial autocorrelation can lead to biased estimates in land price hedonic models. We propose using a flexible quantile regression-based estimation of the spatial lag hedonic model allowing for varying effects of the characteristics and, more importantly, varying degrees of spatial autocorrelation. In applying this approach to a sample of agricultural land sales in Northern Ireland we find that the market effectively consists of two relatively separate segments. The larger of these two segments conforms to the conventional hedonic model with no spatial lag dependence, while the smaller, much thinner market segment exhibits considerable spatial lag dependence.
Keywords: spatial lag, quantile regression, hedonic model; JEL codes: C13, C14, C21, Q24
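A spatial lag hedonic model estimated by quantile regression is commonly written in the form below. The notation is assumed for illustration and is not taken from the paper: $y$ is the vector of land prices, $X$ the matrix of land characteristics, $W$ a spatial weights matrix, and $\tau \in (0,1)$ the quantile.

```latex
\begin{equation}
Q_{\tau}\bigl(y \mid X, Wy\bigr) = \rho(\tau)\, W y + X \beta(\tau)
\end{equation}
```

Both the characteristic effects $\beta(\tau)$ and the spatial lag coefficient $\rho(\tau)$ are allowed to differ across quantiles. A segment with no spatial lag dependence corresponds to quantiles where $\rho(\tau)$ is close to zero.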
Fast conditional density estimation for quantitative structure-activity relationships
Many methods for quantitative structure-activity relationships (QSARs) deliver point estimates only, without quantifying the uncertainty inherent in the prediction. One way to quantify the uncertainty of a QSAR prediction is to predict the conditional density of the activity given the structure instead of a point estimate. If a conditional density estimate is available, it is easy to derive prediction intervals of activities. In this paper, we experimentally evaluate and compare three methods for conditional density estimation for their suitability in QSAR modeling. In contrast to traditional methods for conditional density estimation, they are based on generic machine learning schemes, more specifically, class probability estimators. Our experiments show that a kernel estimator based on class probability estimates from a random forest classifier is highly competitive with Gaussian process regression, while taking only a fraction of the time for training. Therefore, generic machine-learning-based methods for conditional density estimation may be a good and fast option for quantifying uncertainty in QSAR modeling.
http://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/view/181
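To make the step from a conditional density estimate to a prediction interval concrete, here is a minimal sketch. It assumes the density for one compound is available in discretized form, as probability mass over activity bins, and treats the mass as uniform within each bin. The function name `prediction_interval` and the toy numbers are hypothetical and not taken from the paper.

```python
from bisect import bisect_left
from itertools import accumulate


def prediction_interval(bin_edges, bin_probs, coverage=0.9):
    """Central prediction interval from a histogram-style conditional density.

    bin_edges: sorted bin boundaries, len(bin_probs) + 1 values.
    bin_probs: probability mass per bin, summing to 1.
    Assumes mass is spread uniformly inside each bin (illustrative choice).
    """
    if len(bin_edges) != len(bin_probs) + 1:
        raise ValueError("need exactly one more edge than bins")
    if abs(sum(bin_probs) - 1.0) > 1e-9:
        raise ValueError("bin probabilities must sum to 1")
    cdf = list(accumulate(bin_probs))

    def quantile(p):
        # First bin whose cumulative mass reaches p.
        i = min(bisect_left(cdf, p), len(cdf) - 1)
        mass_before = cdf[i - 1] if i > 0 else 0.0
        if bin_probs[i] == 0:
            return bin_edges[i]
        # Linear interpolation inside the bin.
        frac = (p - mass_before) / bin_probs[i]
        return bin_edges[i] + frac * (bin_edges[i + 1] - bin_edges[i])

    tail = (1.0 - coverage) / 2.0
    return quantile(tail), quantile(1.0 - tail)


if __name__ == "__main__":
    edges = [0, 1, 2, 3, 4]
    probs = [0.1, 0.4, 0.4, 0.1]
    print(prediction_interval(edges, probs, coverage=0.8))
```

A central interval cuts equal probability from both tails. For a skewed or multimodal density, a highest-density region would be narrower, at the cost of a slightly more involved computation.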