
    Learning From Labeled And Unlabeled Data: An Empirical Study Across Techniques And Domains

    There has been increased interest in devising learning techniques that combine unlabeled data with labeled data, i.e., semi-supervised learning. However, to the best of our knowledge, no study has compared various techniques across different types and amounts of labeled and unlabeled data. Moreover, most of the published work on semi-supervised learning techniques assumes that the labeled and unlabeled data come from the same distribution. It is possible for the labeling process to be associated with a selection bias such that the distributions of data points in the labeled and unlabeled sets are different. Not correcting for such bias can result in biased function approximation with potentially poor performance. In this paper, we present an empirical study of various semi-supervised learning techniques on a variety of datasets. We attempt to answer various questions, such as the effect of independence or relevance amongst features, the effect of the size of the labeled and unlabeled sets, and the effect of noise. We also investigate the impact of sample-selection bias on the semi-supervised learning techniques under study and implement a bivariate probit technique particularly designed to correct for such bias.
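    The selection-bias problem described above can be illustrated with a minimal simulation, assuming the simplest case: the chance that a point gets labeled depends on its class. All names and probabilities below are hypothetical. The correction shown is inverse-probability weighting with known labeling probabilities, a swapped-in technique chosen for brevity; it is not the bivariate probit model the study implements, which models the selection process jointly with the outcome instead of assuming the labeling probabilities are known.

```python
import random

def simulate(n=20000, pos_rate=0.3, p_label_pos=0.8, p_label_neg=0.2, seed=0):
    """Simulate a population and a labeled subset whose selection depends on the class."""
    rng = random.Random(seed)
    population = [1 if rng.random() < pos_rate else 0 for _ in range(n)]
    labeled = []
    for y in population:
        # Hypothetical selection mechanism: positives are labeled far more often.
        p = p_label_pos if y == 1 else p_label_neg
        if rng.random() < p:
            labeled.append((y, p))
    return population, labeled

def naive_rate(labeled):
    """Positive rate estimated from the labeled set alone (ignores the selection bias)."""
    return sum(y for y, _ in labeled) / len(labeled)

def ipw_rate(labeled):
    """Weight each labeled point by 1 / P(labeled | y); assumes these probabilities are known."""
    num = sum(y / p for y, p in labeled)
    den = sum(1.0 / p for _, p in labeled)
    return num / den

population, labeled = simulate()
true_rate = sum(population) / len(population)
print(f"population positive rate:   {true_rate:.3f}")
print(f"naive labeled-set estimate: {naive_rate(labeled):.3f}")
print(f"IPW-corrected estimate:     {ipw_rate(labeled):.3f}")
```

    With these settings about 0.3 * 0.8 / (0.3 * 0.8 + 0.7 * 0.2), roughly 0.63, of the labeled points are positive, so the naive estimate about doubles the true rate of 0.3, while the weighted estimate recovers it.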

    Monte Carlo modified profile likelihood in models for clustered data

    The main focus of analysts who deal with clustered data is usually not on the clustering variables, so the group-specific parameters are treated as nuisance parameters. If a fixed-effects formulation is preferred and the total number of clusters is large relative to the single-group sizes, classical frequentist techniques relying on the profile likelihood are often misleading. The use of alternative tools, such as modifications to the profile likelihood or integrated likelihoods, for making accurate inference on a parameter of interest can be complicated by the presence of nonstandard modelling and/or sampling assumptions. We show here how to employ Monte Carlo simulation to approximate the modified profile likelihood in some of these unconventional frameworks. The proposed solution is widely applicable and is shown to retain the usual properties of the modified profile likelihood. The approach is examined in two instances particularly relevant in applications, namely missing-data models and survival models with an unspecified censoring distribution. The effectiveness of the proposed solution is validated via simulation studies and two clinical trial applications.
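    One standard way to write the modified profile log-likelihood uses Severini's approximation: l_M(psi) = l_P(psi) + (1/2) log|j_lambda,lambda(psi, lambda_hat_psi)| - log|I_lambda(psi, lambda_hat_psi; psi_hat, lambda_hat)|, where I_lambda(theta; theta0) is the expected product of nuisance scores evaluated at theta and theta0 under theta0. When that expectation has no closed form, it can be replaced by an average over R datasets simulated from the fitted model. The sketch below assumes this Severini-type approximation (it may differ in detail from the paper's procedure) and applies it to the textbook Neyman-Scott model, y_ij ~ N(mu_i, psi) with cluster means mu_i as nuisance parameters, because there the exact answer is known. The function names, R, and the simulation settings are hypothetical.

```python
import math
import random

def simulate_clusters(q, m, sigma2, rng):
    """Neyman-Scott data: q clusters of size m with y_ij ~ N(mu_i, sigma2)."""
    data = []
    for _ in range(q):
        mu_i = rng.gauss(0.0, 3.0)  # cluster-specific nuisance mean
        data.append([rng.gauss(mu_i, math.sqrt(sigma2)) for _ in range(m)])
    return data

def mc_modified_profile(data, R=500, seed=1):
    """Return psi_hat, SS, and the profile and Monte Carlo modified profile log-likelihoods."""
    rng = random.Random(seed)
    q, m = len(data), len(data[0])
    means = [sum(c) / m for c in data]
    ss = sum((y - ybar) ** 2 for c, ybar in zip(data, means) for y in c)
    psi_hat = ss / (q * m)  # full MLE of the variance

    # Simulate R replicates per cluster from the fitted model (mu_i = ybar_i, psi = psi_hat).
    # The mu_i-score at (psi, ybar_i) is S / psi with S = sum_j (y_j - ybar_i), so the
    # expected score product E[U(psi) U(psi_hat)] is estimated by mean(S^2) / (psi * psi_hat).
    mean_s2 = []
    for ybar in means:
        total = 0.0
        for _ in range(R):
            s = sum(rng.gauss(ybar, math.sqrt(psi_hat)) - ybar for _ in range(m))
            total += s * s
        mean_s2.append(total / R)

    def l_p(psi):
        # Profile log-likelihood (constants dropped); the constrained MLE of mu_i is ybar_i.
        return -0.5 * q * m * math.log(psi) - ss / (2.0 * psi)

    def l_m(psi):
        adj = 0.0
        for s2 in mean_s2:
            j_mu = m / psi               # observed information for mu_i
            i_mc = s2 / (psi * psi_hat)  # Monte Carlo estimate of I_lambda
            adj += 0.5 * math.log(j_mu) - math.log(i_mc)
        return l_p(psi) + adj

    return psi_hat, ss, l_p, l_m

def argmax_on_grid(f, lo, hi, n=2000):
    """Maximize f over an evenly spaced grid on (lo, hi]."""
    grid = [lo + (hi - lo) * k / n for k in range(1, n + 1)]
    return max(grid, key=f)

q, m = 200, 2
data = simulate_clusters(q, m, sigma2=1.0, rng=random.Random(0))
psi_hat, ss, l_p, l_m = mc_modified_profile(data)
psi_profile = argmax_on_grid(l_p, 0.1 * psi_hat, 4.0 * psi_hat)
psi_mpl_mc = argmax_on_grid(l_m, 0.1 * psi_hat, 4.0 * psi_hat)
psi_mpl_exact = ss / (q * (m - 1))  # closed-form modified profile estimate
print(f"profile: {psi_profile:.3f}  MC modified profile: {psi_mpl_mc:.3f}  exact: {psi_mpl_exact:.3f}")
```

    Here the simulated term is exactly proportional to 1/psi, so Monte Carlo error only shifts l_M by a constant and the maximizer matches the closed form SS/(q(m-1)), while the profile estimate SS/(qm) targets only half the true variance when m = 2. In the missing-data and censoring settings the abstract mentions, the simulated term would genuinely depend on the draws, and the choice of R would matter.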

    A Spatial Quantile Regression Hedonic Model of Agricultural Land Prices

    Land price studies typically employ hedonic analysis to identify the impact of land characteristics on price. Owing to the spatial fixity of land, however, the question of possible spatial dependence in agricultural land prices arises. The presence of spatial dependence in agricultural land prices can have serious consequences for the hedonic model analysis. Ignoring spatial autocorrelation can lead to biased estimates in land price hedonic models. We propose using a flexible quantile regression-based estimation of the spatial lag hedonic model allowing for varying effects of the characteristics and, more importantly, varying degrees of spatial autocorrelation. In applying this approach to a sample of agricultural land sales in Northern Ireland, we find that the market effectively consists of two relatively separate segments. The larger of these two segments conforms to the conventional hedonic model with no spatial lag dependence, while the smaller, much thinner market segment exhibits considerable spatial lag dependence.
    Spatial lag, quantile regression, hedonic model, C13, C14, C21, Q24
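    The spatial lag quantile model combines two ingredients. Under one common specification, at quantile tau the model reads y = rho(tau) W y + X beta(tau) + e, where W y relates each sale price to a weighted average of neighbouring prices, and each quantile is fitted by minimizing the check loss. The sketch below builds only those ingredients, assuming row-standardized k-nearest-neighbour weights (the paper's weighting scheme is not stated here) and hypothetical coordinates and prices. It does not estimate rho(tau): because W y depends on y, plugging it into an ordinary quantile regression is generally inconsistent, and spatial quantile regression estimators typically rely on instrumental variables such as the spatially lagged regressors W X.

```python
import math

def knn_weights(coords, k):
    """Row-standardized k-nearest-neighbour weights: w_ij = 1/k if j is one of i's k nearest sites."""
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        # Sort the other sites by distance; ties are broken by index.
        others = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        for _, j in others[:k]:
            W[i][j] = 1.0 / k
    return W

def spatial_lag(W, y):
    """(W y)_i: weighted average of the neighbours' values of y."""
    return [sum(w * v for w, v in zip(row, y)) for row in W]

def check_loss(u, tau):
    """Quantile regression check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

# Hypothetical example: four parcels on a line with made-up prices.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
prices = [10.0, 20.0, 30.0, 40.0]
W = knn_weights(coords, k=2)
print(spatial_lag(W, prices))  # [25.0, 20.0, 30.0, 25.0]
# At tau = 0.9, a positive residual costs 9 times as much as a negative one of equal size.
print(check_loss(2.0, 0.9), check_loss(-2.0, 0.9))
```

    The asymmetry of the check loss is what lets the fitted coefficients, including the spatial lag coefficient, differ across quantiles, which is how distinct market segments can show different degrees of spatial dependence.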

    Fast conditional density estimation for quantitative structure-activity relationships

    Many methods for quantitative structure-activity relationships (QSARs) deliver point estimates only, without quantifying the uncertainty inherent in the prediction. One way to quantify the uncertainty of a QSAR prediction is to predict the conditional density of the activity given the structure instead of a point estimate. If a conditional density estimate is available, it is easy to derive prediction intervals for activities. In this paper, we experimentally evaluate and compare three methods for conditional density estimation for their suitability in QSAR modeling. In contrast to traditional methods for conditional density estimation, they are based on generic machine learning schemes, specifically class probability estimators. Our experiments show that a kernel estimator based on class probability estimates from a random forest classifier is highly competitive with Gaussian process regression, while taking only a fraction of the time for training. Therefore, generic machine-learning-based methods for conditional density estimation may be a good and fast option for quantifying uncertainty in QSAR modeling.
    http://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/view/181
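    The step of turning class probability estimates into a conditional density, and then into a prediction interval, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the activities are assumed to be pre-discretized into bins that serve as classes, the class probabilities for one compound are hard-coded stand-ins for a random forest's output, and the Gaussian kernel and bandwidth h are arbitrary choices. The density is the probability-weighted mixture of per-bin kernel density estimates, and the interval is read off its numerically integrated CDF.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def conditional_density(y, bin_targets, class_probs, h):
    """f(y | x) = sum_k P(bin k | x) * (kernel density estimate of the training targets in bin k)."""
    dens = 0.0
    for targets, p in zip(bin_targets, class_probs):
        kde = sum(gaussian_kernel((y - t) / h) for t in targets) / (len(targets) * h)
        dens += p * kde
    return dens

def prediction_interval(bin_targets, class_probs, h, alpha=0.1, lo=-10.0, hi=10.0, n=4000):
    """Central (1 - alpha) interval from the CDF, integrated with the midpoint rule on [lo, hi]."""
    step = (hi - lo) / n
    grid = [lo + step * (i + 0.5) for i in range(n)]
    mass = [conditional_density(g, bin_targets, class_probs, h) * step for g in grid]
    total = sum(mass)  # close to 1 when [lo, hi] covers the support
    cdf, lower, upper = 0.0, None, None
    for g, w in zip(grid, mass):
        cdf += w / total
        if lower is None and cdf >= alpha / 2:
            lower = g
        if upper is None and cdf >= 1 - alpha / 2:
            upper = g
    return lower, upper

# Hypothetical training activities, already split into three bins (the "classes").
bin_targets = [[1.0, 1.5, 2.0], [4.0, 4.5, 5.0], [7.0, 7.5, 8.0]]
# Hypothetical class probabilities for one compound, standing in for a random forest's output.
class_probs = [0.02, 0.96, 0.02]
lower, upper = prediction_interval(bin_targets, class_probs, h=0.5)
print(f"90% prediction interval: [{lower:.2f}, {upper:.2f}]")
```

    Moving probability mass from the middle bin to the outer bins, for example class_probs = [0.1, 0.8, 0.1], widens the interval, since each outer bin then holds more than alpha/2 of the mass.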