Search CORE

40,158 research outputs found

Learning From Labeled And Unlabeled Data: An Empirical Study Across Techniques And Domains

Author: Chawla N. V.
Karakoulas Grigoris
Publication venue: 'AI Access Foundation'
Publication date: 09/09/2011

There has been increased interest in devising learning techniques that combine unlabeled data with labeled data, i.e., semi-supervised learning. However, to the best of our knowledge, no study has been performed across various techniques and different types and amounts of labeled and unlabeled data. Moreover, most of the published work on semi-supervised learning techniques assumes that the labeled and unlabeled data come from the same distribution. It is possible for the labeling process to be associated with a selection bias such that the distributions of data points in the labeled and unlabeled sets are different. Not correcting for such bias can result in biased function approximation with potentially poor performance. In this paper, we present an empirical study of various semi-supervised learning techniques on a variety of datasets. We attempt to answer various questions such as the effect of independence or relevance amongst features, the effect of the size of the labeled and unlabeled sets and the effect of noise. We also investigate the impact of sample-selection bias on the semi-supervised learning techniques under study and implement a bivariate probit technique particularly designed to correct for such bias.
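As a rough illustration of how unlabeled data can be combined with labeled data, the sketch below implements self-training with a one-dimensional nearest-centroid classifier. Self-training is used here only as a generic example of semi-supervised learning; the abstract does not say which techniques the study covers, and the function names, the margin threshold and the toy data are all hypothetical.

```python
from statistics import mean


def fit_centroids(points, labels):
    """Return {label: mean of the points carrying that label}."""
    return {c: mean(x for x, y in zip(points, labels) if y == c)
            for c in set(labels)}


def predict(centroids, x):
    """Return (label, margin) for x; needs at least two classes.

    The margin is how much closer x is to the nearest centroid
    than to the runner-up, and serves as a crude confidence score.
    """
    ranked = sorted(centroids, key=lambda c: abs(x - centroids[c]))
    best, second = ranked[0], ranked[1]
    margin = abs(x - centroids[second]) - abs(x - centroids[best])
    return best, margin


def self_train(labeled_x, labeled_y, unlabeled_x, min_margin=1.0, max_rounds=10):
    """Repeatedly pseudo-label confident unlabeled points and refit."""
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        centroids = fit_centroids(xs, ys)
        scored = [(x, predict(centroids, x)) for x in pool]
        confident = [(x, lab) for x, (lab, m) in scored if m >= min_margin]
        if not confident:
            break  # nothing left that the current model is sure about
        for x, lab in confident:
            xs.append(x)
            ys.append(lab)
            pool.remove(x)
    return fit_centroids(xs, ys), pool


centroids, leftover = self_train([0.0, 10.0], ["a", "b"],
                                 [1.0, 2.0, 9.0, 8.0, 5.0])
print(centroids, leftover)
```

In this toy run the ambiguous point 5.0 is never pseudo-labeled because it is equidistant from both centroids. If the labeled points were drawn with a selection bias, the initial centroids would be shifted and the pseudo-labels would inherit that shift, which is the kind of effect the abstract warns about.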

arXiv.org e-Print Archive

Crossref

Monte Carlo modified profile likelihood in models for clustered data

Author: Cortese Giuliana
Di Caterina Claudia
Sartori Nicola
Publication date: 29/12/2018

The main focus of the analysts who deal with clustered data is usually not on the clustering variables, and hence the group-specific parameters are treated as nuisance. If a fixed effects formulation is preferred and the total number of clusters is large relative to the single-group sizes, classical frequentist techniques relying on the profile likelihood are often misleading. The use of alternative tools, such as modifications to the profile likelihood or integrated likelihoods, for making accurate inference on a parameter of interest can be complicated by the presence of nonstandard modelling and/or sampling assumptions. We show here how to employ Monte Carlo simulation in order to approximate the modified profile likelihood in some of these unconventional frameworks. The proposed solution is widely applicable and is shown to retain the usual properties of the modified profile likelihood. The approach is examined in two instances particularly relevant in applications, i.e. missing-data models and survival models with unspecified censoring distribution. The effectiveness of the proposed solution is validated via simulation studies and two clinical trial applications.
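The claim that profile-likelihood inference can mislead when there are many small clusters is easiest to see in the classical Neyman-Scott setting, used here as a stand-in that is not taken from the paper: every cluster has its own mean (a nuisance parameter) and all clusters share one variance. The simulation below is a minimal sketch with hypothetical names and parameter values. It compares the profile maximum likelihood estimate of the variance with the degrees-of-freedom corrected estimate; it does not implement the Monte Carlo modified profile likelihood itself.

```python
import random


def simulate_variance_estimates(n_clusters=5000, cluster_size=2,
                                sigma2=1.0, seed=0):
    """Return (profile MLE, corrected estimate) of the common variance."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    within_ss = 0.0
    for _ in range(n_clusters):
        mu = rng.uniform(-10.0, 10.0)  # cluster-specific nuisance mean
        ys = [rng.gauss(mu, sd) for _ in range(cluster_size)]
        ybar = sum(ys) / cluster_size  # profiles out the cluster mean
        within_ss += sum((y - ybar) ** 2 for y in ys)
    # Profile MLE: divides by the total number of observations.
    profile_mle = within_ss / (n_clusters * cluster_size)
    # Corrected: divides by the residual degrees of freedom instead.
    corrected = within_ss / (n_clusters * (cluster_size - 1))
    return profile_mle, corrected


profile_mle, corrected = simulate_variance_estimates()
print(f"profile MLE: {profile_mle:.3f}, corrected: {corrected:.3f}")
```

With clusters of size 2 the profile estimate converges to half the true variance no matter how many clusters are added, because each new cluster brings one more nuisance parameter along with its two observations.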

arXiv.org e-Print Archive

Crossref

Catalogo dei prodotti della ricerca

Archivio istituzionale della ricerca - Università di Padova

A Spatial Quantile Regression Hedonic Model of Agricultural Land Prices

Author: Basile R.
Fotheringham A. S.
Freeman M.
Greene W.
Kelejian H. H.
Lin X.
Philip Kostov
Su L.
Publication venue: 'Informa UK Limited'
Publication date: 26/03/2009

Land price studies typically employ hedonic analysis to identify the impact of land characteristics on price. Owing to the spatial fixity of land, however, the question of possible spatial dependence in agricultural land prices arises. The presence of spatial dependence in agricultural land prices can have serious consequences for the hedonic model analysis. Ignoring spatial autocorrelation can lead to biased estimates in land price hedonic models. We propose using a flexible quantile regression-based estimation of the spatial lag hedonic model allowing for varying effects of the characteristics and, more importantly, varying degrees of spatial autocorrelation. In applying this approach to a sample of agricultural land sales in Northern Ireland we find that the market effectively consists of two relatively separate segments. The larger of these two segments conforms to the conventional hedonic model with no spatial lag dependence, while the smaller, much thinner market segment exhibits considerable spatial lag dependence.

Keywords: spatial lag, quantile regression, hedonic model. JEL classification: C13, C14, C21, Q24.
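Quantile regression replaces squared error with the check (pinball) loss, whose minimiser is a quantile rather than the mean. The sketch below illustrates only that building block on a plain sample with no covariates; it does not implement the spatial lag hedonic model described in the abstract, and the function names and numbers are hypothetical.

```python
def pinball_loss(residual, tau):
    """Check loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def empirical_quantile(sample, tau):
    """Return the sample value minimising total pinball loss at level tau.

    The total loss is convex and piecewise linear in q, so a minimiser
    is always attained at one of the sample points.
    """
    def total_loss(q):
        return sum(pinball_loss(y - q, tau) for y in sample)
    return min(sorted(sample), key=total_loss)


prices = [1, 2, 3, 4, 100]  # made-up values with one extreme observation
print(empirical_quantile(prices, 0.5), empirical_quantile(prices, 0.9))
```

Fitting the same model at several values of tau is what lets effects differ between the lower and upper parts of the price distribution, which is how a quantile-based approach can expose separate market segments.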

CLoK

Crossref

Research Papers in Economics

Fast conditional density estimation for quantitative structure-activity relationships

Author: Buchwald Fabian
Frank Eibe
Girschick Tobias
Kramer Stefan
Publication venue: 'Association for the Advancement of Artificial Intelligence (AAAI)'
Publication date: 01/01/2010

Many methods for quantitative structure-activity relationships (QSARs) deliver point estimates only, without quantifying the uncertainty inherent in the prediction. One way to quantify the uncertainty of a QSAR prediction is to predict the conditional density of the activity given the structure instead of a point estimate. If a conditional density estimate is available, it is easy to derive prediction intervals of activities. In this paper, we experimentally evaluate and compare three methods for conditional density estimation for their suitability in QSAR modeling. In contrast to traditional methods for conditional density estimation, they are based on generic machine learning schemes, more specifically, class probability estimators. Our experiments show that a kernel estimator based on class probability estimates from a random forest classifier is highly competitive with Gaussian process regression, while taking only a fraction of the time for training. Therefore, generic machine-learning-based methods for conditional density estimation may be a good and fast option for quantifying uncertainty in QSAR modeling.

http://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/view/181
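One simple way to turn class probability estimates into a conditional density is to discretise the activity into bins, treat each bin as a class, and spread each predicted class probability uniformly over its bin. The sketch below shows that histogram-style conversion and a prediction interval read off the resulting distribution. It is a simplified stand-in for the kernel estimator mentioned in the abstract, not a reproduction of it; the bin edges and probabilities are made-up inputs that in practice would come from a classifier such as a random forest.

```python
def histogram_density(bin_edges, class_probs):
    """Density value per bin: probability mass divided by bin width."""
    widths = [hi - lo for lo, hi in zip(bin_edges, bin_edges[1:])]
    return [p / w for p, w in zip(class_probs, widths)]


def quantile(bin_edges, class_probs, level):
    """Invert the piecewise-linear CDF implied by the histogram density."""
    cumulative = 0.0
    for lo, hi, p in zip(bin_edges, bin_edges[1:], class_probs):
        if p > 0 and cumulative + p >= level:
            # Interpolate linearly inside the bin that contains the level.
            return lo + (hi - lo) * (level - cumulative) / p
        cumulative += p
    return bin_edges[-1]


def prediction_interval(bin_edges, class_probs, coverage=0.9):
    """Central interval holding the requested share of the predicted mass."""
    tail = (1.0 - coverage) / 2.0
    return (quantile(bin_edges, class_probs, tail),
            quantile(bin_edges, class_probs, 1.0 - tail))


edges = [0.0, 1.0, 2.0, 4.0]   # hypothetical activity bins
probs = [0.25, 0.5, 0.25]      # hypothetical class probabilities for one compound
print(histogram_density(edges, probs))
print(prediction_interval(edges, probs, coverage=0.5))
```

Because the density is derived from class probabilities, any classifier that outputs them can be plugged in, which is what makes the approach generic.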

Research Commons@Waikato

Association for the Advancement of Artificial Intelligence: AAAI Publications

Gutenberg Open