160 research outputs found

    Kernel Carpentry for Online Regression using Randomly Varying Coefficient Model

    We present a Bayesian formulation of locally weighted learning (LWL) using the novel concept of a randomly varying coefficient model. Based on this…

    Bayesian locally weighted online learning

    Locally weighted regression is a non-parametric regression technique that can cope with non-stationarity of the input distribution. Online algorithms like Receptive Field Weighted Regression and Locally Weighted Projection Regression use a sparse representation of the locally weighted model to approximate a target function, resulting in an efficient learning algorithm. However, these algorithms are fairly sensitive to parameter initializations and have multiple open learning parameters that are usually set using insights into the problem and local heuristics. In this thesis, we attempt to alleviate these problems by using a probabilistic formulation of locally weighted regression followed by a principled Bayesian inference of the parameters. In the Randomly Varying Coefficient (RVC) model developed in this thesis, locally weighted regression is set up as an ensemble of regression experts that provide a local linear approximation to the target function. We train the individual experts independently and then combine their predictions using a Product of Experts formalism. Independent training of the experts allows us to adapt the complexity of the regression model dynamically while learning in an online fashion. The local experts themselves are modelled using a hierarchical Bayesian probability distribution, with Variational Bayesian Expectation Maximization steps used to learn the posterior distributions over the parameters. The Bayesian modelling of the local experts leads to an inference procedure that is fairly insensitive to parameter initializations and avoids problems like overfitting. We further exploit the Bayesian inference procedure to derive efficient online update rules for the parameters. Learning in the regression setting is also extended to handle classification tasks by using logistic regression to model discrete class labels. The main contribution of the thesis is a spatially localised online learning algorithm, set up in a probabilistic framework with principled Bayesian inference rules for the parameters, that learns local models completely independently of each other, uses only local information, and adapts the local model complexity in a data-driven fashion. This thesis, for the first time, brings together the computational efficiency and adaptability of ‘non-competitive’ locally weighted learning schemes and the modelling guarantees of the Bayesian formulation.
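
    As a rough illustration of the Product of Experts combination described above, the hedged Python sketch below merges the predictions of independently trained local linear experts by precision-weighted averaging. The expert fields (centre, bandwidth, coefficients, noise variance) and the Gaussian locality kernel are illustrative assumptions, not the thesis' exact RVC parameterisation.

```python
import numpy as np

def poe_predict(x, experts):
    """Combine independent local linear experts with a Product of (Gaussian) Experts.

    Each expert is a dict with illustrative fields (assumed, not from the thesis):
      'center'    - centre of its receptive field
      'bandwidth' - kernel width controlling locality
      'beta'      - local linear coefficients (intercept first)
      'noise_var' - predictive variance of the expert at its centre
    """
    means, precisions = [], []
    for e in experts:
        # Locality weight of this expert for the query point (Gaussian kernel).
        w = np.exp(-0.5 * np.sum((x - e['center'])**2) / e['bandwidth']**2)
        # Local linear prediction around the expert's centre.
        mean = e['beta'][0] + e['beta'][1:] @ (x - e['center'])
        # Down-weight the expert's precision far from its centre.
        precisions.append(w / e['noise_var'])
        means.append(mean)
    precisions = np.asarray(precisions)
    means = np.asarray(means)
    total = precisions.sum()
    # Product-of-Gaussians: precision-weighted mean and combined variance.
    return (precisions @ means) / total, 1.0 / total
```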

    Lazy Lasso for local regression

    Locally weighted regression is a technique that predicts the response for new data items from their neighbors in the training data set, where closer data items are assigned higher weights in the prediction. However, the original method may suffer from overfitting and fail to select the relevant variables. In this paper we propose combining a regularization approach with locally weighted regression to achieve sparse models. Specifically, the lasso is a shrinkage and selection method for linear regression. We present an algorithm that embeds the lasso in an iterative procedure that alternately computes weights and performs lasso-wise regression. The algorithm is tested on three synthetic scenarios and two real data sets. Results show that the proposed method outperforms linear and local models in several kinds of scenarios.
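
    The sketch below is one plausible reading of the iterative procedure, for a single query point: locality weights are computed from distance to the query, a weighted lasso is fitted, and the weights are then recomputed. The kernel, the residual-based reweighting step, and all parameter values are assumptions, not the authors' exact algorithm.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lazy_lasso_predict(x_query, X, y, alpha=0.1, bandwidth=1.0, n_iter=5):
    """Hedged sketch of a locally weighted lasso prediction for one query point.

    Alternates between (re)computing locality weights and fitting a weighted
    lasso, loosely following the abstract; details are illustrative only.
    """
    # Initial locality weights: Gaussian kernel on squared distance to the query.
    d2 = np.sum((X - x_query)**2, axis=1)
    w = np.exp(-0.5 * d2 / bandwidth**2)

    model = Lasso(alpha=alpha)
    for _ in range(n_iter):
        model.fit(X, y, sample_weight=w)
        # Recompute weights: damp points the current local model fits poorly
        # (an assumed robustness step, not taken from the paper).
        resid = y - model.predict(X)
        w = np.exp(-0.5 * d2 / bandwidth**2) / (1.0 + resid**2)
    return model.predict(x_query.reshape(1, -1))[0]
```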

    Land valuation using an innovative model combining machine learning and spatial context

    Valuation predictions are used by buyers, sellers, regulators, and authorities to assess the fairness of the value being asked. Urbanization demands a modern and efficient land valuation system, since the conventional approach is costly, slow, and relatively subjective towards locational factors. This necessitates the development of alternative methods that are faster, user-friendly, and digitally based. These approaches should use geographic information systems and strong analytical tools to produce reliable and accurate valuations. Location information in the form of spatial data is crucial because the price can vary significantly based on the neighborhood and context of where the parcel is located. In this thesis, a model is proposed that combines machine learning and spatial context. It integrates raster information derived from remote sensing as well as vector information from geospatial analytics to predict land values in the City of Springfield. These are used to investigate whether a joint model can improve the value estimation. The study also identifies the factors that are most influential in driving these models. A geodatabase was created by calculating proximity and accessibility to key locations, integrating socio-economic variables, and adding statistics on green space density and vegetation index derived from Sentinel-2 satellite data. The model was trained on Greene County government data as ground-truth appraisal land values using supervised machine learning models, and the impact of each data type on price prediction was explored. Two types of modeling were conducted. Initially, only spatial context data were used to assess their predictive capability. Subsequently, socio-economic variables were added to the dataset to compare the performance of the models. The results showed only a slight difference in performance between the random forest and gradient boosting algorithms, as well as between using GIS-derived distance measures alone and adding socio-economic variables to them. Furthermore, spatial autocorrelation analysis was conducted to investigate how the distribution of similar attributes related to the location of the land affects its value. This analysis also aimed to identify the disparities that exist in the socio-economic structure and to measure their magnitude.
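
    A minimal sketch of the comparison described above: two supervised models are cross-validated on spatial-context features alone and then with socio-economic variables added. The file name, column names, and model settings are hypothetical placeholders, not the thesis' actual geodatabase schema.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Hypothetical geodatabase export: file and column names are illustrative only.
parcels = pd.read_csv("parcels.csv")
spatial_cols = ["dist_cbd", "dist_highway", "dist_school", "green_density", "ndvi_mean"]
socio_cols = ["median_income", "population_density"]
target = "appraised_land_value"

for label, cols in [("spatial only", spatial_cols),
                    ("spatial + socio-economic", spatial_cols + socio_cols)]:
    X, y = parcels[cols], parcels[target]
    for name, model in [("random forest", RandomForestRegressor(n_estimators=300, random_state=0)),
                        ("gradient boosting", GradientBoostingRegressor(random_state=0))]:
        # 5-fold cross-validated R^2 for each feature set / model combination.
        r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
        print(f"{label:26s} {name:18s} mean CV R^2 = {r2:.3f}")
```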

    An Automatic Method for Extracting Chemical Impurity Profiles of Illicit Drugs from Chromatographic-Mass Spectrometric Data and Their Comparison Using Bayesian Reasoning

    In this work, an automated procedure for extracting chemical profiles of illicit drugs from chromatographic-mass spectrometric data is presented, along with a method for comparing the profiles using Bayesian inference. The described methods aim to ease the work of a forensic chemist who is tasked with comparing two samples of a drug, such as amphetamine, and delivering an answer to a question of the form 'Are these two samples from the same source?' Additionally, more statistical rigour is introduced to the comparison process. The chemical profiles consist of the relative amounts of certain impurities present in seized drug samples. In order to obtain such profiles, the amounts of the target compounds must be recovered from chromatographic-mass spectrometric measurements, which amounts to searching the raw signals for peaks corresponding to the targets. The areas of these peaks must then be integrated and normalized by the sum of all target peak areas. The automated impurity profile extraction presented in this thesis works by first filtering the data corresponding to a sample, which includes discarding irrelevant parts of the raw data, estimating and removing the signal baseline using the asymmetrically reweighted penalized least squares (arPLS) algorithm, and smoothing the relevant signals using a Savitzky-Golay (SG) filter. The SG filter is also used to estimate signal derivatives. These derivatives are used in the next step to detect signal peaks, from which parameters are estimated for an exponential-Gaussian hybrid peak model. The signal is reconstructed using the estimated model peaks, and optimal parameters are found by fitting the reconstructed signal to the measurements via non-linear least squares methods. In the last step, impurity profiles are extracted by integrating the areas of the optimized models for the target compound peaks. These areas are then normalized by their sum to obtain the relative amounts of the substances. In order to separate peaks from noise, a model for the dependency of noise on signal level was fitted non-parametrically to replicate measurements of amphetamine quality control samples. This model was used to compute detection limits based on the estimated baseline of the signals. Finally, the classical Pearson correlation based comparison method for these impurity profiles was compared to two Bayesian methods, the Bayes factor (BF) and the predictive agreement (PA). The Bayesian methods used a probabilistic model assuming normally distributed values with a normal-gamma prior distribution for the mean and precision parameters. These methods were compared using simulation tests and application to 90 samples of seized amphetamine.
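
    The sketch below illustrates a few of the extraction steps named above (Savitzky-Golay smoothing, peak detection, area integration, and normalization to relative amounts). The arPLS baseline removal, the exponential-Gaussian hybrid fit, and the noise model are omitted, and all thresholds are assumed values.

```python
import numpy as np
from scipy.signal import savgol_filter, find_peaks
from scipy.integrate import trapezoid

def impurity_profile(time, intensity, window=21, polyorder=3):
    """Hedged sketch of part of the profile-extraction pipeline.

    Smooths a chromatographic trace with a Savitzky-Golay filter, detects
    peaks, integrates their areas, and normalizes them to relative amounts.
    Baseline removal (arPLS) and the exponential-Gaussian hybrid peak fit
    described in the thesis are not reproduced here.
    """
    smoothed = savgol_filter(intensity, window, polyorder)
    # Assumed prominence/width thresholds for separating peaks from noise.
    peaks, props = find_peaks(smoothed, prominence=0.05 * smoothed.max(), width=3)
    areas = []
    for i in range(len(peaks)):
        left = int(np.floor(props["left_ips"][i]))
        right = int(np.ceil(props["right_ips"][i])) + 1
        areas.append(trapezoid(smoothed[left:right], time[left:right]))
    areas = np.asarray(areas)
    # Relative amounts = peak areas normalized by their sum.
    return peaks, areas / areas.sum()
```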

    Nonparametric variable selection and dimension reduction methods and their applications in pharmacogenomics

    Nowadays it is common to collect large volumes of data in many fields, with an extensive number of variables but often only a small or moderate number of samples. For example, in the analysis of genomic data, the number of genes can be very large, varying from tens of thousands to several millions, whereas the number of samples is only several hundred to a few thousand. Pharmacogenomics is the example of genomic data analysis considered here. Pharmacogenomics research uses whole-genome genetic information to predict individuals' drug response. Because whole-genome data are high dimensional and their relationships to drug response are complicated, we develop a variety of nonparametric methods, including variable selection using local regression and extended dimension reduction techniques, to detect nonlinear patterns in the relationship between genetic variants and clinical response.

    High dimensional data analysis has become a popular research topic in the statistics community in recent years. However, the nature of high dimensional data makes many traditional statistical methods fail, because most methods rely on the assumption that the sample size n is larger than the variable dimension p. Consequently, variable selection or dimension reduction is often the first step in high dimensional data analysis. Meanwhile, another important issue arises: the choice of an appropriate statistical modeling strategy for conducting variable selection or dimension reduction. Our studies have found that the traditional parametric linear model might not work well for detecting nonlinear patterns in the relationships between predictors and response. The limitations of the linear model and other parametric statistical approaches motivate us to consider nonparametric/nonlinear models for conducting variable selection or dimension reduction.

    The thesis is composed of two major parts. In the first part, we develop a nonparametric predictive model of the response based on a small number of predictors, which are selected by a nonparametric forward variable selection procedure. We also propose strategies to identify subpopulations with enhanced treatment effects. In the second part, we develop an alternating least squares method to extend classical Sliced Inverse Regression (SIR) [Li, 1991] to the context of high dimensional data. Both methods are demonstrated by simulation studies and a pharmacogenomics study of bortezomib in multiple myeloma [Mulligan et al., 2007]. The proposed methods have favorable performance compared to other existing methods in the literature.
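
    For reference, the sketch below implements classical Sliced Inverse Regression [Li, 1991], which the second part of the thesis extends; the alternating least squares extension for high-dimensional data is not reproduced here, and the slice count and number of directions are arbitrary choices.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_directions=2):
    """Classical SIR (Li, 1991): estimate effective dimension reduction directions.

    Assumes n > p so the sample covariance is invertible; the thesis' alternating
    least squares variant for the high-dimensional case is not shown.
    """
    n, p = X.shape
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Whiten the predictors via the inverse square root of the covariance.
    evals, evecs = np.linalg.eigh(cov)
    cov_inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    Z = (X - mu) @ cov_inv_sqrt

    # Slice the response and average the standardized predictors within each slice.
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)

    # Leading eigenvectors of the slice-mean covariance give the directions,
    # mapped back to the original predictor scale.
    w, v = np.linalg.eigh(M)
    return cov_inv_sqrt @ v[:, ::-1][:, :n_directions]
```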