Online transfer learning for concept drifting data streams

Authors: Damoulas Theodoros, Griffiths Nathan, Mckay H., Taylor Phillip M., Zhou X.
Publication venue: American College of Medical Physics (ACMP)
Publication date: 05/08/2019

Warwick Research Archives Portal Repository

A Framework for Multistream Regression With Direct Density Ratio Estimation

Authors: Chandra Swarup, Haque Ahsanul, Khan Latifur, Liu Jie, Tao Hemeng
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 29/04/2018

Regression over a stream of data is challenging due to unbounded data size and a distribution that is non-stationary over time. Typically, a traditional supervised regression model over a data stream is trained on data instances occurring within a short time period, assuming a stationary distribution. This model is later used to predict the value of the response variable for future instances. Over time, the model's performance may degrade as the data distribution of incoming instances changes. Updating the model to adapt to such change requires the true response value for every recent data instance, which is scarce in practice. To overcome this issue, recent studies have employed techniques that sample fewer instances for model retraining. Yet this may introduce sampling bias that adversely affects model performance. In this paper, we study the regression problem over data streams in a novel setting. We consider two independent, yet related, non-stationary data streams, referred to as the source stream and the target stream. The target stream continuously generates data instances whose response-variable values are unknown. The source stream, however, continuously generates data instances together with the corresponding response-variable values, and its data distribution is biased with respect to the target stream. We refer to the problem of using a model trained on the biased source stream to predict the response-variable values of instances occurring on the target stream as Multistream Regression. We describe a framework for multistream regression that simultaneously overcomes the distribution bias and detects changes over time in the data distributions represented by the two streams, using a Gaussian kernel model. We analyze the theoretical properties of the proposed approach and empirically evaluate it on both real-world and synthetic data sets. Importantly, our results indicate superior performance by the framework compared to other baseline regression methods.
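The abstract does not spell out the estimator, so what follows is only a minimal sketch of the general idea under stated assumptions: estimate the density ratio w(x) = p_target(x) / p_source(x) directly with a Gaussian kernel model, then use it to importance-weight a regression fitted on the labelled source stream. It swaps in uLSIF (unconstrained least-squares importance fitting), a standard closed-form direct density ratio estimator that may differ from the one used in the paper. The function names, the 1-D toy streams, and the parameters `sigma`, `lam`, and `n_centers` are all hypothetical, and the change-detection component is not shown.

```python
import math
import random


def gaussian_basis(x, centers, sigma):
    """Gaussian kernel features exp(-(x - c)^2 / (2 sigma^2)) for a scalar x."""
    return [math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2)) for c in centers]


def solve(a, b):
    """Solve a @ x = b for square a by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def fit_ulsif(source_x, target_x, sigma=0.5, lam=0.1, n_centers=10):
    """Hypothetical helper: uLSIF fit of w(x) = sum_l alpha_l K(x, c_l) ~ p_target / p_source."""
    centers = target_x[:n_centers]  # kernel centres drawn from the target stream
    b = len(centers)
    src = [gaussian_basis(x, centers, sigma) for x in source_x]
    tgt = [gaussian_basis(x, centers, sigma) for x in target_x]
    # H averages phi(x) phi(x)^T over source points; h averages phi(x) over target points.
    H = [[sum(p[i] * p[j] for p in src) / len(src) for j in range(b)] for i in range(b)]
    h = [sum(p[i] for p in tgt) / len(tgt) for i in range(b)]
    for i in range(b):
        H[i][i] += lam  # ridge term; H + lam*I is positive definite
    alpha = [max(0.0, a) for a in solve(H, h)]  # clip so the ratio stays non-negative
    return lambda x: sum(a * k for a, k in zip(alpha, gaussian_basis(x, centers, sigma)))


def weighted_linear_fit(xs, ys, ws):
    """Weighted least squares for y ~ intercept + slope * x; returns (intercept, slope)."""
    total = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / total
    ybar = sum(w * y for w, y in zip(ws, ys)) / total
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def mse(model, xs, ys):
    """Mean squared error of a (intercept, slope) linear model."""
    intercept, slope = model
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)
# Hypothetical toy streams: labelled, biased source and unlabelled target (covariate shift).
source_x = [rng.gauss(0.0, 1.0) for _ in range(200)]
source_y = [x * x + rng.gauss(0.0, 0.1) for x in source_x]
target_x = [rng.gauss(1.5, 0.5) for _ in range(200)]
target_y = [x * x for x in target_x]  # used only to score the models, never for training

w = fit_ulsif(source_x, target_x)
weights = [w(x) for x in source_x]
plain = weighted_linear_fit(source_x, source_y, [1.0] * len(source_x))
shifted = weighted_linear_fit(source_x, source_y, weights)
print("unweighted target MSE:", round(mse(plain, target_x, target_y), 3))
print("importance-weighted target MSE:", round(mse(shifted, target_x, target_y), 3))
```

Drawing kernel centres from the target stream keeps the model's capacity where the ratio matters, and the ridge term `lam` makes the linear system well conditioned. In practice `sigma` and `lam` would normally be chosen by cross-validation rather than fixed as they are here.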

Association for the Advancement of Artificial Intelligence: AAAI Publications