Leveraging Related Instances for Better Prediction

Abstract

One fundamental task of machine learning is to predict an output response y from input data x. However, despite significant advances in the past decade, most current predictive models still consider each x in isolation when making predictions, which limits performance because the model loses the opportunity to extract helpful information from other related instances when predicting for x. This dissertation pushes the boundaries of machine learning research by explicitly taking advantage of related instances for better prediction. We find that leveraging multiple learned or intrinsically related instances during prediction, in a data-driven and flexible manner, is important for achieving good performance across a myriad of tasks. When x is a single instance, we can flexibly find related instances based on similarity measures. We develop algorithms that consider neighboring related instances for a given x during prediction. Our assumption is that similar instances lie near one another in an embedding space and locally share a similar predictive function. We develop a model, Meta-Neighborhood, that learns a dictionary of neighbor points during training, so that related instances can be retrieved from this dictionary during inference to improve classification and regression. Furthermore, this work is extended to Differentiable Wavetable Synthesis (DWTS), which leverages a dictionary of related basis waveforms for audio synthesis; we show that realistic audio can be synthesized by directly combining these basis waveforms. Next, we consider the case where x is a collection containing multiple instances. Here, x already includes multiple related instances, and we develop methods that learn to exploit them jointly to improve prediction. We design algorithms that discover the instances most relevant to the prediction task and encourage the model to focus on them. We first develop CKME, a transparent and human-understandable algorithm that summarizes millions of instances into hundreds while remaining comparably accurate for single-cell set classification. We then develop NRTSI, a time-series imputation algorithm that treats the time series as a set and imputes missing values by leveraging related observed data.
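The sketch below illustrates, under assumed names and shapes, the general idea behind a learned neighbor dictionary as described for Meta-Neighborhood: stored keys and values are queried with an instance embedding at inference time, and the retrieved entries adjust the base prediction. It is not the dissertation's exact formulation; DICT_SIZE, neighbor_keys, base_logits, and the additive blending are illustrative assumptions.

```python
# Minimal sketch of dictionary-based neighbor retrieval (illustrative, not the
# dissertation's exact method). Learned parameters are stand-ins drawn at random.
import numpy as np

rng = np.random.default_rng(0)
DICT_SIZE, EMBED_DIM, NUM_CLASSES = 8, 4, 3

# Hypothetical learned dictionary: one key and one value vector per stored neighbor.
neighbor_keys = rng.normal(size=(DICT_SIZE, EMBED_DIM))
neighbor_values = rng.normal(size=(DICT_SIZE, NUM_CLASSES))

def predict(x_embed, base_logits, temperature=1.0):
    """Blend the backbone's logits with values retrieved from the neighbor dictionary."""
    # Similarity of the query embedding to every stored neighbor key.
    scores = neighbor_keys @ x_embed / temperature
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax attention over dictionary entries
    retrieved = weights @ neighbor_values    # weighted combination of neighbor values
    return base_logits + retrieved           # neighbor-adjusted prediction

x_embed = rng.normal(size=EMBED_DIM)         # embedding of a test instance
base_logits = np.zeros(NUM_CLASSES)          # backbone prediction before adjustment
print(predict(x_embed, base_logits))
```

In the actual model, the keys and values would be trained jointly with the backbone so that the dictionary captures instances that are useful to retrieve, rather than being fixed training points.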
