    A Probabilistic Approach To Multiple-Instance Learning

    This study introduced a probabilistic approach to the multiple-instance learning (MIL) problem. In particular, two Bayes classification algorithms were proposed in which posterior probabilities were estimated under different assumptions. The first algorithm, named instance-vote, assumes that the probability of a bag being positive or negative depends on the percentage of its instances that are positive or negative. This probability is estimated using a k-NN classification of instances. In the second approach, embedded kernel density estimation (EKDE), bags are represented in an instance-induced (very high-dimensional) space. A parametric stochastic neighbor embedding method is applied to learn a mapping that projects bags into a 2-D or 1-D space. Class-conditional probability densities are then estimated in this low-dimensional space via kernel density estimation. Both algorithms were evaluated on the MUSK benchmark data sets, and the results are highly competitive with existing methods.
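The instance-vote idea above can be sketched in a few lines: classify each instance in a bag with k-NN against labeled training instances, then take the fraction of positive instances as the bag's posterior. This is a minimal sketch, not the authors' implementation; the function names are hypothetical, and Euclidean distance with majority vote is assumed.

```python
import numpy as np

def knn_instance_labels(train_X, train_y, test_X, k=3):
    """Label each test instance by majority vote of its k nearest training instances."""
    labels = []
    for x in test_X:
        dists = np.linalg.norm(train_X - x, axis=1)        # Euclidean distance (assumed)
        neighbors = train_y[np.argsort(dists)[:k]]         # k nearest labels
        labels.append(int(round(neighbors.mean())))        # majority vote
    return np.array(labels)

def bag_posterior(bag, train_X, train_y, k=3):
    """Instance-vote sketch: bag posterior ~ fraction of its instances classified positive."""
    instance_labels = knn_instance_labels(train_X, train_y, bag, k)
    return instance_labels.mean()

# Toy example: positives cluster near (5, 5), negatives near (0, 0)
train_X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                    [5.0, 5.0], [5.0, 5.1], [5.1, 5.0]])
train_y = np.array([0, 0, 0, 1, 1, 1])
bag = np.array([[5.0, 5.05], [4.9, 5.0], [0.05, 0.0]])    # 2 of 3 instances positive
p = bag_posterior(bag, train_X, train_y, k=3)              # -> 2/3
```

A bag would then be classified positive when this posterior exceeds a threshold (e.g. 0.5).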

    Improving random forests by feature dependence analysis

    Random forests (RFs) have been widely used for supervised learning tasks because of their high prediction accuracy, good model interpretability, and fast training process. However, they are not able to learn from local structures as convolutional neural networks (CNNs) do when there is high dependency among features. They also cannot utilize features that are jointly dependent on the label but marginally independent of it. In this dissertation, we present two approaches that address these two problems, respectively, through dependence analysis. First, a local feature sampling (LFS) approach is proposed to learn and use the locality information of features, grouping dependent/correlated features to train each tree. For image data, the local information of features (pixels) is defined by the 2-D grid of the image. For non-image data, we provide multiple ways of estimating this local structure. Our experiments show that RF with LFS has reduced correlation and improved accuracy on multiple UCI datasets. To address the latter issue of random forests, we propose a way to categorize features as marginally dependent features and jointly dependent features, the latter defined by minimum dependence sets (MDS's) or by stronger dependence sets (SDS's). Algorithms to identify MDS's and SDS's are provided. We then present a feature dependence mapping (FDM) approach that maps the jointly dependent features to another feature space where they are marginally dependent. We show that by using FDM, decision trees and RFs achieve improved prediction performance on artificial datasets and a protein expression dataset.
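For image data, the LFS step described above can be illustrated by sampling a contiguous patch on the 2-D pixel grid and training each tree on just those features. This is a minimal sketch under that interpretation; the helper name and patch-based sampling scheme are illustrative assumptions, not the dissertation's exact procedure.

```python
import numpy as np

def local_feature_sample(height, width, patch_h, patch_w, rng=None):
    """LFS sketch for image data: return flattened feature indices of one
    randomly placed contiguous patch on the height x width pixel grid."""
    if rng is None:
        rng = np.random.default_rng(0)
    r = rng.integers(0, height - patch_h + 1)              # top-left row of the patch
    c = rng.integers(0, width - patch_w + 1)               # top-left column of the patch
    rows = np.arange(r, r + patch_h)
    cols = np.arange(c, c + patch_w)
    # Convert (row, col) pairs to flat indices into the flattened image vector
    return (rows[:, None] * width + cols[None, :]).ravel()

# Example: one 3x3 patch of an 8x8 image yields 9 spatially adjacent features
idx = local_feature_sample(8, 8, 3, 3, rng=np.random.default_rng(1))
```

Each tree in the forest would then be fit on `X[:, idx]`, so that the features it sees are spatially dependent rather than drawn uniformly at random.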