3,273 research outputs found
Graph ensemble boosting for imbalanced noisy graph stream classification
© 2014 IEEE. Many applications involve stream data with structural dependency, graph representations, and continuously increasing volumes. For these applications, it is very common that their class distributions are imbalanced with minority (or positive) samples being only a small portion of the population, which imposes significant challenges for learning models to accurately identify minority samples. This problem is further complicated with the presence of noise, because they are similar to minority samples and any treatment for the class imbalance may falsely focus on the noise and result in deterioration of accuracy. In this paper, we propose a classification model to tackle imbalanced graph streams with noise. Our method, graph ensemble boosting, employs an ensemble-based framework to partition graph stream into chunks each containing a number of noisy graphs with imbalanced class distributions. For each individual chunk, we propose a boosting algorithm to combine discriminative subgraph pattern selection and model learning as a unified framework for graph classification. To tackle concept drifting in graph streams, an instance level weighting mechanism is used to dynamically adjust the instance weight, through which the boosting framework can emphasize on difficult graph samples. The classifiers built from different graph chunks form an ensemble for graph stream classification. Experiments on real-life imbalanced graph streams demonstrate clear benefits of our boosting design for handling imbalanced noisy graph stream
Multiple Instance Learning: A Survey of Problem Characteristics and Applications
Multiple instance learning (MIL) is a form of weakly supervised learning
where training instances are arranged in sets, called bags, and a label is
provided for the entire bag. This formulation is gaining interest because it
naturally fits various problems and allows to leverage weakly labeled data.
Consequently, it has been used in diverse application fields such as computer
vision and document classification. However, learning from bags raises
important challenges that are unique to MIL. This paper provides a
comprehensive survey of the characteristics which define and differentiate the
types of MIL problems. Until now, these problem characteristics have not been
formally identified and described. As a result, the variations in performance
of MIL algorithms from one data set to another are difficult to explain. In
this paper, MIL problem characteristics are grouped into four broad categories:
the composition of the bags, the types of data distribution, the ambiguity of
instance labels, and the task to be performed. Methods specialized to address
each category are reviewed. Then, the extent to which these characteristics
manifest themselves in key MIL application areas are described. Finally,
experiments are conducted to compare the performance of 16 state-of-the-art MIL
methods on selected problem characteristics. This paper provides insight on how
the problem characteristics affect MIL algorithms, recommendations for future
benchmarking and promising avenues for research
Early hospital mortality prediction using vital signals
Early hospital mortality prediction is critical as intensivists strive to
make efficient medical decisions about the severely ill patients staying in
intensive care units. As a result, various methods have been developed to
address this problem based on clinical records. However, some of the laboratory
test results are time-consuming and need to be processed. In this paper, we
propose a novel method to predict mortality using features extracted from the
heart signals of patients within the first hour of ICU admission. In order to
predict the risk, quantitative features have been computed based on the heart
rate signals of ICU patients. Each signal is described in terms of 12
statistical and signal-based features. The extracted features are fed into
eight classifiers: decision tree, linear discriminant, logistic regression,
support vector machine (SVM), random forest, boosted trees, Gaussian SVM, and
K-nearest neighborhood (K-NN). To derive insight into the performance of the
proposed method, several experiments have been conducted using the well-known
clinical dataset named Medical Information Mart for Intensive Care III
(MIMIC-III). The experimental results demonstrate the capability of the
proposed method in terms of precision, recall, F1-score, and area under the
receiver operating characteristic curve (AUC). The decision tree classifier
satisfies both accuracy and interpretability better than the other classifiers,
producing an F1-score and AUC equal to 0.91 and 0.93, respectively. It
indicates that heart rate signals can be used for predicting mortality in
patients in the ICU, achieving a comparable performance with existing
predictions that rely on high dimensional features from clinical records which
need to be processed and may contain missing information.Comment: 11 pages, 5 figures, preprint of accepted paper in IEEE&ACM CHASE
2018 and published in Smart Health journa
- …