Classification under Streaming Emerging New Classes: A Solution using Completely Random Trees
This paper investigates an important problem in stream mining, namely
classification under streaming emerging new classes (SENC). The common
approach is to treat it as a classification problem and solve it using either a
supervised learner or a semi-supervised learner. We propose an alternative
approach by using unsupervised learning as the basis to solve this problem. The
SENC problem can be decomposed into three sub problems: detecting emerging new
classes, classifying for known classes, and updating models to enable
classification of instances of the new class and detection of more emerging new
classes. The proposed method employs completely random trees which have been
shown to work well in unsupervised learning and supervised learning
independently in the literature. This is the first time, as far as we know,
that completely random trees are used as a single common core to solve all
three sub problems: unsupervised learning, supervised learning and model update
in data streams. We show that the proposed unsupervised-learning-focused method
often achieves significantly better outcomes than existing
classification-focused methods.
Constructing Confidence Intervals of the Summary Statistics in the Least-Squares SROC Model
The accuracy of a diagnostic test is often evaluated with the measures of sensitivity and specificity, and the joint dependence between these two measures is captured by the receiver operating characteristic (ROC) curve. To combine multiple testing results from studies that are assumed to follow the same underlying probability law, a smooth summary receiver operating characteristic (SROC) curve can be fitted. Moses, Shapiro, and Littenberg (1993) proposed a least-squares approach to fit the smooth SROC curve, and the variances of the estimated parameters were derived by ignoring the variance of the independent variable. Since the independent variable was in fact random, the variances were likely underestimated. Hence we propose another way to estimate the variances of the statistics of interest, and use a real example to demonstrate the differences. We also perform a simulation study to examine these two approaches. The results suggest that the least-squares estimates of the coefficients are biased, and that the averaged confidence coverage is not equal to its nominal level, regardless of the method. While Moses et al.'s method tends to underestimate the confidence interval, our method sometimes overestimates the interval, depending on the ratio of the intra-study variation to the inter-study variation. Our estimation of the variances appears to be slightly better than the method proposed by Moses et al. because the coverage probability of the confidence interval is closer to the nominal level.
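As a rough illustration of the least-squares SROC fit mentioned in the abstract above, the following sketch uses the usual Moses-Littenberg formulation: for each study, D = logit(TPR) - logit(FPR) is regressed on S = logit(TPR) + logit(FPR). The function names, the (tp, fn, fp, tn) input layout, and the 0.5 continuity correction are assumptions made for this example, and it implements only the ordinary least-squares point estimates, not the variance estimator the paper proposes.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def fit_sroc(studies):
    """Ordinary least-squares fit of D = a + b * S.

    `studies` is a list of (tp, fn, fp, tn) counts, one tuple per study
    (a hypothetical input layout chosen for this sketch).
    """
    d_vals, s_vals = [], []
    for tp, fn, fp, tn in studies:
        # Assumption: add 0.5 to every cell so the logits stay finite.
        tpr = (tp + 0.5) / (tp + fn + 1.0)
        fpr = (fp + 0.5) / (fp + tn + 1.0)
        d_vals.append(logit(tpr) - logit(fpr))  # log diagnostic odds ratio
        s_vals.append(logit(tpr) + logit(fpr))  # proxy for the test threshold
    n = len(studies)
    s_mean = sum(s_vals) / n
    d_mean = sum(d_vals) / n
    sxx = sum((s - s_mean) ** 2 for s in s_vals)
    sxy = sum((s - s_mean) * (d - d_mean) for s, d in zip(s_vals, d_vals))
    b = sxy / sxx
    a = d_mean - b * s_mean
    return a, b


def sroc_tpr(fpr, a, b):
    """Point on the fitted curve, obtained by solving D = a + b * S for TPR:
    logit(TPR) = a / (1 - b) + (1 + b) / (1 - b) * logit(FPR)."""
    z = a / (1.0 - b) + (1.0 + b) / (1.0 - b) * logit(fpr)
    return 1.0 / (1.0 + math.exp(-z))
```

Because S is itself computed from the observed rates, it is a random quantity rather than a fixed regressor, which is the point the abstract raises about the variances of a and b being underestimated.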
Successful radiofrequency ablation of a right posteroseptal accessory pathway through an anomalous inferior vena cava and azygos continuation in a patient with incomplete situs inversus
We present a 43-year-old patient with paroxysmal supraventricular tachycardia. In the process
of catheter ablation, we found interruption of the inferior vena cava with azygos continuation
with incomplete situs inversus. In this patient, we adopted the lower approach via the anomalous
inferior vena cava and azygos continuation to achieve stability of the radiofrequency catheter
for the right posteroseptal accessory pathway, and successfully abolished the preexcitation.
Training SpamAssassin with active semi-supervised learning
Most spam filters include some automatic pattern classifiers based on machine learning and pattern recognition techniques. Such classifiers often require a large training set of labeled emails to attain a good discriminant capability between spam and legitimate emails. In addition, they must be frequently updated because of the changes introduced by spammers to their emails to evade spam filters. To address this issue, active learning and semi-supervised learning techniques can be used. Many spam filters allow the user to give feedback on personal emails automatically labeled during filter operation, and some filters include a self-training mechanism to exploit the large number of unlabeled emails collected during filter operation. However, users are usually willing to label only a few emails, and the benefits of self-training techniques are limited. In this paper we propose an active semi-supervised learning method to better exploit unlabeled emails, which can be easily implemented as a plug-in in real spam filters. Our method is based on clustering unlabeled emails, querying the label of one email per cluster, and propagating that label to the most similar emails of the same cluster. The effectiveness of our method is evaluated using the well-known open source SpamAssassin filter, on a large and publicly available corpus of real legitimate and spam emails.
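The cluster, query, and propagate loop described in the abstract above could be sketched roughly as follows. The abstract does not say which clustering algorithm, similarity measure, or thresholds the authors use, so the leader-style clustering, the Jaccard similarity on word sets, the threshold values, and every function name here are assumptions made for illustration. Nothing in the sketch is SpamAssassin's API.

```python
def tokenize(text):
    """Represent an email as a set of lowercase words (a simplification)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def leader_clusters(docs, threshold):
    """Stand-in clustering: each email joins the first cluster whose leader
    is similar enough, otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of indices; the first is the leader
    for i, doc in enumerate(docs):
        for cluster in clusters:
            if jaccard(doc, docs[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def active_label(texts, ask_user, cluster_threshold=0.3, propagate_threshold=0.5):
    """Query one label per cluster and propagate it to the most similar members.

    `ask_user` is a hypothetical callback standing in for user feedback.
    Returns a dict mapping email index to label; emails that are not similar
    enough to the queried one stay unlabeled.
    """
    docs = [tokenize(t) for t in texts]
    labels = {}
    for cluster in leader_clusters(docs, cluster_threshold):
        rep = cluster[0]
        label = ask_user(texts[rep])  # one query per cluster
        labels[rep] = label
        for i in cluster[1:]:
            if jaccard(docs[i], docs[rep]) >= propagate_threshold:
                labels[i] = label  # propagated, not user-confirmed
    return labels
```

The number of user queries equals the number of clusters, so under these assumptions the clustering threshold controls the trade-off between labeling effort and the risk of propagating a wrong label.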