3,414 research outputs found

Classification under Streaming Emerging New Classes: A Solution using Completely Random Trees

Authors: Mu Xin, Ting Kai Ming, Zhou Zhi-Hua
Publication date: 30/05/2016

This paper investigates an important problem in stream mining: classification under streaming emerging new classes (SENC). The common approach is to treat it as a classification problem and solve it using either a supervised learner or a semi-supervised learner. We propose an alternative approach that uses unsupervised learning as the basis for solving this problem. The SENC problem can be decomposed into three subproblems: detecting emerging new classes, classifying instances of known classes, and updating models to enable classification of instances of the new class and detection of further emerging new classes. The proposed method employs completely random trees, which have been shown in the literature to work well in unsupervised learning and in supervised learning independently. This is the first time, as far as we know, that completely random trees are used as a single common core to solve all three subproblems: unsupervised learning, supervised learning and model update in data streams. We show that the proposed unsupervised-learning-focused method often achieves significantly better outcomes than existing classification-focused methods.
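The abstract does not describe the algorithm itself, so the following is only a minimal sketch of the general isolation idea behind completely random trees, not the authors' method. Each tree picks a random attribute and a random split value, and a point that sits far from the rest tends to be separated after very few splits. All function names, the depth limit and the tree count are assumptions made for illustration.

```python
import random

def build_tree(points, rng, depth=0, max_depth=10):
    """Grow one completely random tree: random attribute, random split value."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}            # leaf
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:                                # cannot split on this attribute
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, rng, depth + 1, max_depth),
        "right": build_tree(right, rng, depth + 1, max_depth),
    }

def build_forest(points, n_trees=50, seed=0):
    """Build several independent random trees from the same buffer of points."""
    rng = random.Random(seed)
    return [build_tree(points, rng) for _ in range(n_trees)]

def path_length(tree, x):
    """Number of splits x passes through before reaching a leaf."""
    depth = 0
    while "dim" in tree:
        branch = "left" if x[tree["dim"]] < tree["split"] else "right"
        tree = tree[branch]
        depth += 1
    return depth

def mean_path_length(forest, x):
    """Average path length over the forest; shorter suggests a more isolated point."""
    return sum(path_length(t, x) for t in forest) / len(forest)

# Hypothetical buffer: a tight grid of familiar points plus one distant point.
known = [(i * 0.1, j * 0.1) for i in range(8) for j in range(8)]
outlier = (100.0, 100.0)
forest = build_forest(known + [outlier], n_trees=100)
print(mean_path_length(forest, outlier), mean_path_length(forest, (0.3, 0.3)))
```

In a streaming setting, one could flag buffered instances with unusually short average path lengths as candidates for an emerging class; how the paper turns such scores into detection, classification and model updates is not stated in the abstract.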

arXiv.org e-Print Archive

Crossref

Federation ResearchOnline

Constructing Confidence Intervals of the Summary Statistics in the Least-Squares SROC Model

Authors: Fan Ming-Yu, Zhou Xiao-Hua
Publication venue: Collection of Biostatistics Research Archive
Publication date: 28/03/2005

The accuracy of a diagnostic test is often evaluated with the measures of sensitivity and specificity, and the joint dependence between these two measures is captured by the receiver operating characteristic (ROC) curve. To combine multiple testing results from studies that are assumed to follow the same underlying probability law, a smooth summary receiver operating characteristic (SROC) curve can be fitted. Moses, Shapiro, and Littenberg (1993) proposed a least-squares approach to fit the smooth SROC curve, and the variances of the estimated parameters were derived by ignoring the variance of the independent variable. Since the independent variable is in fact random, the variances were likely underestimated. Hence we propose another way to estimate the variances of the statistics of interest, and use a real example to demonstrate the differences. We also perform a simulation study to examine these two approaches. The results suggest that the least-squares estimates of the coefficients are biased, and that the averaged confidence coverage is not equal to its nominal level, regardless of the method. While Moses et al.'s method tends to underestimate the confidence interval, our method sometimes overestimates the interval, depending on the ratio of the intra-study variation to the inter-study variation. Our estimation of the variances appears to be slightly better than the method proposed by Moses et al. because the coverage probability of the confidence interval is closer to the nominal level.
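For orientation, the least-squares SROC fit that the abstract attributes to Moses, Shapiro, and Littenberg is usually written as a regression of D = logit(TPR) - logit(FPR) on S = logit(TPR) + logit(FPR). The sketch below fits D = a + bS by ordinary least squares and evaluates the implied curve; it assumes per-study rates strictly between 0 and 1 and does not implement the variance estimators discussed in the abstract. Function names are illustrative.

```python
from math import exp, log

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return log(p / (1.0 - p))

def fit_sroc(tpr, fpr):
    """Ordinary least-squares fit of D = a + b * S across studies.

    D = logit(TPR) - logit(FPR), S = logit(TPR) + logit(FPR).
    Treating S as fixed here is exactly the simplification the abstract
    says leads to underestimated variances.
    """
    d = [logit(t) - logit(f) for t, f in zip(tpr, fpr)]
    s = [logit(t) + logit(f) for t, f in zip(tpr, fpr)]
    n = len(d)
    s_mean = sum(s) / n
    d_mean = sum(d) / n
    sxx = sum((si - s_mean) ** 2 for si in s)
    sxy = sum((si - s_mean) * (di - d_mean) for si, di in zip(s, d))
    b = sxy / sxx
    a = d_mean - b * s_mean
    return a, b

def sroc_tpr(a, b, fpr):
    """Sensitivity on the fitted SROC curve at a given false-positive rate.

    Solving D = a + b * S for logit(TPR) gives
    logit(TPR) = a / (1 - b) + (1 + b) / (1 - b) * logit(FPR).
    """
    v = a / (1.0 - b) + (1.0 + b) / (1.0 - b) * logit(fpr)
    return 1.0 / (1.0 + exp(-v))
```

Given per-study sensitivities and false-positive rates, `fit_sroc` returns the intercept and slope, and `sroc_tpr` traces the summary curve over any grid of false-positive rates.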

Collection Of Biostatistics Research Archive

Successful radiofrequency ablation of a right posteroseptal accessory pathway through an anomalous inferior vena cava and azygos continuation in a patient with incomplete situs inversus

Authors: Liu Qi-ming, Ouyang Fei-fan, Zhou Sheng-hua
Publication venue: Salvia Medical Sciences Ltd
Publication date: 06/01/2009

We present a 43-year-old patient with paroxysmal supraventricular tachycardia. During catheter ablation, we found interruption of the inferior vena cava with azygos continuation, together with incomplete situs inversus. In this patient, we adopted the lower approach via the anomalous inferior vena cava and azygos continuation to achieve stability of the radiofrequency catheter at the right posteroseptal accessory pathway, and successfully abolished the preexcitation.

Via Medica Journals

Training spamassassin with active semi-supervised learning

Authors: Fabio Roli, Giorgio Fumera, Jun-ming Xu, Zhi-hua Zhou
Publication date: 01/01/2009

Most spam filters include some automatic pattern classifiers based on machine learning and pattern recognition techniques. Such classifiers often require a large training set of labeled emails to attain a good discriminant capability between spam and legitimate emails. In addition, they must be frequently updated because of the changes spammers introduce into their emails to evade spam filters. To address this issue, active learning and semi-supervised learning techniques can be used. Many spam filters allow the user to give feedback on personal emails automatically labeled during filter operation, and some filters include a self-training mechanism to exploit the large number of unlabeled emails collected during filter operation. However, users are usually willing to label only a few emails, and the benefits of self-training techniques are limited. In this paper we propose an active semi-supervised learning method to better exploit unlabeled emails, which can be easily implemented as a plug-in in real spam filters. Our method is based on clustering unlabeled emails, querying the label of one email per cluster, and propagating that label to the most similar emails of the same cluster. The effectiveness of our method is evaluated using the well-known open-source SpamAssassin filter, on a large and publicly available corpus of real legitimate and spam emails.
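The three steps named in the abstract (cluster, query one label per cluster, propagate to the most similar cluster members) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the abstract does not say which clustering algorithm, similarity measure or propagation cut-off is used, so plain k-means on numeric feature vectors, squared Euclidean distance, and a hypothetical `top_fraction` parameter stand in for them. The `oracle` callback plays the role of the user who labels a queried email.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Tiny k-means with deterministic initialisation (first k points)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sqdist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids

def active_propagate(points, k, oracle, top_fraction=0.5):
    """Query one label per cluster and propagate it to the nearest members.

    Returns (labels, queried): labels maps point index -> label for the
    queried and propagated points only; queried lists the indices whose
    labels were requested from the oracle.
    """
    assign, centroids = kmeans(points, k)
    labels, queried = {}, []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if not members:
            continue
        # Query the member closest to the centroid (an assumed choice).
        q = min(members, key=lambda i: sqdist(points[i], centroids[c]))
        label = oracle(q)
        labels[q] = label
        queried.append(q)
        # Propagate only to the most similar members of the same cluster.
        others = sorted((i for i in members if i != q),
                        key=lambda i: sqdist(points[i], points[q]))
        for i in others[:int(len(others) * top_fraction)]:
            labels[i] = label
    return labels, queried
```

Emails that receive neither a queried nor a propagated label stay unlabeled, which keeps the user's effort at one query per cluster while limiting propagation to the closest neighbours.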

CiteSeerX

Archivio istituzionale della ricerca - Università di Cagliari

Archivio istituzionale della ricerca - Università di Genova