Classification under Streaming Emerging New Classes: A Solution using Completely Random Trees
This paper investigates an important problem in stream mining, i.e.,
classification under streaming emerging new classes or SENC. The common
approach is to treat it as a classification problem and solve it using either a
supervised learner or a semi-supervised learner. We propose an alternative
approach by using unsupervised learning as the basis to solve this problem. The
SENC problem can be decomposed into three subproblems: detecting emerging new
classes, classifying for known classes, and updating models to enable
classification of instances of the new class and detection of more emerging new
classes. The proposed method employs completely random trees which have been
shown to work well in unsupervised learning and supervised learning
independently in the literature. This is the first time, as far as we know,
that completely random trees are used as a single common core to solve all
three subproblems: unsupervised learning, supervised learning and model update
in data streams. We show that the proposed unsupervised-learning-focused method
often achieves significantly better outcomes than existing
classification-focused methods.
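
As a rough illustration of the detection subproblem only, the sketch below grows completely random trees (split attribute and split value both drawn at random) and uses mean path length as a novelty score, in the style of isolation-based anomaly detection. It is not the authors' implementation: the function names, the dictionary-based tree layout, `max_depth`, and the toy grid data are all assumptions made for this example, and the classification and model-update steps are left out.

```python
import random

def build_tree(points, rng, depth=0, max_depth=12):
    """Grow one completely random tree: split attribute and value are both random."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}
    # Only attributes that still vary across the points can be split.
    dims = [d for d in range(len(points[0]))
            if min(p[d] for p in points) < max(p[d] for p in points)]
    if not dims:
        return {"size": len(points)}
    dim = rng.choice(dims)
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, rng, depth + 1, max_depth),
        "right": build_tree(right, rng, depth + 1, max_depth),
    }

def path_length(tree, x):
    """Number of splits x passes through before reaching a leaf."""
    depth = 0
    while "dim" in tree:
        tree = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
        depth += 1
    return depth

def mean_path_length(forest, x):
    """Average path length over all trees; shorter means easier to isolate."""
    return sum(path_length(t, x) for t in forest) / len(forest)

# Toy data (hypothetical): known-class instances on an 8 x 8 grid in the unit square.
rng = random.Random(0)
known = [(i / 7, j / 7) for i in range(8) for j in range(8)]
forest = [build_tree(known, rng) for _ in range(50)]

inside = mean_path_length(forest, (0.5, 0.5))   # resembles the known data
far = mean_path_length(forest, (10.0, 10.0))    # candidate emerging new class
```

Here `far` ends up smaller than `inside`, so a threshold on mean path length could flag the distant instance as a candidate for a new class. How such a threshold is chosen, and how leaves would carry class labels for the known classes, is outside this sketch.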
Distribution-Based Trajectory Clustering
Trajectory clustering enables the discovery of common patterns in trajectory
data. Current methods of trajectory clustering rely on a distance measure
between two points in order to measure the dissimilarity between two
trajectories. The distance measures employed have two challenges: high
computational cost and low fidelity. Independent of the distance measure
employed, existing clustering algorithms have another challenge: either
effectiveness issues or high time complexity. In this paper, we propose to use
a recent Isolation Distributional Kernel (IDK) as the main tool to meet all
three challenges. The new IDK-based clustering algorithm, called TIDKC, makes
full use of the distributional kernel for trajectory similarity measuring and
clustering. TIDKC identifies non-linearly separable clusters with irregular
shapes and varied densities in linear time. It does not rely on random
initialisation and is robust to outliers. An extensive evaluation on 7 large
real-world trajectory datasets confirms that IDK is more effective in capturing
complex structures in trajectories than traditional and deep learning-based
distance measures. Furthermore, the proposed TIDKC has superior clustering
performance and efficiency to existing trajectory clustering algorithms.
A Social Referral Mechanism for Job Reference Recommendation
Social media has recently changed our lives, redefined the way we interact with each other, and facilitated communication and influence across different social groups, for example by enhancing the power of social search and appraisal. In this research, we focus on the long-standing but poorly understood process of information exchange, as studied in sociology, and apply it to job seeking. Considering both willingness and influence, we generate a list of suitable reference candidates for a job seeker's desired job, who can provide more job-related information or act as referrals. Integrating knowledge from human resources management, we implement this social referral application with the support of information technologies and strive to enrich social media services, turning passive job searching into active consultation for exclusive job information.
The Impact of Isolation Kernel on Agglomerative Hierarchical Clustering Algorithms
Agglomerative hierarchical clustering (AHC) is one of the popular clustering
approaches. Existing AHC methods, which are based on a distance measure, have
one key issue: they have difficulty identifying adjacent clusters with varied
densities, regardless of the cluster extraction methods applied on the
resultant dendrogram. In this paper, we identify the root cause of this issue
and show that the use of a data-dependent kernel (instead of distance or
existing kernel) provides an effective means to address it. We analyse the
condition under which existing AHC methods fail to extract clusters
effectively; and the reason why the data-dependent kernel is an effective
remedy. This leads to a new approach to kernelise existing hierarchical
clustering algorithms such as traditional AHC algorithms, HDBSCAN, GDL
and PHA. In each of these algorithms, our empirical evaluation shows that a
recently introduced Isolation Kernel produces a higher quality or purer
dendrogram than distance, Gaussian Kernel and adaptive Gaussian Kernel.
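
To make the idea of a data-dependent kernel inside AHC concrete, the following is a minimal sketch, not the paper's implementation. It assumes the nearest-neighbour (Voronoi) partitioning form of Isolation Kernel: the similarity of two points is estimated as the fraction of `t` random partitionings, each built from `psi` sampled points, in which both fall into the same cell. The average-linkage merge loop, the function names, the parameter values, and the toy data are all illustrative assumptions.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def isolation_kernel(points, psi=2, t=200, seed=0):
    """Estimate a data-dependent similarity matrix from random partitionings."""
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for _ in range(t):
        centres = rng.sample(points, psi)
        # Each point belongs to the cell of its nearest sampled centre.
        cells = [min(range(psi), key=lambda c: sq_dist(p, centres[c]))
                 for p in points]
        for i in range(n):
            for j in range(n):
                if cells[i] == cells[j]:
                    counts[i][j] += 1
    return [[c / t for c in row] for row in counts]

def agglomerate(K, n_clusters):
    """Average-linkage AHC that merges the most similar pair of clusters."""
    clusters = [[i] for i in range(len(K))]

    def avg_sim(a, b):
        return sum(K[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = [(a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        a, b = max(pairs, key=lambda ab: avg_sim(clusters[ab[0]], clusters[ab[1]]))
        merged = clusters.pop(b)          # b > a, so index a is unaffected
        clusters[a] = clusters[a] + merged
    return clusters

# Toy data (hypothetical): two small, well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
K = isolation_kernel(points)
clusters = agglomerate(K, n_clusters=2)
```

Because the merge loop works directly on a similarity matrix, swapping in a different kernel only means replacing `isolation_kernel`; the linkage logic stays the same. The nested loops are quadratic in the number of points and are meant for reading, not for large datasets.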