Improved Density Peak Clustering Algorithm Based on Choosing Strategy Automatically for Cut-off Distance and Cluster Centre
The fast-search density peak clustering algorithm requires manual trial and error to determine the cut-off distance and to select the cluster centres. To address this defect, a density peak clustering algorithm based on automatically choosing the cut-off distance and cluster centres (CSA-DP) is proposed. The algorithm determines the cut-off distance from the approximate distance between the maximum-density and minimum-density sample points, and determines the cluster centres from the variation in similarity between the candidate centre points. First, the density of each sample point is computed from its k-nearest neighbours, and the samples are sorted by their distance to the maximum-density point. Then the turning position of the density trend is found, and the cut-off distance is determined from that turning position. Finally, following the density peak clustering algorithm, the data points that may be cluster centres are identified, the similarity between them is compared, and the final cluster centres are determined. Simulation results show that the improved algorithm automatically determines the cut-off distance, selects the centres, and yields more accurate clustering results. The paper closes with an empirical analysis of the stocks of 147 listed biopharmaceutical companies using the improved algorithm, which provides a reliable basis for classifying and evaluating listed companies. The algorithm has a wide range of applicability.
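As a rough illustration of the quantities this abstract refers to, the following sketch computes a k-nearest-neighbour density and the distance to the nearest higher-density point, then picks centres by the product of the two. It is not the CSA-DP procedure: the density formula (inverse mean k-NN distance), the function name `density_peaks`, and the hand-supplied `n_centers` are assumptions made for the example, and the automatic selection of the cut-off distance and of the centres described above is not reproduced.

```python
import math

def density_peaks(points, k=3, n_centers=2):
    """Toy density-peak clustering; n_centers is chosen by hand (an assumption)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Assumed density: inverse of the mean distance to the k nearest neighbours.
    rho = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:k]
        rho.append(1.0 / (sum(nearest) / len(nearest) + 1e-12))

    # Visit points from highest to lowest density.
    order = sorted(range(n), key=lambda i: -rho[i])
    rank = {i: r for r, i in enumerate(order)}

    # delta[i]: distance to the nearest point of higher density (parent[i]).
    delta = [0.0] * n
    parent = [-1] * n
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(dist[i])  # densest point: use its largest distance
            continue
        j = min(order[:pos], key=lambda j: dist[i][j])
        delta[i] = dist[i][j]
        parent[i] = j

    # Candidate centres have both high density and high delta.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], rank[i]))[:n_centers]

    # Every other point inherits the label of its higher-density parent.
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers

# Two small, well-separated groups of points (made-up data).
points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05),
          (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1), (5.05, 5.05)]
labels, centers = density_peaks(points, k=3, n_centers=2)
```

On this toy data the middle point of each group is selected as a centre, and each group receives a single label. Passing `n_centers` by hand corresponds to the manual step that the paper aims to automate.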
A Comparative Study of Clustering Analysis Algorithm
Clustering analysis is widely used in data mining to classify data into categories on the basis of their similarity. Its applications range broadly from pattern recognition to microarrays, multimedia, bibliometrics, bioinformatics, and astronomy. Over the decades, many clustering techniques, such as hierarchical and non-hierarchical algorithms, have been developed. Recently, a clustering algorithm based on fast search of density peaks was published in the journal Science. In this thesis, we perform a comparative study of the performance of the fast search and the existing methods on benchmark data sets from the literature. From computational experiments, we observe that the accuracy of the fast search is somewhat sensitive to the parameter values used to select the cluster centers.
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
1.1. Research Background
1.2. Research Scope
Chapter 2. Clustering Analysis Methods
2.1. Concept of Clustering Analysis
2.2. Hierarchical Clustering
2.2.1. Linkage-Based Clustering Methods
2.2.2. Ward's Method
2.3. Non-hierarchical Clustering
2.3.1. K-means Clustering
2.3.2. K-medoids Clustering
1). PAM (Partitioning Around Medoids)
2). CLARA (Clustering LARge Applications)
3). CLARANS (Clustering Large Applications based on RANdomized Search)
4). K-means-like Algorithm
2.3.3. Fuzzy K-means Clustering (Fuzzy K-means Algorithm)
Chapter 3. Various Clustering Analysis Methods
3.1. DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
3.2. EM Clustering with Multiple Gaussians (Multi-Gaussian with Expectation-Maximization)
3.2.1. Mixture Model
3.2.2. Clustering Analysis Model
3.2.3. EM Clustering with Multiple Gaussians
3.3. Fast Search
3.3.1. Problems with Fast Search
Chapter 4. Computational Experiment Results
4.1. Iris Dataset
4.2. Analysis Results on UC Irvine Data
Chapter 5. Conclusion and Future Work
5.1. Conclusion
5.2. Future Work
References
Appendix
Kernel Density Estimation with Linked Boundary Conditions
Kernel density estimation on a finite interval poses an outstanding challenge
because of the well-recognized bias at the boundaries of the interval.
Motivated by an application in cancer research, we consider a boundary
constraint linking the values of the unknown target density function at the
boundaries. We provide a kernel density estimator (KDE) that successfully
incorporates this linked boundary condition, leading to a non-self-adjoint
diffusion process and expansions in non-separable generalized eigenfunctions.
The solution is rigorously analyzed through an integral representation given by
the unified transform (or Fokas method). The new KDE possesses many desirable
properties, such as consistency, asymptotically negligible bias at the
boundaries, and an increased rate of approximation, as measured by the AMISE.
We apply our method to the motivating example in biology and provide numerical
experiments with synthetic data, including comparisons with state-of-the-art
KDEs (which currently cannot handle linked boundary constraints). Results
suggest that the new method is fast and accurate. Furthermore, we demonstrate
how to build statistical estimators of the boundary conditions satisfied by the
target function without a priori knowledge. Our analysis can also be extended to
more general boundary conditions that may be encountered in applications.
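The boundary bias mentioned at the start of this abstract can be seen in a small sketch. It evaluates a standard Gaussian KDE on evenly spaced points in [0, 1] (a stand-in for a uniform sample) and compares it with the classical reflection correction. This is not the linked-boundary estimator of the paper; the function names, the data, and the bandwidth `h` are assumptions chosen for illustration.

```python
import math

def gaussian_kde(x, data, h):
    """Standard Gaussian kernel density estimate at x with bandwidth h."""
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data)

def reflected_kde(x, data, h, a=0.0, b=1.0):
    """Reflection-corrected estimate on [a, b]: the data are mirrored at both ends."""
    return (gaussian_kde(x, data, h)
            + gaussian_kde(2 * a - x, data, h)
            + gaussian_kde(2 * b - x, data, h))

# Evenly spaced points on [0, 1]; the true density is 1 everywhere on the interval.
n = 200
data = [(i + 0.5) / n for i in range(n)]
h = 0.05

at_boundary = gaussian_kde(0.0, data, h)    # roughly 0.5: half the kernel mass leaks outside
in_interior = gaussian_kde(0.5, data, h)    # roughly 1.0
corrected = reflected_kde(0.0, data, h)     # roughly 1.0 after reflection
```

The uncorrected estimate at the boundary is about half the true density because half of each kernel's mass falls outside the interval. Reflection restores the mass but does not tie the density values at the two boundaries to each other, which is the linked condition the paper addresses.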
Multi-Evidence Filtering and Fusion for Multi-Label Classification, Object Detection and Semantic Segmentation Based on Weakly Supervised Learning
Supervised object detection and semantic segmentation require object or even
pixel level annotations. When there exist image level labels only, it is
challenging for weakly supervised algorithms to achieve accurate predictions.
The accuracy achieved by top weakly supervised algorithms is still
significantly lower than their fully supervised counterparts. In this paper, we
propose a novel weakly supervised curriculum learning pipeline for multi-label
object recognition, detection and semantic segmentation. In this pipeline, we
first obtain intermediate object localization and pixel labeling results for
the training images, and then use such results to train task-specific deep
networks in a fully supervised manner. The entire process consists of four
stages, including object localization in the training images, filtering and
fusing object instances, pixel labeling for the training images, and
task-specific network training. To obtain clean object instances in the
training images, we propose a novel algorithm for filtering, fusing and
classifying object instances collected from multiple solution mechanisms. In
this algorithm, we incorporate both metric learning and density-based
clustering to filter detected object instances. Experiments show that our
weakly supervised pipeline achieves state-of-the-art results in multi-label
image classification as well as weakly supervised object detection and very
competitive results in weakly supervised semantic segmentation on MS-COCO,
PASCAL VOC 2007 and PASCAL VOC 2012.
Comment: accepted by IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 201
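The abstract states only that metric learning and density-based clustering are combined to filter detected object instances. As a loose sketch of the density-based part alone, the function below keeps an instance when enough other instances lie close to it in feature space, in the spirit of a DBSCAN core-point test. The function name `density_filter`, the parameters `eps` and `min_neighbors`, and the toy feature vectors are hypothetical; the paper's actual filtering and fusion algorithm is not reproduced here.

```python
import math

def density_filter(features, eps, min_neighbors):
    """Return indices of vectors with at least min_neighbors other vectors within eps."""
    kept = []
    for i, f in enumerate(features):
        count = sum(1 for j, g in enumerate(features)
                    if j != i and math.dist(f, g) <= eps)
        if count >= min_neighbors:
            kept.append(i)
    return kept

# Hypothetical embeddings: three mutually close instances and one isolated outlier.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0)]
kept = density_filter(features, eps=0.5, min_neighbors=2)
```

Here the isolated vector is dropped and the three close ones are kept. In a pipeline like the one described, the feature vectors would presumably come from a learned metric embedding, which this sketch takes as given.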