24 research outputs found

    Improved Density Peak Clustering Algorithm Based on Choosing Strategy Automatically for Cut-off Distance and Cluster Centre

    The quick-search density peak clustering algorithm requires manual trial and error to determine the cut-off distance and to select the cluster centres. To address this defect, a density peak clustering algorithm based on an automatic choosing strategy for the cut-off distance and cluster centres (CSA-DP) is proposed. The algorithm determines the cut-off distance and the cluster centres from the approximate distance between the maximum-density sample point and the minimum-density sample point, and from the variation in similarity between the points that may be cluster centres. First, the density of each sample point is obtained from its k-nearest-neighbour samples, and the samples are sorted by their distance to the maximum-density point; then the turning position of the density trend is found and the cut-off distance is determined from that position; finally, following the density peak clustering algorithm, the data points that may be cluster centres are found, the similarity between them is compared, and the final cluster centres are determined. The simulation results show that the improved algorithm can automatically determine the cut-off distance and select the centres, and that it produces more accurate clustering results. Finally, the paper applies the improved algorithm in an empirical analysis of the stocks of 147 listed biopharmaceutical companies, which provides a reliable basis for the classification and evaluation of listed companies. The algorithm has a wide range of applicability.
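
The pipeline described in this abstract (k-nearest-neighbour density, distance to the nearest denser point, centre selection, label propagation) can be sketched roughly as follows. This is a minimal illustration of generic density peak clustering, not the CSA-DP method itself: the density formula (inverse mean k-NN distance), the function name `density_peaks`, and the `n_clusters` parameter are assumptions, and the abstract's automatic choice of cut-off distance and centres is not reproduced because its details are not given here.

```python
import numpy as np

def density_peaks(points, k=5, n_clusters=2):
    """Sketch of density peak clustering with a k-NN density estimate."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Assumed density: inverse of the mean distance to the k nearest
    # neighbours (column 0 of each sorted row is the point itself).
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    rho = 1.0 / (knn.mean(axis=1) + 1e-12)
    # Visit points from densest to sparsest.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # The global density peak has no denser neighbour.
            delta[i] = dist[i].max()
            continue
        earlier = order[:rank]
        j = earlier[np.argmin(dist[i, earlier])]
        delta[i] = dist[i, j]   # distance to the nearest denser point
        parent[i] = j
    # Candidate centres combine high density with large separation.
    gamma = rho * delta
    gamma[order[0]] = np.inf    # the global peak is always a centre
    centres = np.argsort(-gamma, kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centres] = np.arange(n_clusters)
    # Every other point inherits the label of its nearest denser point.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centres

# Two well-separated 5x5 grids of points as toy data.
grid = np.array([[x, y] for x in range(5) for y in range(5)], dtype=float) * 0.1
pts = np.vstack([grid, grid + 10.0])
labels, centres = density_peaks(pts, k=4, n_clusters=2)
```

On this toy input each grid ends up with a single label and one centre. Fixing `n_clusters` by hand is exactly the manual step that the abstract says CSA-DP aims to automate.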

    A Comparative Study of Clustering Analysis Algorithm

    Clustering analysis is widely used in data mining to classify data into categories on the basis of their similarity. Its applications range broadly from pattern recognition to microarrays, multimedia, bibliometrics, bioinformatics, and astronomy. Over the decades, many clustering techniques, such as hierarchical and non-hierarchical algorithms, have been developed. Recently, a clustering algorithm based on fast search of density peaks was presented in the journal Science. In this thesis, we perform a comparative study of the performance of the fast search and the existing methods on benchmark data sets from the literature. From computational experiments, we observe that the accuracy of the fast search is somewhat sensitive to the parameter values used to select the cluster centers.
    Contents: Abstract; Table of contents; List of figures; List of tables. Chapter 1. Introduction: 1.1. Research background; 1.2. Research content. Chapter 2. Clustering analysis methods: 2.1. Concept of clustering analysis; 2.2. Hierarchical clustering; 2.2.1. Linkage methods; 2.2.2. Ward's method; 2.3. Non-hierarchical clustering; 2.3.1. K-means clustering; 2.3.2. K-medoids clustering: 1) PAM (Partitioning Around Medoids), 2) CLARA (Clustering LARge Applications), 3) CLARANS (Clustering Large Applications based on RANdomized Search), 4) K-means-like algorithm; 2.3.3. Fuzzy K-means clustering (Fuzzy K-means Algorithm). Chapter 3. Various clustering analysis methods: 3.1. DBSCAN (Density-Based Spatial Clustering of Applications with Noise); 3.2. EM clustering with multiple Gaussians (Multi-Gaussian with Expectation-Maximization); 3.2.1. Mixture model; 3.2.2. Clustering analysis model; 3.2.3. EM clustering with multiple Gaussians; 3.3. Fast Search; 3.3.1. Problems with Fast Search. Chapter 4. Computational experiment results: 4.1. Iris data set; 4.2. Analysis results on UC Irvine data. Chapter 5. Conclusion and future work: 5.1. Conclusion; 5.2. Future work. References. Appendix.

    Kernel Density Estimation with Linked Boundary Conditions

    Kernel density estimation on a finite interval poses an outstanding challenge because of the well-recognized bias at the boundaries of the interval. Motivated by an application in cancer research, we consider a boundary constraint linking the values of the unknown target density function at the boundaries. We provide a kernel density estimator (KDE) that successfully incorporates this linked boundary condition, leading to a non-self-adjoint diffusion process and expansions in non-separable generalized eigenfunctions. The solution is rigorously analyzed through an integral representation given by the unified transform (or Fokas method). The new KDE possesses many desirable properties, such as consistency, asymptotically negligible bias at the boundaries, and an increased rate of approximation, as measured by the AMISE. We apply our method to the motivating example in biology and provide numerical experiments with synthetic data, including comparisons with state-of-the-art KDEs (which currently cannot handle linked boundary constraints). Results suggest that the new method is fast and accurate. Furthermore, we demonstrate how to build statistical estimators of the boundary conditions satisfied by the target function without a priori knowledge. Our analysis can also be extended to more general boundary conditions that may be encountered in applications.
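
The boundary bias mentioned at the start of this abstract can be seen with a plain Gaussian KDE on the interval [0, 1]. The sketch below illustrates only that problem; it does not implement the linked-boundary estimator described in the abstract. The function name `gaussian_kde`, the bandwidth of 0.05, and the evenly spaced stand-in for uniform data are all assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kde(samples, grid, bandwidth):
    """Plain Gaussian KDE with no boundary correction."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardised distance from every grid point to every sample.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    # Average the kernels and rescale by the bandwidth.
    return kernel.mean(axis=1) / bandwidth

# Evenly spaced points stand in for data from the uniform density on [0, 1],
# whose true value is 1 everywhere on the interval.
samples = (np.arange(2000) + 0.5) / 2000
estimate = gaussian_kde(samples, np.array([0.0, 0.5, 1.0]), bandwidth=0.05)
```

In the interior (x = 0.5) the estimate is close to the true value 1, while at both endpoints it is close to 0.5, because half of each kernel's mass falls outside the interval. This is the kind of boundary bias that boundary-aware estimators are designed to remove.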

    Multi-Evidence Filtering and Fusion for Multi-Label Classification, Object Detection and Semantic Segmentation Based on Weakly Supervised Learning

    Supervised object detection and semantic segmentation require object-level or even pixel-level annotations. When only image-level labels exist, it is challenging for weakly supervised algorithms to achieve accurate predictions. The accuracy achieved by top weakly supervised algorithms is still significantly lower than that of their fully supervised counterparts. In this paper, we propose a novel weakly supervised curriculum learning pipeline for multi-label object recognition, detection and semantic segmentation. In this pipeline, we first obtain intermediate object localization and pixel labeling results for the training images, and then use such results to train task-specific deep networks in a fully supervised manner. The entire process consists of four stages: object localization in the training images, filtering and fusing object instances, pixel labeling for the training images, and task-specific network training. To obtain clean object instances in the training images, we propose a novel algorithm for filtering, fusing and classifying object instances collected from multiple solution mechanisms. In this algorithm, we incorporate both metric learning and density-based clustering to filter detected object instances. Experiments show that our weakly supervised pipeline achieves state-of-the-art results in multi-label image classification as well as weakly supervised object detection, and very competitive results in weakly supervised semantic segmentation on MS-COCO, PASCAL VOC 2007 and PASCAL VOC 2012. Comment: accepted by IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 201