11 research outputs found

    Multidimensional Balance-Based Cluster Boundary Detection for High-Dimensional Data

    © 2018 IEEE. The balance of the neighborhood space around a central point is an important concept in cluster analysis. It can be used to effectively detect cluster boundary objects. Existing neighborhood analysis methods focus on the distribution of data, i.e., they analyze the characteristics of the neighborhood space from a single perspective and cannot obtain rich data characteristics. In this paper, we analyze the high-dimensional neighborhood space from multiple perspectives. By simulating each dimension of a data point's k-nearest-neighbor (kNN) space as a lever, we apply the lever principle to compute the balance fulcrum of each dimension after proving its inevitability and uniqueness. Then, we model the distance between the projected coordinate of the data point and the balance fulcrum on each dimension and construct the DHBlan coefficient to measure the balance of the neighborhood space. Based on this theoretical model, we propose a simple yet effective cluster boundary detection algorithm called Lever. Experiments on both low- and high-dimensional data sets validate the effectiveness and efficiency of our proposed algorithm.

    An Illumination Invariant Accurate Face Recognition with Down Scaling of DCT Coefficients

    In this paper, a novel approach for illumination normalization under varying lighting conditions is presented. Our approach utilizes the fact that the low-frequency discrete cosine transform (DCT) coefficients correspond to illumination variations in a digital image. Under varying illumination, the captured images may have low contrast, so we first apply histogram equalization to them for contrast stretching. Then the low-frequency DCT coefficients are scaled down to compensate for the illumination variations. The value of the scaling-down factor and the number of low-frequency DCT coefficients to be rescaled are obtained experimentally. Classification is done using k-nearest neighbor classification and nearest mean classification on the images obtained by applying the inverse DCT to the processed coefficients. The correlation coefficient and the Euclidean distance obtained using principal component analysis are used as distance metrics in classification. We have tested our face recognition method using the Yale face database B. The results show that our method performs without any error (100% face recognition performance) even under the most extreme illumination variations. There are different schemes in the literature for illumination normalization under varying lighting conditions, but none is claimed to give a 100% recognition rate under all illumination variations for this database. The proposed technique is computationally efficient and can easily be implemented for a real-time face recognition system.

    Voronoi classified and clustered constellation data structure for three-dimensional urban buildings

    In the past few years, urban areas have grown rapidly, producing an immense number of urban datasets. This situation makes issues related to urban areas harder to handle and manage. Huge datasets can degrade the performance of data retrieval and information analysis. In addition, urban environments are very difficult to manage because they involve various types of data, such as multiple types of zoning themes in urban mixed-use development. Thus, a special technique for efficient data handling and management is necessary. In this study, a new three-dimensional (3D) spatial access method, the Voronoi Classified and Clustered Data Constellation (VOR-CCDC), is introduced. The VOR-CCDC data structure operates on the basis of two filters, classification and clustering. To boost the performance of data retrieval, VOR-CCDC offers a minimal percentage of overlap among nodes and a minimal coverage area in order to avoid repetitive data entry and multi-path queries. In addition, the VOR-CCDC data structure is supplemented with an extra element of nearest neighbour information. Neighbouring information encoded in the Voronoi diagram allows VOR-CCDC to explore the data optimally. Three types of nearest neighbour queries are presented in this study to verify VOR-CCDC’s ability to find nearest neighbour information: the Single Search Nearest Neighbour query, the k Nearest Neighbour (kNN) query and the Reverse k Nearest Neighbour (RkNN) query. Each query is tested with two types of 3D datasets: single layer and multi-layer. The tests demonstrate that VOR-CCDC performs less input/output than its best competitor, the 3D R-Tree. VOR-CCDC is also tested for performance evaluation. The results indicate that VOR-CCDC outperforms its competitor by responding 60 to 80 percent faster to the query operation.
In the future, the VOR-CCDC structure is expected to be extended to temporal and dynamic objects. The VOR-CCDC structure can also be used in other applications, such as brain cell databases for analysing the spatial arrangement of neurons, or for analysing the protein chain reaction in bioinformatics applications.

    Scalable Query Processing on Spatial Networks

    Spatial networks (e.g., road networks) are general graphs with spatial information (e.g., latitude/longitude) associated with the vertices and/or the edges of the graph. Techniques are presented for query processing on spatial networks that are based on the observed coherence between the spatial positions of the vertices and the shortest paths between them. This facilitates aggregation of the vertices into coherent regions that share vertices on the shortest paths between them. Using this observation, a framework, termed SILC, is introduced that precomputes and compactly encodes the N^2 shortest paths and network distances between every pair of vertices on a spatial network containing N vertices. The compactness of the shortest paths from a source vertex V is achieved by partitioning the destination vertices into subsets based on the identity of the first edge to them from V. The spatial coherence of these subsets is captured by using a quadtree representation whose dimension-reducing property enables the storage requirements of each subset to be reduced to be proportional to the perimeter of the spatially coherent regions, instead of to the number of vertices in the spatial network. In particular, experiments on a number of large road networks as well as a theoretical analysis have shown that the total storage for the shortest paths has been reduced from O(N^3) to O(N^1.5). In addition to SILC, another framework, termed PCP, is proposed that also takes advantage of the spatial coherence of the source vertices and makes use of the Well Separated Pair decomposition to further reduce the storage, under suitably defined conditions, to O(N). Using these frameworks, scalable algorithms are presented to implement a wide variety of operations such as nearest neighbor finding and distance joins on large datasets of locations residing on a spatial network.
These frameworks essentially decouple the process of computing shortest paths from that of spatial query processing, and also decouple the domain of the participating objects from the domain of the vertices of the spatial network. This means that as long as the spatial network is unchanged, the algorithm and the underlying representation of the shortest paths in the spatial network can be used with different sets of objects.
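The partitioning step described in this abstract, grouping destination vertices by the first edge on the shortest path from a source vertex V, can be illustrated with a small sketch. This is only an illustration under assumptions, not the SILC implementation: the function name `first_hop_partition`, the adjacency-list format, and the use of Dijkstra's algorithm with non-negative edge weights are all choices made here, and the quadtree encoding of the resulting subsets is not shown.

```python
import heapq

def first_hop_partition(graph, source):
    """Group reachable vertices by the first vertex after `source` on a shortest path.

    `graph` maps each vertex to a list of (neighbor, weight) pairs with
    non-negative weights (an assumed input format, not taken from the paper).
    """
    dist = {source: 0.0}
    first_hop = {}
    heap = [(0.0, source, None)]
    while heap:
        d, u, hop = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale entry: a shorter path to u was already found
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                # A direct neighbor of the source starts its own subset;
                # every other vertex inherits the first hop of its predecessor.
                first_hop[v] = v if u == source else hop
                heapq.heappush(heap, (nd, v, first_hop[v]))
    groups = {}
    for v, hop in first_hop.items():
        groups.setdefault(hop, []).append(v)
    return dist, groups
```

For a source with a handful of outgoing edges, `groups` has at most that many subsets regardless of how many vertices the network contains; a representation such as the one the abstract describes would then store each subset as a set of spatial regions rather than as an explicit vertex list.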

    Exquisitor: Interactive Learning for Multimedia


    Unknown Clutter Estimation by FMM Approach in Multitarget Tracking Algorithm

    The finite mixture model (FMM) approach is a research focus in the multitarget tracking field. Previously, clutter was treated as uniformly distributed. To address the severe bias caused by unknown and complex clutter, a multitarget tracking algorithm based on clutter model estimation is put forward in this paper. The multitarget likelihood function is established with an FMM. Within this framework, both the expectation maximization (EM) and Markov Chain Monte Carlo (MCMC) algorithms are used for FMM parameter estimation. Furthermore, the target number and the multitarget states can be estimated precisely after the clutter model is fitted. Association between targets and measurements can be avoided. Simulations show that the proposed algorithm performs well in dealing with unknown and complex clutter.

    Multi-Dimensional Joins

    We present three novel algorithms for performing multi-dimensional joins and an in-depth survey and analysis of a low-dimensional spatial join. The first algorithm, the Iterative Spatial Join, performs a spatial join on low-dimensional data and is based on a plane-sweep technique. As we show analytically and experimentally, the Iterative Spatial Join performs well when internal memory is limited, compared to competing methods. This suggests that the Iterative Spatial Join would be useful for very large data sets or in situations where internal memory is a shared resource and is therefore limited, such as with today's database engines, which share internal memory amongst several queries. Furthermore, the performance of the Iterative Spatial Join is predictable and has no parameters that need to be tuned, unlike other algorithms. The second algorithm, the Quickjoin algorithm, performs a higher-dimensional similarity join in which pairs of objects that lie within a certain distance epsilon of each other are reported. The Quickjoin algorithm overcomes drawbacks of competing methods, such as requiring embedding methods on the data first or using multi-dimensional indices, which limit the ability to discriminate between objects in each dimension, thereby degrading performance. A formal analysis is provided of the Quickjoin method, and experiments show that the Quickjoin method significantly outperforms competing methods. The third algorithm adapts incremental join techniques to improve the speed of calculating the Hausdorff distance, which is used in applications such as image matching, image analysis, and surface approximations. The nearest neighbor incremental join technique for indices that are based on hierarchical containment uses a priority queue of index node pairs and bounds on the distance values between pairs, both of which need to be modified in order to calculate the Hausdorff distance.
Results of experiments are described that confirm the performance improvement. Finally, a survey is provided which, instead of just summarizing the literature and presenting each technique in its entirety, describes distinct components of the different techniques, and each technique is decomposed into an overall framework for performing a spatial join.
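For readers unfamiliar with the Hausdorff distance mentioned in this abstract, the following is a minimal brute-force sketch of its standard definition for finite point sets. It is not the incremental-join algorithm the abstract describes; it is the quadratic-time baseline that such an algorithm would aim to improve on, and the function names are chosen here for illustration.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

The directed distance is not symmetric, which is why both directions are evaluated: with `a = [(0, 0), (1, 0)]` and `b = [(0, 0), (4, 0)]`, every point of `a` is within 1 of `b`, but the point `(4, 0)` in `b` is 3 away from its nearest point in `a`.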

    K-Nearest Neighbor Finding Using MaxNearestDist (to appear in PAMI)

    Similarity searching often reduces to finding the k nearest neighbors to a query object. Finding the k nearest neighbors is achieved by applying either a depth-first or a best-first algorithm to the search hierarchy containing the data. These algorithms are generally applicable to any index based on hierarchical clustering. The idea is that the data is partitioned into clusters which are aggregated to form other clusters, with the total aggregation being represented as a tree. These algorithms have traditionally used a lower bound corresponding to the minimum distance at which a nearest neighbor can be found (termed MINDIST) to prune the search process by avoiding the processing of some of the clusters as well as individual objects when they can be shown to be farther from the query object q than all of the current k nearest neighbors of q. An alternative pruning technique that uses an upper bound corresponding to the maximum possible distance at which a nearest neighbor is guaranteed to be found (termed MAXNEARESTDIST) is described. The MAXNEARESTDIST upper bound is adapted to enable its use for finding the k nearest neighbors instead of just the nearest neighbor (i.e., k = 1) as in its previous uses. Both the depth-first and best-first k-nearest neighbor algorithms are modified to use MAXNEARESTDIST, which is shown to enhance both algorithms by overcoming their shortcomings. In particular, for the depth-first algorithm, the number of clusters in the search hierarchy that must be examined is not increased, thereby potentially lowering its execution time, while for the best-first algorithm, the number of clusters in the search hierarchy that must be retained in the priority queue used to control the ordering of processing of the clusters is also not increased, thereby potentially lowering its storage requirements. Index Terms: k-nearest neighbors; similarity searching; metric spaces; depth-first nearest neighbor finding; best-first nearest neighbor finding.
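The traditional MINDIST pruning that this abstract takes as its starting point can be sketched as a depth-first search over a cluster tree. The sketch below is an assumption-laden illustration, not the paper's algorithm: it models each cluster as an axis-aligned bounding box, the `Node`, `mindist`, and `knn` names are hypothetical, and the MAXNEARESTDIST upper bound that the paper contributes is not implemented.

```python
import heapq
import math

class Node:
    """A cluster holding points (leaf) or child clusters, with a bounding box."""
    def __init__(self, points=None, children=None):
        self.points = points or []
        self.children = children or []
        corners = self.points + [c for ch in self.children for c in (ch.lo, ch.hi)]
        dims = range(len(corners[0]))
        self.lo = tuple(min(p[i] for p in corners) for i in dims)
        self.hi = tuple(max(p[i] for p in corners) for i in dims)

def mindist(q, node):
    """Lower bound on the distance from q to any point inside node's box."""
    return math.sqrt(sum(max(lo - x, 0.0, x - hi) ** 2
                         for x, lo, hi in zip(q, node.lo, node.hi)))

def knn(root, q, k):
    """Depth-first k-nearest-neighbor search with MINDIST pruning."""
    best = []  # max-heap of the k closest points so far, stored as (-distance, point)

    def visit(node):
        for p in node.points:
            d = math.dist(q, p)
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))
        # Visit the most promising clusters first so the k-th distance shrinks early.
        for child in sorted(node.children, key=lambda c: mindist(q, c)):
            if len(best) == k and mindist(q, child) >= -best[0][0]:
                break  # this child and all later ones cannot improve the result
            visit(child)

    visit(root)
    return sorted((-neg_d, p) for neg_d, p in best)
```

A cluster is skipped only when k candidates are already held and its lower bound is no better than the current k-th distance, which is the pruning rule the abstract attributes to MINDIST.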