    Efficient Nearest Neighbor Classification Using a Cascade of Approximate Similarity Measures

    Nearest neighbor classification using shape context can yield highly accurate results in a number of recognition problems. Unfortunately, the approach can be too slow for practical applications, and thus approximation strategies are needed to make shape context practical. This paper proposes a method for efficient and accurate nearest neighbor classification in non-Euclidean spaces, such as the space induced by the shape context measure. First, a method is introduced for constructing a Euclidean embedding that is optimized for nearest neighbor classification accuracy. Using that embedding, multiple approximations of the underlying non-Euclidean similarity measure are obtained, at different levels of accuracy and efficiency. The approximations are automatically combined to form a cascade classifier, which applies the slower approximations only to the hardest cases. Unlike typical cascade-of-classifiers approaches, which are applied to binary classification problems, our method constructs a cascade for a multiclass problem. Experiments with a standard shape data set indicate that a speedup of two to three orders of magnitude is gained over the standard shape context classifier, with minimal losses in classification accuracy.
    Funding: National Science Foundation (IIS-0308213, IIS-0329009, EIA-0202067); Office of Naval Research (N00014-03-1-0108).
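
    The cascade idea in this abstract can be illustrated with a minimal sketch. This is not the paper's construction: the acceptance rule (stop as soon as the k nearest neighbors under a cheap measure all share one label), the function names, and the toy distances are assumptions chosen only to show how slower measures are reserved for hard queries.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]
Labeled = Tuple[Vector, str]


def k_nearest_labels(query: Vector, train: List[Labeled],
                     dist: Distance, k: int) -> List[str]:
    """Labels of the k training items closest to the query under dist."""
    ranked = sorted(train, key=lambda item: dist(query, item[0]))
    return [label for _, label in ranked[:k]]


def cascade_classify(query: Vector, train: List[Labeled],
                     stages: List[Tuple[Distance, int]],
                     exact: Distance) -> Tuple[str, int]:
    """Return (label, index of the stage that decided).

    Each stage is (cheap_distance, k). A stage accepts the query only if
    its k nearest neighbors agree on the label (an assumed rule);
    otherwise the query falls through to the next, slower stage.
    """
    for level, (dist, k) in enumerate(stages):
        labels = k_nearest_labels(query, train, dist, k)
        if len(set(labels)) == 1:
            return labels[0], level
    # Hardest cases: 1-NN under the exact (expensive) measure.
    return k_nearest_labels(query, train, exact, 1)[0], len(stages)


def first_coordinate_distance(a: Vector, b: Vector) -> float:
    """Toy cheap approximation: compare only the first coordinate."""
    return abs(a[0] - b[0])


def euclidean(a: Vector, b: Vector) -> float:
    """Toy stand-in for the expensive exact measure."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((5.0, 5.0), "b"),
         ((5.1, 4.9), "b"), ((2.5, 0.0), "a"), ((2.5, 5.0), "b")]
stages = [(first_coordinate_distance, 2)]

easy = cascade_classify((0.05, 0.1), train, stages, euclidean)
hard = cascade_classify((2.5, 4.8), train, stages, euclidean)
```

    In this toy run the first query is settled by the cheap stage, while the second has neighbors of both classes under the cheap measure and is passed on to the exact distance.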

    DTW-Global Constraint Learning Using Tabu Search Algorithm

    Many methods have been proposed to measure the similarity between time series data sets, each with advantages and weaknesses. It is important to choose the most appropriate similarity measure for the intended application domain and the data considered. The performance of machine learning algorithms depends on the metric used to compare two objects. For time series, Dynamic Time Warping (DTW) is the most appropriate distance measure. Many variants of DTW intended to accelerate the calculation of this distance have been proposed. Distance learning is an already well-studied subject. Indeed, data mining tools such as k-means clustering and k-nearest neighbor classification require a similarity/distance measure, and this measure must be adapted to the application domain. For this reason, it is important to develop effective computation methods and algorithms that can be applied to large data sets and that integrate the constraints of the specific field of study. In this paper, a new hybrid approach to learning a global constraint of the DTW distance is proposed. The approach is based on Large Margin Nearest Neighbors classification and the Tabu Search algorithm. Experiments show the effectiveness of this approach in improving time series classification results.
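
    To make "global constraint" concrete, here is a minimal sketch of DTW restricted to a band around the diagonal (a Sakoe-Chiba style window). The choice of a fixed-width band and the `window` parameter name are assumptions; the abstract does not say which constraint shape is learned, and the learning step itself (Large Margin Nearest Neighbors plus Tabu Search) is not shown.

```python
import math
from typing import Optional, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float],
                 window: Optional[int] = None) -> float:
    """DTW cost between a and b with absolute difference as local cost.

    window limits how far the warping path may stray from the diagonal:
    cell (i, j) is allowed only if |i - j| <= window. None means no
    constraint.
    """
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide, or no path reaches (n, m).
    w = max(n, m) if window is None else max(window, abs(n - m))

    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # step in a only
                                     cost[i][j - 1],      # step in b only
                                     cost[i - 1][j - 1])  # step in both
    return cost[n][m]


a = [0, 0, 1, 2, 1, 0]
b = [0, 1, 2, 1, 0, 0]

unconstrained = dtw_distance(a, b)        # free warping absorbs the shift
narrow_band = dtw_distance(a, b, window=1)
diagonal_only = dtw_distance(a, b, window=0)
```

    A narrower band costs less to compute and forbids extreme warping; the window width is the kind of parameter a search procedure could tune against classification accuracy.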

    Segmenting Motion Capture Data Using a Qualitative Analysis

    Many interactive 3D games utilize motion capture for both character animation and user input. These applications require short, meaningful sequences of data. Manually producing these segments of motion capture data is a laborious, time-consuming process that is impractical for real-time applications. We present a method to automatically produce semantic segmentations of general motion capture data by examining the qualitative properties that are intrinsic to all motions, using Laban Movement Analysis (LMA). LMA provides a good compromise between high-level semantic features, which are difficult to extract for general motions, and low-level kinematic features, which often yield unsophisticated segmentations. Our method finds motion sequences that exhibit high output similarity from a collection of neural networks trained with temporal variance. We show that segmentations produced using LMA features are more similar to manual segmentations, at both the frame and the segment level, than those of several other automatic segmentation methods.

    Similarity Discriminant Analysis


    Clustering-Based Pre-Processing Approaches To Improve Similarity Join Techniques

    Research on similarity join techniques is a growing practical area of study, especially with the increasing electronic availability of vast amounts of digital data from more and more source systems. This research focuses on clustering-based pre-processing techniques to improve existing similarity join approaches. Identifying and extracting the same real-world entities from different data sources is still a big challenge and a significant task in the digital information era. Dissimilar extracts may represent the same real-world entity because of inconsistent values and naming conventions, incorrect or missing data values, or incomplete information. Discovering efficient and accurate approaches to determine the similarity of data objects or values is therefore of theoretical as well as practical significance. Semantic problems arise even around the concept of similarity itself, regarding its usage and foundation. Existing similarity join approaches often take a very specific view of similarity measures and rely on pre-defined predicates that represent a narrow focus on the context of similarity for a given scenario. The predicates have been assumed to be a group of clustering-related [MSW 72] attributes on the join. Identifying those entities for data integration purposes requires a broader view of similarity; for instance, a number of generic similarity measures are useful in a given data integration system. This study focused on string similarity joins, namely those based on the Levenshtein (edit) distance and q-grams. It proposes effective and efficient clustering-based pre-processing techniques that identify clustering-related predicates, based on either attribute values or data values, to improve existing similarity join techniques in enterprise data integration scenarios.
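
    For readers unfamiliar with the two string measures named in this abstract, the following is a minimal sketch of each. The function names, the padding characters `#` and `$`, and the default q = 2 are illustrative assumptions; the clustering-based pre-processing that the study proposes is not shown.

```python
from collections import Counter


def levenshtein(s: str, t: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn s into t."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # delete cs
                            curr[j - 1] + 1,            # insert ct
                            prev[j - 1] + (cs != ct)))  # substitute or match
        prev = curr
    return prev[-1]


def qgrams(s: str, q: int = 2) -> Counter:
    """Multiset of overlapping length-q substrings of the padded string."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def qgram_overlap(s: str, t: str, q: int = 2) -> int:
    """Number of q-grams the two strings share, counting multiplicity."""
    return sum((qgrams(s, q) & qgrams(t, q)).values())


edit = levenshtein("kitten", "sitting")
shared = qgram_overlap("abc", "abd")
```

    Counting shared q-grams is much cheaper than computing an edit distance, which is why q-gram overlap is commonly used to discard unlikely pairs before the edit distance is evaluated.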

    Pseudometrics for Nearest Neighbor Classification of Time Series Data

    We propose that a pseudometric, a subadditive distance measure, has sufficient properties to be a good structure for nearest neighbor pattern classification. There exist theoretical results that asymptotically guarantee the classification accuracy of k-nearest neighbor as the sample size grows. These results hold under the assumption that the distance measure is a metric, and they still hold for pseudometrics up to some technicality. Whether the results are valid for non-subadditive distance measures is still unanswered. Pseudometrics are also practically appealing: a subadditive distance measure has at least one significant advantage over a non-subadditive one, in that it can be plugged directly into systems that exploit subadditivity to perform faster nearest neighbor search. This work focuses on pseudometrics for time series. We propose two frameworks for studying and designing subadditive distance measures, along with a few examples of distance measures resulting from the frameworks. One framework is more general than the other and can be used to tailor distances from the other framework to gain better classification performance. Experimental results comparing nearest neighbor classification under the designed pseudometrics with well-known existing distance measures, including Dynamic Time Warping, showed that the designed distance measures are practical for time series classification.
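
    The practical advantage of subadditivity can be sketched as follows. Given a symmetric distance d that satisfies the triangle inequality, |d(q, p) - d(p, x)| is a lower bound on d(q, x) for any reference point p, so candidates whose bound already exceeds the best distance found can be skipped. The single-pivot scheme, the function name, and the toy one-dimensional data are assumptions for illustration, not the frameworks proposed in the paper.

```python
from typing import Callable, List, Optional, Tuple

Dist = Callable[[float, float], float]


def nearest_neighbor(query: float, points: List[float], dist: Dist,
                     pivot: float) -> Tuple[Optional[float], float, int]:
    """Return (nearest point, its distance, number of query-to-point
    distance evaluations). Assumes dist is symmetric and subadditive."""
    # In practice these pivot distances would be precomputed offline.
    to_pivot = [dist(pivot, x) for x in points]
    d_qp = dist(query, pivot)

    best, best_d, evaluated = None, float("inf"), 0
    for x, d_px in zip(points, to_pivot):
        # Triangle inequality: d(query, x) >= |d(query, pivot) - d(pivot, x)|
        if abs(d_qp - d_px) >= best_d:
            continue  # x cannot beat the current best; skip the evaluation
        evaluated += 1
        d = dist(query, x)
        if d < best_d:
            best, best_d = x, d
    return best, best_d, evaluated


def absolute_difference(a: float, b: float) -> float:
    """Toy metric on the real line."""
    return abs(a - b)


points = [1.0, 2.0, 10.0, 20.0, 30.0]
best, best_d, evaluated = nearest_neighbor(1.2, points,
                                           absolute_difference, pivot=0.0)
brute_force = min(points, key=lambda x: absolute_difference(1.2, x))
```

    Here only one of the five candidates needs an actual distance evaluation. For a non-subadditive measure the lower bound is not guaranteed to hold, so this pruning could discard the true nearest neighbor.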