    k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)

    Perhaps the most straightforward classifier in the arsenal of machine learning techniques is the Nearest Neighbour Classifier -- classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because its historically poor run-time performance is much less of a problem with the computational power now available. This paper presents an overview of techniques for Nearest Neighbour classification, focusing on: mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours, and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time-series, retrieval speed-up, and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods. Comment: 22 pages, 15 figures; an updated edition of an older tutorial on kNN
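The nearest-neighbour scheme this abstract describes can be sketched in a few lines of Python. This is a minimal illustration under Euclidean distance and majority voting, not the code from the paper's Appendix:

```python
import numpy as np

def knn_predict(X_train, y_train, query, k=3):
    """Classify `query` by a majority vote among its k nearest
    training examples under Euclidean distance."""
    dists = np.linalg.norm(X_train - query, axis=1)  # distance to every stored example
    nearest = np.argsort(dists)[:k]                  # indices of the k closest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                 # majority class wins

# Two well-separated clusters; a query near the second cluster gets its label.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([4.8, 5.2])))  # -> 1
```

Because the classifier stores the whole training set and scans it at query time, run-time cost grows with the data -- exactly the computational issue the survey's speed-up sections address.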

    DTW-Global Constraint Learning Using Tabu Search Algorithm

    Many methods have been proposed to measure the similarity between time series data sets, each with advantages and weaknesses. It is important to choose the most appropriate similarity measure for the intended application domain and the data considered. The performance of machine learning algorithms depends on the metric used to compare two objects. For time series, Dynamic Time Warping (DTW) is the most widely used and appropriate distance measure, and many variants of DTW intended to accelerate the calculation of this distance have been proposed. Distance learning is already a well-studied subject: data mining tools such as the k-Means clustering algorithm and k-Nearest Neighbour classification require a similarity/distance measure, and this measure must be adapted to the application domain. For this reason, it is important to develop effective computation methods and algorithms that can be applied to large data sets while integrating the constraints of the specific field of study. In this paper, a new hybrid approach for learning a global constraint of the DTW distance is proposed. The approach is based on Large Margin Nearest Neighbours classification and the Tabu Search algorithm. Experiments show the effectiveness of this approach in improving time series classification results.
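A global constraint restricts how far the DTW alignment path may stray from the diagonal; it is this band width that the paper proposes to learn. The sketch below shows plain DTW under a fixed Sakoe-Chiba band of half-width `w` (the band shape and width here are illustrative assumptions, not the learned constraint from the paper):

```python
import numpy as np

def dtw_band(s, t, w):
    """DTW distance between 1-D series s and t, with the alignment
    path restricted to a Sakoe-Chiba band of half-width w."""
    n, m = len(s), len(t)
    w = max(w, abs(n - m))                  # band must reach the final cell
    D = np.full((n + 1, m + 1), np.inf)     # cumulative-cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (s[i - 1] - t[j - 1]) ** 2
            # extend the cheapest of the three admissible predecessors
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])

s = np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0])
t = np.array([0.0, 1.0, 2.0, 1.0, 0.0, 0.0])
print(dtw_band(s, t, 1))  # -> 0.0 (the shift-by-one alignment fits in the band)
```

A narrow band both speeds up the computation (fewer cells per row) and can improve classification accuracy by forbidding pathological warpings, which is why learning the constraint, rather than fixing it by hand, is attractive.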

    Learning feature extraction for learning from audio data

    Today, large collections of digital music are available. These audio data are time series which need to be indexed and classified for diverse applications. Indexing and classification differ from time series analysis in that they generalise over several series, whereas time series analysis handles just one series at a time. The classification of audio data cannot use similarity measures defined on the raw data, e.g. using time warping, or generalise the shape of the series. Appropriate similarity or generalisation for audio data requires feature extraction before classification can successfully be applied to the transformed data. Methods for extracting features that allow audio data to be classified have been developed. However, developing appropriate feature extraction methods is a tedious effort, particularly because every new classification task requires the feature set to be tailored anew. Hence, we consider the construction of feature extraction methods from elementary operators to be itself a first learning step, and we use a genetic programming approach. After the feature extraction, a second process learns a classifier from the transformed data. The practical use of the methods is shown by two types of experiments: classification of genres and classification according to user preferences.
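The core idea above is that a feature-extraction method is itself a program composed from elementary operators, and such programs can be searched over. The sketch below shows only the representation step: sampling a candidate feature map from a small operator pool. The operator set and the random sampling are illustrative assumptions; the paper evolves such programs with genetic programming rather than sampling them:

```python
import random
import numpy as np

# Elementary operators from which feature extractors are composed
# (illustrative choices, not the paper's operator set). Each maps a
# batch of raw series, shape (n_series, length), to one feature per series.
OPS = {
    "mean":   lambda x: np.mean(x, axis=1),
    "std":    lambda x: np.std(x, axis=1),
    "diff":   lambda x: np.mean(np.abs(np.diff(x, axis=1)), axis=1),
    "energy": lambda x: np.mean(x ** 2, axis=1),
}

def random_feature_map(n_features, rng):
    """Sample one candidate feature-extraction method as a set of
    operators; GP would evolve a population of such candidates,
    scoring each by the accuracy of a classifier trained on its output."""
    names = rng.sample(list(OPS), k=n_features)
    return names, lambda X: np.column_stack([OPS[n](X) for n in names])

rng = random.Random(0)
names, extract = random_feature_map(2, rng)
X = np.array([[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]])
features = extract(X)  # shape (2, 2): one row of features per series
```

The second learning step in the paper, training a classifier on `features`, then supplies the fitness signal that drives the search over feature maps.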

    Free congruence: an exploration of expanded similarity measures for time series data

    Time series similarity measures are highly relevant in a wide range of emerging applications, including training machine learning models, classification, and predictive modeling. Standard similarity measures for time series most often involve point-to-point distance measures, including Euclidean distance and Dynamic Time Warping. Such similarity measures fundamentally require the fluctuation of values in the time series being compared to follow a corresponding order or cadence for similarity to be established. This paper is spurred by the exploration of a broader definition of similarity, namely one that takes into account the sheer numerical resemblance between sets of statistical properties of time series segments, irrespective of value labeling. Further, the presence of common pattern components between time series segments was examined even if they occur in a permuted order, which would not necessarily satisfy the criteria of more conventional point-to-point distance measures. Results were compared with those of Dynamic Time Warping on the same data for context. Surprisingly, the test for numerical resemblance between sets of statistical properties established a stronger resemblance for pairings of decline years, with greater statistical significance than Dynamic Time Warping on the particular data and sample size used.
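The contrast with point-to-point measures can be made concrete: a similarity based only on summary statistics of a segment is unchanged under any permutation of the segment's values, whereas Euclidean distance or DTW is not. The profile below (mean, standard deviation, min, max, median) is a hypothetical choice of statistical properties for illustration, not the paper's actual feature set:

```python
import numpy as np

def stat_profile(segment):
    """Order-insensitive summary statistics of a time-series segment
    (an illustrative profile, not the paper's property set)."""
    seg = np.asarray(segment, dtype=float)
    return np.array([seg.mean(), seg.std(), seg.min(), seg.max(), np.median(seg)])

def stat_similarity(a, b):
    """Distance between statistical profiles: a permuted copy of a
    segment scores as identical, unlike point-to-point measures."""
    return float(np.linalg.norm(stat_profile(a) - stat_profile(b)))

print(stat_similarity([1, 2, 3, 4], [4, 3, 2, 1]))  # -> 0.0, though the
# Euclidean point-to-point distance between these two segments is nonzero
```

This is exactly the kind of pairing, same pattern components in permuted order, that a point-to-point measure would penalise but a statistics-based measure would not.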

    SIMILARITY-BASED MULTI-SOURCE TRANSFER LEARNING APPROACH FOR TIME SERIES CLASSIFICATION

    This study aims to develop an effective method of classifying time series signals for machine state prediction to advance predictive maintenance (PdM). Conventional machine learning (ML) algorithms are widely adopted in PdM; however, most existing methods assume that the training (source) and testing (target) data follow the same distribution, and that labeled data are available in both source and target domains. For real-world PdM applications, the heterogeneity in machine original equipment manufacturers (OEMs), operating conditions, facility environment, and maintenance records collectively leads to heterogeneous distributions of data collected from different machines. This significantly limits the performance of conventional ML algorithms in PdM. Moreover, labeling data is generally costly and time-consuming. Finally, industrial processes incorporate complex conditions, and unpredictable breakdown modes lead to extreme complexity for PdM. In this study, a similarity-based multi-source transfer learning (SiMuS-TL) approach is proposed for real-time classification of time series signals. A new domain, called the "mixed domain," is established to model the hidden similarities among the multiple sources and the target. The proposed SiMuS-TL model mainly includes three key steps: 1) learning group-based feature patterns, 2) developing group-based pre-trained models, and 3) weight transferring. The proposed SiMuS-TL model is validated by monitoring the state of rotating machinery using a dataset collected on the Skill boss manufacturing system and the publicly available standard bearing datasets from Case Western Reserve University (CWRU) and Paderborn University (PU). The results of the performance comparison demonstrate that the proposed SiMuS-TL method outperformed conventional Support Vector Machine (SVM), Artificial Neural Network (ANN), and transfer learning with neural networks (TLNN) methods that lack similarity-based transfer learning.
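One building block of any similarity-based multi-source scheme is deciding how much each source domain should contribute to the target. The sketch below shows a generic inverse-distance weighting of source domains by the closeness of their feature distributions to the target's; this is a hypothetical illustration of the general idea, not the SiMuS-TL algorithm's actual weight-transfer step:

```python
import numpy as np

def similarity_weights(source_feats, target_feats):
    """Weight each source domain by how close its mean feature vector
    lies to the target's (inverse-distance weights, normalised to 1).
    A generic sketch of similarity-weighted transfer, not SiMuS-TL."""
    t = np.mean(target_feats, axis=0)                      # target centroid
    d = np.array([np.linalg.norm(np.mean(s, axis=0) - t)   # per-source gap
                  for s in source_feats])
    w = 1.0 / (d + 1e-9)                                   # nearer source -> larger weight
    return w / w.sum()

# A source whose features resemble the target's dominates the weighting.
sources = [np.zeros((3, 2)), np.full((3, 2), 5.0)]
target = np.full((4, 2), 0.1)
weights = similarity_weights(sources, target)
```

Under distribution heterogeneity across machines, such weighting lets data-poor target domains borrow most from the sources that actually resemble them, which is the motivation behind the mixed-domain construction described above.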