
    Biometric signals compression with time- and subject-adaptive dictionary for wearable devices

    This thesis work is dedicated to the design of a lightweight compression technique for the real-time processing of biomedical signals in wearable devices. The proposed approach exploits the unsupervised learning algorithm of the time-adaptive self-organizing map (TASOM) to create a subject-adaptive codebook applied to the vector quantization of a signal. The codebook is obtained and then dynamically refined in an online fashion, without requiring any prior information on the signal itself.
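
    The abstract describes online vector quantization against a self-organizing codebook. As rough orientation only, here is a minimal Python sketch of that general idea; the fixed learning rate, codebook size, and window length are illustrative assumptions, and the TASOM's time-adaptive learning-rate and neighborhood schedules are not reproduced.

```python
# Minimal sketch of online vector quantization with a self-adapting codebook.
# Illustrative only: the thesis's TASOM uses time-adaptive learning-rate and
# neighborhood schedules that are not reproduced here.
import numpy as np

class OnlineVQCodebook:
    def __init__(self, n_codewords, dim, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.codebook = rng.normal(size=(n_codewords, dim))
        self.lr = lr  # hypothetical fixed rate; TASOM adapts this over time

    def encode(self, x):
        # Quantize a signal window to the index of its nearest codeword.
        dists = np.linalg.norm(self.codebook - x, axis=1)
        return int(np.argmin(dists))

    def update(self, x):
        # Online refinement: pull the best-matching codeword toward the sample.
        i = self.encode(x)
        self.codebook[i] += self.lr * (x - self.codebook[i])
        return i  # the transmitted index is the compressed representation

vq = OnlineVQCodebook(n_codewords=64, dim=16)
window = np.random.default_rng(1).normal(size=16)  # e.g., a 16-sample window
idx = vq.update(window)  # refine codebook and emit the code for this window
```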

    Querying and Efficiently Searching Large, Temporal Text Corpora


    An Experimental Evaluation of Time Series Classification Using Various Distance Measures

    In recent years a vast number of distance measures for time series classification have been proposed. The definition of a distance measure is crucial to subsequent data mining tasks, so there is a need to decide which measure to choose for a particular dataset. The objective of this study is to provide a comprehensive comparison of 26 distance measures, enriched with extensive statistical analysis. We compare different kinds of distance measures: shape-based, edit-based, feature-based and structure-based. Experimental results on 34 benchmark datasets from the UCR Time Series Classification Archive are provided. We use a one-nearest-neighbour (1NN) classifier to compare the efficiency of the examined measures. Computation times were taken into consideration as well.
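
    The evaluation protocol above reduces to nearest-neighbour search under a pluggable distance. Below is a minimal Python sketch of 1NN classification, with Euclidean distance standing in for one of the 26 shape-based measures; the function names and toy data are illustrative only.

```python
# Minimal sketch of 1NN time-series classification with a pluggable distance.
# Euclidean distance is shown as one shape-based example; the study's other
# 25 measures would slot into the same interface.
import numpy as np

def euclidean(a, b):
    return np.linalg.norm(a - b)

def one_nn_predict(train_X, train_y, query, dist=euclidean):
    # Assign the query series the label of its single nearest neighbour.
    dists = [dist(x, query) for x in train_X]
    return train_y[int(np.argmin(dists))]

train_X = [np.array([0., 0., 1., 2.]), np.array([2., 2., 1., 0.])]
train_y = ["rising", "falling"]
print(one_nn_predict(train_X, train_y, np.array([0., 1., 1., 2.])))  # rising
```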

    Data-driven fault detection using trending analysis

    The objective of this research is to develop data-driven fault detection methods that do not rely on mathematical models yet are capable of detecting process malfunctions. Instead of using mathematical models for comparing performances, the methods developed rely on an extensive collection of data to establish classification schemes that detect faults in new data.

    The research develops two different trending approaches. One uses the normal data to define a one-class classifier. The second uses a data mining technique, e.g. a support vector machine (SVM), to define multi-class classifiers. Each classifier is trained on a set of example objects. The one-class classification assumes that only information about one of the classes, namely the normal class, is available: the boundary between the normal and faulty classes is estimated from data of the normal class alone. The research assumes that the convex hull of the normal data can be used to define a boundary separating normal and faulty data. The multi-class classifier is implemented through several binary classifiers; here it is assumed that data from both classes are available and the decision boundary is supported from both sides by example objects. In order to detect significant trends in the data, the research implements a non-uniform quantization technique based on Lloyd's algorithm and defines a special subsequence-based kernel. The effect of the subsequence length is examined through computer simulations and theoretical analysis.

    The test bed used to collect data and implement the fault detection is a six-degrees-of-freedom, rigid-body model of a B747 100/200, and only faults in the actuators are considered. In order to thoroughly test the efficiency of the approach, the tests use only sensor data that do not include manipulated variables. Even with this handicap the approach is effective, with an average of 79.5% correct detections, 16.7% missed alarms and 3.9% false alarms over six different faults.
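
    The non-uniform quantization step above is based on Lloyd's algorithm, which in one dimension amounts to k-means on the signal samples: quantization levels migrate toward dense regions of the data rather than sitting on a uniform grid. A minimal Python sketch follows; the level count, iteration cap, and synthetic data are illustrative assumptions, and the subsequence-based kernel is not reproduced.

```python
# Minimal sketch of non-uniform quantization via Lloyd's algorithm (1-D
# k-means). Illustrative only; the dissertation's subsequence kernel and
# trending pipeline are out of scope here.
import numpy as np

def lloyd_quantizer(samples, n_levels, n_iter=50):
    # Start from a uniform grid spanning the data range.
    levels = np.linspace(samples.min(), samples.max(), n_levels)
    for _ in range(n_iter):
        # Assign each sample to its nearest level, then recenter each level
        # at the mean of its assigned samples.
        idx = np.argmin(np.abs(samples[:, None] - levels[None, :]), axis=1)
        for k in range(n_levels):
            if np.any(idx == k):
                levels[k] = samples[idx == k].mean()
    return levels

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 0.1, 500), rng.normal(3, 0.5, 500)])
print(lloyd_quantizer(data, n_levels=4))  # levels cluster near 0 and 3
```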

    Database Matching Under Noisy Synchronization Errors

    The re-identification or de-anonymization of users from anonymized data through matching with publicly available correlated user data has raised privacy concerns, leading to the complementary measure of obfuscation in addition to anonymization. Recent research provides a fundamental understanding of the conditions under which privacy attacks, in the form of database matching, are successful in the presence of obfuscation. Motivated by synchronization errors stemming from the sampling of time-indexed databases, this paper presents a unified framework considering both obfuscation and synchronization errors and investigates the matching of databases under noisy entry repetitions. By investigating different structures for the repetition pattern, replica detection and seeded deletion detection algorithms are devised and sufficient and necessary conditions for successful matching are derived. Finally, the impacts of some variations of the underlying assumptions, such as the adversarial deletion model, seedless database matching, and the zero-rate regime, on the results are discussed. Overall, our results provide insights into the privacy-preserving publication of anonymized and obfuscated time-indexed data as well as the closely related problem of the capacity of synchronization channels.

    New Methods for Mining Sequential and Time Series Data

    Data mining is the process of extracting knowledge from large amounts of data. It covers a variety of techniques aimed at discovering diverse types of patterns on the basis of the requirements of the domain, including association rule mining, classification, cluster analysis and outlier detection. The availability of applications that produce massive amounts of spatial, spatio-temporal (ST) and time series data (TSD) is the rationale for developing specialized techniques to mine such data.

    In spatial data mining, the spatial co-location rule problem differs from the association rule problem, since there is no natural notion of transactions in spatial datasets embedded in continuous geographic space. We have therefore proposed an efficient algorithm (GridClique) to mine interesting spatial co-location patterns (maximal cliques). These patterns are used as the raw transactions for an association rule mining technique to discover complex co-location rules. Our proposal includes certain types of complex relationships, especially negative relationships, in the patterns. These relationships can be obtained only from the maximal clique patterns, which have never been used before. Our approach is applied to a well-known astronomy dataset obtained from the Sloan Digital Sky Survey (SDSS).

    ST data is continuously collected and made accessible in the public domain. We present an approach to mining and querying large ST data with the aim of finding interesting patterns and understanding the underlying process of data generation. An important class of queries is based on the flock pattern: a flock is a large subset of objects moving along paths close to each other for a predefined time. One approach to processing a "flock query" is to map ST data into a high-dimensional space and to reduce the query to a sequence of standard range queries that can be answered using a spatial indexing structure; however, the performance of spatial indexing structures deteriorates rapidly in high-dimensional space. This thesis sets out a preprocessing strategy that uses a random projection to reduce the dimensionality of the transformed space. We use probabilistic arguments to prove the accuracy of the projection and present experimental results showing that the curse of dimensionality can be managed in an ST setting by combining random projections with traditional data structures.

    In time series data mining, we devised a new space-efficient algorithm (SparseDTW) to compute the dynamic time warping (DTW) distance between two time series that always yields the optimal result. This is in contrast to other approaches, which typically sacrifice optimality to attain space efficiency. The main idea behind our approach is to dynamically exploit the similarity and/or correlation between the time series: the more similar the time series, the less space is required to compute the DTW between them. Other techniques for speeding up DTW impose a priori constraints and do not exploit similarity characteristics that may be present in the data. Our experiments demonstrate that SparseDTW outperforms these approaches. By applying the SparseDTW algorithm we also discover an interesting pattern, "pairs trading", in a large stock-market dataset of daily index prices from the Australian Stock Exchange (ASX) from 1980 to 2002.
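
    For reference, SparseDTW computes the same optimal warping distance as the classic dynamic program sketched below while storing only a sparse band of the cost matrix. The sketch is standard textbook DTW in Python; the sparse bookkeeping that gives SparseDTW its space efficiency is deliberately omitted.

```python
# Minimal sketch of classic dynamic time warping (full O(n*m) matrix).
# SparseDTW returns the same optimal distance while materializing only the
# cells that similarity between the series makes necessary.
import numpy as np

def dtw(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A stretched copy of a series warps onto it with zero cost.
print(dtw(np.array([0., 1., 2., 3.]), np.array([0., 0., 1., 2., 3.])))  # 0.0
```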

    High-throughput DNA sequence data compression


    A Data-driven, Piecewise Linear Approach to Modeling Human Motions

    Motion capture, or mocap, is a prevalent technique for capturing and analyzing human articulations. Nowadays, mocap data are becoming one of the primary sources of realistic human motions for computer animation as well as education, training, sports medicine, video games, and special effects in movies. As more and more applications rely on high-quality mocap data and huge amounts of mocap data become available, there is an imperative need for more effective and robust motion capture techniques, better ways of organizing motion databases, and more efficient methods to compress motion sequences. I propose a data-driven, segment-based, piecewise linear modeling approach that exploits the redundancy and local linearity exhibited by human motions to describe them with a small number of parameters. This approach models human motions with a collection of low-dimensional local linear models. I first segment motion sequences into subsequences, i.e. motion segments, of simple behaviors. Motion segments of similar behaviors are then grouped together and modeled with a unique local linear model. I demonstrate the utility of this approach on four challenging driving problems: estimating human motions from a reduced marker set, missing marker estimation, motion retrieval, and motion compression.
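
    The core of each local linear model is a low-dimensional linear approximation of the poses in one behavior group. Below is a minimal Python sketch using PCA via the SVD; the pose dimension, component count, and random stand-in data are illustrative assumptions, and the segmentation and grouping steps described above are assumed to have run upstream.

```python
# Minimal sketch of one "local linear model": poses from a group of similar
# motion segments are approximated in a low-dimensional PCA subspace, so each
# frame is described by a few coefficients instead of all joint angles.
import numpy as np

def fit_local_linear_model(poses, n_components):
    # poses: (n_frames, n_dofs) stacked frames from one behavior group.
    mean = poses.mean(axis=0)
    _, _, Vt = np.linalg.svd(poses - mean, full_matrices=False)
    basis = Vt[:n_components]          # principal directions (k, n_dofs)
    return mean, basis

def compress(pose, mean, basis):
    return basis @ (pose - mean)       # few parameters per frame

def reconstruct(code, mean, basis):
    return mean + basis.T @ code       # approximate original pose

rng = np.random.default_rng(0)
poses = rng.normal(size=(200, 60))     # e.g., a 60-DOF skeleton, 200 frames
mean, basis = fit_local_linear_model(poses, n_components=8)
code = compress(poses[0], mean, basis)
approx = reconstruct(code, mean, basis)
```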

    Eddy current defect response analysis using sum of Gaussian methods

    This dissertation is a study of methods to automatically detect eddy current differential coil defect signatures and approximate them as a summed collection of Gaussian functions (SoG). Datasets of varying material, defect size, inspection frequency, and coil diameter were investigated. Dimensionally reduced representations of the defect responses were obtained using common existing reduction methods along with novel enhancements to them based on SoG representations. The efficacy of the SoG-enhanced representations was studied using common interpretable machine learning (ML) classifier designs, with the SoG representations yielding significant improvements in common analysis metrics.
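
    At its core, an SoG approximation is a nonlinear least-squares fit of Gaussian components to a defect response. A minimal Python sketch follows, using SciPy's curve_fit on a synthetic two-component signal; the component count, initial guess, and noise level are illustrative assumptions, not the dissertation's pipeline.

```python
# Minimal sketch of fitting a sum-of-Gaussians (SoG) model to a 1-D defect
# response via nonlinear least squares. The fitted amplitudes, centers, and
# widths form a compact, physically interpretable representation.
import numpy as np
from scipy.optimize import curve_fit

def sum_of_gaussians(x, a1, mu1, s1, a2, mu2, s2):
    # Two components shown; real signatures may need more terms.
    return (a1 * np.exp(-((x - mu1) ** 2) / (2 * s1 ** 2))
            + a2 * np.exp(-((x - mu2) ** 2) / (2 * s2 ** 2)))

x = np.linspace(-1, 1, 200)
signal = sum_of_gaussians(x, 1.0, -0.2, 0.1, -0.8, 0.3, 0.15)  # synthetic
signal += np.random.default_rng(0).normal(scale=0.01, size=x.size)

p0 = [1, -0.1, 0.2, -1, 0.2, 0.2]  # rough initial guess for the optimizer
params, _ = curve_fit(sum_of_gaussians, x, signal, p0=p0)
print(params)  # recovered amplitudes, centers, and widths
```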