
    Similarity Measures and Dimensionality Reduction Techniques for Time Series Data Mining

    The chapter is organized as follows. Section 2 will introduce the similarity matching problem on time series. We will note the importance of using efficient data structures to perform search and of choosing an adequate distance measure. Section 3 will present some of the most widely used distance measures for time series data mining. Section 4 will review the above-mentioned dimensionality reduction techniques.
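
    To make the two ingredients concrete, here is a minimal Python sketch of a staple from each family: Euclidean distance as the similarity measure and Piecewise Aggregate Approximation (PAA) as the dimensionality reduction step. Both are standard techniques in this literature; the function names and segment count are illustrative, not taken from the chapter.

        import numpy as np

        def euclidean_distance(x, y):
            # L2 distance between two equal-length time series.
            x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
            return np.sqrt(np.sum((x - y) ** 2))

        def paa(x, n_segments):
            # Piecewise Aggregate Approximation: reduce a series to
            # n_segments segment means, a common dimensionality
            # reduction step before indexing or search.
            x = np.asarray(x, dtype=float)
            return np.array([seg.mean() for seg in np.array_split(x, n_segments)])

        a = np.sin(np.linspace(0, 4 * np.pi, 128))
        b = np.sin(np.linspace(0, 4 * np.pi, 128) + 0.3)
        print(euclidean_distance(a, b))                   # distance on raw series
        print(euclidean_distance(paa(a, 8), paa(b, 8)))   # distance on reduced series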

    Systems aspects of COBE science data compression

    A general approach to compression of diverse data from large scientific projects has been developed, and this paper addresses the appropriate system and scientific constraints together with the algorithm development and test strategy. This framework has been implemented for the COsmic Background Explorer (COBE) spacecraft by retrofitting the existing VAX-based data management system with high-performance compression software permitting random access to the data. Algorithms which incorporate scientific knowledge and consume relatively few system resources are preferred over ad hoc methods. COBE exceeded its planned storage by a large and growing factor, and the retrieval of data significantly affects the processing, delaying the availability of data for scientific usage and software testing. Embedded compression software is planned to make the project tractable by reducing the data storage volume to an acceptable level during normal processing.
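
    The key systems requirement here is compression that still permits random access. A common way to meet it, shown in the hedged Python sketch below, is to compress fixed-size blocks of records independently and keep an index of block boundaries, so that fetching one record decompresses only its block. The record and block sizes, and the use of zlib, are illustrative assumptions, not COBE's actual scheme.

        import zlib

        def compress_blocks(records, block_size=256):
            # records: list of equal-length byte strings; each group of
            # block_size records is compressed independently so any
            # block can be decompressed without touching the rest.
            blocks, index = [], []
            for i in range(0, len(records), block_size):
                blocks.append(zlib.compress(b"".join(records[i:i + block_size])))
                index.append(i)  # record number at which this block starts
            return blocks, index

        def fetch_record(blocks, index, rec_no, rec_len, block_size=256):
            # Random access: decompress only the block containing
            # rec_no, then slice the record out of it.
            b = rec_no // block_size
            raw = zlib.decompress(blocks[b])
            off = (rec_no - index[b]) * rec_len
            return raw[off:off + rec_len]

        # e.g. 10,000 hypothetical 32-byte telemetry records:
        recs = [bytes([i % 256]) * 32 for i in range(10_000)]
        blocks, index = compress_blocks(recs)
        assert fetch_record(blocks, index, 4321, 32) == recs[4321]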

    Advances in Manipulation and Recognition of Digital Ink

    Handwriting is one of the most natural ways for a human to record knowledge. Recently, this type of human-computer interaction has received increasing attention due to the rapid evolution of touch-based hardware and software. While hardware support for digital ink has reached maturity, algorithms for recognition of handwriting in certain domains, including mathematics, still lack robustness. At the same time, users may possess several pen-based devices, and sharing training data in an adaptive recognition setting can be challenging. In addition, the resolution of pen-based devices keeps improving, making the ink cumbersome to process and store. This thesis develops several advances for efficient processing, storage and recognition of handwriting, applicable to classification methods based on functional approximation. In particular, we propose improvements to the classification of isolated characters and groups of rotated characters, as well as symbols of substantially different size. We then develop an algorithm for adaptive classification of a user's handwritten mathematical characters. The adaptive algorithm can be especially useful in the cloud-based recognition framework described further in the thesis. We investigate whether the training data available in the cloud can be useful to a new writer during the training phase by extracting styles of individuals with similar handwriting and recommending styles to the writer. We also perform a factorial analysis of the algorithm for recognition of n-grams of rotated characters. Finally, we show a fast method for compression of linear pieces of handwritten strokes and compare it with an enhanced version of the algorithm based on functional approximation of strokes. Experimental results demonstrate the validity of the theoretical contributions, which form a solid foundation for the next generation of handwriting recognition systems.
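
    Functional approximation of strokes, the representation these classification methods build on, can be sketched in a few lines: the stroke's x(t) and y(t) are each fitted with a truncated orthogonal polynomial series, and the coefficient vectors stand in for the ink. The Python below uses Chebyshev polynomials as a plausible basis; the degree and function names are illustrative assumptions, not the thesis's exact formulation.

        import numpy as np
        from numpy.polynomial import chebyshev as cheb

        def stroke_to_coeffs(xs, ys, degree=8):
            # Fit truncated Chebyshev series to x(t) and y(t), with the
            # sampling parameter t rescaled to [-1, 1]; the coefficient
            # vectors become the compact representation of the stroke.
            t = np.linspace(-1.0, 1.0, len(xs))
            return cheb.chebfit(t, xs, degree), cheb.chebfit(t, ys, degree)

        def coeffs_to_stroke(cx, cy, n_points=64):
            # Evaluate the series to resample the approximated stroke
            # for rendering or classification.
            t = np.linspace(-1.0, 1.0, n_points)
            return cheb.chebval(t, cx), cheb.chebval(t, cy)

        # e.g. an arc captured at 120 sample points:
        t = np.linspace(0, np.pi, 120)
        cx, cy = stroke_to_coeffs(np.cos(t), np.sin(t))
        xs, ys = coeffs_to_stroke(cx, cy)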

    Intelligent Pattern Analysis of the Foetal Electrocardiogram

    The aim of the project on which this thesis is based is to develop reliable techniques for foetal electrocardiogram (ECG) based monitoring, to reduce incidents of unnecessary medical intervention and foetal injury during labour. Worldwide, electronic foetal monitoring is based almost entirely on the cardiotocogram (CTG), which is a continuous display of the foetal heart rate (FHR) pattern together with the contraction of the womb. Despite the widespread use of the CTG, there has been no significant improvement in foetal outcome. In the UK alone it is estimated that birth-related negligence claims cost the health authorities over £400M per annum. An expert system, known as INFANT, has recently been developed to assist CTG interpretation. However, the CTG alone does not always provide all the information required to improve the outcome of labour. The widespread use of ECG analysis has been hindered by poor signal quality and by the difficulty of applying the specialised knowledge required for interpreting ECG patterns, in association with other events in labour, in an objective way. A fundamental investigation and development of optimal signal enhancement techniques that maximise the available information in the ECG signal, along with different techniques for detecting individual waveforms from poor quality signals, has been carried out. To automate the visual interpretation of the ECG waveform, novel techniques have been developed that allow reliable extraction of key features and hence a detailed ECG waveform analysis. Fuzzy logic is used to classify the ECG waveform shape automatically from these features, using knowledge elicited from expert sources and derived from example data. This allows subtle changes in the ECG waveform to be detected automatically in relation to other events in labour, improving the clinician's position for making an accurate diagnosis. To ensure the interpretation is based on reliable information and takes place in the proper context, a new and sensitive index for assessing the quality of the ECG has been developed. New techniques to capture, for the first time in machine form, the clinical expertise and guidelines for electronic foetal monitoring have been developed based on fuzzy logic and finite state machines. The software model provides a flexible framework to further develop and optimise rules for ECG pattern analysis. The signal enhancement, QRS detection and pattern recognition of important ECG waveform shapes have been tested extensively, and results are presented. Results show that no significant loss of information is incurred as a result of the signal enhancement and feature extraction techniques.
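
    QRS detection, one of the stages mentioned above, is often introduced with a simple energy-threshold baseline in the style of Pan and Tompkins. The Python sketch below is such a baseline, offered only to make the stage concrete; it is not the thesis's fuzzy-logic pipeline, and the window length, refractory period and threshold fraction are illustrative assumptions.

        import numpy as np

        def detect_qrs(ecg, fs, threshold_frac=0.5):
            # Toy detector: differentiate to emphasise steep QRS slopes,
            # square, smooth over ~150 ms, then pick local maxima above
            # a fraction of the global maximum, enforcing a ~200 ms
            # refractory period between detections.
            d = np.diff(np.asarray(ecg, dtype=float))
            energy = d ** 2
            win = max(1, int(0.15 * fs))
            smoothed = np.convolve(energy, np.ones(win) / win, mode="same")
            thresh = threshold_frac * smoothed.max()
            refractory = int(0.2 * fs)
            peaks, last = [], -refractory
            for i in range(1, len(smoothed) - 1):
                if (smoothed[i] > thresh and smoothed[i] >= smoothed[i - 1]
                        and smoothed[i] > smoothed[i + 1]
                        and i - last > refractory):
                    peaks.append(i)
                    last = i
            return peaks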

    RUBIK: Efficient Threshold Queries on Massive Time Series

    An increasing number of applications from finance, meteorology, science and other fields produce time series as output. The analysis of this vast amount of time series is key to understanding the phenomena studied, particularly in the simulation sciences, where analysing the time series resulting from a simulation allows scientists to refine the simulated model. Existing approaches to querying time series typically keep a compact representation in main memory, use it to answer queries approximately, and then access the exact time series data on disk to validate the result. The more precise the in-memory representation, the fewer disk accesses are needed to validate the result. With the massive sizes of today's datasets, however, current in-memory representations oftentimes no longer fit into main memory. To make them fit, their precision has to be reduced considerably, resulting in substantial disk access that impedes query execution today and limits scalability for even bigger datasets in the future. In this paper we develop RUBIK, a novel approach to compressing and indexing time series. RUBIK exploits the fact that time series in many applications, and particularly in the simulation sciences, are similar to each other. It compresses similar time series, i.e., observation values as well as time information, achieving better space efficiency and improved precision. RUBIK translates threshold queries into two-dimensional spatial queries and efficiently executes them on the compressed time series by exploiting the pruning power of a tree structure to find the result, thereby outperforming the state of the art by a factor of between 6 and 23. As our experiments further indicate, exploiting similarity within and between time series is crucial to make query execution scale and to ultimately decouple query execution time from the growth of the data (size and number of time series).
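
    The approximate-then-validate pattern the paper builds on can be illustrated compactly: keep a coarse per-chunk (min, max) summary in memory, and touch the raw data on disk only for chunks whose bounding box crosses the threshold. The Python sketch below shows that pruning idea in its simplest form; it is a stand-in for intuition, not RUBIK's compressed tree index, and the chunk size is an illustrative assumption.

        import numpy as np

        def build_summary(series, chunk=64):
            # Coarse in-memory representation: one (min, max) bounding
            # box per fixed-size chunk of the series.
            return [(series[i:i + chunk].min(), series[i:i + chunk].max())
                    for i in range(0, len(series), chunk)]

        def exceeds_threshold(series, summary, v, chunk=64):
            # Prune with the summary; read raw data only for chunks
            # whose max crosses the threshold, then validate exactly.
            for i, (_, hi) in enumerate(summary):
                if hi >= v:
                    if (series[i * chunk:(i + 1) * chunk] >= v).any():
                        return True
            return False

        # e.g. one simulated series queried against threshold 0.95:
        s = np.sin(np.linspace(0, 20, 10_000))
        print(exceeds_threshold(s, build_summary(s), 0.95))  # True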

    Application-Specific Number Representation

    Reconfigurable devices, such as Field Programmable Gate Arrays (FPGAs), enable application-specific number representations. Well-known number formats include fixed-point, floating-point, the logarithmic number system (LNS), and the residue number system (RNS). Such different number representations lead to different arithmetic designs and error behaviours, thus producing implementations with different performance, accuracy, and cost. To investigate the design options in number representations, the first part of this thesis presents a platform that enables automated exploration of the number representation design space. The second part of the thesis shows case studies that optimise the designs for area, latency or throughput from the perspective of number representations. Automated design space exploration in the first part addresses two major issues. First, automation requires arithmetic unit generation: this thesis provides optimised arithmetic library generators for logarithmic and residue arithmetic units, which support a wide range of bit widths and achieve significant improvement over previous designs. Second, generation of arithmetic units requires specifying the bit widths for each variable: this thesis describes an automatic bit-width optimisation tool called R-Tool, which combines dynamic and static analysis methods, and supports different number systems (fixed-point, floating-point, and LNS numbers). Putting it all together, the second part explores the effects of application-specific number representation on practical benchmarks, such as radiative Monte Carlo simulation and seismic imaging computations. Experimental results show that customising the number representations brings benefits to hardware implementations: by selecting a more appropriate number format, we can reduce the area cost by up to 73.5% and improve the throughput by 14.2% to 34.1%; by performing the bit-width optimisation, we can further reduce the area cost by 9.7% to 17.3%. On the performance side, hardware implementations with customised number formats achieve 5 to potentially over 40 times speedup over software implementations.
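
    Why the choice of format changes the arithmetic design can be seen from the LNS: storing a sign and the base-2 logarithm of the magnitude turns multiplication into a single addition. The Python sketch below demonstrates this property in software under a hypothetical encoding (a (sign, log2) pair with zero flagged specially); real LNS hardware encodes values and handles addition quite differently.

        import math

        def to_lns(x):
            # Encode as (sign, log2|x|); zero gets a special flag since
            # log2(0) is undefined.
            return (math.copysign(1.0, x), math.log2(abs(x))) if x else (1.0, None)

        def lns_mul(a, b):
            # The payoff: multiplication in LNS is just an addition of logs.
            (sa, la), (sb, lb) = a, b
            if la is None or lb is None:
                return (1.0, None)       # anything times zero is zero
            return (sa * sb, la + lb)

        def from_lns(v):
            s, l = v
            return 0.0 if l is None else s * 2.0 ** l

        # 3.0 * -4.0 computed via a single addition in the log domain:
        print(from_lns(lns_mul(to_lns(3.0), to_lns(-4.0))))  # -12.0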

    Adaptive techniques with polynomial models for segmentation, approximation and analysis of faces in video sequences


    Reservoir Flooding Optimization by Control Polynomial Approximations

    In this dissertation, we provide novel parametrization procedures for water-flooding production optimization problems, using polynomial approximation techniques. The methods project the original infinite-dimensional control space onto a polynomial subspace. Our contribution includes new parameterization formulations using natural polynomials, orthogonal Chebyshev polynomials and cubic spline interpolation. We show that the proposed methods are well suited to a black-box approach with a stochastic global-search method, as they tend to produce smooth control trajectories while reducing the size of the solution space. We demonstrate their efficiency on synthetic two-dimensional problems and on a realistic three-dimensional problem. By contributing a new adjoint method formulation for polynomial approximation, we also implemented the methods with gradient-based algorithms. In addition to fine-scale simulation, we performed reduced order modeling, where we demonstrated a synergistic effect when combining polynomial approximation with model order reduction, which leads to faster optimization with higher gains in terms of Net Present Value. Finally, we performed gradient-based optimization under uncertainty. We propose a new multi-objective function with three components: one that maximizes the expected value over all realizations, and two that maximize the averages of the distribution tails on both sides. The new objective provides decision makers with the flexibility to choose the amount of risk they are willing to take while deciding on a production strategy or performing reserves estimation (P10, P50, P90).
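
    The core of such a parameterization is expanding a handful of polynomial coefficients into a full per-timestep control schedule, so the optimizer searches over a few coefficients instead of hundreds of well rates. A minimal Python sketch of the Chebyshev variant follows; the clipping bounds, coefficient count and function name are illustrative assumptions rather than the dissertation's exact formulation.

        import numpy as np
        from numpy.polynomial import chebyshev as cheb

        def controls_from_coeffs(coeffs, n_steps, u_min=0.0, u_max=1.0):
            # Expand a few Chebyshev coefficients into a per-timestep
            # injection-rate schedule, clipped to physical bounds; the
            # optimizer now varies len(coeffs) numbers, not n_steps.
            t = np.linspace(-1.0, 1.0, n_steps)
            return np.clip(cheb.chebval(t, coeffs), u_min, u_max)

        # e.g. a 5-coefficient parameterization of a 200-step schedule:
        u = controls_from_coeffs(np.array([0.5, 0.2, -0.1, 0.05, 0.02]), 200)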