19 research outputs found

    Quantifying the Security of Recognition Passwords: Gestures and Signatures

    Gesture and signature passwords are two-dimensional figures created by drawing on the surface of a touchscreen with one or more fingers. Prior results about their security have used resilience to either shoulder surfing, a human observation attack, or dictionary attacks. These evaluations restrict generalizability since the results are: non-comparable to other password systems (e.g. PINs), harder to reproduce, and attacker-dependent. Strong statements about the security of a password system use an analysis of the statistical distribution of the password space, which models a best-case attacker who guesses passwords in order of most likely to least likely. Estimating the distribution of recognition passwords is challenging because many different trials need to map to one password. In this paper, we solve this difficult problem by: (1) representing a recognition password of continuous data as a discrete alphabet set, and (2) estimating the password distribution through modeling the unseen passwords. We use Symbolic Aggregate approXimation (SAX) to represent time series data as symbols and develop Markov chains to model recognition passwords. We use a partial guessing metric, which demonstrates how many guesses an attacker needs to crack a percentage of the entire space, to compare the security of the distributions for gestures, signatures, and Android unlock patterns. We found the lower bounds of the partial guessing metric of gestures and signatures are much higher than the upper bound of the partial guessing metric of Android unlock patterns
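The best-case attacker described in this abstract guesses passwords from most likely to least likely. A minimal sketch of that idea follows. It assumes the password distribution is already available as a list of probabilities, and the function name `guesses_to_crack` is hypothetical. It counts the guesses needed to cover a fraction `alpha` of users, and may differ from the exact partial guessing metric used in the paper.

```python
def guesses_to_crack(probabilities, alpha):
    """Return the smallest number of guesses, made in decreasing order of
    probability, whose cumulative probability reaches alpha.

    `probabilities` is an assumed, already-estimated password distribution.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    total = 0.0
    # The attacker tries the most likely password first.
    for guesses, p in enumerate(sorted(probabilities, reverse=True), start=1):
        total += p
        # Small tolerance guards against floating-point rounding.
        if total >= alpha - 1e-12:
            return guesses
    raise ValueError("probabilities sum to less than alpha")


if __name__ == "__main__":
    toy_distribution = [0.5, 0.25, 0.125, 0.125]
    print(guesses_to_crack(toy_distribution, 0.75))
```

A flatter distribution needs more guesses for the same `alpha`, which is the sense in which one password system can be compared against another.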

    Identification of Global Transcriptional Dynamics

    One of the challenges in exploiting high throughput measurement techniques such as microarrays is the conversion of the vast amounts of data obtained into relevant knowledge. Of particular importance is the identification of the intrinsic response of a transcriptional experiment and the characterization of the underlying dynamics. The proposed algorithm seeks to provide the researcher with a summary of various aspects of the dynamic progression of a biological system, rather than that of individual genes. The approach is based on the identification of a smaller number of expression motifs that define the transcriptional state of the system, which quantifies the deviation of the cellular response from a control state in the presence of an external perturbation. The approach is demonstrated with a number of data sets including a synthetic base case and four animal studies. The synthetic dataset will be used to establish the response of the algorithm on a "null" dataset, whereas the four experimental datasets represent a spectrum of possible time course experiments in terms of the degree of perturbation associated with the experiment, as well as a wide range of temporal sampling strategies. This wide range of experimental datasets will thus allow us to explore the performance of the proposed algorithm and determine its ability to identify relevant information. In this work, we present a computational approach which operates on high throughput temporal gene expression data to assess the information content of the experiment, identify dynamic markers of important processes associated with the experimental perturbation, and summarize in a concise manner the evolution of the system over time with respect to the experimental perturbation.

    Analyzing time series from eye tracking using Symbolic Aggregate Approximation

    This thesis explores the viability of transforming the data produced when tracking the eyes into a discrete symbolic representation. For this transformation, we utilize Symbolic Aggregate Approximation (SAX) to investigate a new possibility for effectively categorizing data collected via eye tracking technologies. This categorization illustrates tendencies for, e.g., tracking problems, problems with the set-up, normal vision, or vision disturbances. Accordingly, this will contribute to evaluating the eyes' performance and allow professionals to develop a diagnosis based on evidence from objective measurements. The results are based on implementing a symbolic discretization method applied to experiments on a real-world dataset containing recordings of eye movements. In the future, the knowledge and transformation via the SAX method can be utilized to make sense of data and identify anomalies in various domains and for multiple stakeholders. Master's thesis in Software Development, in collaboration with HVL (PROG399, MAMN-PRO).
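The transformation named in this abstract, Symbolic Aggregate Approximation, turns a numeric series into a short word. The following is a minimal sketch under stated assumptions: the function name `sax` and its parameters are illustrative, frames are split by integer division (a simplification), and breakpoints are taken from the standard normal distribution so that symbols are roughly equiprobable. It is not the thesis's implementation.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax(series, word_length, alphabet_size):
    """Convert a numeric series into a SAX word (illustrative sketch).

    Steps: z-normalize, average over `word_length` frames (PAA), then map
    each frame mean to a letter using standard-normal breakpoints.
    """
    n = len(series)
    if not 1 <= word_length <= n:
        raise ValueError("word_length must be between 1 and len(series)")
    if not 2 <= alphabet_size <= 26:
        raise ValueError("alphabet_size must be between 2 and 26")

    mu, sigma = mean(series), pstdev(series)
    # A constant series has zero spread; treat every point as the mean.
    z = [0.0] * n if sigma == 0 else [(x - mu) / sigma for x in series]

    # Piecewise Aggregate Approximation: mean of each frame.
    paa = [
        mean(z[i * n // word_length:(i + 1) * n // word_length])
        for i in range(word_length)
    ]

    # Breakpoints cut the standard normal into equal-probability regions.
    normal = NormalDist()
    breakpoints = [normal.inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]

    return "".join(chr(ord("a") + bisect_right(breakpoints, v)) for v in paa)


if __name__ == "__main__":
    print(sax(list(range(16)), word_length=4, alphabet_size=4))
```

The resulting words can then be compared, counted, or clustered with ordinary string tools, which is what makes the representation attractive for categorizing recordings.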

    Time series motif discovery

    MAP-i Doctoral Programme in Computer Science. Time series data are produced daily in massive quantities in virtually every field. Most of the data are stored in time series databases. Finding patterns in these databases is an important problem. These patterns, also known as motifs, provide useful insight to the domain expert and summarize the database. They have been widely used in areas as diverse as finance and medicine. Although there are many algorithms for the task, they typically do not scale and require several parameters to be set. We propose a novel algorithm that runs in linear time, is also space efficient and requires only one parameter. It fully exploits the state-of-the-art time series representation technique (SAX, Symbolic Aggregate Approximation) to extract motifs at several resolutions. This property allows the algorithm to skip the expensive distance calculations that are typically employed by other algorithms. We also propose an approach to calculate the statistical significance of time series motifs. Although there are many approaches in the literature to find time series motifs efficiently, surprisingly there is no approach that calculates a motif's statistical significance. Our proposal leverages work from the bioinformatics community by using a symbolic definition of time series motifs to derive each motif's p-value. We estimate the expected frequency of a motif by using Markov chain models. The p-value is then assessed by comparing the actual frequency to the estimated one using statistical hypothesis tests. Our contribution makes it possible to apply a powerful technique, statistical tests, to a time series setting. This provides researchers and practitioners with an important tool to evaluate automatically the degree of relevance of each extracted motif. Finally, we propose an approach to automatically derive the parameters of the indexable Symbolic Aggregate Approximation (iSAX) time series representation. 
This technique is widely used in time series data mining. Its popularity arises from the fact that it is symbolic, reduces the dimensionality of the series, allows lower bounding and is space efficient. However, the need to set the symbolic length and alphabet size parameters limits the applicability of the representation, since the best parameter setting is highly application dependent. Typically, these are either set to a fixed value (e.g. 8) or experimentally probed for the best configuration. The technique, referred to as AutoiSAX, not only discovers the best parameter setting for each time series in the database but also finds the alphabet size for each iSAX symbol within the same word. It is based on the simple and intuitive ideas of time series complexity and standard deviation. The technique can be smoothly embedded in existing data mining tasks as an efficient sub-routine. We analyse the impact of using AutoiSAX on visualization interpretability, classification accuracy and motif mining results. Our contribution aims to make iSAX a more general approach as it evolves towards a parameter-free method. Fundação para a Ciência e Tecnologia (FCT) - SFRH / BD / 33303 / 200
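
The abstract above estimates a motif's expected frequency with Markov chain models and compares it with the observed frequency. A minimal sketch of the expected-count step follows. It assumes a first-order chain fitted to a single symbolic sequence, and the function name `expected_motif_count` is hypothetical. The thesis may use a different model order or estimator, and the hypothesis test itself is not shown.

```python
from collections import Counter


def expected_motif_count(sequence, motif):
    """Expected number of occurrences of `motif` in `sequence` under a
    first-order Markov chain estimated from `sequence` itself (sketch)."""
    n, m = len(sequence), len(motif)
    if m == 0 or m > n:
        raise ValueError("motif must be non-empty and no longer than sequence")

    symbol_counts = Counter(sequence)
    # Counts of adjacent symbol pairs, used for transition probabilities.
    pair_counts = Counter(zip(sequence, sequence[1:]))
    # How often each symbol appears with a successor.
    from_counts = Counter(sequence[:-1])

    # Probability of the first symbol, then chain the transitions.
    prob = symbol_counts[motif[0]] / n
    for a, b in zip(motif, motif[1:]):
        if from_counts[a] == 0:
            return 0.0
        prob *= pair_counts[(a, b)] / from_counts[a]

    # There are n - m + 1 positions where the motif could start.
    return prob * (n - m + 1)


if __name__ == "__main__":
    word = "abababab"
    observed = sum(word.startswith("ab", i) for i in range(len(word)))
    print(observed, expected_motif_count(word, "ab"))
```

A motif whose observed count is far above this expectation is a candidate for significance; turning that gap into a p-value requires a statistical test chosen by the analyst.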

    Mining Predictive Patterns and Extension to Multivariate Temporal Data

    An important goal of knowledge discovery is the search for patterns in the data that can help explain its underlying structure. To be practically useful, the discovered patterns should be novel (unexpected) and easy for humans to understand. In this thesis, we study the problem of mining patterns (defining subpopulations of data instances) that are important for predicting and explaining a specific outcome variable. An example is the task of identifying groups of patients that respond better to a certain treatment than the rest of the patients. We propose and present efficient methods for mining predictive patterns for both atemporal and temporal (time series) data. Our first method relies on frequent pattern mining to explore the search space. It applies a novel evaluation technique for extracting a small set of frequent patterns that are highly predictive and have low redundancy. We show the benefits of this method on several synthetic and public datasets. Our temporal pattern mining method works on complex multivariate temporal data, such as electronic health records, for the event detection task. It first converts time series into time-interval sequences of temporal abstractions and then mines temporal patterns backwards in time, starting from patterns related to the most recent observations. We show the benefits of our temporal pattern mining method on two real-world clinical tasks.

    Mining previously unknown patterns in time series data

    The emerging importance of distributed computing systems raises the need for a better understanding of system performance. As a major indicator of system performance, CPU host load can be analysed to evaluate system performance in many ways. Discovering similar patterns in CPU host load is very useful since many applications rely on the patterns mined from the CPU host load, such as pattern-based prediction, classification and relative rule mining of CPU host load. Essentially, the problem of mining patterns in CPU host load is a problem of mining time series data. Due to the complexity of the problem, many traditional mining techniques for time series data are no longer suitable. Compared to mining known patterns in time series, mining unknown patterns is a much more challenging task. In this thesis, we investigate the major difficulties of the problem and develop techniques for mining unknown patterns by extending the traditional techniques for mining known patterns. We develop two different CPU host load pattern discovery methods, the segment-based method and the reduction-based method, to optimize the pattern discovery process. The segment-based method works by extracting segment features, while the reduction-based method works by reducing the size of the raw data. The segment-based pattern discovery method maps the CPU host load segments to a 5-dimensional space, then applies the DBSCAN clustering method to discover similar segments. The reduction-based method reduces the dimensionality and numerosity of the CPU host load to reduce the search space. A cascade method is proposed to support accurate pattern mining while maintaining efficiency. The investigations into the CPU host load data inspired us to further develop a pattern mining algorithm for general time series data. The method filters out the unlikely starting positions for reoccurring patterns at an early stage and then iteratively locates all best-matching patterns. 
The results obtained by our method do not contain any meaningless patterns, which has been a problematic issue for a long time. Compared to state-of-the-art techniques, our method is more efficient and effective in most scenarios.

    Measuring Expressive Music Performances: a Performance Science Model using Symbolic Approximation

    Music Performance Science (MPS), sometimes termed systematic musicology in Northern Europe, is concerned with designing, testing and applying quantitative measurements to music performances. It has applications in art musics, jazz and other genres. It is least concerned with aesthetic judgements or with ontological considerations of artworks that stand alone from their instantiations in performances. Musicians deliver expressive performances by manipulating multiple, simultaneous variables including, but not limited to: tempo, acceleration and deceleration, dynamics, rates of change of dynamic levels, intonation and articulation. There are significant complexities when handling multivariate music datasets of significant scale. A critical issue in analyzing any type of large dataset is that the likelihood of detecting meaningless relationships grows as more dimensions are included. One possible choice is to create algorithms that address both volume and complexity. Another, and the approach chosen here, is to apply techniques that reduce both the dimensionality and numerosity of the music datasets while assuring the statistical significance of results. This dissertation describes a flexible computational model, based on symbolic approximation of time series, that can extract time-related characteristics of music performances to generate performance fingerprints (dissimilarities from an ‘average performance’) to be used for comparative purposes. The model is applied to recordings of Arnold Schoenberg’s Phantasy for Violin with Piano Accompaniment, Opus 47 (1949), having initially been validated on Chopin Mazurkas. The results are subsequently used to test hypotheses about evolution in performance styles of the Phantasy since its composition. It is hoped that further research will examine other works and types of music in order to improve this model and make it useful to other music researchers. 
In addition to its benefits for performance analysis, it is suggested that the model has clear applications at least in music fraud detection, Music Information Retrieval (MIR) and in pedagogical applications for music education

    FAULT DETECTION AND PREDICTION IN ELECTROMECHANICAL SYSTEMS VIA THE DISCRETIZED STATE VECTOR-BASED PATTERN ANALYSIS OF MULTI-SENSOR SIGNALS

    Department of System Design and Control Engineering. In recent decades, operation and maintenance strategies for industrial applications have evolved from corrective maintenance and preventive maintenance, to condition-based monitoring and eventually predictive maintenance. High performance sensors and data logging technologies have enabled us to monitor the operational states of systems and predict fault occurrences. Several time series analysis methods have been proposed in the literature to classify system states via multi-sensor signals. Since sensor signals are often very short, intermittent, transient, highly nonlinear, and non-stationary random signals, their time series analysis becomes more complex. Therefore, time series discretization has been widely applied to extract meaningful features from the original complex signals. There are several important issues to be addressed in discretization for fault detection and prediction: (i) what is the fault pattern that represents a system's faulty states, (ii) how can we effectively search for fault patterns, (iii) what is a symptom pattern to predict fault occurrences, and (iv) what is a systematic procedure for online fault detection and prediction. In this regard, this study proposes a fault detection and prediction framework that consists of (i) definition of the system's operational states, (ii) definitions of fault and symptom patterns, (iii) multivariate discretization, (iv) severity and criticality analyses, and (v) online detection and prediction procedures. Given the time markers of fault occurrences, we can divide a system's operational states into fault and no-fault states. We postulate that a symptom state precedes the occurrence of a fault within a certain time period and hence a no-fault state consists of normal and symptom states. 
Fault patterns are therefore found only in fault states, whereas symptom patterns are either found only in the system's symptom states (being absent in the normal states) or not found in the given time series, but similar to fault patterns. To determine the length of a symptom state, we present a symptom pattern-based iterative search method. In order to identify the distinctive behaviors of multi-sensor signals, we propose a multivariate discretization approach that consists mainly of label definition, label specification, and event codification. Discretization parameters are carefully controlled by considering the key characteristics of multi-sensor signals. We discuss how to measure the severity degrees of fault and symptom patterns, and how to assess the criticalities of fault states. We apply the fault and symptom pattern extraction and severity assessment methods to online fault detection and prediction. Finally, we demonstrate the performance of the proposed framework through the following six case studies: abnormal cylinder temperature in a marine diesel engine, automotive gasoline engine knockings, laser weld defects, buzz, squeak, and rattle (BSR) noises from a car door trim (using a typical acoustic sensor array and using acoustic emission sensors respectively), and visual stimuli cognition tests by the P300 experiment.