
    Applications of high-frequency telematics for driving behavior analysis

    A thesis submitted in partial fulfillment of the requirements for the degree of Doctor in Information Management, specialization in Statistics and Econometrics. Processing driving data and investigating driving behavior have received increasing interest in recent decades, with applications ranging from car insurance pricing to policy-making. A popular way of analyzing driving behavior is to focus on maneuvers, as they give useful information about the driver performing them. Previous research on maneuver detection can be divided into two strategies: (1) using fixed thresholds on inertial measurements to define the start and end of specific maneuvers, or (2) using features extracted from rolling windows of sensor data in a supervised learning model to detect maneuvers. While the first strategy is not adaptable and requires fine-tuning, the second needs a labeled dataset (labeling is time-consuming) and cannot identify maneuvers of different durations. To tackle these shortcomings, we investigate a new way of identifying maneuvers from vehicle telematics data: motif detection in time series. Using a publicly available naturalistic driving dataset (the UAH-DriveSet), we conclude that motif detection algorithms can extract not only simple maneuvers such as accelerations, brakes, and turns, but also more complex maneuvers such as lane changes and overtaking, thus validating motif discovery as a worthwhile line for future research in driving behavior. We also propose TripMD, a system that extracts the most relevant driving patterns from sensor recordings (such as acceleration) and provides a visualization that allows for easy investigation. 
We test TripMD on the same UAH-DriveSet dataset and show that (1) our system can extract a rich set of driving patterns from a single driver that are meaningful for understanding driving behavior and (2) our system can be used to identify the driving behavior of an unknown driver from a set of drivers whose behavior we know.
In recent decades, the processing and analysis of driving data has received growing interest, with applications ranging from car insurance to regulation. Typically, driving analysis comprises the extraction and study of maneuvers, since these contain relevant information about the driver's performance. Previous research on this topic can be divided into two types of strategies, namely (1) the use of fixed acceleration values to define the start and end of each maneuver or (2) the use of supervised learning models on time windows. While the first type of strategy is inflexible and requires parameter tuning, the second needs annotated driving data (which is time-consuming to produce) and cannot identify maneuvers of different durations. To mitigate these gaps, in this work we apply methods developed in time-series research to the problem of maneuver detection. In particular, we explore motif detection in time series and test whether these generic methods succeed in detecting maneuvers. We also propose TripMD, a system that extracts the most relevant driving patterns from a set of trips and provides a simple visualization.
TripMD is tested on a public dataset (the UAH-DriveSet), from which we conclude that (1) our system can extract driving patterns/maneuvers from a single driver that are related to that driver's driving profile and (2) our system can be used to identify the driving profile of an unknown driver from a set of drivers whose behavior is known to us.
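To make the first strategy mentioned in this abstract concrete, the following is a minimal sketch of fixed-threshold event detection on a single acceleration channel. It is not the thesis's method (the thesis argues for motif detection instead); the function name `threshold_events`, the threshold value, and the sample signal are all assumptions chosen for illustration.

```python
def threshold_events(signal, threshold):
    """Return (start, end) index pairs of runs where signal > threshold.

    `end` is exclusive. This is the fixed-threshold idea only: the
    threshold has to be hand-tuned, which is the drawback the abstract
    points out.
    """
    events = []
    start = None
    for i, value in enumerate(signal):
        if value > threshold and start is None:
            start = i                      # a run begins
        elif value <= threshold and start is not None:
            events.append((start, i))      # the run ended at index i
            start = None
    if start is not None:                  # run still open at the end
        events.append((start, len(signal)))
    return events


# Hypothetical longitudinal acceleration samples (arbitrary units).
accel = [0.0, 0.1, 0.5, 0.6, 0.1, 0.0, 0.7]
events = threshold_events(accel, threshold=0.4)
```

Changing the threshold changes which runs are reported, which illustrates why this strategy needs per-vehicle or per-sensor tuning.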

    Identifying networks with common organizational principles

    Many complex systems can be represented as networks, and the problem of network comparison is becoming increasingly relevant. There are many techniques for network comparison, from simply comparing network summary statistics to sophisticated but computationally costly alignment-based approaches. Yet it remains challenging to accurately cluster networks that are of a different size and density, but hypothesized to be structurally similar. In this paper, we address this problem by introducing a new network comparison methodology that is aimed at identifying common organizational principles in networks. The methodology is simple, intuitive and applicable in a wide variety of settings ranging from the functional classification of proteins to tracking the evolution of a world trade network. Comment: 26 pages, 7 figures

    A survey of DNA motif finding algorithms

    Background: Unraveling the mechanisms that regulate gene expression is a major challenge in biology. An important task in this challenge is to identify regulatory elements, especially the binding sites in deoxyribonucleic acid (DNA) for transcription factors. These binding sites are short DNA segments that are called motifs. Recent advances in genome sequence availability and in high-throughput gene expression analysis technologies have allowed for the development of computational methods for motif finding. As a result, a large number of motif finding algorithms have been implemented and applied to various motif models over the past decade. This survey reviews the latest developments in DNA motif finding algorithms. Results: Earlier algorithms use promoter sequences of coregulated genes from a single genome and search for statistically overrepresented motifs. Recent algorithms are designed to use phylogenetic footprinting or orthologous sequences, or an integrated approach in which promoter sequences of coregulated genes and phylogenetic footprinting are used together. All the algorithms studied have been reported to correctly detect motifs previously detected by laboratory experiments, and some algorithms were able to find novel motifs. However, most of these motif finding algorithms have been shown to work successfully in yeast and other lower organisms, but perform significantly worse in higher organisms. Conclusion: Despite considerable efforts to date, DNA motif finding remains a complex challenge for biologists and computer scientists. Researchers have taken many different approaches in developing motif discovery tools, and the progress made in this area of research is very encouraging. 
Comparing the performance of motif finding tools and identifying the best ones has proven difficult, because the tools are built on diverse and complex algorithms and motif models, and because our incomplete understanding of the biology of regulatory mechanisms does not always allow an adequate evaluation of the underlying algorithms over motif models. Peer reviewed. Computer Science

    Exploring time-series motifs through DTW-SOM

    Motif discovery is a fundamental step in data mining tasks for time-series data such as clustering, classification and anomaly detection. Even though many papers have addressed the problem of how to find motifs in time series by proposing new motif discovery algorithms, not much work has been done on exploring the motifs extracted by these algorithms. In this paper, we argue that visually exploring time-series motifs computed by motif discovery algorithms can be useful to understand and debug results. To explore the output of motif discovery algorithms, we propose the use of an adapted Self-Organizing Map, the DTW-SOM, on the list of motif centers. In short, DTW-SOM is a vanilla Self-Organizing Map with three main differences, namely (1) the use of the Dynamic Time Warping distance instead of the Euclidean distance, (2) the adoption of two new network initialization routines (a random sample initialization and an anchor initialization) and (3) the adjustment of the Adaptation phase of the training to work with variable-length time-series sequences. We test DTW-SOM on a synthetic motif dataset and two real time-series datasets from the UCR Time Series Classification Archive. After exploring the results, we conclude that DTW-SOM is capable of extracting relevant information from a set of motifs and displaying it in a space-efficient visualization. Comment: 8 pages, 12 figures, Accepted for presentation at the International Joint Conference on Neural Networks (IJCNN) 202
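The abstract above replaces the Euclidean distance with Dynamic Time Warping so that sequences of different lengths can be compared. As a rough illustration, here is a minimal sketch of the textbook dynamic-programming DTW distance with an absolute-difference local cost. It is not taken from the DTW-SOM implementation; the function name `dtw_distance` and the choice of local cost are assumptions.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook DTW between two 1-D sequences of possibly different length.

    cost[i][j] holds the minimal cumulative cost of aligning the first i
    points of `a` with the first j points of `b`.
    """
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


# Same shape at a different length: the repeated 2 is absorbed by warping.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([1, 2, 3], [2, 3, 4])
```

A point-by-point Euclidean distance is undefined for the first pair because the lengths differ, which is the practical reason a variable-length setting calls for a warping distance.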

    Transcription Factor-DNA Binding Via Machine Learning Ensembles

    We present ensemble methods in a machine learning (ML) framework combining predictions from five known motif/binding site exploration algorithms. For a given TF, the ensemble starts with position weight matrices (PWMs) for the motif, collected from the component algorithms. Using dimension reduction, we identify significant PWM-based subspaces for analysis. Within each subspace a machine classifier is built for identifying the TF's gene (promoter) targets (Problem 1). These PWM-based subspaces form an ML-based sequence analysis tool. Problem 2 (finding binding motifs) is solved by agglomerating k-mer (string) feature PWM-based subspaces that stand out in identifying gene targets. We approach Problem 3 (binding sites) with a novel machine learning approach that uses promoter string features and ML importance scores in a classification algorithm locating binding sites across the genome. For target gene identification, this method improves performance (measured by the F1 score) by about 10 percentage points over (a) the motif scanning method and (b) the coexpression-based association method. The top motif outperformed the five component algorithms as well as two other common algorithms (BEST and DEME). For identifying individual binding sites on a benchmark cross-species database (Tompa et al., 2005), we match the best performer without much human intervention. The method also improved performance on mammalian TFs. The ensemble can integrate orthogonal information from different weak learners (potentially using entirely different types of features) into a machine learner that can perform consistently better for more TFs. The TF gene target identification component (Problem 1 above) is useful in constructing a transcriptional regulatory network from known TF-target associations. The ensemble is easily extendable to include more tools as well as future PWM-based information. Comment: 33 pages
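For readers unfamiliar with position weight matrices and the motif scanning baseline mentioned above, the following is a minimal sketch of building a log-odds PWM from aligned sites and scanning a sequence for its best-scoring window. It does not reproduce the paper's ensemble; the function names, the uniform background of 0.25 per base, the pseudocount, and the toy sites are all assumptions.

```python
from math import log2

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Log-odds PWM from equal-length aligned sites (uniform background assumed)."""
    width = len(sites[0])
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        pwm.append({
            base: log2(((column.count(base) + pseudocount) / total) / 0.25)
            for base in ALPHABET
        })
    return pwm


def best_hit(pwm, sequence):
    """Return (score, offset) of the highest-scoring window in `sequence`."""
    width = len(pwm)
    windows = [
        (sum(pwm[k][sequence[i + k]] for k in range(width)), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(windows)


# Toy aligned binding sites and a toy promoter sequence (illustrative only).
pwm = build_pwm(["TATA", "TATA", "TATT"])
score, offset = best_hit(pwm, "GGCTATAGG")
```

In a scanning baseline, a window is typically reported as a candidate binding site when its score exceeds a chosen cutoff; the cutoff itself is another parameter that would need to be set.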

    Eddy current defect response analysis using sum of Gaussian methods

    This dissertation studies methods to automatically detect eddy current differential coil defect signatures and approximate them as a summed collection of Gaussian functions (SoG). Datasets spanning varying materials, defect sizes, inspection frequencies, and coil diameters were investigated. Dimensionally reduced representations of the defect responses were obtained using common existing reduction methods and novel enhancements to them based on SoG representations. The efficacy of the SoG-enhanced representations was studied using common interpretable machine learning (ML) classifier designs, with the SoG representations indicating significant improvement in common analysis metrics.
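As a rough illustration of what a sum-of-Gaussians representation looks like, the sketch below evaluates a model built from (amplitude, center, width) triples. The parameterization, the function name `sum_of_gaussians`, and the two-lobe example values are assumptions for illustration and are not taken from the dissertation.

```python
from math import exp


def sum_of_gaussians(x, components):
    """Evaluate sum_k a_k * exp(-(x - mu_k)^2 / (2 * sigma_k^2)) at x.

    `components` is an iterable of (amplitude, center, width) triples.
    """
    return sum(
        a * exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))
        for a, mu, sigma in components
    )


# Hypothetical two-lobe signature: one positive and one negative lobe,
# loosely resembling the shape of a differential measurement.
lobes = [(1.0, -1.0, 0.5), (-1.0, 1.0, 0.5)]
samples = [sum_of_gaussians(x / 10.0, lobes) for x in range(-30, 31)]
```

The appeal of such a representation is compactness: a whole sampled response is summarized by a few triples, which can then serve as low-dimensional features for a classifier.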

    Design, synthesis and molecular modeling studies of drug candidate compounds against prion diseases

    Prion diseases are a group of invariably fatal disorders for which there is no cure. Despite their rare incidence in humans, prion diseases have attracted considerable attention from the scientific community due to the unconventional mechanism by which they are transmitted [1]. The central feature of prion diseases is the accumulation, in the brain and some other tissues, of the disease-associated PrPSc, which is derived from the host-encoded cellular PrPC [1]. The conversion from a normal form (PrPC) to an infectious isoform (scrapie, PrPSc) is triggered by the interaction between PrPC and PrPSc [2], as well as protein-protein interaction (PPI) [3]...