
    Ranking and significance of variable-length similarity-based time series motifs

    The detection of very similar patterns in a time series, commonly called motifs, has received continuous and increasing attention from diverse scientific communities. In particular, recent approaches for discovering similar motifs of different lengths have been proposed. In this work, we show that such variable-length similarity-based motifs cannot be directly compared, and hence ranked, by their normalized dissimilarities. Specifically, we find that length-normalized motif dissimilarities still have intrinsic dependencies on the motif length, and that the lowest dissimilarities are particularly affected by this dependency. Moreover, we find that such dependencies are generally non-linear and change with the considered data set and dissimilarity measure. Based on these findings, we propose a solution to rank those motifs and measure their significance. This solution relies on a compact but accurate model of the dissimilarity space, using a beta distribution with three parameters that depend on the motif length in a non-linear way. We believe the incomparability of variable-length dissimilarities could go beyond the field of time series, and that modeling strategies similar to the one used here could help in a broader context. Comment: 20 pages, 10 figures
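The ranking idea above can be sketched as follows. Instead of comparing raw length-normalized dissimilarities, each motif's dissimilarity is mapped to a lower-tail probability under a beta model fitted for that motif's own length. Motifs are then ranked by that probability, where a lower value means more significant. This is a minimal sketch under our own assumptions, not the authors' code. We assume the three parameters are two beta shape parameters plus a scale (upper bound) on the dissimilarity. The per-length table `params` is hypothetical; in practice it would come from fitting the dissimilarity distribution at each length. The regularized incomplete beta function is computed with the standard continued-fraction method, so only the standard library is needed.

```python
import math

def _beta_contfrac(a, b, x, max_iter=200, eps=1e-12):
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def beta_cdf(x, a, b):
    """Regularized incomplete beta I_x(a, b), i.e. the CDF of Beta(a, b) at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use I_x(a, b) = 1 - I_{1-x}(b, a) on the side where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_contfrac(a, b, x) / a
    return 1.0 - front * _beta_contfrac(b, a, 1.0 - x) / b

def rank_motifs(motifs, params):
    """Rank (motif_id, length, dissimilarity) triples by length-specific lower-tail probability.

    params maps a motif length to a hypothetical fitted (alpha, beta, scale) triple.
    """
    scored = []
    for motif_id, length, dissim in motifs:
        alpha, beta, scale = params[length]
        p = beta_cdf(dissim / scale, alpha, beta)  # P(D <= dissim) at this length
        scored.append((p, motif_id))
    scored.sort()
    return [(motif_id, p) for p, motif_id in scored]

if __name__ == "__main__":
    params = {50: (2.0, 5.0, 10.0), 100: (3.0, 4.0, 20.0)}  # hypothetical fitted values
    motifs = [("m1", 50, 2.0), ("m2", 100, 3.0)]
    for motif_id, p in rank_motifs(motifs, params):
        print(motif_id, round(p, 4))
```

With these made-up parameters, m2 is ranked ahead of m1 even though its raw dissimilarity is larger. This is the kind of reordering that motivates modeling each length separately.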

    One-Step or Two-Step Optimization and the Overfitting Phenomenon: A Case Study on Time Series Classification

    Over the last few decades, optimization has developed rapidly. Bio-inspired optimization algorithms are metaheuristics inspired by nature. These algorithms have been applied to solve different problems in engineering, economics, and other domains. Bio-inspired algorithms have also been applied in different branches of information technology, such as networking and software engineering. Time series data mining is a field of information technology that has its share of these applications too. In previous works we showed how bio-inspired algorithms such as genetic algorithms and differential evolution can be used to find the locations of the breakpoints used in the symbolic aggregate approximation (SAX) representation of time series. In another work we showed how particle swarm optimization, one of the best-known bio-inspired algorithms, can be used to assign weights to the different segments in the SAX representation. In this paper we present a new meta-optimization process, in two different approaches, that produces optimal breakpoint locations as well as optimal segment weights. Our time series classification experiments show an interesting example of how overfitting can interfere with the optimization process and mask the superior performance of an optimization algorithm. Overfitting is a frequent problem in data mining that occurs when a model fits the training set too closely.
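To make the representation being optimized concrete, here is a minimal SAX sketch in which the breakpoints and the per-segment weights are explicit inputs. These are the quantities the optimizers described above would tune. The weighted distance is our assumption: it multiplies each segment's squared MINDIST cell distance by that segment's weight before summing, which may differ from the weighting used in the paper. All function names are illustrative.

```python
import bisect
import math

def znorm(series):
    """Z-normalize a series (zero mean, unit variance); a flat series maps to zeros."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    return [(x - mean) / std for x in series] if std > 0 else [0.0] * len(series)

def paa(series, n_segments):
    """Piecewise aggregate approximation; assumes len(series) is divisible by n_segments."""
    size = len(series) // n_segments
    return [sum(series[i * size:(i + 1) * size]) / size for i in range(n_segments)]

def sax(series, n_segments, breakpoints):
    """Map each PAA value to a symbol index: the number of breakpoints <= that value."""
    return [bisect.bisect_right(breakpoints, v) for v in paa(znorm(series), n_segments)]

def cell_dist(r, c, breakpoints):
    """MINDIST lookup: equal or adjacent symbols are at distance 0."""
    if abs(r - c) <= 1:
        return 0.0
    return breakpoints[max(r, c) - 1] - breakpoints[min(r, c)]

def weighted_mindist(s1, s2, breakpoints, weights, n):
    """MINDIST with per-segment weights (assumed scheme); n is the original series length."""
    w = len(s1)
    total = sum(wt * cell_dist(a, b, breakpoints) ** 2 for a, b, wt in zip(s1, s2, weights))
    return math.sqrt(n / w) * math.sqrt(total)

if __name__ == "__main__":
    bp = [-0.67, 0.0, 0.67]  # equiprobable Gaussian breakpoints for a 4-symbol alphabet
    a = sax([1, 1, 1, 1, -1, -1, -1, -1], 2, bp)
    b = sax([-1, -1, -1, -1, 1, 1, 1, 1], 2, bp)
    print(a, b, weighted_mindist(a, b, bp, [1.0, 1.0], 8))
```

An optimizer searching over `bp` and `weights` would evaluate a classifier built on `weighted_mindist` against training data. That evaluation loop is where the overfitting described above can creep in.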

    MISSEL: a method to identify a large number of small species-specific genomic subsequences and its application to viruses classification

    Continuous improvements in next-generation sequencing technologies have led to ever-increasing collections of genomic sequences, which are not easily characterized by biologists and whose analysis requires huge computational effort. The classification of species has emerged as one of the main applications of DNA analysis and has been addressed with several approaches, e.g., multiple-alignment-, phylogenetic-tree-, statistical- and character-based methods
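The sketch below is a toy illustration of the general idea of species-specific subsequences, a character-based approach. For each species, it keeps the k-mers present in every training sequence of that species and in no sequence of any other species. A new sequence is then labeled with the species whose specific k-mers it contains most often. This is not MISSEL's algorithm, which the abstract excerpt does not describe. The value of k and all function names are hypothetical.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def species_specific_kmers(training, k):
    """training maps species -> non-empty list of sequences; returns species -> specific k-mers."""
    # k-mers shared by every sequence of a species.
    shared = {sp: set.intersection(*(kmers(s, k) for s in seqs)) for sp, seqs in training.items()}
    # k-mers seen anywhere in a species.
    seen_in = {sp: set.union(*(kmers(s, k) for s in seqs)) for sp, seqs in training.items()}
    specific = {}
    for sp, candidates in shared.items():
        others = set().union(*(seen_in[o] for o in training if o != sp))
        specific[sp] = candidates - others
    return specific

def classify(seq, specific, k):
    """Return the species with the most specific k-mers in seq, or None if none match."""
    present = kmers(seq, k)
    best, hits = max(((sp, len(present & ks)) for sp, ks in specific.items()),
                     key=lambda t: t[1])
    return best if hits > 0 else None

if __name__ == "__main__":
    training = {"A": ["ACGTAC", "ACGTTT"], "B": ["GGGCCC", "GGGCAA"]}
    spec = species_specific_kmers(training, 3)
    print(spec, classify("TTACGTT", spec, 3))
```

Real genomic data would need far larger k, tolerance for sequencing errors, and efficient indexing. The sketch shows only the set logic.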

    New Trends in Artificial Intelligence: Applications of Particle Swarm Optimization in Biomedical Problems

    Optimization is the process of finding the most effective element or solution from the set of all possible solutions. Currently, various biological problems, ranging from biomolecule structure prediction to drug discovery, can be addressed by adopting standard optimization protocols. Particle swarm optimization (PSO), proposed by Dr. Eberhart and Dr. Kennedy in 1995, is a population-based stochastic optimization technique. Its designers were inspired by the social behavior of bird flocks and fish schools. PSO shares numerous similarities with evolutionary computation techniques such as genetic algorithms (GA). Because PSO is simple and requires adjusting only a few parameters, it has gained more attention than other population-based algorithms. Hence, PSO is widely used in research fields ranging from artificial neural network training to other areas where GA can be applied.
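The PSO procedure described above can be sketched as a global-best swarm minimizing a function. This is a minimal sketch, not the formulation used in the reviewed work. It includes an inertia weight, which is a later refinement of the 1995 algorithm. The coefficient values, search bounds, and test function are illustrative assumptions.

```python
import random

def pso(objective, dim, bounds, n_particles=30, n_iter=200,
        w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Global-best particle swarm optimization (minimization)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # each particle's best position so far
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm's best position so far
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull toward own best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay inside bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)
    best, value = pso(sphere, dim=2, bounds=(-5.0, 5.0))
    print(best, value)
```

The few tunable parameters (`w`, `c1`, `c2`, swarm size) are the "minor adjustments" the abstract refers to. This small parameter set is a large part of why PSO is easy to apply.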

    Current Studies and Applications of Krill Herd and Gravitational Search Algorithms in Healthcare

    Nature-Inspired Computing (NIC) is a relatively young field that seeks new computing methods by studying how natural phenomena work, in order to solve complex problems in many contexts. As a consequence, ground-breaking research has been conducted in a variety of domains, including artificial immune systems, neural networks, swarm intelligence, and evolutionary computation. NIC techniques are used in biology, physics, engineering, economics, and management. Metaheuristic algorithms have proven successful, efficient, and robust in real-world classification, optimization, forecasting, and clustering, as well as in engineering and scientific problems. Two active NIC paradigms are the gravitational search algorithm and the krill herd algorithm. This publication gives a global and historical review of research on using the Krill Herd Algorithm (KH) and the Gravitational Search Algorithm (GSA) in medicine and healthcare. Comprehensive surveys have been conducted on several nature-inspired algorithms, including KH and GSA; nonetheless, no survey of KH and GSA in the healthcare field has been undertaken. The present article therefore thoroughly reviews the various versions of the KH and GSA algorithms and their applications in healthcare, to assist researchers in using them in diverse domains or hybridizing them with other popular algorithms. It also provides an in-depth examination of KH and GSA in terms of application, modification, and hybridization. The goal of the study is to offer a perspective on GSA and KH, particularly for academics interested in investigating the capabilities and performance of these algorithms in the healthcare and medical domains. Comment: 35 pages

    Methods for Quantitative Local Structure Analysis of Crystalline Materials Employing High Performance Computing

    A fundamental computational methodology was investigated to extract quantitative local structure information from single-crystal diffuse scattering data. The principles of a highly efficient, parallelizable local structure analysis using massively parallel computing resources at Oak Ridge National Laboratory (ORNL) are demonstrated on an organic hydrocarbon compound containing stacking faults, Tris(bicyclo[2.1.1]hexeno)benzene. A probabilistic model of the stacking variations with a five-layer interaction depth was developed. The final model's structure motif statistics are verified using the steady-state distribution of the Markov matrix representing the four-to-five-layer transitions. The computations revealed that highly parallelizable “structure-clones” could replace the less computationally efficient “structure lots”. Further testing of the method is under way using ZODS (Zürich Oak Ridge Disorder Simulations), a new comprehensive modeling software suite developed in Zürich. The testing uses synchrotron and laboratory X-ray data of a highly efficient light-upconversion member of the NaLnF4 [Sodium Lanthanide tetrafluoride] family. Initially, a synchrotron data set was collected at the high-resolution Swiss-Norwegian Beam Line at the European Synchrotron Radiation Facility and is being analyzed. High-resolution neutron diffraction data were recently collected at the time-of-flight Laue single-crystal diffractometer TOPAZ at the Spallation Neutron Source at ORNL using the newly available event-mode processing. Currently, event-mode data treatment and event-based corrections for data preparation are being explored. Simultaneous massively parallel local structure simulations of NaLaF4 [Sodium Lanthanum tetrafluoride] using ZODS at the National Energy Research Scientific Computing Center are in progress. A step-wise modeling approach was adopted. The largest contributors to the X-ray diffuse scattering, the La2 [Lanthanum 2] and Na2 [Sodium 2] column neighbor interactions, were modeled first. These were followed by the shift of F1 [Fluorine 1] from its average position toward La [Lanthanum] and away from Na [Sodium]. This work provides a basis for streamlining diffuse scattering analysis and yields a quantitative interpretation of the local atomic arrangement of crystalline materials, which may provide valuable information for interpreting their structure-property relationships.
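The verification step mentioned above compares motif statistics with the steady-state distribution of a Markov transition matrix. That distribution can be illustrated with a small power-iteration sketch. The two-state matrix in the usage example is made up for illustration. The actual stacking model uses four-to-five-layer transitions whose probabilities are not given here, and ZODS itself is not used.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Steady-state row vector pi with pi = pi P, computed by power iteration.

    P is a row-stochastic matrix (list of rows, each summing to 1). Assumes the
    chain is irreducible and aperiodic, so the iteration converges.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

if __name__ == "__main__":
    # Hypothetical two-state chain: state 0 = regular stacking, state 1 = fault.
    P = [[0.9, 0.1],
         [0.5, 0.5]]
    print(stationary_distribution(P))  # approximately [0.8333, 0.1667], i.e. [5/6, 1/6]
```

In a layer-stacking model, each state would encode the preceding layers. The stationary probabilities then give the expected long-run frequency of each stacking motif, which is what the simulated structure statistics are checked against.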

    Discovery of motifs to forecast outlier occurrence in time series

    The forecasting of real-world time series must deal with unexpected values, commonly known as outliers. Outliers in time series can lead to unreliable models and poor forecasts. Therefore, identifying future outlier occurrences is an essential task in time series analysis for reducing the average forecasting error. The main goal of this work is to predict the occurrence of outliers in time series based on the discovery of motifs. Here, motifs are the pattern sequences that precede data marked as anomalous in a training set by the proposed metaheuristic. Once the motifs are discovered, if the data to be predicted are preceded by any of them, those data are identified as outliers and treated separately from the rest of the regular data. Forecasting of outlier occurrence has been added as an additional step to an existing time series forecasting algorithm (PSF), which is based on pattern sequence similarities. Robust statistical methods have been used to evaluate the accuracy of the proposed approach in forecasting both the occurrence of outliers and their corresponding values. Finally, the methodology has been tested on six electricity-related time series, in which most of the outliers were properly found and forecasted. Ministerio de Ciencia y Tecnología TIN2007-68084-C-00; Junta de Andalucia P07-TIC-0261
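The motif idea above can be sketched on a discretized series in the spirit of PSF. The sketch assumes each time step is already labeled (e.g., by clustering) and that outliers are already flagged. In the paper, the flags come from the proposed metaheuristic, which is not reproduced here. A motif is taken to be the exact label window, of hypothetical length `window`, that precedes a flagged point. The exact-match rule and the function names are our simplifying assumptions.

```python
def discover_motifs(labels, is_outlier, window):
    """Collect the label windows that immediately precede flagged outliers."""
    motifs = set()
    for t in range(window, len(labels)):
        if is_outlier[t]:
            motifs.add(tuple(labels[t - window:t]))
    return motifs

def predict_outlier(recent_labels, motifs, window):
    """Flag the next point as an outlier if the latest window matches a known motif."""
    if len(recent_labels) < window:
        return False
    return tuple(recent_labels[-window:]) in motifs

if __name__ == "__main__":
    labels = [1, 2, 3, 1, 2, 3]                       # hypothetical cluster labels per day
    flags = [False, False, True, False, False, False]
    motifs = discover_motifs(labels, flags, window=2)
    print(motifs, predict_outlier([3, 1, 2], motifs, 2))
```

Points flagged this way would be routed to a separate outlier forecaster, while the remaining points go through the regular pattern-sequence forecast.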