
    GENDIS: genetic discovery of shapelets

    In the time series classification domain, shapelets are subsequences that are discriminative of a certain class. It has been shown that classifiers can achieve state-of-the-art results by taking as input the distances from the input time series to a set of discriminative shapelets. Additionally, these shapelets can be visualized and thus possess an interpretable characteristic, making them appealing in critical domains where longitudinal data are ubiquitous. In this study, a new paradigm for shapelet discovery is proposed, based on evolutionary computation. The advantages of the proposed approach are that: (i) it is gradient-free, which could allow escaping from local optima more easily and supports non-differentiable objectives; (ii) no brute-force search is required, making the algorithm scalable; (iii) the total number of shapelets and the length of each shapelet are evolved jointly with the shapelets themselves, alleviating the need to specify these beforehand; (iv) entire sets are evaluated at once rather than single shapelets, which results in smaller final sets with fewer similar shapelets that achieve similar predictive performance; and (v) the discovered shapelets do not need to be subsequences of the input time series. We present experimental results that validate these advantages.
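    A minimal, hedged sketch of the evolutionary shapelet-discovery idea described above (not the GENDIS implementation itself) is shown below. It evolves variable-size sets of variable-length shapelets and scores a set by the leave-one-out 1-NN accuracy of its distance features; all function names and parameter values are illustrative assumptions.

```python
# Minimal sketch of evolutionary shapelet-set discovery (illustrative, not GENDIS).
# Assumptions: a candidate is a *set* of variable-length shapelets; fitness is the
# leave-one-out 1-NN accuracy on the shapelet-distance features.
import numpy as np

rng = np.random.default_rng(0)

def min_dist(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any window of the series."""
    windows = np.lib.stride_tricks.sliding_window_view(series, len(shapelet))
    return np.sqrt(((windows - shapelet) ** 2).sum(axis=1)).min()

def transform(X, shapelets):
    """Distance-feature matrix: one column per shapelet."""
    return np.array([[min_dist(x, s) for s in shapelets] for x in X])

def fitness(X, y, shapelets):
    """Leave-one-out 1-NN accuracy on the distance features (a simple surrogate objective)."""
    F = transform(X, shapelets)
    D = np.sqrt(((F[:, None, :] - F[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(D, np.inf)
    return (y[D.argmin(axis=1)] == y).mean()

def random_shapelet(X):
    x = X[rng.integers(len(X))]
    length = rng.integers(5, len(x) // 2)
    start = rng.integers(len(x) - length)
    return x[start:start + length].copy()

def mutate(shapelets, X):
    """Add, drop, or perturb a shapelet -- set size and lengths evolve along with the values."""
    s = [sh.copy() for sh in shapelets]
    op = rng.integers(3)
    if op == 0 and len(s) > 1:
        s.pop(rng.integers(len(s)))
    elif op == 1:
        s.append(random_shapelet(X))
    else:
        i = rng.integers(len(s))
        s[i] = s[i] + rng.normal(scale=0.05, size=len(s[i]))  # no longer an exact subsequence
    return s

def evolve(X, y, pop_size=20, generations=30):
    pop = [[random_shapelet(X)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda s: fitness(X, y, s), reverse=True)
        elite = pop[: pop_size // 2]
        pop = elite + [mutate(elite[rng.integers(len(elite))], X)
                       for _ in range(pop_size - len(elite))]
    return max(pop, key=lambda s: fitness(X, y, s))

# Tiny synthetic demo: class 1 contains a bump that class 0 lacks.
X = rng.normal(size=(40, 100))
X[20:, 40:50] += 2.0
y = np.array([0] * 20 + [1] * 20)
best_set = evolve(X, y)
print(len(best_set), "shapelets, LOO accuracy:", fitness(X, y, best_set))
```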

    Bridge damage detection based on vibration data: past and new developments

    Over time, bridge condition declines due to a number of degradation processes such as creep, corrosion, and cyclic loading, among others. Traditionally, vibration-based damage detection techniques in bridges have focused on monitoring changes to modal parameters. These techniques often suffer from their sensitivity to changes in environmental and operational conditions, mistaking such changes for structural damage. Recent research has seen the emergence of more advanced computational techniques that not only allow the assessment of noisier and more complex data but also allow research to move beyond monitoring changes in modal parameters alone. This paper presents a review of the current state-of-the-art developments in vibration-based damage detection in small to medium span bridges, with particular focus on the utilization of advanced computational methods that avoid traditional damage detection pitfalls. A case study based on the S101 bridge is also presented to test the damage sensitivity of a chosen methodology.
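    As a toy illustration of the classical modal-parameter monitoring that the review starts from, the following sketch flags a change when the dominant natural frequency of an acceleration record drifts beyond a threshold. The signals, sampling rate, and 2% tolerance are made-up assumptions, not values from the paper.

```python
# Toy illustration of classical modal-parameter monitoring (not from the paper):
# flag a change when the dominant natural frequency drifts beyond a threshold.
# The signals, sampling rate, and 2% tolerance are made-up assumptions.
import numpy as np

rng = np.random.default_rng(0)

def dominant_frequency(acc, fs):
    """Frequency (Hz) of the largest spectral peak of an acceleration record."""
    spectrum = np.abs(np.fft.rfft(acc - acc.mean()))
    freqs = np.fft.rfftfreq(len(acc), d=1.0 / fs)
    return freqs[spectrum.argmax()]

def frequency_shift_indicator(baseline_acc, current_acc, fs, tol=0.02):
    """Relative shift of the dominant frequency; exceeding tol suggests a change
    (which may equally be environmental -- the pitfall the review highlights)."""
    f0 = dominant_frequency(baseline_acc, fs)
    f1 = dominant_frequency(current_acc, fs)
    shift = abs(f1 - f0) / f0
    return shift, shift > tol

# Synthetic records: a 3.00 Hz mode that softens to 2.85 Hz.
fs = 100.0
t = np.arange(0, 60, 1 / fs)
healthy = np.sin(2 * np.pi * 3.00 * t) + 0.1 * rng.normal(size=t.size)
damaged = np.sin(2 * np.pi * 2.85 * t) + 0.1 * rng.normal(size=t.size)
print(frequency_shift_indicator(healthy, damaged, fs))
```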

    Robust evolutionary algorithms

    Evolutionary Algorithms (EAs) have shown great potential to solve complex real-world problems, but their dependence on problem-specific configuration in order to obtain high-quality performance prevents EAs from achieving widespread use. While it is widely accepted that statically configuring an EA is already a complex problem, dynamically configuring an EA is a combinatorially harder problem. Evidence provided here supports the claim that EAs achieve the best results when using dynamic configurations. By designing methods that automatically configure parts of an EA, or by changing how EAs work to avoid configurable aspects, EAs can be made more robust, allowing them to perform better on a wider variety of problems while demanding less of the user. Two methods are presented in this thesis to increase the robustness of EAs. The first is a novel algorithm designed to automatically configure and dynamically update the recombination method, which the EA uses to exploit known information to create new solutions. The techniques used by this algorithm can likely be applied to other aspects of an EA in the future, leading to even more robust EAs. The second is an existing set of algorithms which require only a single configurable parameter. The analysis of the existing set led to the creation of a new variation, as well as a better understanding of how these algorithms work. Both methods are able to outperform more traditional EAs while also being easier to apply to new problems. By building upon these methods, and perhaps combining them, EAs can become even more robust and more widely used.
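    The following sketch illustrates one common way to configure an EA dynamically: adaptive operator selection, where the recombination operator is chosen online according to its recent success. It is a generic illustration under assumed operators and a toy objective, not the thesis's exact algorithm.

```python
# Minimal sketch of dynamically configuring an EA's recombination operator via
# adaptive operator selection (illustrative; not the thesis's exact method).
# The toy objective, operator set, and reward scheme are assumptions.
import numpy as np

rng = np.random.default_rng(1)

def sphere(x):
    """Toy minimisation objective."""
    return float((x ** 2).sum())

def one_point(a, b):
    cut = rng.integers(1, len(a))
    return np.concatenate([a[:cut], b[cut:]])

def uniform(a, b):
    return np.where(rng.random(len(a)) < 0.5, a, b)

def arithmetic(a, b):
    w = rng.random()
    return w * a + (1 - w) * b

operators = [one_point, uniform, arithmetic]

def evolve(dim=10, pop_size=30, generations=2000):
    pop = rng.normal(size=(pop_size, dim))
    fit = np.array([sphere(x) for x in pop])
    credit = np.ones(len(operators))            # running reward per operator
    for _ in range(generations):
        op_idx = rng.choice(len(operators), p=credit / credit.sum())
        p1, p2 = pop[rng.integers(pop_size)], pop[rng.integers(pop_size)]
        child = operators[op_idx](p1, p2) + rng.normal(scale=0.1, size=dim)
        child_fit = sphere(child)
        worst = fit.argmax()
        if child_fit < fit[worst]:              # steady-state replacement of the worst
            pop[worst], fit[worst] = child, child_fit
            credit[op_idx] += 1.0               # reward operators that produce improvements
        credit *= 0.999                         # slowly forget, so the configuration stays dynamic
    return fit.min(), credit / credit.sum()

best, operator_probs = evolve()
print("best fitness:", best, "final operator probabilities:", operator_probs)
```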

    A New Algorithm for Multivariate Genome Wide Association Studies Based on Differential Evolution and Extreme Learning Machines

    Genome-wide association studies (GWAS) are observational studies of a large set of genetic variants, whose aim is to find those that are linked to a certain trait or illness. Due to the multivariate nature of these studies, machine learning methodologies have already been applied to them, showing good performance. This work presents a new methodology for GWAS that makes use of extreme learning machines and differential evolution. The proposed methodology was tested with the genetic information (370,750 single-nucleotide polymorphisms) of 2049 individuals, 1076 of whom suffer from colorectal cancer. The possible relationship of 10 different pathways with this illness was tested. The results showed that the proposed methodology is suitable for detecting pathways relevant to the trait under analysis at a lower computational cost than other machine learning methodologies previously proposed.
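    The sketch below illustrates, on synthetic data, how differential evolution can wrap an extreme learning machine as its fitness function for SNP-subset selection. It is a simplified stand-in for the paper's methodology; the data, sparsity penalty, and parameter values are assumptions.

```python
# Hedged sketch of SNP-subset selection with differential evolution wrapping an
# extreme learning machine as the fitness function (illustrative, not the paper's
# pipeline). Data are synthetic; sizes, thresholds, and penalties are assumptions.
import numpy as np

rng = np.random.default_rng(2)

def elm_accuracy(X, y, hidden=50):
    """Basic ELM: random hidden layer + least-squares output weights; returns
    training accuracy as a cheap fitness proxy."""
    W = rng.normal(size=(X.shape[1], hidden))
    b = rng.normal(size=hidden)
    H = np.tanh(X @ W + b)
    beta = np.linalg.pinv(H) @ y
    return ((H @ beta > 0.5).astype(int) == y).mean()

def fitness(vector, X, y):
    selected = vector > 0.5                     # real-valued DE vector -> SNP subset
    if not selected.any():
        return 0.0
    return elm_accuracy(X[:, selected], y) - 0.001 * selected.sum()  # mild sparsity penalty

def differential_evolution(X, y, pop_size=30, generations=100, F=0.8, CR=0.9):
    dim = X.shape[1]
    pop = rng.random((pop_size, dim))
    fit = np.array([fitness(v, X, y) for v in pop])
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = pop[rng.choice(pop_size, size=3, replace=False)]
            mutant = np.clip(a + F * (b - c), 0.0, 1.0)
            trial = np.where(rng.random(dim) < CR, mutant, pop[i])
            f_trial = fitness(trial, X, y)
            if f_trial > fit[i]:                # greedy selection
                pop[i], fit[i] = trial, f_trial
    return pop[fit.argmax()] > 0.5

# Synthetic genotypes: 200 individuals x 40 SNPs coded 0/1/2, three causal SNPs.
X = rng.integers(0, 3, size=(200, 40)).astype(float)
y = (X[:, [3, 7, 21]].sum(axis=1) + rng.normal(scale=0.5, size=200) > 3).astype(int)
print("selected SNP indices:", np.flatnonzero(differential_evolution(X, y)))
```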

    Doctor of Philosophy

    For decades, researchers have explored the effects of clinical and biomolecular factors on disease outcomes and have identified several candidate prognostic markers. Now, thanks to technological advances, researchers have at their disposal unprecedented quantities of biomolecular data that may add to existing knowledge about prognosis. However, commensurate challenges accompany these advances. For example, sophisticated informatics techniques are necessary to store, retrieve, and analyze large data sets. Additionally, advanced algorithms may be necessary to account for the joint effects of tens, hundreds, or thousands of variables. Moreover, it is essential that analyses evaluating such algorithms be conducted in a systematic and consistent way to ensure validity, repeatability, and comparability across studies. For this study, a novel informatics framework was developed to address these needs. Within this framework, the user can apply existing, general-purpose algorithms that are designed to make multivariate predictions for large, heterogeneous data sets. The framework also contains logic for aggregating evidence across multiple algorithms and data categories via ensemble-learning approaches. In this study, this informatics framework was applied to developing multivariate prognosis models for human glioblastoma multiforme (GBM), a highly aggressive form of brain cancer with a median survival of only 12-15 months. Data for this study came from The Cancer Genome Atlas, a publicly available repository containing clinical, treatment, histological, and biomolecular variables for hundreds of patients. A variety of variable-selection approaches and multivariate algorithms were applied in a cross-validated design, and the quality of the resulting models was measured using the error rate, area under the receiver operating characteristic curve, and log-rank statistic. Although performance of the algorithms varied substantially across the data categories, some models performed well for all three metrics, particularly models based on age, treatments, and DNA methylation. Also encouragingly, the performance of ensemble-learning methods often approximated the best individual results. As multimodal data sets become more prevalent, analytic approaches that account for multiple data categories and algorithms will be increasingly relevant. This study suggests that such approaches hold promise to guide researchers and clinicians in their quest to improve outcomes for devastating diseases like GBM.
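    A hedged sketch of the ensemble idea described above, aggregating evidence across data categories and algorithms in a cross-validated design, is given below. The three feature matrices are synthetic placeholders for clinical, treatment, and methylation data; the learners and soft-voting scheme are illustrative choices, not the dissertation's framework.

```python
# Illustrative sketch of aggregating evidence across data categories and algorithms
# with a cross-validated soft-voting ensemble (not the dissertation's framework).
# The feature matrices are synthetic stand-ins for clinical, treatment, and
# methylation data; the two learners are arbitrary choices.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(3)
n = 300
categories = {                                  # one feature matrix per data category
    "clinical":    rng.normal(size=(n, 5)),
    "treatment":   rng.normal(size=(n, 3)),
    "methylation": rng.normal(size=(n, 50)),
}
y = (categories["clinical"][:, 0] + categories["methylation"][:, :3].sum(axis=1)
     + rng.normal(size=n) > 0).astype(int)

algorithms = [LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=200, random_state=0)]

probas = []
for name, X in categories.items():
    for algo in algorithms:
        # Out-of-fold probabilities, so the ensemble is assessed without leakage.
        p = cross_val_predict(algo, X, y, cv=5, method="predict_proba")[:, 1]
        probas.append(p)
        print(f"{name:12s} {type(algo).__name__:24s} AUC={roc_auc_score(y, p):.3f}")

ensemble = np.mean(probas, axis=0)              # aggregate evidence across categories and models
print(f"ensemble AUC={roc_auc_score(y, ensemble):.3f}")
```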

    SUSTAINABLE LIFETIME VALUE CREATION THROUGH INNOVATIVE PRODUCT DESIGN: A PRODUCT ASSURANCE MODEL

    In the field of product development, many organizations struggle to create a value proposition that can overcome the headwinds of technology change, regulatory requirements, and intense competition in an effort to satisfy the long-term goals of sustainability. Today, organizations are realizing that they have lost portfolio value due to poor reliability, early product retirement, and abandoned design platforms. Beyond Lean and Green Manufacturing, shareholder value can be enhanced by taking a broader perspective and integrating sustainability innovation elements into product designs in order to improve the delivery process and extend the life of product platforms. This research is divided into two parts that together close the loop towards Sustainable Value Creation in product development. The first section presents a framework for achieving Sustainable Lifetime Value through a toolset that bridges the gap between financial success and sustainable product design. Focus is placed on the analysis of the sustainable value proposition between producers, consumers, society, and the environment, and on the half-life of product platforms. The Half-Life Return Model is presented, designed to provide feedback to producers in the pursuit of improving the return on investment for the primary stakeholders. The second part applies the driving aspects of the framework through the development of an Adaptive Genetic Search Algorithm. The algorithm is designed to improve fault detection and mitigation during the product delivery process. A computer simulation is used to study the effectiveness of the primary aspects introduced in the search algorithm, in an attempt to improve the reliability growth of the system during the development life cycle. The results of the analysis draw attention to the sensitivity of the driving aspects identified in the product development lifecycle, which affect the long-term goals of sustainable product development. With the use of the techniques identified in this research, cost-effective test case generation can be improved without a major degradation in the diversity of the search patterns required to ensure a high level of fault detection. This in turn can lead to improvements in the driving aspects of the Half-Life Return Model, and ultimately the goal of designing sustainable products and processes.
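    The following toy sketch illustrates genetic test-case generation with a mutation rate that adapts to population diversity, in the spirit of the adaptive search described above; it is not the dissertation's Adaptive Genetic Search Algorithm. The fault model, thresholds, and rates are made-up assumptions.

```python
# Toy sketch of genetic test-case generation with diversity-adaptive mutation
# (illustrative only, not the dissertation's Adaptive Genetic Search Algorithm).
# The fault model is a made-up set of input regions in the unit square.
import numpy as np

rng = np.random.default_rng(4)
fault_centres = rng.random((15, 2))             # hypothetical faults "fire" near these inputs

def faults_detected(test_suite, radius=0.08):
    d = np.linalg.norm(test_suite[:, None, :] - fault_centres[None, :, :], axis=2)
    return d.min(axis=0) < radius               # boolean per fault

def fitness(test_suite):
    return faults_detected(test_suite).sum()

def evolve(suite_size=10, pop_size=40, generations=150):
    pop = rng.random((pop_size, suite_size, 2))
    for _ in range(generations):
        fit = np.array([fitness(s) for s in pop])
        parents = pop[fit.argsort()[::-1][: pop_size // 2]]
        # Adapt mutation to population diversity: spread out again if the suites collapse.
        mut_rate = 0.02 if parents.std() > 0.25 else 0.20
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            child = np.where(rng.random(a.shape) < 0.5, a, b)        # uniform crossover
            hits = rng.random(child.shape) < mut_rate                # per-gene mutation mask
            child = np.clip(child + hits * rng.normal(scale=0.1, size=child.shape), 0, 1)
            children.append(child)
        pop = np.concatenate([parents, np.array(children)])
    best = max(pop, key=fitness)
    return best, faults_detected(best).sum()

suite, detected = evolve()
print(f"{detected} of {len(fault_centres)} faults detected by a {len(suite)}-case suite")
```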

    Automating Large-Scale Simulation Calibration to Real-World Sensor Data

    Many key decisions and design policies are made using sophisticated computer simulations. However, these sophisticated computer simulations have several major problems. The two main issues are 1) gaps between the simulation model and the actual structure, and 2) limitations of the modeling engine's capabilities. This dissertation's goal is to address these simulation deficiencies by presenting a general automated process for tuning simulation inputs such that simulation output matches real world measured data. The automated process involves the following key components -- 1) Identify a model that accurately estimates the real world simulation calibration target from measured sensor data; 2) Identify the key real world measurements that best estimate the simulation calibration target; 3) Construct a mapping from the most useful real world measurements to actual simulation outputs; 4) Build fast and effective simulation approximation models that predict simulation output using simulation input; 5) Build a relational model that captures inter-variable dependencies between simulation inputs and outputs; and finally 6) Use the relational model to estimate the simulation input variables from the mapped sensor data, and use either the simulation model or approximate simulation model to fine-tune input simulation parameter estimates towards the calibration system. The work in this dissertation individually validates and completes five out of the six calibration components with respect to the residential energy domain. Step 1 is satisfied by identifying the best model for predicting next-hour residential electrical consumption, the calibration target. Step 2 is completed by identifying the most important sensors for predicting residential electrical consumption, the real world measurements. While step 3 is completed by domain experts, step 4 is addressed by using techniques from the Big Data machine learning domain to build approximations for the EnergyPlus (E+) simulator. Step 5's solution leverages the same Big Data machine learning techniques to build a relational model that describes how the simulator's variables are probabilistically related. Finally, step 6 is partially demonstrated by using the relational model to estimate simulation parameters for E+ simulations with known ground truth simulation inputs.
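    A minimal sketch of calibration steps 4 and 6 is shown below: fit a fast approximation of an expensive simulator from sampled runs, then search the approximation for input settings that reproduce a measured target. The 'simulator', parameter ranges, and target value are made-up assumptions (a stand-in, not EnergyPlus).

```python
# Minimal sketch of calibration steps 4 and 6: build a fast approximation of an
# expensive simulator, then search it for inputs that reproduce a measured target.
# The 'simulator', parameter ranges, and target value are made-up assumptions
# (a stand-in, not EnergyPlus).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)

def expensive_simulator(params):
    """Stand-in for a slow physics simulation: parameters -> scalar output."""
    insulation, setpoint, occupancy = params
    return 50 - 8 * insulation + 1.5 * (setpoint - 20) ** 2 + 3 * occupancy

# Step 4: fit the approximation model on a modest sample of simulator runs.
lo, hi = [0.0, 18.0, 0.0], [1.0, 26.0, 5.0]
X_train = rng.uniform(lo, hi, size=(400, 3))
y_train = np.array([expensive_simulator(p) for p in X_train])
surrogate = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# Step 6: estimate inputs that reproduce a measured (sensor-derived) target using
# cheap surrogate evaluations instead of full simulation runs.
measured_output = 47.0
candidates = rng.uniform(lo, hi, size=(20000, 3))
errors = np.abs(surrogate.predict(candidates) - measured_output)
best = candidates[errors.argmin()]
print("estimated inputs:", best)
print("true simulator output at the estimate:", expensive_simulator(best))
```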

    Adjustability of a discrete particle swarm optimization for the dynamic TSP

    This paper presents a detailed study of the discrete particle swarm optimization algorithm (DPSO) applied to solve the dynamic traveling salesman problem which has many practical applications in planning, logistics and chip manufacturing. The dynamic version is especially important in practical applications in which new circumstances, e.g., a traffic jam or a machine failure, could force changes to the problem specification. The DPSO algorithm was enriched with a pheromone memory which is used to guide the search process similarly to the ant colony optimization algorithm. The paper extends our previous work on the DPSO algorithm in various ways. Firstly, the performance of the algorithm is thoroughly tested on a set of newly generated DTSP instances which differ in the number and the size of the changes. Secondly, the impact of the pheromone memory on the convergence of the DPSO is investigated and compared with the version without a pheromone memory. Moreover, the results are compared with two ant colony optimization algorithms, namely the MAX-MIN ant system (MMAS) and the population-based ant colony optimization (PACO). The results show that the DPSO is able to find high-quality solutions to the DTSP and its performance is competitive with the performance of the MMAS and the PACO algorithms. Moreover, the pheromone memory has a positive impact on the convergence of the algorithm, especially in the face of dynamic changes to the problem's definition.
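    The sketch below illustrates the pheromone-memory component described above in isolation: a matrix of edge desirabilities that is reinforced by good tours, evaporates slowly, and keeps guiding solution construction after the distance matrix changes. It is a simplified stand-in, not the paper's DPSO implementation.

```python
# Illustrative sketch of a pheromone memory for a dynamic TSP: edge desirabilities
# reinforced by good tours, slowly evaporating, and reused after the distances
# change. A simplified stand-in, not the paper's DPSO implementation.
import numpy as np

rng = np.random.default_rng(6)
n = 20
coords = rng.random((n, 2))
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=2)
pheromone = np.ones((n, n))

def tour_length(tour):
    return dist[tour, np.roll(tour, -1)].sum()

def build_tour(beta=3.0):
    """Probabilistic construction biased by pheromone and inverse distance."""
    start = rng.integers(n)
    tour, unvisited = [start], set(range(n)) - {start}
    while unvisited:
        current, cand = tour[-1], np.array(sorted(unvisited))
        weights = pheromone[current, cand] / (dist[current, cand] + 1e-9) ** beta
        tour.append(rng.choice(cand, p=weights / weights.sum()))
        unvisited.remove(tour[-1])
    return np.array(tour)

def reinforce(tour, amount=1.0, evaporation=0.05):
    global pheromone
    pheromone *= 1 - evaporation                         # forget old evidence slowly
    pheromone[tour, np.roll(tour, -1)] += amount / tour_length(tour)

for epoch in range(3):                                   # each epoch simulates a dynamic change
    if epoch > 0:
        dist *= rng.uniform(0.8, 1.2, size=dist.shape)   # e.g. traffic jams alter edge costs
        dist = (dist + dist.T) / 2
    best_len = np.inf
    for _ in range(200):
        t = build_tour()
        best_len = min(best_len, tour_length(t))
        reinforce(t)                                     # the memory carries over the change
    print(f"after change {epoch}: best tour length {best_len:.3f}")
```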

    Efficient Benchmarking of Algorithm Configuration Procedures via Model-Based Surrogates

    The optimization of algorithm (hyper-)parameters is crucial for achieving peak performance across a wide range of domains, from deep neural networks to solvers for hard combinatorial problems. The resulting algorithm configuration (AC) problem has attracted much attention from the machine learning community. However, the proper evaluation of new AC procedures is hindered by two key hurdles. First, AC benchmarks are hard to set up. Second, and even more significantly, they are computationally expensive: a single run of an AC procedure involves many costly runs of the target algorithm whose performance is to be optimized in a given AC benchmark scenario. One common workaround is to optimize cheap-to-evaluate artificial benchmark functions (e.g., Branin) instead of actual algorithms; however, these have different properties than realistic AC problems. Here, we propose an alternative benchmarking approach that is similarly cheap to evaluate but much closer to the original AC problem: replacing expensive AC benchmarks by surrogate benchmarks constructed from them. These surrogate benchmarks approximate the response surface of true target algorithm performance using a regression model, and the original and surrogate benchmark share the same (hyper-)parameter space. In our experiments, we construct and evaluate surrogate benchmarks for hyperparameter optimization as well as for AC problems that involve performance optimization of solvers for hard combinatorial problems, drawing training data from the runs of existing AC procedures. We show that our surrogate benchmarks capture the important overall characteristics of the AC scenarios from which they were derived, such as high- and low-performing regions, while being much easier to use and orders of magnitude cheaper to evaluate.
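    A hedged sketch of the surrogate-benchmark idea follows: fit a regression model on logged (configuration, performance) pairs and let an AC procedure query the model instead of running the real target algorithm. The 'target algorithm' below is a synthetic stand-in and plain random search stands in for a real configurator.

```python
# Hedged sketch of a model-based surrogate benchmark for algorithm configuration:
# fit a regressor on logged (configuration, performance) pairs, then let an AC
# procedure query the model instead of the real target algorithm. The 'target
# algorithm' is synthetic, and random search stands in for a real configurator.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)
lo, hi = [0.0, 0.0], [1.0, 5.0]                       # assumed 2-D configuration space

def real_target_runtime(config):
    """Pretend target algorithm: runtime depends nonlinearly on two parameters."""
    restart_prob, noise_level = config
    return (restart_prob - 0.3) ** 2 + 0.5 * abs(noise_level - 2.0) + rng.exponential(0.05)

# Training data, as it would come from logs of previous AC runs on the real algorithm.
configs = rng.uniform(lo, hi, size=(2000, 2))
runtimes = np.array([real_target_runtime(c) for c in configs])
surrogate = RandomForestRegressor(n_estimators=300, random_state=0).fit(configs, runtimes)

def run_configurator(evaluate, budget=500):
    """A trivial 'AC procedure' (random search) evaluated against some performance oracle."""
    cand = rng.uniform(lo, hi, size=(budget, 2))
    scores = np.array([evaluate(c) for c in cand])
    return cand[scores.argmin()]

# Cheap benchmark run: every evaluation is a model prediction, not a real algorithm run.
best_cfg = run_configurator(lambda c: surrogate.predict(c.reshape(1, -1))[0])
print("configuration found on the surrogate benchmark:", best_cfg)
print("its runtime on the (simulated) real target:", real_target_runtime(best_cfg))
```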