97 research outputs found

    Computational Molecular Biology

    Computational Biology is a fairly new subject that arose in response to the computational problems posed by the analysis and the processing of biomolecular sequence and structure data. The field was initiated in the late 60's and early 70's largely by pioneers working in the life sciences. Physicists and mathematicians entered the field in the 70's and 80's, while Computer Science became involved with the new biological problems in the late 1980's. Computational problems have gained further importance in molecular biology through the various genome projects which produce enormous amounts of data. For this bibliography we focus on those areas of computational molecular biology that involve discrete algorithms or discrete optimization. We thus neglect several other areas of computational molecular biology, like most of the literature on the protein folding problem, as well as databases for molecular and genetic data, and genetic mapping algorithms. Due to the availability of review papers and a bibliography this bibliography

    MTFuzz: Fuzzing with a Multi-Task Neural Network

    Fuzzing is a widely used technique for detecting software bugs and vulnerabilities. Most popular fuzzers generate new inputs using an evolutionary search to maximize code coverage. Essentially, these fuzzers start with a set of seed inputs, mutate them to generate new inputs, and identify the promising inputs using an evolutionary fitness function for further mutation. Despite their success, evolutionary fuzzers tend to get stuck in long sequences of unproductive mutations. In recent years, machine learning (ML) based mutation strategies have reported promising results. However, the existing ML-based fuzzers are limited by the lack of quality and diversity of the training data. As the input space of the target programs is high dimensional and sparse, it is prohibitively expensive to collect many diverse samples demonstrating successful and unsuccessful mutations to train the model. In this paper, we address these issues by using a Multi-Task Neural Network that can learn a compact embedding of the input space based on diverse training samples for multiple related tasks (i.e., predicting for different types of coverage). The compact embedding can guide the mutation process by focusing most of the mutations on the parts of the embedding where the gradient is high. MTFuzz uncovers 11 previously unseen bugs and achieves an average of 2× more edge coverage compared with 5 state-of-the-art fuzzers on 10 real-world programs. Comment: ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE) 202

    Using evolutionary algorithms for the unit testing of object-oriented software

    Evolving Code with A Large Language Model

    Algorithms that use Large Language Models (LLMs) to evolve code arrived on the Genetic Programming (GP) scene very recently. We present LLM GP, a formalized LLM-based evolutionary algorithm designed to evolve code. Like GP, it uses evolutionary operators, but its designs and implementations of those operators radically differ from GP's because they enlist an LLM, using prompting and the LLM's pre-trained pattern matching and sequence completion capability. We also present a demonstration-level variant of LLM GP and share its code. By addressing algorithms that range from the formal to hands-on, we cover design and LLM-usage considerations as well as the scientific challenges that arise when using an LLM for genetic programming. Comment: 34 pages, 9 figures, 6 Table

    Fine-grained annotation and classification of de novo predicted LTR retrotransposons

    Long terminal repeat (LTR) retrotransposons and endogenous retroviruses (ERVs) are transposable elements in eukaryotic genomes well suited for computational identification. De novo identification tools determine the position of potential LTR retrotransposon or ERV insertions in genomic sequences. For further analysis, it is desirable to obtain an annotation of the internal structure of such candidates. This article presents LTRdigest, a novel software tool for automated annotation of internal features of putative LTR retrotransposons. It uses local alignment and hidden Markov model-based algorithms to detect retrotransposon-associated protein domains as well as primer binding sites and polypurine tracts. As an example, we used LTRdigest results to identify 88 (near) full-length ERVs in the chromosome 4 sequence of Mus musculus, separating them from truncated insertions and other repeats. Furthermore, we propose a work flow for the use of LTRdigest in de novo LTR retrotransposon classification and perform an exemplary de novo analysis on the Drosophila melanogaster genome as a proof of concept. Using a new method solely based on the annotations generated by LTRdigest, 518 potential LTR retrotransposons were automatically assigned to 62 candidate groups. Representative sequences from 41 of these 62 groups were matched to reference sequences with >80% global sequence similarity

    Linearized biogeography-based optimization with re-initialization and local search

    Biogeography-based optimization (BBO) is an evolutionary optimization algorithm that uses migration to share information among candidate solutions. One limitation of BBO is that it changes only one independent variable at a time in each candidate solution. In this paper, a linearized version of BBO, called LBBO, is proposed to reduce rotational variance. The proposed method is combined with periodic re-initialization and local search operators to obtain an algorithm for global optimization in a continuous search space. Experiments have been conducted on 45 benchmarks from the 2005 and 2011 Congress on Evolutionary Computation, and LBBO performance is compared with the results published in those conferences. The results show that LBBO is competitive with state-of-the-art evolutionary algorithms. In particular, LBBO performs well for certain types of multimodal problems, including high-dimensional real-world problems. Also, LBBO is insensitive to whether or not the solution lies on the search domain boundary, in a wide or narrow basin, and within or outside the initialization domain

    Recent Advances in General Game Playing

    The goal of General Game Playing (GGP) has been to develop computer programs that can perform well across various game types. It is natural for human game players to transfer knowledge from games they already know how to play to other similar games. GGP research attempts to design systems that work well across different game types, including unknown new games. In this review, we present a survey of recent advances (2011 to 2014) in GGP for both traditional games and video games. It is notable that research on GGP has been expanding into modern video games. Monte-Carlo Tree Search and its enhancements have been the most influential techniques in GGP for both research domains. Additionally, international competitions have become important events that promote and increase GGP research. Recently, a video GGP competition was launched. In this survey, we review recent progress in the most challenging research areas of Artificial Intelligence (AI) related to universal game playing

    Diversification and Intensification in Hybrid Metaheuristics for Constraint Satisfaction Problems

    Metaheuristics are used to find feasible solutions to hard Combinatorial Optimization Problems (COPs). Constraint Satisfaction Problems (CSPs) may be formulated as COPs, where the objective is to reduce the number of violated constraints to zero. The popular puzzle Sudoku is an NP-complete problem that has been used to study the effectiveness of metaheuristics in solving CSPs. Applying the Simulated Annealing (SA) metaheuristic to Sudoku has been shown to be a successful method to solve CSPs. However, the ‘easy-hard-easy’ phase-transition behavior frequently attributed to a certain class of CSPs makes finding a solution extremely difficult in the hard phase because of the vast search space, the small number of solutions and a fitness landscape marked by many plateaus and local minima. Two key mechanisms that metaheuristics employ for searching are diversification and intensification. Diversification is the method of identifying diverse promising regions of the search space and is achieved through the process of heating/reheating. Intensification is the method of finding a solution in one of these promising regions and is achieved through the process of cooling. The hard phase area of the search terrain makes traversal without becoming trapped very challenging. Running the best available method - a Constraint Propagation/Depth-First Search algorithm - against 30,000 benchmark problem-instances, 20,240 remain unsolved after ten runs at one minute per run, which we classify as very hard. This dissertation studies the delicate balance between diversification and intensification in the search process and offers a hybrid SA algorithm to solve very hard instances.
    The algorithm presents (a) a heating/reheating strategy that incorporates the lowest solution cost for diversification; (b) a more complex two-stage cooling schedule for faster intensification; (c) Constraint Programming (CP) hybridization to reduce the search space and to escape a local minimum; (d) a three-way swap, secondary neighborhood operator for a low expense method of diversification. These techniques are tested individually and in hybrid combinations for a total of 11 strategies, and the effectiveness of each is evaluated by percentage solved and average best run-time to solution. In the final analysis, all strategies are an improvement on current methods, but the most remarkable results come from the application of the “Quick Reset” technique between cooling stages

    Hybrid biogeography-based evolutionary algorithms

    Hybrid evolutionary algorithms (EAs) are effective optimization methods that combine multiple EAs. We propose several hybrid EAs by combining some recently-developed EAs with a biogeography-based hybridization strategy. We test our hybrid EAs on the continuous optimization benchmarks from the 2013 Congress on Evolutionary Computation (CEC) and on some real-world traveling salesman problems. The new hybrid EAs include two approaches to hybridization: (1) iteration-level hybridization, in which various EAs and BBO are executed in sequence; and (2) algorithm-level hybridization, which runs various EAs independently and then exchanges information between them using ideas from biogeography. Our empirical study shows that the new hybrid EAs significantly outperform their constituent algorithms with the selected tuning parameters and generation limits, and algorithm-level hybridization is generally better than iteration-level hybridization. Results also show that the best new hybrid algorithm in this paper is competitive with the algorithms from the 2013 CEC competition. In addition, we show that the new hybrid EAs are generally robust to tuning parameters. In summary, the contribution of this paper is the introduction of biogeography-based hybridization strategies to the EA community

    A Memetic Algorithm for whole test suite generation

    The generation of unit-level test cases for structural code coverage is a task well-suited to Genetic Algorithms. Method call sequences must be created that construct objects, put them into the right state and then execute uncovered code. However, the generation of primitive values, such as integers and doubles, characters that appear in strings, and arrays of primitive values, is not so straightforward. Often, small local changes are required to drive the value toward the one needed to execute some target structure. However, global searches like Genetic Algorithms tend to make larger changes that are not concentrated on any particular aspect of a test case. In this paper, we extend the Genetic Algorithm behind the EvoSuite test generation tool into a Memetic Algorithm, by equipping it with several local search operators. These operators are designed to efficiently optimize primitive values and other aspects of a test suite that allow the search for test cases to function more effectively. We evaluate our operators using a rigorous experimental methodology on over 12,000 Java classes, comprising open source classes of various kinds, including numerical applications and text processors. Our study shows that increases in branch coverage of up to 53% are possible for an individual class in practice