
97 research outputs found

Computational Molecular Biology

Author: Lenhof H.
Mutzel P.
Vingron M.
Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/1996

Computational Biology is a fairly new subject that arose in response to the computational problems posed by the analysis and processing of biomolecular sequence and structure data. The field was initiated in the late 1960s and early 1970s, largely by pioneers working in the life sciences. Physicists and mathematicians entered the field in the 1970s and 1980s, while Computer Science became involved with the new biological problems in the late 1980s. Computational problems have gained further importance in molecular biology through the various genome projects, which produce enormous amounts of data. For this bibliography we focus on those areas of computational molecular biology that involve discrete algorithms or discrete optimization. We thus neglect several other areas of computational molecular biology, such as most of the literature on the protein folding problem, databases for molecular and genetic data, and genetic mapping algorithms. Due to the availability of review papers and a bibliography this bibliography

MPG.PuRe

MTFuzz: Fuzzing with a Multi-Task Neural Network

Author: Abadi Martín
Blazytko Tim
Böhme Marcel
Cadar Cristian
Caruana Rich
Chen Xi
Dolan-Gavitt B.
Finn Chelsea
Gan Shuitao
Godefroid Patrice
Lemieux Caroline
Long Mingsheng
McMinn Phil
Mihalkova Lilyana
Pratt Lorien Y.
She Dongdong
Wang Jinghan
You Wei Zhen
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 11/09/2020

Fuzzing is a widely used technique for detecting software bugs and vulnerabilities. Most popular fuzzers generate new inputs using an evolutionary search to maximize code coverage. Essentially, these fuzzers start with a set of seed inputs, mutate them to generate new inputs, and use an evolutionary fitness function to identify promising inputs for further mutation. Despite their success, evolutionary fuzzers tend to get stuck in long sequences of unproductive mutations. In recent years, machine learning (ML) based mutation strategies have reported promising results. However, existing ML-based fuzzers are limited by the low quality and diversity of their training data. As the input space of the target programs is high-dimensional and sparse, it is prohibitively expensive to collect many diverse samples demonstrating successful and unsuccessful mutations to train the model. In this paper, we address these issues by using a Multi-Task Neural Network that can learn a compact embedding of the input space based on diverse training samples for multiple related tasks (i.e., predicting different types of coverage). The compact embedding can guide the mutation process by focusing most of the mutations on the parts of the embedding where the gradient is high. MTFuzz uncovers 11 previously unseen bugs and achieves an average of 2× more edge coverage compared with 5 state-of-the-art fuzzers on 10 real-world programs. Comment: ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE) 202
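The abstract says the learned embedding steers mutation toward the parts of the input where the gradient is high. A minimal sketch of that selection step only, assuming per-byte gradient values have already been produced by some model: the `gradients` input, the top-k rule, and the random byte replacement are illustrative assumptions, and MTFuzz's multi-task network itself is not shown.

```python
import random
from typing import List

def top_k_positions(gradients: List[float], k: int) -> List[int]:
    """Indices of the k input bytes with the largest |gradient|, in ascending order."""
    order = sorted(range(len(gradients)), key=lambda i: abs(gradients[i]), reverse=True)
    return sorted(order[:k])

def gradient_guided_mutate(seed: bytes, gradients: List[float], k: int,
                           rng: random.Random) -> bytes:
    """Mutate only the k highest-gradient bytes of `seed`; all other bytes are copied."""
    if len(seed) != len(gradients):
        raise ValueError("need one gradient value per input byte")
    out = bytearray(seed)
    for i in top_k_positions(gradients, k):
        # Adding 1..255 modulo 256 guarantees the byte actually changes.
        out[i] = (out[i] + rng.randrange(1, 256)) % 256
    return bytes(out)
```

For the seed `b"HEADER\x00\x01"` with gradients `[0.0, 0.1, 0.0, 0.9, 0.0, 0.0, -0.7, 0.2]` and `k=2`, only positions 3 and 6 (the two largest |gradient| values) are mutated, so the fuzzer spends its mutation budget where the model predicts the input matters most.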

arXiv.org e-Print Archive

Crossref

Using evolutionary algorithms for the unit testing of object-oriented software

Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2005

Crossref

Evolving Code with A Large Language Model

Author: Hemberg Erik
Moskal Stephen
O'Reilly Una-May
Publication date: 13/01/2024

Algorithms that use Large Language Models (LLMs) to evolve code arrived on the Genetic Programming (GP) scene very recently. We present LLM GP, a formalized LLM-based evolutionary algorithm designed to evolve code. Like GP, it uses evolutionary operators, but its designs and implementations of those operators radically differ from GP's because they enlist an LLM, using prompting and the LLM's pre-trained pattern matching and sequence completion capability. We also present a demonstration-level variant of LLM GP and share its code. By addressing algorithms that range from the formal to hands-on, we cover design and LLM-usage considerations as well as the scientific challenges that arise when using an LLM for genetic programming. Comment: 34 pages, 9 figures, 6 tables
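LLM GP keeps the evolutionary loop of GP but implements the variation operators through prompts. A minimal sketch, assuming a generic `llm(prompt) -> str` callable (hypothetical; no specific model or API is implied) and a user-supplied fitness function; the prompt wording, binary tournament selection, and elitism are illustrative choices, not the paper's operators.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical interface: any function that sends a prompt to an LLM and returns its text.
LLM = Callable[[str], str]

def llm_mutate(llm: LLM, code: str) -> str:
    """Mutation operator implemented as a prompt instead of a syntax-tree edit."""
    return llm("Rewrite this function with one small change:\n" + code)

def llm_crossover(llm: LLM, a: str, b: str) -> str:
    """Crossover operator implemented as a prompt that combines two parents."""
    return llm("Combine ideas from these two functions into one:\n" + a + "\n---\n" + b)

def tournament(pop: List[str], fitness: Callable[[str], float], rng: random.Random) -> str:
    """Binary tournament selection (higher fitness is better)."""
    return max(rng.sample(pop, 2), key=fitness)

def evolve(llm: LLM, init: List[str], fitness: Callable[[str], float],
           generations: int, rng: random.Random,
           p_crossover: float = 0.5) -> Tuple[str, float]:
    """Generational loop with elitism; requires a population of at least two programs."""
    pop = list(init)
    for _ in range(generations):
        children = [max(pop, key=fitness)]  # elitism: keep the best program unchanged
        while len(children) < len(pop):
            p1 = tournament(pop, fitness, rng)
            if rng.random() < p_crossover:
                child = llm_crossover(llm, p1, tournament(pop, fitness, rng))
            else:
                child = llm_mutate(llm, p1)
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)
```

For offline testing, `llm` can be a deterministic stub; in practice each operator call costs one model query, so the population size and generation count directly bound the number of LLM calls.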

arXiv.org e-Print Archive

Fine-grained annotation and classification of de novo predicted LTR retrotransposons

Author: Abrusán
Altschul
Bartolome
Bergman
Bergman
Biémont
Chan
Durbin
Eddy
Eilbeck
Ellinghaus
Feschotte
Finn
Finnegan
Gordon Gremme
Gremme
Havecker
Hubbard
Jern
Jurka
Kalyanaraman
Kaminker
Kohany
Llorens
Lowe
Mak
Maksakova
Marquet
McCarthy
McCarthy
Rho
Rice
Sascha Steinbiss
Slotkin
Smith
Sperber
Stefan Kurtz
Steinbiss
Tweedie
Ute Willhoeft
Vogt
Wicker
Wilhelm
Wilhelm
Wilhelm
Xu
Publication venue: Oxford University Press
Publication date: 01/01/2009

Long terminal repeat (LTR) retrotransposons and endogenous retroviruses (ERVs) are transposable elements in eukaryotic genomes that are well suited to computational identification. De novo identification tools determine the position of potential LTR retrotransposon or ERV insertions in genomic sequences. For further analysis, it is desirable to obtain an annotation of the internal structure of such candidates. This article presents LTRdigest, a novel software tool for automated annotation of internal features of putative LTR retrotransposons. It uses local alignment and hidden Markov model-based algorithms to detect retrotransposon-associated protein domains as well as primer binding sites and polypurine tracts. As an example, we used LTRdigest results to identify 88 (near) full-length ERVs in the chromosome 4 sequence of Mus musculus, separating them from truncated insertions and other repeats. Furthermore, we propose a workflow for the use of LTRdigest in de novo LTR retrotransposon classification and perform an exemplary de novo analysis on the Drosophila melanogaster genome as a proof of concept. Using a new method solely based on the annotations generated by LTRdigest, 518 potential LTR retrotransposons were automatically assigned to 62 candidate groups. Representative sequences from 41 of these 62 groups were matched to reference sequences with >80% global sequence similarity.
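The abstract names local alignment as one of LTRdigest's detection methods; a primer binding site is complementary to the 3' end of a host tRNA, so one can score a candidate region against the reverse complement of a tRNA fragment. A minimal Smith-Waterman sketch with a linear gap penalty; the scoring values (match +2, mismatch -1, gap -2) and the helper names are assumptions, not LTRdigest's parameters or code.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]

def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, int, int]:
    """Best local alignment of a and b with a linear gap penalty.

    Returns (score, i, j): the best score and the 1-based end positions of
    that alignment in a and b; (0, 0, 0) if nothing scores above zero.
    """
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = (0, 0, 0)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start anywhere: this is what makes it local.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best[0]:
                best = (H[i][j], i, j)
    return best
```

For example, `smith_waterman("TTACGTTT", "ACGT")` finds the embedded `ACGT` with score 8 ending at position 6 of the first sequence; a real PBS search would also need a score threshold and a constrained search window downstream of the 5' LTR.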

CiteSeerX

Crossref

PubMed Central

Linearized biogeography-based optimization with re-initialization and local search

Author: Clerc Maurice
Omran Mahamed G. H.
Simon Dan
Publication venue: EngagedScholarship@CSU
Publication date: 01/01/2013

Biogeography-based optimization (BBO) is an evolutionary optimization algorithm that uses migration to share information among candidate solutions. One limitation of BBO is that it changes only one independent variable at a time in each candidate solution. In this paper, a linearized version of BBO, called LBBO, is proposed to reduce rotational variance. The proposed method is combined with periodic re-initialization and local search operators to obtain an algorithm for global optimization in a continuous search space. Experiments have been conducted on 45 benchmarks from the 2005 and 2011 Congress on Evolutionary Computation, and LBBO performance is compared with the results published in those conferences. The results show that LBBO performs competitively with state-of-the-art evolutionary algorithms. LBBO performs particularly well on certain types of multimodal problems, including high-dimensional real-world problems. Also, LBBO is insensitive to whether or not the solution lies on the search domain boundary, in a wide or narrow basin, and within or outside the initialization domain.
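The limitation named above comes from how classic BBO migration works: for each variable separately, a solution decides whether to import that single coordinate from an emigrating solution. A minimal sketch of that classic per-variable operator, assuming linear migration rates (a common textbook choice); it illustrates the coordinate-wise behavior that makes BBO sensitive to rotation, and LBBO's linearized operator itself is not reproduced here.

```python
import random
from typing import Callable, List

Vector = List[float]

def bbo_migration(pop: List[Vector], cost: Callable[[Vector], float],
                  rng: random.Random) -> List[Vector]:
    """One generation of classic BBO migration for minimization.

    The returned population is ordered from best to worst parent.
    """
    n = len(pop)
    ranked = sorted(pop, key=cost)                 # best (lowest cost) first
    mu = [(n - k) / (n + 1) for k in range(n)]     # emigration rate: high for good solutions
    lam = [1.0 - m for m in mu]                    # immigration rate: high for poor solutions
    new_pop = []
    for k, x in enumerate(ranked):
        child = list(x)
        for s in range(len(x)):                    # each variable is handled independently
            if rng.random() < lam[k]:
                j = rng.choices(range(n), weights=mu)[0]  # roulette wheel on emigration rates
                child[s] = ranked[j][s]            # copy a single coordinate from the emigrant
        new_pop.append(child)
    return new_pop
```

Because every migrated value is copied from the same coordinate of another solution, the operator moves along the coordinate axes; rotating the problem changes which moves are available, which is the rotational variance that LBBO is designed to reduce.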

CiteSeerX

Crossref

Cleveland-Marshall College of Law

Recent Advances in General Game Playing

Author: HyunSoo Park
Jacek Mańdziuk
Kyung-Joong Kim
Maciej Świechowski
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2015

The goal of General Game Playing (GGP) has been to develop computer programs that can perform well across various game types. It is natural for human game players to transfer knowledge from games they already know how to play to other similar games. GGP research attempts to design systems that work well across different game types, including unknown new games. In this review, we present a survey of recent advances (2011 to 2014) in GGP for both traditional games and video games. Notably, GGP research has been expanding into modern video games. Monte-Carlo Tree Search and its enhancements have been the most influential techniques in GGP in both research domains. Additionally, international competitions have become important events that promote GGP research. Recently, a video GGP competition was launched. In this survey, we review recent progress in the most challenging research areas of Artificial Intelligence (AI) related to universal game playing.
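Since the survey identifies Monte-Carlo Tree Search as the most influential GGP technique, here is a minimal UCT sketch showing its four phases (selection, expansion, simulation, backpropagation). The toy take-away game, the exploration constant √2, and the iteration counts are illustrative assumptions; a real GGP agent would obtain legal moves from a game description (for example GDL) instead of hard-coding them.

```python
import math
import random
from typing import List, Optional

# Toy game (assumption): players alternately remove 1 or 2 stones; taking the last stone wins.
MOVES = (1, 2)

class Node:
    def __init__(self, pile: int, parent: Optional["Node"] = None, move: int = 0):
        self.pile = pile                 # stones left for the player to move at this node
        self.parent = parent
        self.move = move                 # move that led from parent to this node
        self.children: List["Node"] = []
        self.untried = [m for m in MOVES if m <= pile]
        self.visits = 0
        self.wins = 0.0                  # wins for the player who made `move`

def rollout(pile: int, rng: random.Random) -> bool:
    """Uniformly random playout; True if the player to move at `pile` wins."""
    starter_moving = True
    while pile > 0:
        pile -= rng.choice([m for m in MOVES if m <= pile])
        if pile == 0:
            return starter_moving        # whoever just moved took the last stone
        starter_moving = not starter_moving
    return False                         # empty pile: the player to move has already lost

def uct_child(node: Node, c: float = math.sqrt(2)) -> Node:
    """UCB1 on tree nodes: mean win rate plus an exploration bonus."""
    return max(node.children, key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def mcts_best_move(pile: int, iterations: int, rng: random.Random) -> int:
    """Return the most visited first move; requires pile >= 1."""
    root = Node(pile)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend by UCT while the node is fully expanded.
        while not node.untried and node.children:
            node = uct_child(node)
        # 2. Expansion: add one untried move as a child.
        if node.untried:
            m = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.pile - m, parent=node, move=m)
            node.children.append(child)
            node = child
        # 3. Simulation: the mover into `node` wins if the player to move there loses.
        mover_won = not rollout(node.pile, rng)
        # 4. Backpropagation: flip the winner's perspective at every level.
        while node is not None:
            node.visits += 1
            node.wins += 1.0 if mover_won else 0.0
            mover_won = not mover_won
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move
```

In this game, leaving a multiple of three stones is winning, so from piles of 4 and 5 the search should settle on taking 1 and 2 stones respectively; nothing in the search itself encodes that rule, which is the property that makes MCTS attractive for GGP.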

Crossref

Directory of Open Access Journals

Diversification and Intensification in Hybrid Metaheuristics for Constraint Satisfaction Problems

Author: Lynden John M.
Publication venue: NSUWorks
Publication date: 01/01/2019

Metaheuristics are used to find feasible solutions to hard Combinatorial Optimization Problems (COPs). Constraint Satisfaction Problems (CSPs) may be formulated as COPs, where the objective is to reduce the number of violated constraints to zero. The popular puzzle Sudoku is an NP-complete problem that has been used to study the effectiveness of metaheuristics in solving CSPs. Applying the Simulated Annealing (SA) metaheuristic to Sudoku has been shown to be a successful method of solving CSPs. However, the ‘easy-hard-easy’ phase-transition behavior frequently attributed to a certain class of CSPs makes finding a solution extremely difficult in the hard phase because of the vast search space, the small number of solutions, and a fitness landscape marked by many plateaus and local minima. Two key mechanisms that metaheuristics employ for searching are diversification and intensification. Diversification is the method of identifying diverse promising regions of the search space and is achieved through heating/reheating. Intensification is the method of finding a solution in one of these promising regions and is achieved through cooling. The hard-phase area of the search terrain makes traversal without becoming trapped very challenging. When the best available method, a Constraint Propagation/Depth-First Search algorithm, was run against 30,000 benchmark problem instances, 20,240 remained unsolved after ten runs of one minute each; we classify these instances as very hard. This dissertation studies the delicate balance between diversification and intensification in the search process and offers a hybrid SA algorithm to solve very hard instances.
The algorithm presents (a) a heating/reheating strategy that incorporates the lowest solution cost for diversification; (b) a more complex two-stage cooling schedule for faster intensification; (c) Constraint Programming (CP) hybridization to reduce the search space and to escape local minima; and (d) a three-way swap as a secondary neighborhood operator, providing a low-expense method of diversification. These techniques are tested individually and in hybrid combinations for a total of 11 strategies, and the effectiveness of each is evaluated by percentage solved and average best run-time to solution. In the final analysis, all strategies improve on current methods, but the most remarkable results come from the application of the “Quick Reset” technique between cooling stages.
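To make the diversification/intensification vocabulary concrete, here is a minimal SA sketch for Sudoku. It assumes a common formulation that the abstract does not spell out: every 3x3 box is kept a permutation of 1..9, a move swaps two non-given cells inside one box, and the cost counts missing digits in rows and columns. The single geometric cooling schedule and the "reheat after N non-improving moves" rule are simple stand-ins, not the dissertation's lowest-cost reheating, two-stage cooling, CP hybridization, or three-way swap. The puzzle's givens are assumed to be consistent within each box.

```python
import math
import random
from typing import List, Optional, Tuple

Grid = List[List[int]]  # 9x9 Sudoku grid, 0 marks an empty cell

def box_cells(b: int) -> List[Tuple[int, int]]:
    r0, c0 = 3 * (b // 3), 3 * (b % 3)
    return [(r0 + i, c0 + j) for i in range(3) for j in range(3)]

def cost(g: Grid) -> int:
    """Missing digits over all rows and columns; 0 means solved.
    Boxes are always permutations of 1..9, so they never contribute."""
    rows = sum(9 - len(set(row)) for row in g)
    cols = sum(9 - len({g[r][c] for r in range(9)}) for c in range(9))
    return rows + cols

def initial_fill(puzzle: Grid, rng: random.Random) -> Grid:
    """Fill each box's empty cells with that box's missing digits in random order."""
    g = [row[:] for row in puzzle]
    for b in range(9):
        cells = box_cells(b)
        present = {g[r][c] for r, c in cells}
        missing = [d for d in range(1, 10) if d not in present]
        rng.shuffle(missing)
        for r, c in cells:
            if g[r][c] == 0:
                g[r][c] = missing.pop()
    return g

def anneal(puzzle: Grid, rng: random.Random, t0: float = 1.0, alpha: float = 0.999,
           steps: int = 200_000, reheat_after: int = 5_000) -> Optional[Grid]:
    """Return a solved grid, or None if no solution is found within `steps` moves."""
    free = [[(r, c) for r, c in box_cells(b) if puzzle[r][c] == 0] for b in range(9)]
    swappable = [cells for cells in free if len(cells) >= 2]
    g = initial_fill(puzzle, rng)
    cur, t, stale = cost(g), t0, 0
    if not swappable:
        return g if cur == 0 else None
    for _ in range(steps):
        if cur == 0:
            return g
        # Neighbourhood move: swap two non-given cells inside one box.
        (r1, c1), (r2, c2) = rng.sample(rng.choice(swappable), 2)
        g[r1][c1], g[r2][c2] = g[r2][c2], g[r1][c1]
        new = cost(g)
        if new <= cur or rng.random() < math.exp((cur - new) / t):
            stale = 0 if new < cur else stale + 1
            cur = new
        else:
            g[r1][c1], g[r2][c2] = g[r2][c2], g[r1][c1]  # reject: undo the swap
            stale += 1
        t = max(t * alpha, 1e-12)   # cooling: favors intensification
        if stale >= reheat_after:
            t, stale = t0, 0        # reheating: restores diversification
    return g if cur == 0 else None
```

Low temperatures make the search accept almost only improving swaps (intensification), while reheating temporarily lets it accept worsening swaps and leave a plateau or local minimum (diversification); the dissertation's contribution lies in tuning exactly this balance.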

NSU Works

Hybrid biogeography-based evolutionary algorithms

Author: Chen Zixiang
Fei Minrui
Ma Haiping
Shu Xinzhan
Simon Dan
Publication venue: EngagedScholarship@CSU
Publication date: 01/04/2014

Hybrid evolutionary algorithms (EAs) are effective optimization methods that combine multiple EAs. We propose several hybrid EAs by combining some recently developed EAs with a biogeography-based hybridization strategy. We test our hybrid EAs on the continuous optimization benchmarks from the 2013 Congress on Evolutionary Computation (CEC) and on some real-world traveling salesman problems. The new hybrid EAs include two approaches to hybridization: (1) iteration-level hybridization, in which various EAs and BBO are executed in sequence; and (2) algorithm-level hybridization, which runs various EAs independently and then exchanges information between them using ideas from biogeography. Our empirical study shows that the new hybrid EAs significantly outperform their constituent algorithms with the selected tuning parameters and generation limits, and that algorithm-level hybridization is generally better than iteration-level hybridization. Results also show that the best new hybrid algorithm in this paper is competitive with the algorithms from the 2013 CEC competition. In addition, we show that the new hybrid EAs are generally robust to tuning parameters. In summary, the contribution of this paper is the introduction of biogeography-based hybridization strategies to the EA community.
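Algorithm-level hybridization, as described above, runs several EAs independently and periodically exchanges information between them using biogeography-style migration. A minimal sketch under stated assumptions: the two constituent EAs are toy stand-ins (a Gaussian hill climber and a DE/rand/1 variant), not the recently developed EAs the paper combines, and the exchange rule (rank islands by their best member, linear migration rates, replace the receiving island's worst member with the source island's best) is an assumption rather than the paper's exact scheme. Each island needs at least four individuals for the DE step.

```python
import random
from typing import Callable, List

Vector = List[float]
Cost = Callable[[Vector], float]
Step = Callable[[List[Vector], Cost, random.Random], List[Vector]]

def gaussian_step(pop: List[Vector], cost: Cost, rng: random.Random) -> List[Vector]:
    """Toy constituent EA 1: Gaussian perturbation with greedy replacement."""
    out = []
    for x in pop:
        y = [v + rng.gauss(0.0, 0.3) for v in x]
        out.append(y if cost(y) < cost(x) else x)
    return out

def de_step(pop: List[Vector], cost: Cost, rng: random.Random) -> List[Vector]:
    """Toy constituent EA 2: DE/rand/1 mutation (F = 0.5), greedy replacement, no crossover."""
    out = []
    for i, x in enumerate(pop):
        a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
        y = [ai + 0.5 * (bi - ci) for ai, bi, ci in zip(a, b, c)]
        out.append(y if cost(y) < cost(x) else x)
    return out

def algorithm_level_hybrid(eas: List[Step], pops: List[List[Vector]], cost: Cost,
                           generations: int, exchange_every: int,
                           rng: random.Random) -> Vector:
    """Run each EA on its own island and migrate between islands every few generations."""
    pops = [[list(x) for x in p] for p in pops]
    n = len(pops)
    for g in range(1, generations + 1):
        pops = [ea(p, cost, rng) for ea, p in zip(eas, pops)]  # independent evolution
        if g % exchange_every:
            continue
        # Good islands get high emigration (mu) and low immigration (1 - mu) rates.
        order = sorted(range(n), key=lambda k: min(cost(x) for x in pops[k]))
        mu = [0.0] * n
        for rank, k in enumerate(order):
            mu[k] = (n - rank) / (n + 1)
        for k in range(n):
            if rng.random() < 1.0 - mu[k]:
                src = rng.choices(range(n), weights=mu)[0]
                if src != k:
                    migrant = min(pops[src], key=cost)
                    worst = max(range(len(pops[k])), key=lambda i: cost(pops[k][i]))
                    pops[k][worst] = list(migrant)
    return min((x for p in pops for x in p), key=cost)
```

By contrast, iteration-level hybridization would call the EA steps and a BBO step one after another on a single shared population; keeping the islands separate lets each constituent EA keep its own search dynamics between exchanges.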

Crossref

Cleveland-Marshall College of Law

A Memetic Algorithm for whole test suite generation

Author: Arcuri A.
Fraser G.
McMinn P.
Publication venue: 'Elsevier BV'
Publication date: 28/05/2014

The generation of unit-level test cases for structural code coverage is a task well suited to Genetic Algorithms. Method call sequences must be created that construct objects, put them into the right state, and then execute uncovered code. However, the generation of primitive values, such as integers and doubles, characters that appear in strings, and arrays of primitive values, is not so straightforward. Often, small local changes are required to drive the value toward the one needed to execute some target structure. Global searches like Genetic Algorithms, by contrast, tend to make larger changes that are not concentrated on any particular aspect of a test case. In this paper, we extend the Genetic Algorithm behind the EvoSuite test generation tool into a Memetic Algorithm by equipping it with several local search operators. These operators are designed to efficiently optimize primitive values and other aspects of a test suite that allow the search for test cases to function more effectively. We evaluate our operators using a rigorous experimental methodology on over 12,000 Java classes, comprising open source classes of various kinds, including numerical applications and text processors. Our study shows that increases in branch coverage of up to 53% are possible for an individual class in practice.
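The "small local changes" argument can be illustrated with a local search over a single integer input: probe one step in each direction, then keep doubling the step while the objective keeps improving. A minimal sketch in the spirit of such primitive-value operators; the `distance` function (for example a branch distance such as `abs(3 * x - 3702)` for a hypothetical condition `3 * x == 3702`) and the evaluation budget are assumptions, and EvoSuite's actual operators are not reproduced here.

```python
from typing import Callable

def int_local_search(x: int, distance: Callable[[int], float],
                     max_evals: int = 10_000) -> int:
    """Minimize `distance` over integers starting from x.

    Exploratory +1/-1 probes choose a direction; while moves in that direction
    keep improving, the step doubles (1, 2, 4, ...). Stops when the distance
    reaches 0, neither neighbour improves, or the evaluation budget runs out.
    """
    best = distance(x)
    evals = 1
    while best > 0 and evals < max_evals:
        improved = False
        for direction in (1, -1):
            step = direction
            while evals < max_evals:
                cand = distance(x + step)
                evals += 1
                if cand >= best:
                    break
                x, best = x + step, cand
                improved = True
                step *= 2  # accelerate while the moves keep improving
            if improved:
                break
        if not improved:
            break  # local optimum: neither neighbour is better
    return x
```

Starting from 0 with `distance = lambda v: abs(v - 1234)`, the search overshoots with growing steps, then restarts exploration from each new point and reaches 1234 in a few dozen evaluations, far fewer than a global operator that changes many parts of a test case at once would typically need.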

CiteSeerX

Elsevier - Publisher Connector

Crossref

White Rose Research Online