72,266 research outputs found
Machine learning-guided directed evolution for protein engineering
Machine learning (ML)-guided directed evolution is a new paradigm for
biological design that enables optimization of complex functions. ML methods
use data to predict how sequence maps to function without requiring a detailed
model of the underlying physics or biological pathways. To demonstrate
ML-guided directed evolution, we introduce the steps required to build ML
sequence-function models and use them to guide engineering, making
recommendations at each stage. This review covers basic concepts relevant to
using ML for protein engineering as well as the current literature and
applications of this new engineering paradigm. ML methods accelerate directed
evolution by learning from information contained in all measured variants and
using that information to select sequences that are likely to be improved. We
then provide two case studies that demonstrate the ML-guided directed evolution
process. We also look to future opportunities where ML will enable discovery of
new protein functions and uncover the relationship between protein sequence and
function.Comment: Made significant revisions to focus on aspects most relevant to
applying machine learning to speed up directed evolutio
Deriving Models for Software Project Effort Estimation By Means of Genetic Programming
Software engineering, effort estimation, genetic programming, symbolic regression. This paper presents the application of a computational intelligence methodology in effort estimation for software projects. Namely, we apply a genetic programming model for symbolic regression; aiming to produce mathematical expressions that (1) are highly accurate and (2) can be used for estimating the development effort by revealing relationships between the project’s features and the required work. We selected to investigate the effectiveness of this methodology into two software engineering domains. The system was proved able to generate models in the form of handy mathematical expressions that are more accurate than those found in literature.
DeepSQLi: Deep Semantic Learning for Testing SQL Injection
Security is unarguably the most serious concern for Web applications, to
which SQL injection (SQLi) attack is one of the most devastating attacks.
Automatically testing SQLi vulnerabilities is of ultimate importance, yet is
unfortunately far from trivial to implement. This is because the existence of a
huge, or potentially infinite, number of variants and semantic possibilities of
SQL leading to SQLi attacks on various Web applications. In this paper, we
propose a deep natural language processing based tool, dubbed DeepSQLi, to
generate test cases for detecting SQLi vulnerabilities. Through adopting deep
learning based neural language model and sequence of words prediction, DeepSQLi
is equipped with the ability to learn the semantic knowledge embedded in SQLi
attacks, allowing it to translate user inputs (or a test case) into a new test
case, which is semantically related and potentially more sophisticated.
Experiments are conducted to compare DeepSQLi with SQLmap, a state-of-the-art
SQLi testing automation tool, on six real-world Web applications that are of
different scales, characteristics and domains. Empirical results demonstrate
the effectiveness and the remarkable superiority of DeepSQLi over SQLmap, such
that more SQLi vulnerabilities can be identified by using a less number of test
cases, whilst running much faster
Connectionist Theory Refinement: Genetically Searching the Space of Network Topologies
An algorithm that learns from a set of examples should ideally be able to
exploit the available resources of (a) abundant computing power and (b)
domain-specific knowledge to improve its ability to generalize. Connectionist
theory-refinement systems, which use background knowledge to select a neural
network's topology and initial weights, have proven to be effective at
exploiting domain-specific knowledge; however, most do not exploit available
computing power. This weakness occurs because they lack the ability to refine
the topology of the neural networks they produce, thereby limiting
generalization, especially when given impoverished domain theories. We present
the REGENT algorithm which uses (a) domain-specific knowledge to help create an
initial population of knowledge-based neural networks and (b) genetic operators
of crossover and mutation (specifically designed for knowledge-based networks)
to continually search for better network topologies. Experiments on three
real-world domains indicate that our new algorithm is able to significantly
increase generalization compared to a standard connectionist theory-refinement
system, as well as our previous algorithm for growing knowledge-based networks.Comment: See http://www.jair.org/ for any accompanying file
- …