Derivation of Context-free Stochastic L-Grammar Rules for Promoter Sequence Modeling Using Support Vector Machine
Formal grammars can be used for describing complex repeatable structures such as DNA sequences. In
this paper, we describe the structural composition of DNA sequences using a context-free stochastic L-grammar.
L-grammars are a special class of parallel grammars that can model the growth of living organisms, e.g. plant
development, and model the morphology of a variety of organisms. We believe that parallel grammars also can
be used for modeling genetic mechanisms and sequences such as promoters. Promoters are short regulatory
DNA sequences located upstream of a gene. Detection of promoters in DNA sequences is important for
successful gene prediction. Promoters can be recognized by certain patterns that are conserved within a species,
but there are many exceptions, which makes promoter recognition a complex problem. We replace the
problem of promoter recognition by induction of context-free stochastic L-grammar rules, which are later used for
the structural analysis of promoter sequences. L-grammar rules are derived automatically from the Drosophila and
vertebrate promoter datasets using a genetic programming technique and their fitness is evaluated using a
Support Vector Machine (SVM) classifier. The artificial promoter sequences generated using the derived
L-grammar rules are analyzed and compared with natural promoter sequences.
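To make the idea of a context-free stochastic L-grammar concrete, here is a minimal sketch in Python. The rule table, its probabilities, and the names `RULES`, `rewrite_symbol`, and `derive` are all hypothetical and chosen only for illustration; they are not the rules derived in the paper, and no genetic programming or SVM fitness step is modeled. The sketch shows only the two defining properties: every symbol is rewritten in parallel at each step, and each symbol's replacement is drawn at random from weighted alternatives.

```python
import random

# Hypothetical stochastic context-free L-grammar over the DNA alphabet.
# Each symbol maps to a list of (probability, replacement) pairs.
# These rules are illustrative assumptions, not rules from the paper.
RULES = {
    "A": [(0.6, "A"), (0.4, "AT")],
    "T": [(0.7, "T"), (0.3, "TA")],
    "C": [(1.0, "C")],
    "G": [(0.5, "G"), (0.5, "GC")],
}


def rewrite_symbol(symbol, rng):
    """Pick one replacement for `symbol` according to its rule probabilities."""
    options = RULES.get(symbol)
    if options is None:
        return symbol  # symbols without a rule are copied unchanged
    weights = [p for p, _ in options]
    replacements = [r for _, r in options]
    return rng.choices(replacements, weights=weights, k=1)[0]


def derive(axiom, steps, seed=0):
    """Rewrite every symbol in parallel for `steps` iterations."""
    rng = random.Random(seed)
    sequence = axiom
    for _ in range(steps):
        sequence = "".join(rewrite_symbol(s, rng) for s in sequence)
    return sequence


if __name__ == "__main__":
    print(derive("TATA", steps=4))
```

Because each rule depends only on the symbol being rewritten and not on its neighbours, the grammar is context-free; fixing the seed makes a derivation reproducible.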
Decision Making in the Medical Domain: Comparing the Effectiveness of GP-Generated Fuzzy Intelligent Structures
In this work, we examine the effectiveness of two intelligent models in medical domains. Namely, we apply grammar-guided genetic programming to produce fuzzy intelligent structures, such as fuzzy rule-based systems and fuzzy Petri nets, in medical data mining tasks. First, we use two context-free grammars to describe fuzzy rule-based systems and fuzzy Petri nets with genetic programming. Then, we apply cellular encoding in order to express the fuzzy Petri nets with arbitrary size and topology. The models are examined thoroughly in four real-world medical data sets. Results are presented in detail and the competitive advantages and drawbacks of the selected methodologies are discussed with respect to the nature of each application domain. Conclusions are drawn on the effectiveness and efficiency of the presented approach.
Interpretable Categorization of Heterogeneous Time Series Data
Understanding heterogeneous multivariate time series data is important in
many applications ranging from smart homes to aviation. Learning models of
heterogeneous multivariate time series that are also human-interpretable is
challenging and not adequately addressed by the existing literature. We propose
grammar-based decision trees (GBDTs) and an algorithm for learning them. GBDTs
extend decision trees with a grammar framework. Logical expressions derived
from a context-free grammar are used for branching in place of simple
thresholds on attributes. The added expressivity enables support for a wide
range of data types while retaining the interpretability of decision trees. In
particular, when a grammar based on temporal logic is used, we show that GBDTs
can be used for the interpretable classification of high-dimensional and
heterogeneous time series data. Furthermore, we show how GBDTs can also be used
for categorization, which is a combination of clustering and generating
interpretable explanations for each cluster. We apply GBDTs to analyze the
classic Australian Sign Language dataset as well as data on near mid-air
collisions (NMACs). The NMAC data comes from aircraft simulations used in the
development of the next-generation Airborne Collision Avoidance System (ACAS
X). Comment: 9 pages, 5 figures, 2 tables, SIAM International Conference on Data
Mining (SDM) 201
Automated DNA Motif Discovery
Ensembl's human non-coding and protein coding genes are used to automatically
find DNA pattern motifs. The Backus-Naur form (BNF) grammar for regular
expressions (RE) is used by genetic programming to ensure the generated strings
are legal. The evolved motif suggests that the presence of Thymine followed by one
or more Adenines, etc., early in transcripts indicates a non-protein coding gene.
Keywords: pseudogene, short and microRNAs, non-coding transcripts, systems
biology, machine learning, Bioinformatics, motif, regular expression, strongly
typed genetic programming, context-free grammar. Comment: 12 pages, 2 figures
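The kind of motif described above (Thymine followed by one or more Adenines near the start of a transcript) can be written as an ordinary regular expression. The following Python sketch is an illustration only: the pattern `TA+`, the function name `has_early_motif`, and the 20-base window are assumptions made here, not the motif or threshold evolved in the paper.

```python
import re

# Hypothetical motif: a thymine followed by one or more adenines.
# This is an illustrative pattern, not the motif evolved in the paper.
MOTIF = re.compile(r"TA+")


def has_early_motif(transcript, window=20):
    """Return True if the motif occurs within the first `window` bases.

    `window` is an assumed cut-off for what counts as "early".
    """
    prefix = transcript.upper()[:window]
    return MOTIF.search(prefix) is not None
```

For example, `has_early_motif("GGTAAACC")` is true, while a transcript whose only `TA` run starts after the window is rejected.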
Learning to solve planning problems efficiently by means of genetic programming
Declarative problem solving, such as planning, poses interesting challenges for Genetic Programming (GP). There have been recent attempts to apply GP to planning that fit two approaches: (a) using GP to search in plan space or (b) to evolve a planner. In this article, we propose to evolve only the heuristics to make a particular planner more efficient. This approach is more feasible than (b) because it does not have to build a planner from scratch but can take advantage of already existing planning systems. It is also more efficient than (a) because once the heuristics have been evolved, they can be used to solve a whole class of different planning problems in a planning domain, instead of running GP for every new planning problem. Empirical results show that our approach (EVOCK) is able to evolve heuristics in two planning domains (the blocks world and the logistics domain) that improve PRODIGY4.0 performance. Additionally, we experiment with a new genetic operator - Instance-Based Crossover - that is able to use traces of the base planner as raw genetic material to be injected into the evolving population.
Genetic Algorithm for Grammar Induction and Rules Verification through a PDA Simulator
The focus of this paper is on developing a grammatical inference system that uses a genetic algorithm (GA), which has a powerful global exploration capability that can exploit the optimum offspring. The implemented system runs in two phases: first, generation and verification of grammar rules, and then application of the GA’s operations to optimize the rules. A pushdown automaton (PDA) simulator has been developed, which parses the training data over the grammar’s rules. An inverted mutation with a random mask followed by an ‘XOR’ operator has been applied, which introduces diversity in the population and helps the GA avoid getting trapped at a local optimum. The Taguchi method has been incorporated to tune the parameters, which makes the proposed approach more robust, statistically sound, and quickly convergent. The performance of the proposed system has been compared with classical GA, random offspring GA, and crowding algorithms. Overall, a grammatical inference system has been developed that employs a PDA simulator for verification.
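The abstract does not spell out the mutation operator, so the following Python sketch shows just one plausible reading of "random mask, inversion, then XOR" on a bit-string chromosome. The function names, the bit-list encoding, and the `rate` parameter are assumptions introduced here for illustration and should not be taken as the paper's implementation.

```python
import random


def random_mask(length, rate, rng):
    """Build a bit mask in which each position is 1 with probability `rate`."""
    return [1 if rng.random() < rate else 0 for _ in range(length)]


def invert(bits):
    """Flip every bit of a mask (0 becomes 1, 1 becomes 0)."""
    return [1 - b for b in bits]


def xor_mutate(chromosome, mask):
    """XOR the chromosome with the mask: bits under a 1 in the mask flip."""
    return [c ^ m for c, m in zip(chromosome, mask)]


if __name__ == "__main__":
    rng = random.Random(0)
    parent = [1, 0, 1, 1, 0, 0, 1, 0]
    # A dense mask, once inverted, becomes sparse and flips few bits.
    mask = invert(random_mask(len(parent), rate=0.9, rng=rng))
    print(xor_mutate(parent, mask))
```

XOR with a mask is its own inverse, so applying the same mask twice restores the parent; the diversity comes from drawing a fresh mask for each mutation.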
Generating networks of genetic processors
The Networks of Genetic Processors (NGPs) are non-conventional models of computation based on genetic operations over strings, namely mutation and crossover operations as it was established in genetic algorithms. Initially, they have been proposed as acceptor machines which are decision problem solvers. In that case, it has been shown that they are universal computing models equivalent to Turing machines. In this work, we propose NGPs as enumeration devices and we analyze their computational power. First, we define the model and we propose its definition as parallel genetic algorithms. Once the correspondence between the two formalisms has been established, we carry out a study of the generation capacity of the NGPs under the research framework of the theory of formal languages. We investigate the relationships between the number of processors of the model and its generative power. Our results show that the number of processors is important to increase the generative capability of the model up to an upper bound, and that NGPs are universal models of computation if they are formulated as generation devices. This allows us to affirm that parallel genetic algorithms working under certain restrictions can be considered equivalent to Turing machines and, therefore, they are universal models of computation. This research was partially supported by TAILOR, a project funded by EU Horizon 2020 research and innovation programme under GA No 952215. Campos Frances, M.; Sempere Luna, JM. (2022). Generating networks of genetic processors. Genetic Programming and Evolvable Machines. 23(1):133-155. https://doi.org/10.1007/s10710-021-09423-7
Towards automatic extraction of definitions
Definition extraction can be useful for the creation of glossaries and in question answering systems. It is a tedious task to extract such sentences manually, and thus an automatic system is desirable. In this work we review various attempts at rule-based approaches reported in the literature and discuss their results. We also propose a novel experiment involving the use of genetic programming and genetic algorithms, aimed at assisting the discovery of grammar rules which can be used for the task of definition extraction.
Attribute grammar evolution
The final publication is available at Springer via http://dx.doi.org/10.1007/11499305_19. Proceedings of First International Work-Conference on the Interplay Between Natural and Artificial Computation, IWINAC 2005, Las Palmas, Canary Islands, Spain, June 15-18, 2005. This paper describes Attribute Grammar Evolution (AGE), a new Automatic Evolutionary Programming algorithm that extends standard Grammar Evolution (GE) by replacing context-free grammars with attribute grammars. GE only takes into account syntactic restrictions to generate valid individuals. AGE adds semantics to ensure that both semantically and syntactically valid individuals are generated. Attribute grammars make it possible to semantically describe the solution. The paper shows empirically that AGE is as good as GE for a classical problem, and proves that including semantics in the grammar can improve GE performance. An important conclusion is that adding too much semantics can make the search difficult.
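For readers unfamiliar with how GE uses a context-free grammar to generate syntactically valid individuals, the following Python sketch shows the usual genotype-to-phenotype mapping: each integer codon selects a production for the leftmost nonterminal via a modulo operation. The toy grammar, the names `GRAMMAR` and `map_genotype`, and the wrapping limit are assumptions made for illustration. The attribute (semantic) checks that AGE adds are not modeled here.

```python
# Hypothetical toy grammar: each nonterminal maps to a list of productions.
GRAMMAR = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["<var>"]],
    "<op>": [["+"], ["*"]],
    "<var>": [["x"], ["y"]],
}


def map_genotype(codons, start="<expr>", max_wraps=2):
    """Map a list of integer codons to a sentence of the grammar.

    The leftmost nonterminal is expanded with production number
    (codon % number_of_productions). Codons are reused (wrapped) up to
    `max_wraps` times; None is returned if nonterminals still remain.
    """
    symbols = [start]
    used = 0
    limit = len(codons) * (max_wraps + 1)
    while any(s in GRAMMAR for s in symbols):
        if used >= limit:
            return None  # mapping did not terminate: invalid individual
        index = next(i for i, s in enumerate(symbols) if s in GRAMMAR)
        productions = GRAMMAR[symbols[index]]
        choice = codons[used % len(codons)] % len(productions)
        used += 1
        symbols[index:index + 1] = productions[choice]
    return " ".join(symbols)


if __name__ == "__main__":
    print(map_genotype([0, 1, 0, 0, 1, 1]))  # prints "x + y"
```

Any genotype that terminates yields a syntactically valid expression by construction; an attribute grammar would additionally reject or steer expansions that violate semantic constraints.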