392 research outputs found

On linear genetic programming

Author: Brameier Markus
Publication date: 18/01/2005

The thesis is about linear genetic programming (LGP), a machine learning approach that evolves computer programs as sequences of imperative instructions. Two fundamental differences to the more common tree-based variant (TGP) may be identified. These are the graph-based functional structure of linear genetic programs, on the one hand, and the existence of structurally noneffective code, on the other hand. The two major objectives of this work comprise (1) the development of more advanced methods and variation operators to produce better and more compact program solutions and (2) the analysis of general EA/GP phenomena in linear GP, including intron code, neutral variations, and code growth, among others. First, we introduce efficient algorithms for extracting features of the imperative and functional structure of linear genetic programs. In doing so, especially the detection and elimination of noneffective code during runtime will turn out as a powerful tool to accelerate the time-consuming step of fitness evaluation in GP. Variation operators are discussed systematically for the linear program representation. We will demonstrate that so-called effective instruction mutations achieve the best performance in terms of solution quality. These mutations operate only on the (structurally) effective code and restrict the mutation step size to one instruction. One possibility to further improve their performance is to explicitly increase the probability of neutral variations. As a second, more time-efficient alternative we explicitly control the mutation step size on the effective code (effective step size). Minimum steps do not allow more than one effective instruction to change its effectiveness status. That is, only a single node may be connected to or disconnected from the effective graph component. It is an interesting phenomenon that, to some extent, the effective code becomes more robust against destructions over the generations already implicitly.
A special concern of this thesis is to convince the reader that there are some serious arguments for using a linear representation. In a crossover-based comparison LGP has been found superior to TGP over a set of benchmark problems. Furthermore, linear solutions turned out to be more compact than tree solutions due to (1) multiple usage of subgraph results and (2) implicit parsimony pressure by structurally noneffective code. The phenomenon of code growth is analyzed for different linear genetic operators. When applying instruction mutations exclusively, almost only neutral variations may be held responsible for the emergence and propagation of intron code. It is noteworthy that linear genetic programs may not grow if all neutral variation effects are rejected and if the variation step size is minimum. For the same reasons effective instruction mutations realize an implicit complexity control in linear GP which reduces a possible negative effect of code growth to a minimum. Another noteworthy result in this context is that program size is strongly increased by crossover while it is hardly influenced by mutation even if step sizes are not explicitly restricted. Finally, we investigate program teams as one possibility to increase the dimension of genetic programs. It will be demonstrated that much more powerful solutions may be found by teams than by individuals. Moreover, the complexity of team solutions remains surprisingly small compared to individual programs. Both are the result of specialization and cooperation of team members.
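
The detection of structurally noneffective code mentioned in this abstract can be illustrated with a minimal sketch. It assumes a simplified program format that is not taken from the thesis: each instruction is a hypothetical `Instr(dest, op, src1, src2)` tuple over numbered registers, there are no branches, and the program result is read from a given set of output registers. Under those assumptions, a single backward pass that tracks which registers are still needed is enough to keep only the instructions that can influence the output.

```python
from typing import List, NamedTuple, Set


class Instr(NamedTuple):
    """Hypothetical register instruction: r[dest] = r[src1] <op> r[src2]."""
    dest: int
    op: str
    src1: int
    src2: int


def effective_code(program: List[Instr], outputs: Set[int]) -> List[Instr]:
    """Return the instructions that can influence the output registers.

    Assumes straight-line code (no branches or loops).
    """
    needed = set(outputs)          # registers whose values still matter
    kept: List[Instr] = []
    for ins in reversed(program):  # scan from the last instruction backwards
        if ins.dest in needed:
            kept.append(ins)
            # The destination is defined here, so earlier writes to it are
            # irrelevant unless it is also read as an operand.
            needed.discard(ins.dest)
            needed.update((ins.src1, ins.src2))
    kept.reverse()                 # restore original execution order
    return kept


program = [
    Instr(1, "add", 0, 2),  # r1 = r0 + r2  (effective: feeds r0 below)
    Instr(3, "mul", 2, 2),  # r3 = r2 * r2  (noneffective: r3 never reaches r0)
    Instr(0, "sub", 1, 2),  # r0 = r1 - r2  (effective: writes the output)
    Instr(3, "add", 0, 1),  # r3 = r0 + r1  (noneffective)
]
effective = effective_code(program, outputs={0})
```

Only the returned instructions would need to be executed during fitness evaluation, which is the source of the speed-up the abstract refers to.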

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung

Evolving Teams of Multiple Predictors with Genetic Programming

Author: Banzhaf Wolfgang
Brameier Markus
Publication date: 29/10/2001

This paper reports on the evolution of GP teams in different classification and regression problems and compares different methods for combining the outputs of the team programs. These include hybrid approaches where (1) a neural network is used to optimize the weights of programs in a team for a common decision and (2) a real-numbered vector of weights (the representation of evolution strategies) is evolved with each team in parallel. The cooperative team approach results in an improved training and generalization performance compared to the standard GP method. The higher computational overhead of coevolving several genetic programs is counteracted by using a fast variant of linear GP. In particular, the processing time of linear genetic programs is reduced significantly by removing intron code before program execution.
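
As a rough illustration of what combining team outputs can look like, the sketch below shows two generic schemes: a weighted average for regression and a majority vote for classification. These are illustrative assumptions rather than the specific combination methods evaluated in the paper, and the function names `weighted_output` and `majority_vote` are hypothetical. Team members are modelled as plain Python callables.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def weighted_output(members: Sequence[Callable[[float], float]],
                    weights: Sequence[float],
                    x: float) -> float:
    """Weighted average of the member outputs (regression-style combination)."""
    total = sum(weights)
    return sum(w * m(x) for m, w in zip(members, weights)) / total


def majority_vote(members: Sequence[Callable[[float], Hashable]],
                  x: float) -> Hashable:
    """Most frequent class label among the member outputs."""
    votes = Counter(m(x) for m in members)
    return votes.most_common(1)[0][0]


# Toy team of three "programs" for a regression task.
regressors = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x]
combined = weighted_output(regressors, weights=[1.0, 1.0, 2.0], x=3.0)

# Toy team of three binary classifiers.
classifiers = [lambda x: x > 0, lambda x: x > 1, lambda x: x > 5]
label = majority_vote(classifiers, x=3.0)
```

In the hybrid approaches described in the abstract, the weight vector would be optimized by a neural network or evolved alongside the team rather than fixed by hand as it is here.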

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung

Automatic discovery of cross-family sequence features associated with protein function

Author: Brameier Markus
Haan Josien
Krings Andrea
MacCallum Robert M
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Methods for predicting protein function directly from amino acid sequences are useful tools in the study of uncharacterised protein families and in comparative genomics. Until now, this problem has been approached using machine learning techniques that attempt to predict membership, or otherwise, to predefined functional categories or subcellular locations. A potential drawback of this approach is that the human-designated functional classes may not accurately reflect the underlying biology, and consequently important sequence-to-function relationships may be missed. RESULTS: We show that a self-supervised data mining approach is able to find relationships between sequence features and functional annotations. No preconceived ideas about functional categories are required, and the training data is simply a set of protein sequences and their UniProt/Swiss-Prot annotations. The main technical aspect of the approach is the co-evolution of amino acid-based regular expressions and keyword-based logical expressions with genetic programming. Our experiments on a strictly non-redundant set of eukaryotic proteins reveal that the strongest and most easily detected sequence-to-function relationships are concerned with targeting to various cellular compartments, which is an area already well studied both experimentally and computationally. Of more interest are a number of broad functional roles which can also be correlated with sequence features. These include inhibition, biosynthesis, transcription and defence against bacteria. Despite substantial overlaps between these functions and their corresponding cellular compartments, we find clear differences in the sequence motifs used to predict some of these functions. For example, the presence of polyglutamine repeats appears to be linked more strongly to the "transcription" function than to the general "nuclear" function/location. 
CONCLUSION: We have developed a novel and useful approach for knowledge discovery in annotated sequence data. The technique is able to identify functionally important sequence features and does not require expert knowledge. By viewing protein function from a sequence perspective, the approach is also suitable for discovering unexpected links between biological processes, such as the recently discovered role of ubiquitination in transcription

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Feature selection for modular GA-based classification

Author: Anand
Battiti
Brameier
Falco
Fangming Zhu
Gonzalez
Guan
Guan
Ishibuchi
Jenkins
Kwak
Lerner
Lu
Pal
Setiono
Setnes
Steven Guan
Verikas
Publication venue: Elsevier BV
Publication date: 01/09/2004

Genetic algorithms (GAs) have been used as conventional methods for classifiers to adaptively evolve solutions for classification problems. Feature selection plays an important role in finding relevant features in classification. In this paper, feature selection is explored with modular GA-based classification. A new feature selection technique, Relative Importance Factor (RIF), is proposed to find less relevant features in the input domain of each class module. By removing these features, it is aimed to reduce the classification error and dimensionality of classification problems. Benchmark classification data sets are used to evaluate the proposed approach. The experiment results show that RIF can be used to find less relevant features and help achieve lower classification error with the feature space dimension reduced

Crossref

Brunel University Research Archive

ScholarBank@NUS

A genome-wide survey for prion-regulated miRNAs associated with cholesterol homeostasis

Author: Ann-Christin Schmädicke
Dirk Motzkus
Hermann M Schätzl
Judith Montag
Markus Brameier
Sabine Gilch
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

Crossref

Springer - Publisher Connector

Genome-wide comparative analysis of microRNAs in three non-human primates

Author: A Nahvi
B Zhang
D Bartel
D Gusfield
E Berezikov
E Lai
I Hofacker
J Brown
J Hertel
J Nam
J Yue
L He
L Lim
M Brameier
M Legendre
M Saunders
M Weber
Markus Brameier
R Raaum
S Altschul
S Griffiths-Jones
U Ohler
V Ambros
V Baev
X Wang
Y Altuvia
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: MicroRNAs (miRNAs) are negative regulators of gene expression in multicellular eukaryotes. With the recently completed sequencing of three primate genomes, the study of miRNA evolution within the primate lineage has only begun and may be expected to provide the genetic and molecular explanations for many phenotypic differences between human and non-human primates. Findings: We scanned all three genomes of non-human primates, including chimpanzee (Pan troglodytes), orangutan (Pongo pygmaeus), and rhesus monkey (Macaca mulatta), for homologs of human miRNA genes. Besides sequence homology analysis, our comparative method relies on various postprocessing filters to verify other features of miRNAs, including, in particular, their precursor structure or their occurrence (prediction) in other primate genomes. Our study allows direct comparisons between the different species in terms of their miRNA repertoire, their evolutionary distance to human, the effects of filters, as well as the identification of common and species-specific miRNAs in the primate lineage. More than 500 novel putative miRNA genes have been discovered in orangutan that show at least 85 percent identity in precursor sequence. Only about 40 percent are found to be 100 percent identical with their human ortholog. Conclusion: Homologs of human precursor miRNAs with perfect or near-perfect sequence identity may be considered to be likely functional in other primates. The computational identification of homologs with less similar sequence, instead, requires further evidence to be provided.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

GRO.publications (Univ. Göttingen)

Evolving DNA motifs to predict GeneChip probe performance

Author: AP Harrison
BJ Ross
DJ Montana
F Naef
GJ Upton
HG Beyer
JR Koza
M Brameier
M O'Neill
MA Stalteri
ML Wong
NJ Radcliff
PA Whigham
RI McKay
T Barrett
T Bäck
T Handstad
WB Langdon
WB Langdon
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

Background: Affymetrix High Density Oligonucleotide Arrays (HDONA) simultaneously measure expression of thousands of genes using millions of probes. We use correlations between measurements for the same gene across 6685 human tissue samples from NCBI's GEO database to indicate the quality of individual HG-U133A probes. Low correlation indicates a poor probe. Results: Regular expressions can be automatically created from a Backus-Naur form (BNF) context-free grammar using strongly typed genetic programming. Conclusion: The automatically produced motif is better at predicting poor DNA sequences than an existing human generated RE, suggesting runs of Cytosine and Guanine and mixtures should all be avoided. © 2009 Langdon and Harrison; licensee BioMed Central Ltd

University of Essex Research Repository

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

UCL Discovery

PubMed Central

FunSimMat: a comprehensive functional similarity database

Author: A. Schlicker
Brameier
Camon
Dowell
Draghici
Finn
Franke
Freudenberg
Froehlich
Gene Ontology Consortium
Letunic
Lin
Liu
Lord
Lu
M. Albrecht
Perez-Iratxeta
Pu
Ramirez
Rossi
Schlicker
Schlicker
Schlicker
Sen
Suthram
Wu
Zhang
Publication venue: Oxford University Press
Publication date: 01/01/2008

Functional similarity based on Gene Ontology (GO) annotation is used in diverse applications like gene clustering, gene expression data analysis, protein interaction prediction and evaluation. However, there exists no comprehensive resource of functional similarity values although such a database would facilitate the use of functional similarity measures in different applications. Here, we describe FunSimMat (Functional Similarity Matrix, http://funsimmat.bioinf.mpi-inf.mpg.de/), a large new database that provides several different semantic similarity measures for GO terms. It offers various precomputed functional similarity values for proteins contained in UniProtKB and for protein families in Pfam and SMART. The web interface allows users to efficiently perform both semantic similarity searches with GO terms and functional similarity searches with proteins or protein families. All results can be downloaded in tab-delimited files for use with other tools. An additional XML–RPC interface gives automatic online access to FunSimMat for programs and remote services

Crossref

PubMed Central

MPG.PuRe

Mitogenomic phylogeny of the common long-tailed macaque (Macaca fascicularis fascicularis)

Author: Abdul-Latiff Muhammad AB
Abdul-Patah Pazil
Ampeng Ahmad
Brameier Markus
Böker Kai O
Kolleck Jakob
Lakim Maklarin
Liedigk Rasmus
Md-Zain Badrul Munir
Meijaard Erik
Roos Christian
Tosi Anthony J
Zinner Dietmar
Publication venue: Springer Science and Business Media LLC
Publication date: 29/11/2018

Background: Long-tailed macaques (Macaca fascicularis) are an important model species in biomedical research and reliable knowledge about their evolutionary history is essential for biomedical inferences. Ten subspecies have been recognized, of which most are restricted to small islands of Southeast Asia. In contrast, the common long-tailed macaque (M. f. fascicularis) is distributed over large parts of the Southeast Asian mainland and the Sundaland region. To shed more light on the phylogeny of M. f. fascicularis, we sequenced complete mitochondrial (mtDNA) genomes of 40 individuals from all over the taxon’s range, either by classical PCR-amplification and Sanger sequencing or by DNA-capture and high-throughput sequencing. Results: Both laboratory approaches yielded complete mtDNA genomes from M. f. fascicularis with high accuracy and/or coverage. According to our phylogenetic reconstructions, M. f. fascicularis initially diverged into two clades 1.70 million years ago (Ma), with one including haplotypes from mainland Southeast Asia, the Malay Peninsula and North Sumatra (Clade A) and the other, haplotypes from the islands of Bangka, Java, Borneo, Timor, and the Philippines (Clade B). The three geographical populations of Clade A appear as paraphyletic groups, while local populations of Clade B form monophyletic clades with the exception of a Philippine individual which is nested within the Borneo clade. Further, in Clade B the branching pattern among main clades/lineages remains largely unresolved, most likely due to their relatively rapid diversification 0.93-0.84 Ma. Conclusions: Both laboratory methods have proven to be powerful to generate complete mtDNA genome data with similarly high accuracy, with the DNA-capture and high-throughput sequencing approach as the most promising and only practical option to obtain such data from highly degraded DNA, in time and with relatively low costs. The application of complete mtDNA genomes yields new insights into the evolutionary history of M. f. fascicularis by providing a more robust phylogeny and more reliable divergence age estimations than earlier studies.

The Australian National University

Ab initio identification of human microRNAs based on structure motifs

Author: A Rodriguez
A Sewer
C Xue
Carsten Wiuf
D Bartel
D Gusfield
E Berezikov
E Bonnet
E Lai
I Bentwich
I Hofacker
J Han
J Krol
J Nam
L He
L Lim
M Brameier
M Legendre
M Weber
Markus Brameier
P Jiang
P Saetrom
S Altschul
S Baskerville
S Griffiths-Jones
S Helvik
S Kwang Loong
S Ying
T Gingeras
U Ohler
V Ambros
W Ritchie
X Wang
Y Altuvia
Y Grad
Y Zeng
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: MicroRNAs (miRNAs) are short, non-coding RNA molecules that are directly involved in post-transcriptional regulation of gene expression. The mature miRNA sequence binds to more or less specific target sites on the mRNA. Both their small size and sequence specificity make the detection of completely new miRNAs a challenging task. This cannot be based on sequence information alone, but requires structure information about the miRNA precursor. Unlike comparative genomics approaches, ab initio approaches are able to discover species-specific miRNAs without known sequence homology. Results: MiRPred is a novel method for ab initio prediction of miRNAs by genome scanning that only relies on (predicted) secondary structure to distinguish miRNA precursors from other similar-sized segments of the human genome. We apply a machine learning technique, called linear genetic programming, to develop special classifier programs which include multiple regular expressions (motifs) matched against the secondary structure sequence. Special attention is paid to scanning issues. The classifiers are trained on fixed-length sequences as these occur when shifting a window in regular steps over a genome region. Various statistical and empirical evidence is collected to validate the correctness of and increase confidence in the predicted structures. Among other things, we propose a new criterion to select miRNA candidates with a higher stability of folding that is based on the number of matching windows around their genome location. An ensemble of 16 motif-based classifiers achieves 99.9 percent specificity with sensitivity remaining on an acceptable high level when requiring all classifiers to agree on a positive decision. A low false positive rate is considered more important than a low false negative rate, when searching larger genome regions for unknown miRNAs. 117 new miRNAs have been predicted close to known miRNAs on human chromosome 19.
All candidate structures match the free energy distribution of miRNA precursors which is significantly shifted towards lower free energies. We employed a human EST library and found that around 75 percent of the candidate sequences are likely to be transcribed, with around 35 percent located in introns. Conclusion: Our motif finding method is at least competitive to state-of-the-art feature-based methods for ab initio miRNA discovery. In doing so, it requires less previous knowledge about miRNA precursor structures while programs and motifs allow a more straightforward interpretation and extraction of the acquired knowledge.
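
The scanning scheme described in this abstract (fixed-length windows shifted in regular steps, with a positive call only when every classifier agrees) can be sketched as follows. This is a simplified illustration, not MiRPred itself: the two regular expressions are made-up placeholder motifs over dot-bracket secondary-structure notation, each "classifier" is reduced to a single regex match, and the names `windows` and `unanimous_hits` are hypothetical.

```python
import re
from typing import Iterator, List, Pattern, Sequence, Tuple


def windows(seq: str, size: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, window) pairs for fixed-length windows shifted by `step`."""
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


def unanimous_hits(structure: str, motifs: Sequence[Pattern[str]],
                   size: int, step: int) -> List[int]:
    """Start positions of windows in which every motif matches."""
    return [start for start, win in windows(structure, size, step)
            if all(m.search(win) for m in motifs)]


# Placeholder motifs: a run of at least four opening and four closing brackets.
motifs = [re.compile(r"\({4,}"), re.compile(r"\){4,}")]

# Toy dot-bracket string: one long hairpin followed by a short one.
structure = "...." + "(" * 6 + "...." + ")" * 6 + "...." + "((..))" + "...."

hits = unanimous_hits(structure, motifs, size=16, step=2)
```

Requiring all motifs to agree trades sensitivity for specificity, which matches the priority on a low false positive rate stated in the abstract. The number of neighbouring windows that hit, here `len(hits)`, is the kind of quantity the proposed stability criterion counts.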

Crossref

Springer

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central