Transcription factor binding site prediction with multivariate gene expression data
Multi-sample microarray experiments have become a standard experimental
method for studying biological systems. A frequent goal in such studies is to
unravel the regulatory relationships between genes. During the last few years,
regression models have been proposed for the de novo discovery of cis-acting
regulatory sequences using gene expression data. However, when applied to
multi-sample experiments, existing regression-based methods model each
individual sample separately. To better capture the dynamic relationships in
multi-sample microarray experiments, we propose a flexible method for the joint
modeling of promoter sequences and multivariate expression data. In higher-order
eukaryotic genomes, expression regulation usually involves combinatorial
interactions between several transcription factors. Experiments have shown that
the spacing between transcription factor binding sites can significantly affect
how strongly they activate gene expression. We propose an adaptive
model-building procedure to capture such spacing-dependent cis-acting regulatory
modules. We apply our methods to the analysis of microarray time-course
experiments in yeast and in Arabidopsis. These experiments exhibit very
different temporal dynamics. For both data sets, we found all of the
well-known cis-acting regulatory elements in the relevant context and were
also able to predict novel elements.
Comment: Published at http://dx.doi.org/10.1214/07-AOAS142 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
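Regression-based motif discovery of the kind described above starts from a per-gene feature, such as the number of times a candidate element occurs in a promoter, and asks how much of the expression variation that feature explains. The Python sketch below is a minimal illustration under stated assumptions, not the authors' model:

- motifs are exact strings;
- a spacing-dependent module is approximated by counting motif pairs whose gap falls in a window;
- a candidate feature is scored across all samples at once by summing the per-sample reduction in residual sum of squares from a simple linear fit.

All function names and the toy data are hypothetical.

```python
from typing import List, Sequence


def motif_starts(seq: str, motif: str) -> List[int]:
    """Start positions of exact (possibly overlapping) occurrences of motif."""
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == motif]


def count_motif(seq: str, motif: str) -> int:
    """Number of exact, possibly overlapping occurrences of motif in seq."""
    return len(motif_starts(seq, motif))


def count_spaced_pair(seq: str, m1: str, m2: str, min_gap: int, max_gap: int) -> int:
    """Count m1 ... m2 pairs (m1 upstream) whose gap, i.e. the number of bases
    between the end of m1 and the start of m2, lies in [min_gap, max_gap].
    A crude, hypothetical stand-in for a spacing-dependent module."""
    return sum(
        1
        for i in motif_starts(seq, m1)
        for j in motif_starts(seq, m2)
        if min_gap <= j - (i + len(m1)) <= max_gap
    )


def joint_rss_reduction(x: Sequence[float], Y: Sequence[Sequence[float]]) -> float:
    """Score feature x (one value per gene) against a genes-by-samples matrix Y:
    the sum over samples of the drop in residual sum of squares when that
    sample's expression is regressed on x with an intercept."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return 0.0  # a constant feature explains nothing
    score = 0.0
    for t in range(len(Y[0])):
        y = [row[t] for row in Y]
        my = sum(y) / n
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        score += sxy ** 2 / sxx  # explained sum of squares of a simple linear fit
    return score


if __name__ == "__main__":
    promoters = ["ACGTACGTTT", "TTTTTTTTTT", "ACGTTTACGT"]
    Y = [[2.0, 3.0], [0.0, 0.0], [2.0, 3.0]]  # 3 genes x 2 time points (made-up data)
    x = [count_motif(p, "ACGT") for p in promoters]
    print(x, joint_rss_reduction(x, Y))
```

In this toy example the "ACGT" counts (2, 0, 2) explain both time points perfectly. The score therefore equals the total centred sum of squares of the two columns, 26/3. Summing over samples is only one simple way to pool evidence across a multivariate response.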
MAPK related phenotypic decisions in yeast
The evolutionarily conserved mitogen-activated protein kinase (MAPK) network is a signalling module that coordinates and processes various extracellular stimuli, thereby ensuring a specific biological response to a precise dose of a given stimulus. The haploid yeast Saccharomyces cerevisiae uses this network to select particular mating partners by quantitatively interpreting the pheromone concentration gradient generated by potential mates. The mating MAPK module is activated only above a certain pheromone concentration threshold. Activation relies on the pheromone-induced recruitment of a protein complex consisting of the scaffold Ste5 and the kinases Ste11 (MAPKKK), Ste7 (MAPKK) and Fus3 (MAPK). This module ensures a robust morphological response, in the form of a mating projection, to a defined pheromone concentration and allows cells to gauge the distance to a potential mating partner. Yet it remains unclear which features of the module interpret the pheromone concentration to decide when and where to generate a mating projection.
To infer the network structure of the mating MAPK module, we developed a reverse-engineering approach based on detecting pheromone-response-dependent changes in protein complex abundances. Interactions within the MAPK module were measured by fluorescence correlation spectroscopy (FCS), and all possible protein species in the module were resolved by linear regression analysis (LRA). Using this approach, we identified a cytosolic kinase-substrate interaction between Fus3 and the upstream Ste11, which constitutes a hitherto uncharacterized negative feedback loop. This feedback affects the readout of the pheromone gradient and makes the response robust to changes in the components involved in the loop. It acts through phosphorylation of Ste11 at S243, which hinders Ste11 binding to the scaffold Ste5 and thereby uncouples Ste11 from Fus3 activity. This feedback control provides ultrasensitivity at the first step of the MAPK cascade. As part of the hierarchical cascade arrangement, that ultrasensitivity ensures a switch-like mating response and triggers shmoo formation at the right distance from a partner. The cytoplasmic feedback also has a spatial component that confines the cytoplasmic Fus3 phosphorylation gradient. It thereby generates and maintains a localized source of active Fus3 at the mating tip, which in turn spatially restricts shmoo formation.
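Ultrasensitivity, as used above, means that the output switches from low to high over a narrow range of input. A common way to quantify it is the Hill coefficient; this is a general convention, not an analysis from this work. For a Hill-type response r = s^n / (K^n + s^n), the ratio of the stimulus giving 90% response to the stimulus giving 10% response is 81^(1/n). The Python sketch below computes these quantities, and its function names are hypothetical.

```python
import math


def hill_response(s: float, k: float, n: float) -> float:
    """Fractional response (0..1) of a Hill curve with half-maximal
    stimulus k and Hill coefficient n."""
    return s ** n / (k ** n + s ** n)


def stimulus_for(frac: float, k: float, n: float) -> float:
    """Inverse of hill_response: the stimulus giving fractional response frac."""
    return k * (frac / (1.0 - frac)) ** (1.0 / n)


def effective_hill(ec10: float, ec90: float) -> float:
    """Hill coefficient implied by the EC10 and EC90 of a response curve,
    using EC90 / EC10 = 81 ** (1 / n)."""
    return math.log(81.0) / math.log(ec90 / ec10)


if __name__ == "__main__":
    for n in (1, 4):
        fold = stimulus_for(0.9, 1.0, n) / stimulus_for(0.1, 1.0, n)
        print(f"n = {n}: a 10%->90% response needs a {fold:.1f}-fold stimulus change")
```

With n = 1 the transition from 10% to 90% response needs an 81-fold change in stimulus. With n = 4 it needs only a 3-fold change, which is the kind of switch-like behaviour described for the mating response.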
This work shows how a network motif in the MAPK module enables cells to interpret the pheromone concentration gradient and sense potential mates. It also shows how this extracellular gradient is translated into an intracellular activity gradient by spatial control of signalling, ultimately deciding both when and where to respond.
Genetic algorithm-neural network: feature extraction for bioinformatics data
With the advance of gene expression data in the bioinformatics field, two questions frequently arise for both computer and medical scientists: which genes are significantly involved in discriminating cancer classes, and which genes are significant with respect to a specific cancer pathology. Numerous computational analysis models have been developed to identify informative genes from microarray data; however, the integrity of the reported genes is still uncertain. This is mainly due to misconceptions about the objectives of microarray studies. Furthermore, applying various preprocessing techniques to microarray data has jeopardised the quality of the data. As a result, the integrity of the findings has been compromised by the improper use of techniques and by ill-conceived study objectives. This research proposes an innovative hybrid model based on genetic algorithms (GAs) and artificial neural networks (ANNs) to extract the most highly differentially expressed genes for a specific cancer pathology. The proposed method efficiently extracts the informative genes from the original data set, which reduces the gene variability errors incurred by the preprocessing techniques. The novelty of the research is twofold. First, the research emphasises extracting informative features from a high-dimensional and highly complex data set, rather than improving classification results. Second, it uses an ANN to compute the fitness function of the GA, which is rare in the context of feature extraction. Two benchmark microarray data sets were used to investigate the prominent genes expressed in tumour development. The results show that the genes respond to different stages of tumourigenesis (i.e. different fitness precision levels), which may be useful for early malignancy detection. The extraction ability of the proposed model is validated against the expected results on synthetic data sets.
In addition, two bioassay data sets were used to examine how efficiently the proposed model extracts significant features from large, imbalanced bioassay data with multiple data representations.
EThOS - Electronic Theses Online Service, GB, United Kingdom
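In the GA-ANN idea, a genetic algorithm searches over gene subsets and a neural network supplies the fitness of each subset. The Python sketch below illustrates that idea under stated assumptions; it is not the thesis's model:

- a single-layer perceptron stands in for the ANN;
- fitness is the perceptron's training accuracy minus a small, hypothetical penalty per selected gene;
- the GA uses elitist truncation selection, one-point crossover and bit-flip mutation.

All function names and parameter values are assumptions.

```python
import random
from typing import List, Sequence

Mask = List[int]  # one bit per gene: 1 = selected, 0 = dropped


def perceptron_accuracy(X: Sequence[Sequence[float]], y: Sequence[int], mask: Mask,
                        epochs: int = 20, lr: float = 0.1) -> float:
    """Train a single-layer perceptron on the selected genes only and return
    its training accuracy; labels are 0/1."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # no genes selected: treat as useless
    w = [0.0] * len(idx)
    b = 0.0

    def predict(row: Sequence[float]) -> int:
        s = b + sum(wk * row[j] for wk, j in zip(w, idx))
        return 1 if s > 0 else 0

    for _ in range(epochs):
        for row, target in zip(X, y):
            err = target - predict(row)
            if err:  # classic perceptron rule: update only on mistakes
                for k, j in enumerate(idx):
                    w[k] += lr * err * row[j]
                b += lr * err
    return sum(predict(r) == t for r, t in zip(X, y)) / len(y)


def fitness(X, y, mask: Mask, penalty: float = 0.01) -> float:
    """Network-derived fitness: accuracy minus a small cost per selected gene."""
    return perceptron_accuracy(X, y, mask) - penalty * sum(mask)


def ga_select(X, y, pop_size: int = 20, generations: int = 30,
              p_mut: float = 0.05, seed: int = 0) -> Mask:
    """Evolve binary gene masks (pop_size >= 4) and return the fittest found."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(X, y, m), reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection keeps the elite
        children: List[Mask] = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: fitness(X, y, m))


if __name__ == "__main__":
    # Toy data: gene 0 separates the two classes, gene 1 is noise.
    X = [[1.0, 0.3], [2.0, -0.5], [-1.0, 0.2], [-2.0, 0.9]]
    y = [1, 1, 0, 0]
    print(ga_select(X, y, pop_size=8, generations=10))
```

Training accuracy is used here only to keep the sketch short. On small, high-dimensional microarray samples it rewards overfitting, so a held-out or cross-validated accuracy would be the more careful fitness in practice.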
Readings in Targeted Maximum Likelihood Estimation
This is a compilation of current and past work on targeted maximum likelihood estimation. It features the original targeted maximum likelihood learning paper. It also includes chapters on super (machine) learning using cross-validation, randomized controlled trials, realistic individualized treatment rules in observational studies, biomarker discovery, case-control studies, and time-to-event outcomes with censored data, among others. We hope this collection is helpful to the interested reader and stimulates additional research in this important area.