    Transcription factor binding site prediction with multivariate gene expression data

    Multi-sample microarray experiments have become a standard experimental method for studying biological systems. A frequent goal in such studies is to unravel the regulatory relationships between genes. During the last few years, regression models have been proposed for the de novo discovery of cis-acting regulatory sequences using gene expression data. However, when applied to multi-sample experiments, existing regression-based methods model each individual sample separately. To better capture the dynamic relationships in multi-sample microarray experiments, we propose a flexible method for the joint modeling of promoter sequence and multivariate expression data. In higher-order eukaryotic genomes, expression regulation usually involves combinatorial interaction between several transcription factors. Experiments have shown that the spacing between transcription factor binding sites can significantly affect their strength in activating gene expression. We propose an adaptive model-building procedure to capture such spacing-dependent cis-acting regulatory modules. We apply our methods to the analysis of microarray time-course experiments in yeast and in Arabidopsis. These experiments exhibit very different dynamic temporal relationships. For both data sets, we found all of the well-known cis-acting regulatory elements in the related context and were also able to predict novel elements. Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/07-AOAS142.
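    The regression idea this abstract builds on can be illustrated with a minimal sketch: count candidate motifs in each promoter and regress expression on those counts. This is only the baseline setup, not the paper's joint model. With a shared design matrix, ordinary least squares gives the same coefficients as fitting each sample separately, which is the limitation the abstract sets out to address. The function names, the toy sequences and the motif "ACGT" are all hypothetical.

```python
import numpy as np

def count_motif(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)

def fit_motif_regression(promoters, motifs, expression):
    """Least-squares fit of expression (genes x samples) on motif counts.

    Returns coefficients of shape (1 + n_motifs, n_samples); row 0 is the
    intercept. Plain OLS treats each sample (column) independently, so this
    is the per-sample baseline, not a joint model.
    """
    counts = np.array(
        [[count_motif(s, m) for m in motifs] for s in promoters], dtype=float
    )
    X = np.column_stack([np.ones(len(promoters)), counts])
    coef, *_ = np.linalg.lstsq(X, expression, rcond=None)
    return coef

# Toy data (hypothetical): four promoters, one motif, two samples.
promoters = ["ACGTACGT", "TTTTTTTT", "ACGTTTTT", "GGGGACGT"]
motifs = ["ACGT"]
# Expression built as 1 + 2*count (sample 1) and 3 - count (sample 2).
expression = np.array([[5.0, 1.0], [1.0, 3.0], [3.0, 2.0], [3.0, 2.0]])
coef = fit_motif_regression(promoters, motifs, expression)
print(coef)
```

    A joint model in the sense of the abstract would have to couple the columns of the coefficient matrix, for example through shared temporal structure. The abstract does not specify how, so that step is left out here.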

    Novel strategies for control of fermentation processes

    Genetic algorithm-neural network: feature extraction for bioinformatics data.

    With the advance of gene expression data in the bioinformatics field, the questions that frequently arise, for both computer and medical scientists, are which genes are significantly involved in discriminating cancer classes and which genes are significant with respect to a specific cancer pathology. Numerous computational analysis models have been developed to identify informative genes from microarray data; however, the integrity of the reported genes is still uncertain. This is mainly due to misconceptions about the objectives of microarray studies. Furthermore, the application of various preprocessing techniques to microarray data has jeopardised its quality. As a result, the integrity of the findings has been compromised by the improper use of techniques and the ill-conceived objectives of the study. This research proposes an innovative hybridised model based on genetic algorithms (GAs) and artificial neural networks (ANNs) to extract the highly differentially expressed genes for a specific cancer pathology. The proposed method can efficiently extract the informative genes from the original data set, which reduces the gene variability errors incurred by the preprocessing techniques. The novelty of the research comes from two perspectives. Firstly, the research emphasises extracting informative features from a high-dimensional and highly complex data set, rather than improving classification results. Secondly, it uses an ANN to compute the fitness function of the GA, which is rare in the context of feature extraction. Two benchmark microarray data sets have been used to study the prominent genes expressed in tumour development, and the results show that the genes respond to different stages of tumourigenesis (i.e. different fitness precision levels), which may be useful for early malignancy detection. The extraction ability of the proposed model is validated against the expected results in synthetic data sets. In addition, two bioassay data sets have been used to examine the efficiency of the proposed model in extracting significant features from large, imbalanced bioassay data with multiple data representations.
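    The GA-driven feature extraction described above can be sketched as a search over boolean feature masks. This is a minimal illustration under stated assumptions, not the thesis's method: the fitness here is a nearest-centroid training accuracy with a small size penalty, swapped in for the ANN-based fitness the abstract describes, and all names, parameters and the synthetic data are hypothetical.

```python
import numpy as np

def fitness(mask, X, y, penalty=0.01):
    """Nearest-centroid training accuracy minus a per-feature penalty.

    Assumption: a simple stand-in for the ANN fitness in the abstract.
    """
    if not mask.any():
        return -np.inf  # an empty feature subset cannot classify
    Xs = X[:, mask]
    c0 = Xs[y == 0].mean(axis=0)
    c1 = Xs[y == 1].mean(axis=0)
    d0 = ((Xs - c0) ** 2).sum(axis=1)
    d1 = ((Xs - c1) ** 2).sum(axis=1)
    pred = (d1 < d0).astype(int)
    return (pred == y).mean() - penalty * mask.sum()

def ga_select(X, y, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Evolve boolean feature masks; returns (best_mask, best_fitness)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]  # requires at least two features
    pop = rng.random((pop_size, n)) < 0.5
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        children = [pop[scores.argmax()].copy()]  # elitism: keep the best
        while len(children) < pop_size:
            parents = []
            for _ in range(2):  # binary tournament selection
                a, b = rng.integers(pop_size, size=2)
                parents.append(pop[a] if scores[a] >= scores[b] else pop[b])
            cut = rng.integers(1, n)  # one-point crossover
            child = np.concatenate([parents[0][:cut], parents[1][cut:]])
            child ^= rng.random(n) < p_mut  # bit-flip mutation
            children.append(child)
        pop = np.array(children)
    scores = np.array([fitness(ind, X, y) for ind in pop])
    return pop[scores.argmax()], scores.max()

# Synthetic check (hypothetical data): only feature 0 separates the classes.
data_rng = np.random.default_rng(1)
y = np.array([0, 1] * 20)
X = data_rng.normal(size=(40, 6))
X[:, 0] = 10.0 * y
best_mask, best_score = ga_select(X, y)
print(best_mask, best_score)
```

    Scoring on the training data keeps the sketch short. A realistic setup would evaluate each mask on held-out samples to avoid rewarding overfitted subsets.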

    MAPK related phenotypic decisions in yeast

    The evolutionarily conserved mitogen-activated protein kinase (MAPK) network is a signalling module that enables the coordination and processing of various extracellular stimuli. It thereby guarantees a specific biological response to a precise dose of a given stimulus. The haploid yeast Saccharomyces cerevisiae uses this network to select particular mating partners by quantitatively interpreting the pheromone concentration gradient generated by potential mates. Activation of the mating MAPK module occurs only above a certain pheromone concentration threshold and relies on the pheromone-induced recruitment of a protein complex consisting of the scaffold Ste5 and the MAPKs Ste11, Ste7 and Fus3. This module ensures a robust morphological response, in the form of a mating projection, to a defined pheromone concentration and allows cells to gauge the distance to a potential mating partner. Yet it remains unclear which of the module's features interpret the pheromone concentration to decide when and where to generate a mating projection. To infer the network structure of the mating MAPK module, we developed a reverse engineering approach based on the detection of pheromone-response-dependent changes in protein complex abundances. Interactions within the MAPK module were measured by fluorescence correlation spectroscopy (FCS), from which all possible protein species in the MAPK module were resolved by applying a linear regression analysis (LRA). Using this approach, we were able to identify a cytosolic kinase-substrate interaction between Fus3 and the upstream Ste11, which constitutes a hitherto uncharacterized negative feedback. It affects the readout of the pheromone gradient and provides robustness to changes in the components involved in the loop. This negative feedback occurs by phosphorylation of S243 on Ste11, which hinders its binding to the scaffold Ste5 and thereby uncouples Ste11 from Fus3 activity. Controlling this mechanism provides ultrasensitivity at the first step of the MAPK cascade and, as part of the hierarchical cascade arrangement, ensures a switch-like mating response and triggers shmoo formation at the right distance to a partner. This cytoplasmic feedback has a spatial component that confines the cytoplasmic Fus3 phosphorylation gradient. It thereby generates and maintains a localized source of active Fus3 at the mating tip, which in turn spatially restricts shmoo formation. This work shows how a network motif in the MAPK module enables the interpretation of the pheromone concentration gradient to sense potential mates, and how this extracellular gradient is translated into an intracellular activity gradient by spatial control of signalling, ultimately deciding both when and where to respond.

    Readings in Targeted Maximum Likelihood Estimation

    This is a compilation of current and past work on targeted maximum likelihood estimation. It features the original targeted maximum likelihood learning paper as well as chapters on super (machine) learning using cross-validation, randomized controlled trials, realistic individualized treatment rules in observational studies, biomarker discovery, case-control studies, and time-to-event outcomes with censored data, among others. We hope this collection is helpful to the interested reader and stimulates additional research in this important area.