
    Evolving DNA motifs to predict GeneChip probe performance

    Background: Affymetrix High Density Oligonucleotide Arrays (HDONA) simultaneously measure expression of thousands of genes using millions of probes. We use correlations between measurements for the same gene across 6685 human tissue samples from NCBI's GEO database to indicate the quality of individual HG-U133A probes. Low correlation indicates a poor probe. Results: Regular expressions can be automatically created from a Backus-Naur form (BNF) context-free grammar using strongly typed genetic programming. Conclusion: The automatically produced motif is better at predicting poor DNA sequences than an existing human-generated RE, suggesting runs of Cytosine and Guanine and mixtures should all be avoided. © 2009 Langdon and Harrison; licensee BioMed Central Ltd
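The conclusion of this abstract lends itself to a small illustration. The following is a minimal sketch, assuming a hypothetical pattern (`[CG]{4,}`, four or more consecutive C or G bases in any mixture) and made-up probe sequences. It is not the motif evolved in the paper, only an example of screening probe sequences with a regular expression.

```python
import re

# Hypothetical motif: four or more consecutive C or G bases (any mixture).
# The run length of 4 is an assumption chosen for illustration only.
POOR_PROBE_MOTIF = re.compile(r"[CG]{4,}")


def flag_poor_probes(probes):
    """Return the probe sequences that contain the illustrative motif."""
    return [seq for seq in probes if POOR_PROBE_MOTIF.search(seq)]


# Made-up probe sequences, not taken from any real array design.
example_probes = [
    "ATATTAGCATTACGATTAGCATATA",  # longest C/G run has length 2
    "ATTAGGGGCATTACATTAGCATATA",  # contains the run GGGGC
    "ATTACCGGATTACATTAGCATATAT",  # contains the mixed run CCGG
]

flagged = flag_poor_probes(example_probes)
for seq in flagged:
    print("flagged:", seq)
```

A character class is used rather than separate `C{4,}` and `G{4,}` alternatives so that mixed runs are caught as well, in line with the abstract's remark about mixtures.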

    Automated DNA Motif Discovery

    Ensembl's human non-coding and protein coding genes are used to automatically find DNA pattern motifs. The Backus-Naur form (BNF) grammar for regular expressions (RE) is used by genetic programming to ensure the generated strings are legal. The evolved motif suggests that the presence of Thymine followed by one or more Adenines etc. early in transcripts indicates a non-protein coding gene. Keywords: pseudogene, short and microRNAs, non-coding transcripts, systems biology, machine learning, Bioinformatics, motif, regular expression, strongly typed genetic programming, context-free grammar. Comment: 12 pages, 2 figures
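The idea of using a grammar to keep generated regular expressions legal can be sketched as follows. This is a toy example under stated assumptions: the grammar `GRAMMAR` and the helper `derive` are hypothetical and far smaller than anything a genetic programming system would use, and random derivation stands in for evolution. The point is only that every string derived from the grammar is a syntactically valid RE.

```python
import random
import re

# Toy BNF-style grammar (an assumption for illustration, not the paper's grammar).
# Each non-terminal maps to a list of productions; the first production of each
# non-terminal is non-recursive, which is what guarantees termination below.
GRAMMAR = {
    "<re>": [["<elem>"], ["<elem>", "<re>"]],
    "<elem>": [["<atom>"], ["<atom>", "<quant>"]],
    "<atom>": [["<base>"], ["[", "<base>", "<base>", "]"]],
    "<base>": [["A"], ["C"], ["G"], ["T"]],
    "<quant>": [["+"], ["*"], ["?"]],
}


def derive(symbol, rng, depth=0, max_depth=8):
    """Expand a grammar symbol into a string by random derivation."""
    if symbol not in GRAMMAR:
        return symbol  # terminal symbol: emit as-is
    productions = GRAMMAR[symbol]
    if depth >= max_depth:
        productions = productions[:1]  # force the non-recursive production
    chosen = rng.choice(productions)
    return "".join(derive(s, rng, depth + 1, max_depth) for s in chosen)


# Derive a few motifs with fixed seeds so the run is repeatable.
motifs = [derive("<re>", random.Random(seed)) for seed in range(5)]
for motif in motifs:
    re.compile(motif)  # raises re.error if the string were not a legal RE
    print(motif)
```

Because strings are only ever built from grammar productions, no separate validity check is needed in principle; the `re.compile` call is there purely to demonstrate the property.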

    Gene Expression : From Microarrays to Functional Genomics

    The time of the large sequencing projects has enabled unprecedented possibilities of investigating more complex aspects of living organisms. Among the high-throughput technologies based on the genomic sequences, the DNA microarrays are widely used for many purposes, including the measurement of the relative quantity of the messenger RNAs. However, the reliability of microarrays has been strongly doubted as robust analysis of the complex microarray output data has been developed only after the technology had already been spread in the community. An objective of this study consisted of increasing the performance of microarrays, and was measured by the successful validation of the results by independent techniques. To this end, emphasis has been given to the possibility of selecting candidate genes with remarkable biological significance within specific experimental design. Along with literature evidence, the re-annotation of the probes and model-based normalization algorithms were found to be beneficial when analyzing Affymetrix GeneChip data. Typically, the analysis of microarrays aims at selecting genes whose expression is significantly different in different conditions followed by grouping them in functional categories, enabling a biological interpretation of the results. Another approach investigates the global differences in the expression of functionally related groups of genes. Here, this technique has been effective in discovering patterns related to temporal changes during infection of human cells. Another aspect explored in this thesis is related to the possibility of combining independent gene expression data for creating a catalog of genes that are selectively expressed in healthy human tissues. Not all the genes present in human cells are active; some involved in basic activities (named housekeeping genes) are expressed ubiquitously. 
Other genes (named tissue-selective genes) provide more specific functions and they are expressed preferably in certain cell types or tissues. Defining the tissue-selective genes is also important as these genes can cause disease with phenotype in the tissues where they are expressed. The hypothesis that gene expression could be used as a measure of the relatedness of the tissues has been also proved. Microarray experiments provide long lists of candidate genes that are often difficult to interpret and prioritize. Extending the power of microarray results is possible by inferring the relationships of genes under certain conditions. Gene transcription is constantly regulated by the coordinated binding of proteins, named transcription factors, to specific portions of the promoter sequence. In this study, the analysis of promoters from groups of candidate genes has been utilized for predicting gene networks and highlighting modules of transcription factors playing a central role in the regulation of their transcription. Specific modules have been found regulating the expression of genes selectively expressed in the hippocampus, an area of the brain having a central role in the Major Depression Disorder. Similarly, gene networks derived from microarray results have elucidated aspects of the development of the mesencephalon, another region of the brain involved in Parkinson Disease.

    Splicing factor ESRP1 controls ER-positive breast cancer by altering metabolic pathways

    The epithelial splicing regulatory proteins 1 and 2 (ESRP1 and ESRP2) control the epithelial-to-mesenchymal transition (EMT) splicing program in cancer. However, their role in breast cancer recurrence is unclear. In this study, we report that high levels of ESRP1, but not ESRP2, are associated with poor prognosis in estrogen receptor positive (ER+) breast tumors. Knockdown of ESRP1 in endocrine-resistant breast cancer models decreases growth significantly and alters the EMT splicing signature, which we confirm using TCGA SpliceSeq data of ER+ BRCA tumors. However, these changes are not accompanied by the development of a mesenchymal phenotype or a change in key EMT-transcription factors. In tamoxifen-resistant cells, knockdown of ESRP1 affects lipid metabolism and oxidoreductase processes, resulting in the decreased expression of fatty acid synthase (FASN), stearoyl-CoA desaturase 1 (SCD1), and phosphoglycerate dehydrogenase (PHGDH) at both the mRNA and protein levels. Furthermore, ESRP1 knockdown increases the basal respiration and spare respiration capacity. This study reports a novel role for ESRP1 that could form the basis for the prevention of tamoxifen resistance in ER+ breast cancer

    Gene Expression Divergence between Subspecies of the House Mouse and the Contribution to Reproductive Isolation

    Under the Biological Species Concept species are groups of interbreeding individuals that are reproductively isolated from other such groups. A common form of isolation in animals is intrinsic postzygotic isolation via hybrid sterility/inviability. There is a strong consensus that hybrid dysfunctions are caused by epistatic interactions between incompatible alleles from different loci (Dobzhansky-Muller incompatibilities). The identification of genes that contribute to reproductive isolation between taxa is critical to the understanding of the process of speciation, but identifying such genes has proven to be difficult. It appears that regulatory evolution might play an important role in postzygotic isolation and the formation of species. The present study employs a whole genome microarray approach to identify genes with regulatory differences between three subspecies of the house mouse, Mus musculus musculus, M. m. domesticus and M. m. castaneus. The within-locus mode of inheritance for gene expression was assessed for three different tissues (brain, liver and testis) by studying the subspecies and their male reciprocal F1 hybrids. The vast majority of transcripts are additively expressed in the hybrids with only few transcripts showing dominance or overdominance in expression except for one direction of one cross, which shows large misexpression in the testis. The reliability of the observed pattern was ensured by three different analysis methods as well as control experiments. The results suggest that additivity is the general mode of inheritance regarding gene expression changes between house mouse subspecies. Differentially expressed transcripts provide promising candidate genes that could be related to reproductive isolation through regulatory incompatibilities. Several transcripts with expression differences between M. m. musculus and M. m. domesticus were selected for further investigation. 
A validation approach using quantitative Real-Time PCR strongly emphasizes the need for confirmation of microarray candidate genes. The results show that sequence differences even between closely related taxa have the potential to influence expression data from both microarray and follow-up validation approaches. As divergent gene expression evolution between taxa may be entirely neutral, samples from a transect of the natural musculus-domesticus hybrid zone in Bavaria were analyzed in order to assess functional consequences for two candidate genes. Both genes show large expression differences between the subspecies. The analysis revealed that it is unlikely that the two genes contribute to reproductive isolation between the subspecies as no sign of limited introgression is evident. Rather, the hybrid zone approach in combination with population genetic analyses suggests adaptive introgression of those alleles that are associated with high expression levels. In both cases, the high expression phenotype represents the derived state and is associated with reduced levels of nucleotide polymorphism and a negative Tajima's D. For both genes, regulatory and protein-coding evolution is decoupled and the expression difference results from cis- rather than trans-acting changes

    Comparative transcriptomics in plants

    Comparative genomics is the study of the structural and functional relationships between the genomes of different species or strains. Recently microarray experiments have yielded massive amounts of expression information for many genes under various conditions or in different tissues for different model species. Expression compendia grouping multiple microarray experiments performed in similar (or different) experimental conditions make it possible to define correlated expression patterns between genes. Genes within such a coexpression cluster are expected to have more similar functionality compared to genes lacking expression similarity. In this thesis the different steps required to systematically compare expression data across species are described and some future applications of plant comparative transcriptomics are highlighted. Then we analyzed if functionally related genes show coexpression in Arabidopsis and rice and developed a general framework to measure expression context conservation (ECC) for orthologous genes. Additionally, we studied the evolutionary parameters influencing ECC conservation and compared expression with sequence evolution. At the end, a new method is presented to define high quality tissue specific genes in seven different plant species: A. thaliana (Arabidopsis), Z. mays (Maize), M. truncatula (Medicago), P. trichocarpa (Poplar), O. sativa (Rice), G. max (Soybean) and V. vinifera (Grape) using Affymetrix microarray expression profiles. We also performed an in-depth study on the relationship between leaf tissue specific genes coexpression clusters, within a species and in comparison with other species for a set of strictly selected genes
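Correlated expression patterns of the kind described in this abstract are commonly quantified with a correlation coefficient. The sketch below assumes Pearson correlation, which the abstract does not name, and uses made-up expression profiles for three hypothetical genes measured across four conditions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Made-up expression values across four conditions (illustrative only).
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]  # rises with gene_a
gene_c = [4.0, 3.0, 2.0, 1.0]  # falls as gene_a rises

r_ab = pearson(gene_a, gene_b)
r_ac = pearson(gene_a, gene_c)
print(r_ab, r_ac)
```

Under this measure, genes with a coefficient close to 1 across a compendium would be candidates for the same coexpression cluster; the cutoff for "close" is a design choice that the abstract does not specify.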

    Microarray tools and analysis methods to better characterize biological networks

    To accurately model a biological system (e.g. cell), we first need to characterize each of its distinct networks. While omics data has given us unprecedented insight into the structure and dynamics of these networks, the associated analysis routines are more involved and the accuracy and precision of the experimental technologies not sufficiently examined. The main focus of our research has been to develop methods and tools to better manage and interpret microarray data. How can we improve methods to store and retrieve microarray data from a relational database? What experimental and biological factors most influence our interpretation of a microarray's measurements? By accounting for these factors, can we improve the accuracy and precision of microarray measurements? It's essential to address these last two questions before using omics data for downstream analyses, such as inferring transcription regulatory networks from microarray data. While answers to such questions are vital to microarray research in particular, they are equally relevant to systems biology in general. We developed three studies to investigate aspects of these questions when using Affymetrix expression arrays. In the first study, we develop the Data-FATE framework to improve the handling of large scientific data sets. In the next two studies, we developed methods and tools that allow us to examine the impact of physical and technical factors known or suspected to dramatically alter the interpretation of a microarray experiment. In the second study, we develop ArrayInitiative -- a tool that simplifies the process of creating custom CDFs -- so that we can easily re-design the array specifications for Affymetrix 3' IVT expression arrays. This tool is essential for testing the impact of the various factors, and for making the framework easy to communicate and re-use. We then use ArrayInitiative in a case study to illustrate the impact of several factors known to distort microarray signals. 
In the third study, we systematically and exhaustively examine the effect of physical and technical factors -- both generally accepted and novel -- on our interpretation of dozens of experiments using hundreds of E. coli Affymetrix microarrays

    Integrative methods for reconstruction of dynamic networks in chondrogenesis

    Application of human mesenchymal stem cells represents a promising approach in the field of regenerative medicine. Specific stimulation can give rise to chondrocytes, osteocytes or adipocytes. Investigation of the underlying biological processes which induce the observed cellular differentiation is essential to efficiently generate specific tissues for therapeutic purposes. Upon treatment with diverse stimuli, gene expression levels of cultivated human mesenchymal stem cells were monitored using time series microarray experiments for the three lineages. Application of gene network inference is a common approach to identify the regulatory dependencies among a set of investigated genes. This thesis applies the NetGenerator V2.0 tool, which is capable of dealing with multiple time series datasets that investigate the effects of multiple external stimuli. The applied model is based on a system of linear ordinary differential equations, whose parameters are optimised to reproduce the given time series datasets. Several procedures in the inference process were adapted in this new version in order to allow for the integration of multiple datasets. Network inference was applied on in silico network examples as well as on multi-experiment microarray data of mesenchymal stem cells. The resulting chondrogenesis model was evaluated on the basis of several features including the model adaptation to the data, total number of connections, proportion of connections associated with prior knowledge and the model stability in a resampling procedure. Altogether, NetGenerator V2.0 has provided an automatic and efficient way to integrate experimental datasets and to enhance the interpretability and reliability of the resulting network. In a second chondrogenesis model, the miRNA and mRNA time series data were integrated for the purpose of network inference. One hypothesis of the model was verified by experiments, which demonstrated the negative effect of miR-524-5p on downstream genes
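The model class mentioned in this abstract, a system of linear ordinary differential equations driven by external stimuli, can be written as dx/dt = A x + B u. The sketch below only simulates such a system forward in time with a fixed-step Euler scheme. The two-gene network, its coefficients, and the function name `simulate` are assumptions made for illustration; this is not NetGenerator and it performs no parameter optimisation.

```python
def simulate(A, B, u, x0, dt, steps):
    """Integrate dx/dt = A x + B u with the explicit Euler method."""
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        dx = [
            sum(A[i][j] * x[j] for j in range(n))
            + sum(B[i][k] * u[k] for k in range(len(u)))
            for i in range(n)
        ]
        x = [x[i] + dt * dx[i] for i in range(n)]
    return x


# Hypothetical network: the stimulus activates gene 1, gene 1 activates gene 2,
# and both genes decay at unit rate. All coefficients are made up.
A = [[-1.0, 0.0],
     [1.0, -1.0]]
B = [[1.0],
     [0.0]]

final = simulate(A, B, u=[1.0], x0=[0.0, 0.0], dt=0.01, steps=2000)
print(final)  # both values approach the steady state of 1.0
```

In an inference setting, the entries of `A` and `B` would be the unknowns, fitted so that the simulated trajectories reproduce the measured time series.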

    Complex genetic approaches to neurodegenerative diseases.

    Neurodegenerative diseases are fatal disorders in which disease pathogenesis results in the progressive degeneration of the central and/or the peripheral nervous systems. These diseases currently affect ~2% of the population but are expected to increase in prevalence as average life expectancy increases. The majority of these diseases have a complex genetic basis. The work presented in this thesis aimed to investigate the genetic basis of two neurodegenerative diseases, amyotrophic lateral sclerosis (ALS) and the human prion diseases kuru and sporadic Creutzfeldt-Jakob disease (sCJD), using novel complex genetic approaches. ALS is a fatal neurodegenerative disease in which motor neurons are seen to degenerate. It is a complex disease with 10% of individuals having a family history and the remaining 90% of non-familial cases having some genetic component. The gene DYNC1H1 is involved in retrograde axonal transport and is a good candidate for ALS. In this thesis the genetic architecture of DYNC1H1 was elucidated and a mutation screen of exons 8, 13 and 14 was undertaken in familial forms of ALS and other motor neuron diseases. No mutations were found. A linkage disequilibrium (LD) based association study was conducted using two tagging single nucleotide polymorphisms (tSNPs) which were identified as sufficient to represent genetic variation across DYNC1H1. These tSNPs were tested for an association with sporadic ALS (SALS) in 261 cases and 225 matched controls but no association was identified. Kuru is a devastating epidemic prion disease which affected a highly geographically restricted area of the Papua New Guinea highlands, predominantly affecting adult women and children. Its incidence has steadily declined since the cessation of its route of transmission, endocannibalism, in the late 1950's. Kuru imposed strong balancing selection on codon 129 of the prion gene (PRNP). 
Analysis of kuru-exposed and unexposed populations showed significant deviations from Hardy-Weinberg equilibrium (HWE) consistent with the known protective effect of codon 129 heterozygosity. Signatures of selection were investigated in the surviving populations, such as deviations from HWE and an increasing cline in codon 129 valine allele frequency, which covaried with disease exposure. A novel PRNP G127V polymorphism was detected which, while common in the area of highest kuru incidence, was absent from kuru patients and unexposed population groups. Genealogical analysis revealed that the heterozygous PRNP G127V genotype confers strong prion disease resistance, which has been selected by the kuru epidemic. Finally, PRNP copy number was investigated as a possible genetic mechanism for susceptibility to kuru and sCJD. No conclusive copy number changes were identified
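A deviation from Hardy-Weinberg equilibrium such as the one this abstract describes can be checked with a chi-square goodness-of-fit test on genotype counts. The sketch below is a minimal illustration: the function name `hwe_chi_square` is hypothetical and the genotype counts are made up, not taken from the thesis.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for departure from Hardy-Weinberg proportions.

    n_aa and n_bb are the two homozygote counts, n_ab the heterozygote count.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype must be observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Made-up counts showing a strong excess of heterozygotes.
chi2 = hwe_chi_square(n_aa=10, n_ab=80, n_bb=10)
print(chi2)  # 36.0

# With one degree of freedom, values above about 3.84 are significant
# at the 5% level.
```

Here the allele frequencies are both 0.5, so 25, 50 and 25 individuals are expected per genotype; the observed excess of heterozygotes is the pattern one would anticipate if heterozygosity were protective.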