31 research outputs found

    Vertebrate conserved non coding DNA regions have a high persistence length and a short persistence time

    BACKGROUND: The comparison of complete genomes has revealed surprisingly large numbers of conserved non-protein-coding (CNC) DNA regions. However, the biological function of CNC remains elusive. CNC differ in two aspects from conserved protein-coding regions. They are not conserved across phylum boundaries, and they do not contain readily detectable sub-domains. Here we characterize the persistence length and time of CNC and conserved protein-coding regions in the vertebrate and insect lineages. RESULTS: The persistence length is the length of a genome region over which a certain level of sequence identity is consistently maintained. The persistence time is the evolutionary period during which a conserved region evolves under the same selective constraints. Our main findings are: (i) Insect genomes contain 1.60 times less conserved information than vertebrates; (ii) Vertebrate CNC have a higher persistence length than conserved coding regions or insect CNC; (iii) CNC have shorter persistence times as compared to conserved coding regions in both lineages. CONCLUSION: Higher persistence length of vertebrate CNC indicates that the conserved information in vertebrates and insects is organized in functional elements of different lengths. These findings might be related to the higher morphological complexity of vertebrates and give clues about the structure of active CNC elements. Shorter persistence time might explain the previously puzzling observations of highly conserved CNC within each phylum, and of a lack of conservation between phyla. It suggests that CNC divergence might be a key factor in vertebrate evolution. Further evolutionary studies will help to relate individual CNC to specific developmental processes

    Widespread Polymorphism in the Positions of Stop Codons in Drosophila melanogaster

    The mechanisms underlying evolutionary changes in protein length are poorly understood. Protein domains are lost and gained between species and must have arisen first as within-species polymorphisms. Here, we use Drosophila melanogaster population genomic data combined with between species divergence information to understand the evolutionary forces that generate and maintain polymorphisms causing changes in protein length in D. melanogaster. Specifically, we looked for protein length variations resulting from premature termination codons (PTCs) and stop codon losses (SCLs). We discovered that 438 genes contained polymorphisms resulting in truncation of the translated region (PTCs) and 119 genes contained polymorphisms predicted to lengthen the translated region (SCLs). Stop codon polymorphisms (SCPs) (especially PTCs) appear to be more deleterious than other polymorphisms, including protein amino acid changes. Genes harboring SCPs are in general less selectively constrained, more narrowly expressed, and enriched for dispensable biological functions. However, we also observed exceptional cases such as genes that have multiple independent SCPs, alleles that are shared between D. melanogaster and Drosophila simulans, and high-frequency alleles that cause extreme changes in gene length. SCPs likely have an important role in the evolution of these genes
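
A minimal sketch of how the two classes of stop codon polymorphism named in this abstract could be told apart, assuming each allele is available as an in-frame coding sequence. The function names `first_stop_index` and `classify_allele` are hypothetical and are not taken from the paper; the study's actual pipeline is not described here.

```python
# Standard DNA stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_index(cds):
    """Return the codon index of the first in-frame stop codon, or None."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def classify_allele(ref_cds, alt_cds):
    """Classify an alternate allele relative to the reference (hypothetical helper).

    "PTC": the first stop occurs earlier than in the reference.
    "SCL": the reference stop is lost, so translation would run further.
    "unchanged": the first stop is at the same codon position.
    """
    ref_stop = first_stop_index(ref_cds)
    alt_stop = first_stop_index(alt_cds)
    if ref_stop is None:
        raise ValueError("reference CDS has no in-frame stop codon")
    if alt_stop is None or alt_stop > ref_stop:
        return "SCL"
    if alt_stop < ref_stop:
        return "PTC"
    return "unchanged"
```

The sketch handles substitutions only; frameshifting indels would need the alternate sequence re-translated from the point of the change.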

    Context Differences Reveal Insulator and Activator Functions of a Su(Hw) Binding Region

    Insulators are DNA elements that divide chromosomes into independent transcriptional domains. The Drosophila genome contains hundreds of binding sites for the Suppressor of Hairy-wing [Su(Hw)] insulator protein, corresponding to locations of the retroviral gypsy insulator and non-gypsy binding regions (BRs). The first non-gypsy BR identified, 1A-2, resides in cytological region 1A. Using a quantitative transgene system, we show that 1A-2 is a composite insulator containing enhancer blocking and facilitator elements. We discovered that 1A-2 separates the yellow (y) gene from a previously unannotated, non-coding RNA gene, named yar for y-achaete (ac) intergenic RNA. The role of 1A-2 was elucidated using homologous recombination to excise these sequences from the natural location, representing the first deletion of any Su(Hw) BR in the genome. Loss of 1A-2 reduced yar RNA accumulation, without affecting mRNA levels from the neighboring y and ac genes. These data indicate that within the 1A region, 1A-2 acts as an activator of yar transcription. Taken together, these studies reveal that the properties of 1A-2 are context-dependent, as this element has both insulator and enhancer activities. These findings imply that the function of non-gypsy Su(Hw) BRs depends on the genomic environment, predicting that Su(Hw) BRs represent a diverse collection of genomic regulatory elements

    Comparative Genomics of the Anopheline Glutathione S-Transferase Epsilon Cluster

    Enzymes of the glutathione S-transferase (GST) family play critical roles in detoxification of xenobiotics across many taxa. While GSTs are ubiquitous both in animals and plants, the GST epsilon class (GSTE) is insect-specific and has been associated with resistance to chemical insecticides. While both Aedes aegypti and Anopheles gambiae GSTE clusters consist of eight members, only four putative orthologs are identifiable between the species, suggesting independent expansions of the class in each lineage. We used a primer walking approach, sequencing almost the entire cluster from three Anopheles species (An. stephensi, An. funestus (both Cellia subgenus) and An. plumbeus (Anopheles subgenus)) and compared the sequences to putative orthologs in An. gambiae (Cellia) in an attempt to trace the evolution of the cluster within the subfamily Anophelinae. Furthermore, we measured transcript levels from the identified GSTE loci by real time reverse transcription PCR to determine if all genes were similarly transcribed at different life stages. Among the species investigated, gene order and orientation were similar with three exceptions: (i) GSTE1 was absent in An. plumbeus; (ii) GSTE2 was duplicated in An. plumbeus and (iii) an additional transcriptionally active pseudogene (ψAsGSTE2) was found in An. stephensi. Further statistical analysis and protein modelling gave evidence for positive selection on codons of the catalytic site in GSTE5, although its origin seems to predate the introduction of chemical insecticides. Gene expression profiles revealed differences in expression pattern among genes at different life stages. With the exception of GSTE1, ψAsGSTE2 and GSTE2b, all Anopheles species studied share orthologs and hence we assume that GSTE expansion generally predates radiation into subgenera, though the presence of GSTE1 may also suggest a recent duplication event in the Old World Cellia subgenus, instead of a secondary loss. The modifications of the catalytic site within GSTE5 may represent adaptations to new habitats

    Similarities and differences of polyadenylation signals in human and fly.

    BACKGROUND: Cleavage of messenger RNA (mRNA) precursors is an essential step in mRNA maturation. The signal recognized by the cleavage enzyme complex has been characterized as an A rich region upstream of the cleavage site containing a motif with consensus AAUAAA, followed by a U or UG rich region downstream of the cleavage site. RESULTS: We studied these signals using exhaustive databases of cleavage sites obtained from aligning raw expressed sequence tags (EST) sequences to genomic sequences in Homo sapiens and Drosophila melanogaster. These data show that the polyadenylation signal is highly conserved in human and fly. In addition, de novo motif searches generated a refined description of the U-rich downstream sequence (DSE) element, which shows more divergence between the two species. These refined motifs are applied, within a Hidden Markov Model (HMM) framework, to predict mRNA cleavage sites. CONCLUSION: We demonstrate that the DSE is a specific motif in both human and Drosophila. These findings shed light on the sequence correlates of a highly conserved biological process, and improve in silico prediction of 3' mRNA cleavage and polyadenylation sites
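
As a rough illustration of the upstream signal this abstract describes, the sketch below scans a fixed window upstream of a known cleavage site for the consensus hexamer (AAUAAA, written AATAAA in DNA). It is a plain exact-match scan and not the Hidden Markov Model used in the paper; the function name `find_polya_signals` and the 40-nucleotide default window are assumptions chosen for illustration.

```python
def find_polya_signals(seq, cleavage_pos, window=40, motif="AATAAA"):
    """Return 0-based start positions of `motif` in the `window` nucleotides
    immediately upstream of `cleavage_pos` (hypothetical helper).

    RNA input is accepted: U is converted to T before matching.
    """
    seq = seq.upper().replace("U", "T")
    start = max(0, cleavage_pos - window)
    region = seq[start:cleavage_pos]
    hits = []
    i = region.find(motif)
    while i != -1:
        hits.append(start + i)          # report positions in full-sequence coordinates
        i = region.find(motif, i + 1)   # step by one so overlapping hits are kept
    return hits
```

An exact match misses the hexamer variants that real signals tolerate; a position weight matrix or the HMM approach of the abstract would be the natural next step.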

    A shared promoter region suggests a common ancestor for the human VCX/Y, SPANX, and CSAG gene families and the murine CYPT family

    Many testis-specific genes from the sex chromosomes are subject to rapid evolution, which can make it difficult to identify murine genes in the human genome. The murine CYPT gene family includes 15 members, but orthologs were undetectable in the human genome. However, using refined homology search, sequences corresponding to the shared promoter region of the CYPT family were identified at 39 loci. Most loci were located immediately upstream of genes belonging to the VCX/Y, SPANX, or CSAG gene families. Sequence comparison of the loci revealed a conserved CYPT promoter-like (CPL) element featuring TATA and CCAAT boxes. The expression of members of the three families harboring the CPL resembled the murine expression of the CYPT family, with weak expression in late pachytene spermatocytes and predominant expression in spermatids, but some genes were also weakly expressed in somatic cells and in other germ cell types. The genomic regions harboring the gene families were rich in direct and inverted segmental duplications (SD), which may facilitate gene conversion and rapid evolution. The conserved CPL and the common expression profiles suggest that the human VCX/Y, SPANX, and CSAG2 gene families together with the murine SPANX gene and the CYPT family may share a common ancestor. Finally, we present evidence that VCX/Y and SPANX may be paralogs with a similar protein structure consisting of C terminal acidic repeats of variable lengths

    Asap: A Framework for Over-Representation Statistics for Transcription Factor Binding Sites

    Background: In studies of gene regulation the efficient computational detection of over-represented transcription factor binding sites is an increasingly important aspect. Several published methods can be used for testing whether a set of hypothesised co-regulated genes share a common regulatory regime based on the occurrence of the modelled transcription factor binding sites. However, there is little or no information available for guiding the end user's choice of method. Furthermore, it would be necessary to obtain several different software programs from various sources to make a well-founded choice. Methodology: We introduce a software package, Asap, for fast searching with position weight matrices that includes several standard methods for assessing over-representation. We have compared the ability of these methods to detect over-represented transcription factor binding sites in artificial promoter sequences. Controlling all aspects of our input data, we are able to identify the optimal statistics across multiple threshold values and for sequence sets containing different distributions of transcription factor binding sites. Conclusions: We show that our implementation is significantly faster than more naïve scanning algorithms when searching with many weight matrices in large sequence sets. When comparing the various statistics, we show that those based on binomial over-representation and Fisher's exact test perform almost equally well and better than the others. An onlin
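
To make the binomial over-representation statistic mentioned in this abstract concrete, here is a minimal sketch of a one-sided binomial tail probability. It is not Asap's implementation, which is not shown in this listing; the function name `binomial_overrep_pvalue` and the idea of estimating `p_background` as the fraction of background sequences containing a site are assumptions.

```python
from math import comb


def binomial_overrep_pvalue(hits, trials, p_background):
    """Return P(X >= hits) for X ~ Binomial(trials, p_background).

    hits: number of foreground sequences with at least one predicted site.
    trials: total number of foreground sequences.
    p_background: assumed per-sequence probability of a site under the null.
    """
    if not 0 <= hits <= trials:
        raise ValueError("hits must lie between 0 and trials")
    if not 0.0 <= p_background <= 1.0:
        raise ValueError("p_background must be a probability")
    q = 1.0 - p_background
    # Sum the upper tail of the binomial distribution term by term.
    return sum(
        comb(trials, k) * p_background ** k * q ** (trials - k)
        for k in range(hits, trials + 1)
    )
```

A small value suggests the sites occur in the foreground set more often than the background rate would predict. When many weight matrices are tested at once, a multiple-testing correction would be needed on top of this.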