    Segmentally Variable Genes: A New Perspective on Adaptation

    Genomic sequence variation is the hallmark of life and is key to understanding diversity and adaptation among the numerous microorganisms on Earth. Analysis of the sequenced microbial genomes suggests that genes are evolving at many different rates. We have attempted to derive a new classification of genes into three broad categories: lineage-specific genes that evolve rapidly and appear unique to individual species or strains; highly conserved genes that frequently perform housekeeping functions; and partially variable genes that contain highly variable regions, at least 70 amino acids long, interspersed among well-conserved regions. The latter we term segmentally variable genes (SVGs), and we suggest that they are especially interesting targets for biochemical studies. Among these genes are ones necessary to deal with the environment, including genes involved in host–pathogen interactions, defense mechanisms, and intracellular responses to internal and environmental changes. For the most part, the detailed function of these variable regions remains unknown. We propose that they are likely to perform important binding functions responsible for protein–protein, protein–nucleic acid, or protein–small molecule interactions. Discerning their function and identifying their binding partners may offer biologists new insights into the basic mechanisms of adaptation, context-dependent evolution, and the interaction between microbes and their environment. Segmentally variable genes show a mosaic pattern of one or more rapidly evolving, variable regions. Discerning their function may provide new insights into the forces that shape genome diversity and adaptation. National Science Foundation (998088, 0239435
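
The length criterion in the abstract above (variable regions of at least 70 amino acids lying between well-conserved regions) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `find_variable_regions`, the per-position conservation scores, and the 0.5 cutoff are all assumptions made for illustration. Only the 70-residue minimum comes from the abstract.

```python
from typing import List, Sequence, Tuple


def find_variable_regions(
    conservation: Sequence[float],
    threshold: float = 0.5,   # assumed cutoff; not specified in the abstract
    min_len: int = 70,        # minimum region length stated in the abstract
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges of low-conservation runs.

    `conservation` is a hypothetical per-residue score in [0, 1], where low
    values mean the alignment column is variable. A run is reported only if
    it spans at least `min_len` consecutive positions below `threshold`.
    """
    regions: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(conservation):
        if score < threshold:
            if start is None:
                start = i  # a variable run begins here
        elif start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None  # the run ended at a conserved position
    # Handle a run that extends to the end of the sequence.
    if start is not None and len(conservation) - start >= min_len:
        regions.append((start, len(conservation)))
    return regions


# Toy profile: conserved, an 80-residue variable stretch, conserved,
# a 30-residue variable stretch (too short to count), conserved.
scores = [0.9] * 50 + [0.1] * 80 + [0.9] * 40 + [0.1] * 30 + [0.9] * 20
regions = find_variable_regions(scores)
```

Under these assumptions a gene would count as a candidate SVG when at least one such region is found and conserved positions remain on either side of it.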

    A study of cis-acting elements required for dosage compensation in Drosophila melanogaster : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Genetics at Massey University, Palmerston North, New Zealand

    Dosage compensation (the equalisation of X-linked gene products) occurs in Drosophila melanogaster by a two-fold transcriptional up-regulation of X-linked gene expression in males. This involves the binding of five proteins, MSL-1, MSL-2, MSL-3, MLE, MOF, and potentially an RNA (roX1 or roX2), to hundreds of sites along the male X chromosome. The cis-acting X-linked DNA sequences required for dosage compensation (called dosage compensation regulatory elements or DCREs) remain elusive, despite numerous attempts to identify them. An insulated reporter gene assay system has been developed to minimise problems previously encountered with identification of these elements. The reporter system consists of the constitutive armadillo promoter fused to the lacZ reporter gene (called arm-lacZ). This reporter construct is flanked by SCS/SCS' insulator elements to block potential repressive effects of an autosomal chromatin environment. The role of the roX genes during dosage compensation was investigated. Initially both the roX1 and roX2 RNAs were expressed from within the arm-lacZ insulated system. Expression of either RNA led to a significant increase in lacZ expression in males, although consistently less than two-fold. These results suggested that either the MSL complex was binding to the roX genes or the expression of the roX RNAs in cis led to male-specific hypertranscription of lacZ. To test these possibilities roX1 and roX2 cDNAs were inserted into the arm-lacZ reporter. Insertion of either cDNA led to a significant increase in lacZ expression in males, suggesting that the transcribed regions of the roX genes contain binding site(s) for the MSL complex. Interestingly, the level of lacZ hypertranscription in males was significantly higher in homozygous roX1 cDNA lines than in homozygous roX1 gene lines. This may indicate that too high a local concentration of roX1 RNA has a dampening effect on the level of hypertranscription mediated by the MSL complex. In a set of experiments designed to identify the MSL binding site(s) in roX1, two regions of the cDNA sequence were amplified and inserted into the arm-lacZ system. One of these fragments, containing a proposed DNase I hypersensitivity site and possible GAGA binding sites, increased lacZ expression in males, but to levels lower than the entire cDNA. This suggests there may be more than one MSL binding site in roX1. A second method of dosage compensation is thought to occur in Drosophila, independently of the MSL proteins. The arm-lacZ insulated reporter system was used to investigate the hypothesis that some genes may be dosage compensated due to repression by Sex-lethal (Sxl) in females. Several genes have been found to contain three or more Sxl binding sites in their 3' UTRs, with some also carrying Sxl binding sites in the 5' UTR. Fragments from the Sxl, Cut and Small Forked genes, containing numerous Sxl binding sites from the 3' UTR, were inserted into the 3' UTR region of arm-lacZ. Males carrying autosomal insertions of the construct had on average 1.07 to 1.50 times the level of β-galactosidase in females. This suggests that some genes could be partially compensated through Sxl repression in females. In addition to inserting 3' UTR fragments into arm-lacZ, a synthetic oligonucleotide containing a long Sxl binding site was inserted into the 5' region of an arm-lacZ construct already carrying the Runt 3' UTR fragment. Males carrying autosomal insertions of the construct had levels of β-galactosidase activity similar to those lines carrying autosomal insertions of the 3' UTR fragments alone. This suggests that other factors such as RNA binding proteins or RNA secondary structure may be required in order to obtain efficient translational repression by Sxl. Finally, three X-linked DNA fragments, from the 1C region, were inserted individually between the SCS' element and the armadillo promoter. If the X-linked fragment contained a DCRE then males carrying autosomal insertions of the construct would produce twice the β-galactosidase activity of females. However, males and females expressed the same levels of lacZ

    Entropy-scaling search of massive biological data

    Many datasets exhibit a well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here, we introduce a framework for similarity search based on characterizing a dataset's entropy and fractal dimension. We prove that searching scales in time with metric entropy (number of covering hyperspheres), if the fractal dimension of the dataset is low, and scales in space with the sum of metric entropy and information-theoretic entropy (randomness of the data). Using these ideas, we present accelerated versions of standard tools, with no loss in specificity and little loss in sensitivity, for use in three domains: high-throughput drug screening (Ammolite, 150x speedup), metagenomics (MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search (esFragBag, 10x speedup of FragBag). Our framework can be used to achieve "compressive omics," and the general theory can be readily applied to data science problems outside of biology. Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo
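
The idea of covering a dataset with hyperspheres and searching the cover first can be sketched as a two-stage range search. This is an illustrative sketch only, not the code of Ammolite, MICA, or esFragBag: the function names `build_cover` and `range_search`, the greedy clustering, and the Euclidean metric are assumptions. The coarse step relies on the triangle inequality: any point within `radius` of the query belongs to a cluster whose center is within `radius + cluster_radius` of the query, so skipping the other clusters loses no hits.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Cluster = Tuple[Point, List[Point]]  # (center, members)


def build_cover(points: Sequence[Point], cluster_radius: float) -> List[Cluster]:
    """Greedily cover the data with spheres of radius `cluster_radius`.

    Each point joins the first existing center within `cluster_radius`;
    otherwise it becomes the center of a new cluster.
    """
    clusters: List[Cluster] = []
    for p in points:
        for center, members in clusters:
            if math.dist(p, center) <= cluster_radius:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return clusters


def range_search(
    query: Point, radius: float, clusters: List[Cluster], cluster_radius: float
) -> List[Point]:
    """Return every stored point within `radius` of `query`."""
    hits: List[Point] = []
    for center, members in clusters:
        # Coarse step: discard whole clusters that cannot contain a hit.
        if math.dist(query, center) <= radius + cluster_radius:
            # Fine step: exact distance check inside the surviving cluster.
            hits.extend(m for m in members if math.dist(query, m) <= radius)
    return hits


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
clusters = build_cover(points, cluster_radius=1.0)
hits = range_search((0.0, 0.05), 0.5, clusters, cluster_radius=1.0)
```

In this sketch the coarse step costs one distance computation per cluster, which is why search time tracks the number of covering spheres rather than the number of points when the data are highly clustered.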

    Homology modelling of transferrin-binding protein A from Neisseria meningitidis

    Neisseria meningitidis, a causative agent of bacterial meningitis, obtains transferrin-bound iron by expressing two outer-membrane-located transferrin-binding proteins, TbpA and TbpB. TbpA is thought to be an integral outer membrane pore that facilitates iron uptake. Evidence suggests that TbpA is a useful antigen for inclusion in a vaccine effective against meningococcal disease; hence, the identification of regions involved in ligand binding is of paramount importance to design strategies to block uptake of iron. The protein shares sequence and functional similarities with the Escherichia coli siderophore receptors FepA and FhuA, whose structures have been determined. These receptors are composed of two domains, a 22-stranded β-barrel and an N-terminal plug region that sits within the barrel and occludes the transmembrane pore. A three-dimensional TbpA model was constructed using FepA and FhuA structural templates, hydrophobicity analysis and homology modelling. TbpA was found to possess a similar architecture to the siderophore receptors. In addition to providing insights into the highly immunogenic nature of TbpA and allowing the prediction of potentially important ligand-binding epitopes, the model also reveals a narrow channel through its entire length. The relevance of this channel, and of the spatial arrangement of external loops, to the mechanism of iron translocation employed by TbpA is discussed

    Application of protein structure alignments to iterated hidden Markov model protocols for structure prediction.

    Background: One of the most powerful methods for the prediction of protein structure from sequence information alone is the iterative construction of profile-type models. Because profiles are built from sequence alignments, the sequences included in the alignment and the method used to align them will be important to the sensitivity of the resulting profile. The inclusion of highly diverse sequences will presumably produce a more powerful profile, but distantly related sequences can be difficult to align accurately using only sequence information. Therefore, it would be expected that the use of protein structure alignments to improve the selection and alignment of diverse sequence homologs might yield improved profiles. However, the actual utility of such an approach has remained unclear. Results: We explored several iterative protocols for the generation of profile hidden Markov models. These protocols were tailored to allow the inclusion of protein structure alignments in the process, and were used for large-scale creation and benchmarking of structure alignment-enhanced models. We found that models using structure alignments did not provide an overall improvement over sequence-only models for superfamily-level structure predictions. However, the results also revealed that the structure alignment-enhanced models were complementary to the sequence-only models, particularly at the edge of the "twilight zone". When the two sets of models were combined, they provided improved results over sequence-only models alone. In addition, we found that the beneficial effects of the structure alignment-enhanced models could not be realized if the structure-based alignments were replaced with sequence-based alignments. Our experiments with different iterative protocols for sequence-only models also suggested that simple protocol modifications were unable to yield equivalent improvements to those provided by the structure alignment-enhanced models. Finally, we found that models using structure alignments provided fold-level structure assignments that were superior to those produced by sequence-only models. Conclusion: When attempting to predict the structure of remote homologs, we advocate a combined approach in which both traditional models and models incorporating structure alignments are used

    Systematic analysis of somatic mutations driving cancer: Uncovering functional protein regions in disease development

    Background: Recent advances in sequencing technologies enable the large-scale identification of genes that are affected by various genetic alterations in cancer. However, understanding tumor development requires insights into how these changes cause altered protein function and impaired network regulation in general and/or in specific cancer types. Results: In this work we present a novel method called iSiMPRe that identifies regions that are significantly enriched in somatic mutations and short in-frame insertions or deletions (indels). Applying this unbiased method to the complete human proteome, by using data enriched through various cancer genome projects, we identified around 500 protein regions which could be linked to one or more of 27 distinct cancer types. These regions covered the majority of known cancer genes, surprisingly even tumor suppressors. Additionally, iSiMPRe also identified novel genes and regions that have not yet been associated with cancer. Conclusions: While local somatic mutations correspond to only a subset of genetic variations that can lead to cancer, our systematic analyses revealed that they represent an accompanying feature of most cancer driver genes regardless of the primary mechanism by which they are perturbed during tumorigenesis. These results indicate that the accumulation of local somatic mutations can be used to pinpoint genes responsible for cancer formation and can also help to understand the effect of cancer mutations at the level of functional modules in a broad range of cancer driver genes. Reviewers: This article was reviewed by Sándor Pongor, Michael Gromiha and Zoltán Gáspári. © 2016 Mészáros et al