
    Fibronectin Contributes To Notochord Intercalation In The Invertebrate Chordate, Ciona Intestinalis

    Background: Genomic analysis has upended chordate phylogeny, placing the tunicates as the sister group to the vertebrates. This taxonomic rearrangement raises questions about the emergence of a tunicate/vertebrate ancestor. Results: Characterization of developmental genes uniquely shared by tunicates and vertebrates is one promising approach for deciphering developmental shifts underlying acquisition of novel, ancestral traits. The matrix glycoprotein Fibronectin (FN) has long been considered a vertebrate-specific gene, playing a major instructive role in vertebrate embryonic development. However, the recent computational prediction of an orthologous “vertebrate-like” Fn gene in the genome of a tunicate, Ciona savignyi, challenges this viewpoint, suggesting that Fn may have arisen in the shared tunicate/vertebrate ancestor. Here we verify the presence of a tunicate Fn ortholog. Transgenic reporter analysis was used to characterize a Ciona Fn enhancer driving expression in the notochord. Targeted knockdown in the notochord lineage indicates that FN is required for proper convergent extension. Conclusions: These findings suggest that acquisition of Fn was associated with altered notochord morphogenesis in the vertebrate/tunicate ancestor.

    Categorization of 77 dystrophin exons into 5 groups by a decision tree using indexes of splicing regulatory factors as decision markers

    Background: Duchenne muscular dystrophy (DMD), a fatal muscle-wasting disease, is characterized by dystrophin deficiency caused by mutations in the dystrophin gene. Skipping of a target dystrophin exon during splicing with antisense oligonucleotides (AOs) is attracting much attention as the most plausible way to express dystrophin in DMD. Antisense oligonucleotides have been designed against splicing regulatory sequences such as splicing enhancer sequences of target exons. Recently, we reported that a chemical kinase inhibitor specifically enhances the skipping of mutated dystrophin exon 31, indicating the existence of exon-specific splicing regulatory systems. However, the basis for such individual regulatory systems is largely unknown. Here, we categorized the dystrophin exons in terms of their splicing regulatory factors. Results: Using a computer-based machine learning system, we first constructed a decision tree separating 77 authentic from 14 known cryptic exons using 25 indexes of splicing regulatory factors as decision markers. We evaluated the classification accuracy of a novel cryptic exon (exon 11a) identified in this study. However, the tree mislabeled exon 11a as a true exon. Therefore, we re-constructed the decision tree to separate all 15 cryptic exons. The revised decision tree categorized the 77 authentic exons into five groups. Furthermore, all nine disease-associated novel exons were successfully categorized as exons, validating the decision tree. One group, consisting of 30 exons, was characterized by a high density of exonic splicing enhancer sequences. This suggests that AOs targeting splicing enhancer sequences would efficiently induce skipping of exons belonging to this group. Conclusions: The decision tree categorized the 77 authentic exons into five groups. Our classification may help to establish the strategy for exon skipping therapy for Duchenne muscular dystrophy.

    Network Evolution: Rewiring and Signatures of Conservation in Signaling

    The analysis of network evolution has been hampered by limited availability of protein interaction data for different organisms. In this study, we investigate evolutionary mechanisms in Src Homology 3 (SH3) domain and kinase interaction networks using high-resolution specificity profiles. We constructed and examined networks for 23 fungal species ranging from Saccharomyces cerevisiae to Schizosaccharomyces pombe. We quantify rates of different rewiring mechanisms and show that interaction change through binding site evolution is faster than through gene gain or loss. We found that SH3 interactions evolve swiftly, at rates similar to those found in phosphoregulation evolution. Importantly, we show that interaction changes are sufficiently rapid to exhibit saturation phenomena at the observed timescales. Finally, focusing on the SH3 interaction network, we observe extensive clustering of binding sites on target proteins by SH3 domains and a strong correlation between the number of domains that bind a target protein (target in-degree) and interaction conservation. The relationship between in-degree and interaction conservation is driven by two different effects, namely the number of clusters that correspond to interaction interfaces and the number of domains that bind to each cluster; both lead to sequence-specific conservation, which in turn results in interaction conservation. In summary, we uncover several network evolution mechanisms likely to generalize across peptide recognition modules.

    Inference of biomolecular interactions from sequence data

    This thesis describes our work on the inference of biomolecular interactions from sequence data. In particular, the first part of the thesis focuses on proteins and describes computational methods that we have developed for the inference of both intra- and inter-protein interactions from genomic data. The second part of the thesis centers around protein-RNA interactions and describes a method for the inference of binding motifs of RNA-binding proteins from high-throughput sequencing data. The thesis is organized as follows. In the first part, we start by introducing a novel mathematical model for the characterization of protein sequences (chapter 1). We then show how, using genomic data, this model can be successfully applied to two different problems, namely to the inference of interacting amino acid residues in the tertiary structure of protein domains (chapter 2) and to the prediction of protein-protein interactions in large paralogous protein families (chapters 3 and 4). We conclude the first part with a discussion of potential extensions and generalizations of the methods presented (chapter 5). In the second part of this thesis, we first give a general introduction to RNA-binding proteins (chapter 6). We then describe a novel experimental method for the genome-wide identification of target RNAs of RNA-binding proteins and show how this method can be used to infer the binding motifs of RNA-binding proteins (chapter 7). Finally, we discuss a potential mechanism by which KH domain-containing RNA-binding proteins could achieve the specificity of interaction with their target RNAs and conclude the second part of the thesis by proposing a novel type of motif finding algorithm tailored for the inference of their recognition elements (chapter 8).

    TFCONES: A database of vertebrate transcription factor-encoding genes and their associated conserved noncoding elements

    Background: Transcription factors (TFs) regulate gene transcription and play pivotal roles in various biological processes such as development, cell cycle progression, cell differentiation and tumor suppression. Identifying cis-regulatory elements associated with TF-encoding genes is a crucial step in understanding gene regulatory networks. To this end, we have used a comparative genomics approach to identify putative cis-regulatory elements associated with TF-encoding genes in vertebrates. Description: We have created a database named TFCONES (Transcription Factor Genes & Associated COnserved Noncoding ElementS) (http://tfcones.fugu-sg.org) which contains all human, mouse and fugu TF-encoding genes and conserved noncoding elements (CNEs) associated with them. The CNEs were identified by gene-by-gene alignments of orthologous TF-encoding gene loci using MLAGAN. We also predicted putative transcription factor binding sites within the CNEs. A significant proportion of human-fugu CNEs contain experimentally defined binding sites for transcriptional activators and repressors, indicating that a majority of the CNEs may function as transcriptional regulatory elements. The TF-encoding genes that are involved in nervous system development are generally enriched for human-fugu CNEs. Users can retrieve TF-encoding genes and their associated CNEs by conducting a keyword search or by selecting a family of DNA-binding proteins. Conclusion: The conserved noncoding elements identified in TFCONES represent a catalog of highly prioritized putative cis-regulatory elements of TF-encoding genes and are candidates for functional assay.

    BLSSpeller: exhaustive comparative discovery of conserved cis-regulatory elements

    Motivation: The accurate discovery and annotation of regulatory elements remains a challenging problem. The growing number of sequenced genomes creates new opportunities for comparative approaches to motif discovery. Putative binding sites are then considered to be functional if they are conserved in orthologous promoter sequences of multiple related species. Existing methods for comparative motif discovery usually rely on pregenerated multiple sequence alignments, which are difficult to obtain for more diverged species such as plants. As a consequence, misaligned regulatory elements often remain undetected. Results: We present a novel algorithm that supports both alignment-free and alignment-based motif discovery in the promoter sequences of related species. Putative motifs are exhaustively enumerated as words over the IUPAC alphabet and screened for conservation using the branch length score. Additionally, a confidence score is established in a genome-wide fashion. In order to take advantage of a cloud computing infrastructure, the MapReduce programming model is adopted. The method is applied to four monocotyledon plant species and it is shown that high-scoring motifs are significantly enriched for open chromatin regions in Oryza sativa and for transcription factor binding sites inferred through protein-binding microarrays in O. sativa and Zea mays. Furthermore, the method is shown to recover experimentally profiled ga2ox1-like KN1 binding sites in Z. mays.
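
As a rough illustration of the alignment-free idea in the abstract above, the sketch below matches one IUPAC word against a set of promoter sequences and scores its conservation with a branch length score, taken here to be the length of the smallest subtree connecting the species that contain the word, as a fraction of the total tree length. That definition, the toy tree, the species labels (`sp_a` to `sp_d`) and the function names are all assumptions made for this example; it is not the BLSSpeller implementation and it omits the exhaustive enumeration, the confidence score and MapReduce.

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Hypothetical toy tree: child -> (parent, branch length). "root" has no entry.
TREE = {
    "sp_a": ("n1", 0.2),
    "sp_b": ("n1", 0.3),
    "n1": ("root", 0.1),
    "sp_c": ("n2", 0.15),
    "sp_d": ("n2", 0.25),
    "n2": ("root", 0.1),
}


def matches(motif, sequence):
    """Return True if the IUPAC motif occurs anywhere in the sequence."""
    k = len(motif)
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        if all(base in IUPAC[code] for code, base in zip(motif, window)):
            return True
    return False


def path_to_root(node):
    """Edges (named by their child node) on the way from node up to the root."""
    edges = []
    while node in TREE:
        edges.append(node)
        node = TREE[node][0]
    return edges


def branch_length_score(species):
    """Length of the smallest subtree connecting `species`, relative to the whole tree."""
    species = list(species)
    if len(species) < 2:
        return 0.0
    paths = [set(path_to_root(s)) for s in species]
    # An edge belongs to the connecting subtree if it lies on some,
    # but not all, of the leaf-to-root paths.
    used = set.union(*paths) - set.intersection(*paths)
    total = sum(length for _, length in TREE.values())
    return sum(TREE[edge][1] for edge in used) / total


def motif_conservation(motif, promoters):
    """Score one motif over a dict of species -> promoter sequence."""
    conserved = [sp for sp, seq in promoters.items() if matches(motif, seq)]
    return branch_length_score(conserved)
```

Calling `motif_conservation("TGAYGT", promoters)` with one promoter per species returns a value between 0 and 1; words found in distantly related species score higher than words found only in close relatives.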

    Islands of linkage in an ocean of pervasive recombination reveals two-speed evolution of human cytomegalovirus genomes

    Human cytomegalovirus (HCMV) infects most of the population worldwide, persisting throughout the host's life in a latent state with periodic episodes of reactivation. While typically asymptomatic, HCMV can cause fatal disease among congenitally infected infants and immunocompromised patients. These clinical issues are compounded by the emergence of antiviral resistance and the absence of an effective vaccine, the development of which is likely complicated by the numerous immune evasins encoded by HCMV to counter the host's adaptive immune responses, a feature that facilitates frequent super-infections. Understanding the evolutionary dynamics of HCMV is essential for the development of effective new drugs and vaccines. By comparing viral genomes from uncultivated or low-passaged clinical samples of diverse origins, we observe evidence of frequent homologous recombination events, both recent and ancient, and no structure of HCMV genetic diversity at the whole-genome scale. Analysis of individual gene-scale loci reveals a striking dichotomy: while most of the genome is highly conserved, recombines essentially freely and has evolved under purifying selection, 21 genes display extreme diversity, structured into distinct genotypes that do not recombine with each other. Most of these hyper-variable genes encode glycoproteins involved in cell entry or escape of host immunity. Evidence that half of them have diverged through episodes of intense positive selection suggests that rapid evolution of hyper-variable loci is likely driven by interactions with host immunity. It appears that this process is enabled by recombination unlinking hyper-variable loci from strongly constrained neighboring sites. It is conceivable that viral mechanisms facilitating super-infection have evolved to promote recombination between diverged genotypes, allowing the virus to continuously diversify at key loci to escape immune detection, while maintaining a genome optimally adapted to its asymptomatic infectious lifecycle.

    Finding undetected protein associations in cell signaling by belief propagation

    External information propagates in the cell mainly through signaling cascades and transcriptional activation, allowing it to react to a wide spectrum of environmental changes. High throughput experiments identify numerous molecular components of such cascades that may, however, interact through unknown partners. Some of them may be detected using data coming from the integration of a protein-protein interaction network and mRNA expression profiles. This inference problem can be mapped onto the problem of finding appropriate optimal connected subgraphs of a network defined by these datasets. The optimization procedure turns out to be computationally intractable in general. Here we present a new distributed algorithm for this task, inspired by statistical physics, and apply this scheme to alpha factor and drug perturbations data in yeast. We identify the role of the COS8 protein, a member of a gene family of previously unknown function, and validate the results by genetic experiments. The algorithm we present is especially suited for very large datasets, can run in parallel, and can be adapted to other problems in systems biology. On renowned benchmarks it outperforms other algorithms in the field. Comment: 6 pages, 3 figures, 1 table, Supporting Information

    Blueprint for a high-performance biomaterial: full-length spider dragline silk genes.

    Spider dragline (major ampullate) silk outperforms virtually all other natural and manmade materials in terms of tensile strength and toughness. For this reason, the mass-production of artificial spider silks through transgenic technologies has been a major goal of biomimetics research. Although all known arthropod silk proteins are extremely large (>200 kiloDaltons), recombinant spider silks have been designed from short and incomplete cDNAs, the only available sequences. Here we describe the first full-length spider silk gene sequences and their flanking regions. These genes encode the MaSp1 and MaSp2 proteins that compose the black widow's high-performance dragline silk. Each gene includes a single enormous exon (>9000 base pairs) that translates into a highly repetitive polypeptide. Patterns of variation among sequence repeats at the amino acid and nucleotide levels indicate that the interaction of selection, intergenic recombination, and intragenic recombination governs the evolution of these highly unusual, modular proteins. Phylogenetic footprinting revealed putative regulatory elements in non-coding flanking sequences. Conservation of both upstream and downstream flanking sequences was especially striking between the two paralogous black widow major ampullate silk genes. Because these genes are co-expressed within the same silk gland, there may have been selection for similarity in regulatory regions. Our new data provide complete templates for synthesis of recombinant silk proteins that significantly improve the degree to which artificial silks mimic natural spider dragline fibers.

    High-throughput typing of Staphylococcus aureus by amplified fragment length polymorphism (AFLP) or multi-locus variable number of tandem repeat analysis (MLVA) reveals consistent strain relatedness

    This study investigates aspects of the general assumption that, in bacteria, genetic variation in functionally-constrained genomic regions accumulates at a lower rate than in regions of hypermutability such as DNA repeat loci. We compared whole genome polymorphism (using high-throughput amplified fragment length polymorphism [ht-AFLP]) as well as short sequence repeat length variation (using multi-locus variable number of tandem repeat analysis [MLVA]) for 994 Staphylococcus aureus strains isolated from both healthy carriers and invasive infections. MLVA and ht-AFLP minimum spanning trees (MSTs) were similar in their identification of totally different types of genetic variants. This suggests that, despite the enhanced inherent variability of repeats, clusters of strains remain traceable. Finally, no specific molecular marker of epidemicity or virulence was identified in this large strain collection by the MLVA approach. We demonstrate that there is a difference in the rates of cross-genome mutation versus regional repeat variability in the clonal bacterial pathogen S. aureus. Despite these dynamic differences, a conservation of type assignments as based upon these two inherently different typing techniques was observed.
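
The abstract above compares minimum spanning trees built from typing data. As a minimal sketch of how such a tree can be derived from MLVA-style repeat-count profiles, the code below uses the number of differing loci as the distance and Kruskal's algorithm with a small union-find. The distance measure, the strain labels and the function names are assumptions chosen for illustration; the abstract does not state how the study's MSTs were computed.

```python
def hamming(profile_a, profile_b):
    """Number of loci at which two repeat-count profiles differ."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


def minimum_spanning_tree(profiles):
    """Kruskal's algorithm over all strain pairs.

    `profiles` maps a strain label to a tuple of repeat counts, one per locus.
    Returns a list of (strain, strain, distance) edges.
    """
    names = sorted(profiles)
    # All pairwise edges, cheapest first (ties broken by strain label).
    edges = sorted(
        (hamming(profiles[a], profiles[b]), a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
    )
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the set representative, halving the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    tree = []
    for dist, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two separate components
            parent[root_a] = root_b
            tree.append((a, b, dist))
    return tree
```

Strains whose profiles differ at few loci end up joined by short edges, so clusters of related strains show up as tightly connected parts of the tree.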