
    The evolution of Dscam genes across the arthropods

    Background: One way of creating phenotypic diversity is through alternative splicing of precursor mRNAs. A gene that has evolved a hypervariable form is Down syndrome cell adhesion molecule (Dscam-hv), which in Drosophila melanogaster can produce thousands of isoforms via mutually exclusive alternative splicing. The extracellular region of this protein is encoded by three variable exon clusters, each containing multiple exon variants. The protein is vital for neuronal wiring, where the extreme variability at the somatic level is required for axonal guidance, and it plays a role in immunity, where the variability has been hypothesised to relate to recognition of different antigens. Dscam-hv has been found across the Pancrustacea. Additionally, three paralogous non-hypervariable Dscam-like genes have been described for D. melanogaster. Here we took a bioinformatics approach, building profile Hidden Markov Models to search across species for putative orthologs to the Dscam genes and for hypervariable alternatively spliced exons, and inferring the phylogenetic relationships among them. Our aims were to examine whether Dscam orthologs exist outside the Bilateria, whether the origin of Dscam-hv could lie outside the Pancrustacea, when the Dscam-like orthologs arose, how many alternatively spliced exons of each exon cluster were present in the most recent common ancestor, and how these clusters evolved.
    Results: Our results suggest that the origin of Dscam genes may lie after the split between the Cnidaria and the Bilateria and support the hypothesis that Dscam-hv originated in the common ancestor of the Pancrustacea. Our phylogeny of Dscam gene family members shows six well-supported clades: five containing Dscam-like genes and one containing all the Dscam-hv genes. A seventh clade contains arachnid putative Dscam genes. Furthermore, the exon clusters appear to have experienced different evolutionary histories.
    Conclusions: Dscam genes have undergone independent duplication events in the insects and in an arachnid genome, which adds to the better-known tandem duplications that have taken place within Dscam-hv genes. Therefore, two forms of gene expansion seem to be active within this gene family. The evolutionary history of this dynamic gene family will be further unfolded as genomes of species from more disparate groups become available.

    Assessing long-distance RNA sequence connectivity via RNA-templated DNA-DNA ligation

    Many RNAs, including pre-mRNAs and long non-coding RNAs, can be thousands of nucleotides long and undergo complex post-transcriptional processing. Multiple sites of alternative splicing within a single gene exponentially increase the number of possible spliced isoforms, with most human genes currently estimated to express at least ten. To understand the mechanisms underlying these complex isoform expression patterns, methods are needed that faithfully maintain long-range exon connectivity information in individual RNA molecules. In this study, we describe SeqZip, a methodology that uses RNA-templated DNA-DNA ligation to retain and compress connectivity between distant sequences within single RNA molecules. Using this assay, we test proposed coordination between distant sites of alternative exon utilization in mouse Fn1, and we characterize the extraordinary exon diversity of Drosophila melanogaster Dscam1
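The multiplicative growth in isoform numbers mentioned in this abstract can be illustrated with a short calculation. This is a minimal sketch, not part of the SeqZip method: the function names are hypothetical, and the Dscam1 cluster sizes (12, 48, 33 and 2 alternatives) are commonly cited figures that do not appear in the abstract itself, so they should be read as assumptions.

```python
from math import prod


def mutually_exclusive_isoforms(variants_per_cluster):
    """Count isoforms when exactly one variant is chosen from each cluster,
    assuming the choices at different clusters are independent."""
    return prod(variants_per_cluster)


def cassette_isoforms(n_sites):
    """Count isoforms for n independent include-or-skip (cassette) exons."""
    return 2 ** n_sites


# Assumed cluster sizes for D. melanogaster Dscam1 (not given in the abstract).
dscam1_clusters = [12, 48, 33, 2]

print(mutually_exclusive_isoforms(dscam1_clusters))  # 38016
print(cassette_isoforms(10))                         # 1024
```

Under the independence assumption, each additional splice site multiplies the total, which is why connectivity between distant sites cannot be recovered from short reads that each cover only one site.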

    Understanding the functionality of transcript diversity

    Recent years have seen a huge increase in the amount of genomic DNA being sequenced from a wide variety of organisms, giving us an unprecedented insight into the molecular diversity seen in nature. As a result a host of methods have been developed, both experimental and computational, to understand the functional significance of such diversity and how it relates to organismal and environmental complexity. In this thesis I use comparative approaches to explore two areas of molecular biology where there is evidence for large amounts of transcript diversity. Firstly, I explore the unprecedented view of microbial sequence diversity offered by metagenomic sequencing projects, using sequence similarity and adapted genomic context methods to quantify the amount of functional novelty in these samples. Secondly, I look at the transcript diversity generated by alternative splicing. I develop methods to detect and visualise alternative splicing events and apply these to the detection of conserved alternative splicing events

    Transcript assembly, quantification and differential alternative splicing detection from RNA-Seq

    This dissertation focuses on improving RNA-Seq processing in terms of transcript assembly, transcript quantification and detection of differential alternative splicing. There are two major challenges in solving these three problems. The first is accurately deriving transcript-level expression values from RNA-Seq reads that often align ambiguously to a set of overlapping isoforms. To make matters worse, gene annotation tends to misguide transcript quantification, as new transcripts are often discovered in new RNA-Seq experiments. The second challenge is accounting for intrinsic uncertainties or variabilities in RNA-Seq measurement when calling differential alternative splicing from multiple samples across two conditions. These uncertainties include coverage bias and biological variation. Failing to account for them can lead to higher false positive rates. To address these challenges, I develop a series of novel algorithms, which are implemented in a software package called Strawberry. To tackle the read assignment uncertainty challenge, Strawberry assembles aligned RNA-Seq reads into transcripts using a constrained flow network algorithm. After the assembly, Strawberry uses a latent class model to assign reads to transcripts. These two steps use different optimization frameworks but utilize the same graph structure, which allows a highly efficient, expandable and accurate algorithm for dealing with large data. To infer differential alternative splicing, Strawberry extends the single sample quantification model by imposing a generalized linear model on the relative transcript proportions. To account for count overdispersion, Strawberry uses an empirical Bayesian hierarchical model. For coverage bias, Strawberry performs a bias correction step which borrows information across samples and genes before fitting the differential analysis model. A series of simulated and real data sets is used to evaluate and benchmark Strawberry's results. Strawberry outperforms Cufflinks and StringTie in terms of both assembly and quantification accuracy. In terms of detecting differential alternative splicing, Strawberry also outperforms several state-of-the-art methods including DEXSeq, Cuffdiff 2 and DSGseq. Strawberry and its supporting code, e.g., simulation and validation, are freely available at my GitHub (https://github.com/ruolin).
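The idea of assigning ambiguously aligned reads to transcripts with a latent class model can be sketched with a generic expectation-maximization loop. This is an illustration of the general technique only, not Strawberry's actual model: the function name and input layout are hypothetical, and the sketch ignores transcript length, coverage bias and the shared graph structure described in the abstract.

```python
def em_transcript_proportions(read_compat, n_transcripts, n_iter=200):
    """Estimate relative transcript proportions from ambiguous reads.

    read_compat: one collection of transcript indices per read, listing the
    transcripts that read is compatible with (hypothetical input format).
    """
    theta = [1.0 / n_transcripts] * n_transcripts  # start from uniform
    for _ in range(n_iter):
        counts = [0.0] * n_transcripts
        # E-step: split each read across its compatible transcripts
        # in proportion to the current estimates.
        for compat in read_compat:
            total = sum(theta[t] for t in compat)
            if total == 0.0:
                continue  # read has no usable assignment
            for t in compat:
                counts[t] += theta[t] / total
        # M-step: renormalize expected counts into proportions.
        n_assigned = sum(counts)
        theta = [c / n_assigned for c in counts]
    return theta


# 6 reads unique to transcript 0, 2 unique to transcript 1, 4 ambiguous.
reads = [(0,)] * 6 + [(1,)] * 2 + [(0, 1)] * 4
print(em_transcript_proportions(reads, n_transcripts=2))
```

In this toy case the estimates converge to roughly 0.75 and 0.25: the ambiguous reads are shared out in the same 3:1 ratio that the uniquely assigned reads suggest.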

    Genomics and phylogeny of cytoskeletal proteins: Tools and analyses.


    Predicting mutually exclusive spliced exons based on exon length, splice site and reading frame conservation, and exon sequence homology

    Background: Alternative splicing of precursor mRNA is an important process that eukaryotes use to increase their repertoire of protein products. Several types of alternative splice forms exist, including exon skipping, differential splicing of exons at their 3'- or 5'-end, intron retention, and mutually exclusive splicing. The latter term is used for clusters of internal exons that are spliced in a mutually exclusive manner.
    Results: We have implemented an extension to the WebScipio software to search for mutually exclusive exons. Here, the search is based on the precondition that mutually exclusive exons encode regions of the same structural part of the protein product. This precondition restricts the search for candidate exons with respect to their length, splice site conservation, reading frame preservation, and overall homology. Mutually exclusive exons that are not homologous and not of about the same length will not be found. Using the new algorithm, mutually exclusive exons in several example genes (a dynein heavy chain, a muscle myosin heavy chain, and Dscam) were correctly identified. In addition, the algorithm was applied to the whole Drosophila melanogaster X chromosome, and the results were compared to the FlyBase annotation and an ab initio prediction. Clusters of mutually exclusive exons may directly follow one another and may comprise dozens of exons.
    Conclusions: This is the first implementation of an automatic search for mutually exclusive exons in eukaryotes. Exons are predicted and reconstructed in the same run, providing the complete gene structure for the protein query of interest. WebScipio offers high quality gene structure figures with the clusters of mutually exclusive exons colour-coded, and several analysis tools for further manual inspection. The genome scale analysis of all genes of the Drosophila melanogaster X chromosome showed that WebScipio is able to find all but two of the 28 annotated mutually exclusive spliced exons and predicts 39 new candidate exons. Thus, WebScipio should be able to identify mutually exclusive spliced exons in any query sequence from any species with a very high probability. WebScipio is freely available to academics at http://www.webscipio.org.
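The four search restrictions named in this abstract (length, splice site conservation, reading frame preservation, homology) can be sketched as a simple candidate filter. This is a hypothetical illustration, not WebScipio's implementation: the `Exon` type, the function name and both thresholds are assumptions, and nucleotide similarity from Python's `difflib` stands in for whatever homology measure the real tool uses.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Exon:
    seq: str       # exon nucleotide sequence
    acceptor: str  # last two intron bases upstream of the exon, e.g. "AG"
    donor: str     # first two intron bases downstream of the exon, e.g. "GT"


def is_mxe_candidate(ref, cand, max_length_diff=21, min_similarity=0.5):
    """Return True if cand could replace ref as a mutually exclusive exon.

    Both thresholds are illustrative assumptions, not published values.
    """
    length_diff = abs(len(ref.seq) - len(cand.seq))
    # Length: candidates must be of about the same length.
    if length_diff > max_length_diff:
        return False
    # Reading frame: swapping exons keeps the downstream frame only if
    # the two lengths are congruent modulo 3.
    if length_diff % 3 != 0:
        return False
    # Splice sites: require the same flanking dinucleotides.
    if (cand.acceptor, cand.donor) != (ref.acceptor, ref.donor):
        return False
    # Homology: crude sequence similarity as a stand-in measure.
    similarity = SequenceMatcher(None, ref.seq, cand.seq, autojunk=False).ratio()
    return similarity >= min_similarity


ref = Exon("ATGGCTGCTAAAGGT", "AG", "GT")
print(is_mxe_candidate(ref, Exon("ATGGCTGCAAAAGGT", "AG", "GT")))   # True
print(is_mxe_candidate(ref, Exon("ATGGCTGCAAAAGGTA", "AG", "GT")))  # False
```

The second candidate is rejected because its extra base would shift the reading frame of all downstream exons, which matches the abstract's point that non-homologous exons or exons of clearly different length will not be found.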