9 research outputs found

    A new approach to bias correction in RNA-Seq

    Motivation: Quantification of sequence abundance in RNA-Seq experiments is often confounded by protocol-specific sequence bias. The exact sources of the bias are unknown, but may be influenced by polymerase chain reaction amplification, or differing primer affinities and mixtures, for example. The result is decreased accuracy in many applications, such as de novo gene annotation and transcript quantification.

    Anti-bias training for (sc)RNA-seq : experimental and computational approaches to improve precision

    RNA-seq, including single-cell RNA-seq (scRNA-seq), is plagued by insufficient sensitivity and lack of precision. As a result, the full potential of (sc)RNA-seq is limited. A major factor in this respect is the presence of global bias in most datasets, which affects detection and quantitation of RNA in a length-dependent fashion. In particular, scRNA-seq is affected by technical noise and a high rate of dropouts, where the vast majority of original transcripts are not converted into sequencing reads. We discuss the origins and implications of these biases, bioinformatics approaches to correct for them, and how biases can be exploited to infer characteristics of the sample preparation process, which in turn can be used to improve library preparation.

    EVOLUTION AND DYNAMICS OF TRANSCRIPTIONAL REGULATION IN BACTERIA

    Although transcription is one of the most important biological functions of cells, our understanding of its regulation is still limited. In this dissertation, we have studied transcriptional regulation in prokaryotes in three aspects. First, we investigated the extent to which cis-regulatory elements are conserved during the course of evolution using the LexA regulons in cyanobacteria as an example. We found that in most cyanobacterial genomes analyzed, LexA appears to function as the transcriptional regulator of the key SOS response genes. The loss of lexA in some genomes might lead to the degradation of its binding sites. Second, directional RNA-seq techniques have recently become the workhorse for transcriptome profiling in prokaryotes; however, it is a challenging task to accurately assemble highly labile prokaryotic transcriptomes for further analyses. To fill this gap, we have developed a hidden Markov model-based transcriptome assembler which outperforms the state-of-the-art assemblers. Using our tool, we characterized alternative operon structures in E. coli K12 under various growth conditions and growth phases, and found that they are more complex and dynamic than previously anticipated. Lastly, we determined antisense and non-coding transcription patterns in E. coli K12 under various growth conditions and time points. We found that a large portion of genes have antisense transcription in a condition-dependent manner. Most antisense transcripts are initiated at and restricted to the 5′-end of the gene on the sense strand, and their expression levels are correlated with those of the genes on the sense strand, suggesting that these antisense transcripts might play an important role in transcriptional regulation.

    Transcript assembly, quantification and differential alternative splicing detection from RNA-Seq

    This dissertation is focused on improving RNA-Seq processing in terms of transcript assembly, transcript quantification and detection of differential alternative splicing. There are two major challenges in solving these three problems. The first is accurately deriving transcript-level expression values from RNA-Seq reads that often align ambiguously to a set of overlapping isoforms. To make matters worse, gene annotation tends to misguide transcript quantification as new transcripts are often discovered in new RNA-Seq experiments. The second challenge is accounting for intrinsic uncertainties or variabilities in RNA-Seq measurement when calling differential alternative splicing from multiple samples across two conditions. Those uncertainties include coverage bias and biological variations. Failing to account for these variabilities can lead to higher false positive rates. To address these challenges, I develop a series of novel algorithms which are implemented in a software package called Strawberry. To tackle the read assignment uncertainty challenge, Strawberry assembles aligned RNA-Seq reads into transcripts using a constrained flow network algorithm. After the assembly, Strawberry uses a latent class model to assign reads to transcripts. These two steps use different optimization frameworks but utilize the same graph structure, which allows a highly efficient, expandable and accurate algorithm for dealing with large data. To infer differential alternative splicing, Strawberry extends the single-sample quantification model by imposing a generalized linear model on the relative transcript proportions. To account for count overdispersion, Strawberry uses an empirical Bayesian hierarchical model. For coverage bias, Strawberry performs a bias correction step which borrows information across samples and genes before fitting the differential analysis model. A series of simulated and real data sets is used to evaluate and benchmark Strawberry's results.
Strawberry outperforms Cufflinks and StringTie in terms of both assembly and quantification accuracies. In terms of detecting differential alternative splicing, Strawberry also outperforms several state-of-the-art methods including DEXSeq, Cuffdiff 2 and DSGseq. Strawberry and its supporting code, e.g., simulation and validation, are freely available at my GitHub (https://github.com/ruolin).

    A transcriptome analysis of apple (Malus x domestica Borkh.) cv ‘golden delicious’ fruit during fruit growth and development

    Philosophiae Doctor - PhD. The growth and development of apple (Malus x domestica Borkh.) fruit occurs over a period of about 150 days after anthesis to full ripeness. During this period morphological and physiological changes occur defining fruit quality. These changes are a result of spatial and temporal patterns of gene expression during fruit development as regulated by environmental, genetic and environmental-by-genetic factors. A number of previous studies partially characterised the transcriptomes of apple leaf, fruit pulp, whole fruit, and peel plus pulp tissues, using cDNA microarrays and other PCR-based technologies. These studies, however, remain limited in throughput and specificity for transcripts of low abundance. Hence, the aim of this project was to apply a high-throughput technique to characterise the full mRNA transcriptome of the ‘Golden Delicious’ fruit peel and pulp tissues in order to understand the molecular mechanisms underlying the morphophysiological changes that occur during fruit development.

    Development of molecular tools to enhance understanding of antiviral RNAi in mosquitoes

    Mosquito-borne arboviruses are a considerable threat to human and animal health across the world. Many of them are classed as emerging or re-emerging pathogens and the incidence of disease for a number of serious viral infections has increased as they expand their geographical and host ranges. As with other invertebrates, mosquitoes lack the adaptive immune response present in vertebrates and instead rely on their innate immune defences to modulate viral infections. Nevertheless, in contrast to vertebrates, arboviral infections in their arthropod vector are non-pathogenic and have no cytopathic effect or detrimental impact on their survival. The response considered to be the most important for antiviral defence in mosquitoes is RNA interference (RNAi), which is a sequence-specific RNA silencing mechanism. Most of what is known about antiviral RNAi in arthropods has been established in Drosophila as the model insect organism. These studies have benefited from an extensive range of genetic mutants, molecular tools, reporter assays and genetic profiling. The absence of these tools for use in mosquito research is a substantial deficit for arboviral studies in their natural vector system and must be rectified in order to fully understand the influence vector immunity has on virus transmission. This thesis discusses the development of a ‘molecular tool-box’ for advancing the acquisition of knowledge in this area. Efficient RNAi gene silencing and its effect on the antiviral RNAi response was established in vitro using Semliki Forest virus (SFV) as a model arbovirus. This assay determined that knockdown of Argonaute-2 had the most substantial impact on virus replication compared to the knockdown of other RNAi proteins.
In addition, the limited detection of virus-derived small RNAs, key molecules of the antiviral RNAi response, by Northern blot analysis provides further support to previous evidence that SFV may circumvent the antiviral response by sequestering its genomic RNA, resulting in restricted access by the RNAi machinery and preventing the generation of large quantities of virus-derived small RNAs. However, some SFV-derived small RNAs are known to be produced and these have been shown to generate a pattern of ‘hot’ and ‘cold’ spots along the full-length coding sequences. This thesis has determined that this pattern is not exclusive to viral-derived dsRNA trigger molecules but is also exhibited following the treatment of mosquito cells in culture with non-viral dsRNA. This implies that all exogenous dsRNA is processed by RNAi in a similar manner. This study has also characterised the presence of an RNA-dependent RNA polymerase (RdRP) encoded by Aedes aegypti mosquitoes. RdRPs are important for the amplification and spread of the RNAi signal in other organisms such as plants and worms; however, only one study suggested the existence of one in Drosophila. Although this project proposed the presence and transcription of a homologue of the Drosophila RdRP in the Aedes aegypti-derived Aag2 cell line, protein knockdown assays revealed that it has no effect on virus replication in vitro, suggesting that it does not function as an RdRP. Due to the lack of antibodies against the major RNAi proteins Dicer-1, Dicer-2, Argonaute-1 and Argonaute-2 in mosquitoes, these were designed and screened, which allowed the identification of several candidates for the detection of the proteins in mosquito cells in culture. Further to this, recombinant forms of the RNAi initiator protein Dicer-2 and the slicer protein Argonaute-2 were successfully generated and tested in vitro using different promoters to establish their use for future temporal and spatial kinetic studies.
It was concluded that, of the promoters tested, the most successful for the expression of these reporter constructs was the subgenomic promoter of SFV. On the other hand, a second promoter, the PUb promoter, may prove more suitable in the future. Finally, this project studied the antiviral capabilities of a non-haematophagous mosquito cell line which would not come across an arboviral infection by traditional blood-feeding routes. Instead, the mosquito larvae sustain their adult life stages by feeding on the larvae of other species which may be vertically infected. A cell line derived from Toxorhynchites amboinensis was characterised and was shown to carry out RNAi if induced by dsRNA, suggesting that they are able to mount an antiviral response to acquired infections. This study also determined that the cell line contains an endogenous insect-specific virus and, although the source of this is unknown, it adds an interesting new dimension to mosquito antiviral immunity. This thesis enhances RNAi research in Aedes mosquitoes by presenting novel molecular tools and reporter assays which will be highly valuable for facilitating future investigations. The studies performed also add to what is already understood regarding the interaction between SFV and mosquito antiviral immunity through the RNAi response.