997 research outputs found

    Alternative Splicing Regulation During C. elegans Development: Splicing Factors as Regulated Targets

    Alternative splicing generates protein diversity and allows for post-transcriptional gene regulation. Estimates suggest that 10% of the genes in Caenorhabditis elegans undergo alternative splicing. We constructed a splicing-sensitive microarray to detect alternative splicing for 352 cassette exons and tested for changes in alternative splicing of these genes during development. The microarray data predicted that 62/352 (∼18%) of the alternative splicing events studied show a strong change in the relative levels of the spliced isoforms (>4-fold) during development. Confirmation of the microarray data by RT-PCR was obtained for 70% of randomly selected genes tested. Among the genes with the most developmentally regulated alternative splicing was the hnRNP F/H splicing factor homolog, W02D3.11, now named hrpf-1. For the cassette exon of hrpf-1, the inclusion isoform comprises 65% of hrpf-1 steady-state messages in embryos but only 0.1% in the first larval stage. This dramatic change in the alternative splicing of an alternative splicing factor suggests a complex cascade of splicing regulation during development. We analyzed splicing in embryos from a strain with a mutation in the splicing factor sym-2, another hnRNP F/H homolog. We found that approximately half of the genes with large alternative splicing changes between the embryo and L1 stages are regulated by sym-2 in embryos. We also analyzed the role of nonsense-mediated decay (NMD) in regulating steady-state alternative mRNA isoforms. We found that 8% of the 352 events studied have alternative isoforms whose relative steady-state levels in embryos change more than 4-fold in an NMD mutant, including hrpf-1. Strikingly, 53% of the alternative splicing events that are affected by NMD in our experiment are not obvious substrates for NMD based on the presence of premature termination codons.
This suggests that the targeting of splicing factors by NMD may have downstream effects on alternative splicing regulation.

    A primer on molecular biology

    Modern molecular biology provides a rich source of challenging machine learning problems. This tutorial chapter aims to provide the biological background knowledge required to communicate with biologists and to understand and properly formalize a number of the most interesting problems in this application domain. The largest part of the chapter (its first section) is devoted to the cell as the basic unit of life. Four aspects of cells are reviewed in sequence: (1) the molecules that cells make use of (above all, proteins, RNA, and DNA); (2) the spatial organization of cells (“compartmentalization”); (3) the way cells produce proteins (“protein expression”); and (4) cellular communication and evolution (of cells and organisms). In the second section, an overview is provided of the most frequent measurement technologies, data types, and data sources. Finally, important open problems in the analysis of these data (bioinformatics challenges) are briefly outlined.

    Developing variant interpretation pipelines for inherited retinal diseases and ciliopathies: using medical genomics to improve diagnostic yield

    Primary ciliopathies are a group of rare inherited disorders caused by defects in the structure or function of primary cilia (the ‘cell’s antenna’). This thesis describes approaches to improve molecular diagnosis rates for primary ciliopathy patients over the ~40-80% currently achieved, through whole genome sequencing (WGS) analysis and functional variant interpretation. Firstly, I analysed WGS data from the 100,000 Genomes Project (100K) for participants who were clinically suspected to have primary ciliopathies. I identified a molecular diagnosis for n=45/83 participants (54.2%), providing a 21.7% diagnostic uplift compared to results previously reported by Genomics England (GEL). I then performed a reverse phenotyping study, starting by looking for pathogenic variants in nine multisystemic ciliopathy disease genes across the 100K rare disease dataset. This was linked back to available clinical data, aiming to identify participants with “hidden” ciliopathy diagnoses recruited to alternative categories. I identified 18 new, reportable diagnoses and 44 previously reported by GEL. I also found 11 un-reportable molecular diagnoses, which lacked the key clinical features needed for a confident fit to the phenotype. This shows that the quality of entered phenotypic data is critical to allow accurate genotype-phenotype correlation. In a third study, I developed strategies for functional interpretation of eight TMEM67 missense variants of uncertain significance (VUSs) with collaborators in Ireland, using CRISPR/Cas9 gene editing in a human ciliated cell line (RPE-1) and C. elegans. These assays provided interpretation of three VUSs as benign and five as pathogenic. The two 100K studies show that diagnosis rates for ciliopathies can be improved through WGS analysis, especially structural and splice variant analysis. We are a long way from delivering a high-throughput system for VUS interpretation that could provide clinical utility in the diagnostic setting.
Overall, we have provided benefit for ciliopathy patients through additional molecular diagnoses, accompanied by transferable skills applicable to wider patient groups.

    Assessing the impact of alternative splicing on the diversity and evolution of the proteome in plants

    Splicing is one of the key processing steps during the maturation of a gene’s primary transcript into the mRNA molecule used as a template for protein production. Splicing involves the removal of segments called introns and re-joining of the remaining segments called exons. It is by now well established that the same segments are not always removed from a gene’s primary transcript during the splicing process. The consequence of this splicing variation, termed Alternative Splicing (AS), is that multiple distinct mature mRNA molecules can be produced from a single gene. One of the two biological roles that are ascribed to AS is that of a mechanism which enables an organism to produce multiple functionally distinct proteins from a single gene. Alternatively, AS can serve as a means for controlling gene expression at the post-transcriptional level. Although many clear examples have been reported for both roles, the extent to which AS increases the functional diversity of the proteome, regulates gene expression or simply reflects noise in the splicing machinery is not well known. Determining the full functional impact of AS by designing and performing wet-lab experiments for all AS events is infeasible, and bioinformatics approaches have therefore widely been used for studying the impact of AS at a genome-wide scale. In this thesis four bioinformatics studies are presented that were aimed at determining the extent to which AS is used in plants as a mechanism for producing multiple distinct functional proteins from a single gene. Each chapter uses a different method for analyzing specific properties of AS. Under the premise that functional genetic features are more likely to be conserved than non-functional ones, AS events that are present in two or more species are more likely to be biologically relevant than those that are confined to a single species.
In chapter 2 we analyzed the conservation of AS by performing a comparative analysis between three divergent plant species. The results of that study indicated that the vast majority of AS events do not persist over long periods of evolution. We concluded, based on this lack of conservation, that AS only has a limited impact on the functional diversity of the proteome in plants. Following this conclusion, it can be hypothesized that the variation that AS induces at the transcriptome level is not likely to be manifested at the protein level. In chapter 3 we tested this hypothesis by analyzing two independent proteomics datasets. This type of data can be used to directly identify proteins present in a biological sample. Our results indicated that the variation induced by AS at the transcriptome level is also manifested at the protein level. We concluded that many AS events either have a confined species-specific (not conserved) function or simply produce protein variants that are stable enough to escape rapid turn-over. Another method for determining whether AS increases the functional diversity of the proteome is to determine whether protein sequence variations that are typically induced by AS are common within the plant kingdom. We found (chapter 4) that this is not the case in plants and concluded that novel functions do not frequently arise through AS. We also found that most of the AS-induced variation is lost, as is the case for redundant gene copies, within a very short evolutionary time period. One limitation of genome-wide analyses is that these capture only the more general patterns. However, the functional impact of AS can be very different in different genes or gene families. In order to fully assess the functional impact of AS, it is therefore important to also study the process within the functional context of individual genes or gene families. In chapter 5 we demonstrated this concept by performing a detailed analysis of AS within the MADS-box gene family.
We were able to provide clues as to how AS might impact the protein-protein interaction capabilities of individual MADS proteins. Some of our predictions were supported by experimental evidence. We further showed how AS can serve as an evolutionary mechanism for experimenting with novel functions (novel interactions) without the explicit loss of existing functions. The overall conclusion, based on the performed analyses, is as follows: AS is primarily a consequence of noise in the splicing machinery and results in an increased diversity of the proteome. However, only a small fraction of the proteins resulting from AS will have beneficial functions and be subsequently selected for during evolution. The large remaining fraction is, as is the case for redundant gene copies, lost within a very short evolutionary time period after its emergence.