35 research outputs found

    Chapter 3-Figure S1

    No full text
    Positions of segments for the 15-class model of D. melanogaster versus D. simulans alignment.

    Chapter 3-File S2

    No full text
    Positions of segments for the 16-class model of D. melanogaster versus D. yakuba alignment.

    Investigating genomic structure using changept: A Bayesian segmentation model

    Genomes are composed of a wide variety of elements with distinct roles and characteristics. Some of these elements are well-characterised functional components such as protein-coding exons. Other elements play regulatory or structural roles, encode functional non-protein-coding RNAs, or perform some other function yet to be characterised. Still others may have no functional importance, though they may nevertheless be of interest to biologists. One technique for investigating the composition of genomes is to segment sequences into compositionally homogeneous blocks. This technique, known as ‘sequence segmentation’ or ‘change-point analysis’, is used to identify patterns of variation across genomes such as GC-rich and GC-poor regions, coding and non-coding regions, slowly evolving and rapidly evolving regions and many other types of variation. In this mini-review we outline many of the genome segmentation methods currently available and then focus on a Bayesian DNA segmentation algorithm, with examples of its various applications.

    Chapter 4- Supplemental Table 6

    No full text
    RNA-seq reads mapping to PFEs and controls identified in pathway-focussed analysis

    Chapter 4- Supplemental File 1

    No full text
    Genome-wide intronic PFEs

    Drosophila 3' UTRs are more complex than protein-coding sequences

    The 3′ UTRs of eukaryotic genes participate in a variety of post-transcriptional (and some transcriptional) regulatory interactions. Some of these interactions are well characterised, but an undetermined number remain to be discovered. While some regulatory sequences in 3′ UTRs may be conserved over long evolutionary time scales, others may have only ephemeral functional significance as regulatory profiles respond to changing selective pressures. Here we propose a sensitive segmentation methodology for investigating patterns of composition and conservation in 3′ UTRs based on comparison of closely related species. We describe encodings of pairwise and three-way alignments integrating information about conservation, GC content and transition/transversion ratios and apply the method to three closely related Drosophila species: D. melanogaster, D. simulans and D. yakuba. Incorporating multiple data types greatly increased the number of segment classes identified compared to similar methods based on conservation or GC content alone. We propose that the number of segments and number of types of segment identified by the method can be used as proxies for functional complexity. Our main finding is that the number of segments and segment classes identified in 3′ UTRs is greater than in the same length of protein-coding sequence, suggesting greater functional complexity in 3′ UTRs. There is thus a need for sustained and extensive efforts by bioinformaticians to delineate functional elements in this important genomic fraction. C code, data and results are available upon request.

    Bayesian methods for identifying non-protein coding genomic regions contributing to diseases

    No full text
    Identifying and discerning the function of non-coding RNAs (ncRNAs) is an important goal of genetic research. Much evidence suggests that ncRNAs play an important role in the aetiology of many complex genetic diseases. Therefore the task of developing methods to identify these elements in genomes has become increasingly urgent. In this research my focus was to use a Bayesian approach to identify putative functional non-coding genomic sequences contributing to various diseases. The analysis was mainly carried out using a Bayesian segmentation model, implemented in the software package changept, designed to segment discrete genomic data. In the first phase of the research, I developed methods to expand the capabilities of changept. One simple but powerful innovation was to develop several ways of encoding an alignment of sequences using a D-character representation (D is a positive integer). This enables sequence alignments to be segmented based on multiple data types (specifically conservation, GC content and transition/transversion ratio) and significantly generalizes the capacity of changept, which previously could only segment on the basis of one of these characteristics at a time. Incorporating multiple data types greatly helped to clearly identify complex segmentation patterns and functional signatures among species, especially between closely related species. A second methodological innovation was a new model selection procedure to decide the optimal model for the data. A third, and most important, methodological innovation was to build a process for systematically discovering genome-wide putative ncRNAs, including data selection, cleaning, encoding, analysis and post-processing. To validate these findings, both experimental methods and currently available bioinformatics resources were used.
    In the second phase of the research, my focus turned to application of changept, and the new methods developed, to identify genome-wide putative non-coding elements that may be associated with diseases. I was able to discover more than a thousand highly conserved non-coding sequences in human, mouse and zebrafish genomes. A complementary analysis focused on a set of genes involved in muscle development. Some of the elements identified may contribute to muscle diseases. Discovery of putative small ncRNAs in the bacterium Wolbachia pipientis is another successful application of the new methods; this work was undertaken as part of the eradicate dengue project. Application to malaria genomes revealed genetic mechanisms important in infecting multiple hosts. I also identified putative regulatory sequences in 3' UTRs in three closely related Drosophila species. Although this work focussed on Drosophila rather than human diseases, mutations in 3' UTRs have been shown to play a crucial role in human health and diseases.

    eya1 class 2 wig file

    No full text
    wig file for class 2 in eya1 alignment, as discussed in paper

    myf6 manual alignment

    No full text
    An alignment of human, mouse and zebrafish DNA sequences of myf6, with manual intervention to ensure exons are aligned.