228 research outputs found

    NestedMICA: sensitive inference of over-represented motifs in nucleic acid sequence

    Get PDF
    NestedMICA is a new, scalable, pattern-discovery system for finding transcription factor binding sites and similar motifs in biological sequences. Like several previous methods, NestedMICA tackles this problem by optimizing a probabilistic mixture model to fit a set of sequences. However, the use of a newly developed inference strategy called Nested Sampling means NestedMICA is able to find optimal solutions without the need for a problematic initialization or seeding step. We investigate the performance of NestedMICA in a range scenario, on synthetic data and a well-characterized set of muscle regulatory regions, and compare it with the popular MEME program. We show that the new method is significantly more sensitive than MEME: in one case, it successfully extracted a target motif from background sequence four times longer than could be handled by the existing program. It also performs robustly on synthetic sequences containing multiple significant motifs. When tested on a real set of regulatory sequences, NestedMICA produced motifs which were good predictors for all five abundant classes of annotated binding sites

    AnnoTrack - a tracking system for genome annotation

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>As genome sequences are determined for increasing numbers of model organisms, demand has grown for better tools to facilitate unified genome annotation efforts by communities of biologists. Typically this process involves numerous experts from the field and the use of data from dispersed sources as evidence. This kind of collaborative annotation project requires specialized software solutions for efficient data tracking and processing.</p> <p>Results</p> <p>As part of the scale-up phase of the ENCODE project (Encyclopedia of DNA Elements), the aim of the GENCODE project is to produce a highly accurate evidence-based reference gene annotation for the human genome. The <it>AnnoTrack </it>software system was developed to aid this effort. It integrates data from multiple distributed sources, highlights conflicts and facilitates the quick identification, prioritisation and resolution of problems during the process of genome annotation.</p> <p>Conclusions</p> <p>AnnoTrack has been in use for the last year and has proven a very valuable tool for large-scale genome annotation. Designed to interface with standard bioinformatics components, such as DAS servers and Ensembl databases, it is easy to setup and configure for different genome projects. The source code is available at <url>http://annotrack.sanger.ac.uk</url>.</p

    Large-Scale Discovery of Promoter Motifs in Drosophila melanogaster

    Get PDF
    A key step in understanding gene regulation is to identify the repertoire of transcription factor binding motifs (TFBMs) that form the building blocks of promoters and other regulatory elements. Identifying these experimentally is very laborious, and the number of TFBMs discovered remains relatively small, especially when compared with the hundreds of transcription factor genes predicted in metazoan genomes. We have used a recently developed statistical motif discovery approach, NestedMICA, to detect candidate TFBMs from a large set of Drosophila melanogaster promoter regions. Of the 120 motifs inferred in our initial analysis, 25 were statistically significant matches to previously reported motifs, while 87 appeared to be novel. Analysis of sequence conservation and motif positioning suggested that the great majority of these discovered motifs are predictive of functional elements in the genome. Many motifs showed associations with specific patterns of gene expression in the D. melanogaster embryo, and we were able to obtain confident annotation of expression patterns for 25 of our motifs, including eight of the novel motifs. The motifs are available through Tiffin, a new database of DNA sequence motifs. We have discovered many new motifs that are overrepresented in D. melanogaster promoter regions, and offer several independent lines of evidence that these are novel TFBMs. Our motif dictionary provides a solid foundation for further investigation of regulatory elements in Drosophila, and demonstrates techniques that should be applicable in other species. We suggest that further improvements in computational motif discovery should narrow the gap between the set of known motifs and the total number of transcription factors in metazoan genomes

    Pharmacogenomic testing in paediatrics: Clinical implementation strategies

    Get PDF
    Pharmacogenomics (PGx) relates to the study of genetic factors determining variability in drug response. Implementing PGx testing in paediatric patients can enhance drug safety, helping to improve drug efficacy or reduce the risk of toxicity. Despite its clinical relevance, the implementation of PGx testing in paediatric practice to date has been variable and limited. As with most paediatric pharmacological studies, there are well-recognised barriers to obtaining high-quality PGx evidence, particularly when patient numbers may be small, and off-label or unlicensed prescribing remains widespread. Furthermore, trials enrolling small numbers of children can rarely, in isolation, provide sufficient PGx evidence to change clinical practice, so extrapolation from larger PGx studies in adult patients, where scientifically sound, is essential. This review paper discusses the relevance of PGx to paediatrics and considers implementation strategies from a child health perspective. Examples are provided from Canada, the Netherlands and the UK, with consideration of the different healthcare systems and their distinct approaches to implementation, followed by future recommendations based on these cumulative experiences. Improving the evidence base demonstrating the clinical utility and cost-effectiveness of paediatric PGx testing will be critical to drive implementation forwards. International, interdisciplinary collaborations will enhance paediatric data collation, interpretation and evidence curation, while also supporting dedicated paediatric PGx educational initiatives. PGx consortia and paediatric clinical research networks will continue to play a central role in the streamlined development of effective PGx implementation strategies to help optimise paediatric pharmacotherapy

    Expanding the Landscape of Chromatin Modification (CM)-Related Functional Domains and Genes in Human

    Get PDF
    Chromatin modification (CM) plays a key role in regulating transcription, DNA replication, repair and recombination. However, our knowledge of these processes in humans remains very limited. Here we use computational approaches to study proteins and functional domains involved in CM in humans. We analyze the abundance and the pair-wise domain-domain co-occurrences of 25 well-documented CM domains in 5 model organisms: yeast, worm, fly, mouse and human. Results show that domains involved in histone methylation, DNA methylation, and histone variants are remarkably expanded in metazoan, reflecting the increased demand for cell type-specific gene regulation. We find that CM domains tend to co-occur with a limited number of partner domains and are hence not promiscuous. This property is exploited to identify 47 potentially novel CM domains, including 24 DNA-binding domains, whose role in CM has received little attention so far. Lastly, we use a consensus Machine Learning approach to predict 379 novel CM genes (coding for 329 proteins) in humans based on domain compositions. Several of these predictions are supported by very recent experimental studies and others are slated for experimental verification. Identification of novel CM genes and domains in humans will aid our understanding of fundamental epigenetic processes that are important for stem cell differentiation and cancer biology. Information on all the candidate CM domains and genes reported here is publicly available

    An interactive genome browser of association results from the UK10K cohorts project.

    Get PDF
    UNLABELLED: High-throughput sequencing technologies survey genetic variation at genome scale and are increasingly used to study the contribution of rare and low-frequency genetic variants to human traits. As part of the Cohorts arm of the UK10K project, genetic variants called from low-read depth (average 7×) whole genome sequencing of 3621 cohort individuals were analysed for statistical associations with 64 different phenotypic traits of biomedical importance. Here, we describe a novel genome browser based on the Biodalliance platform developed to provide interactive access to the association results of the project. AVAILABILITY AND IMPLEMENTATION: The browser is available at http://www.uk10k.org/dalliance.html. Source code for the Biodalliance platform is available under a BSD license from http://github.com/dasmoth/dalliance, and for the LD-display plugin and backend from http://github.com/dasmoth/ldserv

    Assessment of transcript reconstruction methods for RNA-seq

    Full text link
    We evaluated 25 protocol variants of 14 independent computational methods for exon identification, transcript reconstruction and expression-level quantification from RNA-seq data. Our results show that most algorithms are able to identify discrete transcript components with high success rates but that assembly of complete isoform structures poses a major challenge even when all constituent elements are identified. Expression-level estimates also varied widely across methods, even when based on similar transcript models. Consequently, the complexity of higher eukaryotic genomes imposes severe limitations on transcript recall and splice product discrimination that are likely to remain limiting factors for the analysis of current-generation RNA-seq data
    corecore