118 research outputs found

    NestedMICA: sensitive inference of over-represented motifs in nucleic acid sequence

    NestedMICA is a new, scalable, pattern-discovery system for finding transcription factor binding sites and similar motifs in biological sequences. Like several previous methods, NestedMICA tackles this problem by optimizing a probabilistic mixture model to fit a set of sequences. However, the use of a newly developed inference strategy called Nested Sampling means NestedMICA is able to find optimal solutions without the need for a problematic initialization or seeding step. We investigate the performance of NestedMICA in a range of scenarios, on synthetic data and a well-characterized set of muscle regulatory regions, and compare it with the popular MEME program. We show that the new method is significantly more sensitive than MEME: in one case, it successfully extracted a target motif from background sequence four times longer than could be handled by the existing program. It also performs robustly on synthetic sequences containing multiple significant motifs. When tested on a real set of regulatory sequences, NestedMICA produced motifs which were good predictors for all five abundant classes of annotated binding sites.

    AnnoTrack - a tracking system for genome annotation

    Background: As genome sequences are determined for increasing numbers of model organisms, demand has grown for better tools to facilitate unified genome annotation efforts by communities of biologists. Typically this process involves numerous experts from the field and the use of data from dispersed sources as evidence. This kind of collaborative annotation project requires specialized software solutions for efficient data tracking and processing. Results: As part of the scale-up phase of the ENCODE project (Encyclopedia of DNA Elements), the aim of the GENCODE project is to produce a highly accurate evidence-based reference gene annotation for the human genome. The AnnoTrack software system was developed to aid this effort. It integrates data from multiple distributed sources, highlights conflicts and facilitates the quick identification, prioritisation and resolution of problems during the process of genome annotation. Conclusions: AnnoTrack has been in use for the last year and has proven a very valuable tool for large-scale genome annotation. Designed to interface with standard bioinformatics components, such as DAS servers and Ensembl databases, it is easy to set up and configure for different genome projects. The source code is available at http://annotrack.sanger.ac.uk.

    Large-Scale Discovery of Promoter Motifs in Drosophila melanogaster

    A key step in understanding gene regulation is to identify the repertoire of transcription factor binding motifs (TFBMs) that form the building blocks of promoters and other regulatory elements. Identifying these experimentally is very laborious, and the number of TFBMs discovered remains relatively small, especially when compared with the hundreds of transcription factor genes predicted in metazoan genomes. We have used a recently developed statistical motif discovery approach, NestedMICA, to detect candidate TFBMs from a large set of Drosophila melanogaster promoter regions. Of the 120 motifs inferred in our initial analysis, 25 were statistically significant matches to previously reported motifs, while 87 appeared to be novel. Analysis of sequence conservation and motif positioning suggested that the great majority of these discovered motifs are predictive of functional elements in the genome. Many motifs showed associations with specific patterns of gene expression in the D. melanogaster embryo, and we were able to obtain confident annotation of expression patterns for 25 of our motifs, including eight of the novel motifs. The motifs are available through Tiffin, a new database of DNA sequence motifs. We have discovered many new motifs that are overrepresented in D. melanogaster promoter regions, and offer several independent lines of evidence that these are novel TFBMs. Our motif dictionary provides a solid foundation for further investigation of regulatory elements in Drosophila, and demonstrates techniques that should be applicable in other species. We suggest that further improvements in computational motif discovery should narrow the gap between the set of known motifs and the total number of transcription factors in metazoan genomes

    SISYPHUS—structural alignments for proteins with non-trivial relationships

    With the increasing amount of structural data, the number of homologous protein structures bearing topological irregularities is steadily growing. These include proteins with circular permutations, segment-swapping, context-dependent folding or chameleon sequences that can adopt alternative secondary structures. Their non-trivial structural relationships are readily identified during expert analysis, but their automatic identification using existing computational tools remains difficult or impossible. Such non-trivial cases of protein relationships are known to pose a problem for multiple alignment algorithms and to impede comparative modeling studies. They support an emerging concept of an evolutionarily changeable protein fold, which creates practical difficulties for the hierarchical classification of protein structures. To facilitate the understanding of proteins with such non-trivial structural relationships, and to provide a comprehensive annotation of them, we have created SISYPHUS ([Σισυϕος], in Greek "crafty"), a compendium to the SCOP database. The SISYPHUS database contains a collection of manually curated structural alignments and their inter-relationships. The multiple alignments are constructed for protein structural regions that range from oligomeric biological units or individual domains to fragments of different sizes. The SISYPHUS multiple alignments are displayed with SPICE, a browser that provides an integrated view of protein sequences, structures and their annotations. The database is available from

    Integrating sequence and structural biology with DAS.

    BACKGROUND: The Distributed Annotation System (DAS) is a network protocol for exchanging biological data. It is frequently used to share annotations of genomes and protein sequences. RESULTS: Here we present several extensions to the current DAS 1.5 protocol. These provide new commands to share alignments and three-dimensional molecular structure data, add the possibility of registering and discovering DAS servers, and provide a convention for serving different types of data plots. We present examples of web sites and applications that use the new extensions. We operate a public registry of DAS sources, which now includes entries for more than 250 distinct sources. CONCLUSION: Our DAS extensions are essential for the management of the growing number of services and the exchange of diverse biological data sets. In addition, the extensions allow new types of applications to be developed and scientific questions to be addressed. The registry of DAS sources is available at http://www.dasregistry.org. RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are

    A quantum Monte Carlo study of the one-dimensional ionic Hubbard model

    Quantum Monte Carlo methods are used to study a quantum phase transition in a 1D Hubbard model with a staggered ionic potential (D). Using recently formulated methods, the electronic polarization and localization are determined directly from the correlated ground state wavefunction and compared to results of previous work using exact diagonalization and Hartree-Fock. We find that the model undergoes a thermodynamic transition from a band insulator (BI) to a broken-symmetry bond ordered (BO) phase as the ratio of U/D is increased. Since it is known that at D = 0 the usual Hubbard model is a Mott insulator (MI) with no long-range order, we have searched for a second transition to this state by (i) increasing U at fixed ionic potential (D) and (ii) decreasing D at fixed U. We find no transition from the BO to MI state, and we propose that the MI state in 1D is unstable to bond ordering under the addition of any finite ionic potential. In real 1D systems the symmetric MI phase is never stable and the transition is from a symmetric BI phase to a dimerized BO phase, with a metallic point at the transition

    Analysis of diagnoses extracted from electronic health records in a large mental health case register

    The UK government has recently recognised the need to improve mental health services in the country. Electronic health records provide a rich source of patient data which could help policymakers to better understand the needs of service users. The main objective of this study is to report statistics on the diagnoses recorded in the Case Register of the South London and Maudsley NHS Foundation Trust, one of the largest mental health providers in the UK and Europe, serving a source population of over 1.2 million people residing in south London. Based on over 500,000 diagnoses recorded in ICD10 codes for a cohort of approximately 200,000 mental health patients, we established the frequency rate of each diagnosis (the ratio of the number of patients for whom a diagnosis has ever been recorded to the number of patients in the entire population who have made contact with mental disorders). We also investigated differences in diagnosis prevalence between subgroups of patients stratified by gender and ethnicity. The most common diagnoses in the considered population were (recurrent) depression (ICD10 codes F32-33; 16.4% of patients), reaction to severe stress and adjustment disorders (F43; 7.1%), mental/behavioural disorders due to use of alcohol (F10; 6.9%), and schizophrenia (F20; 5.6%). We also found many diagnoses which were more likely to be recorded in patients of a certain gender or ethnicity. For example, mood (affective) disorders (F31-F39); neurotic, stress-related and somatoform disorders (F40-F48, except F42); and eating disorders (F50) were more likely to be found in records of female patients, while males were more likely to be diagnosed with mental/behavioural disorders due to psychoactive substance use (F10-F19). Furthermore, mental/behavioural disorders due to use of alcohol and opioids were more likely to be recorded in patients of white ethnicity, and disorders due to use of cannabinoids in those of black ethnicity.