153 research outputs found

    SAGETTARIUS: a program to reduce the number of tags mapped to multiple transcripts and to plan SAGE sequencing stages

    SAGE (Serial Analysis of Gene Expression) experiments generate short nucleotide sequences called ‘tags’, which are assumed to map unambiguously to their original transcripts (one tag to one transcript). Nevertheless, many of the generated tags either map to no transcript or map to multiple transcripts. Current bioinformatics resources, such as SAGEmap and TAGmapper, have focused on reducing the number of unmapped tags. Here, we describe SAGETTARIUS, a new high-throughput program that performs successive, precise Nla3 and Sau3A tag-to-transcript mapping based on specifically designed Virtual Tag (VT) libraries. First, SAGETTARIUS decreases the number of tags mapped to multiple transcripts: among the mapping resources compared, it performed best in this respect, reducing the number of multiply mapped tags by up to 11%. Second, by efficiently mapping the tags specific to CRT (Cytoplasmic Ribosomal protein Transcripts), SAGETTARIUS provides a guideline for planning the sequencing effort of a SAGE experiment. Using all publicly available human and mouse Nla3 SAGE experiments, we show that sequencing 100 000 tags is sufficient to map almost all CRT-specific tags and that four sequencing stages can be identified when carrying out a human or mouse SAGE project. SAGETTARIUS has a web interface and is freely accessible to academic users.
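    To make the idea of a Virtual Tag library concrete, here is a minimal sketch, not SAGETTARIUS itself. It assumes the classic SAGE tag definition: the 10 bases immediately 3' of the 3'-most anchoring-enzyme site, where the site is CATG for NlaIII ("Nla3") and GATC for Sau3AI ("Sau3A"). The function names are hypothetical. The sketch ignores refinements a real VT library would need, such as alternative polyadenylation, internal priming and sequencing errors. It builds a tag-to-transcripts map and counts how many observed tags are unmapped, uniquely mapped, or multiply mapped.

```python
from collections import defaultdict

# Recognition sites of the two anchoring enzymes named in the abstract.
NLAIII_SITE = "CATG"  # NlaIII ("Nla3")
SAU3A_SITE = "GATC"   # Sau3AI ("Sau3A")


def virtual_tag(seq, site=NLAIII_SITE, tag_len=10):
    """Return the tag_len bases immediately 3' of the 3'-most site, or None."""
    seq = seq.upper()
    pos = seq.rfind(site)  # 3'-most occurrence of the anchoring site
    if pos == -1:
        return None  # transcript has no anchoring site
    start = pos + len(site)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None  # site too close to the 3' end


def build_vt_library(transcripts, site=NLAIII_SITE, tag_len=10):
    """Map each virtual tag to the set of transcript IDs that produce it."""
    library = defaultdict(set)
    for tid, seq in transcripts.items():
        tag = virtual_tag(seq, site, tag_len)
        if tag is not None:
            library[tag].add(tid)
    return dict(library)


def classify_tags(tags, library):
    """Count observed tags that map to no, exactly one, or several transcripts."""
    counts = {"unmapped": 0, "unique": 0, "multiple": 0}
    for tag in tags:
        hits = library.get(tag, set())
        if not hits:
            counts["unmapped"] += 1
        elif len(hits) == 1:
            counts["unique"] += 1
        else:
            counts["multiple"] += 1
    return counts
```

    Passing site=SAU3A_SITE builds the corresponding Sau3A library. A tag counted as "multiple" here is exactly the kind of ambiguous assignment the abstract says SAGETTARIUS aims to reduce.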

    seqMINER: an integrated ChIP-seq data interpretation platform

    In a single experiment, chromatin immunoprecipitation combined with high-throughput sequencing (ChIP-seq) provides genome-wide information about a given covalent histone modification or transcription factor occupancy. However, the lack of time-efficient bioinformatics resources for extracting biological meaning from these gigabyte-scale datasets is often a limiting factor for data interpretation by biologists. We created an integrated, portable ChIP-seq data interpretation platform called seqMINER, with performance optimized for efficient handling of multiple genome-wide datasets. seqMINER allows comparison and integration of multiple ChIP-seq datasets and extraction of both qualitative and quantitative information. seqMINER can handle the biological complexity of most experimental situations and offers the user methods for classifying data according to the analysed features. In addition, through multiple graphical representations, seqMINER allows visualization and modelling of general as well as specific patterns in a given dataset. To demonstrate the efficiency of seqMINER, we carried out a comprehensive analysis of genome-wide chromatin modification data in mouse embryonic stem cells to understand the global epigenetic landscape and how it changes during cellular differentiation.
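    The abstract mentions classifying data according to the analysed features without naming a method. One common way to do this is to cluster read-density profiles around reference features (for example, transcription start sites) with k-means. The sketch below is built on that assumption and is not seqMINER's code. It bins reads within a window around each feature and then runs plain Lloyd's k-means with a deterministic farthest-first seeding. The function names, window size and bin count are all hypothetical.

```python
import numpy as np


def density_profiles(read_positions, feature_centers, window=1000, bins=10):
    """Count reads in `bins` equal-width bins spanning +/- window bp around each feature.

    Assumes all reads and features lie on the same chromosome.
    """
    rel_edges = np.linspace(-window, window, bins + 1)
    reads = np.asarray(read_positions)
    profiles = np.zeros((len(feature_centers), bins))
    for i, center in enumerate(feature_centers):
        counts, _ = np.histogram(reads, bins=center + rel_edges)
        profiles[i] = counts
    return profiles


def farthest_first_init(X, k):
    """Deterministic seeding: start at row 0, then repeatedly add the farthest row."""
    idx = [0]
    for _ in range(1, k):
        # Squared distance from every row to its nearest already-chosen center.
        d = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    return X[idx].astype(float)


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means on the rows of X; returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    centers = farthest_first_init(X, k)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each center; keep the old one if its cluster became empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

    Each row of the profile matrix is one feature. Clusters therefore group features whose surrounding enrichment has a similar shape and level, which can then be drawn as a sorted heatmap.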

    Host-Pathogen O-Methyltransferase Similarity and Its Specific Presence in Highly Virulent Strains of Francisella tularensis Suggests Molecular Mimicry

    Whole-genome comparative studies of many bacterial pathogens have shown an overall high similarity of gene content (>95%) between phylogenetically distinct subspecies. In highly clonal species that share the bulk of their genomes, subtle changes in gene content and small-scale polymorphisms, especially those that may alter gene expression and protein-protein interactions, are more likely to have a significant effect on the pathogen's biology. To better understand the molecular attributes that may mediate the adaptation of virulence in infectious bacteria, a comparative study was performed to further analyze the evolution of a gene encoding an O-methyltransferase that was previously identified as a candidate virulence factor because it is conserved specifically in highly pathogenic Francisella tularensis subsp. tularensis strains. The O-methyltransferase gene is located in the genomic neighborhood of a known pathogenicity island and a predicted site of rearrangement. Distinct O-methyltransferase subtypes are present in different Francisella tularensis subspecies. Related protein families were identified in several host species, as well as in species of pathogenic bacteria that are otherwise phylogenetically very distant from Francisella, including species of Mycobacterium. A conserved sequence motif profile is present in the mammalian host and pathogen protein sequences, and sites of non-synonymous variation conserved in the Francisella subspecies-specific O-methyltransferases map close to the predicted active site of the orthologous human protein structure. Altogether, the evidence suggests a role for the F. t. subsp. tularensis protein in a mechanism of molecular mimicry, perhaps similar to mechanisms described in Legionella and Coxiella. These findings therefore provide insights into the evolution of niche restriction and virulence in Francisella, and have broader implications for the molecular mechanisms that mediate host-pathogen relationships.
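    Locating sites of non-synonymous variation comes down to comparing aligned coding sequences codon by codon and checking whether the encoded amino acid changes. The sketch below illustrates that step with the standard genetic code. It is not the pipeline used in the study. It assumes the two sequences are already aligned in frame, and it skips codons that contain gaps or ambiguous bases. The function name is hypothetical.

```python
# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def nonsynonymous_sites(cds_a, cds_b):
    """Compare two aligned, in-frame coding sequences codon by codon.

    Returns (codon_number, aa_a, aa_b) for every codon where the encoded
    amino acids differ. Codons containing gaps or ambiguous bases are skipped.
    """
    if len(cds_a) != len(cds_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for n in range(len(cds_a) // 3):
        codon_a = cds_a[3 * n:3 * n + 3].upper()
        codon_b = cds_b[3 * n:3 * n + 3].upper()
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # gap or non-ACGT character in at least one codon
        aa_a, aa_b = CODON_TABLE[codon_a], CODON_TABLE[codon_b]
        if aa_a != aa_b:
            sites.append((n + 1, aa_a, aa_b))
    return sites
```

    For example, comparing ATG AAA TTT with ATG AAG TTA reports only codon 3 (F to L). The AAA to AAG change is synonymous because both codons encode lysine.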

    MACSIMS : multiple alignment of complete sequences information management system

    BACKGROUND: In the post-genomic era, systems-level studies seek to explain complex biological systems by integrating diverse resources from fields such as genomics, proteomics and transcriptomics. New information management systems are now needed for the collection, validation and analysis of the vast amount of heterogeneous data available. Multiple alignments of complete sequences provide an ideal environment for integrating this information in the context of the protein family. RESULTS: MACSIMS is a multiple alignment-based information management program that combines the advantages of both knowledge-based and ab initio sequence analysis methods. Structural and functional information is retrieved automatically from the public databases. Homologous regions are identified in the multiple alignment, and the retrieved data are evaluated and propagated from known to unknown sequences within these reliable regions. In a large-scale evaluation, the specificity of the propagated sequence features is estimated to be >99%, i.e. very few false positive predictions are made. MACSIMS is then used to characterise mutations in a test set of 100 proteins that are known to be involved in human genetic diseases. The number of sequence features associated with these proteins increased by 60% compared with the features available in the public databases. An XML format output file allows automatic parsing of the MACSIM results, while a graphical display using the JalView program allows manual analysis. CONCLUSION: MACSIMS is a new information management system that incorporates detailed analyses of protein families at the structural, functional and evolutionary levels. MACSIMS thus provides a unique environment that facilitates knowledge extraction and the presentation of the most pertinent information to the biologist. A web server and the source code are available at
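    Propagating a feature from a known to an unknown sequence "within reliable regions" relies on a basic coordinate mapping. A residue number in one sequence is converted to an alignment column, and that column is converted to a residue number in the other sequence. The sketch below shows only this mapping and is not MACSIMS's evaluation logic. It assumes gaps are written as "-" and that reliable regions are given as a set of column indices. The function names are hypothetical.

```python
def residue_to_column(aligned):
    """Map 1-based residue numbers of an aligned sequence to 0-based columns."""
    mapping, pos = {}, 0
    for col, ch in enumerate(aligned):
        if ch != "-":
            pos += 1
            mapping[pos] = col
    return mapping


def propagate_feature(start, end, known, unknown, reliable_columns):
    """Transfer a feature spanning residues start..end (1-based, inclusive) of `known`
    onto `unknown`. Returns the (start, end) residue numbers in `unknown`, or None
    if any column is outside a reliable region or is a gap in `unknown`.
    """
    if len(known) != len(unknown):
        raise ValueError("aligned sequences must have equal length")
    known_cols = residue_to_column(known)
    # Invert the target's residue->column map to get column->residue.
    target_res = {col: res for res, col in residue_to_column(unknown).items()}
    cols = [known_cols[r] for r in range(start, end + 1)]
    if not all(c in reliable_columns for c in cols):
        return None  # feature extends outside the trusted homologous region
    if not all(c in target_res for c in cols):
        return None  # target has a gap where the feature lies
    return target_res[cols[0]], target_res[cols[-1]]
```

    Refusing to transfer anything that touches an unreliable column or a gap is one straightforward way to favour specificity over coverage. That trade-off is consistent with the high specificity the abstract reports, though the real program's criteria are not described here.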

    MSV3d: database of human MisSense variants mapped to 3D protein structure

    The elucidation of the complex relationships linking genotypic and phenotypic variations to protein structure is a major challenge in the post-genomic era. We present MSV3d (Database of human MisSense Variants mapped to 3D protein structure), a new database that contains detailed annotation of missense variants of all human proteins (20 199 proteins). The multi-level characterization includes details of the physico-chemical changes induced by the amino acid substitution, as well as information on the conservation of the mutated residue and its position relative to functional features in the available or predicted 3D model. Major releases of the database are generated automatically and updated regularly in line with the dbSNP (database of Single Nucleotide Polymorphisms) and SwissVar releases, by exploiting the extensive Décrypthon computational grid resources. The database (http://decrypthon.igbmc.fr/msv3d) is easily accessible through a simple web interface coupled to a powerful query engine and a standard web service. The content can be downloaded completely or partially in XML or flat file formats.
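    The abstract does not say which physico-chemical properties MSV3d records, so the following is only a simplified illustration of what "physico-chemical changes induced by the amino acid substitution" can mean in practice. It is not MSV3d's annotation scheme. It parses a one-letter variant such as R175H and reports two properties. The first is the change in Kyte-Doolittle hydropathy. The second is the change in side-chain charge under a crude model that is an explicit assumption: D and E are -1, K and R are +1, and everything else, including histidine, is neutral. The function name is hypothetical.

```python
import re

# Kyte-Doolittle hydropathy index.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified side-chain charge near pH 7 (assumption: histidine treated as neutral).
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}

VARIANT_RE = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def annotate_missense(variant):
    """Annotate a one-letter missense variant such as 'R175H'."""
    m = VARIANT_RE.match(variant.strip().upper())
    if m is None:
        raise ValueError(f"not a one-letter missense variant: {variant!r}")
    ref, pos, alt = m.group(1), int(m.group(2)), m.group(3)
    if ref == alt:
        raise ValueError("reference and alternative residues are identical")
    return {
        "position": pos,
        "ref": ref,
        "alt": alt,
        "delta_hydropathy": round(KYTE_DOOLITTLE[alt] - KYTE_DOOLITTLE[ref], 2),
        "charge_change": CHARGE.get(alt, 0) - CHARGE.get(ref, 0),
    }
```

    A full annotation would add further scales (volume, polarity, secondary-structure propensity) as well as the conservation and 3D-context information the abstract describes.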

    X-Ray Structure of the Human Calreticulin Globular Domain Reveals a Peptide-Binding Area and Suggests a Multi-Molecular Mechanism

    In the endoplasmic reticulum, calreticulin acts as a chaperone and a Ca2+-signalling protein. At the cell surface, it mediates numerous important biological effects. The crystal structure of the human calreticulin globular domain was solved at 1.55 Å resolution. Interactions of the flexible N-terminal extension with the edge of the lectin site are consistently observed, revealing a hitherto unidentified peptide-binding site. A calreticulin molecular zipper, observed in all crystal lattices, could further extend this site by creating a binding cavity lined by hydrophobic residues. These data thus provide a first structural insight into the lectin-independent binding properties of calreticulin and suggest new working hypotheses, including that of a multi-molecular mechanism.