
    Blast sampling for structural and functional analyses

    BACKGROUND: The post-genomic era is characterised by a torrent of biological information flooding the public databases. As a direct consequence, similarity searches starting with a single query sequence frequently lead to the identification of hundreds, or even thousands, of potential homologues. The huge volume of data renders the subsequent structural, functional and evolutionary analyses very difficult. It is therefore essential to develop new strategies for efficient sampling of this large sequence space, in order to reduce the number of sequences to be processed. At the same time, it is important to retain the most pertinent sequences for structural and functional studies. RESULTS: An exhaustive analysis on a large-scale test set (284 protein families) was performed to compare the efficiency of four different sampling methods aimed at selecting the most pertinent sequences. These four methods sample the proteins detected by BlastP searches and can be divided into two categories: two customisable methods, where the user defines either the maximal number or the percentage of sequences to be selected, and two automatic methods, in which the number of sequences selected is determined by the program. We focused our analysis on the potential information content of the sampled sets of sequences, using multiple alignment of complete sequences as the main validation tool. The study considered two criteria: the total number of sequences in BlastP and their associated E-values. The subsequent analyses investigated the influence of the sampling methods on the E-value distributions, the sequence coverage, the final multiple alignment quality and the active site characterisation at various residue conservation thresholds as a function of these criteria. CONCLUSION: The comparative analysis of the four sampling methods allows us to propose a suitable sampling strategy that significantly reduces the number of homologous sequences required for alignment, while at the same time maintaining the relevant information concerning the active site residues.

    The structure of Staphylococcus aureus epidermolytic toxin A, an atypic serine protease, at 1.7 Å resolution

    Background: Staphylococcal epidermolytic toxins A and B (ETA and ETB) are responsible for the staphylococcal scalded skin syndrome of newborn and young infants; this condition can appear just a few hours after birth. These toxins cause the disorganization and disruption of the region between the stratum spinosum and the stratum granulosum, two of the three cellular layers constituting the epidermis. The physiological substrate of ETA is not known and, consequently, its mode of action in vivo remains an unanswered question. Determination of the structure of ETA and its comparison with other serine proteases may reveal insights into ETA's catalytic mechanism. Results: The crystal structure of staphylococcal ETA has been determined by multiple isomorphous replacement and refined at 1.7 Å resolution with a crystallographic R factor of 0.184. The structure of ETA reveals it to be a new and unique member of the trypsin-like serine protease family. In contrast to other serine protease folds, ETA can be characterized by ETA-specific surface loops, a lack of cysteine bridges, an oxyanion hole which is not preformed, an S1 specific pocket designed for a negatively charged amino acid and an ETA-specific N-terminal helix which is shown to be crucial for substrate hydrolysis. Conclusions: Despite very low sequence homology between ETA and other trypsin-like serine proteases, the ETA crystal structure, together with biochemical data and site-directed mutagenesis studies, strongly confirms the classification of ETA in the Glu-endopeptidase family. Direct links can be made between the protease architecture of ETA and its biological activity.

    Fixed-Size Determinantal Point Processes Sampling For Species Phylogeny

    Determinantal point processes (DPPs) are popular tools that supply useful information for repulsiveness. They provide coherent probabilistic models when negative correlations arise and also represent new algorithms for inference problems like sampling, marginalization and conditioning. Recently, DPPs have played an increasingly important role in machine learning and statistics, since they are used for diverse subset selection problems. In this paper we use k-DPP, a conditional DPP that models only sets of cardinality k, to sample a diverse subset of species from a large phylogenetic tree. The tree sampling task is important in many studies in modern bioinformatics. The results show a fast mixing sampler for k-DPP, for which a polynomial bound on the mixing time is given. This approach is applied to a real-world dataset of species, and we observe that leaves joined by a higher subtree are more likely to appear.
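
As a rough illustration of fixed-size DPP sampling, the pure-Python sketch below runs a simple Metropolis swap chain whose stationary distribution is proportional to det(L_S) over subsets S of size k. It is not the sampler described in the abstract, and no mixing-time bound is claimed for it. The Gaussian similarity kernel, the toy leaf coordinates and the names `det`, `submatrix` and `sample_kdpp` are all assumptions made for this example.

```python
import math
import random


def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        # Pick the row with the largest pivot to limit round-off error.
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d  # a row swap flips the sign of the determinant
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return d


def submatrix(L, idx):
    """Principal submatrix of L restricted to the indices in idx."""
    return [[L[i][j] for j in idx] for i in idx]


def sample_kdpp(L, k, steps=2000, seed=0):
    """Approximate k-DPP sample from a Metropolis swap chain (illustrative)."""
    n = len(L)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    rng = random.Random(seed)
    S = rng.sample(range(n), k)
    cur = det(submatrix(L, S))
    for _ in range(steps):
        # Symmetric proposal: swap one selected item for one unselected item.
        pos = rng.randrange(k)
        j = rng.choice([i for i in range(n) if i not in S])
        T = S[:]
        T[pos] = j
        new = det(submatrix(L, T))
        # Accept with probability min(1, det(L_T) / det(L_S)).
        if new > 0 and (cur <= 0 or rng.random() * cur < new):
            S, cur = T, new
    return sorted(S)


# Toy "leaves": positions on a line; nearby positions count as similar.
xs = [0.0, 0.1, 0.2, 1.0, 1.1, 2.0, 2.1, 3.0]
L = [[math.exp(-(a - b) ** 2) for b in xs] for a in xs]
picked = sample_kdpp(L, 3)
print(picked)
```

Because the determinant of a submatrix shrinks when its rows are nearly identical, subsets containing very similar items receive low probability, which is the diversity property the abstract refers to.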

    Transmembrane Signaling across the Ligand-Gated FhuA Receptor

    FhuA protein facilitates ligand-gated transport of ferrichrome-bound iron across Escherichia coli outer membranes. X-ray analysis at 2.7 Å resolution reveals two distinct conformations in the presence and absence of ferrichrome. The monomeric protein consists of a hollow, 22-stranded, antiparallel beta barrel (residues 160-714), which is obstructed by a plug (residues 19-159). The binding site of ferrichrome, an aromatic pocket near the cell surface, undergoes minor changes upon association with the ligand. These are propagated and amplified across the plug, eventually resulting in substantially different protein conformations at the periplasmic face. Our findings reveal the mechanism of signal transmission and suggest how the energy-transducing TonB complex senses ligand binding.

    RReportGenerator: automatic reports from routine statistical analysis using R.

    With the establishment of high-throughput (HT) screening methods, there is an increasing need for automatic analysis methods. Here we present RReportGenerator, a user-friendly portal for automatic routine analysis using the statistical platform R and Bioconductor. RReportGenerator is designed to analyze data using predefined analysis scenarios via a graphical user interface (GUI). A report in PDF format combining text, figures and tables is automatically generated, and results may be exported. To demonstrate suitable analysis tasks, we provide direct web access to a collection of analysis scenarios for summarizing data from transfected cell arrays (TCA), segmentation of CGH data, and microarray quality control and normalization. AVAILABILITY: RReportGenerator, a user manual and a collection of analysis scenarios are available under a GNU public license on http://www-bio3d-igbmc.u-strasbg.fr/~wraf

    Spliceator: multi-species splice site prediction using convolutional neural networks

    Background: Ab initio prediction of splice sites is an essential step in eukaryotic genome annotation. Recent predictors have exploited Deep Learning algorithms and reliable gene structures from model organisms. However, Deep Learning methods for non-model organisms are lacking. Results: We developed Spliceator to predict splice sites in a wide range of species, including model and non-model organisms. Spliceator uses a convolutional neural network and is trained on carefully validated data from over 100 organisms. We show that Spliceator achieves consistently high accuracy (89–92%) compared to existing methods on independent benchmarks from human, fish, fly, worm, plant and protist organisms. Conclusions: Spliceator is a new Deep Learning method trained on high-quality data, which can be used to predict splice sites in diverse organisms, ranging from human to protists, with consistently high accuracy.

    Décrypthon Grid - Grid Resources Dedicated to Neuromuscular Disorders

    Thanks to the availability of computational grids and their middleware, seamless access to computation and storage resources is provided to application developers and scientists. The Décrypthon project is one example of such a high performance platform. In this paper, we present the architecture of the platform, the middleware developed to facilitate access to several servers deployed in France, and the data center for integrating large biological datasets over multiple sites, supported by a new query language and integration of various tools. The SM2PH project represents an example of a biological application that exploits the capacities of the Décrypthon grid. The goal of SM2PH is a better understanding of mutations involved in human monogenic diseases, their impact on the 3D structure of the protein and the subsequent consequences for the pathological phenotypes.