25,090 research outputs found
The interplay of descriptor-based computational analysis with pharmacophore modeling builds the basis for a novel classification scheme for feruloyl esterases
One of the most intriguing groups of enzymes, the feruloyl esterases (FAEs), is ubiquitous in both simple and complex organisms. FAEs have gained importance in biofuel, medicine and food industries due to their capability of acting on a large range of substrates for cleaving ester bonds and synthesizing high-added value molecules through esterification and transesterification reactions. During the past two decades extensive studies have been carried out on the production and partial characterization of FAEs from fungi, while much less is known about FAEs of bacterial or plant origin. Initial classification studies on FAEs were restricted on sequence similarity and substrate specificity on just four model substrates and considered only a handful of FAEs belonging to the fungal kingdom. This study centers on the descriptor-based classification and structural analysis of experimentally verified and putative FAEs; nevertheless, the framework presented here is applicable to every poorly characterized enzyme family. 365 FAE-related sequences of fungal, bacterial and plantae origin were collected and they were clustered using Self Organizing Maps followed by k-means clustering into distinct groups based on amino acid composition and physico-chemical composition descriptors derived from the respective amino acid sequence. A Support Vector Machine model was subsequently constructed for the classification of new FAEs into the pre-assigned clusters. The model successfully recognized 98.2% of the training sequences and all the sequences of the blind test. The underlying functionality of the 12 proposed FAE families was validated against a combination of prediction tools and published experimental data. Another important aspect of the present work involves the development of pharmacophore models for the new FAE families, for which sufficient information on known substrates existed. Knowing the pharmacophoric features of a small molecule that are essential for binding to the members of a certain family opens a window of opportunities for tailored applications of FAEs
Structuprint: a scalable and extensible tool for two-dimensional representation of protein surfaces
© 2016 Kontopoulos et al.Background: The term molecular cartography encompasses a family of computational methods for two-dimensional transformation of protein structures and analysis of their physicochemical properties. The underlying algorithms comprise multiple manual steps, whereas the few existing implementations typically restrict the user to a very limited set of molecular descriptors. Results: We present Structuprint, a free standalone software that fully automates the rendering of protein surface maps, given - at the very least - a directory with a PDB file and an amino acid property. The tool comes with a default database of 328 descriptors, which can be extended or substituted by user-provided ones. The core algorithm comprises the generation of a mould of the protein surface, which is subsequently converted to a sphere and mapped to two dimensions, using the Miller cylindrical projection. Structuprint is partly optimized for multicore computers, making the rendering of animations of entire molecular dynamics simulations feasible. Conclusions: Structuprint is an efficient application, implementing a molecular cartography algorithm for protein surfaces. According to the results of a benchmark, its memory requirements and execution time are reasonable, allowing it to run even on low-end personal computers. We believe that it will be of use - primarily but not exclusively - to structural biologists and computational biochemists
Multiple sequence alignment based on set covers
We introduce a new heuristic for the multiple alignment of a set of
sequences. The heuristic is based on a set cover of the residue alphabet of the
sequences, and also on the determination of a significant set of blocks
comprising subsequences of the sequences to be aligned. These blocks are
obtained with the aid of a new data structure, called a suffix-set tree, which
is constructed from the input sequences with the guidance of the
residue-alphabet set cover and generalizes the well-known suffix tree of the
sequence set. We provide performance results on selected BAliBASE amino-acid
sequences and compare them with those yielded by some prominent approaches
Assembly of an interactive correlation network for the Arabidopsis genome using a novel heuristic clustering algorithm
Peer reviewedPublisher PD
Recommended from our members
Gut microbial features can predict host phenotype response to protein deficiency.
Malnutrition remains a major health problem in low- and middle-income countries. During low protein intake, <0.67 g/kg/day, there is a loss of nitrogen (N2 ) balance, due to the unavailability of amino acid for metabolism and unbalanced protein catabolism results. However, there are individuals, who consume the same low protein intake, and preserve N2 balance for unknown reasons. A novel factor, the gut microbiota, may account for these N2 balance differences. To investigate this, we correlated gut microbial profiles with the growth of four murine strains (C57Bl6/J, CD-1, FVB, and NIH-Swiss) on protein deficient (PD) diet. Results show that a PD diet exerts a strain-dependent impact on growth and N2 balance as determined through analysis of urinary urea, ammonia and creatinine excretion. Bacterial alpha diversity was significantly (P < 0.05, FDR) lower across all strains on a PD diet compared to normal chow (NC). Multi-group analyses of the composition of microbiomes (ANCOM) revealed significantly differential microbial signatures between the four strains independent of diet. However, mice on a PD diet demonstrated differential enrichment of bacterial genera including, Allobaculum (C57Bl6/J), Parabacteroides (CD-1), Turicibacter (FVB), and Mucispirillum (NIH-Swiss) relative to NC. For instance, selective comparison of the CD-1 (gained weight) and C57Bl6/J (did not gain weight) strains on PD diet also demonstrated significant pathway enrichment of dihydroorodate dehydrogenase, rRNA methyltransferases, and RNA splicing ligase in the CD-1 strains compared to C57Bl6/J strains; which might account in their ability to retain growth despite a protein deficient diet. Taken together, these results suggest a potential relationship between the specific gut microbiota, N2 balance and animal response to malnutrition
Fold-specific sequence scoring improves protein sequence matching
Background Sequence matching is extremely important for applications throughout biology, particularly for discovering information such as functional and evolutionary relationships, and also for discriminating between unimportant and disease mutants. At present the functions of a large fraction of genes are unknown; improvements in sequence matching will improve gene annotations. Universal amino acid substitution matrices such as Blosum62 are used to measure sequence similarities and to identify distant homologues, regardless of the structure class. However, such single matrices do not take into account important structural information evident within the different topologies of proteins and treats substitutions within all protein folds identically. Others have suggested that the use of structural information can lead to significant improvements in sequence matching but this has not yet been very effective. Here we develop novel substitution matrices that include not only general sequence information but also have a topology specific component that is unique for each CATH topology. This novel feature of using a combination of sequence and structure information for each protein topology significantly improves the sequence matching scores for the sequence pairs tested. We have used a novel multi-structure alignment method for each homology level of CATH in order to extract topological information. Results We obtain statistically significant improved sequence matching scores for 73 % of the alpha helical test cases. On average, 61 % of the test cases showed improvements in homology detection when structure information was incorporated into the substitution matrices. On average z-scores for homology detection are improved by more than 54 % for all cases, and some individual cases have z-scores more than twice those obtained using generic matrices. Our topology specific similarity matrices also outperform other traditional similarity matrices and single matrix based structure methods. When default amino acid substitution matrix in the Psi-blast algorithm is replaced by our structure-based matrices, the structure matching is significantly improved over conventional Psi-blast. It also outperforms results obtained for the corresponding HMM profiles generated for each topology. Conclusions We show that by incorporating topology-specific structure information in addition to sequence information into specific amino acid substitution matrices, the sequence matching scores and homology detection are significantly improved. Our topology specific similarity matrices outperform other traditional similarity matrices, single matrix based structure methods, also show improvement over conventional Psi-blast and HMM profile based methods in sequence matching. The results support the discriminatory ability of the new amino acid similarity matrices to distinguish between distant homologs and structurally dissimilar pairs
Simultaneous identification of specifically interacting paralogs and inter-protein contacts by Direct-Coupling Analysis
Understanding protein-protein interactions is central to our understanding of
almost all complex biological processes. Computational tools exploiting rapidly
growing genomic databases to characterize protein-protein interactions are
urgently needed. Such methods should connect multiple scales from evolutionary
conserved interactions between families of homologous proteins, over the
identification of specifically interacting proteins in the case of multiple
paralogs inside a species, down to the prediction of residues being in physical
contact across interaction interfaces. Statistical inference methods detecting
residue-residue coevolution have recently triggered considerable progress in
using sequence data for quaternary protein structure prediction; they require,
however, large joint alignments of homologous protein pairs known to interact.
The generation of such alignments is a complex computational task on its own;
application of coevolutionary modeling has in turn been restricted to proteins
without paralogs, or to bacterial systems with the corresponding coding genes
being co-localized in operons. Here we show that the Direct-Coupling Analysis
of residue coevolution can be extended to connect the different scales, and
simultaneously to match interacting paralogs, to identify inter-protein
residue-residue contacts and to discriminate interacting from noninteracting
families in a multiprotein system. Our results extend the potential
applications of coevolutionary analysis far beyond cases treatable so far.Comment: Main Text 19 pages Supp. Inf. 16 page
- …