123 research outputs found

    Ab initio detection of fuzzy amino acid tandem repeats in protein sequences

    Background: Tandem repetitions within protein amino acid sequences often correspond to regular secondary structures and form multi-repeat 3D assemblies of varied size and function. Developing internal repetitions is one of the mechanisms by which proteins adapt their structure and function under evolutionary pressure. While there is keen interest in understanding such phenomena, detecting repeating structures from sequence analysis alone is considered an arduous task, since structure and function are often preserved even under considerable sequence divergence (fuzzy tandem repeats). Results: In this paper we present PTRStalker, a new algorithm for ab initio detection of fuzzy tandem repeats in protein amino acid sequences. We show that, by feeding PTRStalker amino acid sequences from the UniProtKB/Swiss-Prot database, we detect novel tandemly repeated structures not captured by other state-of-the-art tools. Experiments with membrane proteins indicate that PTRStalker can detect global symmetries in the primary structure that are then reflected in the tertiary structure. Conclusions: PTRStalker is able to detect fuzzy tandem repeating structures in protein sequences, with performance beyond the current state of the art. Such a tool may be a valuable support for investigating protein structural properties when tertiary X-ray data are not available.
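To make the notion of a fuzzy tandem repeat concrete, here is a minimal sketch of a naive detector. It is not PTRStalker, whose algorithm the abstract does not describe. The function names, the fixed `period`, and the use of Hamming distance between adjacent windows are all assumptions chosen for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_fuzzy_tandem_repeats(seq, period, max_mismatch, min_copies=2):
    """Naive scan (illustrative only, not PTRStalker).

    Reports (start, copies) for runs in which each window of length
    `period` differs from the window right before it by at most
    `max_mismatch` positions.
    """
    hits = []
    i = 0
    while i + 2 * period <= len(seq):
        copies = 1
        j = i
        # Extend the run while the next window is close to the current one.
        while (j + 2 * period <= len(seq)
               and hamming(seq[j:j + period],
                           seq[j + period:j + 2 * period]) <= max_mismatch):
            copies += 1
            j += period
        if copies >= min_copies:
            hits.append((i, copies))
            i = j + period  # skip past the reported run
        else:
            i += 1
    return hits


# Three diverged copies of a 5-residue unit: AGTLE, AGSLE, AGSLD.
print(find_fuzzy_tandem_repeats("MKAGTLEAGSLEAGSLDWWPQR", period=5, max_mismatch=1))
# → [(2, 3)]
```

Comparing each copy only with its neighbour lets divergence accumulate along the run. The trade-off is that the period must be known in advance and insertions or deletions are not handled, which are among the limitations that dedicated tools are designed to overcome.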

    Tandem Repeats in Proteins: Prediction Algorithms and Biological Role

    Tandem repetitions in protein sequence and structure are a fascinating subject of research that has been a focus of study since the late 1990s. In this survey, we give an overview of the multi-faceted aspects of research on protein tandem repeats (PTR for short), including prediction algorithms, databases, early classification efforts, mechanisms of PTR formation and evolution, and synthetic PTR design. We also touch on the rather open issue of the relationship between PTR and flexibility (or disorder) in proteins. Detection of PTR from either protein sequence or structure data is challenging due to the inherently high (biological) signal-to-noise ratio that is a key feature of this problem. As early in silico analytic tools were key enablers in starting this field of study, we expect that current and future algorithmic and statistical breakthroughs will have a high impact on investigations of the biological role of PTR.

    TRStalker: an efficient heuristic for finding fuzzy tandem repeats

    Motivation: Genomes in higher eukaryotic organisms contain a substantial amount of repeated sequences. Tandem Repeats (TRs) constitute a large class of repetitive sequences that originate via phenomena such as replication slippage and are characterized by close spatial contiguity. They play an important role in several molecular regulatory mechanisms, and also in several diseases (e.g. in the group of trinucleotide repeat disorders). While current methods are rather effective for TRs with a low or medium level of divergence, the problem of detecting TRs with higher divergence (fuzzy TRs) is still open. Detecting fuzzy TRs is a prerequisite for enriching our view of their role in regulatory mechanisms and diseases. Fuzzy TRs are also important as tools to shed light on the evolutionary history of the genome, where higher divergence correlates with more remote duplication events.

    Solving the Structural Modeling Problems for Tandem Repeat Proteins

    Over the last decade, numerous studies have demonstrated the fundamental importance of tandem repeat proteins (TRP) in many biological processes (Andrade, Perez-Iratxeta, and Ponting 2001). Repeat proteins are a widespread class of non-globular proteins carrying heterogeneous functions involved in several diseases. One of the most frequent problems in the study of biology is the functional characterization of a protein. This problem is usually solved by analyzing the three-dimensional (3D) structure. The experimental determination of the 3D structure is time consuming and technically difficult. For this reason, structure prediction by homology modeling offers a fast alternative to experimental approaches. However, homology modeling is not feasible for tandem repeat proteins because it is difficult to infer homology due to a high degree of sequence degeneration. In this thesis, I focused on algorithms oriented toward repeat unit prediction and characterization. I developed an innovative approach, Repeat Protein Unit Predictor (ReUPred), for fast automatic prediction of repeat units and repeat classification, exploiting a Structure Repeat Unit Library (SRUL) derived from RepeatsDB, the core database of TRP. ReUPred is based on the Victor C++ library, an open source platform dedicated to protein structure manipulation. To assess the accuracy of the predictor, we ran it against all the entries in the PDB database, and the resulting predictions allowed us to improve RepeatsDB annotation and increase it twentyfold. During my PhD I integrated ReUPred prediction into the new version of RepeatsDB (release 2.0), which now features information on start and end positions for the repeat regions and units for all entries. The updated web interface includes a new search engine for complex queries and a fully redesigned entry page for a better overview of structural data.
To further improve RepeatsDB quality, we decided to provide a finer classification at the subclass level based on the structural conformation of the repeated units. We hypothesized that inside these ensembles it is possible to find subgroups of proteins sharing the same unit type. To test this, we performed a detailed structural analysis. We created a network where nodes are the units and arcs represent structural similarity. The network can be partitioned into 7 different clusters. For each cluster, it was possible to create a Hidden Markov Model similar to those representing Pfam domains. This analysis is unpublished work, but it has already helped to improve ReUPred accuracy and RepeatsDB annotation. To summarize, this work is a partial answer to the problems of TRP modeling and might be helpful in future investigations such as drug design and disease studies.
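The network analysis described above (units as nodes, arcs for structural similarity, then a partition into clusters) can be sketched as follows. The abstract does not state how the partition was computed, so this sketch assumes the simplest option: keep arcs whose score reaches a threshold and take connected components. The function name, the score scale, and the threshold are hypothetical.

```python
def cluster_units(similarity, threshold):
    """Group units into connected components of a thresholded similarity graph.

    similarity: dict mapping (unit_a, unit_b) -> similarity score.
    Returns a list of sets, one per cluster (assumed method, for illustration).
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in similarity.items():
        ra, rb = find(a), find(b)  # registers both nodes even if the arc is dropped
        if score >= threshold and ra != rb:
            parent[ra] = rb  # merge the two components

    clusters = {}
    for node in parent:
        clusters.setdefault(find(node), set()).add(node)
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Hypothetical pairwise scores between five repeat units.
scores = {("u1", "u2"): 0.9, ("u2", "u3"): 0.8, ("u3", "u4"): 0.2, ("u4", "u5"): 0.95}
clusters = cluster_units(scores, threshold=0.7)
print(len(clusters))  # → 2
```

Connected components are sensitive to the chosen threshold, since a single strong arc is enough to merge two groups. A community-detection method would be a reasonable alternative when clusters are expected to be linked by occasional similar pairs.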

    A membrane-inserted structural model of the yeast mitofusin Fzo1

    Mitofusins are large transmembrane GTPases of the dynamin-related protein family, and are required for the tethering and fusion of mitochondrial outer membranes. Their full-length structures remain unknown, which is a limiting factor in the study of outer membrane fusion. We investigated the structure and dynamics of the yeast mitofusin Fzo1 through a hybrid computational and experimental approach, combining molecular modelling and all-atom molecular dynamics simulations in a lipid bilayer with site-directed mutagenesis and in vivo functional assays. The predicted architecture of Fzo1 improves upon the current domain annotation, with a precise description of the helical spans linked by flexible hinges, which are likely of functional significance. In vivo site-directed mutagenesis validates salient aspects of this model, notably, the long-distance contacts and residues participating in hinges. GDP is predicted to interact with Fzo1 through the G1 and G4 motifs of the GTPase domain. The model reveals structural determinants critical for protein function, including regions that may be involved in GTPase domain-dependent rearrangements

    The Fuzziness in Molecular, Supramolecular, and Systems Chemistry

    Fuzzy logic is a good model for the human ability to compute with words. It is based on the theory of fuzzy sets. A fuzzy set differs from a classical set because it breaks the Law of the Excluded Middle. In fact, an item may belong to a fuzzy set and to its complement at the same time, with the same or a different degree of membership. The degree of membership of an item in a fuzzy set can be any real number between 0 and 1. This property enables us to deal with all those statements whose truth is a matter of degree. Fuzzy logic plays a relevant role in the field of Artificial Intelligence because it enables decision-making in complex situations, where many intertwined variables are involved. Traditionally, fuzzy logic is implemented through software on a computer or, even better, through analog electronic circuits. Recently, the idea of using molecules and chemical reactions to process fuzzy logic has been promoted. In fact, the molecular world is fuzzy in its essence. The overlapping of quantum states, on the one hand, and the conformational heterogeneity of large molecules, on the other, enable context-specific functions to emerge in response to changing environmental conditions. Moreover, analog input–output relationships, involving not only electrical but also other physical and chemical variables, can be exploited to build fuzzy logic systems. The development of “fuzzy chemical systems” is tracing a new path in the field of artificial intelligence. This new path shows that artificially intelligent systems can be implemented not only through software and electronic circuits but also through solutions of properly chosen chemical compounds. The design of chemical artificial intelligent systems and chemical robots promises to have a significant impact on science, medicine, economy, security, and wellbeing.
Therefore, it is my great pleasure to announce a Special Issue of Molecules entitled “The Fuzziness in Molecular, Supramolecular, and Systems Chemistry.” All researchers who experience the fuzziness of the molecular world or use fuzzy logic to understand chemical complex systems will be interested in this book.
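The fuzzy-set properties stated in this abstract (membership degrees between 0 and 1, and an item belonging to a set and its complement at once) can be illustrated with a short sketch. The `warm` membership function and its 10 to 30 °C ramp are hypothetical. The complement as `1 - mu` and intersection and union as `min` and `max` are the standard choices, assumed here rather than taken from the source.

```python
def warm(temp_c: float) -> float:
    """Hypothetical membership function: degree to which a temperature is 'warm'.

    Ramps linearly from 0 at 10 °C to 1 at 30 °C (assumed for illustration).
    """
    if temp_c <= 10:
        return 0.0
    if temp_c >= 30:
        return 1.0
    return (temp_c - 10) / 20


def complement(mu: float) -> float:
    """Standard fuzzy complement: degree of membership in 'not A'."""
    return 1.0 - mu


def fuzzy_and(a: float, b: float) -> float:
    """Standard fuzzy intersection (minimum)."""
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    """Standard fuzzy union (maximum)."""
    return max(a, b)


mu = warm(25)                          # 0.75: mostly warm
not_mu = complement(mu)                # 0.25: somewhat not warm
print(fuzzy_and(mu, not_mu))           # → 0.25 (in A and in not-A at once)
print(fuzzy_or(mu, not_mu))            # → 0.75 (A or not-A is not fully true)
```

In a classical set the last two values would always be 0 and 1. Here they are not, which is what breaking the Law of the Excluded Middle amounts to under these operators.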

    Computational characterization of tandem repeat and non-globular proteins

    The first protein structure to be determined was hemoglobin, a globe-like, water-soluble protein with enzymatic activity. Since then, protein science has been biased towards this type, termed globular. However, over the last decades, accumulating experimental evidence has suggested the functional importance of their counterpart, non-globular proteins (NGPs). The definition includes tandem repetitions, intrinsically disordered regions, aggregating domains and transmembrane domains. Recognition and classification of NGPs are essential to shed light on the so-called “dark proteome”, i.e. the large fraction that we know almost nothing about. I contributed to this goal through the development of new resources dedicated to NGPs. My main focus is tandem repeat proteins (TRPs). TRPs are characterized by a repeated sequence that folds into a modular architecture, where modules are called “units”. The unit represents not only the structural but also the evolutionary module, and forms the basis of TRP classification. TRPs are widespread in all types of organisms, where they carry out fundamental functions. The sequences of TRP units diverge quickly while maintaining their fold, hampering detection by traditional methods for sequence analysis. Conversely, the challenges of structure-based repeat detection lie in the multidimensional nature of the data. Specialized methods have been developed for TRP identification; however, few of them annotate single repeat units. RepeatsDB is a database of TRP structures annotated with the position of repeat units and insertions. I contributed to the new version of the RepeatsDB database, which was populated using ReUPred, a predictor of tandem repeat units. The quality of RepeatsDB data is guaranteed by manual validation, a time-consuming task that requires community annotation efforts. To facilitate this process, I developed RepeatsDB-lite, a web server for the prediction and refinement of tandem repeats in protein structure.
Analysing RepeatsDB data, I compared the sequence- and structure-based classifications of TRPs. Moreover, I provided insights into the role of TRPs in the human proteome by characterizing them in terms of function, protein-protein interaction networks and impact on diseases. As a case study, I characterized Collagen V, a repeat protein associated with Ehlers-Danlos syndrome, identifying genotype-phenotype correlations in relation to its interaction network model. Another category of NGPs is intrinsically disordered proteins (IDPs), devoid of order in their native state. Intrinsic disorder was shown to be prevalent in the human proteome, to play important signaling and regulatory roles and to be frequently involved in disease. I contributed to MobiDB, a database of protein disorder and mobility annotations that describes several aspects of NGP structure and mechanism of function. MobiDB provides consensus predictions and functional annotations for all known protein sequences. A common feature of TRPs, IDPs and other NGPs is that they are characterized by low-complexity regions, where the distribution of amino acids deviates from the common amino acid usage. The functional importance of low-complexity regions is strictly related to their non-globular arrangement. I contributed to the field with a critical review focusing on the definition of sequence features of low-complexity regions and their relationship to structural features. Finally, I exploited the knowledge acquired on NGPs in the previous studies to design one of the first sequence-based methods for the prediction of protein solubility, SODA. SODA uses the aggregation propensity, intrinsic disorder, hydrophobicity and secondary structure preferences of a sequence to evaluate solubility changes introduced by a mutation. The main envisaged applications of SODA are in protein engineering and in the study of the impact of protein mutations on disease onset.