Residue-residue contact driven protein structure prediction using optimization and machine learning
Significant improvements in the prediction of protein residue-residue contacts have been observed in recent years. These contacts, predicted using a variety of coevolution-based and machine learning methods, are the key contributors to the recent progress in ab initio protein structure prediction, as demonstrated in the recent CASP experiments. Continued development of new methods to reliably predict contact maps, tools to assess the utility of predicted contacts, and methods to construct protein tertiary structures from predicted contacts is essential to further improve ab initio structure prediction. In this dissertation, three contributions are described: (a) DNCON2, a two-level convolutional neural network-based method for protein contact prediction, (b) ConEVA, a toolkit for contact assessment and evaluation, and (c) CONFOLD, a method of building protein 3D structures from predicted contacts and secondary structures. Additional related contributions on protein contact prediction and structure reconstruction are also described. DNCON2 and CONFOLD demonstrate state-of-the-art performance on contact prediction and structure reconstruction from scratch. All three methods are freely available to the scientific community as software or web servers. Includes bibliographical reference
Blurring contact maps of thousands of proteins: what we can learn by reconstructing 3D structure
Background: The present knowledge of protein structures at the atomic level derives from some 60,000 molecules. Yet the exponentially growing set of hypothetical protein sequences comprises some 10 million chains, which makes protein structure prediction one of the challenging goals of bioinformatics. In this context, the representation of proteins with contact maps is an intermediate step of fold recognition and constitutes the input of contact map predictors. However, contact map representations require fast and reliable methods to reconstruct the specific folding of the protein backbone. Methods: In this paper, by adopting a GRID technology, our algorithm for 3D reconstruction, FT-COMAR, is benchmarked on a large set of non-redundant proteins (1716), taking random noise into consideration; this makes our computation the largest ever performed for the task at hand. Results: We can observe the effects of introducing random noise on 3D reconstruction and derive some considerations useful for future implementations. The size of the protein set also allows statistical considerations after grouping by SCOP structural class. Conclusions: Altogether, our data indicate that the quality of 3D reconstruction is unaffected by deleting up to an average of 75% of the real contacts, while only a few percent of randomly generated contacts in place of non-contacts are sufficient to hamper 3D reconstruction.
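The two kinds of noise discussed in this abstract (deleting real contacts, and adding random contacts in place of non-contacts) can be illustrated with a minimal sketch. This is not FT-COMAR or the authors' benchmark code; the function name `blur_contact_map`, its parameters, and the convention of expressing added contacts as a fraction of the number of real contacts are all assumptions made for illustration.

```python
import random

def blur_contact_map(contacts, n_residues, delete_frac, add_frac, seed=0):
    """Return a noisy copy of a set of contacts (i, j) with i < j.

    delete_frac: fraction of the real contacts removed at random.
    add_frac:    number of false contacts added, expressed as a fraction
                 of the number of real contacts (an assumed convention).
    """
    rng = random.Random(seed)
    real = sorted(contacts)  # sorted list so sampling is reproducible
    n_keep = len(real) - round(delete_frac * len(real))
    kept = set(rng.sample(real, n_keep))

    # Candidate false contacts: every residue pair that is not a real contact.
    non_contacts = [(i, j)
                    for i in range(n_residues)
                    for j in range(i + 1, n_residues)
                    if (i, j) not in contacts]
    n_add = min(len(non_contacts), round(add_frac * len(real)))
    added = set(rng.sample(non_contacts, n_add))
    return kept | added


# Hypothetical toy map on 5 residues.
native = {(0, 2), (0, 3), (1, 3), (1, 4)}
sparse = blur_contact_map(native, 5, delete_frac=0.75, add_frac=0.0)
noisy = blur_contact_map(native, 5, delete_frac=0.0, add_frac=0.5)
```

Here `sparse` keeps only a quarter of the real contacts, while `noisy` keeps all of them and adds two false ones; a reconstruction method could then be run on each to compare its sensitivity to the two kinds of error.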
A two-stage approach for improved prediction of residue contact maps
BACKGROUND: Protein topology representations such as residue contact maps are an important intermediate step towards ab initio prediction of protein structure. Although improvements have occurred in recent years, the problem of accurately predicting residue contact maps from primary sequences is still largely unsolved. Among the reasons for this are the unbalanced nature of the problem (with far fewer examples of contacts than non-contacts), the formidable challenge of capturing long-range interactions in the maps, and the intrinsic difficulty of mapping one-dimensional input sequences into two-dimensional output maps. In order to alleviate these problems and achieve improved contact map predictions, in this paper we split the task into two stages: the prediction of a map's principal eigenvector (PE) from the primary sequence, and the reconstruction of the contact map from the PE and primary sequence. Predicting the PE from the primary sequence consists in mapping a vector into a vector. This task is less complex than mapping vectors directly into two-dimensional matrices since the size of the problem is drastically reduced and so is the scale length of interactions that need to be learned. RESULTS: We develop architectures composed of ensembles of two-layered bidirectional recurrent neural networks to classify the components of the PE in 2, 3 and 4 classes from protein primary sequence, predicted secondary structure, and hydrophobicity interaction scales. Our predictor, tested on a non-redundant set of 2171 proteins, achieves classification performances of up to 72.6%, 16% above a base-line statistical predictor. We design a system for the prediction of contact maps from the predicted PE. Our results show that predicting maps through the PE yields sizeable gains especially for long-range contacts which are particularly critical for accurate protein 3D reconstruction.
The final predictor's accuracy on a non-redundant set of 327 targets is 35.4% and 19.8% for minimum contact separations of 12 and 24, respectively, when the top length/5 contacts are selected. On the 11 CASP6 Novel Fold targets we achieve similar accuracies (36.5% and 19.7%). This compares favourably with the best automated predictors at CASP6. CONCLUSION: Our final system for contact map prediction achieves state-of-the-art performance, and may provide valuable constraints for improved ab initio prediction of protein structures. A suite of predictors of structural features, including the PE, and PE-based contact maps, is available at
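Two quantities in this abstract lend themselves to a short illustration: the principal eigenvector (PE) of a contact map, and the accuracy of the top length/5 predicted contacts at a minimum sequence separation. The sketch below is not the authors' predictor. It computes the PE of a known map by power iteration rather than predicting it from sequence, and the function names, the score dictionary, and the choice of treating the separation cut-off as `j - i >= min_sep` are assumptions.

```python
import math

def principal_eigenvector(cmap, iters=500):
    """Power iteration on a symmetric 0/1 contact map (list of lists)."""
    n = len(cmap)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        # Multiply by (A + I): same eigenvectors as A, but the shift keeps
        # the iteration from oscillating on bipartite-like maps.
        w = [v[i] + sum(cmap[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def top_l5_accuracy(scores, native, length, min_sep):
    """Fraction of the top length/5 predicted pairs that are real contacts.

    scores: dict mapping (i, j), i < j, to a predicted contact score.
    native: set of real contact pairs.
    Only pairs with j - i >= min_sep are ranked (assumed convention).
    """
    eligible = [p for p in scores if p[1] - p[0] >= min_sep]
    eligible.sort(key=lambda p: scores[p], reverse=True)
    top = eligible[:max(1, length // 5)]
    if not top:
        return 0.0
    return sum(p in native for p in top) / len(top)


# Hypothetical toy inputs.
chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
pe = principal_eigenvector(chain)
scores = {(0, 5): 0.9, (1, 6): 0.8, (2, 3): 0.99, (0, 9): 0.1}
acc = top_l5_accuracy(scores, {(0, 5), (2, 3)}, length=10, min_sep=3)
```

In the toy ranking, the highest-scoring pair `(2, 3)` is excluded by the separation filter, so the two selected pairs are `(0, 5)` and `(1, 6)`, of which one is a real contact.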
Recovery of Protein Structure from Contact Maps
We present an efficient algorithm to recover the three dimensional structure
of a protein from its contact map representation. First we show that when a
physically realizable map is used as target, our method generates a structure
whose contact map is essentially similar to the target. Furthermore, the
reconstructed and original structures are similar up to the resolution of the
contact map representation. Next we use non-physical target maps, obtained by
corrupting a physical one; in this case our method essentially recovers the
underlying physical map and structure. Hence our algorithm will help to fold
proteins, using dynamics in the space of contact maps. Finally we investigate
the manner in which the quality of the recovered structure degrades when the
number of contacts is reduced. Comment: 27 pages, RevTex, 12 figures include
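For readers unfamiliar with the representation, the following sketch shows how a contact map can be derived from 3D coordinates and how two maps can be compared. It does not implement the recovery algorithm of this paper. The 8 Å distance threshold, the use of one point per residue, and the function names are assumptions chosen for illustration.

```python
import math

def contact_map(coords, threshold=8.0, min_sep=1):
    """Build a symmetric 0/1 contact map from one 3D point per residue.

    Residues i and j are in contact when their distance is below
    `threshold` (8.0 is an assumed, commonly used value in angstroms).
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_sep, n):
            if math.dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap

def map_difference(a, b):
    """Number of residue pairs (i < j) on which two maps disagree."""
    n = len(a)
    return sum(a[i][j] != b[i][j] for i in range(n) for j in range(i + 1, n))


# Hypothetical straight chain with 3.8 angstrom spacing between residues.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
target = contact_map(coords)

# A corrupted, possibly non-physical map: one entry flipped.
corrupted = [row[:] for row in target]
corrupted[0][3] = corrupted[3][0] = 1
```

A count such as `map_difference(target, corrupted)` gives a simple way to express how far a recovered map is from its target.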
Genome organization and interaction with capsid protein in a multipartite RNA virus.
We report the asymmetric reconstruction of the single-stranded RNA (ssRNA) content in one of the three otherwise identical virions of a multipartite RNA virus, brome mosaic virus (BMV). We exploit a sample consisting exclusively of particles with the same RNA content (specifically, RNAs 3 and 4), assembled in planta by agrobacterium-mediated transient expression. We find that the interior of the particle is nearly empty, with most of the RNA genome situated at the capsid shell. However, this density is disordered in the sense that the RNA is not associated with any particular structure but rather with an ensemble of secondary/tertiary structures that interact with the capsid protein. Our results illustrate a fundamental difference between the ssRNA organization in the multipartite BMV viral capsid and the monopartite bacteriophages MS2 and Qβ, for which a dominant RNA conformation is found inside the assembled viral capsids, with RNA density conserved even at the center of the particle. This can be understood in the context of the differing demands on their respective lifecycles: BMV must package separately each of several different RNA molecules and has been shown to replicate and package them in isolated, membrane-bound, cytoplasmic complexes, whereas the bacteriophages exploit sequence-specific "packaging signals" throughout the viral RNA to package their monopartite genomes.
Cryo-EM structures of herpes simplex virus type 1 portal vertex and packaged genome.
Herpesviruses are enveloped viruses that are prevalent in the human population and are responsible for diverse pathologies, including cold sores, birth defects and cancers. They are characterized by a highly pressurized pseudo-icosahedral capsid, with triangulation number (T) equal to 16, encapsidating a tightly packed double-stranded DNA (dsDNA) genome [1-3]. A key process in the herpesvirus life cycle involves the recruitment of an ATP-driven terminase to a unique portal vertex to recognize, package and cleave concatemeric dsDNA, ultimately giving rise to a pressurized, genome-containing virion [4,5]. Although this process has been studied in dsDNA phages [6-9], with which herpesviruses bear some similarities, a lack of high-resolution in situ structures of genome-packaging machinery has prevented the elucidation of how these multi-step reactions, which require close coordination among multiple actors, occur in an integrated environment. To better define the structural basis of genome packaging and organization in herpes simplex virus type 1 (HSV-1), we developed sequential localized classification and symmetry relaxation methods to process cryo-electron microscopy (cryo-EM) images of HSV-1 virions, which enabled us to decouple and reconstruct hetero-symmetric and asymmetric elements within the pseudo-icosahedral capsid. Here we present in situ structures of the unique portal vertex, genomic termini and ordered dsDNA coils in the capsid spooled around a disordered dsDNA core. We identify tentacle-like helices and a globular complex capping the portal vertex that is not observed in phages, indicative of herpesvirus-specific adaptations in the DNA-packaging process. Finally, our atomic models of portal vertex elements reveal how the fivefold-related capsid accommodates symmetry mismatch imparted by the dodecameric portal (a longstanding mystery in icosahedral viruses) and inform possible DNA-sequence recognition and headful-sensing pathways involved in genome packaging.
This work showcases how to resolve symmetry-mismatched elements in a large eukaryotic virus and provides insights into the mechanisms of herpesvirus genome packaging.
Structural visualization of key steps in human transcription initiation.
Eukaryotic transcription initiation requires the assembly of general transcription factors into a pre-initiation complex that ensures the accurate loading of RNA polymerase II (Pol II) at the transcription start site. The molecular mechanism and function of this assembly have remained elusive due to lack of structural information. Here we have used an in vitro reconstituted system to study the stepwise assembly of human TBP, TFIIA, TFIIB, Pol II, TFIIF, TFIIE and TFIIH onto promoter DNA using cryo-electron microscopy. Our structural analyses provide pseudo-atomic models at various stages of transcription initiation that illuminate critical molecular interactions, including how TFIIF engages Pol II and promoter DNA to stabilize both the closed pre-initiation complex and the open-promoter complex, and to regulate start-site selection. Comparison of open versus closed pre-initiation complexes, combined with the localization of the TFIIH helicases XPD and XPB, support a DNA translocation model of XPB and explain its essential role in promoter opening.