Exploiting distant homologues for phasing through the generation of compact fragments, local fold refinement and partial solution combination.
Macromolecular structures can be solved by molecular replacement provided that suitable search models are available. Models from distant homologues may deviate too much from the target structure to succeed, notwithstanding an overall similar fold or even their featuring areas of very close geometry. Successful methods to make the most of such templates usually rely on the degree of conservation to select and improve search models. ARCIMBOLDO_SHREDDER uses fragments derived from distant homologues in a brute-force approach driven by the experimental data, instead of by sequence similarity. The new algorithms implemented in ARCIMBOLDO_SHREDDER are described in detail, illustrating its characteristic aspects in the solution of new and test structures. In an advance from the previously published algorithm, which was based on omitting or extracting contiguous polypeptide spans, model generation now uses three-dimensional volumes respecting structural units. The optimal fragment size is estimated from the expected log-likelihood gain (LLG) values computed assuming that a substructure can be found with a level of accuracy near that required for successful extension of the structure, typically below 0.6 Å root-mean-square deviation (r.m.s.d.) from the target. Better sampling is attempted through model trimming or decomposition into rigid groups and optimization through Phaser's gyre refinement. Also, after model translation, packing filtering and refinement, models are either disassembled into predetermined rigid groups and refined (gimble refinement) or Phaser's LLG-guided pruning is used to trim the model of residues that are not contributing signal to the LLG at the target r.m.s.d. value. Phase combination among consistent partial solutions is performed in reciprocal space with ALIXE. Finally, density modification and main-chain autotracing in SHELXE serve to expand to the full structure and identify successful solutions. 
The performance on test data and the solution of new structures are described.
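As a rough illustration of the move from contiguous polypeptide spans to three-dimensional volumes, the sketch below selects the residues whose Cα atoms lie closest in space to a chosen centre. It is a minimal sketch and not the ARCIMBOLDO_SHREDDER implementation: the function name `spherical_fragment`, the use of Cα distances alone and the toy coordinates are assumptions made for illustration, and the structural-unit rules the program applies are ignored.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def spherical_fragment(ca_coords: Dict[int, Coord],
                       centre_residue: int,
                       n_residues: int) -> List[int]:
    """Return the n_residues residue numbers whose C-alpha atoms are
    nearest in space to the C-alpha of centre_residue (hypothetical helper).

    Ties in distance are broken by residue number so the result is
    deterministic.
    """
    centre = ca_coords[centre_residue]
    by_distance = sorted(
        ca_coords,
        key=lambda res: (math.dist(ca_coords[res], centre), res),
    )
    return sorted(by_distance[:n_residues])


# Toy hairpin: residues 1-4 run along x, residues 5-7 fold back beside them.
toy_template = {
    1: (0.0, 0.0, 0.0),
    2: (3.8, 0.0, 0.0),
    3: (7.6, 0.0, 0.0),
    4: (11.4, 0.0, 0.0),
    5: (11.4, 3.8, 0.0),
    6: (7.6, 3.8, 0.0),
    7: (3.8, 3.8, 0.0),
}

fragment = spherical_fragment(toy_template, centre_residue=2, n_residues=4)
print(fragment)
```

In this toy example the fragment centred on residue 2 picks up residue 7, which is distant in sequence but adjacent in space. A contiguous-span selection would miss that contact.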
Enforcing secondary and tertiary structure for crystallographic phasing. Developing ARCIMBOLDO and BORGES
[spa, translated into English]
ARCIMBOLDO is an ab initio method for solving macromolecular crystal structures that combines the location of small model fragments, such as α-helices, with electron density modification and automatic tracing of the polypeptide chain. The method is named after the Italian painter Giuseppe Arcimboldo, who composed portraits from common objects such as books or vegetables. Analogously, our method composes structural hypotheses by placing small structural fragments; when the resulting substructure is sufficiently close to the true one, electron density modification reveals its “portrait”. If, on the contrary, the hypothesis is incorrect, the result is a mere “still life”. The present work focuses on the development of this method and its extension from secondary-structure fragments to small local folds and tertiary structures derived from low-homology models.
• The method has been characterised by its need for massive computation to handle the enormous number of hypotheses generated, but in the present work we have implemented a version so optimised that it solves crystal structures on a single workstation.
• Beyond polyalanine α-helices, use has been extended to fragments cut from low-homology models, developing a method to determine and extract the substructure that best agrees with the experimental data, specifically through the rotation function. Implementation of SHREDDER.
• Extending the ab initio method from secondary to tertiary structure required libraries of fold hypotheses representing a vast collection of possibilities. A tool to generate such libraries, the program BORGES, has been developed, together with an underlying formalism based on characteristic vectors.
• Development of a method and implementation of a program to solve structures using the libraries of nonspecific folds: ARCIMBOLDO_BORGES.
All these objectives have been met satisfactorily.
[eng]
ARCIMBOLDO is an ab initio phasing method for macromolecular crystallographic X-ray diffraction data, which combines location of model fragments such as polyalanine α-helices with the program PHASER and density modification and main-chain autotracing with the program SHELXE. The method has been named after the Italian painter Giuseppe Arcimboldo (1526-1593), who used to compose portraits out of common objects such as fruits and vegetables. Following the analogy, ARCIMBOLDO composes an unknown structure by assembling small secondary structure elements, which are conserved across families of unrelated tertiary structure. Exploiting this method requires a multi-solution approach, owing to the difficulty of recognizing correct solutions at early stages. Moreover, phasing a structure starting from the partial information provided by such a small percentage of the total model (around 10% of the main-chain atoms) is challenging and requires evaluation of alternative hypotheses under statistical constraints to avoid combinatorial explosion. ARCIMBOLDO methods have proven successful in many cases of previously unknown structures[3] and also on a pool of test structures[4]. The program can accept any Sohncke space group, and all the most frequent ones are represented in the pool of structures solved so far. In both studies data were collected in the most common protein space groups. Data quality is crucial for phasing methods, and ARCIMBOLDO is particularly sensitive to it: low resolution (worse than 2.1 Å) and lack of completeness (less than 98%) drastically decrease the chance of success.
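The figures quoted above (about 10% of the main chain, resolution no worse than 2.1 Å, completeness of at least 98%) can be turned into a quick back-of-the-envelope check. The following is a minimal sketch under stated assumptions: the helper names, the 14-residue helix length and the 400-residue example are hypothetical choices for illustration, and the thresholds are treated as rules of thumb taken from the text rather than as hard limits enforced by the program.

```python
def helices_needed(n_residues: int,
                   helix_length: int = 14,
                   target_percent: int = 10) -> int:
    """Smallest number of helices of helix_length residues needed to cover
    target_percent of a chain of n_residues (hypothetical helper).

    Every residue contributes the same number of main-chain atoms, so a
    fraction of residues equals the same fraction of main-chain atoms.
    Integer arithmetic avoids floating-point rounding in the ceiling.
    """
    denominator = 100 * helix_length
    return (n_residues * target_percent + denominator - 1) // denominator


def data_look_suitable(resolution_angstrom: float,
                       completeness_percent: float,
                       worst_resolution: float = 2.1,
                       min_completeness: float = 98.0) -> bool:
    """Rule-of-thumb check using the thresholds quoted in the text."""
    return (resolution_angstrom <= worst_resolution
            and completeness_percent >= min_completeness)


# Hypothetical 400-residue target: 40 residues make up 10% of the chain.
print(helices_needed(400))
print(data_look_suitable(1.9, 99.5))
```

Each additional fragment multiplies the number of placements to be evaluated, which is why the count returned by `helices_needed` matters more than the residue total itself.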
Location of secondary structure elements is not recommended as a phasing method for large structures or complexes (over 400 residues) unless very long helices are present and high-resolution data are available. Such cases would require the placement of many fragments in order to assemble 10% of the main chain, which can lead to an unmanageable number of solutions. To address this scenario properly, we have implemented dedicated methods in ARCIMBOLDO_BORGES[7] and ARCIMBOLDO_SHREDDER[8]. These programs exploit libraries of folds or large search models and are described later in the text.
The current implementation[4], coded in Python, is deployed as a standalone binary, freely available upon registration from http://chango.ibmb.csic.es/download. The binary is compatible with common Linux distributions and recent versions of the Mac OS X operating system. Users can find online manuals, tutorials and documentation on our website. As of 30th April 2015, it has been downloaded 664 times and distributed to 121 research groups; furthermore, it has been installed at many European synchrotron facilities such as the Alba Synchrotron in Spain, the Diamond Light Source in the United Kingdom and the SOLEIL Synchrotron in France. The software is also available through the SBGrid Consortium (https://sbgrid.org), a network of institutions across 19 countries, which provides a distributed grid network of computers to run structural biology software. We have recently started a collaboration with the San Diego Supercomputer Center (http://www.sdsc.edu) in California (USA) to develop optimized and dedicated versions of the programs for their platform, with the aim of addressing difficult phasing cases.
Following this recent spread in the crystallographic community, ARCIMBOLDO has been presented at many international conferences, such as the International Union of Crystallography Meeting in Madrid (ES) 2011 and in Montreal (CA) 2014 and the European Crystallographic Meeting in Bergen (NO) 2012 and Warwick (UK) 2013, and at many schools and workshops, such as the International School of Crystallography in Erice (IT) 2012 and the Macromolecular Crystallography School in Madrid (ES) 2014.
This thesis is organised in the standard scientific format comprising five main parts:
1. INTRODUCTION: introducing the theoretical topics directly or indirectly related to the contents of the thesis and discussing the state of the art of current scientific work related to the proposed objective.
2. OBJECTIVES: listing all general goals and particular aims of the doctoral project conducted.
3. MATERIALS AND METHODS: detailing the hardware and software environment, including third-party software and algorithms employed in the project.
4. RESULTS AND DISCUSSION: presenting all the algorithms, software, experiments and tests produced, corresponding to the stated objectives.
5. CONCLUSION: summarising the whole project and listing its achievements by the end of the doctoral studies.