CRANKITE: a fast polypeptide backbone conformation sampler
Background: CRANKITE is a suite of programs for simulating backbone conformations of polypeptides and proteins. The core of the suite is an efficient Metropolis Monte Carlo sampler of backbone conformations in continuous three-dimensional space in atomic detail.
Methods: In contrast to other programs relying on local Metropolis moves in the space of dihedral angles, our sampler utilizes local crankshaft rotations of rigid peptide bonds in Cartesian space.
Results: The sampler allows fast simulation and analysis of secondary structure formation and conformational changes for proteins of average length
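The crankshaft move described in the Methods can be illustrated with a minimal sketch. It is not CRANKITE's implementation: the function names, the plain coordinate-list representation, and the energy-free acceptance helper are assumptions made for illustration. The atoms strictly between two pivot atoms are rotated rigidly about the axis joining the pivots, so all distances along the chain are preserved, and a standard Metropolis test decides whether the move is kept.

```python
import math
import random

def rotate_about_axis(p, a, b, angle):
    """Rotate point p by `angle` radians about the axis through a and b
    using Rodrigues' rotation formula."""
    axis = [b[i] - a[i] for i in range(3)]
    norm = math.sqrt(sum(c * c for c in axis))
    k = [c / norm for c in axis]                  # unit axis vector
    v = [p[i] - a[i] for i in range(3)]           # p relative to the axis origin
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    k_dot_v = sum(k[i] * v[i] for i in range(3))
    return [a[i] + v[i] * cos_t + k_cross_v[i] * sin_t
            + k[i] * k_dot_v * (1.0 - cos_t) for i in range(3)]

def crankshaft_move(coords, i, j, angle):
    """Return a copy of coords in which the atoms strictly between the
    pivots i and j are rotated about the i-j axis (hypothetical helper)."""
    new = [list(p) for p in coords]
    for m in range(i + 1, j):
        new[m] = rotate_about_axis(coords[m], coords[i], coords[j], angle)
    return new

def metropolis_accept(delta_e, beta, rng=random):
    """Standard Metropolis criterion for an energy change delta_e."""
    return delta_e <= 0.0 or rng.random() < math.exp(-beta * delta_e)
```

Because the move only touches atoms between the two pivots, the rest of the chain is left in place, which is what makes such moves local.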
Path planning on manifolds using randomized higher-dimensional continuation
Despite the significant advances in path planning methods, problems involving highly constrained spaces are still challenging. In particular, in many situations the configuration space is a non-parametrizable variety implicitly defined by constraints, which complicates the successful generalization of sampling-based path planners. In this paper, we present a new path planning algorithm specially tailored for highly constrained systems. It builds on recently developed tools for Higher-dimensional Continuation, which provide numerical procedures to describe an implicitly defined variety using a set of local charts. We propose to extend these
methods to obtain an efficient path planner on varieties, handling highly constrained
problems. The advantage of this planner is that it operates directly in
the configuration space rather than in the higher-dimensional ambient space, as most
existing methods do.
Path planning with loop closure constraints using an atlas-based RRT
In many relevant path planning problems, loop closure constraints reduce the configuration space to a manifold embedded in the higher-dimensional joint ambient space. Whereas much progress has been made in solving path planning problems in the presence of obstacles, only a few works consider loop closure constraints. In this paper we present the AtlasRRT algorithm, a planner specially tailored for such constrained systems that builds on recently developed tools for higher-dimensional continuation. These tools provide procedures to define charts that locally parametrize manifolds and to coordinate them to form an atlas. AtlasRRT simultaneously builds an atlas and a Rapidly-Exploring Random Tree (RRT), using the atlas to sample relevant configurations for the RRT, and the RRT to devise directions of expansion for the atlas. The new planner is advantageous because samples obtained from the atlas allow a more efficient extension of the RRT than state-of-the-art approaches, where samples are generated in the joint ambient space.
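The idea of sampling through a chart can be sketched on a toy manifold. The example below is an assumption-laden illustration, not the AtlasRRT code: the manifold is the unit sphere in three dimensions, the function name and arguments are hypothetical, and the projection is a one-dimensional Newton iteration along the chart normal. A parameter vector u is mapped to the tangent plane at the chart centre x0 and then pulled back onto the manifold; parameters outside the region the chart can cover yield no point.

```python
import math

def project_from_chart(x0, basis, u, tol=1e-12, max_iter=50):
    """Map chart parameters u to a point on the unit sphere |x| = 1.

    x0    : chart centre, assumed to lie on the sphere
    basis : two orthonormal tangent vectors at x0 (assumed supplied)
    u     : parameters in the tangent plane
    Returns the projected point, or None if the projection fails.
    """
    # Point in the tangent plane of the chart.
    y = [x0[i] + sum(u[k] * basis[k][i] for k in range(len(u)))
         for i in range(3)]
    n = x0          # for the unit sphere, the normal at x0 is x0 itself
    t = 0.0
    for _ in range(max_iter):
        x = [y[i] + t * n[i] for i in range(3)]
        f = sum(c * c for c in x) - 1.0          # constraint residual
        if abs(f) < tol:
            return x
        df = 2.0 * sum(x[i] * n[i] for i in range(3))
        if abs(df) < 1e-14:
            return None                          # degenerate Newton step
        t -= f / df
    return None                                  # outside the chart's reach
```

Sampling u in a low-dimensional ball and projecting keeps every sample on the manifold, whereas sampling in the ambient space would almost never satisfy the constraint.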
Protein Loop Prediction by Fragment Assembly
If the primary sequence of a protein is known, what is its three-dimensional structure?
This is one of the most challenging problems in molecular biology and has many applications
in proteomics. During the last three decades, this issue has been extensively researched.
Techniques such as the protein folding approach have been demonstrated to be promising in predicting the core areas of proteins - α-helices and β-strands. However, loops that contain no regular units of secondary structure elements remain the most difficult regions for prediction.
The protein loop prediction problem is to predict the spatial structure of a loop given the primary sequence of a protein and the spatial structures of all the other regions. There are two major approaches used to conduct loop prediction – the ab initio folding and database searching methods. The loop prediction accuracy is unsatisfactory because of the hypervariable property of the loops.
The key contribution proposed by this thesis is a novel fragment assembly algorithm using
branch-and-cut to tackle the loop prediction problem. We present various pruning rules to
reduce the search space and to speed up the finding of good loop candidates. The algorithm
has the advantages of the database-search approach and ensures that the predicted loops are physically reasonable. The algorithm also benefits from ab initio folding since it enumerates all the possible loops in the discrete approximation of the conformation space.
We implemented the proposed algorithm as a protein loop prediction tool named LoopLocker.
A test set from CASP6, the worldwide protein structure prediction competition, was used to
evaluate the performance of LoopLocker. Experimental results showed that LoopLocker is
capable of predicting loops of 4, 8, 11-12, and 13-15 residues with average RMSD errors of
0.452, 1.410, 1.741, and 1.895 Å, respectively. In the PDB, more than 90% of loops are shorter
than 15 residues. We conclude that our fragment assembly algorithm is successful in
tackling the loop prediction problem.
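The role of pruning in an exhaustive search over a discretized conformation space can be shown with a small sketch. This is not LoopLocker and not a branch-and-cut solver: it is a plain depth-first enumeration with one distance-based pruning rule, and the step set, function name, and self-avoidance check are assumptions chosen for illustration. A partial chain is abandoned as soon as the end anchor can no longer be reached with the steps that remain.

```python
import math

def enumerate_loops(start, end, steps, n, tol=1e-9):
    """Enumerate all self-avoiding chains of exactly n steps that lead
    from `start` to `end`, pruning branches that cannot close the loop."""
    max_step = max(math.sqrt(sum(c * c for c in s)) for s in steps)
    end = tuple(end)
    results = []

    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    def extend(path):
        remaining = n - (len(path) - 1)
        # Pruning rule: the anchor must be reachable with the steps left.
        if dist(path[-1], end) > remaining * max_step + tol:
            return
        if remaining == 0:
            results.append(list(path))   # the rule above guarantees closure
            return
        for s in steps:
            nxt = tuple(a + b for a, b in zip(path[-1], s))
            if nxt in path:              # reject self-intersecting chains
                continue
            path.append(nxt)
            extend(path)
            path.pop()

    extend([tuple(start)])
    return results
```

With unit lattice steps in the plane, `enumerate_loops((0, 0), (2, 0), [(1, 0), (-1, 0), (0, 1), (0, -1)], 4)` visits only a fraction of the 4^4 possible step sequences before returning the closed chains.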
Distance-based formulations for the position analysis of kinematic chains
This thesis addresses the kinematic analysis of mechanisms, in particular, the position
analysis of kinematic chains, or linkages, that is, mechanisms with rigid bodies (links)
interconnected by kinematic pairs (joints). This problem, of completely geometrical
nature, consists in finding the feasible assembly modes that a kinematic chain can adopt.
An assembly mode is a possible relative transformation between the links of a kinematic
chain. When an assignment of positions and orientations is made for all links with
respect to a given reference frame, an assembly mode is called a configuration. The
methods reported in the literature for solving the position analysis of kinematic chains
can be classified as graphical, analytical, or numerical.
The graphical approaches are mostly geometrical and designed to solve particular
problems. The analytical and numerical methods deal, in general, with kinematic chains
of any topology and translate the original geometric problem into a system of kinematic
equations that defines the location of each link based, mainly, on independent loop
equations. In the analytical approaches, the system of kinematic equations is reduced
to a polynomial, known as the characteristic polynomial of the linkage, using different
elimination methods (e.g., Gröbner bases or resultant techniques). In the numerical
approaches, the system of kinematic equations is solved using, for instance, polynomial
continuation or interval-based procedures.
In any case, the use of independent loop equations to solve the position analysis
of kinematic chains, almost a standard in kinematics of mechanisms, has seldom been
questioned even though the resulting system of kinematic equations becomes quite involved
even for simple linkages. Moreover, stating the position analysis of kinematic chains
directly in terms of poses, with or without using independent loop equations, introduces
two major disadvantages: arbitrary reference frames have to be introduced, and all formulas
involve translations and rotations simultaneously. This thesis departs from this standard
approach by, instead of directly computing Cartesian locations, expressing the original
position problem as a system of distance-based constraints that are then solved using
analytical and numerical procedures adapted to their particularities.
To develop the basics and theory of the proposed approach, this thesis
focuses on the study of the most fundamental planar kinematic chains, namely, Baranov
trusses, Assur kinematic chains, and pin-jointed Grübler kinematic chains. The results
obtained have shown that the newly developed techniques are promising tools for the
position analysis of kinematic chains and related problems. For example, using these
techniques, the characteristic polynomials of most of the cataloged Baranov trusses can
be obtained without relying on variable eliminations or trigonometric substitutions and
using no other tools than elementary algebra. This outcome is in clear contrast with the
complex variable eliminations required when independent loop equations are used to tackle
the problem.
The impact of the above result is actually greater because it is shown that the
characteristic polynomial of a Baranov truss, derived using the proposed distance-based
techniques, contains all the necessary and sufficient information for solving the position
analysis of all the Assur kinematic chains resulting from replacing some of its revolute
joints by slider joints. Thus, it is concluded that the polynomials of all fully-parallel
planar robots can be derived directly from that of the widely known 3-RPR robot. In
addition to these results, this thesis also presents an efficient procedure, based on distance
and oriented area constraints, and geometrical arguments, to trace coupler curves
of pin-jointed Grübler kinematic chains. All these techniques and results together are
contributions to theoretical kinematics of mechanisms, robot kinematics, and distance
plane geometry.
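A small example may help show what a distance-based formulation looks like in the plane. The sketch below is an illustrative assumption, not the procedure developed in the thesis: the function name is hypothetical, and it solves only the simplest case, locating a joint from its distances to two joints whose positions are already known. The two returned points are mirror images about the line through the known joints, which is the elementary source of multiple assembly modes.

```python
import math

def bilaterate(p1, p2, d1, d2):
    """Return the planar points lying at distance d1 from p1 and d2 from p2.

    The result holds two mirror-image points (coincident when the three
    points are collinear), or is empty when the distances are incompatible.
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    s = dx * dx + dy * dy                 # squared distance between p1 and p2
    if s == 0.0:
        return []
    # Component along p1->p2, as a fraction of that vector.
    a = (s + d1 * d1 - d2 * d2) / (2.0 * s)
    h2 = d1 * d1 / s - a * a              # squared perpendicular fraction
    if h2 < 0.0:
        return []                         # triangle inequality violated
    h = math.sqrt(h2)
    base = (p1[0] + a * dx, p1[1] + a * dy)
    return [(base[0] - h * dy, base[1] + h * dx),
            (base[0] + h * dy, base[1] - h * dx)]
```

Only squared distances and one square root appear, and no reference frame orientation or angle has to be introduced, which is the kind of simplification the abstract attributes to distance-based constraints.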
De Novo Protein Structure Modeling from CryoEM Data Through a Dynamic Programming Algorithm in the Secondary Structure Topology Graph
Proteins are the molecules that carry out the vital functions and make up more than half of the dry weight of every cell. A protein in nature folds into a unique and energetically favorable 3-dimensional (3-D) structure that is critical to its biological function. In contrast to other methods for protein structure determination, Electron Cryomicroscopy (CryoEM) is able to produce volumetric maps of proteins that are poorly soluble, large, and hard to crystallize. Furthermore, it studies proteins in their native environment. Unfortunately, current CryoEM techniques typically produce volumetric maps at medium resolution (~5 to 10 Å), at which it is hard to determine the atomic structure of the protein. However, the resolution of the volumetric maps is improving steadily, and recent works have obtained atomic models at higher resolutions (~3 Å).
De novo protein modeling is the process of building the structure of a protein from its CryoEM volumetric map. The medium-resolution volumetric maps generated by CryoEM therefore pose a new challenge. At medium resolution, the location and orientation of secondary structure elements (SSEs) can be visually and computationally identified. However, the order and direction (called the protein topology) of the SSEs detected from the CryoEM volumetric map are not visible. In order to determine the protein structure, the topology of the SSEs has to be determined before the backbone can be built. Consequently, the topology problem has become a bottleneck for protein modeling using CryoEM.
In this dissertation, we focus on establishing an effective computational framework to derive the atomic structure of a protein from medium-resolution CryoEM volumetric maps. This framework includes a topology graph component, which effectively ranks the topologies of the SSEs, and a model building component. In order to generate a small subset of candidate topologies, the problem is translated into a layered graph representation. We developed a dynamic programming algorithm (TopoDP) for the new representation to overcome the problem of the large search space. Our approach shows improved accuracy, speed, and memory use when compared with existing methods. However, generating such a set is infeasible using a brute-force method. Therefore, the topology graph component effectively reduces the topological space using the geometrical features of the secondary structures through a constrained K-shortest paths method in our layered graph. The model building component involves bending the helices and constructing the loops using the skeleton of the volumetric map. Forward-backward CCD is applied to bend the helices and model the loops.
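To make the search-space argument concrete, the following sketch shows a dynamic program over assignments of sequence segments to detected SSE sticks. It is not TopoDP: it is a generic subset dynamic program (in the style of Held-Karp), directions of the sticks are ignored, and the cost tables `node_cost` and `edge_cost` are hypothetical inputs. Layer i of the implied layered graph corresponds to the i-th sequence segment, and each stick may be used only once.

```python
def best_topology(node_cost, edge_cost):
    """Find the cheapest one-to-one assignment of n sequence segments
    (in order) to n sticks.

    node_cost[i][s] : cost of placing sequence segment i on stick s
    edge_cost[s][t] : cost of connecting stick s to stick t consecutively
    Returns (total_cost, order), where order[i] is the stick of segment i.
    """
    n = len(node_cost)
    # best[(mask, last)] = (cost, previous stick); mask marks used sticks.
    best = {(1 << s, s): (node_cost[0][s], None) for s in range(n)}
    for mask in range(1, 1 << n):
        i = bin(mask).count("1")          # index of the next segment to place
        if i == n:
            continue
        for last in range(n):
            if (mask, last) not in best:
                continue
            cost = best[(mask, last)][0]
            for s in range(n):
                if mask & (1 << s):
                    continue              # stick already used
                key = (mask | (1 << s), s)
                c = cost + edge_cost[last][s] + node_cost[i][s]
                if key not in best or c < best[key][0]:
                    best[key] = (c, last)
    full = (1 << n) - 1
    end = min(range(n), key=lambda s: best[(full, s)][0])
    total = best[(full, end)][0]
    # Backtrack from the final state to recover the assignment.
    order, mask, last = [], full, end
    while last is not None:
        order.append(last)
        prev = best[(mask, last)][1]
        mask ^= 1 << last
        last = prev
    order.reverse()
    return total, order
```

The table has on the order of n * 2^n states instead of the n! orderings a brute-force enumeration would visit, which is the kind of reduction that makes ranking candidate topologies tractable.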
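Estudio de la evolución estructural en familias de proteínas y su aplicación al refinado de modelos obtenidos por homología (Study of structural evolution in protein families and its application to the refinement of homology models)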
Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología Molecular. Date of defense: 20 October 2006.
Structural refinement of protein models remains a particularly challenging problem in
protein structure prediction. Most attempts to refine comparative models lead to degradation
rather than improvement in model quality, so most current comparative modelling procedures
omit the refinement step. However, it has been shown that even in the absence of alignment errors
and using optimal templates, template-only methods have intrinsic limitations, suggesting that
other methodologies must be developed if accuracy is ultimately to be improved. It is thought
that these difficulties originate from the delicate balance of forces in the native state and the
requirement to sample a large number of alternative tightly packed conformations in the search
for the global minimum. Here we address this second issue. We present a new algorithm,
MAMMOTH-mult, for multiple structural alignment, which detects structurally conserved
regions in protein families. Applying principal components and normal mode analysis to these
regions allows the characterization of the most important deformations that structures undergo
in the course of evolution and those that are due to their own topologies. We find that each family
of homologous proteins has a characteristic pattern of structural evolution related to its own
structure topology rather than to sequence details. We use this information to help solve
the sampling problem. We show this problem can be essentially solved at the backbone level by
defining a small sampling subspace, of 50 dimensions at most, consisting of a combination of
evolutionarily favoured directions defined by the principal components of structural variation
within a family of homologous proteins and their topological vibrational directions derived from
normal mode analyses. Most protein cores in this combined space can be represented within 1 Å
accuracy. We also show that Replica Exchange Monte Carlo optimizations in this subspace are
very efficient at finding the global minimum neighbourhood in realistic conditions of roughness
of the energy landscape. Finally, applications of this methodology are discussed.
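The replica exchange step mentioned above can be sketched as follows. The code is a generic illustration under stated assumptions, not the thesis's optimizer: the search variable is a single number rather than coordinates in a 50-dimensional subspace, and the function names, step size, and temperature ladder are hypothetical. Each replica performs Metropolis moves at its own inverse temperature, and neighbouring replicas exchange configurations with the standard swap probability min(1, exp((β_i − β_j)(E_i − E_j))).

```python
import math
import random

def swap_probability(beta_i, beta_j, e_i, e_j):
    """Acceptance probability for exchanging two replicas."""
    exponent = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if exponent >= 0.0 else math.exp(exponent)

def replica_exchange(energy, x0, betas, sweeps, step=0.5, seed=0):
    """Minimise `energy` over one variable with replica exchange Monte Carlo.

    betas : inverse temperatures, ordered from cold to hot or hot to cold
    Returns the lowest-energy (x, energy) pair seen by any replica.
    """
    rng = random.Random(seed)
    xs = [x0] * len(betas)
    es = [energy(x0)] * len(betas)
    best_x, best_e = x0, es[0]
    for _ in range(sweeps):
        # One Metropolis move per replica at its own temperature.
        for r, beta in enumerate(betas):
            cand = xs[r] + rng.uniform(-step, step)
            e_cand = energy(cand)
            if e_cand <= es[r] or rng.random() < math.exp(-beta * (e_cand - es[r])):
                xs[r], es[r] = cand, e_cand
                if e_cand < best_e:
                    best_x, best_e = cand, e_cand
        # Attempt one exchange between a random pair of neighbours.
        if len(betas) > 1:
            r = rng.randrange(len(betas) - 1)
            if rng.random() < swap_probability(betas[r], betas[r + 1], es[r], es[r + 1]):
                xs[r], xs[r + 1] = xs[r + 1], xs[r]
                es[r], es[r + 1] = es[r + 1], es[r]
    return best_x, best_e
```

Hot replicas cross barriers on a rough landscape, and the exchanges pass promising configurations down to the cold replicas, which refine them.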
Protein structure prediction: improving and automating knowledge-based approaches
This work presents a computational approach to improve the automatic prediction of protein structures from sequence. Its main focus was twofold. An automated method for guiding the modeling process was first developed. This was tested and found to be state of the art in the CASP4 structure prediction contest in 2000. The second focus was the development of a novel divide-and-conquer algorithm for modeling flexible loops in proteins. Implementation of the search procedure and subsequent ranking is presented. The results are again compared with state-of-the-art methods.