thesis

Protein contour modelling and computation for complementarity detection and docking

Abstract

The aim of this thesis is the development and application of a model that effectively and efficiently integrates the evaluation of geometric and electrostatic complementarity for the protein-protein docking problem. Proteins perform their biological roles by interacting with other biomolecules and forming macromolecular complexes. The structural characterization of protein complexes is important to understand the underlying biological processes. Unfortunately, there are several limitations to the available experimental techniques, leaving the vast majority of these complexes to be determined by means of computational methods such as protein-protein docking. The ultimate goal of the protein-protein docking problem is the in silico prediction of the three-dimensional structure of complexes of two or more interacting proteins, as occurring in living organisms, which can later be verified in vitro or in vivo. These interactions are highly specific and take place due to the simultaneous formation of multiple weak bonds: the geometric complementarity of the contours of the interacting molecules is a fundamental requirement in order to enable and maintain these interactions. However, shape complementarity alone cannot guarantee highly accurate docking predictions, as there are several physicochemical factors, such as Coulomb potentials, van der Waals forces and hydrophobicity, affecting the formation of protein complexes. In order to set up correct and efficient methods for the protein-protein docking, it is necessary to provide a unique representation which integrates geometric and physicochemical criteria in the complementarity evaluation. To this end, a novel local surface descriptor, capable of capturing both the shape and electrostatic distribution properties of macromolecular surfaces, has been designed and implemented. The proposed methodology effectively integrates the evaluation of geometrical and electrostatic distribution complementarity of molecular surfaces, while maintaining efficiency in the descriptor comparison phase. The descriptor is based on the 3D Zernike invariants which possess several attractive features, such as a compact representation, rotational and translational invariance and have been shown to adequately capture global and local protein surface shape similarity and naturally represent physicochemical properties on the molecular surface. Locally, the geometric similarity between two portions of protein surface implies a certain degree of complementarity, but the same cannot be stated about electrostatic distributions. Complementarity in electrostatic distributions is more complex to handle, as charges must be matched with opposite ones even if they do not have the same magnitude. The proposed method overcomes this limitation as follows. From a unique electrostatic distribution function, two separate distribution functions are obtained, one for the positive and one for the negative charges, and both functions are normalised in [0, 1]. Descriptors are computed separately for the positive and negative charge distributions, and complementarity evaluation is then done by cross-comparing descriptors of distributions of charges of opposite signs. The proposed descriptor uses a discrete voxel-based representation of the Connolly surface on which the corresponding electrostatic potentials have been mapped. Voxelised surface representations have received a lot of interest in several bioinformatics and computational biology applications as a simple and effective way of jointly representing geometric and physicochemical properties of proteins and other biomolecules by mapping auxiliary information in each voxel. Moreover, the voxel grid can be defined at different resolutions, thus giving the means to effectively control the degree of detail in the discrete representation along with the possibility of producing multiple representations of the same molecule at different resolutions. A specific algorithm has been designed for the efficient computation of voxelised macromolecular surfaces at arbitrary resolutions, starting from experimentally-derived structural data (X-ray crystallography, NMR spectroscopy or cryo-electron microscopy). Fast surface generation is achieved by adapting an approximate Euclidean Distance Transform algorithm in the Connolly surface computation step and by exploiting the geometrical relationship between the latter and the Solvent Accessible surface. This algorithm is at the base of VoxSurf (Voxelised Surface calculation program), a tool which can produce discrete representations of macromolecules at very high resolutions starting from the three-dimensional information of their corresponding PDB files. By employing compact data structures and implementing a spatial slicing protocol, the proposed tool can calculate the three main molecular surfaces at high resolutions with limited memory demands. To reduce the surface computation time without affecting the accuracy of the representation, two parallel algorithms for the computation of voxelised macromolecular surfaces, based on a spatial slicing procedure, have been introduced. The molecule is sliced in a user-defined number of parts and the portions of the overall surface can be calculated for each slice in parallel. The molecule is sliced with planes perpendicular to the abscissa axis of the Cartesian coordinate system defined in the molecule's PDB entry. The first algorithms uses an overlapping margin of one probe-sphere radius length among slices in order to guarantee the correctness of the Euclidean Distance Transform. Because of this margin, the Connolly surface can be computed nearly independently for each slice. Communications among processes are necessary only during the pocket identification procedure which ensures that pockets spanning through more than one slice are correctly identified and discriminated from solvent-excluded cavities inside the molecule. In the second parallel algorithm the size of the overlapping margin between slices has been reduced to a one-voxel length by adapting a multi-step region-growing Euclidean Distance Transform algorithm. At each step, distance values are first calculated independently for every slice, then, a small portion of the borders' information is exchanged between adjacent slices. The proposed methodologies will serve as a basis for a full-fledged protein-protein docking protocol based on local feature matching. Rigorous benchmark tests have shown that the combined geometric and electrostatic descriptor can effectively identify shape and electrostatic distribution complementarity in the binding sites of protein-protein complexes, by efficiently comparing circular surface patches and significantly decreasing the number of false positives obtained when using a purely-geometric descriptor. In the validation experiments, the contours of the two interacting proteins are divided in circular patches: all possible patch pairs from the two proteins are then evaluated in terms of complementarity and a general ranking is produced. Results show that native patch pairs obtain higher ranks when using the newly proposed descriptor, with respect to the ranks obtained when using the purely-geometric one

    Similar works