
    Towards Automating Protein Structure Determination from NMR Data

    Nuclear magnetic resonance (NMR) spectroscopy is becoming increasingly important because it can be used to study protein structures in solution. However, NMR protein structure determination remains a laborious and costly process, even with the help of currently available computer programs. After the NMR spectra are collected, the main roadblocks to fully automated NMR protein structure determination are peak picking from noisy spectra, resonance assignment from imperfect peak lists, and structure calculation from incomplete assignments and ambiguous nuclear Overhauser enhancement (NOE) constraints. The goal of this dissertation is to propose error-tolerant and highly efficient methods that work well on real, noisy data sets for NMR protein structure determination and the closely related protein structure prediction problems. One major contribution of this dissertation is a fully automated NMR protein structure determination system, AMR, presented with emphasis on the parts that I contributed. AMR requires only an input set of six NMR spectra. We develop a novel peak picking method, PICKY, to solve the crucial but tricky peak picking problem. PICKY consists of a noise level estimation step, a component forming step, a singular value decomposition-based initial peak picking step, and a peak refinement step. The first systematic study of the peak picking problem is conducted to test the performance of PICKY. An integer linear programming (ILP)-based resonance assignment method, IPASS, is then developed to handle the imperfect peak lists generated by PICKY. IPASS contains an error-tolerant spin system forming method and an ILP-based assignment method. The assignment generated by IPASS is fed into the structure calculation step, FALCON-NMR. FALCON-NMR has a threading module, an ab initio module, an all-atom refinement module, and an NOE constraints-based decoy selection module.
The entire system, AMR, is successfully tested on four out of five real proteins with practical NMR spectra, and generates structures that are 1.25 Å, 1.49 Å, 0.67 Å, and 0.88 Å from the native reference structures, respectively. Another contribution of this dissertation is a set of novel ideas and methods for solving three protein structure prediction problems that are closely related to NMR protein structure determination. We develop a novel consensus contact prediction method, which is able to eliminate server correlations, to solve the protein inter-residue contact prediction problem. We also propose an ultra-fast side chain packing method, which uses only local backbone information, to solve the protein side chain packing problem. Finally, two complementary local quality assessment methods are proposed to solve the local quality prediction problem for comparative modeling-based protein structure prediction methods.
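
The abstract above names a noise level estimation step followed by an initial peak picking step. The sketch below is only a one-dimensional illustration of that general idea and is not PICKY's algorithm: it swaps in a median-absolute-deviation noise estimate and a simple local-maximum rule in place of the component forming and singular value decomposition steps. The function names, the threshold factor `k`, and the sample spectrum are all hypothetical.

```python
import statistics


def estimate_noise(signal):
    """Return (baseline, sigma) using the median and the scaled median absolute deviation.

    Assumption: the noise is roughly Gaussian, so 1.4826 * MAD approximates its
    standard deviation. This is a generic estimator, not the one used by PICKY.
    """
    baseline = statistics.median(signal)
    mad = statistics.median(abs(x - baseline) for x in signal)
    return baseline, 1.4826 * mad


def pick_peaks(signal, k=5.0):
    """Return indices of local maxima that rise more than k * sigma above the baseline."""
    baseline, sigma = estimate_noise(signal)
    threshold = baseline + k * sigma
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
        if is_local_max and signal[i] > threshold:
            peaks.append(i)
    return peaks


# Hypothetical 1-D intensity trace with two clear peaks on a noisy baseline.
spectrum = [0.1, -0.2, 0.0, 0.3, 5.0, 0.2, -0.1, 0.1, 0.0, 4.0, 0.1, -0.3]
print(pick_peaks(spectrum))
```

A robust (median-based) estimate is used here so that the peaks themselves do not inflate the noise level; real multidimensional spectra would need considerably more care.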

    Research on knowledge representation, machine learning, and knowledge acquisition

    Research in knowledge representation, machine learning, and knowledge acquisition performed at the Knowledge Systems Lab is summarized. The major goal of the research was to develop flexible, effective methods for representing the qualitative knowledge necessary for solving large problems that require symbolic reasoning as well as numerical computation. The research focused on integrating different representation methods to describe different kinds of knowledge more effectively than any one method can alone. In particular, emphasis was placed on representing and using spatial information about three-dimensional objects and constraints on the arrangement of these objects in space. Another major theme is the development of robust machine learning programs that can be integrated with a variety of intelligent systems. To achieve this goal, learning methods were designed, implemented, and experimented with in several different problem-solving environments.

    XML in Motion from Genome to Drug

    Information technology (IT) has emerged as central to the solution of contemporary genomics and drug discovery problems. Researchers involved in genomics, proteomics, transcriptional profiling, high-throughput structure determination, and other sub-disciplines of bioinformatics have a direct impact on this IT revolution. As the full genome sequences of many species and data from structural genomics, microarrays, and proteomics have become available, integrating these data into a common platform requires sophisticated bioinformatics tools. Organizing these data into knowledge-rich databases and developing appropriate software tools for analyzing them are going to be major challenges. XML (eXtensible Markup Language) forms the backbone of biological data representation and exchange over the internet, enabling researchers to aggregate data from various heterogeneous data resources. The present article gives a comprehensive overview of the integration of XML with particular types of biological databases, mainly those dealing with sequence-structure-function relationships, and its application to drug discovery. This e-medical science approach should be applied to other scientific domains. The latest trend in semantic web applications is also highlighted.

    Crystallisation and characterisation of muscle proteins: a mini-review

    The techniques of X-ray protein crystallography, NMR and high-resolution cryo-electron microscopy have all been used to determine the high-resolution structure of proteins. The most commonly used method, however, remains X-ray crystallography, but it relies heavily on the production of suitable crystals. Indeed, the production of diffraction-quality crystals remains the rate-limiting step for most protein systems. This mini-review highlights the crystallisation trials that used existing and newly developed crystallisation methods on two muscle protein targets: the actin binding domain (ABD) of α-actinin and the C0-C1 domain of human cardiac myosin binding protein C (cMyBP-C). Furthermore, using heterogeneous nucleating agents, the crystallisation of the C1 domain of cMyBP-C was successfully achieved in-house, along with preliminary actin binding studies using electron microscopy and co-sedimentation assays.

    Optimization of Protein-Protein Interaction Measurements for Drug Discovery Using AFM Force Spectroscopy

    Increasingly targeted in drug discovery, protein-protein interactions challenge current high-throughput screening technologies in the pharmaceutical industry. Developing an effective and efficient method for screening small molecules or compounds is critical to accelerating the discovery of ligands for enzymes, receptors and other pharmaceutical targets. Here, we report the development of methods to increase the signal-to-noise ratio (SNR) for screening protein-protein interactions using atomic force microscopy (AFM) force spectroscopy. We have demonstrated the effectiveness of these developments in detecting the binding of focal adhesion kinase (FAK) to protein kinase B (Akt1), which is a target for potential cancer drugs. These developments include optimized probe and substrate functionalization processes and redesigned probe-substrate contact regimes. Furthermore, a statistics-based data processing method was developed to enhance the contrast of the experimental data. Collectively, these results demonstrate the potential of AFM force spectroscopy for automating drug screening with high throughput.

    CS23D: a web server for rapid protein structure generation using NMR chemical shifts and sequence data

    CS23D (chemical shift to 3D structure) is a web server for rapidly generating accurate 3D protein structures using only assigned nuclear magnetic resonance (NMR) chemical shifts and sequence data as input. Unlike conventional NMR methods, CS23D requires no NOE and/or J-coupling data to perform its calculations. CS23D accepts chemical shift files in either SHIFTY or BMRB formats, and produces a set of PDB coordinates for the protein in about 10–15 min. CS23D uses a pipeline of several preexisting programs or servers to calculate the actual protein structure. Depending on the sequence similarity (or lack thereof), CS23D uses either (i) maximal subfragment assembly (a form of homology modeling), (ii) chemical shift threading or (iii) shift-aided de novo structure prediction (via Rosetta) followed by chemical shift refinement to generate and/or refine protein coordinates. Tests conducted on more than 100 proteins from the BioMagResBank indicate that CS23D converges (i.e. finds a solution) for >95% of protein queries. These chemical shift generated structures were found to be within 0.2–2.8 Å RMSD of the NMR structure generated using conventional NOE-based NMR methods or conventional X-ray methods. The performance of CS23D is dependent on the completeness of the chemical shift assignments and the similarity of the query protein to known 3D folds. CS23D is accessible at http://www.cs23d.ca
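
The accuracy figures above are reported as RMSD (root-mean-square deviation) in Å. As a minimal sketch of what that number measures, the function below computes the RMSD between two coordinate lists. It assumes the two structures are already superimposed and have atoms listed in matching order; optimal superposition, which structure comparison tools normally perform first, is not shown. The function name and the toy coordinates are hypothetical and unrelated to CS23D's own code.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples.

    Assumption: the structures are already superimposed and atoms are paired
    by position in the lists.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: every atom in the model is displaced by 1.0 Å along z.
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
print(rmsd(model, reference))  # 1.0
```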

    The FEATURE framework for protein function annotation: modeling new functions, improving performance, and extending to novel applications

    Structural genomics efforts contribute new protein structures that often lack significant sequence and fold similarity to known proteins. Traditional sequence and structure-based methods may not be sufficient to annotate the molecular functions of these structures. Techniques that combine structural and functional modeling can be valuable for functional annotation. FEATURE is a flexible framework for modeling and recognition of functional sites in macromolecular structures. Here, we present an overview of the main components of the FEATURE framework and describe recent developments in its use. These include automating training set selection to increase functional coverage, coupling FEATURE to methods that generate structural diversity, such as molecular dynamics simulations and loop modeling, to improve performance, and using FEATURE in large-scale modeling and structure determination efforts.