    Skewed Factor Models Using Selection Mechanisms

    Traditional factor models explicitly or implicitly assume that the factors follow a multivariate normal distribution; that is, only moments up to order two are involved. However, in real data problems the first two moments may not be enough to explain the factors. Motivated by this, we devise three new skewed factor models, the skew-normal, the skew-t, and the generalized skew-normal factor models, each based on a selection mechanism on the factors. ECME algorithms are adopted to estimate the related parameters for statistical inference. Monte Carlo simulations validate the new models, and we demonstrate the need for skewed factor models using the classic open/closed book exam scores dataset.

    Collective estimation of multiple bivariate density functions with application to angular-sampling-based protein loop modeling

    This article develops a method for simultaneous estimation of density functions for a collection of populations of protein backbone angle pairs using a data-driven, shared basis that is constructed by bivariate spline functions defined on a triangulation of the bivariate domain. The circular nature of angular data is taken into account by imposing appropriate smoothness constraints across boundaries of the triangles. Maximum penalized likelihood is used to fit the model and an alternating blockwise Newton-type algorithm is developed for computation. A simulation study shows that the collective estimation approach is statistically more efficient than estimating the densities individually. The proposed method was used to estimate neighbor-dependent distributions of protein backbone dihedral angles (i.e., Ramachandran distributions). The estimated distributions were applied to protein loop modeling, one of the most challenging open problems in protein structure prediction, by feeding them into an angular-sampling-based loop structure prediction framework. Our estimated distributions compared favorably to the Ramachandran distributions estimated by fitting a hierarchical Dirichlet process model; and in particular, our distributions showed significant improvements on the hard cases where existing methods do not work well
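
    The point about the circular nature of angular data can be illustrated with a minimal sketch. It is not the spline method of the article; it only shows why backbone angles near +180° and -180° must be treated as neighbors. The function names `wrap_angle`, `circular_diff`, and `torus_distance` are hypothetical and chosen for illustration.

```python
import math


def wrap_angle(deg):
    """Map an angle in degrees to the interval [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0


def circular_diff(a, b):
    """Signed shortest difference a - b between two angles, in degrees."""
    return wrap_angle(a - b)


def torus_distance(phi1, psi1, phi2, psi2):
    """Euclidean distance between two (phi, psi) pairs on the periodic domain."""
    return math.hypot(circular_diff(phi1, phi2), circular_diff(psi1, psi2))


# A naive subtraction treats 179 and -179 degrees as 358 degrees apart;
# on the circle they are only 2 degrees apart.
naive = abs(179.0 - (-179.0))
circular = abs(circular_diff(179.0, -179.0))
```

    Any density estimate on the (phi, psi) domain that ignores this wrap-around will show artificial discontinuities at the domain edges, which is the problem the boundary smoothness constraints in the abstract address.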

    CONSTRUCTION OF COMPUTATIONAL 3D STRUCTURES OF PROTEIN DRUG TARGETS OF MYCOBACTERIUM TUBERCULOSIS

    Objective: This study aims to construct three-dimensional model structures of potential drug-target proteins in Mycobacterium tuberculosis. Methods: The protein models were constructed using the SWISS-MODEL online tool. The constructed models were submitted to an online database, the Protein Model Database (PMDB), for public access to the structures. Results: A total of 100 protein sequences of M. tuberculosis were retrieved from the UniProt database and subjected to sequence similarity search and homology model construction. The constructed models were subjected to Ramachandran plot analysis to validate the quality of the structures. A total of 69 structures were considered to be of significant quality and were submitted to PMDB. Conclusion: These predicted structures would help greatly in identification and drug design for various strains of M. tuberculosis that are sensitive or resistant to different antibiotics, and in drug development and personalized drug treatment against different strains of the pathogen. The database would significantly support structure-based computational drug design applications toward personalized medicine with regard to differences among the various strains of the pathogen.

    Automatic rebuilding and optimization of crystallographic structures in the Protein Data Bank

    Motivation: Macromolecular crystal structures in the Protein Data Bank (PDB) are a key source of structural insight into biological processes. These structures, some >30 years old, were constructed with methods of their era. With PDB_REDO, we aim to automatically optimize these structures to better fit their corresponding experimental data, passing the benefits of new methods in crystallography on to a wide base of non-crystallographer structure users.

    Traditional Biomolecular Structure Determination by NMR Spectroscopy Allows for Major Errors

    One of the major goals of structural genomics projects is to determine the three-dimensional structure of representative members of as many different fold families as possible. Comparative modeling is expected to fill the remaining gaps by providing structural models of homologs of the experimentally determined proteins. However, for such an approach to be successful it is essential that the quality of the experimentally determined structures is adequate. In an attempt to build a homology model for the protein dynein light chain 2A (DLC2A) we found two potential templates, both experimentally determined nuclear magnetic resonance (NMR) structures originating from structural genomics efforts. Despite their high sequence identity (96%), the folds of the two structures are markedly different. This urged us to perform in-depth analyses of both structure ensembles and the deposited experimental data, the results of which clearly identify one of the two models as largely incorrect. Next, we analyzed the quality of a large set of recent NMR-derived structure ensembles originating from both structural genomics projects and individual structure determination groups. Unfortunately, a visual inspection of structures exhibiting lower quality scores than DLC2A reveals that the seriously flawed DLC2A structure is not an isolated incident. Overall, our results illustrate that the quality of NMR structures cannot be reliably evaluated using only traditional experimental input data and overall quality indicators as a reference and clearly demonstrate the urgent need for a tight integration of more sophisticated structure validation tools in NMR structure determination projects. In contrast to common methodologies where structures are typically evaluated as a whole, such tools should preferentially operate on a per-residue basis

    A series of PDB related databases for everyday needs

    The Protein Data Bank (PDB) is the world-wide repository of macromolecular structure information. We present a series of databases that run parallel to the PDB. Each database holds one entry, if possible, for each PDB entry. DSSP holds the secondary structure of the proteins. PDBREPORT holds reports on the structure quality and lists errors. HSSP holds a multiple sequence alignment for all proteins. The PDBFINDER holds easy to parse summaries of the PDB file content, augmented with essentials from the other systems. PDB_REDO holds re-refined, and often improved, copies of all structures solved by X-ray. WHY_NOT summarizes why certain files could not be produced. All these systems are updated weekly. The data sets can be used for the analysis of properties of protein structures in areas ranging from structural genomics, to cancer biology and protein design

    Neighbor-Dependent Ramachandran Probability Distributions of Amino Acids Developed from a Hierarchical Dirichlet Process Model

    Distributions of the backbone dihedral angles of proteins have been studied for over 40 years. While many statistical analyses have been presented, only a handful of probability densities are publicly available for use in structure validation and structure prediction methods. The available distributions differ in a number of important ways, which determine their usefulness for various purposes. These include: 1) input data size and criteria for structure inclusion (resolution, R-factor, etc.); 2) filtering of suspect conformations and outliers using B-factors or other features; 3) secondary structure of input data (e.g., whether helix and sheet are included; whether beta turns are included); 4) the method used for determining probability densities ranging from simple histograms to modern nonparametric density estimation; and 5) whether they include nearest neighbor effects on the distribution of conformations in different regions of the Ramachandran map. In this work, Ramachandran probability distributions are presented for residues in protein loops from a high-resolution data set with filtering based on calculated electron densities. Distributions for all 20 amino acids (with cis and trans proline treated separately) have been determined, as well as 420 left-neighbor and 420 right-neighbor dependent distributions. The neighbor-independent and neighbor-dependent probability densities have been accurately estimated using Bayesian nonparametric statistical analysis based on the Dirichlet process. In particular, we used hierarchical Dirichlet process priors, which allow sharing of information between densities for a particular residue type and different neighbor residue types. The resulting distributions are tested in a loop modeling benchmark with the program Rosetta, and are shown to improve protein loop conformation prediction significantly. The distributions are available at http://dunbrack.fccc.edu/hdp
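
    The count of 420 neighbor-dependent distributions can be checked with a short sketch. It assumes one reading that is consistent with the abstract but not spelled out there: 21 central residue types (the 20 amino acids with proline split into cis and trans) paired with 20 neighbor types. The labels `P_cis` and `P_trans` are hypothetical.

```python
# Standard 20 amino acids, one-letter codes.
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")

# Assumption: the central residue has 21 types, because cis and trans
# proline are treated separately (19 non-proline types + 2 proline types).
central_types = [aa for aa in AMINO_ACIDS if aa != "P"] + ["P_cis", "P_trans"]

# Assumption: the neighbor residue is one of the 20 plain amino acid types.
neighbor_types = AMINO_ACIDS

# One distribution per (left neighbor, central) and per (central, right neighbor).
left_pairs = [(n, c) for n in neighbor_types for c in central_types]
right_pairs = [(c, n) for c in central_types for n in neighbor_types]

print(len(central_types), len(left_pairs), len(right_pairs))
```

    Under these assumptions, 21 × 20 = 420 distributions on each side, matching the figures quoted in the abstract.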

    Feature-Based and String-Based Models for Predicting RNA-Protein Interaction

    In this work, we study two approaches for the problem of RNA-Protein Interaction (RPI). In the first approach, we use a feature-based technique by combining extracted features from both sequences and secondary structures. The feature-based approach enhanced the prediction accuracy as it included much more available information about the RNA-protein pairs. In the second approach, we apply search algorithms and data structures to extract effective string patterns for prediction of RPI, using both sequence information (protein and RNA sequences), and structure information (protein and RNA secondary structures). This led to different string-based models for predicting interacting RNA-protein pairs. We show results that demonstrate the effectiveness of the proposed approaches, including comparative results against leading state-of-the-art methods

    Characterization of Novel StAR (Steroidogenic Acute Regulatory Protein) Mutations Causing Non-Classic Lipoid Adrenal Hyperplasia

    Context: Steroidogenic acute regulatory protein (StAR) is crucial for transport of cholesterol to mitochondria, where biosynthesis of steroids is initiated. Loss of StAR function causes lipoid congenital adrenal hyperplasia (LCAH). Objective: StAR gene mutations causing partial loss of function manifest atypically and may be mistaken for familial glucocorticoid deficiency. Only a few mutations have been reported. Design: To report clinical, biochemical, genetic, protein structure and functional data on two novel StAR mutations, and to compare them with published literature. Setting: Collaboration between the University Children's Hospital Bern, Switzerland, and the CIBERER, Hospital Vall d'Hebron, Autonomous University, Barcelona, Spain. Patients: Two subjects of a non-consanguineous Caucasian family were studied. The 46,XX phenotypically normal female was diagnosed with adrenal insufficiency at the age of 10 months, had normal pubertal development and still has no signs of hypergonadotropic hypogonadism at 32 years of age. Her 46,XY brother was born with normal male external genitalia and was diagnosed with adrenal insufficiency at 14 months. Puberty was normal and no signs of hypergonadotropic hypogonadism are present at 29 years of age. Results: StAR gene analysis revealed two novel compound heterozygous mutations, T44HfsX3 and G221S. T44HfsX3 is a loss-of-function StAR mutation. G221S retains partial activity (~30%) and is therefore responsible for a milder, non-classic phenotype. G221S is located in the cholesterol binding pocket and seems to alter binding/release of cholesterol. Conclusions: StAR mutations located in the cholesterol binding pocket (V187M, R188C, R192C, G221D/S) seem to cause non-classic lipoid CAH. Accuracy of genotype-phenotype prediction by in vitro testing may vary with the assays employed.