Recommended from our members
Towards the integration of structural and systems biology: structure-based studies of protein-protein interactions on a genome-wide scale
Knowledge of protein-protein interactions (PPIs) is essential to understanding regulatory processes in a cell. High-throughput experimental methods have made significant contributions to PPI determination, but they are known to produce many false positives and to miss a significant portion of bona fide interactions. The same is true for the many computational tools that have been developed. Significantly, although protein structures provide atomic details of PPIs, they have had relatively little impact on large-scale PPI prediction, and there has been only limited overlap between structural and systems biology. In this thesis, I present our progress in combining structural biology and systems biology through the analysis, coarse-grained modeling, and prediction of protein-protein interactions.
I first report a comprehensive analysis of the degree to which the location of a protein interface is conserved in sets of proteins that share different levels of similarity. Our results show that while, in general, interface conservation is most significant among close neighbors, it is still significant even for remote structural neighbors. Based on this finding, we designed PredUs, a method that predicts a protein's interface simply by "mapping" the interface information from its structural neighbors (i.e., "templates") onto the target structure. We developed the PredUs web server to predict protein interfaces using this "template-based" method, with a support vector machine (SVM) to further improve predictions. The PredUs web server outperforms other state-of-the-art methods, which are typically based on amino acid properties, in terms of both prediction precision and recall. PredUs also runs very fast and can be used to study protein interfaces in a high-throughput fashion. Perhaps more importantly, it is not sensitive to local conformational changes or small errors in structures, and thus can be applied to predict the interfaces of protein homology models when experimental structures are not available.
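The "mapping" idea described above can be illustrated with a minimal sketch. It is not the PredUs implementation: the function name `map_interface`, the input layout, and the `min_fraction` voting threshold are all hypothetical, and the SVM refinement step mentioned in the abstract is omitted. Each template is assumed to come with a precomputed structural alignment (target residue index to template residue index) and a set of template residues known to be in an interface.

```python
def map_interface(target_length, templates, min_fraction=0.5):
    """Predict interface residues of a target by voting over templates.

    templates: list of (alignment, interface) pairs, where
      alignment is a dict {target_index: template_index} and
      interface is a set of template residue indices in an interface.
    min_fraction: hypothetical threshold on the fraction of aligned
      templates that must mark a residue as interface.
    """
    votes = [0] * target_length    # templates calling the residue interface
    covered = [0] * target_length  # templates aligned to the residue at all

    for alignment, interface in templates:
        for target_idx, template_idx in alignment.items():
            covered[target_idx] += 1
            if template_idx in interface:
                votes[target_idx] += 1

    return [
        i for i in range(target_length)
        if covered[i] > 0 and votes[i] / covered[i] >= min_fraction
    ]


# Toy example: two templates aligned to a three-residue target.
toy_templates = [
    ({0: 0, 1: 1, 2: 2}, {1, 2}),
    ({1: 5, 2: 6}, {5}),
]
predicted = map_interface(3, toy_templates)
```

Because the prediction depends only on which template residues are aligned to each target residue, small coordinate errors in the target leave the result unchanged as long as the alignment is stable, which matches the robustness argument made in the abstract.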
I then describe a novel structural modeling method that uses geometric relationships between protein structures, including both PDB structures and homology models, to accurately predict PPIs on a genome-wide scale. We applied the method with considerable success to both the yeast and the human genomes. We found that the accuracy and the coverage of our structure-based prediction compare favorably with the methods derived from sequence and functional clues, e.g. sequence similarity, co-expression, phylogenetic similarity, etc. Results further improve when using a naive Bayesian classifier to combine structural information with non-structural clues (PREPPI), yielding predictions of comparable quality to high-throughput experiments. Our data further suggests that PREPPI predictions are substantially complementary to those by experimental methods thus providing a way to dissect interactions that would be hard to identify on a purely high-throughput experimental basis.
We have for the first time designed a "template-based" method that predicts protein interfaces with high precision and recall. We have also for the first time used 3D structure as part of the repertoire of experimental and computational information and found a way to accurately infer PPIs on a large scale. The success of PredUs and PREPPI can be attributed to the exploitation of both the information contained in imperfect models and the remote structure-function relationships between proteins that have usually been considered unrelated. Our results constitute a significant paradigm shift in both structural and systems biology and suggest that they can be integrated to an extent that has not been possible in the past.
A^2-Net: Molecular Structure Estimation from Cryo-EM Density Volumes
Constructing molecular structural models from Cryo-Electron Microscopy
(Cryo-EM) density volumes is the critical last step of structure determination
by Cryo-EM technologies. Methods have evolved from manual construction by
structural biologists to 6D translation-rotation searching, which is
extremely compute-intensive. In this paper, we propose a learning-based method
and formulate this problem as a vision-inspired 3D detection and pose
estimation task. We develop a deep learning framework for amino acid
determination in a 3D Cryo-EM density volume. We also design a sequence-guided
Monte Carlo Tree Search (MCTS) to thread over the candidate amino acids to form
the molecular structure. This framework achieves 91% coverage on our newly
proposed dataset and takes only a few minutes for a typical structure with a
thousand amino acids. Our method is hundreds of times faster and several times
more accurate than existing automated solutions without any human intervention.
Comment: 8 pages, 5 figures, 4 tables
SCALE method for single-cell ATAC-seq analysis via latent feature extraction.
Single-cell ATAC-seq (scATAC-seq) profiles the chromatin accessibility landscape at the single-cell level, thus revealing cell-to-cell variability in gene regulation. However, the high dimensionality and sparsity of scATAC-seq data often complicate the analysis. Here, we introduce a method for analyzing scATAC-seq data, called Single-Cell ATAC-seq analysis via Latent feature Extraction (SCALE). SCALE combines a deep generative framework and a probabilistic Gaussian Mixture Model to learn latent features that accurately characterize scATAC-seq data. We validate SCALE on datasets generated on different platforms with different protocols and of different overall data quality. SCALE substantially outperforms the other tools in all aspects of scATAC-seq data analysis, including visualization, clustering, and denoising and imputation. Importantly, SCALE also generates interpretable features that directly link to cell populations, and can potentially reveal batch effects in scATAC-seq experiments.
PrePPI: a structure-informed database of protein–protein interactions
PrePPI (http://bhapp.c2b2.columbia.edu/PrePPI) is a database that combines predicted and experimentally determined protein–protein interactions (PPIs) using a Bayesian framework. Predicted interactions are assigned probabilities of being correct, which are derived from calculated likelihood ratios (LRs) by combining structural, functional, evolutionary and expression information, with the most important contribution coming from structure. Experimentally determined interactions are compiled from a set of public databases that manually collect PPIs from the literature and are also assigned LRs. A final probability is then assigned to every interaction by combining the LRs for both predicted and experimentally determined interactions. The current version of PrePPI contains ∼2 million PPIs that have a probability greater than ∼0.1, of which ∼60 000 PPIs for yeast and ∼370 000 PPIs for human are considered high confidence (probability greater than 0.5). The PrePPI database constitutes an integrated resource that enables users to examine aggregate information on PPIs, including both known and potentially novel interactions, and that provides structural models for many of the PPIs.
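The step from likelihood ratios to a final probability can be sketched generically. The code below shows the standard naive Bayes combination, in which per-evidence LRs are multiplied under an independence assumption and converted to a posterior via prior odds. The function names and the prior odds value are illustrative assumptions; the abstract does not state the prior or the exact combination rule PrePPI uses.

```python
from math import prod


def combine_likelihood_ratios(lrs):
    """Multiply per-evidence likelihood ratios.

    Assumes the evidence sources are conditionally independent,
    which is the defining assumption of a naive Bayes classifier.
    """
    return prod(lrs)


def posterior_probability(combined_lr, prior_odds):
    """Convert a combined LR to a probability using Bayes' rule in odds form.

    posterior_odds = combined_lr * prior_odds
    probability    = posterior_odds / (1 + posterior_odds)
    """
    posterior_odds = combined_lr * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical numbers: two evidence sources and an assumed prior.
example_lrs = [2.0, 2.0]          # e.g. structural and co-expression LRs
example_prior_odds = 0.25         # assumption, not a value from PrePPI
example_lr = combine_likelihood_ratios(example_lrs)
example_probability = posterior_probability(example_lr, example_prior_odds)
```

Under this formulation a probability above 0.5 corresponds to posterior odds above 1, that is, a combined LR larger than the reciprocal of the prior odds.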
Systematic Discovery of Xist RNA Binding Proteins
Noncoding RNAs (ncRNAs) function with associated proteins to effect complex structural and regulatory outcomes. To reveal the composition and dynamics of specific noncoding RNA-protein complexes (RNPs) in vivo, we developed comprehensive identification of RNA-binding proteins by mass spectrometry (ChIRP-MS). ChIRP-MS analysis of four ncRNAs captures key protein interactors, including a U1-specific link to the 3′ RNA processing machinery. Xist, an essential lncRNA for X-chromosome inactivation (XCI), interacts with 81 proteins from chromatin modification, nuclear matrix, and RNA remodeling pathways. The Xist RNA-protein particle assembles in two steps coupled with the transition from pluripotency to differentiation. Specific interactors include HnrnpK, which participates in Xist-mediated gene silencing and histone modifications but not Xist localization, and the Drosophila Split ends homolog Spen, which interacts via the A-repeat domain of Xist and is required for gene silencing. Thus, Xist lncRNA engages with proteins in a modular and developmentally controlled manner to coordinate chromatin spreading and silencing.
Extensive Variation in Chromatin States Across Humans
The majority of disease-associated variants lie outside protein-coding regions, suggesting a link between variation in regulatory regions and disease predisposition. We studied differences in chromatin states using five histone modifications, cohesin, and CTCF in lymphoblastoid lines from 19 individuals of diverse ancestry. We found extensive signal variation in regulatory regions, which often switch between active and repressed states across individuals. Enhancer activity is particularly diverse among individuals, whereas gene expression remains relatively stable. Chromatin variability shows genetic inheritance in trios, correlates with genetic variation and population divergence, and is associated with disruptions of transcription factor binding motifs. Overall, our results provide insights into chromatin variation among humans.
RNA Regulations and Functions Decoded by Transcriptome-wide RNA Structure Probing
RNA folds into intricate structures that are crucial for its functions and regulations. To date, a multitude of approaches for probing structures of the whole transcriptome, i.e., RNA structuromes, have been developed. Applications of these approaches to different cell lines and tissues have generated a rich resource for the study of RNA structure–function relationships at a systems biology level. In this review, we first introduce the designs of these methods and their applications to study different RNA structuromes. We emphasize their technological differences, especially their unique advantages and caveats. We then summarize the structural insights into RNA functions and regulations obtained from the studies of RNA structuromes. Finally, we propose potential directions for future improvements and studies. Keywords: RNA structure probing, RNA structurome, RNA secondary structure, Structure–function relationship, RNA regulation
Differential analysis of RNA structure probing experiments at nucleotide resolution: uncovering regulatory functions of RNA structure
The authors present DiffScan, a tool for normalization and differential analysis of RNA structure probing experiments that combines the power of multiple experiments to decipher the dynamic RNA structurome and facilitate the discovery of RNA regulatory functions.