1,050 research outputs found

    A Multi-Layered Study on Harmonic Oscillations in Mammalian Genomics and Proteomics

    Cellular, organ, and whole animal physiology show temporal variation predominantly featuring 24-h (circadian) periodicity. Time-course mRNA gene expression profiling in mouse liver showed two subsets of genes oscillating at the second (12-h) and third (8-h) harmonic of the prime (24-h) frequency. The aim of our study was to identify specific genomic, proteomic, and functional properties of ultradian and circadian subsets. We found hallmarks of the three oscillating gene subsets, including different (i) functional annotation, (ii) proteomic and electrochemical features, and (iii) transcription factor binding motifs in upstream regions of 8-h and 12-h oscillating genes that appear to link the ultradian gene sets to a known circadian network. Our multifaceted bioinformatics analysis of circadian and ultradian genes suggests that the different rhythmicity of gene expression impacts physiological outcomes and may be related to transcriptional, translational and post-translational dynamics, as well as to phylogenetic and evolutionary components.

    Comprehensive assessment of cancer missense mutation clustering in protein structures

    Large-scale tumor sequencing projects enabled the identification of many new cancer gene candidates through computational approaches. Here, we describe a general method to detect cancer genes based on significant 3D clustering of mutations relative to the structure of the encoded protein products. The approach can also be used to search for proteins with an enrichment of mutations at binding interfaces with a protein, nucleic acid, or small molecule partner. We applied this approach to systematically analyze the PanCancer compendium of somatic mutations from 4,742 tumors relative to all known 3D structures of human proteins in the Protein Data Bank. We detected significant 3D clustering of missense mutations in several previously known oncoproteins including HRAS, EGFR, and PIK3CA. Although clustering of missense mutations is often regarded as a hallmark of oncoproteins, we observed that a number of tumor suppressors, including FBXW7, VHL, and STK11, also showed such clustering. Besides these known cases, we also identified significant 3D clustering of missense mutations in NUF2, which encodes a component of the kinetochore, that could affect chromosome segregation and lead to aneuploidy. Analysis of interaction interfaces revealed enrichment of mutations in the interfaces between FBXW7-CCNE1, HRAS-RASA1, CUL4B-CAND1, OGT-HCFC1, PPP2R1A-PPP2R5C/PPP2R2A, DICER1-Mg2+, MAX-DNA, SRSF2-RNA, and others. Together, our results indicate that systematic consideration of 3D structure can assist in the identification of cancer genes and in the understanding of the functional role of their mutations. Keywords: cancer; cancer genetics; mutation clustering; protein structures; interaction interfaces. National Institutes of Health (U.S.) (Grant U24 CA143845)
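
The idea of testing for significant 3D clustering can be illustrated with a minimal sketch. This is not the method used in the study above. It is a generic permutation test, assuming a hypothetical input format (a dictionary from residue index to 3D coordinates) and a hypothetical statistic (mean pairwise distance between mutated residues). The function names, the toy structure, and all parameters are illustrative assumptions.

```python
import itertools
import math
import random


def mean_pairwise_distance(coords):
    """Average Euclidean distance over all pairs of 3D points."""
    pairs = list(itertools.combinations(coords, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def clustering_p_value(residue_coords, mutated, n_perm=2000, seed=0):
    """Permutation p-value for spatial clustering of mutated residues.

    residue_coords: dict of residue index -> (x, y, z); hypothetical input.
    mutated: residue indices carrying missense mutations (at least two).
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance([residue_coords[i] for i in mutated])
    residues = list(residue_coords)
    hits = 0
    for _ in range(n_perm):
        # Null model (an assumption): mutations fall uniformly on residues.
        sample = rng.sample(residues, len(mutated))
        null_stat = mean_pairwise_distance([residue_coords[i] for i in sample])
        if null_stat <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Toy structure (not real data): 60 residues on a line, 3.8 units apart.
toy = {i: (3.8 * i, 0.0, 0.0) for i in range(60)}
p_clustered = clustering_p_value(toy, [10, 11, 12, 13])
p_spread = clustering_p_value(toy, [0, 20, 40, 59])
```

On the toy input, the four adjacent residues yield a small p-value and the four widely spaced residues do not. A real analysis would at least need to account for per-residue mutation rates and for multiple testing across proteins.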

    Protein-DNA binding sites prediction based on pre-trained protein language model and contrastive learning

    Protein-DNA interaction is critical for life activities such as replication, transcription, and splicing. Identifying protein-DNA binding residues is essential for modeling their interaction and downstream studies. However, developing accurate and efficient computational methods for this task remains challenging. Improvements in this area have the potential to drive novel applications in biotechnology and drug design. In this study, we propose a novel approach called CLAPE, which combines a pre-trained protein language model and the contrastive learning method to predict DNA binding residues. We trained the CLAPE-DB model on the protein-DNA binding sites dataset and evaluated the model performance and generalization ability through various experiments. The results showed that the AUC values of the CLAPE-DB model on the two benchmark datasets reached 0.871 and 0.881, respectively, indicating superior performance compared to other existing models. CLAPE-DB showed better generalization ability and was specific to DNA-binding sites. In addition, we trained CLAPE on different protein-ligand binding sites datasets, demonstrating that CLAPE is a general framework for binding sites prediction. To facilitate the scientific community, the benchmark datasets and codes are freely available at https://github.com/YAndrewL/clape

    Computational Analysis and Prediction of Intrinsic Disorder and Intrinsic Disorder Functions in Proteins

    By Akila Imesha Katuwawala. A dissertation submitted in partial fulfillment of the requirements for the degree of Engineering, Doctor of Philosophy with a concentration in Computer Science at Virginia Commonwealth University. Virginia Commonwealth University, 2021. Director: Lukasz Kurgan, Professor, Department of Computer Science.
    Proteins, as a fundamental class of biomolecules, have been studied from various perspectives over the past two centuries. The traditional notion is that proteins require fixed and stable three-dimensional structures to carry out biological functions. However, there is mounting evidence regarding a “special” class of proteins, named intrinsically disordered proteins, which do not have fixed three-dimensional structures though they perform a number of important biological functions. Computational approaches have been a vital component in the study of these intrinsically disordered proteins over the past few decades. Prediction of the intrinsic disorder and functions of intrinsic disorder from protein sequences is one such important computational approach that has recently gained attention, particularly with the advent of modern machine learning techniques. This dissertation runs along two basic themes, namely, prediction of the intrinsic disorder and prediction of the intrinsic disorder functions. The work related to the prediction of intrinsic disorder covers a novel approach to evaluate the predictive performance of the current computational disorder predictors. This approach evaluates the intrinsic disorder predictors at the individual protein level, in contrast to the traditional studies that evaluate them over large protein datasets. We address several interesting aspects concerning the differences in the protein-level vs. dataset-level predictive quality, complementarity and predictive performance of the current predictors. Based on the findings from this assessment we have conceptualized, developed, tested and deployed an innovative platform called DISOselect that recommends the most suitable computational disorder predictors for a given protein, with an underlying goal to maximize the predictive performance. DISOselect provides advice on whether a given disorder predictor would provide an accurate prediction for a protein of interest to the user, and recommends the most suitable disorder predictor together with an estimate of its expected predictive quality. The second theme, prediction of the intrinsic disorder functions, includes a first-of-its-kind evaluation of the current computational disorder predictors on two functional sub-classes of the intrinsically disordered proteins. This study introduces several novel evaluation strategies to assess predictive performance of disorder prediction methods and focuses on the evaluation for disorder functions associated with interactions with partner molecules. Results of this analysis motivated us to conceptualize, design, test and deploy a new and accurate machine learning-based predictor of the disordered lipid-binding residues, DisoLipPred. We empirically show that the strong predictive performance of DisoLipPred stems from several innovative design features and that its predictions complement results produced by current disorder predictors, disorder function predictors and predictors of transmembrane regions. We deploy DisoLipPred as a convenient webserver and discuss its predictions on the yeast proteome.

    Size does not matter: a molecular insight into the biological activity of chemical fragments utilizing computational approaches.

    Masters Degree. University of KwaZulu-Natal, Durban. Insight into the functional and physiological state of a drug target is essential in the drug discovery process. Given the lack of emerging 3D drug targets, we propose the integration of homology modeling, which may aid in the accurate yet efficient construction of 3D protein structures. In this study we present the applications of homology modeling in drug discovery, together with a route map and detailed technical guideline that can be used to obtain the most accurate model. Even with available drug targets and substantial advances in the field of drug discovery, the prevalence of incurable diseases remains at an all-time high. We explore the biological activity of chemical fragments derived from natural products using a range of computational approaches, and apply it as a new route towards innovative drug discovery. The "reduce to maximum" concept, recently proposed by organic chemists, entails reducing the size of a chemical compound to obtain a structural analog with retained or enhanced biological activity, better synthetic approachability and reduced toxicity, showing that size may not in fact matter. Molecular dynamics simulations and toxicity profiling were performed comparatively on the natural compound Anguinomycin D and its derived analog SB 640, each in complex with the CRM1 protein, which plays a key role in cancer pathogenesis. Each system was studied post-dynamically to compare the structural dynamics adopted by the parent compound with those exhibited by the analog. Although reduced in size by 60%, the analog SB 640 displayed attractive pharmacophore properties, including a minimal reduction in binding affinity, enhanced synthetic approachability and reduced toxicity in comparison to the parent compound.
The potent CRM1 inhibitor Leptomycin B (LMB) displayed substantial inhibition of the CRM1 export protein by binding to four of the PKIαNES residues (ϕ0, ϕ1, ϕ2, ϕ3, and ϕ4) present within the hydrophobic binding groove of CRM1. Although drastically reduced in size and lacking the polyketide chain present in the parent compound Anguinomycin D and in LMB, the analog SB 640 displaced three of these essential NES residues. The potential therapeutic activity of the structural analog is clear; however, it remains ambiguous which chemical fragments must be retained or truncated to ensure retained or enhanced pharmacophore properties when this approach is applied in drug design. In this study we aimed to address this using thermodynamic calculations, incorporating an MM/GBSA per-residue energy contribution footprint from molecular dynamics simulation. The footprint was generated for each system: Anguinomycin D and analog SB 640, each in complex with the CRM1 protein. Each system formed interactions with the conserved active site residues Leu 536, Thr 575, Val 576 and Lys 579. These residues were highlighted as the most energetically favourable amino acid residues, contributing substantially to the total binding free energy. This implies a conserved selectivity and binding mode adopted by both compounds, despite the omission in the analog SB 640 of the prominent polyketide chain present in the parent compound. The strategic computational approach presented in this study could serve as a beneficial tool to enhance novel drug discovery. This work contributes to the understanding of the phenomena underlying the reduction in the size of a chemical compound to obtain the most beneficial pharmacokinetic properties, and could contribute to the design of potent analog inhibitors for a range of drug targets implicated in disease.

    Developing a computational approach to investigate the impacts of disease-causing mutations on protein function

    This project uses bioinformatics protocols to explore the impacts of non-synonymous mutations (nsSNPs) in proteins associated with diseases, including germline, rare diseases and somatic diseases such as cancer. New approaches were explored for determining the impacts of disease-associated mutations on protein structure and function. Whilst this work has mainly concentrated on the analysis of cancer mutations, the methods developed are generic and could be applied to analysing other types of disease mutations. Different types of disease-causing mutations have been studied, including germline diseases, somatic cancer mutations in oncogenes and tumour suppressors, along with known activating and inactivating mutations in kinases. The proximity of disease-associated mutations has been analysed with respect to known functional sites reported by CSA and IBIS, along with predicted functional sites derived from the CATH classification of domain structure superfamilies. The latter are called FunSites, and are highly conserved residues within a CATH functional family (FunFam), which is a functionally coherent subset of a CATH superfamily. Such sites include key catalytic residues as well as specificity-determining residues and interface residues. Clear differences were found between oncogene, tumour suppressor and germline mutations, with oncogene mutations more likely to locate close to FunSites. Functional families that are highly enriched in disease mutations were identified, and structural data were exploited to identify clusters within proteins in these families that are enriched in mutations (using our MutClust program). We examined the tendencies of these clusters to lie close to the functional sites discussed above. For selected genes, the stability effects of disease mutations in cancer have also been investigated, with a particular focus on activating mutations in FGFR3. 
These studies, which were supported by experimental validation, showed that activating mutations implicated in cancer tend to cause stabilisation of the active FGFR3 form, leading to its abnormal activity and oncogenesis. Mutationally enriched CATH FunFams were also used in the identification of cancer driver genes, which were then subjected to pathway and GO biological process analysis

    DescribePROT: database of amino acid-level protein structure and function predictions

    We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by the amino acid sequence and the UniProt accession number and entry name. The pre-computed results are made available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors, and can also be downloaded in structured formats at the protein, proteome and whole database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/

    From condition-specific interactions towards the differential complexome of proteins

    While capturing the transcriptomic state of a cell is a comparably simple effort with modern sequencing techniques, mapping protein interactomes and complexomes in a sample-specific manner is currently not feasible on a large scale. To understand crucial biological processes, however, knowledge of the physical interplay between proteins can be more interesting than their mere expression. In this thesis, we present and demonstrate four software tools that unlock the cellular wiring in a condition-specific manner and promise a deeper understanding of what happens upon cell fate transitions. PPIXpress makes it possible to exploit the abundance of existing expression data to generate specific interactomes, which can even consider alternative splicing events when protein isoforms can be related to the presence of causative protein domain interactions of an underlying model. As an addition to this work, we developed the convenient differential analysis tool PPICompare to determine rewiring events and their causes within the inferred interaction networks between grouped samples. Furthermore, we present a new implementation of the combinatorial protein complex prediction algorithm DACO that features a significantly reduced runtime. This improvement facilitates an application of the method to a large number of samples, and the resulting sample-specific complexes can ultimately be assessed quantitatively with our novel differential protein complex analysis tool CompleXChange.

    Design, synthesis and evaluation of inhibitors of POT1-DNA interactions

    The unlimited replicative potential of cells is one of the hallmarks of cancer. Telomeres, DNA structures found at the ends of chromosomes, have attracted a great deal of interest in recent years as potential anti-cancer drug targets since they play an important role in cancer cell immortality. The repetitive TTAGGG sequences of telomeres are complexed to a group of six indispensable proteins, one of which is the protection of telomeres 1 (POT1) protein. This specialised protein binds to a ten-nucleotide single-stranded DNA sequence at the ends of chromosomes and plays an important role in telomere capping and length regulation. It has recently been proposed that the key function of POT1 is to suppress a potent DNA damage response at telomeres, thereby protecting chromosome tips from being recognised as sites of DNA damage. Deletion of POT1 from telomeres in a variety of organisms including humans results in cytogenetic aberrations, senescence and cell death. These results indicate that POT1 is an integral telomere end-protection protein which is necessary for continued cellular proliferation, and therefore POT1 is becoming a promising new target in cancer. Using a structure-based approach, several small molecule inhibitors of POT1 have been designed to affect telomere integrity by disrupting the binding interaction of human POT1 with its target DNA sequence, thereby driving cancer cells into senescence/apoptosis. Using a range of computational tools, a suitable drug binding pocket in POT1 has been identified and the de novo design of a specific class of POT1 inhibitor was completed. Using this novel scaffold, a small focussed library of hit-like compounds was synthesised and screened in a new POT1 fluorescence polarisation displacement assay developed by scientists at the University of Nottingham. 
In total, over 90 small molecule inhibitors based on two different scaffolds, pyrido[1,2-a]pyrimidines and sulfathiazoles, have been synthesised, with some inhibitors decreasing POT1-DNA binding by 10-54% at 100 μM ligand concentration. The biological results have established that electron-withdrawing substituents on the pendent phenyl ring of the pyrimidine core are essential for strong binding. These results have the potential to guide future development of improved lead compounds as therapeutics for the treatment of cancer.

    Cancer signaling networks and their implications for personalized medicine
