333 research outputs found

    Efficient algorithms and architectures for protein 3-D structure comparison

    Protein structure comparison (PSC) is a well-developed field of computational proteomics that remains an active research area because it is widely used in structural biology and drug discovery. The rapidly increasing computational demand of all-to-all protein structure comparison results mainly from three factors: rapidly expanding structural proteomics databases, the high computational complexity of pairwise PSC algorithms, and the trend towards using multiple comparison criteria and combining their results (multi-criteria protein structure comparison, MCPSC), since no single PSC method is commonly accepted as the best. In this thesis we have developed a software framework that exploits many-core and multi-core CPUs to implement efficient parallel MCPSC schemes on modern processors, based on three popular PSC methods, namely TMalign, CE, and USM. We evaluate and compare the performance and efficiency of two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC), a network-on-chip processor, as well as Intel's Core i7 multi-core processor. Further, we have developed a dataset processing pipeline and implemented it in a Python utility, called pyMCPSC, that allows users to perform MCPSC efficiently on multi-core CPUs. pyMCPSC, which combines five PSC methods and five different consensus scoring schemes, facilitates the analysis of similarities in large protein domain datasets and can be easily extended to incorporate more PSC methods in the consensus scoring as they become available.
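
As an illustration of how scores from several PSC methods might be combined into a consensus, here is a minimal sketch. It is not pyMCPSC's implementation: the function names, the min-max normalization, and the plain averaging are assumptions chosen for brevity, and the method names and scores are hypothetical placeholders.

```python
from statistics import mean

def min_max_normalize(scores):
    """Rescale a dict of pair -> score to [0, 1]; constant inputs map to 0.0."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {pair: (s - lo) / span if span else 0.0 for pair, s in scores.items()}

def consensus_scores(per_method):
    """Average the normalized scores of every method for each pair.

    per_method: dict of method name -> dict of (domain_a, domain_b) -> similarity.
    Only pairs scored by all methods are kept (an assumption of this sketch).
    """
    normalized = {m: min_max_normalize(s) for m, s in per_method.items()}
    common = set.intersection(*(set(s) for s in normalized.values()))
    return {pair: mean(normalized[m][pair] for m in normalized) for pair in common}

# Hypothetical similarity scores for three domain pairs from two methods.
toy = {
    "method_a": {("d1", "d2"): 0.9, ("d1", "d3"): 0.3, ("d2", "d3"): 0.5},
    "method_b": {("d1", "d2"): 40.0, ("d1", "d3"): 10.0, ("d2", "d3"): 20.0},
}
result = consensus_scores(toy)
```

Normalizing before averaging keeps a method with a large numeric range from dominating the consensus. A method that reports distances rather than similarities would need to be inverted first.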

    Bayesian nonparametric clusterings in relational and high-dimensional settings with applications in bioinformatics.

    Recent advances in high throughput methodologies offer researchers the ability to understand complex systems via high dimensional and multi-relational data. One example is the realm of molecular biology where disparate data (such as gene sequence, gene expression, and interaction information) are available for various snapshots of biological systems. This type of high dimensional and multirelational data allows for unprecedented detailed analysis, but also presents challenges in accounting for all the variability. High dimensional data often has a multitude of underlying relationships, each represented by a separate clustering structure, where the number of structures is typically unknown a priori. To address the challenges faced by traditional clustering methods on high dimensional and multirelational data, we developed three feature selection and cross-clustering methods: 1) infinite relational model with feature selection (FIRM) which incorporates the rich information of multirelational data; 2) Bayesian Hierarchical Cross-Clustering (BHCC), a deterministic approximation to Cross Dirichlet Process mixture (CDPM) and to cross-clustering; and 3) randomized approximation (RBHCC), based on a truncated hierarchy. An extension of BHCC, Bayesian Congruence Measuring (BCM), is proposed to measure incongruence between genes and to identify sets of congruent loci with identical evolutionary histories. We adapt our BHCC algorithm to the inference of BCM, where the intended structure of each view (congruent loci) represents consistent evolutionary processes. We consider an application of FIRM on categorizing mRNA and microRNA. The model uses latent structures to encode the expression pattern and the gene ontology annotations. 
We also apply FIRM to recover the categories of ligands and proteins, and to predict unknown drug-target interactions, where latent categorization structure encodes drug-target interaction, chemical compound similarity, and amino acid sequence similarity. BHCC and RBHCC are shown to have improved predictive performance (both in terms of cluster membership and missing value prediction) compared to traditional clustering methods. Our results suggest that these novel approaches to integrating multi-relational information have a promising future in the biological sciences where incorporating data related to varying features is often regarded as a daunting task
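
The abstract above notes that the number of clustering structures is typically unknown a priori. Dirichlet-process-based models handle this by placing a prior over partitions of any size. The sketch below draws one partition from a Chinese restaurant process, the partition prior induced by a Dirichlet process. It is not FIRM, BHCC, or RBHCC; the function name and the concentration value `alpha` are assumptions for illustration only.

```python
import random

def sample_crp_partition(n_items, alpha, rng):
    """Draw cluster labels for n_items from a Chinese restaurant process.

    Item i joins existing cluster k with probability size_k / (i + alpha)
    and opens a new cluster with probability alpha / (i + alpha).
    """
    labels = []
    sizes = []  # sizes[k] = number of items currently in cluster k
    for _ in range(n_items):
        weights = sizes + [alpha]  # last slot stands for "new cluster"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)   # open a new cluster
        else:
            sizes[k] += 1     # join an existing cluster
        labels.append(k)
    return labels

labels = sample_crp_partition(n_items=20, alpha=1.0, rng=random.Random(0))
n_clusters = len(set(labels))
```

The number of clusters is an outcome of the draw rather than an input, which is the property that makes such priors attractive when the number of structures cannot be fixed in advance.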

    Identification of structure activity relationships in primary screening data of high-throughput screening assays

    The aim of the thesis was to identify structure-activity relationships (SAR) in the primary screening data of high-throughput screening (HTS) assays. The strategy was to perform a hierarchical clustering of the molecules, assign the primary screening data to the resulting clusters, and derive models from the clusters. The models should serve to identify singletons, clusters enriched with actives, unconfirmed hits, and false negatives. For this purpose, two hierarchical clustering algorithms, NIPALSTREE and hierarchical k-means, were developed and adapted, respectively. A graphical user interface (GUI) was implemented to extract SAR from the clustering results. Retrospective and prospective applications of the clustering approach were performed. SAR models were created by combining the clustering results with different chemoinformatic methods. NIPALSTREE projects a data set onto one dimension using principal component analysis. The data set is sorted according to the score vector and split at the median position into two subsets. The algorithm is then applied recursively to the subsets. Hierarchical k-means recursively separates a data set into two clusters using the k-means algorithm. Both algorithms are capable of clustering large data sets with more than a million data points. They were validated and compared with each other on the basis of different structural classes. Through its loading vectors, NIPALSTREE provided first insights into SAR, whereas hierarchical k-means yielded superior results. A GUI was developed that allows the clustering results to be displayed and navigated. Functions were integrated to analyse the clusters in the dendrogram, the molecules in a cluster, and the physicochemical properties of a molecule. Measures were developed to identify clusters enriched with actives, to characterize singletons, and to analyse selectivity and specificity. 
Different protease inhibitors from the COBRA database were examined using the hierarchical k-means algorithm. Supported by similarity searches and nearest-neighbour analyses, thrombin inhibitor singletons were quickly isolated and displayed in the dendrogram. By scaling enrichment factors to the logarithm of the dendrogram level, clusters enriched with different structural classes of factor Xa inhibitors were identified simultaneously. The observed co-clustering with other protease inhibitors provided deeper insight into selectivity and specificity and shows the utility of the approach for constructing focused screening libraries. Specificity was analysed by extracting and clustering the relative frequencies of the protease inhibitors from the clusters at dendrogram level 7. This yielded a unique ligand-based view of the pocketome of the protease enzymes. To identify unconfirmed hits and false negatives in the primary screening data of HTS assays, three assays were retrospectively analysed with the hierarchical k-means algorithm. A rule catalogue was developed that judges hits in terminal clusters based on the cluster size, the percent control values of the entries in a cluster, the overall hit rate, the hit rate in the cluster, and the environment of the cluster in the dendrogram. It identified a high proportion of unconfirmed hits and provided, for each hit, a rating in the context of related non-hits. This allows compounds to be prioritized for follow-up studies. Non-hits and hits were retrieved from terminal clusters containing hits. Molecules bearing false-negative scaffolds were co-extracted and enriched. To minimize the number of false positives in the extracted lists, Bayesian regularized artificial neural network classification models were trained with the data. Applying the models yielded a marked improvement in the enrichment factors for the false negatives, which demonstrates the scaffold-hopping potential of the approach. 
NIPALSTREE, the hierarchical k-means algorithm and self-organising maps were prospectively applied to identify novel lead candidates for dopamine D3 receptors. Compounds with novel scaffolds and low nanomolar binding affinity (65 nM, compound 42) were identified. To provide deeper insight into the SAR of these molecules, alternative computational methods were employed. Support vector regression and partial least squares were examined. Predictive models for dopamine D2 and D3 receptor binding affinity values were obtained, and important features explaining SAR were extracted from the models. The prospective application of the models to the diverse and novel virtual screening data met with only limited success. Docking studies were performed using a homology model of the dopamine D3 receptor. Visual inspection of the binding modes led to the hypothesis of two alternative binding pockets for the aryl moiety of dopamine D3 receptor antagonists. A pharmacophore model was created that requires both aryl moieties simultaneously. Virtual screening with the model identified a nanomolar hit (65 nM, compound 59), corroborating the hypothesis of the two binding pockets and providing a new lead structure for dopamine D3 receptors. The presented data show that hierarchical clustering of a data set, combined with subsequent use of the clusters for model generation, is suited to extracting SAR from screening data. The models are successful in identifying singletons, clusters enriched with actives, unconfirmed hits and false-negative scaffolds.
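
The NIPALSTREE procedure described in this abstract (project onto the first principal component, sort by score, split at the median, recurse) can be sketched as follows. This is a minimal illustration, not the thesis code: the function names, the stopping rules `max_depth` and `min_size`, the convergence settings, and the toy data are all assumptions.

```python
def first_component_scores(rows, n_iter=100, tol=1e-10):
    """Scores of the rows on the first principal component via NIPALS iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    x = [[r[j] - means[j] for j in range(d)] for r in rows]  # mean-centred data
    # Start from the column with the largest sum of squares.
    start = max(range(d), key=lambda j: sum(row[j] ** 2 for row in x))
    t = [row[start] for row in x]
    for _ in range(n_iter):
        tt = sum(v * v for v in t)
        if tt == 0.0:
            break  # all points identical: no direction to find
        # Loading vector p = X^T t / (t^T t), normalized to unit length.
        p = [sum(x[i][j] * t[i] for i in range(n)) / tt for j in range(d)]
        norm = sum(v * v for v in p) ** 0.5
        p = [v / norm for v in p]
        # Updated score vector t = X p.
        t_new = [sum(row[j] * p[j] for j in range(d)) for row in x]
        converged = sum((a - b) ** 2 for a, b in zip(t, t_new)) < tol
        t = t_new
        if converged:
            break
    return t

def median_split_tree(rows, indices=None, max_depth=3, min_size=2, depth=0):
    """Recursively split rows at the median of their first-component score.

    Returns a nested dict; leaves hold the original row indices.
    """
    if indices is None:
        indices = list(range(len(rows)))
    if depth == max_depth or len(indices) < 2 * min_size:
        return {"members": indices}
    scores = first_component_scores([rows[i] for i in indices])
    order = [i for _, i in sorted(zip(scores, indices))]
    half = len(order) // 2
    return {
        "left": median_split_tree(rows, order[:half], max_depth, min_size, depth + 1),
        "right": median_split_tree(rows, order[half:], max_depth, min_size, depth + 1),
    }

# Hypothetical two-descriptor data with two well-separated groups.
toy = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 5.0], [5.1, 5.2]]
tree = median_split_tree(toy, max_depth=1)
```

Because only the first component is needed at each node, the iterative NIPALS update avoids a full eigendecomposition, which is one plausible reason such a scheme scales to very large data sets.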

    Evolution of Genome-wide Gene Regulation in the Budding Yeast Cell-Division Cycle

    Genome-wide regulation of gene expression involves a dynamic epigenetic structure which generates an organism's life-cycle. Although changes in gene expression during development have broad effects on many basic phenomena including cell growth, differentiation, morphogenesis, and disease progression, the evolutionary forces influencing gene expression dynamics and gene regulation remain largely unknown, due to the nature of gene expression as a polygenic, quantitative trait. Moreover, gene expression is regulated differentially over time, so evolutionary forces may be influenced by developmental context. To advance the understanding of evolution in the context of the life-cycle, the architecture of gene expression timing control and its influence on expression dynamics must be revealed. This dissertation presents two experimental investigations of the evolution of genes and related structural regions and time-dependent gene expression, using the budding yeasts Saccharomyces cerevisiae and Saccharomyces paradoxus and their mitotic cell-division cycle as model organism and life-cycle. Comparative methodologies were employed to analyze genome-wide patterns of genetic and phenotypic diversity within and between species. Analysis of several dozen yeast genomes reveals a dominant evolutionary mode of purifying selection. Despite limited genetic variability, differences in transcriptional regulation appear to contribute predominantly to interspecies divergence, and altered post-transcriptional regulation of ribosomal genes may have altered the timing of each species' transition from vegetative growth to reproduction, a classic life-history trait. In addition, natural variation in genome-wide gene expression was measured as a time-series through the mitotic cell-division cycle of 10 yeast lines, including one outgroup species. 
Despite levels of variation consistent with strong stabilizing selection, transcriptome coexpression dynamics have diverged significantly within and between species. A model involving timing pattern changes explains 61% of the between-genome variation in expression dynamics, suggesting that the major mode of transcriptome evolution involves changes in timing (heterochrony) rather than changes in levels (heterometry) of expression. Analysis of heterochrony patterns suggests that timing control is organized into distinct and dynamically-autonomous modules. Divergence in expression dynamics may be explained by pleiotropic changes in modular timing control. Genome-wide gene regulation may utilize a general architecture comprised of multiple discrete event timelines, whose superposition could produce combinatorial complexity in timing patterns

    Cell-free expression and molecular modeling of the γ-secretase complex and G-protein-coupled receptors

    Alzheimer's disease (AD), first reported more than a century ago by Alzheimer, is one of the most common forms of dementia and affects more than 30 million people globally (more than 8 million in Europe). The origin and pathogenesis of AD are poorly understood, and no cure is available. AD is characterized by the accumulation of senile plaques composed of amyloid beta peptides (Aβ 37-43), which are formed when the gamma-secretase (GS) complex cleaves the amyloid precursor protein (APP). GS is therefore an attractive drug target. Because GS processes several other substrates, such as Notch, CD44 and cadherins, nonspecific inhibition of GS has many side effects. Since no crystal structure of GS is available, which is attributed to the extreme difficulty of purifying it, molecular modeling can be useful for understanding its architecture. So far only cryo-EM structures of the complex have been solved, and these provide only a rough structure at a low resolution of 12-15 Å. Furthermore, GS activity can be obtained in vitro by means of cell-free (CF) expression. GS comprises catalytic subunits, namely presenilins, and the supporting elements Pen-2, Aph-1 and nicastrin. The origin of AD lies in regulated intramembrane proteolysis (RIP), which is involved in various physiological processes and also in leukemia. So far growth factors, cytokines, receptors, viral proteins, cell adhesion proteins, signal peptides and GS have been shown to undergo RIP. During RIP, the target proteins undergo extracellular shedding and intramembrane proteolysis. This thesis is based on molecular modeling, molecular dynamics (MD) simulations, cell-free (CF) expression, mass spectrometry, NMR, crystallization and activity assays of the components of the GS complex and of G-protein-coupled receptors (GPCRs). 
First, I validated the NMR structure of the PS1 CTF in detergent micelles and lipid bilayers using coarse-grained MD simulations with the MARTINI force field as implemented in Gromacs. The CTF was simulated in DPC micelles and in DPPC and DLPC lipid bilayers. Starting from random configurations of detergent and lipids, a micelle and a lipid bilayer formed, respectively, in the presence of the CTF, which became properly oriented with respect to the micelle and the bilayer during the simulation. DPC molecules formed a micelle around the CTF, in agreement with experimental results in which 80-85 DPC molecules are required to form micelles. The structure obtained in DPC was similar to the NMR structure, whereas the structure obtained in the bilayer simulations differed and showed the possibility of substrate docking at the conserved PAL motif. Simulations of the CTF in an implicit membrane (IMM1) in CHARMM yielded a structure similar to that from coarse-grained MD. I performed cell-free expression optimization, crystallization and NMR spectroscopy of Pen-2 in various detergent micelles. Additionally, Pen-2 was modeled by combining the Rosetta membrane ab initio method, HHpred distant homology modeling and NMR constraints. The models were validated by all-atom and coarse-grained MD simulations, both in detergent micelles and in POPC/DPPC lipid bilayers, using the MARTINI force field. A GS operon consisting of all four subunits was co-expressed in CF and purified. The presence of GS subunits after pull-down with Aph-1 was determined by western blotting (Pen-2) and mass spectrometry (Presenilin-1 and Aph-1). I also studied interactions, especially among PS1 CTF, APP and NTF, by docking and MD. I also built models of Pen-2 and its interfaces with PS1 NTF, checked their stability by MD simulations, and compared them with experimental results. 
The goal is to model the interfaces between GS subunits using molecular modeling approaches based on available experimental data such as cross-linking, mutations, and the NMR structures of the C-terminal fragment of PS1 and the transmembrane part of APP. The resulting interfaces may explain the catalytic mechanism of GS, which can be exploited for novel lead design. Because no crystal or NMR structure is available for the GS subunits except the PS1 CTF, the effect of mutations on APP cleavage cannot be predicted from structure. I therefore also developed a sequence-based machine learning approach using a support vector machine to predict the effect of PS1 CTF L383 mutations on the Aβ40/Aβ42 ratio with 88% accuracy. Mutational data derived from the Molgen database of presenilin 1 mutations were used for training. GPCRs (also called 7TM receptors) form a large superfamily of membrane proteins that can be activated by small molecules, lipids, hormones, peptides, light, pain, taste, smell, etc. Although 50% of the drugs on the market target GPCRs, only a few GPCRs are targeted therapeutically. This wide range of targets is due to the involvement of GPCRs in signaling pathways related to many diseases, e.g. dementia (such as Alzheimer's disease), metabolic disorders (such as diabetes) including endocrinological disorders, immunological disorders including viral infections, and cardiovascular, inflammatory and sensory disorders, pain and cancer. Cannabinoid and adrenergic receptors belong to the class A (rhodopsin-like) GPCRs. Docking of agonists and antagonists to the CB1 and CB2 cannabinoid receptors revealed the importance of a centrally located rotamer toggle switch and its possible role in the mechanism of agonist/antagonist recognition. The switch is composed of two residues, F3.36 and W6.48, located on the opposite transmembrane helices TM3 and TM6 in the central part of the membranous domain of cannabinoid receptors. The CB1 and CB2 receptor models were constructed based on the adenosine A2A receptor template. 
The two best scored conformations of each receptor were used for the docking procedure. In all poses (ligand-receptor conformations) characterized by the lowest ligand-receptor intermolecular energy and free energy of binding the ligand type matched the state of the rotamer toggle switch: antagonists maintained an inactive state of the switch, whereas agonists changed it. In case of agonists of β2AR, the (R,R) and (S,S) stereoisomers of fenoterol, the molecular dynamics simulations provided evidence of different binding modes while preserving the same average position of ligands in the binding site. The (S,S) isomer was much more labile in the binding site and only one stable hydrogen bond was created. Such dynamical binding modes may also be valid for ligands of cannabinoid receptors because of the hydrophobic nature of their ligand-receptor interactions. However, only very long molecular dynamics simulations could verify the validity of such binding modes and how they affect the process of activation. Human N-formyl peptide receptors (FPRs) are G protein-coupled receptors (GPCRs) involved in many physiological processes, including host defense against bacterial infection and resolving inflammation. The three human FPRs (FPR1, FPR2 and FPR3) share significant sequence homology and perform their action via coupling to Gi protein. Activation of FPRs induces a variety of responses, which are dependent on the agonist, cell type, receptor subtype, and also species involved. FPRs are expressed mainly by phagocytic leukocytes. Together, these receptors bind a large number of structurally diverse groups of agonistic ligands, including N-formyl and nonformyl peptides of different composition, that chemoattract and activate phagocytes. For example, N-formyl-Met-Leu-Phe (fMLF), an FPR1 agonist, activates human phagocyte inflammatory responses, such as intracellular calcium mobilization, production of cytokines, generation of reactive oxygen species, and chemotaxis. 
This ligand can efficiently activate the major bactericidal neutrophil functions and it was one of the first characterized bacterial chemotactic peptides. Whereas fMLF is by far the most frequently used chemotactic peptide in studies of neutrophil functions, atomistic descriptions for fMLF-FPR1 binding mode are still scarce mainly because of the absence of a crystal structure of this receptor. Elucidating the binding modes may contribute to designing novel and more efficient non-peptide FPR1 drug candidates. Molecular modeling of FPR1, on the other hand, can provide an efficient way to reveal details of ligand binding and activation of the receptor. However, recent modelings of FPRs were confined only to bovine rhodopsin as a template. To locate specific ligand-receptor interactions based on a more appropriate template than rhodopsin we generated the homology models of FPR1 using the crystal structure of the chemokine receptor CXCR4, which shares over 30% sequence identity with FPR1 and is located in the same γ branch of phylogenetic tree of GPCRs (rhodopsin is located in α branch). Docking and model refinement procedures were pursued afterward. Finally, 40 ns full-atom MD simulations were conducted for the Apo form as well as for complexes of fMLF (agonist) and tBocMLF (antagonist) with FPR1 in the membrane. Based on locations of the N- and C-termini of the ligand the FPR1 extracellular pocket can be divided into two zones, namely, the anchor and activation regions. The formylated M1 residue of fMLF bound to the activation region led to a series of conformational changes of conserved residues. Internal water molecules participating in extended hydrogen bond networks were found to play a crucial role in transmitting the agonist-receptor interactions. A mechanism of initial steps of the activation concurrent with ligand binding is proposed. 
I accurately predicted the structure and ligand binding pose of dopamine receptor 3 (RMSD to the crystal structure: 2.13 Å) and chemokine receptor 4 (CXCR4; RMSD to the crystal structure: 3.21 Å) in the GPCR-Dock 2010 competition. The homology model of dopamine receptor 3 was the 8th best overall in the competition.
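
The RMSD values quoted above measure how far a predicted model lies from the crystal structure. As a minimal sketch, the root-mean-square deviation over matched atoms can be computed as below. The sketch assumes that the two coordinate sets are already superposed and listed in the same atom order; the function name and the toy coordinates are hypothetical and do not come from the thesis.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared distances between corresponding atoms.
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))

# Hypothetical coordinates in Å: the second set is shifted by 1 Å along x.
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
reference = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (2.5, 1.5, 0.0)]
value = rmsd(model, reference)
```

In practice the structures are first optimally superposed, so a reported RMSD is the minimum over rigid-body alignments rather than the raw value computed here.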

    Bioinformatics

    This book is divided into research areas relevant to bioinformatics, such as biological networks, next-generation sequencing, high-performance computing, molecular modeling, structural bioinformatics and intelligent data analysis. Each section introduces the basic concepts and then explains their application to problems of great relevance, so that both novice and expert readers can benefit from the information and research presented here.

    Evolutionary genomics : statistical and computational methods

    This open access book addresses the challenge of analyzing and understanding the evolutionary dynamics of complex biological systems at the genomic level, and elaborates on some promising strategies that would bring us closer to uncovering the vital relationships between genotype and phenotype. After a few educational primers, the book continues with sections on sequence homology and alignment, phylogenetic methods to study genome evolution, methodologies for evaluating selective pressures on genomic sequences as well as genomic evolution in light of protein domain architecture and transposable elements, population genomics and other omics, and discussions of current bottlenecks in handling and analyzing genomic data. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detail and expert implementation advice that lead to the best results. Authoritative and comprehensive, Evolutionary Genomics: Statistical and Computational Methods, Second Edition aims to serve both novices in biology with strong statistics and computational skills, and molecular biologists with a good grasp of standard mathematical concepts, in moving this important field of study forward.

    Identifying Structure Transitions Using Machine Learning Methods

    Methodologies from data science and machine learning, both new and old, provide an exciting opportunity to investigate physical systems using extremely expressive statistical modeling techniques. Physical transitions are of particular interest, as they are accompanied by pattern changes in the configurations of the systems. Detecting and characterizing pattern changes in data happens to be a particular strength of statistical modeling in data science, especially with the highly expressive and flexible neural network models that have become increasingly computationally accessible in recent years through performance improvements in both hardware and algorithmic implementations. Conceptually, the machine learning approach can be regarded as one that employs algorithms that eschew explicit instructions in favor of strategies based on pattern extraction and inference driven by statistical analysis and large, complex data sets. This allows physical systems to be investigated using only raw configurational information to make inferences, instead of relying on physical information obtained from a priori knowledge of the system. This work focuses on the extraction of useful compressed representations of physical configurations from systems of interest to automate phase classification tasks, in addition to the identification of critical points and crossover regions.
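
To make the idea of a compressed representation concrete, the sketch below reduces toy spin configurations to a single number with principal component analysis and checks that this number separates ordered from disordered samples. PCA is swapped in here for the neural network models the abstract mentions, purely to keep the example short. The toy spin data, the 5% flip rate, and all function names are assumptions, not material from the thesis.

```python
import random

def leading_direction(samples, n_iter=100):
    """Means and leading principal direction, found by power iteration on X^T X."""
    n, d = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    x = [[s[j] - means[j] for j in range(d)] for s in samples]
    v = [1.0 / d ** 0.5] * d  # unit-length starting vector
    for _ in range(n_iter):
        proj = [sum(row[j] * v[j] for j in range(d)) for row in x]      # X v
        w = [sum(x[i][j] * proj[i] for i in range(n)) for j in range(d)]  # X^T X v
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return means, v

def project(sample, means, v):
    """One-number summary of a configuration: its coordinate along v."""
    return sum((s - m) * c for s, m, c in zip(sample, means, v))

rng = random.Random(1)
n_spins = 16

def ordered_config(sign):
    """Mostly aligned spins; each spin is flipped with probability 0.05."""
    return [sign if rng.random() > 0.05 else -sign for _ in range(n_spins)]

def disordered_config():
    """Independent random spins."""
    return [rng.choice((-1, 1)) for _ in range(n_spins)]

ordered = [ordered_config(1 if k % 2 == 0 else -1) for k in range(20)]
disordered = [disordered_config() for _ in range(20)]
means, v = leading_direction(ordered + disordered)
mean_abs_ordered = sum(abs(project(c, means, v)) for c in ordered) / len(ordered)
mean_abs_disordered = sum(abs(project(c, means, v)) for c in disordered) / len(disordered)
```

In this toy setting the leading direction is close to the uniform vector, so the projection behaves like a magnetization: the representation is learned from raw configurations alone, without being told which physical quantity distinguishes the phases.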