
    Cheminformatics Analysis and Computational Modeling of Detergent-Sensitive Aggregation

    Small-molecule aggregates cause detergent-reversible protein sequestration and are the most prevalent source of nonspecific activity in biochemical screening assays. Large volumes of publicly available dose-response screens performed in the presence or absence of detergent have enabled cheminformatics analyses of chemical aggregation, reinforcing prior observations that aggregation is prevalent and context dependent. We report the development of random forest classifiers, trained on screens against β-lactamase or cruzain under well-defined assay conditions, that distinguish putative aggregators from non-aggregators with balanced accuracies as high as 78%. These models overcome limitations of existing computational predictors related to programmatic errors, blurred modeling endpoints, and poor external predictivity. Model interpretation indicated that polarity, aliphaticity, and molecular weight are significantly correlated with aggregation propensity, although these features alone predict behavior with under 70% accuracy. Our curated datasets and validated models will help identify potential aggregators and reduce resource waste during drug discovery and optimization.
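The workflow the abstract describes can be sketched in outline: train a random forest on per-compound descriptors and score it by balanced accuracy. The snippet below is a minimal illustration only; the descriptors and labels are synthetic stand-ins, not the curated β-lactamase/cruzain screening data.

```python
# Illustrative sketch of the modeling setup described above: a random
# forest separating putative aggregators from non-aggregators, scored
# by balanced accuracy. All data here are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
# Stand-ins for the descriptors the abstract highlights:
# polarity, aliphaticity, and molecular weight.
X = rng.normal(size=(n, 3))
# Synthetic label loosely tied to the descriptors so the model has signal.
y = (0.8 * X[:, 0] - 0.5 * X[:, 1] + 0.3 * X[:, 2]
     + rng.normal(scale=0.5, size=n)) > 0

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
bacc = balanced_accuracy_score(y_te, clf.predict(X_te))
print(f"balanced accuracy: {bacc:.2f}")
```

Balanced accuracy (the mean of sensitivity and specificity) is the appropriate metric here because aggregator/non-aggregator classes in screening data are typically imbalanced.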

    Development and Implementation of a High Throughput Screen for the Human Sperm-Specific Isoform of Glyceraldehyde 3-Phosphate Dehydrogenase (GAPDHS)

    Glycolytic isozymes that are restricted to the male germline are potential targets for the development of reversible, non-hormonal male contraceptives. GAPDHS, the sperm-specific isoform of glyceraldehyde-3-phosphate dehydrogenase, is an essential enzyme for glycolysis, making it an attractive target for rational drug design. Toward this goal, we have optimized and validated a high-throughput spectrophotometric assay for GAPDHS in 384-well format. The assay was stable over time and tolerant to DMSO. Whole-plate validation experiments yielded Z’ values >0.8, indicating a robust assay for HTS. Two compounds were identified and confirmed from a test screen of the Prestwick collection. This assay was then used to screen a diverse chemical library and identified fourteen small molecules that modulated the activity of recombinant purified GAPDHS, with confirmed IC50 values ranging from 1.8 to 42 µM. These compounds may provide useful scaffolds as molecular tools to probe the role of GAPDHS in sperm motility and, in the longer term, to develop potent and selective GAPDHS inhibitors leading to novel contraceptive agents.
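The Z’ factor quoted above is the standard HTS robustness statistic of Zhang et al. (1999), computed from positive- and negative-control wells; values above 0.5 are generally considered excellent. A minimal sketch, with illustrative control readings rather than the actual GAPDHS plate data:

```python
# Z'-factor: a screening-window statistic computed from control wells.
# Z' = 1 - 3*(sd_pos + sd_neg) / |mean_pos - mean_neg|
import statistics

def z_prime(pos, neg):
    """Return the Z'-factor for positive/negative control readings."""
    sd_p, sd_n = statistics.stdev(pos), statistics.stdev(neg)
    return 1 - 3 * (sd_p + sd_n) / abs(statistics.mean(pos) - statistics.mean(neg))

# Hypothetical absorbance readings from a 384-well validation plate.
pos_ctrl = [1.02, 0.98, 1.00, 1.01, 0.99]   # uninhibited enzyme signal
neg_ctrl = [0.11, 0.09, 0.10, 0.12, 0.10]   # background (no enzyme)
print(f"Z' = {z_prime(pos_ctrl, neg_ctrl):.2f}")
```

Because Z’ penalizes both control variability and a narrow signal window, a value above 0.8 implies tight replicates and a wide separation between full-signal and background wells.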

    Data Enrichment for Data Mining Applied to Bioinformatics and Cheminformatics Domains

    Increasingly complex problems are being addressed in the life sciences. Acquiring all the data that may be related to the problem at hand is paramount. Equally important is knowing how the data relate to each other and to the problem itself. At the same time, large amounts of data and information are available on the Web. Researchers already use Data Mining and Machine Learning as valuable tools in their research, although the usual procedure is to look for information based on inductive models. So far, despite the great successes already achieved with Data Mining and Machine Learning, it is not easy to integrate this vast amount of available information into the inductive process with propositional algorithms. Our main motivation is to address the problem of integrating domain information into the inductive process of propositional Data Mining and Machine Learning techniques by enriching the training data used in inductive logic programming (ILP) systems. Propositional machine learning algorithms are highly dependent on data attributes, and it remains hard to identify which attributes are most suitable for a particular research task. It is also hard to extract relevant information from the enormous quantity of data available. We concentrate the available data and derive features that ILP algorithms can use to induce descriptions and solve the problems. We are creating a web platform to obtain information relevant to Bioinformatics (particularly Genomics) and Cheminformatics problems. It fetches data from public repositories of genomic, protein, and chemical data. After the data enrichment, Prolog systems use inductive logic programming to induce rules and solve specific Bioinformatics and Cheminformatics case studies. To assess the impact of the data enrichment with ILP, we compare against the results obtained by solving the same cases with propositional algorithms.

    Regulation of Human Hsp70 by its Nucleotide Exchange Factors (NEFs).

    Heat shock protein 70 (Hsp70) is an abundant and ubiquitous molecular chaperone that is responsible for maintenance of the human proteome. Accordingly, Hsp70 has become an attractive drug target for neurodegenerative and hyperproliferative disorders; however, it is difficult to imagine strategies for inhibiting its pathobiology without impacting its essential roles. Fortunately, Hsp70 does not work alone; instead, it employs a large network of co-chaperone proteins, which can tune Hsp70 activity and influence disease state. These co-chaperone proteins provide potential handles for targeting Hsp70 without disrupting overall proteostasis. One such class of co-chaperone proteins, the Nucleotide Exchange Factors (NEFs), is a particularly appealing target. NEFs bind Hsp70 and help to facilitate the exchange of ADP for ATP. The biochemistry of the NEF family of co-chaperones has classically been investigated using the prokaryotic NEF, GrpE, as a model. However, the eukaryotic cytosol does not contain a GrpE homolog. Rather, there are three main sub-classes of human NEFs: Hsp110, HspBP1, and the BAG proteins, all of which are structurally distinct with little sequence homology. Consistent with their diverse structures, they also differ in their mode of binding to Hsp70 and their roles in guiding Hsp70 biology. For example, BAG2 is associated with proteasomal degradation of the Hsp70 substrate, tau, while BAG1-Hsp70 is linked to increased tau stability. These observations suggest that the formation of specific NEF-Hsp70 complexes may help decide the fate of Hsp70-bound substrates. Additionally, these findings illustrate that differential disruption of specific Hsp70-NEF contacts might be beneficial in disease. In this thesis work I have systematically characterized the human Hsp70 NEFs, including how they interact with Hsp70, how they influence Hsp70 biochemistry, and how they can bridge Hsp70 with other classes of chaperone proteins. 
    I have used high-throughput screening methods to search for chemical matter that can modulate Hsp70-NEF interactions, and we have shown that inhibitors of Hsp70-NEF interactions can be beneficial for treating disease. This thesis work has significantly advanced our knowledge of human Hsp70 regulation and has provided groundwork for future studies on other Hsp70 co-chaperones and proteostasis components.
    PhD, Biological Chemistry, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/111611/1/rauchjn_1.pd

    Database development and machine learning prediction of pharmaceutical agents

    Ph.D. (Doctor of Philosophy)

    Crystallographic fragment screening - improvement of workflow, tools and procedures, and application for the development of enzyme and protein-protein interaction modulators

    One of the great societal challenges of today is the fight against diseases that reduce life expectancy and lead to high economic losses. Both understanding and addressing these diseases require research activities at all levels. One aspect of this is the discovery and development of tool compounds and drugs. Tool compounds support disease research and the development of drugs. For about 20 years, the discovery of new compounds has been attempted by screening small organic molecules with high-throughput methods. More recently, X-ray crystallography has emerged as the most promising method to conduct such screening. Crystallographic fragment screening (CFS) generates binding information as well as 3D structural information on the target protein in complex with the bound fragment. This doctoral research project focused primarily on the optimization of the crystallographic fragment screening workflow. The requirements for more successful screening campaigns were investigated with respect to the crystal system studied, the fragment libraries, the handling of the crystalline samples, and the handling of the data associated with a screening campaign. The improved CFS workflow was presented as a detailed protocol and an accompanying video to train future CFS users in a streamlined and accessible way. Together, these improvements make CFS campaigns a more high-throughput method, offering the ability to screen larger fragment libraries and allowing more campaigns to be performed per year. The protein targets throughout the project were two enzymes and a spliceosomal protein-protein complex. The enzymes comprised the aspartic protease Endothiapepsin and the SARS-CoV-2 main protease. The protein-protein complex was the RNaseH-like domain of Prp8, a vital structural protein in the spliceosome, together with its nuclear shuttling factor Aar2. 
    Because the CFS campaigns were performed against disease-relevant targets, the resulting fragment hits could be used directly to develop tool compounds or drugs. The first steps of optimizing fragment hits into higher-affinity binders were also investigated. In summary, a plethora of novel starting points for tool-compound and drug development was identified.

    Encoding, Storing and Searching of Analytical Properties and Assigned Metabolite Structures

    Information about metabolites and other small organic molecules is of crucial importance in many fields of the natural sciences. They play a decisive role in metabolic networks, for example, and knowledge of their properties helps in understanding complex biological processes and complete biological systems. Since data describing these molecules are generated daily in biological and chemical laboratories, a comprehensive and continuously growing body of data exists. To enable scientists to process, exchange, archive, and search this information while preserving its semantic context, complex software systems and data formats are needed. The goal of this project was to develop applications and algorithms that can be used for the efficient encoding, collection, normalization, and analysis of molecular data. These are intended to support scientists in structure elucidation, dereplication, the analysis of molecular interactions, and the publication of the knowledge thus gained. Since directly describing the structure and function of an unknown compound is very difficult and laborious, this is achieved mainly indirectly, with the help of descriptive properties, which are then used to predict structural and functional characteristics. In this context, program modules were developed that allow the visualization of structural and spectroscopic data, the structured display and editing of metadata and properties, and the import and export of various data formats. These were extended with methods that make it possible to analyze the resulting information further and to assign structural and spectroscopic data to one another. 
    In addition, a system was developed for the structured archiving and management of large amounts of molecular data and spectroscopic information, preserving their semantic context, both in the file system and in databases. To ensure lossless storage, an open and standardized data format was defined (CMLSpect). It extends the existing CML (Chemical Markup Language) vocabulary and thereby allows the simple handling of linked structural and spectroscopic data. The applications developed were integrated into the Bioclipse system for bio- and cheminformatics, offering users a high-quality interface and developers an easily extensible, modular program architecture.