Computational advances in combating colloidal aggregation in drug discovery.
Small molecule effectors are essential for drug discovery. Specific molecular recognition, reversible binding and dose-dependency are usually key requirements to ensure utility of a novel chemical entity. However, artefactual frequent-hitter and assay interference compounds may divert lead optimization and screening programmes towards attrition-prone chemical matter. Colloidal aggregates are the prime source of false positive readouts, either through protein sequestration or protein-scaffold mimicry. Nevertheless, assessment of colloidal aggregation remains somewhat overlooked and under-appreciated. In this Review, we discuss the impact of aggregation in drug discovery by analysing select examples from the literature and publicly-available datasets. We also examine and comment on technologies used to experimentally identify these potentially problematic entities. We focus on evidence-based computational filters and machine learning algorithms that may be swiftly deployed to flag chemical matter and mitigate the impact of aggregates in discovery programmes. We highlight the tools that can be used to scrutinize libraries, and identify and eliminate these problematic compounds.
D.R. is a Swiss National Science Foundation Fellow (Grants P2EZP3_168827 and P300P2_177833). G.J.L.B. is a Royal Society URF (UF110046 and URF/R/180019), an iFCT Investigator (IF/00624/2015), and the recipient of an ERC StG (TagIt, Grant Agreement 676832). T.R. and G.J.L.B. acknowledge Marie Sklodowska-Curie ITN Protein Conjugates (Grant Agreement 675007) for funding. T.R. is a Marie Curie Fellow (Grant Agreement 743640). T.R. acknowledges the H2020 (TWINN-2017 ACORN, Grant Agreement 807281) and POR Lisboa 2020/FEDER (02/SAICT/2017, Grant Agreement Lisboa-01-0145-FEDER-028333) for funding. D.R. acknowledges the MIT-IBM Watson AI Lab and the MIT SenseTime coalition for funding.
Cheminformatics Analysis and Computational Modeling of Detergent-Sensitive Aggregation
Small molecule aggregates cause detergent-reversible protein sequestration and are the most prevalent source of nonspecific activity in biochemical screening assays. Large volumes of publicly available dose-response screens performed in the presence or absence of detergent have enabled cheminformatics analyses of chemical aggregation, which reinforce prior notions that aggregation is prevalent and context dependent. We report the development of random forest classifiers, trained on screens of β-lactamase or cruzain targets under well-defined assay conditions, which distinguish putative aggregators and non-aggregators with balanced accuracies as high as 78%. These models overcome limitations of existing computational predictors related to programmatic errors, blurred modeling endpoints, and poor external predictivity. Model interpretation indicated that polarity, aliphaticity, and molecular weight are significantly correlated with aggregation propensity, although these features alone estimate behavior with under 70% accuracy. Our curated datasets and validated models will help identify potential aggregators and reduce resource waste during drug discovery and optimization.
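The balanced accuracy quoted in this abstract is the mean of sensitivity and specificity, which keeps a classifier honest when aggregators and non-aggregators are unevenly represented. A minimal sketch of the metric (the function and the confusion counts below are illustrative, not taken from the paper):

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (recall on aggregators) and specificity
    (recall on non-aggregators); robust to class imbalance."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

# Hypothetical confusion counts for an aggregator classifier:
# 70 of 100 aggregators caught, 86 of 100 non-aggregators passed.
print(round(balanced_accuracy(tp=70, fn=30, tn=86, fp=14), 2))  # 0.78
```

On a balanced test set this reduces to plain accuracy; on a skewed one it penalizes a classifier that simply predicts the majority class.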
Development and Implementation of a High Throughput Screen for the Human Sperm-Specific Isoform of Glyceraldehyde 3-Phosphate Dehydrogenase (GAPDHS)
Glycolytic isozymes that are restricted to the male germline are potential targets for the development of reversible, non-hormonal male contraceptives. GAPDHS, the sperm-specific isoform of glyceraldehyde-3-phosphate dehydrogenase, is an essential glycolytic enzyme, making it an attractive target for rational drug design. Toward this goal, we have optimized and validated a high-throughput spectrophotometric assay for GAPDHS in 384-well format. The assay was stable over time and tolerant to DMSO. Whole-plate validation experiments yielded Z′ values >0.8, indicating a robust assay for HTS. Two compounds were identified and confirmed from a test screen of the Prestwick collection. This assay was then used to screen a diverse chemical library and identified fourteen small molecules that modulated the activity of recombinant purified GAPDHS, with confirmed IC50 values ranging from 1.8 to 42 µM. These compounds may provide useful scaffolds as molecular tools to probe the role of GAPDHS in sperm motility and, in the long term, to develop potent and selective GAPDHS inhibitors leading to novel contraceptive agents.
Data Enrichment for Data Mining Applied to Bioinformatics and Cheminformatics Domains
Increasingly complex problems are being addressed in the life sciences. Acquiring all the data that may be related to the problem in question is paramount; equally important is knowing how the data relate to each other and to the problem itself. At the same time, large amounts of data and information are available on the Web. Researchers already use Data Mining and Machine Learning as valuable tools in their research, although the usual procedure is to look for information based on induction models.
So far, despite the great successes already achieved with Data Mining and Machine Learning, it is not easy to integrate this vast amount of available information into the inductive process with propositional algorithms. Our main motivation is to address the problem of integrating domain information into the inductive process of propositional Data Mining and Machine Learning techniques by enriching the training data to be used in inductive logic programming (ILP) systems.
Propositional machine learning algorithms depend heavily on data attributes. It is still hard to identify which attributes are most suitable for a particular research task, and it is also hard to extract relevant information from the enormous quantity of data available. We concentrate the available data and derive features that ILP algorithms can use to induce descriptions that solve the problems.
We are creating a web platform to obtain information relevant to Bioinformatics (particularly Genomics) and Cheminformatics problems. It fetches data from public repositories of genomic, protein and chemical data. After data enrichment, Prolog systems use inductive logic programming to induce rules and solve specific Bioinformatics and Cheminformatics case studies. To assess the impact of data enrichment with ILP, we compare against the results obtained by solving the same cases with propositional algorithms.
Regulation of Human Hsp70 by its Nucleotide Exchange Factors (NEFs).
Heat shock protein 70 (Hsp70) is an abundant and ubiquitous molecular chaperone that is responsible for maintenance of the human proteome. Accordingly, Hsp70 has become an attractive drug target for neurodegenerative and hyperproliferative disorders; however, it is difficult to imagine strategies for inhibiting its pathobiology without impacting its essential roles. Fortunately, Hsp70 does not work alone, and instead employs a large network of co-chaperone proteins, which can tune Hsp70 activity and influence disease state. These co-chaperone proteins provide potential handles for targeting Hsp70 without disrupting overall proteostasis.
One such class of co-chaperone proteins, the nucleotide exchange factors (NEFs), is a particularly appealing target. NEFs bind Hsp70 and help to facilitate the exchange of ADP for ATP. The biochemistry of the NEF family of co-chaperones has classically been investigated using the prokaryotic NEF, GrpE, as a model. However, the eukaryotic cytosol does not contain a GrpE homolog. Rather, there are three main sub-classes of human NEFs: Hsp110, HspBP1, and the BAG proteins, all of which are structurally distinct with little sequence homology. Consistent with their diverse structures, they also differ in their mode of binding to Hsp70 and their roles in guiding Hsp70 biology. For example, BAG2 is associated with proteasomal degradation of the Hsp70 substrate tau, while BAG1-Hsp70 is linked to increased tau stability. These observations suggest that the formation of specific NEF-Hsp70 complexes may help decide the fate of Hsp70-bound substrates. Additionally, these findings illustrate that differential disruption of specific Hsp70-NEF contacts might be beneficial in disease.
In this thesis work I have systematically characterized the human Hsp70 NEFs, including how they interact with Hsp70, how they influence Hsp70 biochemistry and how they can bridge Hsp70 with other classes of chaperone proteins. I have used high-throughput screening methods to search for chemical matter that can modulate Hsp70-NEF interactions, and have shown that inhibitors of Hsp70-NEF interactions can be beneficial for treating disease. This thesis work has significantly advanced our knowledge of human Hsp70 regulation, and has provided groundwork for future studies on other Hsp70 co-chaperones and proteostasis components.
PhD, Biological Chemistry, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/111611/1/rauchjn_1.pd
High performance computational virtual screening tools: development and application to the discovery of kinase inhibitors
Ph.D. (Doctor of Philosophy)
Database development and machine learning prediction of pharmaceutical agents
Ph.D. (Doctor of Philosophy)
Crystallographic fragment screening - improvement of workflow, tools and procedures, and application for the development of enzyme and protein-protein interaction modulators
One of the great societal challenges of today is the fight against diseases which reduce
life expectancy and lead to high economic losses. Both the understanding and the
addressing of these diseases need research activities at all levels. One aspect of this is
the discovery and development of tool compounds and drugs. Tool compounds support
disease research and the development of drugs. For about 20 years, the discovery of new
compounds has been attempted by screening small organic molecules by high-throughput
methods. More recently, X-ray crystallography has emerged as the most promising method
to conduct such screening. Crystallographic fragment-screening (CFS) generates binding
information as well as 3D-structural information of the target protein in complex with the
bound fragment. This doctoral research project is focused primarily on the optimization of
the crystallographic fragment screening workflow. The requirements for more successful
screening campaigns were investigated with respect to the crystal system studied, the
fragment libraries, the handling of the crystalline samples, and the handling of the
data associated with a screening campaign. The improved CFS workflow was presented
as a detailed protocol and as an accompanying video to train future CFS users in a
streamlined and accessible way. Together, these improvements make CFS campaigns a
higher-throughput method, offering the ability to screen larger fragment libraries and
allowing higher numbers of campaigns performed per year. The protein targets throughout
the project were two enzymes and a spliceosomal protein-protein complex. The enzymes
comprised the aspartic protease Endothiapepsin and the SARS-CoV-2 main protease. The
protein-protein complex was the RNaseH-like domain of Prp8, a vital structural protein in
the spliceosome, together with its nuclear shuttling factor Aar2. By performing the CFS
campaigns against disease-relevant targets, the resulting fragment hits could be used
directly to develop tool compounds or drugs. The first steps of optimization of fragment
hits into higher affinity binders were also investigated for improvements. In summary, a
plethora of novel starting points for tool compound and drug development was identified
Automated analysis and validation of open chemical data
Methods to automatically extract Open Data from the chemical literature,
validate it, and use it to validate theory are examined.
Chemical identifiers which assist the automatic location of chemical structures
using commercial Web search engines are investigated. The IUPAC
International Chemical Identifier (InChI) gives almost 100% recall and precision,
though it is shown to be too long for present search engines. A combination
of InChI and InChIKey, a shorter, fixed-length hash of the InChI
string, is concluded to be the best current method of identifying structures.
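The InChIKey referred to above is a fixed-length, 27-character form: a 14-letter block hashing the molecular skeleton, a 10-letter block hashing the remaining InChI layers (its last two letters encode the kind of InChI and its version), and a single protonation character, separated by hyphens. A minimal syntactic check (illustrative only; the example key is the widely quoted standard InChIKey of aspirin):

```python
import re

# 14-char skeleton hash, 10-char hash of the remaining layers
# (ending in standard/version flags), 1 protonation char.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")

def looks_like_inchikey(s: str) -> bool:
    """Cheap format check; does not verify the hash itself."""
    return bool(INCHIKEY_RE.fullmatch(s))

print(looks_like_inchikey("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))  # True
print(looks_like_inchikey("C9H8O4"))                       # False
```

Because the blocks are fixed-length and drawn from a small alphabet, InChIKeys behave well as search-engine tokens, which is the property exploited in the work described here.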
The proportion of published, Open Crystallographic Information Files
(CIFs) that are valid with respect to the specification is shown to be improving,
and is around 99% in 2007. The error rate in the conversion of valid
CIFs to Chemical Markup Language (CML) is less than 0.2%. The machine
generation of connection tables from CIFs requires many heuristics, and in
some cases it is impossible to deduce the exact connection table.
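A common heuristic for deriving a connection table from crystallographic coordinates is to bond two atoms whenever their distance falls below the sum of their covalent radii plus a tolerance. A minimal sketch of that idea (the radii and 0.4 Å tolerance are typical illustrative values, not the thesis's exact parameters; real CIF data add complications such as symmetry, disorder and bond orders, which is why the exact table cannot always be recovered):

```python
import math

# Covalent radii in Å (small illustrative subset of commonly used values)
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}
TOLERANCE = 0.4  # Å; a common fudge factor for bond perception

def perceive_bonds(atoms):
    """atoms: list of (element, x, y, z) tuples. Returns index pairs
    whose distance is within the covalent-radius sum plus tolerance."""
    bonds = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            ei, pi = atoms[i][0], atoms[i][1:]
            ej, pj = atoms[j][0], atoms[j][1:]
            if math.dist(pi, pj) <= COVALENT_RADII[ei] + COVALENT_RADII[ej] + TOLERANCE:
                bonds.append((i, j))
    return bonds

# A C and O 1.13 Å apart are bonded; an O 3.0 Å away is not.
atoms = [("C", 0.0, 0.0, 0.0), ("O", 1.13, 0.0, 0.0), ("O", 3.0, 0.0, 0.0)]
print(perceive_bonds(atoms))  # [(0, 1)]
```

The tolerance trades false bonds against missed ones, which illustrates why such conversions are heuristic rather than exact.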
CrystalEye, a fully-automated system for the reformulation of the fragmented
crystallographic Web into a structured XML-based repository is described.
Published, Open CIFs can be located and aggregated programmatically
with almost 100% recall. It is shown that, by converting CIF data
to CML, software can be created to use the latest Web standards and technologies
to enhance the ability of Web users to browse, find, keep updated,
download and reuse the latest published crystallography.
A workflow for the high-throughput calculation of solid-state geometry
using a semi-empirical method is described. A wide range of organic and
inorganic systems provided by CrystalEye are used to test both the data and
the method. Several errors in the method are discovered, many of which can
be attributed to the parameterization process.
An Open NMR experiment to perform high-throughput prediction of 13C
chemical shifts using a GIAO protocol is described. The data and analysis
were provided on publicly-available webpages to enable crowdsourcing, which
assisted in discovering an error rate of 6.1% in the starting data. The protocol
was refined during the work and shown to have an average unsigned error
of 2.24 ppm for 13C nuclei of small, rigid molecules, comparable to the errors
observed elsewhere for general structures using HOSE and neural network
methods.
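The average unsigned error reported above is the mean absolute deviation between predicted and experimental shifts. A minimal sketch (the shift values below are made up for illustration, not taken from the study):

```python
def mean_unsigned_error(predicted, experimental):
    """Mean absolute difference between paired chemical shifts (ppm)."""
    assert len(predicted) == len(experimental)
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / len(predicted)

# Hypothetical 13C shifts (ppm) for three nuclei:
pred = [128.1, 21.4, 170.0]
expt = [127.0, 20.0, 172.5]
print(round(mean_unsigned_error(pred, expt), 2))  # 1.67
```

Being unsigned, the metric does not let over- and under-predictions cancel, which makes it a stricter summary than the mean signed error.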
Encoding, Storing and Searching of Analytical Properties and Assigned Metabolite Structures
Information about metabolites and other small organic molecules is of crucial importance in many areas of the natural sciences. They play a decisive role in metabolic networks, for example, and knowledge of their properties helps in understanding complex biological processes and complete biological systems. Since data describing these molecules are produced daily in biological and chemical laboratories, a comprehensive and continuously growing body of data exists. To allow scientists to process, exchange, archive and search this information while preserving its semantic relationships, complex software systems and data formats are needed. The goal of this project was to develop applications and algorithms that can be used for the efficient encoding, collection, normalization and analysis of molecular data. These are intended to support scientists in structure elucidation, dereplication, the analysis of molecular interactions and the publication of the knowledge so gained. Since directly describing the structure and function of an unknown compound is very difficult and laborious, this is mainly achieved indirectly, with the help of descriptive properties, which are then used to predict structural and functional characteristics. In this context, program modules were developed that allow the visualization of structural and spectroscopic data, the structured display and editing of metadata and properties, and the import and export of various data formats. These were extended with methods that make it possible to analyse the information obtained further and to assign structural and spectroscopic data to one another.
In addition, a system was developed for the structured archiving and management of large quantities of molecular data and spectroscopic information, preserving their semantic relationships, both in the file system and in databases. To guarantee lossless storage, an open and standardized data format (CMLSpect) was defined. It extends the existing CML (Chemical Markup Language) vocabulary and thus allows simple handling of linked structural and spectroscopic data. The applications developed were integrated into the Bioclipse system for bio- and cheminformatics, offering the user a high-quality interface and the developer an easily extensible, modular program architecture.