    Combining evolutionary algorithms with reaction rules towards focused molecular design

    Designing novel small molecules with desirable properties and feasible synthesis continues to pose a significant challenge in drug discovery, particularly in the realm of natural products. Reaction-based gradient-free methods are promising approaches for designing new molecules because they ensure synthetic feasibility and provide potential synthesis paths. However, the novelty and diversity of the generated molecules depend heavily on the availability of comprehensive reaction templates. To address this challenge, we introduce ReactEA, a new open-source evolutionary framework for computer-aided drug discovery that relies solely on biochemical reaction rules. ReactEA optimizes molecular properties using a comprehensive set of 22,949 reaction rules, ensuring chemical validity and synthetic feasibility. ReactEA is versatile: it can optimize virtually any objective function and tracks potential synthetic routes during the optimization process. To demonstrate its effectiveness, we apply ReactEA to various case studies, including the design of novel drug-like molecules and the optimization of pre-existing ligands. The results show that ReactEA consistently generates novel molecules with improved properties and reasonable synthetic routes, even for complex tasks such as improving binding affinity against the PARP1 enzyme relative to existing inhibitors.

    Funding: Centre of Biological Engineering (CEB, University of Minho), for financial and equipment support; Portuguese Foundation for Science and Technology (FCT), under the scope of the strategic funding of unit UIDB/04469/2020 and through a Ph.D. scholarship awarded to João Correia (SFRH/BD/144314/2019); European Commission, through the project SHIKIFACTORY100 - Modular cell factories for the production of 100 compounds from the shikimate pathway (Reference 814408).
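    As a rough illustration of the idea (not ReactEA's actual code or API), the loop below treats each mutation as the application of a reaction rule and records the names of the rules applied, so every individual carries a candidate synthetic route back to its seed. Molecules are plain strings, and the three "rules" and the scoring function are hypothetical stand-ins; a real implementation would apply reaction templates to molecular graphs with a cheminformatics toolkit.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical rule type: returns a product, or None if the rule does not apply.
Rule = Callable[[str], Optional[str]]
# An individual is (molecule, tuple of rule names applied since its seed).
Individual = Tuple[str, Tuple[str, ...]]


def mutate(ind: Individual, rules: Dict[str, Rule], rng: random.Random) -> Individual:
    """Try rules in random order; apply the first applicable one and extend the route."""
    mol, route = ind
    names = list(rules)
    rng.shuffle(names)
    for name in names:
        product = rules[name](mol)
        if product is not None:
            return product, route + (name,)
    return ind  # no rule applies: the parent is returned unchanged


def evolve(seeds: List[str], rules: Dict[str, Rule], score: Callable[[str], float],
           generations: int = 30, pop_size: int = 10, seed: int = 0) -> List[Individual]:
    """(mu + lambda)-style loop: parents compete with offspring; the best pop_size survive."""
    rng = random.Random(seed)
    population: List[Individual] = [(s, ()) for s in seeds]
    for _ in range(generations):
        offspring = [mutate(rng.choice(population), rules, rng) for _ in range(pop_size)]
        # Keep one entry per molecule before ranking by the objective.
        unique = {mol: (mol, route) for mol, route in population + offspring}
        population = sorted(unique.values(), key=lambda ind: score(ind[0]), reverse=True)[:pop_size]
    return population


# Toy usage: hypothetical string "rules" and a toy objective (not chemistry).
rules: Dict[str, Rule] = {
    "add_C": lambda m: m + "C",
    "add_O": lambda m: m + "O",
    "drop_last": lambda m: m[:-1] if len(m) > 1 else None,
}


def score(m: str) -> float:
    # Reward "O" characters, penalize strings longer than 6 characters.
    return m.count("O") - 0.5 * max(0, len(m) - 6)


best_mol, best_route = evolve(["C"], rules, score)[0]
print(best_mol, best_route)
```

Because offspring are only ever produced by rules that applied successfully, replaying an individual's route from its seed reproduces the molecule, which is what makes reaction-based mutation convenient for tracking synthesis paths.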

    Quantitative predictions of peptide binding to any HLA-DR molecule of known sequence: NetMHCIIpan

    CD4-positive T helper cells control many aspects of specific immunity. These cells are specific for peptides derived from protein antigens and presented by molecules of the extremely polymorphic major histocompatibility complex (MHC) class II system. The identification of peptides that bind to MHC class II molecules is therefore of pivotal importance for the rational discovery of immune epitopes. HLA-DR is a prominent example of a human MHC class II molecule. Here, we present a method, NetMHCIIpan, that allows pan-specific predictions of peptide binding to any HLA-DR molecule of known sequence. The method is derived from a large compilation of quantitative HLA-DR binding events covering 14 of the more than 500 known HLA-DR alleles. By taking both peptide and HLA sequence information into account, the method can generalize and predict peptide binding also for HLA-DR molecules for which experimental data are absent. Validation of the method includes identification of endogenously derived HLA class II ligands, cross-validation, leave-one-molecule-out validation, and binding motif identification for hitherto uncharacterized HLA-DR molecules. The validation shows that the method can successfully predict binding for HLA-DR molecules, even in the absence of specific data for the particular molecule in question. Moreover, when compared to TEPITOPE, currently the only other publicly available prediction method aiming at broad HLA-DR allelic coverage, NetMHCIIpan performs equivalently for alleles included in the training of TEPITOPE while outperforming TEPITOPE on novel alleles. We propose that the method can be used to identify those hitherto uncharacterized alleles that should be addressed experimentally in future updates of the method, so as to cover the polymorphism of HLA-DR most efficiently.
We thus conclude that the presented method meets the challenge of keeping up with the MHC polymorphism discovery rate and that it can be used to sample the MHC "space," enabling a highly efficient iterative process for improving MHC class II binding predictions.
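    The leave-one-molecule-out validation mentioned above can be sketched as a split generator: each HLA-DR molecule in turn is withheld entirely from training, so a pan-specific model must predict its binding from sequence information learned on the other molecules. This is a minimal sketch; the record layout and the placeholder allele names, peptide names, and binding values are hypothetical, and model training itself is omitted.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical record layout: (HLA-DR molecule, peptide, measured binding value).
Record = Tuple[str, str, float]


def leave_one_molecule_out(records: List[Record]) -> Iterator[Tuple[str, List[Record], List[Record]]]:
    """Yield (held-out molecule, training records, test records) for each molecule."""
    by_molecule: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        by_molecule[rec[0]].append(rec)
    for held_out in sorted(by_molecule):
        train = [rec for rec in records if rec[0] != held_out]
        yield held_out, train, by_molecule[held_out]


# Placeholder data: names and values are illustrative only, not real measurements.
data: List[Record] = [
    ("allele_A", "PEPTIDE_1", 0.81),
    ("allele_A", "PEPTIDE_2", 0.12),
    ("allele_B", "PEPTIDE_1", 0.45),
    ("allele_C", "PEPTIDE_3", 0.67),
    ("allele_C", "PEPTIDE_4", 0.09),
]
for molecule, train, test in leave_one_molecule_out(data):
    # A pan-specific model would be fitted on `train` (peptide + HLA sequence
    # features) and evaluated on `test`, whose molecule it has never seen.
    print(molecule, len(train), len(test))
```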

    On Approximating Four Covering and Packing Problems

    In this paper, we consider approximability issues of the following four problems: triangle packing, full sibling reconstruction, maximum profit coverage and 2-coverage. All of them are generalized or specialized versions of set cover and have applications in biology ranging from full-sibling reconstruction in wild populations to biomolecular clustering; however, as this paper shows, their approximability properties differ considerably. Our inapproximability constant for the triangle packing problem improves upon previous results; it is obtained by directly transforming Håstad's inapproximability gap for the problem of maximizing the number of satisfied equations in a set of equations over GF(2), and this transformation is interesting in its own right. Our approximability results on the full sibling reconstruction problems answer questions originally posed by Berger-Wolf et al., and our results on the maximum profit coverage problem provide almost matching upper and lower bounds on the approximation ratio, answering a question posed by Hassin and Or.
    Comment: 25 pages
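    For context on the coverage problems, the sketch below is the classic greedy algorithm for unweighted maximum coverage (choose k sets to maximize the number of covered elements), which is known to achieve a 1 - 1/e approximation ratio. It is not an algorithm from this paper, and it does not handle the profits and costs of maximum profit coverage or the 2-coverage variant; the set names and elements in the example are arbitrary.

```python
from typing import Dict, Hashable, List, Set


def greedy_max_coverage(sets: Dict[str, Set[Hashable]], k: int) -> List[str]:
    """Repeatedly pick the set that covers the most not-yet-covered elements."""
    chosen: List[str] = []
    covered: Set[Hashable] = set()
    for _ in range(min(k, len(sets))):
        # Ties are broken by dictionary order (first maximal set wins).
        best = max((name for name in sets if name not in chosen),
                   key=lambda name: len(sets[name] - covered))
        if not sets[best] - covered:
            break  # no remaining set adds new elements
        chosen.append(best)
        covered |= sets[best]
    return chosen


example = {"S1": {1, 2, 3}, "S2": {3, 4}, "S3": {4, 5, 6}, "S4": {1, 6}}
print(greedy_max_coverage(example, 2))  # → ['S1', 'S3']
```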

    Exploring Molecular Diversity: There is Plenty of Room at Markush's

    The early drug discovery strategy is commonly based on a hit-to-lead process, which involves extensive research on the synthesis of derivatives of an original molecule that has previously shown biological activity against a specific biological target. This process therefore implies the synthesis of many analogs describing a chemical sub-library, which generally means that the study remains highly focused on the chemical space around the hit compound. However, when this drug is finally patented, a much wider chemical space is described through a Markush structure, on the premise that some analogs within it may also present biological activity. Nevertheless, a claim involving a Markush structure does not imply the proven synthesis of the entire chemical library, only of a small sample of it. We hypothesize that a large part of the chemical space of these libraries is unexplored and may hide potential lead candidates that could even surpass the activity of the original hit. Through this project, we propose an alternative: a rational selection of a small sample of molecules, founded on similarity-based clustering, can represent the stated chemical space more significantly, offering the possibility to explore unknown regions that could hide more biological potential. After a review of the drugs approved by the FDA from 2008 to 2020 and of the ChEMBL database of bioactive molecules, the resulting wide chemical space of small molecules with drug-like properties was explored in order to define accessible regions that might hide biological activity.
The results obtained from seven real case studies have demonstrated that both randomly and rationally selected molecules represent the combinatorial libraries stated in the patents more significantly than the molecules reported to date. Furthermore, two practical studies implementing the suggested methodology were carried out to better describe the chemical space of the antimalarial drug Tafenoquine and of Dacomitinib, a second-generation tyrosine kinase inhibitor for the treatment of non-small-cell lung cancer. The exploration of the chemical space of these two families led to the rational synthesis of seven antimalarial analogs and eight kinase inhibitors, which showed interesting inhibitory activities. These results show that applying cheminformatics to library selection may improve the ability to inspect chemical datasets in order to identify new potential hits and to represent large libraries for further reprofiling purposes.
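    A similarity-based selection of the kind described above can be sketched with Tanimoto similarity on binary fingerprints and a Butina-style (sphere-exclusion) clustering that keeps one representative per cluster. The fingerprints below are hypothetical sets of on-bit indices and the 0.5 threshold is an arbitrary assumption; in practice one would compute real fingerprints (for example circular fingerprints) with a cheminformatics toolkit, and the abstract does not specify the authors' exact clustering protocol.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def butina_like_pick(fps: Dict[str, FrozenSet[int]], threshold: float) -> List[str]:
    """Return one representative per cluster (non-reordering, Butina-style sphere exclusion)."""
    names = list(fps)
    neighbours = {
        n: {m for m in names if m != n and tanimoto(fps[n], fps[m]) >= threshold}
        for n in names
    }
    unassigned = set(names)
    representatives: List[str] = []
    # Molecules with the most neighbours become cluster centres first; ties broken by name.
    for n in sorted(names, key=lambda x: (-len(neighbours[x]), x)):
        if n not in unassigned:
            continue
        representatives.append(n)
        unassigned -= neighbours[n] | {n}  # the centre absorbs its unassigned neighbours
    return representatives


# Hypothetical fingerprints for five library members.
fps = {
    "A": frozenset({1, 2, 3, 4}),
    "B": frozenset({1, 2, 3, 5}),
    "C": frozenset({10, 11, 12}),
    "D": frozenset({10, 11, 13}),
    "E": frozenset({20}),
}
print(butina_like_pick(fps, 0.5))  # → ['A', 'C', 'E']
```

Picking cluster centres rather than random members spreads the small sample across the library, which is the intuition behind using clustering to represent a Markush space.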

    Capturing One of the Human Gut Microbiome's Most Wanted: Reconstructing the Genome of a Novel Butyrate-Producing, Clostridial Scavenger from Metagenomic Sequence Data

    The role of the microbiome in health and disease is attracting great attention, yet we still know little about some of the most prevalent microorganisms inside our bodies. Several years ago, Human Microbiome Project (HMP) researchers generated a list of “most wanted” taxa: bacteria that are both prevalent among healthy volunteers and distantly related to any sequenced organism. Unfortunately, the challenge of assembling high-quality genomes from a tangle of metagenomic reads has slowed progress in learning about these uncultured bacteria. Here, we describe how recent advances in sequencing and analysis allowed us to assemble “most wanted” genomes from metagenomic data collected from four stool samples. Using a combination of de novo and guided assembly methods, we assembled and binned over 100 genomes from an initial data set of over 1,300 Gbp. One of these genome bins, which met HMP’s criteria for a “most wanted” taxon, contained three essentially complete genomes belonging to a previously uncultivated species. This species is most closely related to Eubacterium desmolans and to the clostridial cluster IV/Clostridium leptum subgroup species Butyricicoccus pullicaecorum (71–76% average nucleotide identity). Gene function analysis indicates that the species is an obligate anaerobe, forms spores, and produces the anti-inflammatory short-chain fatty acids acetate and butyrate. It also appears to take up metabolically costly molecules such as cobalamin, methionine, and branched-chain amino acids from the environment, and to lack virulence genes. Thus, the evidence is consistent with a secondary degrader that occupies a host-dependent, nutrient-scavenging niche within the gut; its ability to produce butyrate, which is thought to play an anti-inflammatory role, makes it intriguing for the study of diseases such as colon cancer and inflammatory bowel disease.
In conclusion, we have assembled essentially complete genomes from stool metagenomic data, yielding valuable information about uncultured organisms’ metabolic and ecological niches, factors that may be required to successfully culture these bacteria, and their role in maintaining health and causing disease.
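    Average nucleotide identity (ANI), used above to place the new species relative to its closest relatives, is essentially the mean percent identity over aligned genome fragments that pass quality filters. The sketch below assumes the fragment alignments have already been produced by some aligner (out of scope here) and uses illustrative, hypothetical filter thresholds; published ANI tools differ in how they fragment genomes and filter alignments.

```python
from statistics import mean
from typing import List, Optional, Tuple


def fragment_identity(query: str, subject: str) -> Tuple[float, float]:
    """Return (percent identity over gap-free columns, fraction of columns that are gap-free)."""
    if len(query) != len(subject):
        raise ValueError("aligned fragments must have equal length")
    columns = [(q, s) for q, s in zip(query, subject) if q != "-" and s != "-"]
    if not columns:
        return 0.0, 0.0
    matches = sum(q == s for q, s in columns)
    return 100.0 * matches / len(columns), len(columns) / len(query)


def average_nucleotide_identity(alignments: List[Tuple[str, str]],
                                min_identity: float = 30.0,    # hypothetical default
                                min_coverage: float = 0.7,     # hypothetical default
                                ) -> Optional[float]:
    """Mean identity over fragments whose alignment passes both filters (None if none pass)."""
    kept = []
    for query, subject in alignments:
        identity, coverage = fragment_identity(query, subject)
        if identity >= min_identity and coverage >= min_coverage:
            kept.append(identity)
    return mean(kept) if kept else None


# Toy aligned fragment pairs ('-' marks a gap); the third fails the coverage filter.
pairs = [
    ("ACGTACGTAC", "ACGTACGTAA"),
    ("ACGT-CGTAC", "ACGTTCGAAC"),
    ("A---------", "AAAAAAAAAA"),
]
print(average_nucleotide_identity(pairs))
```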

    An Application in Bioinformatics: A Comparison of Affymetrix and Compugen Human Genome Microarrays

    The human genome microarrays from Compugen® and Affymetrix® were compared in the context of the emerging field of computational biology. The two premier database servers for genomic sequence data, the National Center for Biotechnology Information and the European Bioinformatics Institute, were described in detail, together with the various databases and data mining tools available through them. Microarrays were examined from a historical perspective, and their main current applications (expression analysis, mutation analysis, and comparative genomic hybridization) were discussed. The two main types of microarrays, cDNA spotted microarrays and high-density spotted microarrays, were analyzed by exploring the human genome microarray from Compugen® and the HG-U133 Set from Affymetrix®, respectively. Array design issues, sequence collection and analysis, and probe selection processes for the two representative types of arrays were described, and the chip design of each type was analyzed. It was found that the human genome microarray from Compugen® contains probes that interrogate 1,119,840 bases corresponding to 18,664 genes, while the HG-U133 Set from Affymetrix® contains probes that interrogate only 825,000 bases corresponding to 33,000 genes. On this basis, the efficiency of the 25-mer probes of the Affymetrix® HG-U133 Set compared to the 60-mer probes of the Compugen® microarray was questioned.
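    The figures quoted above can be checked with simple arithmetic: dividing interrogated bases by gene count gives exactly the probe length of each platform, so the stated totals amount to one probe length of sequence per gene. The helper name below is illustrative, and the inputs are the figures as reported in the abstract.

```python
def bases_per_gene(bases: int, genes: int) -> float:
    """Interrogated bases divided by the number of genes."""
    return bases / genes


platforms = {
    "Compugen human genome microarray (60-mer probes)": (1_119_840, 18_664),
    "Affymetrix HG-U133 Set (25-mer probes)": (825_000, 33_000),
}
for name, (bases, genes) in platforms.items():
    print(f"{name}: {bases_per_gene(bases, genes):g} bases per gene")
# → 60 for Compugen and 25 for the HG-U133 Set
```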

    Interpretation of QSAR Models: Mining Structural Patterns Taking into Account Molecular Context

    The study focused on QSAR model interpretation. The goal was to develop a workflow for identifying molecular fragments, in different contexts, that are important for the modelled property. Using a previously established approach, Structural and physicochemical interpretation of QSAR models (SPCI), fragment contributions were calculated and their relative influence on the compounds' properties characterised. The distributions of these contributions were analysed with Gaussian mixture modelling to identify groups of compounds (clusters) that contain the same fragment but in which that fragment contributes substantially differently to the studied property. SMARTSminer was used to detect patterns discriminating these groups of compounds from each other, with visual inspection used when SMARTSminer did not help. The approach was applied to analyse the toxicity, in terms of 40-hour growth inhibition, of 1984 compounds to Tetrahymena pyriformis. The results showed that the clustering technique correctly identified known toxicophoric patterns: it detected groups of compounds in which fragments have a specific molecular context that makes them contribute substantially more to toxicity. The results demonstrate that QSAR model interpretation can retrieve reasonable patterns, even from data sets consisting of compounds with different mechanisms of action, something that is difficult to achieve using conventional pattern/data mining approaches.
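    The Gaussian mixture step can be illustrated with a small expectation-maximization fit on one-dimensional data, such as the contributions of a single fragment across many compounds; a clearly bimodal fit then suggests two molecular contexts in which the fragment behaves differently. This is a minimal sketch: the number of components is fixed at two by assumption (the study may have chosen it differently), the contribution values are invented for illustration, and a library implementation would normally be used in practice.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(xs: Sequence[float], k: int = 2, iters: int = 200,
               min_var: float = 1e-6) -> Tuple[List[float], List[float], List[float]]:
    """Fit a k-component 1-D Gaussian mixture by expectation-maximization."""
    n = len(xs)
    ordered = sorted(xs)
    # Initialise means at evenly spaced quantiles, a shared variance, and equal weights.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    mu0 = sum(xs) / n
    variances = [max(sum((x - mu0) ** 2 for x in xs) / n, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pj / total if total > 0 else 1.0 / k for pj in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component lost all points; keep its previous parameters
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, min_var)
    return weights, means, variances


def assign_component(x: float, weights: List[float], means: List[float], variances: List[float]) -> int:
    """Index of the component with the highest weighted density at x."""
    scores = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    return max(range(len(scores)), key=scores.__getitem__)


# Invented contributions of one fragment across ten compounds.
contributions = [-1.0, -0.9, -1.1, -1.05, -0.95, 2.0, 2.1, 1.9, 2.05, 1.95]
weights, means, variances = fit_gmm_1d(contributions)
print([round(m, 2) for m in means])
```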