    Exploring Molecular Diversity: There is Plenty of Room at Markush's

    Early-stage drug discovery is commonly based on a hit-to-lead process, which involves extensive research on the synthesis of derivatives of an original molecule (the hit) that has previously shown biological activity against a specific biological target. This process therefore entails the synthesis of many analogs that together describe a chemical sub-library, so the study generally stays highly focused on the chemical space near the hit compound. However, when the drug is finally patented, a much wider chemical space is described through a Markush structure, on the premise that some of the analogs it covers may also be biologically active. Such a Markush claim does not imply that the whole chemical library has been synthesized, only a small sample of it. We hypothesize that a large part of the chemical space of these libraries remains unexplored and may hide potential lead candidates whose activity could even surpass that of the original hit. In this project, we propose an alternative: a rational selection of a small sample of molecules, based on similarity-based clustering, can represent the claimed chemical space more faithfully and open up unexplored regions that could hide further biological activity. After reviewing the drugs approved by the FDA from 2008 to 2020 and the ChEMBL database of bioactive molecules, we explored the resulting broad chemical space of small molecules with drug-like properties in order to define accessible regions that might hide biological activity.
The results from seven real case studies show that both randomly and rationally selected molecules represent the combinatorial libraries claimed in the patents more faithfully than the molecules reported to date. Furthermore, two practical studies applied the proposed methodology to better describe the chemical space of the antimalarial drug Tafenoquine and of Dacomitinib, a second-generation tyrosine kinase inhibitor for the treatment of non-small-cell lung cancer. Exploring the chemical space of these two families led to the rational synthesis of seven antimalarial analogs and eight kinase inhibitors, which showed interesting inhibitory activities. These results show that applying cheminformatics to library selection may improve the inspection of chemical datasets, both to identify new potential hits and to represent large libraries for later reprofiling campaigns.
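
The selection idea described above can be illustrated with a small sketch. Everything concrete in it is an assumption: a hypothetical three-position Markush structure, toy feature sets standing in for real fingerprints, Taylor-Butina clustering as one possible similarity-based clustering method (the abstract does not name the algorithm), a 0.7 Tanimoto threshold, and a "coverage" score (mean best similarity of every library member to the selected subset) as a stand-in measure of how well a subset represents the library. It is not the thesis's actual workflow.

```python
import random
from itertools import product

# Hypothetical Markush structure: a fixed core plus three variable positions.
# Each substituent is described by toy feature labels that stand in for
# fingerprint bits; a real workflow would compute fingerprints from structures.
CORE = {"core:ring_a", "core:ring_b", "core:amide"}
R_GROUPS = {
    "R1": {"H": {"h"}, "Me": {"c1"}, "Et": {"c1", "c2"}, "Cl": {"hal", "cl"}},
    "R2": {"OMe": {"o", "c1"}, "OH": {"o", "hbd"}, "F": {"hal", "f"}},
    "R3": {"Ph": {"arom", "c6"}, "Py": {"arom", "n_arom"}, "cHex": {"c6", "ring"}},
}


def enumerate_markush(core, r_groups):
    """Expand every substituent combination into (name, fingerprint) pairs."""
    positions = sorted(r_groups)
    library = []
    for choice in product(*(sorted(r_groups[p]) for p in positions)):
        bits = set(core)
        for pos, sub in zip(positions, choice):
            # Prefix each bit with its position so R1:hal and R2:hal stay distinct.
            bits |= {f"{pos}:{bit}" for bit in r_groups[pos][sub]}
        library.append(("-".join(choice), frozenset(bits)))
    return library


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    return len(a & b) / len(a | b)


def butina_clusters(fps, threshold):
    """Taylor-Butina clustering; the first index of each cluster is its centroid."""
    n = len(fps)
    neighbours = [
        {j for j in range(n) if j != i and tanimoto(fps[i], fps[j]) >= threshold}
        for i in range(n)
    ]
    # Molecules with the most neighbours are considered first as centroids.
    order = sorted(range(n), key=lambda i: len(neighbours[i]), reverse=True)
    assigned, clusters = set(), []
    for i in order:
        if i in assigned:
            continue
        members = [i] + sorted(neighbours[i] - assigned)
        assigned.update(members)
        clusters.append(members)
    return clusters


def coverage(fps, selected):
    """Mean over the library of each member's best similarity to the selection."""
    return sum(max(tanimoto(fp, fps[s]) for s in selected) for fp in fps) / len(fps)


if __name__ == "__main__":
    library = enumerate_markush(CORE, R_GROUPS)
    fps = [fp for _, fp in library]
    rational = [cluster[0] for cluster in butina_clusters(fps, threshold=0.7)]
    rng = random.Random(0)
    randomly = rng.sample(range(len(fps)), len(rational))
    print(f"{len(library)} enumerated, {len(rational)} cluster centroids selected")
    print(f"rational coverage: {coverage(fps, rational):.3f}")
    print(f"random coverage:   {coverage(fps, randomly):.3f}")
```

Picking one centroid per cluster spreads the selection across the enumerated library by construction, while a random sample of the same size may happen to concentrate in one region; comparing the two coverage values is one simple way to quantify that difference.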

    Information retrieval and text mining technologies for chemistry

    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves extracting the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges that assess system performance, in particular the CHEMDNER and CHEMDNER-patents tasks of BioCreative IV and V, respectively. Given the growing interest in building automatically annotated chemical knowledge bases that integrate chemical information with biological data, we also present cheminformatics approaches for mapping the extracted chemical names to chemical structures and annotating them, together with text-mining applications that link chemistry with biological information. Finally, we highlight future trends and current challenges as a proposed roadmap for research in this emerging field.
A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by the Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of the UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.
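
Dictionary lookup is one of the simplest ways to recognise chemical names in text and, at the same time, map them to structures. The sketch below is a minimal, hypothetical illustration: the toy lexicon, the tokenisation regex and the greedy longest-match rule are assumptions for this example, not systems or resources evaluated in the Review.

```python
import re

# Toy lexicon of chemical names and SMILES, for illustration only; real systems
# combine large curated dictionaries with machine-learned recognisers.
LEXICON = {
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
    "acetylsalicylic acid": "CC(=O)Oc1ccccc1C(=O)O",
    "salicylic acid": "O=C(O)c1ccccc1O",
    "sodium": "[Na]",
    "sodium chloride": "[Na+].[Cl-]",
}

TOKEN = re.compile(r"\w+|[^\w\s]")


def normalise(name):
    """Lower-case and re-tokenise a name so lexicon keys match text spans."""
    return " ".join(TOKEN.findall(name.lower()))


def find_chemicals(text, lexicon=LEXICON):
    """Greedy longest-match dictionary tagging.

    Returns (start, end, surface_text, smiles) tuples with character offsets.
    """
    index = {normalise(name): smiles for name, smiles in lexicon.items()}
    max_len = max(len(key.split()) for key in index)
    tokens = [(m.start(), m.end(), m.group().lower()) for m in TOKEN.finditer(text)]
    hits, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first so "sodium chloride" beats "sodium".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tokens[i:i + n]
            key = " ".join(tok for _, _, tok in span)
            if key in index:
                start, end = span[0][0], span[-1][1]
                hits.append((start, end, text[start:end], index[key]))
                i += n
                break
        else:
            i += 1  # no lexicon entry starts at this token
    return hits


if __name__ == "__main__":
    sentence = ("Aspirin (acetylsalicylic acid) is hydrolysed to salicylic acid; "
                "sodium chloride was added.")
    for start, end, surface, smiles in find_chemicals(sentence):
        print(start, end, surface, smiles)
```

Returning character offsets alongside the structure keeps each mention traceable to its position in the document, which is what downstream annotation of a knowledge base would need.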

    Structure-based classification and ontology in chemistry

    Background: Recent years have seen an explosion in the availability of data in the chemistry domain. With this information explosion, however, retrieving relevant results from the available information, and organising those results, have become even harder problems. Computational processing is essential to filter and organise the available resources so as to better facilitate the work of scientists. Ontologies encode expert domain knowledge in a hierarchically organised, machine-processable format. One such ontology for the chemical domain is ChEBI. ChEBI provides both a classification of chemicals based on their structural features and a classification based on role or activity. An example of a structure-based class is 'pentacyclic compound' (compounds containing five rings), while an example of a role-based class is 'analgesic', since many different chemicals can act as analgesics without sharing structural features. Structure-based classification in chemistry exploits elegant regularities and symmetries in the underlying chemical domain. As yet, there has been neither a systematic analysis of the types of structural classification in use in chemistry nor a comparison with the capabilities of available technologies.
    Results: We analyze the different categories of structural classes in chemistry, presenting a list of patterns for features found in class definitions. We compare these patterns of class definition to tools that allow the automation of hierarchy construction within cheminformatics and within logic-based ontology technology, going into detail in the latter case with respect to the expressive capabilities of the Web Ontology Language and recent extensions for modelling structured objects. Finally, we discuss the relationships and interactions between cheminformatics approaches and logic-based approaches.
    Conclusion: Systems that perform intelligent reasoning tasks on chemistry data require a diverse set of underlying computational utilities, including algorithmic, statistical and logic-based tools. For the task of automatic structure-based classification of chemical entities, which is essential to managing the vast swathes of chemical data being brought online, systems capable of hybrid reasoning that combines several different approaches are crucial. We provide a thorough review of the available tools and methodologies, and identify areas of open research.
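
A structural class such as 'pentacyclic compound' can be decided from the molecular graph alone. As an illustration, and assuming molecules are given as hypothetical heavy-atom bond lists, the sketch below counts independent rings via the cyclomatic number (bonds minus atoms plus connected components) and maps the count to class labels modelled on the example above. The labels, function names and graph encoding are assumptions, not ChEBI's own definitions or tooling.

```python
# Hypothetical class labels echoing the ChEBI-style class names quoted above.
RING_CLASSES = {1: "monocyclic", 2: "bicyclic", 3: "tricyclic",
                4: "tetracyclic", 5: "pentacyclic"}


def ring_count(n_atoms, bonds):
    """Cyclomatic number of the molecular graph: bonds - atoms + components.

    For a simple graph this equals the size of the smallest set of smallest
    rings (SSSR), i.e. the number of independent rings.
    """
    parent = list(range(n_atoms))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in bonds:
        parent[find(a)] = find(b)
    components = len({find(i) for i in range(n_atoms)})
    return len(bonds) - n_atoms + components


def classify(n_atoms, bonds):
    """Assign one structure-based class from the ring count alone."""
    rings = ring_count(n_atoms, bonds)
    if rings == 0:
        return "acyclic compound"
    return RING_CLASSES.get(rings, "polycyclic") + " compound"


def acene_skeleton(k):
    """Carbon skeleton of k linearly fused six-membered rings (4k + 2 atoms)."""
    # Atoms 0..k are the upper ends of the shared bonds, k+1..2k+1 the lower
    # ends; then k upper and k lower apex atoms close each hexagon.
    top = list(range(k + 1))
    bottom = list(range(k + 1, 2 * k + 2))
    up = list(range(2 * k + 2, 3 * k + 2))
    down = list(range(3 * k + 2, 4 * k + 2))
    bonds = [(top[i], bottom[i]) for i in range(k + 1)]
    for i in range(k):
        bonds += [(top[i], up[i]), (up[i], top[i + 1]),
                  (bottom[i], down[i]), (down[i], bottom[i + 1])]
    return 4 * k + 2, bonds


if __name__ == "__main__":
    for k in (1, 2, 5):
        n_atoms, bonds = acene_skeleton(k)
        print(k, classify(n_atoms, bonds))
```

Counting rings this way does not distinguish fused ring systems from ring assemblies such as biphenyl, and real class definitions may impose further conditions that a single count cannot express.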

    Development and implementation of in silico molecule fragmentation algorithms for the cheminformatics analysis of natural product spaces

    Computational methodologies that extract specific substructures, such as functional groups or molecular scaffolds, from input molecules can be grouped under the term “in silico molecule fragmentation”. They can be used to investigate what specifically characterises a heterogeneous compound class, such as pharmaceuticals or Natural Products (NP), and in which aspects such classes are similar or dissimilar. The aim is to determine what specifically characterises NP structures in order to transfer patterns favourable for bioactivity to drug development. As part of this thesis, the first algorithmic approach to in silico deglycosylation, the removal of glycosidic moieties for the study of aglycones, was developed with the Sugar Removal Utility (SRU) (Publication A). The SRU has also proven useful for investigating NP glycoside space and was applied for this purpose to one of the largest open NP databases, COCONUT (COlleCtion of Open Natural prodUcTs) (Publication B). A contribution was made to the Chemistry Development Kit (CDK) by developing the open Scaffold Generator Java library (Publication C). Scaffold Generator can extract different scaffold types and dissect them into smaller parent scaffolds following the scaffold tree or scaffold network approach. Publication D describes the OngLai algorithm, the first automated method to identify homologous series in input datasets, group the member structures of each series, and extract their common core. To support the development of new fragmentation algorithms, the open Java rich-client graphical user interface application MORTAR (MOlecule fRagmenTAtion fRamework) was developed as part of this thesis (Publication E). MORTAR allows users to quickly import a structural dataset, apply a fragmentation algorithm, and visually inspect the results in different ways. All software developed as part of this thesis is freely and openly available (see https://github.com/JonasSchaub).
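
Scaffold extraction is a typical in silico fragmentation task. One widely used scaffold definition is the Bemis-Murcko framework: all ring systems plus the linkers that connect them, with side chains removed. The following pure-Python sketch computes it on a hypothetical atom/bond-list representation by repeatedly pruning terminal atoms. It is an illustration under those assumptions, not the API of Scaffold Generator, which is a Java library for the CDK.

```python
from collections import defaultdict


def murcko_framework(atoms, bonds):
    """Keep ring systems and the linkers between them; prune all side chains.

    Side chains are removed by repeatedly deleting atoms with at most one
    remaining neighbour. Bond orders are ignored, so exocyclic double-bonded
    atoms (e.g. a carbonyl O) are pruned as well.
    Returns (kept atom indices, kept bonds).
    """
    neighbours = defaultdict(set)
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    alive = set(range(len(atoms)))
    leaves = [i for i in alive if len(neighbours[i]) <= 1]
    while leaves:
        leaf = leaves.pop()
        if leaf not in alive:
            continue  # already pruned after being queued twice
        alive.remove(leaf)
        for nb in neighbours[leaf]:
            if nb in alive:
                neighbours[nb].discard(leaf)
                if len(neighbours[nb]) <= 1:
                    leaves.append(nb)  # neighbour has become terminal
    kept = sorted(alive)
    kept_bonds = [(a, b) for a, b in bonds if a in alive and b in alive]
    return kept, kept_bonds


if __name__ == "__main__":
    # Hypothetical example: diphenylmethane carrying one methyl substituent.
    ring = [(i, (i + 1) % 6) for i in range(6)]
    bonds = ring + [(7 + a, 7 + b) for a, b in ring] + [(0, 6), (6, 7), (8, 13)]
    kept, kept_bonds = murcko_framework(["C"] * 14, bonds)
    print(kept)  # atoms 0-12: both rings and the CH2 linker; methyl atom 13 removed
```

Pruning from the leaves inwards never touches ring atoms, because every ring atom keeps at least two neighbours inside its ring, and a linker atom between two rings keeps one neighbour on each side, so it survives as well.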