1,029 research outputs found

    ScaffoldGraph: an open-source library for the generation and analysis of molecular scaffold networks and scaffold trees

    SUMMARY: ScaffoldGraph (SG) is an open-source Python library and command-line tool for the generation and analysis of molecular scaffold networks and trees, with the capability of processing large sets of input molecules. With the increase in high-throughput screening (HTS) data, scaffold graphs have proven useful for the navigation and analysis of chemical space, being used for visualisation, clustering, scaffold-diversity analysis and active-series identification. Built on RDKit and NetworkX, SG integrates scaffold graph analysis into the growing scientific/cheminformatics Python stack, increasing the flexibility and extendibility of the tool compared to existing software. AVAILABILITY AND IMPLEMENTATION: SG is freely available and released under the MIT license at https://github.com/UCLCheminformatics/ScaffoldGraph

    MolMap - Visualizing Molecule Libraries as Topographic Maps

    We present a new application for graph drawing and visualization in the context of drug discovery. Combining the scaffold-based cluster hierarchy with molecular similarity graphs (both standard concepts in cheminformatics) allows one to get new insights for analyzing large molecule libraries. The derived clustered graphs represent different aspects of structural similarity. We suggest visualizing them as topographic maps. Since the cluster hierarchy does not reflect the underlying graph structure as in (Gronemann and Jünger, 2012), we suggest a new partitioning algorithm that takes the edges of the graph into account. Experiments show that the new algorithm leads to significant improvements in terms of the edge lengths in the obtained drawings.

    Structure-based classification and ontology in chemistry

    BACKGROUND: Recent years have seen an explosion in the availability of data in the chemistry domain. With this information explosion, however, retrieving relevant results from the available information, and organising those results, become even harder problems. Computational processing is essential to filter and organise the available resources so as to better facilitate the work of scientists. Ontologies encode expert domain knowledge in a hierarchically organised machine-processable format. One such ontology for the chemical domain is ChEBI. ChEBI provides a classification of chemicals based on their structural features and a role or activity-based classification. An example of a structure-based class is 'pentacyclic compound' (compounds containing five-ring structures), while an example of a role-based class is 'analgesic', since many different chemicals can act as analgesics without sharing structural features. Structure-based classification in chemistry exploits elegant regularities and symmetries in the underlying chemical domain. As yet, there has been neither a systematic analysis of the types of structural classification in use in chemistry nor a comparison to the capabilities of available technologies. RESULTS: We analyze the different categories of structural classes in chemistry, presenting a list of patterns for features found in class definitions. We compare these patterns of class definition to tools which allow for automation of hierarchy construction within cheminformatics and within logic-based ontology technology, going into detail in the latter case with respect to the expressive capabilities of the Web Ontology Language and recent extensions for modelling structured objects. Finally, we discuss the relationships and interactions between cheminformatics approaches and logic-based approaches. CONCLUSION: Systems that perform intelligent reasoning tasks on chemistry data require a diverse set of underlying computational utilities including algorithmic, statistical and logic-based tools. For the task of automatic structure-based classification of chemical entities, essential to managing the vast swathes of chemical data being brought online, systems which are capable of hybrid reasoning combining several different approaches are crucial. We provide a thorough review of the available tools and methodologies, and identify areas of open research.

    Systematic Identification of Scaffolds Representing Different Types of Structure-Activity Relationships

    In medicinal chemistry, it is of central importance to understand structure-activity relationships (SARs) of small bioactive compounds. Typically, SARs are analyzed on a case-by-case basis for sets of compounds active against a given target. However, the increasing amount of compound activity data that is becoming available allows SARs to be explored on a large scale. Moreover, molecular scaffolds derived from bioactive compounds are also of high interest for SAR analysis. In general, scaffolds are obtained by removing all substituents from rings and from linkers between rings. This thesis aims at systematically mining compounds for which activity annotations are available and investigating relationships between chemical structure and biological activities at the level of active compounds and, in particular, molecular scaffolds. To this end, data mining approaches are designed to identify scaffolds with different structural and/or activity characteristics. Initially, scaffold distributions in compounds at different stages of pharmaceutical development are analyzed. Sets of scaffolds that overlap between different stages or preferentially occur at certain stages are identified. Furthermore, a systematic selectivity profile analysis of public domain active compounds is carried out. Scaffolds that yield compounds selective for communities of closely related targets and represent compounds selective only for one particular target over others are identified. In addition, the degree of promiscuity of scaffolds is thoroughly examined. Eighty-three scaffolds covering 33 chemotypes correspond to compounds active against at least three different target families and thus are considered to be promiscuous. Moreover, by integrating pairwise scaffold similarity and compound potency differences, the propensity of scaffolds to form multi-target activity or selectivity cliffs and the global scaffold potential of individual targets are quantitatively assessed. Finally, structural relationships between scaffolds are systematically explored. Most scaffolds extracted from active compounds are found to be involved in substructure relationships and/or share topological features with others. These substructure relationships are also compared to, and combined with, hierarchical substructure relationships to facilitate activity prediction.
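
The promiscuity criterion stated in this abstract (a scaffold counts as promiscuous when its compounds are active against at least three different target families) can be illustrated with a minimal sketch. The record layout, the scaffold and family names, and the function `promiscuous_scaffolds` are hypothetical and chosen only for illustration; the thesis's actual data model and workflow are not described in the abstract.

```python
from collections import defaultdict


def promiscuous_scaffolds(records, min_families=3):
    """Return scaffolds seen with at least `min_families` distinct target families.

    `records` is an iterable of (scaffold_id, target_family) pairs, one pair
    per active compound annotation (hypothetical layout).
    """
    families_by_scaffold = defaultdict(set)
    for scaffold, family in records:
        # A set keeps each target family once, however many compounds hit it.
        families_by_scaffold[scaffold].add(family)
    return sorted(
        scaffold
        for scaffold, families in families_by_scaffold.items()
        if len(families) >= min_families
    )


# Hypothetical annotations, not data from the thesis.
records = [
    ("scaffold_A", "kinase"),
    ("scaffold_A", "GPCR"),
    ("scaffold_A", "protease"),
    ("scaffold_B", "kinase"),
    ("scaffold_B", "kinase"),
    ("scaffold_C", "GPCR"),
    ("scaffold_C", "ion channel"),
]

print(promiscuous_scaffolds(records))
```

Only `scaffold_A` meets the default threshold here, since the repeated kinase annotation of `scaffold_B` counts as a single family.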

    Computational Methods for Structure-Activity Relationship Analysis and Activity Prediction

    Structure-activity relationship (SAR) analysis of small bioactive compounds is a key task in medicinal chemistry. Traditionally, SARs were established on a case-by-case basis. However, with the arrival of high-throughput screening (HTS) and synthesis techniques, compound data have surged in size and structural heterogeneity, and the use of computational methods to analyse SARs has become imperative and valuable. In recent years, graphical methods have gained prominence for analysing SARs. The choice of molecular representation and the method of assessing similarities affect the outcome of the SAR analysis. Thus, alternative methods providing distinct points of view of SARs are required. In this thesis, a novel graphical representation utilizing the canonical scaffold-skeleton definition to explore meaningful global and local SAR patterns in compound data is introduced. Furthermore, efforts have been made to go beyond the descriptive SAR analysis offered by the graphical methods. SAR features inferred from descriptive methods are utilized for compound activity predictions. In this context, a data structure called SAR matrix (SARM), which is reminiscent of conventional R-group tables, is utilized. SARMs suggest many virtual compounds that represent as yet unexplored chemical space. These virtual compounds are candidates for further exploration but are too many to prioritize simply on the basis of visual inspection. Conceptually different approaches to enable systematic compound prediction and prioritization are introduced. Much emphasis is put on evolving the predictive ability for prospective compound design. Going beyond SAR analysis, the SARM method has also been adapted to navigate multi-target spaces, primarily for analysing compound promiscuity patterns. Thus, the original SARM methodology has been further developed for a variety of medicinal chemistry and chemogenomics applications.

    Development and implementation of in silico molecule fragmentation algorithms for the cheminformatics analysis of natural product spaces

    Computational methodologies extracting specific substructures like functional groups or molecular scaffolds from input molecules can be grouped under the term “in silico molecule fragmentation”. They can be used to investigate what specifically characterises a heterogeneous compound class, like pharmaceuticals or Natural Products (NP), and in which aspects such classes are similar or dissimilar. The aim is to determine what specifically characterises NP structures to transfer patterns favourable for bioactivity to drug development. As part of this thesis, the first algorithmic approach to in silico deglycosylation, the removal of glycosidic moieties for the study of aglycones, was developed with the Sugar Removal Utility (SRU) (Publication A). The SRU has also proven useful for investigating NP glycoside space. It was applied to one of the largest open NP databases, COCONUT (COlleCtion of Open Natural prodUcTs), for this purpose (Publication B). A contribution was made to the Chemistry Development Kit (CDK) by developing the open Scaffold Generator Java library (Publication C). Scaffold Generator can extract different scaffold types and dissect them into smaller parent scaffolds following the scaffold tree or scaffold network approach. Publication D describes the OngLai algorithm, the first automated method to identify homologous series in input datasets, group the member structures of each series, and extract their common core. To support the development of new fragmentation algorithms, the open Java rich client graphical user interface application MORTAR (MOlecule fRagmenTAtion fRamework) was developed as part of this thesis (Publication E). MORTAR allows users to quickly execute the steps of importing a structural dataset, applying a fragmentation algorithm, and visually inspecting the results in different ways. All software developed as part of this thesis is freely and openly available (see https://github.com/JonasSchaub).

    Design and modelling in the early phase of drug development for HIV-1 reverse transcriptase and malaria

    The electronic version of the dissertation does not contain the publications. This thesis focuses on two highly prevalent infections affecting many regions of the world: malaria and human immunodeficiency virus type 1 (HIV-1). Developing a new drug from scratch is a time-consuming and costly process that can be divided into five stages: basic research, lead target and lead compound(s) discovery, preclinical development, clinical development, and filing with the drug administration agency. The thesis focuses on the basic research and lead compound discovery stages, i.e. early drug discovery. For HIV-1, the focus was two-fold. First, based on earlier multi-objective in silico screening, novel s-triazine derivatives were discovered, designed, and synthesized; the findings were supported by modelling and validated by biological evaluation. The most potent compound has a small molecular size, favourable ligand efficiencies, and low measured toxicity, permitting further exploration and modification. Second, the newly discovered bioactive s-triazines motivated an analysis of the chemical landscape of HIV-1 RT inhibitors. For this, datasets of HIV-1 NNRT (non-nucleoside reverse transcriptase) and NRT (nucleoside reverse transcriptase) inhibitors were systematically created and curated from the ChEMBL database. Hierarchical classification of the scaffold structures in the curated datasets revealed common chemical parent types for the compounds and the hierarchy among the chemical structures, and showed that the discovered s-triazines form a separate structural parent-type group. Each group of compounds related to a parent type was analysed together with the corresponding binding affinity equilibrium constants (Ki). The structural fragments affecting the potency and stability of the compounds were highlighted. The structurally diverse datasets of HIV-1 NNRTIs and NRTIs with binding affinity equilibrium constants allowed the development of novel descriptive and predictive QSAR models for log Ki, which can help in the design of new compounds. To discover new promising antimalarial compounds, experimental anti-Plasmodium data were gathered from in-house experimental studies, expanded with data from the ChEMBL database, and systematically curated. The extracted data were extensively curated, fused, filtered, and grouped into thirty datasets for modelling. Consensus models for the classification of active/inactive compounds were established for each dataset, and seventeen models with promising predictive ability were used in consensus predictions to identify a series of curcuminoids and their structural analogues as potential malaria inhibitors. The selected compounds were experimentally validated, i.e. tested in vitro, revealing seventeen potentially active compounds for further testing and modification. The validation showed that the compounds computationally predicted to be inactive were also inactive in experiment, providing additional evidence for the quality of the data curation and dataset assembly process underlying the modelling.
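
The abstract does not say how the individual classifiers are combined into a consensus prediction. As a rough illustration only, the sketch below assumes simple majority voting over binary active/inactive calls, with ties resolved as inactive; both the rule and the function name `consensus_predict` are assumptions, not the thesis's method.

```python
def consensus_predict(votes):
    """Majority vote over binary predictions (True = active).

    Assumed rule for illustration: a compound is called active only when
    strictly more than half of the models vote active, so ties and an
    empty vote list both yield inactive.
    """
    votes = list(votes)
    active_votes = sum(1 for vote in votes if vote)
    return 2 * active_votes > len(votes)


# Hypothetical per-model predictions for three compounds.
predictions = {
    "compound_1": [True, True, False],
    "compound_2": [True, False],
    "compound_3": [False, False, True],
}

for name, votes in predictions.items():
    label = "active" if consensus_predict(votes) else "inactive"
    print(name, label)
```

Resolving ties as inactive is a conservative choice that reduces false positives sent to experimental testing; a real workflow might instead weight models by validation performance.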