71 research outputs found

    Nucleic Acid Architectures for Therapeutics, Diagnostics, Devices and Materials

    Nucleic acids (RNA and DNA) and their chemical analogs have been utilized as building materials due to their biocompatibility and programmability. RNA, which naturally possesses a wide range of functions, is now being widely investigated as a responsive biomaterial that dynamically reacts to changes in the surrounding environment. It is now evident that artificially designed self-assembling RNAs, which can form programmable nanoparticles and supra-assemblies, will play an increasingly important part in a diverse range of applications, such as macromolecular therapies, drug delivery systems, biosensing, tissue engineering, programmable scaffolds for material organization, logic gates, and soft actuators, to name but a few. This Special Issue comprises research highlights, short communications, research articles, and reviews that bring together leading scientists exploring a wide range of the fundamental properties of RNA and DNA nanoassemblies suitable for biomedical applications.

    Comparative analysis of molecular fingerprints in prediction of drug combination effects

    Application of machine and deep learning methods in drug discovery and cancer research has gained a considerable amount of attention in the past years. As the field grows, it becomes crucial to systematically evaluate the performance of novel computational solutions in relation to established techniques. To this end, we compare rule-based and data-driven molecular representations in prediction of drug combination sensitivity and drug synergy scores using standardized results of 14 high-throughput screening studies, comprising 64 200 unique combinations of 4153 molecules tested in 112 cancer cell lines. We evaluate the clustering performance of molecular representations and quantify their similarity by adapting the Centered Kernel Alignment metric. Our work demonstrates that to identify an optimal molecular representation type, it is necessary to supplement quantitative benchmark results with qualitative considerations, such as model interpretability and robustness, which may vary between and throughout preclinical drug development projects. Peer reviewed.
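For readers unfamiliar with the similarity measure named in this abstract, the following is a minimal sketch of the standard linear form of Centered Kernel Alignment between two representations of the same molecules. It is not the adapted variant the authors describe, whose details the abstract does not give. The function names and the toy matrices are illustrative assumptions.

```python
import math


def center_columns(rows):
    """Subtract each column's mean so every feature has zero mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def cross_product_sq_norm(x, y):
    """Squared Frobenius norm of X^T Y for two row-aligned matrices."""
    total = 0.0
    for x_col in zip(*x):
        for y_col in zip(*y):
            dot = sum(a * b for a, b in zip(x_col, y_col))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA between two representations of the same samples.

    x and y are lists of rows (one row per molecule); they may have
    different numbers of columns but must have the same number of rows.
    """
    if len(x) != len(y):
        raise ValueError("both representations need one row per sample")
    xc, yc = center_columns(x), center_columns(y)
    denom = math.sqrt(cross_product_sq_norm(xc, xc) * cross_product_sq_norm(yc, yc))
    if denom == 0.0:
        raise ValueError("a representation has zero variance")
    return cross_product_sq_norm(xc, yc) / denom


# Hypothetical toy data: four molecules, two features each.
rep_a = [[1.0, 2.0], [3.0, 1.0], [0.0, 5.0], [2.0, 2.0]]
rep_b = [[3.0 * v for v in row] for row in rep_a]  # same features, rescaled
similarity = linear_cka(rep_a, rep_b)
```

Linear CKA is invariant to isotropic scaling and to orthogonal transformations of the feature axes, which is why a rescaled copy of a representation scores as identical to the original.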

    Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets

    Recently, pre-trained foundation models have enabled significant advancements in multiple fields. In molecular machine learning, however, where datasets are often hand-curated and hence typically small, the lack of datasets with labeled features, and of codebases to manage those datasets, has hindered the development of foundation models. In this work, we present seven novel datasets categorized by size into three distinct categories: ToyMix, LargeMix and UltraLarge. These datasets push the boundaries in both the scale and the diversity of supervised labels for molecular learning. They cover nearly 100 million molecules and over 3000 sparsely defined tasks, totaling more than 13 billion individual labels of both quantum and biological nature. In comparison, our datasets contain 300 times more data points than the widely used OGB-LSC PCQM4Mv2 dataset, and 13 times more than the quantum-only QM1B dataset. In addition, to support the development of foundational models based on our proposed datasets, we present the Graphium graph machine learning library, which simplifies the process of building and training molecular machine learning models for multi-task and multi-level molecular datasets. Finally, we present a range of baseline results as a starting point for multi-task and multi-level training on these datasets. Empirically, we observe that performance on low-resource biological datasets improves when the model is also trained on large amounts of quantum data. This indicates that there may be potential in multi-task and multi-level training of a foundation model and in fine-tuning it to resource-constrained downstream tasks.
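Training on "sparsely defined tasks" generally means that most molecules carry labels for only a few tasks, so the loss has to skip the missing entries. The sketch below shows one common way to do that, a mean squared error averaged over observed labels only. It is an illustration under stated assumptions, not the Graphium API: the function name and the convention of marking missing labels with NaN are both hypothetical.

```python
import math


def masked_mse(predictions, labels):
    """Mean squared error over observed labels only.

    predictions and labels are lists of rows (one row per molecule,
    one column per task). A NaN label means "not measured" and is
    skipped, so it contributes neither to the sum nor to the count.
    """
    total, count = 0.0, 0
    for pred_row, label_row in zip(predictions, labels):
        for pred, label in zip(pred_row, label_row):
            if math.isnan(label):
                continue
            total += (pred - label) ** 2
            count += 1
    if count == 0:
        raise ValueError("no observed labels in this batch")
    return total / count


# Hypothetical toy batch: two molecules, two tasks, one label missing.
nan = float("nan")
batch_predictions = [[1.0, 2.0], [0.0, 4.0]]
batch_labels = [[1.0, nan], [2.0, 4.0]]
loss = masked_mse(batch_predictions, batch_labels)
```

Averaging over the observed count rather than the full grid keeps the loss scale comparable between densely and sparsely labeled batches.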

    Cheminformatic Approach for Deconvolution of Active Compounds in a Complex Mixture - phytoserms in Licorice

    After our in silico models were validated against prior knowledge in this area, the alerting phytochemicals from two Glycyrrhiza species (G. glabra and G. uralensis) were clustered. Exhaustive computational mining of the licorice metabolome against selected endocrine and metabolic targets led to the discovery of a unique class of compounds belonging to the dihydrostilbenoids (DHS), appended with prenyl groups at various positions. To the best of our knowledge, this interesting group of compounds has not been studied for estrogenic activity or PXR activation. In addition, some of the bis-prenylated DHS have been reported to be present only in G. uralensis. Another aspect of the current project was to predict the phase I primary metabolites of compounds found in both species of Glycyrrhiza and to assess them with computational tools that predict their binding potential against both isoforms of hERs or against drug-metabolizing enzymes, for example with CYP inhibition models. Our investigations revealed estrogenic character for most of the predicted metabolites and confirmed earlier reports of potential CYP3A4 and CYP1A2 inhibition. Compilation of such data is essential to gain a better understanding of the efficacy and safety of licorice extracts used in various botanical formularies. This approach, with the cheminformatic tools involved, has proven effective in yielding rich information to support our understanding of traditional practices. It can also expand the role of botanical drugs in introducing new chemical entities (NCEs) and/or in uncovering their liabilities at early stages. In this work we endeavored to understand the mechanisms associated with the efficacy and safety of components reported in the licorice plant.
We used cheminformatics screening tools to reveal the large number of secondary metabolites produced by licorice that are capable of interfering with the human estrogen receptors (hERs), PXR, or other vital cytochrome P450 enzymes. The genus Glycyrrhiza encompasses several species exhibiting a complex structural diversity of secondary metabolites, and hence of biological activities. The intricate nature of botanical remedies such as licorice has rendered them unattractive to scientific research and the medical industry. Understanding and finding the mechanisms of efficacy or safety for a plant-based therapy is very challenging, yet it remains crucial and warranted. The licorice plant is known to have Selective Estrogen Receptor Modulatory effects (SERMs), with a spectrum of estrogenic and anti-estrogenic activities relevant to women's health. Conversely, licorice extract was shown to induce the pregnane xenobiotic receptor (PXR), which may be a potential route for deleterious effects such as herb-drug interaction (HDI). While many studies attributed these divergent activities to a few compounds such as liquiritigenin (a weak estrogenic SERM) or glycyrrhizin (a weak PXR agonist), no attempt was made to characterize the complete set of compounds responsible for them. A plethora of licorice components has been overlooked, some of which might have the potential to be developed into novel phytoSERMs or to trigger undesirable adverse effects by altering drug-metabolizing enzymes and thus pharmacokinetics. We therefore set out to synthesize a set of constitutional isomers of stilbenoids and DHS (archetypal of those found in licorice) with different prenylation patterns.
Sixteen constitutional isomers of stilbenoids (M2-M10) and DHS (M12-M18) were successfully synthesized, six of which (M8, M9, M14, M15, M17 and M18) were synthesized for the first time, to be further tested and validated with cell-based methods for their estrogenic activities. We have unveiled a novel class of compounds that strongly activate PXR. These results, which were in accord with the in silico predictions, were observed for multiple synthesized prenylated stilbenoids and DHS in the luciferase reporter gene assay at µM concentrations. Moreover, this activation was further validated by a six-fold increase in mRNA expression of cytochrome P450 3A4 (CYP3A4), where three representative compounds (M7, M10 and M15) exceeded the activation fold of the positive control.

    Integration of protein three-dimensional structure into the workflow of interpretation of genetic variants

    Life stores information in large biopolymer molecules, which can be represented as a sequence of letters. Computers store information in sequences of zeros and ones. This predestines computers for automated processing of biological data, and with great success: computational biology has produced many methods and tools based on biological sequences. However, reducing life to just sequences radically reduces the whole picture. The functionality of biomolecules, especially proteins, is performed in three-dimensional (3D) space. Thus, limiting methods in computational biology to sequences will never yield sufficient insight into the ways molecular biology operates. In this thesis I present my work on the integration of protein 3D structure information into the methodological workflow of computational biology. We developed an algorithmic pipeline that maps protein sequences to protein structures, providing an additional source of information. We used this pipeline to analyze the effects of genetic variants from the perspective of protein 3D structures. We analyzed genetic variants associated with diseases and compared their structural arrangements to those of neutral variants. Additionally, we discussed how structural information can improve methods that aim to predict the consequences of genetic variants.
German abstract (translated): Life stores information with the help of long biopolymer chains. Such chains can be described by sequences of letters. Computers store information in sequences of zeros and ones. This predestines computers for processing biological data, and bioinformatics has indeed developed, with great success, methods and tools based on processing such sequences. However, the functionality of biomolecules, especially proteins, plays out in three-dimensional (3D) space. Bioinformatic methods restricted to sequence data will therefore never be able to describe molecular biological processes functionally. This thesis is devoted to the integration of protein 3D structure information into the workflows of bioinformatic methods. We developed an algorithmic pipeline that maps protein sequences onto protein structures and thereby contributes an additional source of information. We used this method to analyze the effects of genetic variations from the perspective of protein structures. We analyzed the tendencies in the spatial distribution of genetic variants that have been associated with diseases and compared them with those of neutral variants. Furthermore, we examined to what extent incorporating structural data improves the prediction of the consequences of genetic variants.

    Bayesian phylogenetic modelling of lateral gene transfers

    PhD Thesis. Phylogenetic trees represent the evolutionary relationships between a set of species. Inferring these trees from data can be particularly challenging, since genetic material can be transferred not only from parents to their offspring but also between organisms via lateral gene transfers (LGTs). Thus, the presence of LGTs means that genes in a genome can each have different evolutionary histories, represented by different gene trees. A few statistical approaches have been introduced to explore non-vertical evolution through collections of Markov-dependent gene trees. In 2005 Suchard described a Bayesian hierarchical model for joint inference of gene trees and an underlying species tree, where a layer in the model linked gene trees to the species tree via a sequence of unknown lateral gene transfers. In his model, LGT was modeled via a random walk in the tree space derived from the subtree prune and regraft (SPR) operator on unrooted trees. However, the use of SPR moves to represent LGT in an unrooted tree is problematic: the transfer of DNA between two organisms implies that both organisms are contemporaneous, and unconstrained SPR moves can therefore allow unrealistic LGTs. This thesis describes a related hierarchical Bayesian phylogenetic model for reconstructing phylogenetic trees which imposes a temporal constraint on LGTs, namely that they can only occur between species which exist concurrently. This is achieved by taking into account possible time orderings of divergence events in trees, without explicitly modelling divergence times. An extended version of the SPR operator is introduced as a more adequate mechanism to represent the LGT effect in a tree. The extended SPR operation respects the time ordering. It additionally differs from regular SPR in that it maintains a 1-to-1 correspondence between points on the species tree and points on each gene tree.
Each point on a gene tree represents the existence of a population containing that gene at some point in time. Hierarchical phylogenetic models were used in the reconstruction of each gene tree from its corresponding gene alignment, enabling the pooling of information across genes. Going beyond Suchard's approach, we assume variation in the rate of evolution between different sites. The species tree is assumed to be fixed. A Markov Chain Monte Carlo (MCMC) algorithm was developed to fit the model in a Bayesian framework. A novel MCMC proposal mechanism for jointly proposing the gene tree topology and branch lengths, LGT distance and LGT history has been developed, as well as a novel graphical tool to represent LGT history, the LGT Biplot. Our model was applied to simulated and experimental datasets. More specifically, we analysed LGT/reassortment presence in the evolution of the 2009 Swine-Origin Influenza Type A virus. Future improvements of our model and algorithm should include joint inference of the species tree, improving the computational efficiency of the MCMC algorithm, and better consideration of other factors that can cause discordance of gene trees and species trees, such as gene loss.
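The temporal constraint described in this abstract (transfers only between lineages that exist concurrently, using an ordering of divergence events rather than explicit times) can be illustrated with a small check on a ranked tree. This is a sketch under stated assumptions, not the thesis's extended SPR operator: the rank encoding, the function names, and the example tree are all hypothetical.

```python
import math


def branch_interval(parent_rank, child_rank=None):
    """Interval of the event ordering during which a branch exists.

    Divergence events are ranked 1, 2, 3, ... from the root towards
    the tips. A branch starts at its parent's rank and ends at its
    child's rank; a branch leading to a leaf (child_rank=None) lasts
    until the present, encoded here as infinity.
    """
    end = math.inf if child_rank is None else child_rank
    if not parent_rank < end:
        raise ValueError("a child event must come after its parent event")
    return (parent_rank, end)


def can_transfer(donor, recipient):
    """True if the two branches overlap in the event ordering."""
    return max(donor[0], recipient[0]) < min(donor[1], recipient[1])


# Hypothetical species tree ((A,B),(C,D)): root has rank 1, the (A,B)
# ancestor has rank 2, and the (C,D) ancestor has rank 3.
stem_ab = branch_interval(1, 2)   # root -> (A,B) ancestor
stem_cd = branch_interval(1, 3)   # root -> (C,D) ancestor
leaf_a = branch_interval(2)       # (A,B) ancestor -> A
leaf_c = branch_interval(3)       # (C,D) ancestor -> C
```

In this toy tree, `can_transfer(leaf_a, stem_cd)` holds because lineage A already exists while the (C,D) stem is still undivided, whereas `can_transfer(stem_ab, leaf_c)` does not, because the (A,B) stem has split before lineage C appears.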