12 research outputs found

    Deep learning for clustering of multivariate clinical patient trajectories with missing values

    BACKGROUND: Precision medicine requires a stratification of patients by disease presentation that is sufficiently informative to allow for selecting treatments on a per-patient basis. For many diseases, such as neurological disorders, this stratification problem translates into a complex problem of clustering multivariate and relatively short time series because (i) these diseases are multifactorial and not well described by single clinical outcome variables and (ii) disease progression needs to be monitored over time. Additionally, clinical data are often hindered by the presence of many missing values, further complicating any clustering attempts. FINDINGS: The problem of clustering multivariate short time series with many missing values is generally not well addressed in the literature. In this work, we propose a deep learning-based method to address this issue, variational deep embedding with recurrence (VaDER). VaDER relies on a Gaussian mixture variational autoencoder framework, which is further extended to (i) model multivariate time series and (ii) directly deal with missing values. We validated VaDER by accurately recovering clusters from simulated and benchmark data with known ground truth clustering, while varying the degree of missingness. We then used VaDER to successfully stratify patients with Alzheimer disease and patients with Parkinson disease into subgroups characterized by clinically divergent disease progression profiles. Additional analyses demonstrated that these clinical differences reflected known underlying aspects of Alzheimer disease and Parkinson disease. CONCLUSIONS: We believe our results show that VaDER can be of great value for future efforts in patient stratification, and multivariate time-series clustering in general.
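As an illustration of what "directly dealing with missing values" can mean in an autoencoder setting, the sketch below computes a reconstruction error over observed entries only, so that placeholder values at missing positions do not influence the loss. This is a generic masking technique written in plain Python, not the VaDER implementation; the function name `masked_mse` and the 0/1 mask convention are assumptions made for this example.

```python
from typing import Sequence


def masked_mse(
    observed: Sequence[Sequence[float]],
    reconstructed: Sequence[Sequence[float]],
    mask: Sequence[Sequence[int]],
) -> float:
    """Mean squared error over observed entries only.

    Rows are time points, columns are clinical variables.
    mask[t][j] is 1 where a value was measured and 0 where it is missing
    (hypothetical convention chosen for this sketch).
    """
    total = 0.0
    count = 0
    for obs_row, rec_row, mask_row in zip(observed, reconstructed, mask):
        for obs, rec, is_observed in zip(obs_row, rec_row, mask_row):
            if is_observed:
                total += (obs - rec) ** 2
                count += 1
    if count == 0:
        raise ValueError("mask marks every value as missing")
    return total / count


# Two time points, two variables; the value at [0][1] is missing,
# so its placeholder (0.0) and its reconstruction (9.0) are ignored.
observed = [[1.0, 0.0], [2.0, 3.0]]
reconstructed = [[1.5, 9.0], [2.0, 2.0]]
mask = [[1, 0], [1, 1]]
loss = masked_mse(observed, reconstructed, mask)
```

Dividing by the number of observed entries, rather than by the total number of entries, keeps the loss comparable across patients with different degrees of missingness.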

    Comprehensive Fragment Screening of the SARS-CoV-2 Proteome Explores Novel Chemical Space for Drug Development

    12 pages, 4 figures, 3 tables. SARS-CoV-2 (SCoV2) and its variants of concern pose serious challenges to public health. The variants pose increased challenges to vaccines, necessitating the development of new intervention strategies, including antivirals. Within the international Covid19-NMR consortium, we have identified binders targeting the RNA genome of SCoV2. We established protocols for the production and NMR characterization of more than 80 % of all SCoV2 proteins. Here, we performed an NMR screening using a fragment library for binding to 25 SCoV2 proteins and identified hits, including against previously unexplored SCoV2 proteins. Computational mapping was used to predict binding sites and identify functional moieties (chemotypes) of the ligands occupying these pockets. Striking consensus was observed between NMR-detected binding sites of the main protease and the computational procedure. Our investigation provides novel structural and chemical space for structure-based drug design against the SCoV2 proteome.
    Work at BMRZ is supported by the state of Hesse. Work in Covid19-NMR was supported by the Goethe Corona Funds, by the IWBEFRE-program 20007375 of state of Hesse, the DFG through CRC902 “Molecular Principles of RNA-based regulation” and through infrastructure funds (project numbers: 277478796, 277479031, 392682309, 452632086, 70653611) and by European Union’s Horizon 2020 research and innovation program iNEXT-discovery under grant agreement No 871037. BY-COVID receives funding from the European Union’s Horizon Europe Research and Innovation Programme under grant agreement number 101046203. “INSPIRED” (MIS 5002550) project, implemented under the Action “Reinforcement of the Research and Innovation Infrastructure,” funded by the Operational Program “Competitiveness, Entrepreneurship and Innovation” (NSRF 2014–2020) and co-financed by Greece and the EU (European Regional Development Fund) and the FP7 REGPOT CT-2011-285950 “SEE-DRUG” project (purchase of UPAT’s 700 MHz NMR equipment). The support of the CERM/CIRMMP center of Instruct-ERIC is gratefully acknowledged. This work has been funded in part by a grant of the Italian Ministry of University and Research (FISR2020IP_02112, ID-COVID) and by Fondazione CR Firenze. A.S. is supported by the Deutsche Forschungsgemeinschaft [SFB902/B16, SCHL2062/2-1] and the Johanna Quandt Young Academy at Goethe [2019/AS01]. M.H. and C.F. thank SFB902 and the Stiftung Polytechnische Gesellschaft for the Scholarship. L.L.'s work was supported by the French National Research Agency (ANR, NMR-SCoV2-ORF8), the Fondation de la Recherche Médicale (FRM, NMR-SCoV2-ORF8), FINOVI and the IR-RMN-THC Fr3050 CNRS. Work at UConn Health was supported by grants from the US National Institutes of Health (R01 GM135592 to B.H., P41 GM111135 and R01 GM123249 to J.C.H.) and the US National Science Foundation (DBI 2030601 to J.C.H.). Latvian Council of Science Grant No. VPP-COVID-2020/1-0014. National Science Foundation EAGER MCB-2031269. This work was supported by the grant Krebsliga KFS-4903-08-2019 and SNF-311030_192646 to J.O. P.G. (ITMP) The EOSC Future project is co-funded by the European Union Horizon Programme call INFRAEOSC-03-2020, Grant Agreement Number 101017536. Open Access funding enabled and organized by Projekt DEAL. Peer reviewed

    Pharmacophore-based ML model to predict ligand selectivity for E3 ligase binders

    E3 ligases are enzymes that play a critical role in ubiquitin-mediated protein degradation and are involved in various cellular processes. Pharmacophore analysis is a useful approach for predicting E3 ligase binding selectivity, which involves identifying key chemical features necessary for a ligand to interact with a specific protein target cavity. While pharmacophore analysis is not always sufficient to accurately predict ligand binding affinity, it can be a valuable tool for filtering and/or designing focused libraries for screening campaigns. In this study, we present a fast and inexpensive approach using a pharmacophore fingerprinting scheme known as ErG, which is used in a multiclass machine learning classification model. This model can assign the correct E3 ligase binder to its known E3 ligase and predict the probability of each molecule to bind to different E3 ligases. Practical applications of this approach are demonstrated on commercial libraries for rational design of E3 ligase binders

    Open Imaging Data Sharing in EOSC: COVID-19 as Demonstrator

    This Science Project (SP) brings together three different domains of life sciences with the aim of creating reproducible workflows, tools and web services for data visualization. The SP focuses on building resources for handling data from bioimaging, structural and biochemical studies. Euro-BioImaging will implement a community-standard, cloud-compatible open image data format and a data submission workflow for high-throughput screening data. Instruct-ERIC will develop a user-friendly web service for accessing multi-dimensional structural and imaging data. Lastly, EU-OpenScreen/Fraunhofer ITMP will create a reproducible workflow for generating Knowledge Graphs that represent the phenotype-chemotype of diseases. While these resources are being developed, the collaborators will also harmonize them from the beginning to enable FAIR data principles. This SP uses COVID-19 as a demonstrator; however, the resources will be generalized for any disease of interest.

    Multimodal mechanistic signatures for neurodegenerative diseases (NeuroMMSig): A web server for mechanism enrichment

    Motivation: The concept of a ‘mechanism-based taxonomy of human disease’ is currently replacing the outdated paradigm of diseases classified by clinical appearance. We have tackled the paradigm of mechanism-based patient subgroup identification in the challenging area of research on neurodegenerative diseases. Results: We have developed a knowledge base representing essential pathophysiology mechanisms of neurodegenerative diseases. Together with dedicated algorithms, this knowledge base forms the basis for a ‘mechanism-enrichment server’ that supports the mechanistic interpretation of multiscale, multimodal clinical data. Availability and implementation: NeuroMMSig is available at http://neurommsig.scai.fraunhofer.de

    Training and evaluation corpora for the extraction of causal relationships encoded in biological expression language (BEL)

    Success in extracting biological relationships is mainly dependent on the complexity of the task as well as the availability of high-quality training data. Here, we describe the new corpora in the systems biology modeling language BEL for training and testing biological relationship extraction systems that we prepared for the BioCreative V BEL track. BEL was designed to capture relationships not only between proteins or chemicals, but also complex events such as biological processes or disease states. A BEL nanopub is the smallest unit of information and represents a biological relationship with its provenance. In BEL relationships (called BEL statements), the entities are normalized to defined namespaces mainly derived from public repositories, such as sequence databases, MeSH or publicly available ontologies. In the BEL nanopubs, the BEL statements are associated with citation information and supportive evidence such as a text excerpt. To enable the training of extraction tools, we prepared BEL resources and made them available to the community. We selected a subset of these resources focusing on a reduced set of namespaces, namely, human and mouse genes, ChEBI chemicals, MeSH diseases and GO biological processes, as well as relationship types ‘increases’ and ‘decreases’. The published training corpus contains 11 000 BEL statements from over 6000 supportive text excerpts. For method evaluation, we selected and re-annotated two smaller subcorpora containing 100 text excerpts. For this re-annotation, the inter-annotator agreement was measured by the BEL track evaluation environment and resulted in a maximal F-score of 91.18% for full statement agreement. In addition, for a set of 100 BEL statements, we do not only provide the gold standard expert annotations, but also text excerpts pre-selected by two automated systems. Those text excerpts were evaluated and manually annotated as true or false supportive in the course of the BioCreative V BEL track task
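To make the agreement measure concrete, the sketch below computes an F-score between two annotators' sets of statements, treating one set as the reference. It uses exact string matching on whole statements, which is a simplification and not the scoring logic of the BEL track evaluation environment; the function name `f_score` and the example statements are hypothetical and serve only as illustration.

```python
from typing import Set


def f_score(reference: Set[str], candidate: Set[str]) -> float:
    """Balanced F-score (F1) for exact matches between two statement sets."""
    true_positives = len(reference & candidate)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(candidate)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Illustrative BEL-style statements (made up for this example).
annotator_a = {
    'p(HGNC:TNF) increases bp(GO:"inflammatory response")',
    'a(CHEBI:aspirin) decreases p(HGNC:PTGS2)',
    'p(HGNC:TP53) increases bp(GO:"apoptotic process")',
}
annotator_b = {
    'p(HGNC:TNF) increases bp(GO:"inflammatory response")',
    'a(CHEBI:aspirin) decreases p(HGNC:PTGS2)',
    'p(HGNC:IL6) increases bp(GO:"inflammatory response")',
}

# Two of three statements match in each direction, so precision and
# recall are both 2/3 and the F-score is 2/3.
agreement = f_score(annotator_a, annotator_b)
```

Because the balanced F-score is symmetric in precision and recall, swapping the two annotators gives the same value, which is why it is convenient for inter-annotator agreement.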

    Improving COVID-19 metadata findability and interoperability in the European Open Science Cloud

    This publication details the workplan of the Science Project (SP) “COVID-19 metadata findability and interoperability in EOSC” (short: META-COVID) that is part of the Horizon Europe funded project EOSC Future. The COVID-19 pandemic has generated a huge variety of research activities, studies, and policies across both the life sciences (LS) and the social sciences and humanities (SSH). Useful insights from combining the data and conclusions from these different forms of research are, however, hampered by the lack of a common metadata framework with which to describe them. This is because different scientific disciplines have different ways of organising research activities. For example, the type of the research (e.g., hypothesis testing versus hypothesis generating) and the methodology chosen (e.g., experimental, survey, cohort, case study) are key elements in understanding the data generated and in supporting its secondary use. Another issue to be tackled is the integration of various sources of metadata related to parliamentary and social media metadata. In META-COVID, scientists from the LS and SSH domains gathered to discuss ways in which metadata could go beyond the description of the data itself to include the basic elements of the research process (“contextual metadata”) within the frame of the European Open Science Cloud (EOSC). The main outcomes of the SP will be: i) An inventory of metadata schemas applied across infrastructures and domains; ii) The development of a framework for a metadata model characterising the research approach and workflow across research infrastructures; iii) The application of the framework to selected COVID-19 use cases; iv) The development of an ontology of COVID-19 related topics from parliamentary data and social media

    Curating, Collecting, and Cataloguing Global COVID-19 Datasets for the Aim of Predicting Personalized Risk

    Although hundreds of datasets have been published since the beginning of the coronavirus pandemic, there is a lack of centralized resources where these datasets are listed and harmonized to facilitate their applicability and uptake by predictive modeling approaches. Firstly, such a centralized resource provides information about data owners to researchers who are searching for datasets to develop their predictive models. Secondly, the harmonization of the datasets makes it possible to take advantage of several similar datasets simultaneously. This, in turn, not only eases the imperative external validation of data-driven models but can also be used for virtual cohort generation, which helps to overcome data sharing impediments. Here, we present the COVID-19 data catalogue, a repository that provides a landscape view of COVID-19 studies and datasets as a putative source to enable researchers to develop personalized COVID-19 predictive risk models. The COVID-19 data catalogue currently contains over 400 studies and their relevant information collected from a wide range of global sources such as global initiatives, clinical trial repositories, publications, and data repositories. Further, the curated content stored in this data catalogue is complemented by a web application, providing visualizations of these studies, including their references, relevant information such as measured variables, and the geographical locations where these studies were performed. This resource is one of the first to capture, organize, and store studies, datasets, and metadata related to COVID-19 in a comprehensive repository. We believe that our work will facilitate future research and development of personalized predictive risk models for COVID-19.