1,077 research outputs found

    BioRED: A Comprehensive Biomedical Relation Extraction Dataset

    Automated relation extraction (RE) from biomedical literature is critical for many downstream text mining applications in both research and real-world settings. However, most existing benchmarking datasets for biomedical RE focus only on relations of a single type (e.g., protein-protein interactions) at the sentence level, greatly limiting the development of RE systems in biomedicine. In this work, we first review commonly used named entity recognition (NER) and RE datasets. Then we present BioRED, a first-of-its-kind biomedical RE corpus with multiple entity types (e.g., gene/protein, disease, chemical) and relation pairs (e.g., gene-disease; chemical-chemical), on a set of 600 PubMed articles. Further, we label each relation as describing either a novel finding or previously known background knowledge, enabling automated algorithms to differentiate between novel and background information. We assess the utility of BioRED by benchmarking several existing state-of-the-art methods, including BERT-based models, on the NER and RE tasks. Our results show that while existing approaches can reach high performance on the NER task (F-score of 89.3%), there is much room for improvement on the RE task, especially when extracting novel relations (F-score of 47.7%). Our experiments also demonstrate that such a comprehensive dataset can successfully facilitate the development of more accurate, efficient, and robust RE systems for biomedicine.

    RNAcentral 2021: secondary structure integration, improved sequence search and new member databases

    RNAcentral is a comprehensive database of non-coding RNA (ncRNA) sequences that provides a single access point to 44 RNA resources and >18 million ncRNA sequences from a wide range of organisms and RNA types. RNAcentral now also includes secondary (2D) structure information for >13 million sequences, making RNAcentral the world's largest RNA 2D structure database. The 2D diagrams are displayed using R2DT, a new 2D structure visualization method that uses consistent, reproducible and recognizable layouts for related RNAs. The sequence similarity search has been updated with a faster interface featuring facets for filtering search results by RNA type, organism, source database or any keyword. This sequence search tool is available as a reusable web component, and has been integrated into several RNAcentral member databases, including Rfam, miRBase and snoDB. To allow for a more fine-grained assignment of RNA types and subtypes, all RNAcentral sequences have been annotated with Sequence Ontology terms. The RNAcentral database continues to grow and provide a central data resource for the RNA community. RNAcentral is freely available at https://rnacentral.org

    Automated Identification of Targeted Therapy Strategies in Precision Oncology

    Precision in cancer treatment builds on targeted strategies tailored to the genomic traits of patients that cause pathological abnormalities. Extrapolating phenotype-to-genotype translations to oncology clinics has led to a less costly and more efficient cancer care model. However, its implementation remains challenging because the complex analysis trajectory requires various bioinformatics tools and databases. It relies on the individual expertise of molecular tumor boards (MTBs) executing a non-standard framework with a limited number of pharmacogenomics sources. The disadvantages of existing tools stem from requiring programming skills, not addressing data privacy concerns, the large number of clinical evidence databases, and the lack of a GUI tailored to the MTB workflow. We created ClinVAP, a cohesive framework for clinical annotation of genomic variants, which automates the generation of patient-specific diagnostic reports by translating the long list of mutations into clinical implications. We enriched it with gene-gene interactions, which also reveal the content of disrupted pathways. We provided the combined results in an interactive GUI that isolates backend operations from the users and allows them to work with the results. We measured the adaptability of ClinVAP on retrospective cases, comparing the content of its reports with the MTB's manual implementation. The differences were mainly based on expert opinion. The content and structure of the automated patient reporting tools form a comprehensive foundation for decision making. The future of precision oncology depends on the accessibility of the accumulated molecular knowledge of disease-contributing factors. The number of bioinformatics tools and the sheer size of genome data are a barrier to making this information available in hospitals. Our solutions not only increase their clinical applicability, but also demonstrate the field's readiness to generate automated solutions. Moreover, standardization and archiving will facilitate population studies, allowing molecular analyses to be archived and returned to the system as information.

    Synthetic whole-slide image tile generation with gene expression profile-infused deep generative models

    In this work, we propose an approach to generate whole-slide image (WSI) tiles by using deep generative models infused with matched gene expression profiles. First, we train a variational autoencoder (VAE) that learns a latent, lower-dimensional representation of multi-tissue gene expression profiles. Then, we use this representation to infuse generative adversarial networks (GANs) that generate lung and brain cortex tissue tiles, resulting in a new model that we call RNA-GAN. Tiles generated by RNA-GAN were preferred by expert pathologists compared with tiles generated using traditional GANs, and in addition, RNA-GAN needs fewer training epochs to generate high-quality tiles. Finally, RNA-GAN was able to generalize to gene expression profiles outside of the training set, showing imputation capabilities. A web-based quiz is available for users to play a game distinguishing real and synthetic tiles: https://rna-gan.stanford.edu/, and the code for RNA-GAN is available here: https://github.com/gevaertlab/RNA-GAN. Grants: PID2021-128317OB-I00 (MCIN/AEI/10.13039/501100011033); Project P20-00163, funded by Consejería de Universidad, Investigación e Innovación; ERDF A way of making Europe.

    Bioinformaatika meetodid personaalses farmakoteraapias [Bioinformatics methods in personalised pharmacotherapy]

    The electronic version of the dissertation does not include the publications. The amount of collected health data is growing fast. Insights from these data allow biological patient specifics to be used to improve therapy management through further individualization. This thesis addresses problems in multiple sub-fields of personalised medicine and aims to illustrate that data for precision medicine emerge from different sources. Drug metabolism is difficult to predict because of individual biological differences. Fortunately, drug concentrations are a good proxy for drug effect. To address the growing need for tools that allow on-line therapy adjustment based on individual concentrations, we have developed and externally evaluated a precision dosing tool that allows individualised dosing of vancomycin in neonates. Apart from drugs covered by therapeutic drug monitoring, most pharmacotherapies cannot rely on continuous concentration measurements; for such drugs, genetics provides a valuable source of information for individualization. The effects of many genetic variants in drug metabolism pathways are often large enough to require changes in drug prescriptions or schedules. We applied a population-based approach to test relations between drug-related adverse effects and genomic loci, and found and validated a novel variant in the CTNNA3 gene that increases adverse drug effects in patients with oxicam prescriptions. This was done by leveraging data from the Estonian Genome Center and linking them to nationwide electronic health data registries. Computational genetics relies on quantitative methods, of which the most common is the genome-wide association study (GWAS). A common GWAS downstream step involves time-consuming visual assessment of the association p-values in the context of other variants in the genomic vicinity. To streamline this step, we developed two tools, Manhattan Harvester and Cropper, which automatically detect peak areas and assign them scores by emulating human evaluators. https://www.ester.ee/record=b524282