16 research outputs found

Investigating drug translational research using PubMed articles

Author: Li Xin
Tang Xuli
Publication date: 18/07/2023

Drug research and development are embracing translational research for its potential to increase the number of drugs successfully brought to clinical application. Using the publicly available PubMed database, we sought to describe the status of drug translational research, the distribution of translational lags across all drugs, and the collaborations between basic science and clinical science in drug research. For each drug, an indicator called Translational Lag was proposed to quantify the interval from its first PubMed article to its first clinical article. The triangle of biomedicine was also used to visualize the status and multidisciplinary collaboration of drug translational research. The results showed that only 18.1% (24,410) of drugs/compounds had successfully entered clinical research. On average, it took 14.38 years (interquartile range, 4 to 21 years) for a drug to go from initial basic discovery to its first clinical research. The results also revealed that cooperation between basic science and clinical science was rare in drug research; each was more inclined to cooperate within its own discipline. Comment: 7 pages, 1 figure
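As an illustration of the Translational Lag indicator described in this abstract, the following is a minimal sketch, not the authors' implementation. It assumes each article for a drug is reduced to a hypothetical (publication year, is_clinical) pair; how the paper classifies an article as clinical is not stated here.

```python
from typing import Iterable, Optional, Tuple


def translational_lag(articles: Iterable[Tuple[int, bool]]) -> Optional[int]:
    """Years from a drug's first article to its first clinical article.

    `articles` holds (publication_year, is_clinical) pairs; this record
    shape is an assumption made for the sketch.
    Returns None when the drug has no articles or no clinical article.
    """
    records = list(articles)
    if not records:
        return None
    clinical_years = [year for year, is_clinical in records if is_clinical]
    if not clinical_years:
        return None  # the drug has not yet entered clinical research
    first_article_year = min(year for year, _ in records)
    return min(clinical_years) - first_article_year


# Hypothetical drug: first article in 1990, first clinical article in 2004.
example = [(1990, False), (1998, False), (2004, True), (2010, True)]
lag = translational_lag(example)
```

Under these assumptions, a drug whose first article is itself clinical gets a lag of 0, and drugs with no clinical article are excluded from any average.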

arXiv.org e-Print Archive

Bi-Encoders based Species Normalization -- Pairwise Sentence Learning to Rank

Author: Awan Zainab
Kahlke Tim
Kennedy Paul
Ralph Peter
Publication date: 22/10/2023

Motivation: Biomedical named-entity normalization involves connecting biomedical entities with distinct database identifiers in order to facilitate data integration across various fields of biology. Existing systems for biomedical named-entity normalization rely heavily on dictionaries, manually created rules, and high-quality representative features such as lexical or morphological characteristics. However, recent research has investigated the use of neural network-based models to reduce dependence on dictionaries, manually crafted rules, and features. Despite these advancements, the performance of these models is still limited by the lack of sufficiently large training datasets. These models tend to overfit small training corpora and generalize poorly to previously unseen entities, necessitating the redesign of rules and features. Contribution: We present a novel deep learning approach for named-entity normalization, treating it as a pairwise learning-to-rank problem. Our method uses the widely used information retrieval algorithm Best Matching 25 (BM25) to generate candidate concepts, followed by the application of Bidirectional Encoder Representations from Transformers (BERT) to re-rank the candidate list. Notably, our approach eliminates the need for feature engineering or rule creation. We conduct experiments on species entity types and evaluate our method against state-of-the-art techniques using the LINNAEUS and S800 biomedical corpora. Our proposed approach surpasses existing methods in linking entities to the NCBI taxonomy. To the best of our knowledge, there is no existing neural network-based approach for species normalization in the literature.
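The candidate-generation stage mentioned in this abstract can be sketched as follows. This is a minimal, self-contained Okapi BM25 scorer, not the authors' code. The whitespace tokenizer, the parameter values k1=1.5 and b=0.75, and the helper names are assumptions. The BERT re-ranking stage is not shown.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]  # naive tokenizer (assumption)
    n = len(tokenized)
    avgdl = (sum(len(d) for d in tokenized) / n) or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def top_candidates(mention: str, names: List[str], k: int = 3) -> List[str]:
    """Return the k concept names with the highest BM25 score for a mention."""
    scores = bm25_scores(mention, names)
    ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical mini-dictionary of taxonomy names.
names = ["Homo sapiens", "Mus musculus", "Rattus norvegicus", "Homo neanderthalensis"]
candidates = top_candidates("homo sapiens", names, k=2)
```

In a two-stage pipeline of this kind, the short candidate list would then be passed to a learned re-ranker, so the lexical stage only needs to keep the correct concept somewhere in the top k.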

arXiv.org e-Print Archive

BERT Based Clinical Knowledge Extraction for Biomedical Knowledge Graph Construction and Analysis

Author: Asri Bouchra El
Elkaimbillah Zineb
Harnoune Ayoub
Mikram Mounia
Rhanoui Maryem
Yousfi Siham
Publication venue: 'Elsevier BV'
Publication date: 21/04/2023

Background: Knowledge evolves over time, often as a result of new discoveries or changes in the adopted methods of reasoning. New facts or evidence may also become available, leading to new understandings of complex phenomena. This is particularly true in the biomedical field, where scientists and physicians are constantly striving to find new methods of diagnosis, treatment and eventually cure. Knowledge Graphs (KGs) offer a real way of organizing and retrieving the massive and growing amount of biomedical knowledge. Objective: We propose an end-to-end approach for knowledge extraction and analysis from biomedical clinical notes using the Bidirectional Encoder Representations from Transformers (BERT) model and a Conditional Random Field (CRF) layer. Methods: The approach is based on knowledge graphs, which can effectively process abstract biomedical concepts such as relationships and interactions between medical entities. Besides offering an intuitive way to visualize these concepts, KGs can solve more complex knowledge retrieval problems by reducing them to simpler representations or by transforming the problems into representations from different perspectives. We created a biomedical Knowledge Graph using Natural Language Processing models for named-entity recognition and relation extraction. The generated biomedical knowledge graphs are then used for question answering. Results: The proposed framework can successfully extract relevant structured information with high accuracy (90.7% for named-entity recognition (NER), 88% for relation extraction (RE)), according to experimental findings based on 505 real-world, unstructured biomedical patient clinical notes. Conclusions: In this paper, we propose a novel end-to-end system for the construction of a biomedical knowledge graph from clinical text using a variation of BERT models.

arXiv.org e-Print Archive

Data-driven information extraction and enrichment of molecular profiling data for cancer cell lines

Author: Baudis Michael
Giagkos Dimitris
Paloots Rahel
Smith Ellery
Stockinger Kurt
Publication venue: Oxford University Press
Publication date: 16/03/2024

Motivation: With the proliferation of research means and computational methodologies, published biomedical literature is growing exponentially in number and volume. Cancer cell lines are frequently used models in biological and medical research and are currently applied for a wide range of purposes, from studies of cellular mechanisms to drug development, which has led to a wealth of related data and publications. Sifting through large quantities of text to gather relevant information on cell lines of interest is tedious and extremely slow when performed by humans. Hence, novel computational information extraction and correlation mechanisms are required to boost meaningful knowledge extraction. Results: In this work, we present the design, implementation, and application of a novel data extraction and exploration system. This system extracts deep semantic relations between textual entities from scientific literature to enrich existing structured clinical data concerning cancer cell lines. We introduce a new public data exploration portal, which enables automatic linking of genomic copy number variant plots with ranked, related entities such as affected genes. Each relation is accompanied by literature-derived evidence, allowing for deep, yet rapid, literature search, using existing structured data as a springboard. Availability and implementation: Our system is publicly available on the web at https://cancercelllines.org

ZORA

COVID-19 datasets : a brief overview

Author: Chadhar Mehmood
Li Wuyang
Saikrishna Vidya
Sun Ke
Xia Feng
Publication venue: ComSIS Consortium
Publication date: 01/01/2022

The outbreak of the COVID-19 pandemic has affected lives and socio-economic development around the world. The impact of the pandemic has motivated researchers from different domains to find effective solutions to diagnose, prevent, and estimate the pandemic and relieve its adverse effects. Numerous COVID-19 datasets have been built from these studies and are available to the public. These datasets can be used for disease diagnosis and case prediction, speeding up the solution of problems caused by the pandemic. To meet researchers' need to understand the various COVID-19 datasets, we examine them and provide an overview. We organise the majority of these datasets into three categories based on the category of applications, i.e., time-series, knowledge base, and media-based datasets. Organising COVID-19 datasets into appropriate categories can help researchers keep their focus on methodology rather than on the datasets. In addition, applications and COVID-19 datasets suffer from a series of problems, such as privacy and quality. We discuss these issues as well as the potential of COVID-19 datasets. © 2022, ComSIS Consortium. All rights reserved.

Federation ResearchOnline

Tacit knowledge elicitation process for industry 4.0

Author: Fenoglio Enzo
Kazim Emre
Koshiyama Adriano
Latapie Hugo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/12/2022

Manufacturers are migrating their processes to Industry 4.0, which includes new technologies for improving the productivity and efficiency of operations. One of the issues is capturing, recreating, and documenting the tacit knowledge of aging workers. However, there are no systematic procedures to incorporate this knowledge into Enterprise Resource Planning systems and maintain a competitive advantage. This paper proposes a tacit knowledge elicitation process for capturing the operational best practices of experienced workers in industrial domains, based on a mix of algorithmic techniques and a cooperative game. We use domain ontologies for Industry 4.0 and reasoning techniques to discover and integrate new facts from textual sources into an Operational Knowledge Graph. We describe an iterative concept-formation process in a role game played by human and virtual agents through socialization and externalization for knowledge graph refinement. Ethical and societal concerns are discussed as well.

UCL Discovery

Using knowledge graphs to enable data interoperability in health infrastructures for COVID-19 disease outbreak management.

Author: Rodríguez Pérez Pablo
Publication venue: Universitat Politècnica de Catalunya
Publication date: 26/10/2021

UPCommons. Portal del coneixement obert de la UPC