Investigating drug translational research using PubMed articles
Drug research and development are embracing translational research for its
potential to increase the number of drugs successfully brought to clinical
applications. Using the publicly available PubMed database, we sought to
describe the status of drug translational research, the distribution of
translational lags for all drugs as well as the collaborations between basic
science and clinical science in drug research. For each drug, an indicator
called Translational Lag was proposed to quantify the time interval from its
first PubMed article to its first clinical article. Meanwhile, the triangle of
biomedicine was also used to visualize the status and multidisciplinary
collaboration of drug translational research. The results showed that only
18.1% (24,410) of drugs/compounds had successfully entered clinical research.
On average, it took 14.38 years (interquartile range, 4 to 21 years) for a
drug to move from its initial basic discovery to its first clinical research.
In addition, the results revealed that cooperation between basic science and
clinical science in drug research was rare; both were more inclined to
collaborate within their own disciplines.
Comment: 7 pages, 1 figure
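The Translational Lag indicator described above reduces, per drug, to the gap between two publication years. A minimal sketch of that computation (compound names and years below are hypothetical, not taken from the paper's data):

```python
def translational_lag(first_article_year, first_clinical_year):
    """Years from a drug's first PubMed article to its first clinical article."""
    if first_clinical_year < first_article_year:
        raise ValueError("first clinical article cannot precede the first article")
    return first_clinical_year - first_article_year

# Hypothetical first-article / first-clinical-article years for two compounds.
records = {
    "compound_a": (1990, 2004),
    "compound_b": (2001, 2005),
}
lags = {name: translational_lag(*years) for name, years in records.items()}
```

Aggregating such per-drug lags over the whole corpus would yield the mean and interquartile range the abstract reports.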
Bi-Encoders based Species Normalization -- Pairwise Sentence Learning to Rank
Motivation: Biomedical named-entity normalization involves connecting
biomedical entities with distinct database identifiers in order to facilitate
data integration across various fields of biology. Existing systems for
biomedical named entity normalization heavily rely on dictionaries, manually
created rules, and high-quality representative features such as lexical or
morphological characteristics. However, recent research has investigated the
use of neural network-based models to reduce dependence on dictionaries,
manually crafted rules, and features. Despite these advancements, the
performance of these models is still limited due to the lack of sufficiently
large training datasets. These models have a tendency to overfit small training
corpora and exhibit poor generalization when faced with previously unseen
entities, necessitating the redesign of rules and features. Contribution: We
present a novel deep learning approach for named entity normalization, treating
it as a pair-wise learning to rank problem. Our method utilizes the widely-used
information retrieval algorithm Best Matching 25 to generate candidate
concepts, followed by the application of Bidirectional Encoder Representations
from Transformers (BERT) to re-rank the candidate list. Notably, our approach
eliminates the need for feature-engineering or rule creation. We conduct
experiments on species entity types and evaluate our method against
state-of-the-art techniques using LINNAEUS and S800 biomedical corpora. Our
proposed approach surpasses existing methods in linking entities to the NCBI
taxonomy. To the best of our knowledge, there is no existing neural
network-based approach for species normalization in the literature.
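The candidate-generation stage described above can be illustrated with a plain Okapi BM25 ranker over concept names. This is a self-contained sketch, not the authors' implementation: the taxonomy identifiers and names are invented, and the subsequent BERT re-ranking step is omitted.

```python
import math
from collections import Counter

def bm25_rank(query, concepts, k1=1.5, b=0.75):
    """Rank candidate concept names against a mention using Okapi BM25."""
    docs = {cid: name.lower().split() for cid, name in concepts.items()}
    n = len(docs)
    avgdl = sum(len(toks) for toks in docs.values()) / n
    df = Counter()  # document frequency per term
    for toks in docs.values():
        df.update(set(toks))
    scores = {}
    for cid, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(toks) / avgdl))
        scores[cid] = score
    return sorted(scores, key=scores.get, reverse=True)

# Toy candidate list in the style of NCBI taxonomy entries (illustrative only).
candidates = {
    "txid9606": "homo sapiens human",
    "txid10090": "mus musculus house mouse",
    "txid10116": "rattus norvegicus norway rat",
}
ranked = bm25_rank("house mouse", candidates)
```

In the paper's pipeline, the top-k identifiers from such a ranker would then be re-scored by a BERT-based pairwise model before the final linking decision.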
BERT Based Clinical Knowledge Extraction for Biomedical Knowledge Graph Construction and Analysis
Background: Knowledge is evolving over time, often as a result of new
discoveries or changes in the adopted methods of reasoning. Also, new facts or
evidence may become available, leading to new understandings of complex
phenomena. This is particularly true in the biomedical field, where scientists
and physicians are constantly striving to find new methods of diagnosis,
treatment and, eventually, a cure. Knowledge Graphs (KGs) offer an effective way
organizing and retrieving the massive and growing amount of biomedical
knowledge.
Objective: We propose an end-to-end approach for knowledge extraction and
analysis from biomedical clinical notes using the Bidirectional Encoder
Representations from Transformers (BERT) model and Conditional Random Field
(CRF) layer.
Methods: The approach is based on knowledge graphs, which can effectively
process abstract biomedical concepts such as relationships and interactions
between medical entities. Besides offering an intuitive way to visualize these
concepts, KGs can solve more complex knowledge retrieval problems by
simplifying them into simpler representations or by transforming the problems
into representations from different perspectives. We created a biomedical
Knowledge Graph using Natural Language Processing models for named entity
recognition and relation extraction. The generated biomedical KGs are then
used for question answering.
Results: The proposed framework can successfully extract relevant structured
information with high accuracy (90.7% for Named-entity recognition (NER), 88%
for relation extraction (RE)), according to experimental findings based on
505 patients' real-world unstructured biomedical clinical notes.
Conclusions: In this paper, we propose a novel end-to-end system for the
construction of a biomedical knowledge graph from clinical text using a
variation of BERT models.
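Once the NER and relation-extraction stages have produced triples, the question-answering step described above reduces to indexed lookup over the graph. A toy sketch (the entities and relations below are illustrative, not from the paper's clinical notes):

```python
# Triples as would be emitted by the NER + relation-extraction stage
# (all entities and relations here are hypothetical examples).
triples = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "interacts_with", "warfarin"),
    ("metformin", "treats", "type 2 diabetes"),
]

def build_kg(triples):
    """Index triples by (subject, relation) for simple question answering."""
    kg = {}
    for subj, rel, obj in triples:
        kg.setdefault((subj, rel), []).append(obj)
    return kg

def answer(kg, subject, relation):
    """Answer 'what does <subject> <relation>?' by direct KG lookup."""
    return kg.get((subject, relation), [])

kg = build_kg(triples)
```

A real system would first map a natural-language question onto such a (subject, relation) pair; that mapping is the part the BERT-based models handle.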
Data-driven information extraction and enrichment of molecular profiling data for cancer cell lines
Motivation
With the proliferation of research methods and computational methodologies, published biomedical literature is growing exponentially in number and volume. Cancer cell lines are frequently used models in biological and medical research that are currently applied for a wide range of purposes, from studies of cellular mechanisms to drug development, which has led to a wealth of related data and publications. Sifting through large quantities of text to gather relevant information on cell lines of interest is tedious and extremely slow when performed by humans. Hence, novel computational information extraction and correlation mechanisms are required to boost meaningful knowledge extraction.
Results
In this work, we present the design, implementation, and application of a novel data extraction and exploration system. This system extracts deep semantic relations between textual entities from scientific literature to enrich existing structured clinical data concerning cancer cell lines. We introduce a new public data exploration portal, which enables automatic linking of genomic copy number variant plots with ranked, related entities such as affected genes. Each relation is accompanied by literature-derived evidence, allowing for deep, yet rapid, literature search, using existing structured data as a springboard.
Availability and implementation
Our system is publicly available on the web at https://cancercelllines.org
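The ranking of related entities described above can be sketched as counting how many literature-derived evidence items support each gene for a given variant. This is a toy illustration; the variant label, gene names, and counts are hypothetical, not data from cancercelllines.org.

```python
from collections import Counter

# Literature-derived evidence: each tuple links a cell-line CNV region to a
# gene mentioned alongside it (all names and counts are hypothetical).
evidence = [
    ("HeLa:chr8_gain", "MYC"),
    ("HeLa:chr8_gain", "MYC"),
    ("HeLa:chr8_gain", "POU5F1B"),
    ("HeLa:chr8_gain", "MYC"),
]

def rank_related_genes(evidence, variant):
    """Rank genes linked to a variant by the number of supporting evidence items."""
    counts = Counter(gene for var, gene in evidence if var == variant)
    return counts.most_common()

ranked = rank_related_genes(evidence, "HeLa:chr8_gain")
```

Attaching the underlying evidence sentences to each ranked gene is what lets the portal serve as a springboard into the literature.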
COVID-19 datasets: a brief overview
The outbreak of the COVID-19 pandemic has affected lives and socio-economic development around the world. The impact of the pandemic has motivated researchers from different domains to find effective solutions to diagnose, prevent, and forecast the pandemic and relieve its adverse effects. Numerous COVID-19 datasets have been built from these studies and are available to the public. These datasets can be used for disease diagnosis and case prediction, speeding up the solution of problems caused by the pandemic. To meet the needs of researchers to understand the various COVID-19 datasets, we examine them and provide an overview. We organise the majority of these datasets into three categories based on their applications, i.e., time-series, knowledge-base, and media-based datasets. Organising COVID-19 datasets into appropriate categories can help researchers keep their focus on methodology rather than on the datasets themselves. In addition, applications and COVID-19 datasets suffer from a series of problems, such as privacy and quality. We discuss these issues as well as the potential of COVID-19 datasets. © 2022, ComSIS Consortium. All rights reserved.
Tacit knowledge elicitation process for Industry 4.0
Manufacturers are migrating their processes to Industry 4.0, which includes new technologies for improving the productivity and efficiency of operations. One of the issues is capturing, recreating, and documenting the tacit knowledge of aging workers. However, there are no systematic procedures to incorporate this knowledge into Enterprise Resource Planning systems and maintain a competitive advantage. This paper describes a solution proposal for a tacit knowledge elicitation process that captures the operational best practices of experienced workers in industrial domains, based on a mix of algorithmic techniques and a cooperative game. We use domain ontologies for Industry 4.0 and reasoning techniques to discover and integrate new facts from textual sources into an Operational Knowledge Graph. We describe an iterative concept-formation process in a role game played by human and virtual agents through socialization and externalization for knowledge graph refinement. Ethical and societal concerns are discussed as well.
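The "reasoning techniques to discover and integrate new facts" mentioned above can be illustrated with a minimal forward-chaining rule over graph triples. This is a generic sketch, not the paper's method: it applies only a transitivity rule over a single relation, and all facts below are invented.

```python
# Toy forward chaining over an Operational Knowledge Graph: the rule
# x part_of y, y part_of z  =>  x part_of z  derives new facts.
# (All entities here are hypothetical examples.)
facts = {
    ("sensor_a", "part_of", "assembly_line_1"),
    ("assembly_line_1", "part_of", "plant_munich"),
}

def infer_transitive(facts, relation="part_of"):
    """Apply the transitivity rule until no new triples can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        new = {
            (x, relation, z)
            for (x, r1, y1) in derived if r1 == relation
            for (y2, r2, z) in derived if r2 == relation and y1 == y2
        }
        if not new <= derived:
            derived |= new
            changed = True
    return derived

kg = infer_transitive(facts)
```

Ontology-backed systems generalise this idea: the domain ontology supplies the rules (subsumption, part-whole, role chains), and the reasoner saturates the graph with the entailed facts.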