544,454 research outputs found

Recommended from our members

Visualizing latent domain knowledge

Author: Chen C
Kuljis J
Paul RJ
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2001

Knowledge discovery and data mining commonly rely on finding salient patterns of association from a vast amount of data. Traditional citation analysis of scientific literature draws insights from strong citation patterns. Latent domain knowledge, in contrast to the mainstream domain knowledge, often consists of highly relevant but relatively infrequently cited scientific works. Visualizing latent domain knowledge presents a significant challenge to knowledge discovery and quantitative studies of science. We build upon a citation-based knowledge visualization procedure and develop an approach that not only captures knowledge structures from prominent and highly cited works, but also traces latent domain knowledge through low-frequency citation chains. We apply this approach to two cases: (1) identifying cross-domain applications of Pathfinder networks (PFNETs) and (2) clarifying the current status of scientific inquiry of a possible link between Bovine spongiform encephalopathy (BSE), also known as mad cow disease, and a new variant Creutzfeldt-Jakob disease (vCJD), a type of brain disease in humans.

Brunel University Research Archive

Drexel Libraries E-Repository and Archives

Unsupervised word embeddings capture latent knowledge from materials science literature.

Author: Ceder Gerbrand
Dagdelen John
Dunn Alexander
Jain Anubhav
Kononova Olga
Persson Kristin A
Rong Ziqin
Tshitoyan Vahe
Weston Leigh
Publication venue: eScholarship, University of California
Publication date: 01/07/2019

The overwhelming majority of scientific knowledge is published as text, which is difficult to analyse by either traditional statistical analysis or modern machine learning methods. By contrast, the main source of machine-interpretable data for the materials research community has come from structured property databases [1,2], which encompass only a small fraction of the knowledge present in the research literature. Beyond property values, publications contain valuable knowledge regarding the connections and relationships between data items as interpreted by the authors. To improve the identification and use of this knowledge, several studies have focused on the retrieval of information from scientific literature using supervised natural language processing [3-10], which requires large hand-labelled datasets for training. Here we show that materials science knowledge present in the published literature can be efficiently encoded as information-dense word embeddings [11-13] (vector representations of words) without human labelling or supervision. Without any explicit insertion of chemical knowledge, these embeddings capture complex materials science concepts such as the underlying structure of the periodic table and structure-property relationships in materials. Furthermore, we demonstrate that an unsupervised method can recommend materials for functional applications several years before their discovery. This suggests that latent knowledge regarding future discoveries is to a large extent embedded in past publications. Our findings highlight the possibility of extracting knowledge and relationships from the massive body of scientific literature in a collective manner, and point towards a generalized approach to the mining of scientific literature.
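As a rough illustration of how word embeddings can be used to recommend candidates, the sketch below ranks terms by cosine similarity to a query term. It is not the authors' pipeline: the three-dimensional vectors, the term names (`material_a`, etc.) and the helper functions are hypothetical, and real embeddings would be trained on a literature corpus rather than typed in by hand.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction; treat as dissimilar
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, embeddings):
    """Return (term, score) pairs sorted from most to least similar to `query`."""
    query_vec = embeddings[query]
    scores = [
        (term, cosine_similarity(query_vec, vec))
        for term, vec in embeddings.items()
        if term != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings, made up for illustration only.
TOY_EMBEDDINGS = {
    "thermoelectric": [0.9, 0.1, 0.0],
    "material_a": [0.8, 0.2, 0.1],
    "material_b": [0.1, 0.9, 0.2],
    "material_c": [0.0, 0.1, 0.9],
}

ranking = rank_by_similarity("thermoelectric", TOY_EMBEDDINGS)
for term, score in ranking:
    print(f"{term}: {score:.3f}")
```

In this toy setup `material_a` ranks first because its vector points in nearly the same direction as the query term; with trained embeddings, the same ranking step would surface terms that appear in similar textual contexts.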

eScholarship - University of California

Literature Based Discovery (LBD): Towards Hypothesis Generation and Knowledge Discovery in Biomedical Text Mining

Author: Bhasuran Balu
Murugesan Gurusamy
Natarajan Jeyakumar
Publication date: 03/10/2023

Biomedical knowledge is growing at an astounding pace, and the majority of it is represented in scientific publications. Text mining tools and methods represent automatic approaches for extracting hidden patterns and trends from this semi-structured and unstructured data. In biomedical text mining, Literature Based Discovery (LBD) is the process of automatically discovering novel associations between medical terms otherwise mentioned in disjoint literature sets. LBD approaches have proven successful in reducing the discovery time of potential associations that are hidden in the vast amount of scientific literature. The process focuses on creating concept profiles for medical terms such as a disease or symptom and connecting them with a drug and treatment based on the statistical significance of the shared profiles. This knowledge discovery approach, introduced in 1989, remains a core task in text mining. Currently, two approaches based on the ABC principle, namely open discovery and closed discovery, are the most explored in the LBD process. This review starts with a general introduction to text mining, followed by biomedical text mining, and introduces various literature resources such as MEDLINE, UMLS, MESH, and SemMedDB. This is followed by a brief introduction to the core ABC principle and its two associated approaches, open discovery and closed discovery, in the LBD process. The review also discusses deep learning applications in LBD by reviewing the role of transformer models and neural network based LBD models and their future aspects. Finally, it reviews the key biomedical discoveries generated through LBD approaches in biomedicine and concludes with the current limitations and future directions of LBD.
Comment: 43 Pages, 5 Figures, 4 Table
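The ABC principle and its open and closed variants can be sketched with plain set operations over term co-occurrence. This is a minimal illustration under stated assumptions, not a method from the review: each document is reduced to a set of terms, the term names (`drug_x`, `mechanism_1`, `disease_y`, ...) are hypothetical, and no statistical scoring of the shared profiles is attempted.

```python
from collections import defaultdict


def build_cooccurrence(documents):
    """Map each term to the set of terms it co-occurs with in any document."""
    neighbours = defaultdict(set)
    for terms in documents:
        for term in terms:
            neighbours[term].update(terms - {term})
    return neighbours


def open_discovery(a, neighbours):
    """Start from A only: find C terms never seen with A, plus the linking B terms."""
    direct = neighbours.get(a, set())
    candidates = defaultdict(set)
    for b in direct:
        for c in neighbours.get(b, set()):
            if c != a and c not in direct:
                candidates[c].add(b)
    return dict(candidates)


def closed_discovery(a, c, neighbours):
    """Start from A and C: return the B terms that co-occur with both."""
    return neighbours.get(a, set()) & neighbours.get(c, set())


# Hypothetical toy corpus: each document is the set of terms it mentions.
DOCUMENTS = [
    {"drug_x", "mechanism_1"},
    {"drug_x", "mechanism_2"},
    {"mechanism_1", "disease_y"},
    {"mechanism_2", "disease_y"},
    {"mechanism_2", "disease_z"},
]

graph = build_cooccurrence(DOCUMENTS)
open_result = open_discovery("drug_x", graph)
closed_result = closed_discovery("drug_x", "disease_y", graph)
print(open_result)
print(closed_result)
```

Here `drug_x` never appears in the same document as `disease_y` or `disease_z`, so open discovery proposes both as candidate C terms, while closed discovery explains the `drug_x` and `disease_y` pair through the two shared mechanisms. A practical system would rank candidates, for example by the number or statistical significance of the linking B terms.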

arXiv.org e-Print Archive

Knowledge Management for Biomedical Literature: The Function of Text-Mining Technologies in Life-Science Research

Publication venue: INTERNATIONAL ASSOCIATION OF TECHNOLOGY, EDUCATION AND DEVELOPMENT (IATED)
Publication date: 01/01/2008

Efficient information retrieval and extraction is a major challenge in life-science research. Knowledge Management (KM) for biomedical literature aims to establish an environment that uses information technologies to facilitate better acquisition, generation, codification, and transfer of knowledge. Knowledge Discovery in Text (KDT) is one of the goals of KM: finding hidden information in the literature by exploring the internal structure of the knowledge network created by the textual information. Knowledge discovery could be of major help in the discovery of indirect relationships, which might imply new scientific discoveries. Text-mining provides methods and technologies to retrieve and extract information contained in free text automatically. Moreover, it enables analysis of large collections of unstructured documents for the purpose of extracting interesting and non-trivial patterns of knowledge. Biomedical text-mining is organized in the following steps: identification of biological entities, identification of biological relations, and classification of entity relations. Here, we discuss the challenges and function of biomedical text-mining in KM for biomedical literature.

Neural networks for open and closed Literature-based Discovery

Author: Baker Simon
Crichton Gamal
Guo Yufan
Korhonen Anna
Publication venue: PLOS ONE
Publication date: 01/01/2020

Funder: Cambridge Commonwealth, European and International Trust; funder-id: http://dx.doi.org/10.13039/501100003343
Funder: St. Edmund’s College, University of Cambridge; funder-id: http://dx.doi.org/10.13039/501100005705

Literature-based Discovery (LBD) aims to discover new knowledge automatically from large collections of literature. Scientific literature is growing at an exponential rate, making it difficult for researchers to stay current in their discipline and easy to miss knowledge necessary to advance their research. LBD can facilitate hypothesis testing and generation and thus accelerate scientific progress. Neural networks have demonstrated improved performance on LBD-related tasks but are yet to be applied to it. We propose four graph-based, neural network methods to perform open and closed LBD. We compared our methods with those used by the state-of-the-art LION LBD system on the same evaluations to replicate recently published findings in cancer biology. We also applied them to a time-sliced dataset of human-curated peer-reviewed biological interactions. These evaluations and the metrics they employ represent performance on real-world knowledge advances and are thus robust indicators of approach efficacy. In the first experiments, our best methods performed 2-4 times better than the baselines in closed discovery and 2-3 times better in open discovery. In the second, our best methods performed almost 2 times better than the baselines in open discovery. These results are strong indications that neural LBD is potentially a very effective approach for generating new scientific discoveries from existing literature. The code for our models and other information can be found at: https://github.com/cambridgeltl/nn_for_LBD

Directory of Open Access Journals

Apollo (Cambridge)

Large Language Models are Zero Shot Hypothesis Proposers

Author: Chen Zhang-Ren
Li Haoxiang
Qi Biqing
Tian Kai
Zeng Sihang
Zhang Kaiyan
Zhou Bowen
Publication date: 10/11/2023

Significant scientific discoveries have driven the progress of human civilisation. The explosion of scientific literature and data has created information barriers across disciplines that have slowed the pace of scientific discovery. Large Language Models (LLMs) hold a wealth of global and interdisciplinary knowledge that promises to break down these information barriers and foster a new wave of scientific discovery. However, the potential of LLMs for scientific discovery has not been formally explored. In this paper, we start by investigating whether LLMs can propose scientific hypotheses. To this end, we construct a dataset consisting of background knowledge and hypothesis pairs from biomedical literature. The dataset is divided into training, seen, and unseen test sets based on the publication date to control visibility. We subsequently evaluate the hypothesis generation capabilities of various top-tier instructed models in zero-shot, few-shot, and fine-tuning settings, including both closed and open-source LLMs. Additionally, we introduce an LLM-based multi-agent cooperative framework with different role designs and external tools to enhance the capabilities related to generating hypotheses. We also design four metrics through a comprehensive review to evaluate the generated hypotheses for both ChatGPT-based and human evaluations. Through experiments and analyses, we arrive at the following findings: 1) LLMs surprisingly generate untrained yet validated hypotheses from testing literature. 2) Increasing uncertainty facilitates candidate generation, potentially enhancing zero-shot hypothesis generation capabilities. These findings strongly support the potential of LLMs as catalysts for new scientific discoveries and guide further exploration.
Comment: Instruction Workshop @ NeurIPS 202
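One possible reading of the date-based split described above can be sketched as follows. The abstract does not say how the training and seen test sets are separated, so everything here is an assumption for illustration: the cutoff date, the `published` field name, and the rule that holds out every n-th pre-cutoff record as the seen test set are all hypothetical.

```python
from datetime import date


def split_by_date(records, cutoff, holdout_every=5):
    """Split records into (train, seen_test, unseen_test) by publication date.

    Records published on or after `cutoff` form the unseen test set.
    Of the earlier records, every `holdout_every`-th one (in date order)
    goes to the seen test set; this holdout rule is an assumption.
    """
    ordered = sorted(records, key=lambda record: record["published"])
    train, seen_test, unseen_test = [], [], []
    before_cutoff = 0
    for record in ordered:
        if record["published"] >= cutoff:
            unseen_test.append(record)
            continue
        before_cutoff += 1
        if before_cutoff % holdout_every == 0:
            seen_test.append(record)
        else:
            train.append(record)
    return train, seen_test, unseen_test


# Hypothetical toy records; real ones would carry background and hypothesis text.
RECORDS = [
    {"id": 1, "published": date(2021, 3, 1)},
    {"id": 2, "published": date(2021, 9, 15)},
    {"id": 3, "published": date(2022, 1, 10)},
    {"id": 4, "published": date(2022, 6, 30)},
    {"id": 5, "published": date(2022, 11, 5)},
    {"id": 6, "published": date(2022, 12, 20)},
    {"id": 7, "published": date(2023, 2, 1)},
    {"id": 8, "published": date(2023, 8, 12)},
]

train, seen_test, unseen_test = split_by_date(
    RECORDS, cutoff=date(2023, 1, 1), holdout_every=3
)
print(len(train), len(seen_test), len(unseen_test))
```

Splitting on a date rather than at random is what lets the unseen test set contain only material published after the cutoff, which is the visibility control the abstract refers to.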

arXiv.org e-Print Archive

Literature Mining for the Discovery of Hidden Connections between Drugs, Genes and Diseases

Author: AA Morgan
AC Nicholson
AJ Perez
Andrey Rzhetsky
AP Weetman
B Dell'Osso
B Rapoport
B Vaidya
BA Imhof
BT Alako
C Blaschke
C Nielsen
C Puozzo
CJ McDougle
CR Faltynek
D Chaussabel
D Denys
D Hristovski
D Olive
D Shao
DB Kell
DR Swanson
E Yung
EC Butcher
GR Hajer
H Kakeya
H Shatkay
HP Fischer
I Kola
J Han
J Kuhlmann
JA Wagner
Jacob de Vlieg
JD Wren
K Kajinami
K Miguita
K Njung'e
K Tomiyama
K Vandenborre
L Prokunina
LJ Jensen
M Briley
M Campillos
M Hayashi
M Imoto
M Inazu
M Kamata
M Sugiyama
M Yetisgen-Yildiz
MA Andrade
Marianne van Vugt
N Daraselia
NR Smalheiser
PD Pelton
PR Newby
R Frijters
R Homayouni
R Jelier
RA DiGiacomo
Raoul Frijters
René van Schaik
Ruben Smeets
RY Mukhtar
S Gordon
S Morikawa
S Raychaudhuri
SN Vaishnavi
SS Fuller
T Fawcett
T Hiramatsu
T Ito
T Shokawa
T Tabata
TK Jenssen
TT Ashburn
U Kaneyuki
WA Colburn
WK Goodman
Wynand Alkema
Y Ichimaru
Y Sugimoto
Y Tamori
Publication venue: Public Library of Science
Publication date: 01/01/2010

The scientific literature represents a rich source for retrieval of knowledge on associations between biomedical concepts such as genes, diseases and cellular processes. A commonly used method to establish relationships between biomedical concepts from literature is co-occurrence. Apart from its use in knowledge retrieval, the co-occurrence method is also well-suited to discover new, hidden relationships between biomedical concepts following a simple ABC-principle, in which A and C have no direct relationship, but are connected via shared B-intermediates. In this paper we describe CoPub Discovery, a tool that mines the literature for new relationships between biomedical concepts. Statistical analysis using ROC curves showed that CoPub Discovery performed well over a wide range of settings and keyword thesauri. We subsequently used CoPub Discovery to search for new relationships between genes, drugs, pathways and diseases. Several of the newly found relationships were validated using independent literature sources. In addition, new predicted relationships between compounds and cell proliferation were validated and confirmed experimentally in an in vitro cell proliferation assay. The results show that CoPub Discovery is able to identify novel associations between genes, drugs, pathways and diseases that have a high probability of being biologically valid. This makes CoPub Discovery a useful tool to unravel the mechanisms behind disease, to find novel drug targets, or to find novel applications for existing drugs

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

Radboud Repository