Visualizing latent domain knowledge
Knowledge discovery and data mining commonly rely on finding salient patterns of association from a vast amount of data. Traditional citation analysis of scientific literature draws insights from strong citation patterns. Latent domain knowledge, in contrast to the mainstream domain knowledge, often consists of highly relevant but relatively infrequently cited scientific works. Visualizing latent domain knowledge presents a significant challenge to knowledge discovery and quantitative studies of science. We build upon a citation-based knowledge visualization procedure and develop an approach that not only captures knowledge structures from prominent and highly cited works, but also traces latent domain knowledge through low-frequency citation chains. We apply this approach to two cases: (1) identifying cross-domain applications of Pathfinder networks (PFNETs) and (2) clarifying the current status of scientific inquiry into a possible link between bovine spongiform encephalopathy (BSE), also known as mad cow disease, and new variant Creutzfeldt-Jakob disease (vCJD), a type of brain disease in humans.
Unsupervised word embeddings capture latent knowledge from materials science literature.
The overwhelming majority of scientific knowledge is published as text, which is difficult to analyse by either traditional statistical analysis or modern machine learning methods. By contrast, the main source of machine-interpretable data for the materials research community has come from structured property databases [1,2], which encompass only a small fraction of the knowledge present in the research literature. Beyond property values, publications contain valuable knowledge regarding the connections and relationships between data items as interpreted by the authors. To improve the identification and use of this knowledge, several studies have focused on the retrieval of information from scientific literature using supervised natural language processing [3-10], which requires large hand-labelled datasets for training. Here we show that materials science knowledge present in the published literature can be efficiently encoded as information-dense word embeddings [11-13] (vector representations of words) without human labelling or supervision. Without any explicit insertion of chemical knowledge, these embeddings capture complex materials science concepts such as the underlying structure of the periodic table and structure-property relationships in materials. Furthermore, we demonstrate that an unsupervised method can recommend materials for functional applications several years before their discovery. This suggests that latent knowledge regarding future discoveries is to a large extent embedded in past publications. Our findings highlight the possibility of extracting knowledge and relationships from the massive body of scientific literature in a collective manner, and point towards a generalized approach to the mining of scientific literature.
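As a rough illustration of how word embeddings could be used to recommend materials for an application, the sketch below ranks candidate words by cosine similarity to a query word. The vectors, the word names (`target_application`, `material_a`, and so on), and the helper `rank_candidates` are all invented for this example; the abstract does not specify the ranking procedure, so this is one plausible reading, not the authors' method.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# Real embeddings would be learned from a text corpus and have far more dimensions.
EMBEDDINGS = {
    "target_application": [0.9, 0.1, 0.2],
    "material_a": [0.8, 0.2, 0.1],
    "material_b": [0.7, 0.3, 0.3],
    "material_c": [0.1, 0.9, 0.4],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(query, candidates, embeddings):
    """Sort candidate words by similarity to the query word, most similar first."""
    q = embeddings[query]
    scored = [(name, cosine(q, embeddings[name])) for name in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranking = rank_candidates(
    "target_application", ["material_c", "material_b", "material_a"], EMBEDDINGS
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy vectors, `material_a` ranks first and `material_c` last, simply because that is how the invented numbers were chosen.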
Literature Based Discovery (LBD): Towards Hypothesis Generation and Knowledge Discovery in Biomedical Text Mining
Biomedical knowledge is growing at an astounding pace, and the majority of this
knowledge is represented as scientific publications. Text mining tools and
methods provide automatic approaches for extracting hidden patterns and
trends from this semi-structured and unstructured data. In biomedical text
mining, Literature Based Discovery (LBD) is the process of automatically
discovering novel associations between medical terms otherwise mentioned in
disjoint literature sets. LBD approaches have proven successful in reducing the
discovery time of potential associations that are hidden in the vast amount of
scientific literature. The process focuses on creating concept profiles for
medical terms such as a disease or symptom and connecting them with a drug and
treatment based on the statistical significance of the shared profiles. This
knowledge discovery approach, introduced in 1989, remains a core task in
text mining. Currently, two approaches based on the ABC principle, namely open
discovery and closed discovery, are the most explored in the LBD process. This
review starts with a general introduction to text mining, followed by biomedical
text mining, and introduces various literature resources such as MEDLINE, UMLS,
MeSH, and SemMedDB. This is followed by a brief introduction to the core ABC
principle and its two associated approaches, open discovery and closed discovery,
in the LBD process. The review also discusses deep learning applications in LBD
by reviewing the role of transformer models and neural-network-based LBD models
and their future prospects. Finally, it reviews the key biomedical discoveries
generated through LBD approaches and concludes with the current
limitations and future directions of LBD.
Comment: 43 Pages, 5 Figures, 4 Table
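The two ABC-based approaches named in this abstract can be sketched in a few lines. The example below is a minimal sketch over a toy link graph: the concept names, the `LINKS` table, and both function names are hypothetical, and real LBD systems weight links statistically rather than treating them as plain set membership.

```python
# Hypothetical direct links reported in the literature (toy data).
LINKS = {
    "disease_x": {"symptom_1", "symptom_2"},
    "symptom_1": {"disease_x", "drug_y"},
    "symptom_2": {"disease_x", "drug_y", "drug_z"},
    "drug_y": {"symptom_1", "symptom_2"},
    "drug_z": {"symptom_2"},
}


def open_discovery(a, links):
    """Open discovery: start from A alone and follow A-B, then B-C links.

    Returns a dict mapping each candidate C (not directly linked to A)
    to the set of intermediate B terms that connect it to A.
    """
    direct = links.get(a, set())
    candidates = {}
    for b in direct:
        for c in links.get(b, set()):
            if c != a and c not in direct:
                candidates.setdefault(c, set()).add(b)
    return candidates


def closed_discovery(a, c, links):
    """Closed discovery: A and C are both given; return the shared B terms."""
    return links.get(a, set()) & links.get(c, set())


print(open_discovery("disease_x", LINKS))
print(closed_discovery("disease_x", "drug_z", LINKS))
```

Open discovery generates hypotheses (which C might relate to A?), while closed discovery explains a hypothesis that is already on the table (which B terms might connect A and C?).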
Knowledge Management for Biomedical Literature: The Function of Text-Mining Technologies in Life-Science Research
Efficient information retrieval and extraction is a major challenge in life-science research. Knowledge Management (KM) for biomedical literature aims to establish an environment, utilizing information technologies, to facilitate better acquisition, generation, codification, and transfer of knowledge. Knowledge Discovery in Text (KDT) is one of the goals of KM: finding hidden information in the literature by exploring the internal structure of the knowledge network created by the textual information. Knowledge discovery could be a major help in the discovery of indirect relationships, which might imply new scientific discoveries. Text-mining provides methods and technologies to retrieve and extract information contained in free text automatically. Moreover, it enables analysis of large collections of unstructured documents for the purposes of extracting interesting and non-trivial patterns of knowledge. Biomedical text-mining is organized into the following stages: identification of biological entities, identification of biological relations, and classification of entity relations. Here, we discuss the challenges and function of biomedical text-mining in KM for biomedical literature.
Neural networks for open and closed Literature-based Discovery
Literature-based Discovery (LBD) aims to discover new knowledge automatically from large collections of literature. Scientific literature is growing at an exponential rate, making it difficult for researchers to stay current in their discipline and easy to miss knowledge necessary to advance their research. LBD can facilitate hypothesis testing and generation and thus accelerate scientific progress. Neural networks have demonstrated improved performance on LBD-related tasks but are yet to be applied to it. We propose four graph-based, neural network methods to perform open and closed LBD. We compared our methods with those used by the state-of-the-art LION LBD system on the same evaluations to replicate recently published findings in cancer biology. We also applied them to a time-sliced dataset of human-curated peer-reviewed biological interactions. These evaluations and the metrics they employ represent performance on real-world knowledge advances and are thus robust indicators of approach efficacy. In the first experiments, our best methods performed 2-4 times better than the baselines in closed discovery and 2-3 times better in open discovery. In the second, our best methods performed almost 2 times better than the baselines in open discovery. These results are strong indications that neural LBD is potentially a very effective approach for generating new scientific discoveries from existing literature. The code for our models and other information can be found at: https://github.com/cambridgeltl/nn_for_LBD
Funder: Cambridge Commonwealth, European and International Trust; funder-id: http://dx.doi.org/10.13039/501100003343
Funder: St. Edmund’s College, University of Cambridge; funder-id: http://dx.doi.org/10.13039/501100005705
Large Language Models are Zero Shot Hypothesis Proposers
Significant scientific discoveries have driven the progress of human
civilisation. The explosion of scientific literature and data has created
information barriers across disciplines that have slowed the pace of scientific
discovery. Large Language Models (LLMs) hold a wealth of global and
interdisciplinary knowledge that promises to break down these information
barriers and foster a new wave of scientific discovery. However, the potential
of LLMs for scientific discovery has not been formally explored. In this paper,
we start by investigating whether LLMs can propose scientific hypotheses. To
this end, we construct a dataset consisting of background knowledge and hypothesis
pairs from biomedical literature. The dataset is divided into training, seen,
and unseen test sets based on the publication date to control visibility. We
subsequently evaluate the hypothesis generation capabilities of various
top-tier instructed models in zero-shot, few-shot, and fine-tuning settings,
including both closed and open-source LLMs. Additionally, we introduce an
LLM-based multi-agent cooperative framework with different role designs and
external tools to enhance the capabilities related to generating hypotheses. We
also design four metrics through a comprehensive review to evaluate the
generated hypotheses for both ChatGPT-based and human evaluations. Through
experiments and analyses, we arrive at the following findings: 1) LLMs
surprisingly generate untrained yet validated hypotheses from testing
literature. 2) Increasing uncertainty facilitates candidate generation,
potentially enhancing zero-shot hypothesis generation capabilities. These
findings strongly support the potential of LLMs as catalysts for new scientific
discoveries and guide further exploration.
Comment: Instruction Workshop @ NeurIPS 202
Literature Mining for the Discovery of Hidden Connections between Drugs, Genes and Diseases
The scientific literature represents a rich source for retrieval of knowledge on associations between biomedical concepts such as genes, diseases and cellular processes. A commonly used method to establish relationships between biomedical concepts from literature is co-occurrence. Apart from its use in knowledge retrieval, the co-occurrence method is also well-suited to discover new, hidden relationships between biomedical concepts following a simple ABC-principle, in which A and C have no direct relationship, but are connected via shared B-intermediates. In this paper we describe CoPub Discovery, a tool that mines the literature for new relationships between biomedical concepts. Statistical analysis using ROC curves showed that CoPub Discovery performed well over a wide range of settings and keyword thesauri. We subsequently used CoPub Discovery to search for new relationships between genes, drugs, pathways and diseases. Several of the newly found relationships were validated using independent literature sources. In addition, new predicted relationships between compounds and cell proliferation were validated and confirmed experimentally in an in vitro cell proliferation assay. The results show that CoPub Discovery is able to identify novel associations between genes, drugs, pathways and diseases that have a high probability of being biologically valid. This makes CoPub Discovery a useful tool to unravel the mechanisms behind disease, to find novel drug targets, or to find novel applications for existing drugs
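To make the co-occurrence idea concrete, the sketch below counts keyword pairs that appear in the same document and then lists pairs that never co-occur but share at least one intermediate. It is a sketch under stated assumptions, not CoPub Discovery's algorithm: the documents, keyword names, and both functions are hypothetical, and no statistical scoring is applied.

```python
from collections import Counter
from itertools import combinations

# Each toy "document" is the set of thesaurus keywords found in it (invented data).
DOCS = [
    {"gene_a", "process_b"},
    {"gene_a", "process_b"},
    {"process_b", "disease_c"},
    {"gene_a", "pathway_d"},
    {"pathway_d", "disease_c"},
    {"disease_c", "drug_e"},
]


def cooccurrence_counts(docs):
    """Count how many documents mention each keyword pair (keys are sorted tuples)."""
    counts = Counter()
    for keywords in docs:
        for pair in combinations(sorted(keywords), 2):
            counts[pair] += 1
    return counts


def hidden_pairs(docs):
    """Find (A, C) pairs with no direct co-occurrence but at least one shared B."""
    counts = cooccurrence_counts(docs)
    neighbours = {}
    for x, y in counts:
        neighbours.setdefault(x, set()).add(y)
        neighbours.setdefault(y, set()).add(x)
    result = {}
    for a, c in combinations(sorted(neighbours), 2):
        if (a, c) in counts:
            continue  # A and C already co-occur, so the link is not hidden
        shared = neighbours[a] & neighbours[c]
        if shared:
            result[(a, c)] = sorted(shared)
    return result


for pair, intermediates in hidden_pairs(DOCS).items():
    print(pair, "via", intermediates)
```

A practical system would replace the raw counts with a significance measure and rank the hidden pairs accordingly; the plain set intersection here only shows where candidate links come from.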