908 research outputs found
Mining hidden connections among biomedical concepts from disjoint biomedical literature sets through semantic-based association rule
Paper accepted for publication in Journal of Information Systems. Retrieved 6/26/2006 from http://www.ischool.drexel.edu/faculty/thu/My%20Publication/Journal-papers/JIS_hu2006.pdf. The novel connection between Raynaud disease and fish oils was
uncovered from two disjointed biomedical literature sets by Swanson in 1986.
Since then, there have been many approaches to uncover novel connections
by mining the biomedical literature. One of the popular approaches is to adapt
the Association Rule (AR) method to automatically identify implicit novel
connections between concept A and concept C from two disjointed sets of
documents through an intermediate concept B. Since the A and C concepts do not
occur together in the same data set, the mining goal is to find novel connections
among A and C concepts in the disjoint data sets. It first applies association rule mining
to the two disjointed biomedical literature sets separately to generate two rule
sets (A→B, B→C), and then applies the transitive law to get the novel connections
A→C. However, this approach generates a huge number of possible
connections among the millions of biomedical concepts and a lot of these
hypothetical connections are spurious, useless and/or biologically meaningless.
Thus it is essential to develop a new approach to generate highly likely novel and
biologically relevant connections among the biomedical concepts. This paper
presents a Biomedical Semantic-based Association Rule System (Bio-SARS)
that significantly reduces spurious/useless/biologically irrelevant connections
through semantic filtering. Compared to other approaches such as LSI and the
traditional association rule-based approach, our approach generates far fewer
rules, and many of these rules represent relevant connections among biological
concepts.
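The pipeline described in this abstract (mine A→B and B→C rules separately, chain them transitively, then filter by semantics) can be sketched roughly as follows. This is a minimal illustration, not the Bio-SARS implementation: the function names `mine_rules` and `chain_rules`, the thresholds, the toy documents, and the semantic-type labels are all assumptions made for the example, and only pairwise rules are mined.

```python
from collections import Counter
from itertools import combinations


def mine_rules(docs, min_support=2, min_conf=0.5):
    """Mine pairwise rules x -> y from docs (each doc is a set of concepts)."""
    item_count, pair_count = Counter(), Counter()
    for doc in docs:
        concepts = sorted(set(doc))
        item_count.update(concepts)
        pair_count.update(combinations(concepts, 2))
    rules = {}
    for (x, y), n in pair_count.items():
        if n < min_support:
            continue
        # Try both directions; confidence = support(x, y) / support(lhs).
        for lhs, rhs in ((x, y), (y, x)):
            conf = n / item_count[lhs]
            if conf >= min_conf:
                rules[(lhs, rhs)] = conf
    return rules


def chain_rules(rules_ab, rules_bc, a_term, semantic_type, allowed_types=None):
    """Combine A->B and B->C rules into candidate A->C links.

    If allowed_types is given, intermediate B concepts whose semantic
    type is not in that set are dropped (the semantic filter).
    """
    links = {}
    for (a, b) in rules_ab:
        if a != a_term:
            continue
        if allowed_types is not None and semantic_type.get(b) not in allowed_types:
            continue
        for (b2, c) in rules_bc:
            if b2 == b and c != a_term:
                links.setdefault(c, set()).add(b)
    return {c: sorted(bs) for c, bs in links.items()}


# Toy, hand-made literature sets (hypothetical, for illustration only).
raynaud_docs = [
    {"raynaud disease", "blood viscosity"},
    {"raynaud disease", "blood viscosity", "patient"},
    {"raynaud disease", "platelet aggregation", "patient"},
    {"raynaud disease", "platelet aggregation"},
]
other_docs = [
    {"fish oils", "blood viscosity"},
    {"fish oils", "blood viscosity"},
    {"fish oils", "platelet aggregation"},
    {"fish oils", "platelet aggregation"},
    {"patient", "aspirin"},
    {"patient", "aspirin"},
]
# Illustrative semantic-type labels; a real system would look these up in an ontology.
semantic_type = {
    "blood viscosity": "Physiologic Function",
    "platelet aggregation": "Physiologic Function",
    "patient": "Patient or Disabled Group",
}

rules_ab = mine_rules(raynaud_docs)
rules_bc = mine_rules(other_docs)
unfiltered = chain_rules(rules_ab, rules_bc, "raynaud disease", semantic_type)
filtered = chain_rules(rules_ab, rules_bc, "raynaud disease", semantic_type,
                       allowed_types={"Physiologic Function"})
print(unfiltered)
print(filtered)
```

In this toy setting the unfiltered chain also proposes a link through the uninformative concept "patient", while the filtered chain keeps only links whose intermediate concept has an allowed semantic type.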
Literature Based Discovery (LBD): Towards Hypothesis Generation and Knowledge Discovery in Biomedical Text Mining
Biomedical knowledge is growing at an astounding pace, with the majority of this
knowledge represented as scientific publications. Text mining tools and
methods represent automatic approaches for extracting hidden patterns and
trends from this semi-structured and unstructured data. In biomedical text
mining, Literature Based Discovery (LBD) is the process of automatically
discovering novel associations between medical terms otherwise mentioned in
disjoint literature sets. LBD approaches have proven successful in reducing the
discovery time of potential associations that are hidden in the vast amount of
scientific literature. The process focuses on creating concept profiles for
medical terms such as a disease or symptom and connecting them with a drug and
treatment based on the statistical significance of the shared profiles. This
knowledge discovery approach, introduced in 1989, still remains a core task in
text mining. Currently, the two approaches based on the ABC principle, namely open
discovery and closed discovery, are the most explored in the LBD process. This review
starts with a general introduction to text mining followed by biomedical text
mining and introduces various literature resources such as MEDLINE, UMLS, MeSH,
and SemMedDB. This is followed by a brief introduction to the core ABC principle
and its two associated approaches, open discovery and closed discovery, in the LBD
process. This review also discusses deep learning applications in LBD by
reviewing the role of transformer models and neural network-based LBD models
and their future aspects. Finally, it reviews the key biomedical discoveries
generated through LBD approaches in biomedicine and concludes with the current
limitations and future directions of LBD. Comment: 43 pages, 5 figures, 4 tables
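The difference between the two ABC approaches named in this abstract can be sketched in a few lines. This is a simplified illustration under stated assumptions: it uses plain term co-occurrence instead of the statistically weighted concept profiles the abstract mentions, and the helper names (`build_neighbours`, `open_discovery`, `closed_discovery`) and the toy documents are hypothetical.

```python
from collections import defaultdict


def build_neighbours(docs):
    """Map each term to the set of terms it co-occurs with in some document."""
    neighbours = defaultdict(set)
    for doc in docs:
        for term in doc:
            neighbours[term] |= set(doc) - {term}
    return dict(neighbours)


def open_discovery(a, neighbours):
    """Start from A only: return {C: set of linking B terms} for every C
    that is reachable through some B but never co-occurs with A directly."""
    b_terms = neighbours.get(a, set())
    found = defaultdict(set)
    for b in b_terms:
        for c in neighbours.get(b, set()):
            if c != a and c not in b_terms:
                found[c].add(b)
    return dict(found)


def closed_discovery(a, c, neighbours):
    """Start from both A and C: return the B terms that connect them."""
    return neighbours.get(a, set()) & neighbours.get(c, set())


# Toy documents (hand-made for illustration, not real corpus data).
docs = [
    {"migraine", "spreading depression"},
    {"migraine", "vasospasm"},
    {"migraine", "stress"},
    {"magnesium", "spreading depression"},
    {"magnesium", "vasospasm"},
]
neighbours = build_neighbours(docs)
open_result = open_discovery("migraine", neighbours)
closed_result = closed_discovery("migraine", "magnesium", neighbours)
print(open_result)
print(closed_result)
```

Open discovery generates hypotheses from a single starting term, whereas closed discovery explains a hypothesised A-C pair by listing the intermediate terms they share.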
MKEM: a Multi-level Knowledge Emergence Model for mining undiscovered public knowledge
Background: Since Swanson proposed the Undiscovered Public Knowledge (UPK) model, there have been many approaches to uncover UPK by mining the biomedical literature. These earlier works, however, required substantial manual intervention to reduce the number of possible connections and were mainly applied to the disease-effect relation. With the advancement of biomedical science, it has become imperative to extract and combine information from multiple disjoint research efforts, studies, and articles to infer new hypotheses and expand knowledge.
Methods: We propose MKEM, a Multi-level Knowledge Emergence Model, to discover implicit relationships using Natural Language Processing techniques such as Link Grammar and ontologies such as Unified Medical Language System (UMLS) MetaMap. The contributions of MKEM are as follows. First, we propose a flexible knowledge emergence model to extract implicit relationships across different levels, such as the molecular level for genes and proteins and the phenomic level for diseases and treatments. Second, we employ MetaMap for tagging biological concepts. Third, we provide an empirical and systematic approach to discover novel relationships.
Results: We applied our system to 5000 abstracts downloaded from the PubMed database. We performed a performance evaluation, as a gold standard is not yet available. Our system performed with good precision and recall, and we generated 24 hypotheses.
Conclusions: Our experiments show that MKEM is a powerful tool to discover hidden relationships residing in extracted entities that were represented by our Substance-Effect-Process-Disease-Body Part (SEPDB) model.
A semantic approach for mining hidden links from complementary and non-interactive biomedical literature
Presented at the 2006 SIAM Conference on Data Mining (SIAM DM 2006). Retrieved 6/26/2006 from http://www.ischool.drexel.edu/faculty/thu/My%20Publication/Conference-papers/SIAM06-Hu.pdf. Two complementary and non-interactive literature sets
of articles, when they are considered together, can
reveal useful information of scientific interest not
apparent in either of the two sets alone. Swanson
called such hidden links
undiscovered public knowledge (UPK). The novel
connection between Raynaud disease and fish oils was
uncovered from complementary and non-interactive
biomedical literature by Swanson in 1986. Since then,
there have been many approaches to uncover UPK by
mining the biomedical literature. These earlier works,
however, required substantial manual intervention to
reduce the number of possible connections. This paper
proposes a semantic-based mining model for
undiscovered public knowledge using the biomedical
literature. Our method replaces manual ad-hoc
pruning by using semantic knowledge from the
biomedical ontologies. Using the semantic types and
semantic relationships of the biomedical concepts, our
prototype system can identify the relevant concepts
collected from Medline and generate novel
hypotheses between these concepts. The system
successfully replicates Swanson’s two famous
discoveries: Raynaud disease/fish oils and
migraine/magnesium. Compared with previous
approaches such as LSI-based and traditional
association rule-based methods, our method generates
far fewer but more relevant novel hypotheses, and
requires much less human intervention in the
discovery procedure.
Literature based discovery: Techniques and tools
Literature Based Discovery (LBD) was initially proposed by Don R. Swanson in 1980 as a method to establish relationships between disease and remedy from disjoint science literature. Consequently, he established a link between magnesium and migraines. Since then, literature-based discovery has been a subject of research and development for discovery in online medical publications. It has further been investigated in both chemistry and mathematics. In this thesis, we give an overview of LBD and the software tools necessary to automate this technique. We further provide an implementation of this technique that is intended to be used for computer science subject matter.
Query-Constraint-Based Mining of Association Rules for Exploratory Analysis of Clinical Datasets in the National Sleep Research Resource
Background: Association Rule Mining (ARM) has been widely used by biomedical researchers to perform exploratory data analysis and uncover potential relationships among variables in biomedical datasets. However, when biomedical datasets are high-dimensional, performing ARM on such datasets will yield a large number of rules, many of which may be uninteresting. Especially for imbalanced datasets, performing ARM directly would result in uninteresting rules that are dominated by certain variables that capture general characteristics.
Methods: We introduce a query-constraint-based ARM (QARM) approach for exploratory analysis of multiple, diverse clinical datasets in the National Sleep Research Resource (NSRR). QARM enables rule mining on a subset of data items satisfying a query constraint. We first perform a series of data-preprocessing steps including variable selection, merging semantically similar variables, combining multiple-visit data, and data transformation. We use the Top-k Non-Redundant (TNR) ARM algorithm to generate association rules. Then we remove general and subsumed rules so that only unique and non-redundant rules remain for a particular query constraint.
Results: Applying QARM on five datasets from NSRR obtained a total of 2517 association rules with a minimum confidence of 60% (using top 100 rules for each query constraint). The results show that merging similar variables could avoid uninteresting rules. Also, removing general and subsumed rules resulted in a more concise and interesting set of rules.
Conclusions: QARM shows the potential to support exploratory analysis of large biomedical datasets. It also proves to be a useful method for reducing the number of uninteresting association rules generated from imbalanced datasets. A preliminary literature-based analysis showed that some association rules have supporting evidence from biomedical literature, while others without literature-based evidence may serve as candidates for new hypotheses to explore and investigate. Together with literature-based evidence, the association rules mined over the NSRR clinical datasets may be used to support clinical decisions for sleep-related problems.
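The idea of mining rules only over records that satisfy a query constraint, then pruning subsumed rules, can be sketched as below. This is a hedged illustration, not the QARM implementation: it enumerates antecedents exhaustively instead of using the TNR algorithm, it treats a rule as subsumed when a strictly smaller antecedent reaches at least the same confidence (an assumed definition, since the abstract does not give one), and the function names, item names, and records are made up for the example.

```python
from itertools import combinations


def confidence(records, antecedent, consequent):
    """Fraction of records containing the antecedent that also contain the consequent."""
    covered = [r for r in records if antecedent <= r]
    if not covered:
        return 0.0
    return sum(1 for r in covered if consequent in r) / len(covered)


def mine_with_constraint(records, constraint, consequent, min_conf=0.6, max_len=2):
    """Mine rules antecedent -> consequent over records satisfying the constraint."""
    subset = [r for r in records if constraint <= r]
    if not subset:
        return {}
    # Candidate antecedent items: everything except the constraint and the consequent.
    items = sorted(set().union(*subset) - set(constraint) - {consequent})
    rules = {}
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            antecedent = frozenset(combo)
            conf = confidence(subset, antecedent, consequent)
            if conf >= min_conf:
                rules[antecedent] = conf
    return rules


def drop_subsumed(rules):
    """Drop a rule when a strictly smaller antecedent has at least the same confidence."""
    return {
        antecedent: conf
        for antecedent, conf in rules.items()
        if not any(other < antecedent and rules[other] >= conf for other in rules)
    }


# Made-up toy records; item names are placeholders, not NSRR variables.
records = [
    frozenset({"male", "obese", "snoring", "apnea"}),
    frozenset({"male", "obese", "apnea"}),
    frozenset({"male", "snoring", "apnea"}),
    frozenset({"male", "obese"}),
    frozenset({"female", "obese", "apnea"}),
    frozenset({"female", "snoring"}),
]
rules = mine_with_constraint(records, {"male"}, "apnea")
kept = drop_subsumed(rules)
print(rules)
print(kept)
```

Here the two-item antecedent is pruned because one of its single-item subsets already reaches the same confidence, which mirrors the abstract's point that removing subsumed rules yields a more concise rule set.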
Knowledge Extraction from Textual Resources through Semantic Web Tools and Advanced Machine Learning Algorithms for Applications in Various Domains
Nowadays there is a tremendous amount of unstructured data, often represented by texts, which is created and stored in a variety of forms in many domains such as patients' health records, social network comments, scientific publications, and so on. This volume of data represents an invaluable source of knowledge, but unfortunately mining it is challenging for machines. At the same time, novel tools as well as advanced methodologies have been introduced in several domains, improving the efficacy and the efficiency of data-based services.
Following this trend, this thesis shows how to parse data from text with Semantic Web based tools, feed data into Machine Learning methodologies, and produce services or resources to facilitate the execution of some tasks. More precisely, the use of Semantic Web technologies powered by Machine Learning algorithms has been investigated in the Healthcare and E-Learning domains through previously untested methodologies. Furthermore, this thesis investigates the use of some state-of-the-art tools to move data from texts to graphs for representing the knowledge contained in scientific literature. Finally, the use of a Semantic Web ontology and novel heuristics to detect insights from biological data in graph form is presented. The thesis contributes to the scientific literature in terms of results and resources. Most of the material presented in this thesis derives from research papers published in international journals or conference proceedings.
Evidence/Discovery-Based Evolving Ontology (EDBEO)
This paper presents a proposal for the development of an ontology evolution strategy which refines ontological relations in scientific ontologies. In addition to experts’ consensus, it is desirable to define ontological relations between any two concepts in a scientific ontology based on scientific evidence. To address this issue, we can relate ontological relations to different research results obtained from various studies. To implement this solution, our envisaged evidence/discovery-based methodology integrates a higher-level ontology (systematic review ontology) into a systematic review agent which employs a Fuzzy Inference System in order to automatically modify ontological relations of a domain ontology based on the evidence received from information resources. The evidence/discovery-based methodology will further use the domain ontology to discover novel connections between distinct literatures, thereby enriching its conceptualization.