6,942 research outputs found
Data Mining in Healthcare: A Survey of Techniques and Algorithms with its Limitations and Challenges
The large amount of data in healthcare industry is a key resource to be processed and analyzed for knowledge extraction. The knowledge discovery is the process of making low-level data into high-level knowledge. Data mining is a core component of the KDD process. Data mining techniques are used in healthcare management which improve the quality and decrease the cost of healthcare services. Data mining algorithms are needed in almost every step in KDD process ranging from domain understanding to knowledge evaluation. It is necessary to identify and evaluate the most common data mining algorithms implemented in modern healthcare services. The need is for algorithms with very high accuracy as medical diagnosis is considered as a significant yet obscure task that needs to be carried out precisely and efficiently
Literature Based Discovery (LBD): Towards Hypothesis Generation and Knowledge Discovery in Biomedical Text Mining
Biomedical knowledge is growing in an astounding pace with a majority of this
knowledge is represented as scientific publications. Text mining tools and
methods represents automatic approaches for extracting hidden patterns and
trends from this semi structured and unstructured data. In Biomedical Text
mining, Literature Based Discovery (LBD) is the process of automatically
discovering novel associations between medical terms otherwise mentioned in
disjoint literature sets. LBD approaches proven to be successfully reducing the
discovery time of potential associations that are hidden in the vast amount of
scientific literature. The process focuses on creating concept profiles for
medical terms such as a disease or symptom and connecting it with a drug and
treatment based on the statistical significance of the shared profiles. This
knowledge discovery approach introduced in 1989 still remains as a core task in
text mining. Currently the ABC principle based two approaches namely open
discovery and closed discovery are mostly explored in LBD process. This review
starts with general introduction about text mining followed by biomedical text
mining and introduces various literature resources such as MEDLINE, UMLS, MESH,
and SemMedDB. This is followed by brief introduction of the core ABC principle
and its associated two approaches open discovery and closed discovery in LBD
process. This review also discusses the deep learning applications in LBD by
reviewing the role of transformer models and neural networks based LBD models
and its future aspects. Finally, reviews the key biomedical discoveries
generated through LBD approaches in biomedicine and conclude with the current
limitations and future directions of LBD.Comment: 43 Pages, 5 Figures, 4 Table
Relation Prediction over Biomedical Knowledge Bases for Drug Repositioning
Identifying new potential treatment options for medical conditions that cause human disease burden is a central task of biomedical research. Since all candidate drugs cannot be tested with animal and clinical trials, in vitro approaches are first attempted to identify promising candidates. Likewise, identifying other essential relations (e.g., causation, prevention) between biomedical entities is also critical to understand biomedical processes. Hence, it is crucial to develop automated relation prediction systems that can yield plausible biomedical relations to expedite the discovery process. In this dissertation, we demonstrate three approaches to predict treatment relations between biomedical entities for the drug repositioning task using existing biomedical knowledge bases. Our approaches can be broadly labeled as link prediction or knowledge base completion in computer science literature. Specifically, first we investigate the predictive power of graph paths connecting entities in the publicly available biomedical knowledge base, SemMedDB (the entities and relations constitute a large knowledge graph as a whole). To that end, we build logistic regression models utilizing semantic graph pattern features extracted from the SemMedDB to predict treatment and causative relations in Unified Medical Language System (UMLS) Metathesaurus. Second, we study matrix and tensor factorization algorithms for predicting drug repositioning pairs in repoDB, a general purpose gold standard database of approved and failed drug–disease indications. The idea here is to predict repoDB pairs by approximating the given input matrix/tensor structure where the value of a cell represents the existence of a relation coming from SemMedDB and UMLS knowledge bases. The essential goal is to predict the test pairs that have a blank cell in the input matrix/tensor based on the shared biomedical context among existing non-blank cells. Our final approach involves graph convolutional neural networks where entities and relation types are embedded in a vector space involving neighborhood information. Basically, we minimize an objective function to guide our model to concept/relation embeddings such that distance scores for positive relation pairs are lower than those for the negative ones. Overall, our results demonstrate that recent link prediction methods applied to automatically curated, and hence imprecise, knowledge bases can nevertheless result in high accuracy drug candidate prediction with appropriate configuration of both the methods and datasets used
- …