43 research outputs found
edge2vec: Representation learning using edge semantics for biomedical knowledge discovery
Representation learning provides new and powerful graph analytical approaches
and tools for the highly valued data science challenge of mining knowledge
graphs. Since previous graph analytical methods have mostly focused on
homogeneous graphs, an important current challenge is extending this
methodology for richly heterogeneous graphs and knowledge domains. The
biomedical sciences are such a domain, reflecting the complexity of biology,
with entities such as genes, proteins, drugs, diseases, and phenotypes, and
relationships such as gene co-expression, biochemical regulation, and
biomolecular inhibition or activation. Therefore, the semantics of edges and
nodes are critical for representation learning and knowledge discovery in
real-world biomedical problems. In this paper, we propose the edge2vec model, which
represents graphs while taking edge semantics into account. An edge-type transition matrix is
trained by an Expectation-Maximization approach, and a stochastic gradient
descent model is employed to learn node embeddings on a heterogeneous graph via
the trained transition matrix. edge2vec is validated on three biomedical domain
tasks: biomedical entity classification, compound-gene bioactivity prediction,
and biomedical information retrieval. Results show that by incorporating
edge types into node embedding learning in heterogeneous graphs,
edge2vec significantly outperforms state-of-the-art models on all
three tasks. We propose this method for its added value relative to existing
graph analytical methodology, and for its applicability to real-world
biomedical knowledge discovery.
Comment: 10 pages
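The edge-type-aware walk described in the abstract above can be illustrated with a minimal sketch. It assumes the edge-type transition matrix is already available (here it is fixed by hand rather than trained with Expectation-Maximization), and it omits any further walk biases the paper may use. The toy graph, node names, and the function name edge_type_walk are hypothetical.

```python
import random

# Toy heterogeneous graph: node -> list of (neighbor, edge_type) pairs.
# Node names and edge types are invented for illustration.
GRAPH = {
    "geneA": [("drugX", 0), ("geneB", 1)],
    "geneB": [("geneA", 1), ("diseaseY", 2)],
    "drugX": [("geneA", 0), ("diseaseY", 2)],
    "diseaseY": [("geneB", 2), ("drugX", 2)],
}

# Assumed edge-type transition matrix M[prev_type][next_type].
# The abstract says this matrix is trained with EM; here it is hand-set.
M = [
    [0.2, 0.5, 0.3],
    [0.4, 0.2, 0.4],
    [0.3, 0.3, 0.4],
]


def edge_type_walk(graph, matrix, start, length, rng):
    """Sample a walk whose next edge is weighted by the previous edge's type."""
    walk = [start]
    prev_type = None
    while len(walk) < length:
        neighbors = graph[walk[-1]]
        if not neighbors:
            break
        if prev_type is None:
            # First step: no previous edge type, so choose uniformly.
            weights = [1.0] * len(neighbors)
        else:
            weights = [matrix[prev_type][edge_type] for _, edge_type in neighbors]
        next_node, prev_type = rng.choices(neighbors, weights=weights, k=1)[0]
        walk.append(next_node)
    return walk


walk = edge_type_walk(GRAPH, M, "geneA", 6, random.Random(0))
print(walk)
```

Walks sampled this way would then serve as the input sequences for a skip-gram style embedding model trained by stochastic gradient descent, which is not shown here.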
DTiGNN: Learning drug-target embedding from a heterogeneous biological network based on a two-level attention-based graph neural network
Motivation: In vitro experiment-based drug-target interaction (DTI) exploration demands more human, financial and data resources. In silico approaches have been recommended for predicting DTIs to reduce time and cost. During the drug development process, one can analyze the therapeutic effect of the drug for a particular disease by identifying how the drug binds to the target for treating that disease. Hence, DTI plays a major role in drug discovery. Many computational methods have been developed for DTI prediction. However, the existing methods have limitations in terms of capturing the interactions via multiple semantics between drug and target nodes in a heterogeneous biological network (HBN). Methods: In this paper, we propose a DTiGNN framework for identifying unknown drug-target pairs. The DTiGNN first calculates the similarity between the drug and target from multiple perspectives. Then, the features of drugs and targets from each perspective are learned separately by using a novel method termed an information entropy-based random walk. Next, all of the learned features from different perspectives are integrated into a single drug and target similarity network by using a multi-view convolutional neural network. Using the integrated similarity networks, drug interactions, drug-disease associations, protein interactions and protein-disease associations, the HBN is constructed. Next, a novel embedding algorithm called a meta-graph guided graph neural network is used to learn the embedding of drugs and targets. Then, a convolutional neural network is employed to infer new DTIs after balancing the samples using oversampling techniques. Results: The DTiGNN is applied to various datasets, and the results show better performance in terms of the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUPR), with scores of 0.98 and 0.99, respectively. There are 23,739 newly predicted DTI pairs in total.
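The two metrics reported above can be made concrete with a small sketch. It computes AUC from pairwise comparisons of positive and negative scores, and uses average precision as one common estimator of AUPR; the paper's exact evaluation procedure is not stated in the abstract, and the labels and scores below are invented.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Assumes at least one positive and one negative.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Invented example: 1 = known interaction, 0 = no known interaction.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
print(round(auc, 4), round(ap, 4))
```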
Deep learning-based detection of health insurance abuse using medical treatment data
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Industrial Engineering, 2020. 8. 조성준.
As global life expectancy increases, healthcare spending grows accordingly in order to improve quality of life. However, because medical care is expensive, the bare cost of healthcare services inevitably places a great financial burden on individuals and households. In this light, many countries have devised and established their own public healthcare insurance systems to help people receive medical services at a lower price. Since reimbursements are made ex post, unethical practices arise that exploit the post-payment structure of the insurance system. The archetypes of such behavior are overdiagnosis, the act of manipulating patients' diseases, and overtreatment, the prescription of unnecessary drugs for the patient. These abusive behaviors are considered one of the main sources of financial loss incurred in the healthcare system. In order to detect and prevent abuse, the national healthcare insurance hires medical professionals to manually examine whether each filed claim is medically legitimate. However, the review process is very costly and time-consuming. To address these limitations, data mining techniques have been employed to detect problematic claims or abusive providers showing an abnormal billing pattern. However, these studies used only coarse-grained information such as claim-level or provider-level data. Relying on such aggregated information may degrade the model's performance.
In this thesis, we propose abuse detection methods using medical treatment data, which is the lowest-level information in a healthcare insurance claim. First, we propose a scoring model for detecting abusive providers and show that the review process with the proposed model is more efficient than with the previous model, which uses provider-level variables as inputs. At the same time, we devise evaluation metrics to quantify the efficiency of the review process. Second, we propose a method of detecting overtreatment under seasonality, which makes the model more realistic. We propose a model embodying multiple structures specific to the DRG codes selected as important for each given department. We show that the proposed method is more robust to seasonality than the previous method. Third, we propose an overtreatment detection model that accounts for heterogeneous treatment between practitioners. We propose a network-based approach through which the relationship between diseases and treatments is considered during the overtreatment detection process. Experimental results show that the proposed method correctly classifies treatments that do not explicitly appear in the training set. From these works, we show that using treatment data allows modeling abuse detection at various levels: treatment, claim, and provider.
Chapter 1 Introduction
Chapter 2 Detection of Abusive Providers by department with Neural Network
2.1 Background
2.2 Literature Review
2.2.1 Abnormality Detection in Healthcare Insurance with Datamining Technique
2.2.2 Feed-Forward Neural Network
2.3 Proposed Method
2.3.1 Calculating the Likelihood of Abuse for each Treatment with Deep Neural Network
2.3.2 Calculating the Abuse Score of the Provider
2.4 Experiments
2.4.1 Data Description
2.4.2 Experimental Settings
2.4.3 Evaluation Measure (1): Relative Efficiency
2.4.4 Evaluation Measure (2): Precision at k
2.5 Results
2.5.1 Results in the test set
2.5.2 The Relationship among the Claimed Amount, the Abused Amount and the Abuse Score
2.5.3 The Relationship between the Performance of the Treatment Scoring Model and Review Efficiency
2.5.4 Treatment Scoring Model Results
2.5.5 Post-deployment Performance
2.6 Summary
Chapter 3 Detection of overtreatment by Diagnosis-related Group with Neural Network
3.1 Background
3.2 Literature review
3.2.1 Seasonality in disease
3.2.2 Diagnosis related group
3.3 Proposed method
3.3.1 Training a deep neural network model for treatment classification
3.3.2 Comparing the Performance of DRG-based Model against the department-based Model
3.4 Experiments
3.4.1 Data Description and Preprocessing
3.4.2 Performance Measures
3.4.3 Experimental Settings
3.5 Results
3.5.1 Overtreatment Detection
3.5.2 Abnormal Claim Detection
3.6 Summary
Chapter 4 Detection of overtreatment with graph embedding of disease-treatment pair
4.1 Background
4.2 Literature review
4.2.1 Graph embedding methods
4.2.2 Application of graph embedding methods to biomedical data analysis
4.2.3 Medical concept embedding methods
4.3 Proposed method
4.3.1 Network construction
4.3.2 Link Prediction between the Disease and the Treatment
4.3.3 Overtreatment Detection
4.4 Experiments
4.4.1 Data Description
4.4.2 Experimental Settings
4.5 Results
4.5.1 Network Construction
4.5.2 Link Prediction between the Disease and the Treatment
4.5.3 Overtreatment Detection
4.6 Summary
Chapter 5 Conclusion
5.1 Contribution
5.2 Future Work
Bibliography
Abstract in Korean
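The outline above lists "Precision at k" as an evaluation measure for provider scoring. A minimal sketch of the standard definition follows; the thesis's exact formulation is not shown in this record, and the scores and labels are invented.

```python
def precision_at_k(scores, is_abusive, k):
    """Fraction of the k highest-scored providers that are truly abusive."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_k = ranked[:k]
    return sum(is_abusive[i] for i in top_k) / k


# Invented abuse scores for five providers and their review outcomes
# (1 = confirmed abusive, 0 = not abusive).
scores = [0.91, 0.15, 0.78, 0.66, 0.05]
is_abusive = [1, 0, 0, 1, 0]
p_at_3 = precision_at_k(scores, is_abusive, 3)
print(p_at_3)
```

A higher value means that reviewers who inspect only the top k providers spend less effort on legitimate ones.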
DapBCH: a disease association prediction model Based on Cross-species and Heterogeneous graph embedding
The study of comorbidity can provide new insights into the pathogenesis of the disease and has important economic significance in the clinical evaluation of treatment difficulty, medical expenses, length of stay, and prognosis of the disease. In this paper, we propose a disease association prediction model DapBCH, which constructs a cross-species biological network and applies heterogeneous graph embedding to predict disease association. First, we combine the human disease–gene network, mouse gene–phenotype network, human–mouse homologous gene network, and human protein–protein interaction network to reconstruct a heterogeneous biological network. Second, we apply heterogeneous graph embedding based on meta-path aggregation to generate the feature vector of disease nodes. Finally, we employ link prediction to obtain the similarity of disease pairs. The experimental results indicate that our model is highly competitive in predicting the disease association and is promising for finding potential disease associations.
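The final step above, scoring disease pairs from their feature vectors, can be sketched as follows. Cosine similarity is an assumption made for illustration, since the abstract does not name the scoring function, and the embeddings and disease names are invented.

```python
import math
from itertools import combinations

# Invented 3-dimensional disease embeddings (hypothetical values).
EMBEDDINGS = {
    "disease_a": [1.0, 0.0, 1.0],
    "disease_b": [1.0, 0.1, 0.9],
    "disease_c": [0.0, 1.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_pairs(embeddings):
    """Return (similarity, disease_1, disease_2) tuples, most similar first."""
    pairs = [
        (cosine(embeddings[a], embeddings[b]), a, b)
        for a, b in combinations(sorted(embeddings), 2)
    ]
    return sorted(pairs, reverse=True)


ranked = rank_pairs(EMBEDDINGS)
for similarity, first, second in ranked:
    print(first, second, round(similarity, 3))
```

Pairs at the top of the ranking would be the candidate disease associations to examine further.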
Inductive Meta-path Learning for Schema-complex Heterogeneous Information Networks
Heterogeneous Information Networks (HINs) are information networks with
multiple types of nodes and edges. The concept of meta-path, i.e., a sequence
of entity types and relation types connecting two entities, is proposed to
provide the meta-level explainable semantics for various HIN tasks.
Traditionally, meta-paths are primarily used for schema-simple HINs, e.g.,
bibliographic networks with only a few entity types, where meta-paths are often
enumerated with domain knowledge. However, the adoption of meta-paths for
schema-complex HINs, such as knowledge bases (KBs) with hundreds of entity and
relation types, has been limited due to the computational complexity associated
with meta-path enumeration. Additionally, effectively assessing meta-paths
requires enumerating relevant path instances, which adds further complexity to
the meta-path learning process. To address these challenges, we propose
SchemaWalk, an inductive meta-path learning framework for schema-complex HINs.
We represent meta-paths with schema-level representations to support the
learning of the scores of meta-paths for varying relations, mitigating the need
for exhaustive path instance enumeration for each relation. Further, we design a
reinforcement-learning based path-finding agent, which directly navigates the
network schema (i.e., schema graph) to learn policies for establishing
meta-paths with high coverage and confidence for multiple relations. Extensive
experiments on real data sets demonstrate the effectiveness of our proposed
paradigm.
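To make the notion of a meta-path and its path instances concrete, here is a minimal sketch on a toy bibliographic network. It shows the brute-force instance enumeration that the abstract describes as costly, not the SchemaWalk method itself. All node names, relation names, and the function path_instances are hypothetical.

```python
# Toy HIN: each node has a type, and edges are (source, relation, target).
NODE_TYPE = {
    "alice": "Author",
    "bob": "Author",
    "p1": "Paper",
    "p2": "Paper",
    "kdd": "Venue",
}
EDGES = [
    ("alice", "writes", "p1"),
    ("bob", "writes", "p2"),
    ("p1", "published_in", "kdd"),
    ("p2", "published_in", "kdd"),
]

# A meta-path alternates entity types and relation types.
META_PATH = ["Author", "writes", "Paper", "published_in", "Venue"]


def path_instances(meta_path, node_type, edges):
    """Enumerate all node sequences in the network that match the meta-path."""
    paths = [[node] for node, kind in node_type.items() if kind == meta_path[0]]
    for i in range(1, len(meta_path), 2):
        relation, target_type = meta_path[i], meta_path[i + 1]
        paths = [
            path + [dst]
            for path in paths
            for src, rel, dst in edges
            if src == path[-1] and rel == relation and node_type[dst] == target_type
        ]
    return paths


instances = path_instances(META_PATH, NODE_TYPE, EDGES)
print(instances)
```

On a schema with hundreds of entity and relation types, both the number of candidate meta-paths and the number of instances per meta-path grow quickly, which is the cost the abstract refers to.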
Heterogeneous Multi-Layered Network Model for Omics Data Integration and Analysis
Advances in next-generation sequencing and high-throughput techniques have enabled the generation of vast amounts of diverse omics data. These big data provide an unprecedented opportunity in biology, but impose great challenges in data integration, data mining, and knowledge discovery due to the complexity, heterogeneity, dynamics, uncertainty, and high dimensionality inherent in omics data. Networks have been widely used to represent relations between entities in biological systems, such as protein-protein interaction, gene regulation, and brain connectivity (i.e., network construction), as well as to infer novel relations given a reconstructed network (i.e., link prediction). In particular, the heterogeneous multi-layered network (HMLN) has proven successful in integrating diverse biological data to represent the hierarchy of a biological system. The HMLN provides unparalleled opportunities but imposes new computational challenges in establishing causal genotype-phenotype associations and understanding environmental impact on organisms. In this review, we focus on recent advances in developing novel computational methods for inferring novel biological relations from the HMLN. We first discuss the properties of biological HMLNs. Second, we survey four categories of state-of-the-art methods (matrix factorization, random walk, knowledge graph, and deep learning). Third, we demonstrate their applications to omics data integration and analysis. Finally, we outline strategies for future directions in the development of new HMLN models.
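Of the four method categories named above, the random-walk family is the simplest to illustrate. The sketch below runs a random walk with restart on a tiny single-layer network, as one generic example of that family; it is not a method taken from the review. The adjacency matrix, restart probability, and iteration count are invented.

```python
def random_walk_with_restart(adjacency, seed, restart=0.3, iterations=100):
    """Iterate p = (1 - r) * W p + r * e, with W the column-normalized adjacency."""
    n = len(adjacency)
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    start = [1.0 if i == seed else 0.0 for i in range(n)]
    p = start[:]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Probability mass flowing into node i from its neighbors.
            inflow = sum(
                adjacency[i][j] / col_sums[j] * p[j]
                for j in range(n)
                if col_sums[j] > 0
            )
            updated.append((1 - restart) * inflow + restart * start[i])
        p = updated
    return p


# Invented 4-node path network: 0 - 1 - 2 - 3.
ADJACENCY = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
proximity = random_walk_with_restart(ADJACENCY, seed=0)
print([round(x, 3) for x in proximity])
```

The resulting vector scores each node by its proximity to the seed, so unlinked nodes with high scores can be read as candidate relations.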