1,347 research outputs found

    Global Geometric Affinity for Revealing High Fidelity Protein Interaction Network

    Protein-protein interaction (PPI) network analysis plays an essential role in understanding the functional relationships among proteins in a living biological system. Although current approaches to understanding PPI networks have been successful, the large fraction of missing and spurious PPIs and the low coverage of the complete PPI network remain major concerns. In this paper, based on the diffusion process, we propose a new concept of global geometric affinity (GGA) and an accompanying computational scheme to filter uncertain PPIs, that is, to reduce the spurious PPIs and recover the missing PPIs in the network. The main concept defines a diffusion process in which all proteins participate simultaneously to define a similarity metric, the GGA, that robustly reflects the internal connectivity among proteins. The robustness of the GGA comes from propagating local connectivity into a global representation of similarity among proteins through the diffusion process. The propagation is extremely fast because it requires only simple matrix products, so our method is geared toward high-throughput PPI networks. Furthermore, we propose two new approaches that determine the optimal geometric scale of the PPI network and the optimal threshold for assigning PPIs from the GGA matrix. Our approach is tested on three protein-protein interaction networks and performs well under substantial random noise in the form of deletions and insertions of true PPIs. Our approach has the potential to benefit biological experiments, to better characterize network data sets, and to drive new discoveries.
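
    The abstract describes propagating local connectivity through a diffusion process that needs only matrix products, then thresholding the resulting affinity matrix. The sketch below illustrates that general idea only; it is not the paper's GGA definition, which the abstract does not give. The symmetrized t-step random-walk affinity, the function names, and the threshold value are all assumptions made for illustration.

```python
def row_normalize(adj):
    """Turn an adjacency matrix into a row-stochastic transition matrix."""
    out = []
    for row in adj:
        total = sum(row)
        # Isolated nodes keep an all-zero row rather than dividing by zero.
        out.append([v / total if total else 0.0 for v in row])
    return out


def matmul(a, b):
    """Plain matrix product of two square lists-of-lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def diffusion_affinity(adj, steps):
    """Assumed stand-in for an affinity: symmetrized t-step walk probabilities."""
    p = row_normalize(adj)
    power = p
    for _ in range(steps - 1):
        power = matmul(power, p)  # one extra diffusion step per product
    n = len(adj)
    return [[0.5 * (power[i][j] + power[j][i]) for j in range(n)]
            for i in range(n)]


def filter_edges(affinity, threshold):
    """Keep the unordered node pairs whose affinity meets the threshold."""
    n = len(affinity)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if affinity[i][j] >= threshold}


# Toy path network 0 - 1 - 2: nodes 0 and 2 share a neighbor but no edge.
toy = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
affinity = diffusion_affinity(toy, steps=2)
candidates = filter_edges(affinity, threshold=0.4)
```

    On this toy network, two diffusion steps give nodes 0 and 2 an affinity of 0.5, so the pair (0, 2) passes the assumed threshold as a candidate missing interaction. Choosing the number of steps and the threshold is exactly the scale and threshold selection the abstract says the paper addresses; the values here are arbitrary.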

    Accurate Prediction of Antibody Function and Structure Using Bio-Inspired Antibody Language Model

    In recent decades, antibodies have emerged as indispensable therapeutics for combating diseases, particularly viral infections. However, their development has been hindered by limited structural information and labor-intensive engineering processes. Advances in deep learning have enabled precise prediction of protein structure and function by leveraging co-evolution information from homologous proteins. Despite these advances, predicting the conformation of antibodies remains challenging because of their unique evolution and the high flexibility of their antigen-binding regions. To address this challenge, we present the Bio-inspired Antibody Language Model (BALM). The model is trained on a dataset of 336 million 40% non-redundant unlabeled antibody sequences, capturing both unique and conserved properties specific to antibodies. BALM performs strongly across four antigen-binding prediction tasks. Moreover, we introduce BALMFold, an end-to-end method derived from BALM that swiftly predicts full atomic antibody structures from individual sequences. BALMFold outperforms well-established methods such as AlphaFold2, IgFold, ESMFold, and OmegaFold on the antibody benchmark, demonstrating significant potential to advance antibody engineering and streamline therapeutic antibody development by reducing the need for unnecessary trials.

    GNE: A Deep Learning Framework for Gene Network Inference by Aggregating Biological Information

    BACKGROUND: The topological landscape of gene interaction networks provides a rich source of information for inferring functional patterns of genes or proteins. However, it remains challenging to aggregate heterogeneous biological information, such as gene expression and gene interactions, to predict and discover new gene interactions more accurately. In particular, how to generate a unified vector representation that integrates diverse input data is a key challenge addressed here. RESULTS: We propose a scalable and robust deep learning framework that learns embedded representations unifying known gene interactions and gene expression for gene interaction prediction. These low-dimensional embeddings yield deeper insights into the structure of rapidly accumulating and diverse gene interaction networks and greatly simplify downstream modeling. We compare the predictive power of our deep embeddings with strong baselines. The results suggest that our deep embeddings achieve significantly more accurate predictions. Moreover, a set of novel gene interaction predictions is validated by up-to-date literature-based database entries. CONCLUSION: The proposed model demonstrates the importance of integrating heterogeneous information about genes for gene network inference. GNE is freely available under the GNU General Public License and can be downloaded from GitHub ( https://github.com/kckishan/GNE ).

    Deep Learning-Based Detection of Health Insurance Abuse Using Medical Treatment Data

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Industrial Engineering, August 2020. 조성준 (Sungzoon Cho).
    As global life expectancy increases, healthcare spending grows accordingly, with the aim of improving quality of life. However, because medical care is expensive, the full cost of healthcare services inevitably places a great financial burden on individuals and households. In this light, many countries have established their own public healthcare insurance systems to help people receive medical services at a lower price. Since reimbursements are made ex post, unethical practices arise that exploit the post-payment structure of the insurance system. The archetypes of such behavior are overdiagnosis, the manipulation of a patient's recorded diseases, and overtreatment, the prescription of unnecessary drugs. These abusive behaviors are considered one of the main sources of financial loss in the healthcare system. To detect and prevent abuse, the national healthcare insurance system hires medical professionals to examine manually whether each claim is medically legitimate. This review process is very costly and time-consuming. To address these limitations, data mining techniques have been employed to detect problematic claims or abusive providers showing abnormal billing patterns. However, these studies used only coarse-grained information such as claim-level or provider-level data, and relying on such aggregated information may degrade model performance. In this thesis, we propose abuse detection methods that use medical treatment data, the lowest level of information in a healthcare insurance claim. First, we propose a scoring model for detecting abusive providers and show that the review process with the proposed model is more efficient than with the previous model, which uses provider-level variables as inputs. We also devise evaluation metrics to quantify the efficiency of the review process. Second, we propose a method for detecting overtreatment under seasonality, which makes the model more realistic. The model embodies multiple structures specific to the DRG codes selected as important for each department. We show that the proposed method is more robust to seasonality than the previous method. Third, we propose an overtreatment detection model that accounts for heterogeneous treatment among practitioners, using a network-based approach that considers the relationship between diseases and treatments during overtreatment detection. Experimental results show that the proposed method classifies well even treatments that do not explicitly appear in the training set. From these works, we show that using treatment data allows abuse detection to be modeled at various levels: treatment, claim, and provider.
    Korean abstract (translated): As life expectancy increases, the amount spent on healthcare to improve quality of life is growing. However, the high cost of medical services inevitably places a large financial burden on individuals and households. To prevent this, many countries have introduced public health insurance systems so that people can receive medical services at a reasonable price. Typically, the patient first receives the service and pays only part of the cost, and the insurer afterwards reimburses the remaining amount to the medical institution. This system is sometimes exploited, giving rise to improper claims such as manipulating a patient's diseases or overtreatment. These practices are one of the main causes of financial loss in the healthcare system, and to prevent them insurers hire medical professionals to check the medical legitimacy of claims one by one. This review process is very expensive and time-consuming. To make it more efficient, studies have used data mining techniques to detect problematic claims or providers with abnormal billing patterns. However, those studies trained models on variables derived at the claim or provider level and did not use medical treatment data, the lowest-level data. This thesis proposes methods that detect improper claims using medical treatment data, the lowest-level data in a claim. First, we propose a method for detecting providers with abnormal billing patterns. Applied to real data, it yields a more efficient review than the existing method based on provider-level variables; we also propose evaluation measures to quantify this efficiency. Second, we propose a method for detecting overtreatment when claims exhibit seasonality, training and evaluating models per diagnosis-related group (DRG) instead of per department. Applied to real data, the proposed method is more robust to seasonality than the existing method. Third, we propose a method for detecting overtreatment in settings where physicians' treatment patterns differ for the same patient, based on network modeling of the relationship between patients' diseases and treatments. Experiments show that the proposed method also classifies well treatment patterns that do not appear in the training data. Together, these studies confirm that using treatment data makes it possible to detect improper claims at various levels, including treatment, claim, and provider.
    Contents:
        Chapter 1 Introduction
        Chapter 2 Detection of Abusive Providers by department with Neural Network
            2.1 Background
            2.2 Literature Review
                2.2.1 Abnormality Detection in Healthcare Insurance with Datamining Technique
                2.2.2 Feed-Forward Neural Network
            2.3 Proposed Method
                2.3.1 Calculating the Likelihood of Abuse for each Treatment with Deep Neural Network
                2.3.2 Calculating the Abuse Score of the Provider
            2.4 Experiments
                2.4.1 Data Description
                2.4.2 Experimental Settings
                2.4.3 Evaluation Measure (1): Relative Efficiency
                2.4.4 Evaluation Measure (2): Precision at k
            2.5 Results
                2.5.1 Results in the test set
                2.5.2 The Relationship among the Claimed Amount, the Abused Amount and the Abuse Score
                2.5.3 The Relationship between the Performance of the Treatment Scoring Model and Review Efficiency
                2.5.4 Treatment Scoring Model Results
                2.5.5 Post-deployment Performance
            2.6 Summary
        Chapter 3 Detection of overtreatment by Diagnosis-related Group with Neural Network
            3.1 Background
            3.2 Literature review
                3.2.1 Seasonality in disease
                3.2.2 Diagnosis related group
            3.3 Proposed method
                3.3.1 Training a deep neural network model for treatment classification
                3.3.2 Comparing the Performance of DRG-based Model against the department-based Model
            3.4 Experiments
                3.4.1 Data Description and Preprocessing
                3.4.2 Performance Measures
                3.4.3 Experimental Settings
            3.5 Results
                3.5.1 Overtreatment Detection
                3.5.2 Abnormal Claim Detection
            3.6 Summary
        Chapter 4 Detection of overtreatment with graph embedding of disease-treatment pair
            4.1 Background
            4.2 Literature review
                4.2.1 Graph embedding methods
                4.2.2 Application of graph embedding methods to biomedical data analysis
                4.2.3 Medical concept embedding methods
            4.3 Proposed method
                4.3.1 Network construction
                4.3.2 Link Prediction between the Disease and the Treatment
                4.3.3 Overtreatment Detection
            4.4 Experiments
                4.4.1 Data Description
                4.4.2 Experimental Settings
            4.5 Results
                4.5.1 Network Construction
                4.5.2 Link Prediction between the Disease and the Treatment
                4.5.3 Overtreatment Detection
            4.6 Summary
        Chapter 5 Conclusion
            5.1 Contribution
            5.2 Future Work
        Bibliography
        Abstract in Korean
    Docto
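
    The thesis lists "Precision at k" among its evaluation measures for the provider scoring model. As a reminder of what that standard metric computes, here is a minimal sketch: rank items by score and report the share of true positives among the top k. The function name, the sample scores, and the 0/1 labels are illustrative assumptions, not data or code from the thesis.

```python
def precision_at_k(scores, labels, k):
    """Fraction of the k highest-scored items whose label is positive (1)."""
    if k <= 0 or k > len(scores):
        raise ValueError("k must be between 1 and the number of items")
    # Indices sorted by descending score; ties keep their original order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = ranked[:k]
    return sum(labels[i] for i in top) / k


# Hypothetical abuse scores for four providers and whether review confirmed abuse.
scores = [0.9, 0.2, 0.7, 0.4]
labels = [1, 0, 0, 1]
p_at_1 = precision_at_k(scores, labels, 1)
p_at_2 = precision_at_k(scores, labels, 2)
```

    In a review setting, k would correspond to the number of providers the reviewers can afford to examine, so the metric reflects how much of that limited effort lands on genuinely abusive cases.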

    To Embed or Not: Network Embedding as a Paradigm in Computational Biology

    Current technology is producing high-throughput biomedical data at an ever-growing rate. A common approach to interpreting such data is through network-based analyses. Since biological networks are notoriously complex and hard to decipher, a growing body of work applies graph embedding techniques to simplify, visualize, and facilitate the analysis of the resulting networks. In this review, we survey traditional and new approaches for graph embedding and compare their application to fundamental problems in network biology with direct use of the networks. We consider a broad variety of applications, including protein network alignment, community detection, and protein function prediction. We find that in all of these domains both types of approaches are of value, and that their performance depends on the evaluation measures used and the goal of the project. In particular, network embedding methods outshine direct methods according to some of those measures and are thus an essential tool in bioinformatics research.