6,418 research outputs found

    Automated linear motif discovery from protein interaction network

    Master's, Master of Science

    Associative Pattern Recognition for Biological Regulation Data

    In the last decade, bioinformatics data has accumulated at an unprecedented rate, thanks to advances in sequencing technologies. Such rapid development brings both challenges and promising research topics. In this dissertation, we propose a series of associative pattern recognition algorithms for biological regulation studies. In particular, we emphasize efficiently recognizing associative patterns between genes, transcription factors, histone modifications and functional labels using heterogeneous data sources (numeric data, sequences, time series and textual labels). In protein-DNA associative pattern recognition, we introduce an efficient algorithm for affinity testing that searches for over-represented DNA sequences using a hash function and modulo addition. This substantially improves the efficiency of next-generation sequencing data analysis. In gene regulatory network inference, we propose a framework for refining weak networks based on transcription factor binding sites, improving the precision of predicted edges by up to 52%. In histone modification code analysis, we propose an approach to genome-wide combinatorial pattern recognition that associates histone codes with functions, achieving an improvement of up to 38.1%. We also propose a novel shape-based modification pattern analysis approach and use it to successfully predict sub-classes of genes in the flowering-time category. We further propose a combination-to-combination associative pattern recognition method, which performs better than multi-label classification and bidirectional associative memory methods. Our proposed approaches recognize associative patterns from different types of data efficiently and provide a useful toolbox for biological regulation analysis. This dissertation presents a road map to associative pattern recognition at the genome-wide level.
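
The abstract above mentions searching for over-represented DNA sequences with a hash function and modulo arithmetic. The dissertation's actual algorithm is not given here, so the following is only a minimal sketch of one common way to do this: a rolling 2-bit hash of k-mers, updated with a multiply-add modulo 4^k, followed by a smoothed foreground/background frequency ratio. All function names, the `min_ratio` threshold and the `pseudo` count are assumptions made for illustration.

```python
from collections import Counter

# Assumed 2-bit encoding of nucleotides (illustrative choice).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_counts(seq, k):
    """Count k-mers using a rolling hash: code = (code * 4 + base) mod 4**k."""
    counts = Counter()
    modulus = 4 ** k
    code = 0
    valid = 0  # consecutive valid bases seen since the last reset
    for ch in seq.upper():
        if ch not in BASE_CODE:
            # Ambiguous base (e.g. N): restart the window.
            code, valid = 0, 0
            continue
        code = (code * 4 + BASE_CODE[ch]) % modulus
        valid += 1
        if valid >= k:
            counts[code] += 1
    return counts


def decode(code, k):
    """Turn an integer hash back into its k-mer string."""
    bases = "ACGT"
    out = []
    for _ in range(k):
        out.append(bases[code % 4])
        code //= 4
    return "".join(reversed(out))


def over_represented(foreground, background, k, min_ratio=2.0, pseudo=1.0):
    """Return (k-mer, foreground count, ratio) for k-mers enriched in foreground."""
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    # Laplace-style smoothing over all 4**k possible k-mers (an assumption).
    fg_den = sum(fg.values()) + pseudo * 4 ** k
    bg_den = sum(bg.values()) + pseudo * 4 ** k
    hits = []
    for code, n in fg.items():
        ratio = ((n + pseudo) / fg_den) / ((bg.get(code, 0) + pseudo) / bg_den)
        if ratio >= min_ratio:
            hits.append((decode(code, k), n, ratio))
    return sorted(hits, key=lambda h: -h[2])
```

Because each window update is a single multiply-add modulo 4^k, the scan runs in time linear in the sequence length regardless of k, which is the kind of efficiency gain the abstract alludes to.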

    Multi-Target Prediction: A Unifying View on Problems and Methods

    Multi-target prediction (MTP) is concerned with the simultaneous prediction of multiple target variables of diverse types. Due to its enormous application potential, it has developed into an active and rapidly expanding research field that combines several subfields of machine learning, including multivariate regression, multi-label classification, multi-task learning, dyadic prediction, zero-shot learning, network inference, and matrix completion. In this paper, we present a unifying view on MTP problems and methods. First, we formally discuss commonalities and differences between existing MTP problems. To this end, we introduce a general framework that covers the above subfields as special cases. As a second contribution, we provide a structured overview of MTP methods. This is accomplished by identifying a number of key properties that distinguish such methods and determine their suitability for different types of problems. Finally, we discuss a few challenges for future research.

    Gapped consensus motif discovery: evaluation of a new algorithm based on local multiple alignments and a sampling strategy

    We assess the efficiency and feasibility of a novel method for discovering a priori unknown motifs described as gaps alternating with specific regions. Such motifs are searched for as consensuses of non-homologous biological sequences. The only specifications required are the maximal gap length, the minimal frequency for specific characters and the minimal percentage (quorum) of sequences sharing the motif. Our method combines a multiple alignment method, for quick detection of local similarities, with a sampling strategy that runs candidate position-specific scoring matrices to convergence. This rather original way of converging to the solution proves efficient on simulated data, on gapped instances of the so-called challenge problem, on promoter sites in dicot plants and on transcription factor binding sites in E. coli. Our algorithm compares favorably with the MEME and STARS approaches in terms of accuracy.

    Deep Learning-Based Detection of Health Insurance Abuse Using Medical Treatment Data

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Industrial Engineering, 2020. 8. 조성준.

    As global life expectancy increases, spending on healthcare grows accordingly in order to improve quality of life. However, because medical care is expensive, the bare cost of healthcare services inevitably places a great financial burden on individuals and households. In this light, many countries have devised and established their own public healthcare insurance systems to help people receive medical services at a lower price. Since reimbursements are made ex post, unethical practices arise that exploit the post-payment structure of the insurance system. The archetypes of such behavior are overdiagnosis, the act of manipulating patients' diseases, and overtreatment, the prescription of unnecessary drugs for the patient. These abusive behaviors are considered one of the main sources of financial loss in the healthcare system. In order to detect and prevent abuse, the national healthcare insurance hires medical professionals to manually examine whether each claim filing is medically legitimate. However, the review process is very costly and time-consuming. To address these limitations, data mining techniques have been employed to detect problematic claims or abusive providers showing an abnormal billing pattern. However, these studies used only coarse-grained information such as claim-level or provider-level data, and this aggregation may degrade the model's performance.

    In this thesis, we propose abuse detection methods that use medical treatment data, the lowest-level information in a healthcare insurance claim. First, we propose a scoring model for detecting abusive providers and show that the review process with the proposed model is more efficient than with the previous model, which uses provider-level variables as inputs. We also devise evaluation metrics to quantify the efficiency of the review process. Second, we propose a method for detecting overtreatment under seasonality, which makes the model more realistic. We propose a model embodying multiple structures specific to DRG codes selected as important for each given department, and show that the proposed method is more robust to seasonality than the previous method. Third, we propose an overtreatment detection model that accounts for heterogeneous treatment between practitioners, using a network-based approach through which the relationship between diseases and treatments is considered during overtreatment detection. Experimental results show that the proposed method classifies well even treatments that do not explicitly appear in the training set. From these works, we show that using treatment data allows abuse detection to be modeled at various levels: treatment, claim, and provider.

    Contents:
    Chapter 1 Introduction
    Chapter 2 Detection of Abusive Providers by department with Neural Network
        2.1 Background
        2.2 Literature Review
            2.2.1 Abnormality Detection in Healthcare Insurance with Datamining Technique
            2.2.2 Feed-Forward Neural Network
        2.3 Proposed Method
            2.3.1 Calculating the Likelihood of Abuse for each Treatment with Deep Neural Network
            2.3.2 Calculating the Abuse Score of the Provider
        2.4 Experiments
            2.4.1 Data Description
            2.4.2 Experimental Settings
            2.4.3 Evaluation Measure (1): Relative Efficiency
            2.4.4 Evaluation Measure (2): Precision at k
        2.5 Results
            2.5.1 Results in the test set
            2.5.2 The Relationship among the Claimed Amount, the Abused Amount and the Abuse Score
            2.5.3 The Relationship between the Performance of the Treatment Scoring Model and Review Efficiency
            2.5.4 Treatment Scoring Model Results
            2.5.5 Post-deployment Performance
        2.6 Summary
    Chapter 3 Detection of overtreatment by Diagnosis-related Group with Neural Network
        3.1 Background
        3.2 Literature review
            3.2.1 Seasonality in disease
            3.2.2 Diagnosis related group
        3.3 Proposed method
            3.3.1 Training a deep neural network model for treatment classification
            3.3.2 Comparing the Performance of DRG-based Model against the department-based Model
        3.4 Experiments
            3.4.1 Data Description and Preprocessing
            3.4.2 Performance Measures
            3.4.3 Experimental Settings
        3.5 Results
            3.5.1 Overtreatment Detection
            3.5.2 Abnormal Claim Detection
        3.6 Summary
    Chapter 4 Detection of overtreatment with graph embedding of disease-treatment pair
        4.1 Background
        4.2 Literature review
            4.2.1 Graph embedding methods
            4.2.2 Application of graph embedding methods to biomedical data analysis
            4.2.3 Medical concept embedding methods
        4.3 Proposed method
            4.3.1 Network construction
            4.3.2 Link Prediction between the Disease and the Treatment
            4.3.3 Overtreatment Detection
        4.4 Experiments
            4.4.1 Data Description
            4.4.2 Experimental Settings
        4.5 Results
            4.5.1 Network Construction
            4.5.2 Link Prediction between the Disease and the Treatment
            4.5.3 Overtreatment Detection
        4.6 Summary
    Chapter 5 Conclusion
        5.1 Contribution
        5.2 Future Work
    Bibliography
    Abstract in Korean

    Bioinformatics

    This book is divided into research areas relevant to bioinformatics, such as biological networks, next-generation sequencing, high-performance computing, molecular modeling, structural bioinformatics and intelligent data analysis. Each section introduces the basic concepts and then explains their application to problems of great relevance, so both novice and expert readers can benefit from the information and research presented here.

    Consensus clustering and functional interpretation of gene-expression data

    Microarray analysis using clustering algorithms can suffer from a lack of inter-method consistency in assigning related gene-expression profiles to clusters. Obtaining a consensus set of clusters from a number of clustering methods should improve confidence in gene-expression analysis. Here we introduce consensus clustering, which provides such an advantage. When coupled with a statistically based gene functional analysis, our method allowed the identification of novel genes regulated by NFκB and the unfolded protein response in certain B-cell lymphomas.
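
The idea of deriving a consensus set of clusters from several clustering methods can be illustrated with a co-association sketch. This is not necessarily the procedure used in the paper above; it is a minimal example, assuming each method returns one integer label per gene, in which two genes are grouped together only if at least a chosen fraction of methods (`threshold`, a hypothetical parameter) put them in the same cluster.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of clusterings in which each pair of items shares a cluster."""
    n = len(labelings[0])
    agree = {}
    for i, j in combinations(range(n), 2):
        same = sum(1 for labels in labelings if labels[i] == labels[j])
        agree[(i, j)] = same / len(labelings)
    return agree


def consensus_clusters(labelings, threshold=1.0):
    """Merge items whose co-association is at least `threshold` (union-find)."""
    n = len(labelings[0])
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), frac in co_association(labelings).items():
        if frac >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Three hypothetical clusterings of five genes (label values are arbitrary).
labelings = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
]
strict = consensus_clusters(labelings, threshold=1.0)
relaxed = consensus_clusters(labelings, threshold=0.6)
```

With `threshold=1.0` only genes that every method groups together stay together; lowering the threshold trades that strictness for larger consensus clusters.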

    A Comparative Study of Pairwise Learning Methods based on Kernel Ridge Regression

    Many machine learning problems can be formulated as predicting labels for a pair of objects. Problems of that kind are often referred to as pairwise learning, dyadic prediction or network inference problems. During the last decade, kernel methods have played a dominant role in pairwise learning. They still obtain state-of-the-art predictive performance, but the theoretical analysis of their behavior remains underexplored in the machine learning literature. In this work we review and unify existing kernel-based algorithms that are commonly used in different pairwise learning settings, ranging from matrix filtering to zero-shot learning. To this end, we focus on closed-form efficient instantiations of Kronecker kernel ridge regression. We show that independent task kernel ridge regression, two-step kernel ridge regression and a linear matrix filter arise naturally as special cases of Kronecker kernel ridge regression, implying that all these methods implicitly minimize a squared loss. In addition, we analyze universality, consistency and spectral filtering properties. Our theoretical results provide valuable insights for assessing the advantages and limitations of existing pairwise learning methods.
    Comment: arXiv admin note: text overlap with arXiv:1606.0427
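
The closed-form instantiation of Kronecker kernel ridge regression mentioned above can be written out as a short derivation. The notation is assumed here rather than taken from the paper: Y is the n-by-m label matrix over object pairs, K and G are the symmetric kernel matrices of the two object types, lambda is the regularization strength, and vec stacks columns.

```latex
% Assumed notation: Y \in \mathbb{R}^{n \times m}, K \in \mathbb{R}^{n \times n},
% G \in \mathbb{R}^{m \times m} (both symmetric positive semi-definite), \lambda > 0.
\begin{align}
  % Regularized squared loss over all pairs, with pairwise kernel G \otimes K
  \min_{A}\; & \bigl\lVert \operatorname{vec}(Y) - (G \otimes K)\operatorname{vec}(A) \bigr\rVert_2^2
     + \lambda\, \operatorname{vec}(A)^{\top} (G \otimes K) \operatorname{vec}(A) \\
  % Dual solution
  \operatorname{vec}(A) &= (G \otimes K + \lambda I)^{-1} \operatorname{vec}(Y) \\
  % Eigendecompositions of the two small kernel matrices
  K &= U \Sigma U^{\top}, \qquad G = V S V^{\top} \\
  % Efficient closed form: elementwise rescaling in the joint eigenbasis
  A &= U \bigl( C \circ (U^{\top} Y V) \bigr) V^{\top},
     \qquad C_{ij} = \frac{1}{\sigma_i s_j + \lambda} \\
  % Predictions for all pairs
  F &= K A G
\end{align}
```

Here sigma_i and s_j are the eigenvalues on the diagonals of Sigma and S, and the circle denotes the elementwise product. The point of the last two lines is that only the two small kernel matrices are decomposed, at a cost of roughly O(n^3 + m^3), instead of inverting the nm-by-nm matrix in the second line directly.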