A study on the discovery of genome-wide methylation and fragmentomics markers of circulating tumor DNA for the diagnosis and prognosis prediction of colorectal cancer

Abstract

Doctoral dissertation -- Seoul National University Graduate School: Graduate School of Convergence Science and Technology, Department of Molecular Medicine and Biopharmaceutical Sciences, February 2022. Kim Tae-You (김태유).

Non-genetic signatures from liquid biopsy samples are emerging as feasible cancer markers because plasma cell-free DNA (cfDNA) reflects the patient's systemic state. These signatures include cfDNA methylation, cfDNA topology, and cfDNA fragmentomics. DNA methylation shows somatic tissue-specific patterns, and fragment size is one of the most characteristic features of cfDNA. In particular, cfDNA from the plasma of cancer patients contains circulating tumor DNA (ctDNA) and can therefore reflect the status of both the primary tumor and minimal residual disease. For this reason, the tissue of origin (TOO) could be determined from ctDNA methylation patterns, and ctDNA fragment size could also be a useful marker for cancer patients. However, comprehensive studies applying non-genetic signatures to cancer diagnosis, monitoring, and prognosis prediction are still needed to define and validate the role of non-genetic markers in clinical practice. Here, I first show (1) an accurate prediction model, developed with a machine learning algorithm, for the comprehensive analysis of multiple CpG sites. Although many DNA methylation markers have been reported, previously reported markers were based on single markers and Western populations. My prediction model includes 305 CpG sites and was built with a machine learning algorithm on tissue samples from Korean colorectal cancer patients. The model performed well not only on pan-cancer tissue datasets but also on datasets based on plasma from cancer patients. In addition, the prognosis of colorectal cancer patients was accurately predicted with a subset of the 305 CpG sites. Next, I show that (2) the fragmentation ratio of DNA fragments of specific lengths could be a valuable prognostic marker for colorectal cancer patients.
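Many recent studies have shown that ctDNA fragments are shorter than cfDNA fragments derived from healthy tissue and have attempted to apply this to cancer diagnosis; however, the data are limited, and applications have been confined to diagnosis. To fill this gap, cfDNA fragment size was analyzed using paired-end targeted deep sequencing. I demonstrated that ctDNA fragment length was related to variant allele frequency, and that the prognosis of colorectal cancer patients could be predicted from the fragmentation ratio at a specific sampling time point in longitudinal samples. In summary, blood-based non-genetic signatures are significantly associated with the status of colorectal cancer and can be used to predict patient prognosis.

Korean abstract (translated)

Liquid biopsy is attracting attention as an important approach for diagnosing and monitoring cancer and for predicting prognosis. Non-genetic signatures in particular are gaining prominence as new markers, because circulating tumor DNA from cancer patients reflects the body more comprehensively than any other marker and carries a great deal of information representative of the primary tumor. Circulating tumor DNA reflects not only genetic markers but also non-genetic ones, that is, a variety of molecular features such as DNA methylation and DNA fragment size. DNA methylation has tissue-specific patterns, and characteristic fragment size is one of the defining features of cell-free DNA itself; efforts to exploit both are increasing. Using these features comprehensively requires integrated analysis and the discovery of new markers. In this thesis, (1) although many DNA methylation markers have already been reported, the reports have centered on single markers and on Western populations. However, methylation patterns differ to some extent between ethnic groups, and reflecting tissue specificity requires raising predictive power by using multiple markers rather than a single one. I therefore used methylation data obtained from 709 Korean colorectal cancer tissues to build a machine learning-based diagnostic prediction model that uses 305 markers. The model showed high predictive power not only on tissue data but also on plasma cell-free DNA methylation data, and prognosis prediction with a subset of the markers was also possible. Next, (2) fragment size is a molecular feature unique to cell-free DNA. Recent studies have mainly exploited the observation that cell-free DNA from cancer patients differs in size specifically at somatic variants. Representative examples include discovering cancer-specific diagnostic markers from the whole genome and using size differences at specific variants in panel sequencing to raise the probability of detecting those variants. Beyond diagnosis, however, much remains to be studied. To fill this gap, I analyzed the fragment size of circulating tumor DNA. Using panel sequencing data from paired-end sequencing, I calculated the actual size of the DNA molecules and demonstrated from the data that these sizes stem from the primary tumor.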
Furthermore, in blood samples collected from individual colorectal cancer patients at various time points before and after treatment, I confirmed that a size-based marker at a specific time point has statistically significant power to predict prognosis.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES AND FIGURES
I. Use of an optimized machine learning algorithm to discover DNA methylation markers from Korean colorectal cancer patients
  Abstract
  Introduction
  Experimental Design
  Results
  Discussion
II. Combined analysis of ctDNA mutation and fragment size for predicting prognosis of colorectal cancer
  Abstract
  Introduction
  Experimental Design
  Results
  Discussion
III. CONCLUSION
REFERENCES
ABSTRACT IN KOREAN

LIST OF TABLES AND FIGURES

I. Use of an optimized machine learning algorithm to discover DNA methylation markers from Korean colorectal cancer patients
TABLE 1. Clinicopathological information of the COPM cohort.
FIGURE 1. In silico simulation for setting the optimal number of DMRs.
FIGURE 2. Pipeline for building the prediction model and discovering cancer-specific markers.
FIGURE 3. Statistical differences according to tissue type.
FIGURE 4. Statistical differences according to tissue type.
FIGURE 5. Prediction model performance using 305 DNA methylation markers for cancer diagnosis.
FIGURE 6. tSNE analysis with CpG methylation level.
FIGURE 7. Permutation test for error rate of TOO (n = 1,000).
FIGURE 8. The PCA (A, C) and tSNE (B, D) analyses were performed for data and sample types.
FIGURE 9. Prediction model performance using the 76 intersecting DNA methylation markers for cancer diagnosis.
FIGURE 10. Re-constructed prediction model performance for other cancer and sample types.
FIGURE 11. Chromatin status correlated with the probe set (ChromHMM).
FIGURE 12. Pathway analysis using various databases through Metascape.
FIGURE 13. Correlation between methylation level and gene expression.
FIGURE 14. Risk score using a subset of the 305-probe set as a prognostic marker.
FIGURE 15. Risk score using the total of 305 probe sets as prognostic markers.
FIGURE 16. Association of the risk score with cancer patient age.
FIGURE 17. Association of the risk score with cancer patient sex.
FIGURE 18. Association of the risk score with cancer stage.

II. Combined analysis of ctDNA mutation and fragment size for predicting prognosis of colorectal cancer
FIGURE 1. DNA fragment size calculations.
TABLE 1. Clinicopathological information of the prospective patient cohort.
FIGURE 2. Distribution curve of cfDNA fragment size in patients with colorectal cancer (n=62) and in healthy controls (n=50).
FIGURE 3. Distribution curve of cfDNA fragments by mutation type.
FIGURE 4. Distribution curve of the VAF of somatic mutations detected in plasma cfDNA.
FIGURE 5. The association between clonality and ctDNA fragment size.
FIGURE 6. Correlation between the maximum VAF and ctDNA fragment size.
FIGURE 7. Distribution curves for ctDNA fragments from patients with more than 10% somatic mutations detected in plasma (n=33).
FIGURE 8. Calculation of PFS according to the RECIST 1.1 guideline.
FIGURE 9. ROC analysis for calculating the optimal cutoff values used to classify patients into the responder and non-responder groups.
FIGURE 10. Survival plot for each sampling time point and variables.
FIGURE 11. Clinical response monitoring using the fragmentation ratio (AUCp1 / AUCp2).
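
The fragment-size entries for Part II (DNA fragment size calculations; fragmentation ratio AUCp1 / AUCp2) can be illustrated with a minimal Python sketch. It is not the thesis's pipeline. It assumes that the length of a cfDNA molecule is taken as the genomic span of a properly paired read pair, and that p1 and p2 denote two fragment-length windows whose areas (fragment counts, with 1-bp bins) are divided. The function names and the window boundaries (100-150 bp and 151-220 bp) are hypothetical placeholders; this record does not state the actual definitions.

```python
from typing import Iterable, Tuple


def fragment_length(fwd_start: int, rev_end: int) -> int:
    """Span of the original molecule covered by a properly paired read pair.

    fwd_start: 0-based leftmost aligned position of the forward mate.
    rev_end:   0-based exclusive end position of the reverse mate.
    """
    if rev_end <= fwd_start:
        raise ValueError("reverse mate must end downstream of the forward mate start")
    return rev_end - fwd_start


def fragmentation_ratio(
    lengths: Iterable[int],
    window_p1: Tuple[int, int] = (100, 150),  # hypothetical short-fragment window
    window_p2: Tuple[int, int] = (151, 220),  # hypothetical long-fragment window
) -> float:
    """Ratio of fragment counts in window p1 to window p2 (bounds inclusive).

    With 1-bp histogram bins, the area under the length distribution inside
    a window equals the number of fragments in that window.
    """
    sizes = list(lengths)
    n_p1 = sum(window_p1[0] <= n <= window_p1[1] for n in sizes)
    n_p2 = sum(window_p2[0] <= n <= window_p2[1] for n in sizes)
    if n_p2 == 0:
        raise ValueError("no fragments fall inside window p2")
    return n_p1 / n_p2


# Toy read pairs given as (forward start, reverse end); not real data.
pairs = [(1000, 1140), (2000, 2167), (3000, 3120), (4000, 4180)]
lengths = [fragment_length(start, end) for start, end in pairs]
ratio = fragmentation_ratio(lengths)
```

With real alignments, the same span is usually available directly as the template length field of each paired read, so `fragment_length` would only be needed when working from raw mate coordinates.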
