2,058 research outputs found

    Pipeline design to identify key features and classify the chemotherapy response on lung cancer patients using large-scale genetic data

    Background: During the last decade, interest in applying machine learning algorithms to genomic data has increased in many bioinformatics applications. Analyzing this type of data entails difficulties in managing high-dimensional data, handling class imbalance for knowledge extraction, identifying important features and classifying individuals. In this study, we propose a general framework to tackle these challenges with different machine learning algorithms and techniques. We apply a configuration of this framework to lung cancer patients, identifying genetic signatures for classifying response to drug treatment. We intersect these relevant SNPs with the GWAS Catalog of the National Human Genome Research Institute and explore the RegulomeDB and GTEx databases for functional analysis purposes. Results: The machine learning based solution proposed in this study is a scalable and flexible alternative to the classical univariate regression approach for analyzing large-scale data. From 36 experiments executed using the machine learning framework design, we obtain good classification performance from the top 5 models with the highest cross-validation score and the smallest standard deviation. One thousand two hundred twenty-four SNPs corresponding to the key features from the top 20 models (cross-validation F1 mean >= 0.65) were compared with the GWAS Catalog, finding no intersection with genome-wide significant reported hits. From these, new genetic signatures in MAE, CEP104, PRKCZ and ADRB2 show relevant biological regulatory functionality related to lung physiology. Conclusions: We have defined a machine learning framework using a large, unbalanced dataset of SNP-array and imputed genotyping data from a pharmacogenomics study in lung cancer patients subjected to first-line platinum-based treatment.
This approach found genome signals with no genome-wide significance in the univariate regression approach (GWAS Catalog) that are valuable for classifying patients, only a few of them with related biological function. The effects of these variants can be explained by the recently proposed omnigenic model hypothesis, which states that complex traits can be influenced not only by the “core genes”, mainly found by the genome-wide significant SNPs, but also by the rest of the genes outside of the “core pathways”, with apparently unrelated biological functionality. Peer Reviewed. Postprint (published version)
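The model-selection step described in this abstract (keeping models whose cross-validation F1 mean is at least 0.65, then preferring a high mean and a small standard deviation) can be sketched as follows. This is only an illustration under stated assumptions: the function name `rank_models`, the model names, the fold scores and the exact sort order are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev

def rank_models(cv_scores, min_mean_f1=0.65, top_k=5):
    """Keep models whose mean cross-validation F1 reaches the floor, then
    order them by mean F1 (descending) and standard deviation (ascending)."""
    summary = []
    for name, scores in cv_scores.items():
        m = mean(scores)
        s = stdev(scores) if len(scores) > 1 else 0.0
        if m >= min_mean_f1:
            summary.append((name, m, s))
    # Assumption: the mean is the primary key and the spread is the tie-breaker.
    summary.sort(key=lambda row: (-row[1], row[2]))
    return summary[:top_k]

# Hypothetical per-fold F1 scores for three candidate models.
cv_scores = {
    "model_a": [0.70, 0.72, 0.68],
    "model_b": [0.66, 0.80, 0.61],
    "model_c": [0.50, 0.55, 0.60],
}
ranked = rank_models(cv_scores)
for name, m, s in ranked:
    print(f"{name}: mean F1 = {m:.3f}, std = {s:.3f}")
```

With these made-up scores, `model_c` falls below the 0.65 floor and is dropped, while the other two are ordered by their mean F1.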

    Deep Learning Solutions for Lung Cancer Characterization in Histopathological Images

    Cancer is one of the leading causes of death in the world, lung cancer in particular. According to the World Health Organization (WHO), at the end of 2020, around 2.2 million people were diagnosed with lung cancer, and 1.8 million fatalities resulted from it. Correctly identifying its presence in a patient and classifying its sub-type and stage is fundamental for the adoption of appropriate targeted therapies. One of the gold standards used to identify and classify cancer is the microscopic visual inspection of histopathological images, i.e. small tissue samples excised from a patient. Expert pathologists are responsible for this inspection; however, it requires a significant amount of time and sometimes leads to results that lack consensus. With the growth of computational power and data availability, modern Artificial Intelligence solutions can be developed to automate and speed up this process. Deep Neural Networks using histopathological images as an input currently embody the state of the art in automated lung cancer diagnostic solutions, with Deep Convolutional Neural Networks achieving the most compelling accuracies in tissue type classification. One of the main reasons for such results is the increasing availability of voluminous amounts of data, acquired through the efforts employed by extensive projects like The Cancer Genome Atlas. Nonetheless, histopathological images remain weakly labelled/annotated, as most common pathologist annotations refer to the entirety of the image and not to individual regions of interest in the patient's tissue sample. Recent works have demonstrated Multiple Instance Learning as a successful approach in classification tasks entangled with this lack of annotation, by representing images as a bag of instances where a single label is available for the whole bag.
Thus, we propose a bag/embedding-level lung tissue type and sub-type classifier using a Convolutional Neural Network in a Multiple Instance Learning approach, where the automated inspection of lung histopathological images determines the presence of cancer, and its possible sub-type, in a given patient. Furthermore, we employ a post-model interpretability algorithm to validate our model's predictions and highlight the regions of interest for such predictions.
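The bag/embedding-level idea in this abstract, where many instances share a single label, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two-dimensional embeddings stand in for CNN outputs, element-wise max-pooling is one common aggregation choice rather than the authors' stated method, and `pool_bag`, `predict_bag`, the weights and the bias are hypothetical.

```python
import math

def pool_bag(instance_embeddings):
    """Aggregate instance embeddings into one bag embedding by taking the
    element-wise maximum across instances (an assumed pooling choice)."""
    return [max(column) for column in zip(*instance_embeddings)]

def predict_bag(instance_embeddings, weights, bias):
    """Score the whole bag with a logistic classifier on the pooled embedding,
    so only one label per bag is ever needed."""
    bag_embedding = pool_bag(instance_embeddings)
    logit = sum(w * x for w, x in zip(weights, bag_embedding)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical bag: three image patches, each already mapped to a 2-d embedding.
bag = [[0.1, 0.0], [0.9, 0.2], [0.3, 0.1]]
weights = [2.0, -1.0]  # made-up classifier parameters
bias = -1.0
prob = predict_bag(bag, weights, bias)
print(f"bag-level probability: {prob:.3f}")
```

The design point is that the classifier never sees per-patch labels: the patches are pooled first, and the single image-level label supervises the pooled representation.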

    Bioinformatics in translational drug discovery

    Bioinformatics approaches are becoming ever more essential in translational drug discovery both in academia and within the pharmaceutical industry. Computational exploitation of the increasing volumes of data generated during all phases of drug discovery is enabling key challenges of the process to be addressed. Here, we highlight some of the areas in which bioinformatics resources and methods are being developed to support the drug discovery pipeline. These include the creation of large data warehouses, bioinformatics algorithms to analyse ‘big data’ that identify novel drug targets and/or biomarkers, programs to assess the tractability of targets, and prediction of repositioning opportunities that use licensed drugs to treat additional indications.

    Clinical application of a cancer genomic profiling assay to guide precision medicine decisions

    AIM: Develop and apply a comprehensive and accurate next-generation sequencing based assay to help clinicians match oncology patients to therapies. MATERIALS and METHODS: The performance of the CANCERPLEX(R) assay was assessed using DNA from well-characterized routine clinical formalin-fixed paraffin-embedded (FFPE) specimens and cell lines. RESULTS: The maximum sensitivity of the assay is 99.5% and its accuracy is virtually 100% for detecting somatic alterations with an allele fraction as low as 10%. Clinically actionable variants were identified in 93% of patients (930 of 1000) who underwent testing. CONCLUSION: The test's capacity to determine all of the critical genetic changes, tumor mutation burden, microsatellite instability status and viral associations has important ramifications for clinical decision support strategies, including identification of patients who are likely to benefit from immune checkpoint blockade therapies.

    Texture Analysis Platform for Imaging Biomarker Research

    The rate of progress in improving survival of patients with solid tumors is slow due to late-stage diagnosis and poor tumor characterization processes that fail to effectively reflect the nature of the tumor before treatment or the subsequent change in its dynamics because of treatment. Further advancement of targeted therapies relies on advancements in biomarker research. In the context of solid tumors, bio-specimen samples such as biopsies serve as the main source of biomarkers used in the treatment and monitoring of cancer, even though biopsy samples are susceptible to sampling error and, more importantly, are local and offer a narrow temporal scope. Because of its established role in cancer care and its non-invasive nature, imaging offers the potential to complement the findings of cancer biology. Over the past decade, a compelling body of literature has emerged suggesting a more pivotal role for imaging in the diagnosis, prognosis, and monitoring of diseases. These advances have facilitated the rise of an emerging practice known as Radiomics: the extraction and analysis of large numbers of quantitative features from medical images to improve disease characterization and prediction of outcome. It has been suggested that radiomics can contribute to biomarker discovery by detecting imaging traits that are complementary or interchangeable with other markers. This thesis seeks further advancement of imaging biomarker discovery. This research unfolds over two aims: I) developing a comprehensive methodological pipeline for converting diagnostic imaging data into mineable sources of information, and II) investigating the utility of imaging data in clinical diagnostic applications. Four validation studies were conducted using the radiomics pipeline developed in aim I.
These studies had the following goals: (1) distinguishing between benign and malignant head and neck lesions, (2) differentiating benign and malignant breast cancers, (3) predicting the status of Human Papillomavirus in head and neck cancers, and (4) predicting neuropsychological performances as they relate to Alzheimer’s disease progression. The long-term objective of this thesis is to improve patient outcome and survival by facilitating incorporation of routine care imaging data into decision making processes. Dissertation/Thesis. Doctoral Dissertation, Biomedical Informatics 201

    Interpretation of Mutations, Expression, Copy Number in Somatic Breast Cancer: Implications for Metastasis and Chemotherapy

    Breast cancer (BC) patient management has been transformed over the last two decades due to the development and application of genome-wide technologies. The vast amounts of data generated by these assays, however, create new challenges for accurate and comprehensive analysis and interpretation. This thesis describes novel methods for fluorescence in-situ hybridization (FISH), array comparative genomic hybridization (aCGH), and next generation DNA- and RNA-sequencing, to improve upon current approaches used for these technologies. An ab initio algorithm was implemented to identify genomic intervals of single copy and highly divergent repetitive sequences that were applied to FISH and aCGH probe design. FISH probes with higher resolution than commercially available reagents were developed and validated on metaphase chromosomes. An aCGH microarray was developed that had improved reproducibility compared to the standard Agilent 44K array, which was achieved by placing oligonucleotide probes distant from conserved repetitive sequences. Splicing mutations are currently underrepresented in genome-wide sequencing analyses, and there are limited methods to validate genome-wide mutation predictions. This thesis describes Veridical, a program developed to statistically validate aberrant splicing caused by a predicted mutation. Splicing mutation analysis was performed on a large subset of BC patients previously analyzed by the Cancer Genome Atlas. This analysis revealed an elevated number of splicing mutations in genes involved in NCAM pathways in basal-like and HER2-enriched lymph node positive tumours. Genome-wide technologies were leveraged further to develop chemosensitivity models that predict BC response to paclitaxel and gemcitabine. A type of machine learning, called support vector machines (SVM), was used to create predictive models from small sets of genes that are biologically relevant to drug disposition or resistance.
The SVM models generated were able to predict sensitivity in two groups of independent patient data. High variability between individuals requires more accurate and higher resolution genomic data. However, the data themselves are insufficient; also needed are more insightful analytical methods to fully exploit these data. This dissertation presents both improvements in data quality and accuracy as well as analytical procedures, with the aim of detecting and interpreting critical genomic abnormalities that are hallmarks of BC subtypes, metastasis and therapy response.

    Tracking Cancer Evolution through the Disease Course.

    During cancer evolution, constituent tumor cells compete under dynamic selection pressures. Phenotypic variation can be observed as intratumor heterogeneity, which is propagated by genome instability leading to mutations, somatic copy-number alterations, and epigenomic changes. TRACERx was set up in 2014 to observe the relationship between intratumor heterogeneity and patient outcome. By integrating multiregion sequencing of primary tumors with longitudinal sampling of a prospectively recruited patient cohort, cancer evolution can be tracked from early- to late-stage disease and through therapy. Here we review some of the key features of the studies and look to the future of the field. SIGNIFICANCE: Cancers evolve and adapt to environmental challenges such as immune surveillance and treatment pressures. The TRACERx studies track cancer evolution in a clinical setting, through primary disease to recurrence. Through multiregion and longitudinal sampling, evolutionary processes have been detailed in the tumor and the immune microenvironment in non-small cell lung cancer and clear-cell renal cell carcinoma. TRACERx has revealed the potential therapeutic utility of targeting clonal neoantigens and ctDNA detection in the adjuvant setting as a minimal residual disease detection tool primed for translation into clinical trials.

    INTEGRATIVE ANALYSIS OF OMICS DATA IN ADULT GLIOMA AND OTHER TCGA CANCERS TO GUIDE PRECISION MEDICINE

    Transcriptomic profiling and gene expression signatures have been widely applied as effective approaches for enhancing the molecular classification, diagnosis, prognosis or prediction of therapeutic response towards personalized therapy for cancer patients. Thanks to modern genome-wide profiling technology, scientists are able to build engines leveraging massive genomic variations and integrating with clinical data to identify “at risk” individuals for the sake of prevention, diagnosis and therapeutic interventions. In my graduate work for my Ph.D. thesis, I have investigated genomic sequencing data mining to comprehensively characterise molecular classifications and aberrant genomic events associated with clinical prognosis and treatment response, through applying high-dimensional omics genomic data to promote the understanding of gene signatures and somatic molecular alterations contributing to cancer progression and clinical outcomes. Following this motivation, my dissertation has been focused on the following three topics in translational genomics. 1) Characterization of transcriptomic plasticity and its association with the tumor microenvironment in glioblastoma (GBM). I have integrated transcriptomic, genomic, protein and clinical data to increase the accuracy of GBM classification, and identified the association between the GBM mesenchymal subtype and reduced tumor purity, accompanied by increased presence of tumor-associated microglia.
Then I have identified the intrinsic tumor bulk, but not the corresponding neurosphere cells, as the sole source of microglia through both transcriptional and protein-level analysis using a panel of sphere-forming glioma cultures and their parent GBM samples. Furthermore, I have demonstrated my hypothesis through longitudinal analysis of paired primary and recurrent GBM samples that the phenotypic alterations of GBM subtypes are not due to intrinsic proneural-to-mesenchymal transition in tumor cells; rather, they are intertwined with an increased level of microglia upon disease recurrence. Collectively, I have elucidated the critical role of the tumor microenvironment (microglia and macrophages from the central nervous system) contributing to the intra-tumor heterogeneity and accurate classification of GBM patients based on transcriptomic profiling, which will not only significantly impact the clinical perspective but also pave the way for preclinical cancer research. 2) Identification of prognostic gene signatures that stratify adult diffuse glioma patients harboring 1p/19q co-deletions. I have compared multiple statistical methods and derived a gene signature significantly associated with survival by applying a machine learning algorithm. Then I have identified inflammatory response and acetylation activity that are associated with malignant progression of 1p/19q co-deleted glioma. In addition, I showed this signature translates to other types of adult diffuse glioma, suggesting its universality in the pathobiology of other subsets of gliomas. My efforts on integrative data analysis of this highly curated data set using optimized statistical models will reflect the pending update to the WHO classification system of tumors in the central nervous system (CNS). 3) Comprehensive characterization of somatic fusion transcripts in Pan-Cancers. I have identified a panel of novel fusion transcripts across all of TCGA cancer types through transcriptomic profiling.
Then I have predicted fusion proteins with kinase activity and hub function in pathway networks based on the annotation of genetically mobile domains and functional domain architectures. I have evaluated a panel of in-frame gene fusions as potential driver mutations based on the network fusion centrality hypothesis. I have also characterised the emerging complexity of genetic architecture in fusion transcripts through integrating genomic structure and somatic variants and delineating the distinct genomic patterns of fusion events across different cancer types. Overall, my exploration of the pathogenetic impact and clinical relevance of candidate gene fusions has provided fundamental insights into the management of a subset of cancer patients by predicting the oncogenic signalling and specific drug targets encoded by these fusion genes. Taken together, the translational genomic research I have conducted during my Ph.D. study will shed new light on precision medicine and contribute to the cancer research community. The novel classification concept, gene signature and fusion transcripts I have identified will address several hotly debated issues in translational genomics, such as complex interactions between tumor bulks and their adjacent microenvironments, prognostic markers for clinical diagnostics and personalized therapy, and distinct patterns of genomic structure alterations and oncogenic events in different cancer types, therefore facilitating our understanding of genomic alterations and moving us towards the development of precision medicine.

    A study on the discovery of genome-wide methylation and fragmentomics markers of circulating tumor DNA in blood for the diagnosis and prognosis prediction of colorectal cancer

    Thesis (Ph.D.) -- Seoul National University Graduate School: Graduate School of Convergence Science and Technology, Department of Molecular Medicine and Biopharmaceutical Sciences, 2022.2. 김태유. Non-genetic signatures from liquid biopsy samples are emerging as feasible markers of cancer because plasma cell-free DNA (cfDNA) is representative of the patient's systemic state. Non-genetic signatures include cfDNA methylation, topology of cfDNA, and cfDNA fragmentomics. DNA methylation has somatic tissue-specific patterns, and DNA fragment size is one of the most representative characteristics of cfDNA. In particular, cfDNA from the plasma of cancer patients, which contains circulating tumor DNA (ctDNA), can be representative of the status of both the primary tumor and minimal residual disease. For this reason, the tissue of origin (TOO) could be determined from ctDNA methylation patterns. Fragment size of ctDNA could also be a useful marker for cancer patients. However, studies on the comprehensive applications of non-genetic signatures for cancer diagnosis, monitoring, and predicted prognosis are still needed to define and validate the role of non-genetic markers in clinical practice. Here, I show 1) an accurate prediction model that was developed using a machine learning algorithm for the comprehensive analysis of multiple CpG sites. Although many DNA methylation markers have been reported, previously reported markers were based on a single marker and a Western population. My prediction model includes 305 CpG sites and was built by a machine learning algorithm based on tissue samples from Korean colorectal cancer patients. The prediction model showed high performance not only in databases of pan-cancer tissue samples but also in those based on plasma from cancer patients. In addition, the prognosis of colorectal cancer patients was accurately predicted with a subset of the 305 CpG sites.
Next, I showed that 2) the fragmentation ratio of specific lengths of DNA could be a valuable prognostic marker for colorectal cancer patients. Many recent studies have shown that ctDNA fragment size is shorter than that of cfDNA derived from healthy tissue and have attempted to apply this to cancer diagnosis; however, the data are limited, and the only application has been for cancer diagnosis. In order to fill this gap, cfDNA fragment size was analyzed using targeted deep sequencing from paired ends. I demonstrated that ctDNA fragment length was related to variant allele frequency, and the prognosis of colorectal cancer patients could be predicted by the fragmentation ratio at a specific sampling time in longitudinal samples. In summary, blood-based non-genetic signatures are significantly associated with the status of colorectal cancer and can be used to predict patient prognosis.
Abstract in Korean (translated): Liquid biopsy is attracting attention as a very important method for diagnosing and monitoring cancer and predicting its prognosis. In particular, non-genetic signatures are increasingly coming to the fore as new markers. The reason is that circulating tumor DNA from cancer patients reflects the body more comprehensively than any other marker and carries a great deal of information representing the primary tumor. Circulating tumor DNA reflects not only genetic markers but also non-genetic markers, that is, diverse molecular characteristics such as DNA methylation or DNA fragment size. DNA methylation has tissue-specific patterns, and specificity in DNA fragment size is one of the characteristics of cell-free nucleic acids themselves; efforts to exploit it are increasing. To make comprehensive use of these characteristics, integrated analysis and the discovery of new markers are needed.
In this thesis, 1) although DNA methylation has been widely reported, the reports have centered on single markers and on Western populations. However, methylation patterns differ to some extent between ethnic groups, and to reflect tissue specificity it is important to raise predictive power by using multiple markers rather than a single marker. Therefore, using methylation data obtained from 709 Korean colorectal cancer tissues, I built a machine-learning-based diagnostic prediction model that uses 305 markers. The model showed high predictive power not only on tissue data but also on plasma cell-free nucleic acid methylation data, and prognosis prediction using a subset of the markers was also possible. Next, 2) fragment size is a molecular characteristic unique to cell-free nucleic acids. Recent studies have mainly exploited the fact that the size of cell-free nucleic acids derived from cancer patients differs specifically at somatic mutations. Representative examples are the discovery of cancer-specific diagnostic markers using the whole genome, and methods that use panel sequencing to raise the probability of detecting variants by exploiting size differences at specific variants. However, much remains to be studied regarding applications other than diagnosis. To fill this gap, fragment size analysis of circulating tumor DNA was carried out. We calculated the actual size of nucleic acid molecules using paired-end panel sequencing data and demonstrated from the data that these sizes are attributable to primary-tumor origin.
Furthermore, in colorectal cancer blood samples taken from the same patient before and after various treatments, we confirmed that a size-based marker at a specific time point has statistically significant power for predicting prognosis.

TABLE OF CONTENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES AND FIGURES
I. Use of an optimized machine learning algorithm to discover DNA methylation markers from Korean colorectal cancer patients: Abstract; Introduction; Experimental Design; Results; Discussion
II. Combined analysis of ctDNA mutation and fragment size for predicting prognosis of colorectal cancer: Abstract; Introduction; Experimental Design; Results; Discussion
III. CONCLUSION
REFERENCES
ABSTRACT IN KOREAN

LIST OF TABLES AND FIGURES
I. Use of an optimized machine learning algorithm to discover DNA methylation markers from Korean colorectal cancer patients
TABLE 1. Clinicopathological information of the COPM cohort.
FIGURE 1. In silico simulation for setting the optimal number of DMRs.
FIGURE 2. Pipeline for building the prediction model and discovering cancer-specific markers.
FIGURE 3. Statistical differences according to tissue type.
FIGURE 4. Statistical differences according to tissue type.
FIGURE 5. Prediction model performance using 305 DNA methylation markers for cancer diagnosis.
FIGURE 6. tSNE analysis with CpG methylation level.
FIGURE 7. Permutation test for error rate of TOO (n = 1,000)
FIGURE 8. The PCA (A, C) and tSNE (B, D) analyses were performed for data and sample types.
FIGURE 9. Prediction model performance using intersected 76 DNA methylation markers for cancer diagnosis.
FIGURE 10. Re-constructed prediction model performance for other cancer and sample types.
FIGURE 11. Chromatin status correlated with the probe set (ChromHMM).
FIGURE 12. Pathway analysis using various databases through Metascape.
FIGURE 13. Correlation between methylation level and gene expression.
FIGURE 14. The risk score using the subset of 305 probe set as prognostic marker.
FIGURE 15. Risk score using the total of 305 probe sets as prognostic markers.
FIGURE 16. The association risk score with cancer patient age.
FIGURE 17. The association risk score with cancer patient sex.
FIGURE 18. The association risk score with cancer stage.
II. Combined analysis of ctDNA mutation and fragment size for predicting prognosis of colorectal cancer
FIGURE 1. DNA fragment size calculations.
TABLE 1. Clinicopathological information of the prospective patient cohort.
FIGURE 2. Distribution curve of cfDNA fragment size in patients with colorectal cancer (n=62) and in healthy controls (n=50).
FIGURE 3. Distribution curve of cfDNA fragments by mutation type.
FIGURE 4. Distribution curve of the VAF of somatic mutations detected in plasma cfDNA.
FIGURE 5. The association between clonality and ctDNA fragment size.
FIGURE 6. Correlation between the maximum VAF and ctDNA fragment size.
FIGURE 7. Distribution curves for ctDNA fragments from patients with more than 10% somatic mutations detected in plasma (n=33).
FIGURE 8. Calculation of PFS according to the RECIST 1.1. guideline.
FIGURE 9. ROC analysis for calculating the optimal cutoff values used to classify patients into the responder and non-responder groups.
FIGURE 10. Survival plot for each sampling time point and variables.
FIGURE 11. Clinical response monitoring using the fragmentation ratio (AUCp1 / AUCp2).

    Artificial intelligence for imaging in immunotherapy
