1,455 research outputs found

    Application of novel approaches based on next-generation sequencing technology for the detection of species and copy number variation (종 및 복제수 변이 검출을 위한 차세대 염기서열 기술 기반 새로운 접근법의 적용)

    Thesis (Ph.D.), Graduate School of Seoul National University: Department of Agricultural Biotechnology (Major in Biomodulation), College of Agriculture and Life Sciences, August 2022. Heebal Kim.
Next-generation sequencing (NGS) technologies have contributed to a diverse range of biological research areas: NGS-based studies have revealed previously unknown host-microbe interactions and have enabled effective genomic selection by discovering genetic variants that cause phenotypic changes during domestication. This accumulation of knowledge and insight was possible because various approaches have been developed and applied to suit the specific purpose of each biological problem. Such approaches include targeted sequencing, which economically sequences only the genomic regions of interest, and the development of new algorithms that can efficiently analyze genomic big data. This doctoral dissertation, which consists of three studies, focuses on these novel approaches to NGS data analysis. The first study develops a novel pipeline related to food quality and safety management. The second develops a database and an analysis tool to promote the use of a novel marker in long-read bacterial metabarcoding analysis. The last study, although it does not introduce a novel approach, identifies copy number variation (CNV) in domesticated chicken; CNV has been studied less than single nucleotide variants. Chapter 1 summarizes background knowledge and research trends in metagenome analysis and CNV identification. Chapter 2 describes a pipeline that detects probiotic species from NGS data using breadth of coverage. To accurately determine the presence or absence of each probiotic species in a product, a reference data set was established by selecting a representative strain for each species, and a threshold value for breadth of coverage was defined.
Regardless of the sequencing platform, the pipeline accurately detected the probiotic species contained in each product, and it completely controlled false positives, which were a problem for other read-classification-based methods. Chapter 3 describes the construction of a 16S-ITS-23S rRNA operon database and tool for species-level bacterial community analysis. The advent of long-read sequencing platforms made it possible to use long markers for metabarcoding. In bacterial community analysis, taxonomic resolution improved considerably, up to the species level, when using 16S-ITS-23S rRNA operon sequences (~4,300 bp), which are about 10 times longer than the partial 16S rRNA sequences (~400 bp) used previously. However, curated databases and appropriate tools for rRNA operon analysis are still lacking. Therefore, to promote 16S-ITS-23S rRNA operon sequence analysis, all bacterial genomes were collected from the National Center for Biotechnology Information (NCBI) and curated to construct the database, and a user-friendly, mapping-based analysis tool was developed. Analysis of various mock and simulated samples using the database and tool showed promising results at the species level. In Chapter 4, breed-specific CNVs were identified in three chicken breeds: Rhode Island Red, Cornish, and White Leghorn. Red Jungle Fowl was used as a control group to find CNVs present only in domesticated breeds. CNVs were identified from depth of coverage, and CNV regions (CNVRs) were then defined for comparison between breeds. Based on CNVRs, Cornish was closer to Rhode Island Red than to White Leghorn. Functional annotation of the CNVRs found in domesticated breeds revealed enrichment mainly for terms involved in immune regulation, metabolism, and organ development. This dissertation shows that novel approaches to NGS data can yield a variety of biological insights.
I expect that the analysis methods and databases constructed in this dissertation will contribute to a variety of studies.
Korean abstract: Next-generation sequencing technology, widely used in genetics, microbiology, medicine, and other fields, is accumulating knowledge and providing new levels of insight across many areas, for example by aiding the understanding of previously unknown host-microbe interactions and by providing information for genomic selection through the search for domestication genes. This expansion of knowledge was possible because, along with the advent of sequencing technology, each field developed and applied diverse approaches suited to its purposes. The approaches have diversified in many respects: at the data-production stage, targeted sequencing restricts the regions that are sequenced so that only the needed information is produced more economically, and at the data-analysis stage, software incorporating new algorithms is developed for efficient analysis. This dissertation focuses on this diversification of approaches and consists of three studies under the theme "detection using next-generation sequencing". The first study concerns the development of a new sequence analysis method related to food quality control. The second concerns building a database and developing an analysis method to promote the use of a new marker gene in bacterial metabarcoding analysis. The third uses a previously developed CNV detection method to discover little-known CNVs in domesticated chickens. Chapter 1 of this four-chapter dissertation summarizes background knowledge and research trends, including metagenome analysis and CNV detection methods. In Chapter 2, a probiotic species detection pipeline was built using the breadth of coverage obtained when sequence reads are mapped to reference sequences. To determine more accurately whether each probiotic species is present in a product, a representative strain was selected for each species to build a reference dataset, and a threshold for breadth of coverage, proposed as a new criterion for microbial identification, was used for detection. As a result, the species contained in the products were detected accurately regardless of the sequencing platform; in particular, the false-positive cases that were a problem for existing read-classification-based pipelines were completely controlled. Chapter 3 describes the construction of a 16S-ITS-23S rRNA operon sequence database for species-level bacterial community analysis and the development of software that uses it. Third-generation sequencers, which can read thousands to hundreds of thousands of bases at once, made it possible in metagenomics to use 16S-ITS-23S rRNA operon sequences, which carry about 10 times as much information as the partial 16S rRNA sequences previously used for bacterial community analysis, greatly improving taxonomic resolution up to the species level. To promote analysis of these high-resolution operon sequences, all bacterial genomes were collected from the US National Center for Biotechnology Information and curated to build a 16S-ITS-23S rRNA operon sequence database, and user-friendly, mapping-based automated software was developed. Analyses of various mock samples confirmed that it classifies very accurately at the species level. In Chapter 4, breed-specific CNVs were discovered in three chicken breeds: Rhode Island Red, Cornish, and White Leghorn. The depth of coverage obtained when reads are mapped to the reference was used to discover CNVs, and data from the wild Red Jungle Fowl served as a control to compile CNVs found only in domesticated chickens. Based on CNVs, Cornish and Rhode Island Red were somewhat closer to each other than to White Leghorn, and many CNVs were found near genes related to organ development, immune regulation, and metabolism. This dissertation shows that analyzing sequencing data with new approaches can yield diverse biological insights, and the analysis methods and databases constructed here are expected to help many researchers.
CHAPTER 1. LITERATURE REVIEW
  1.1 Metagenomics
  1.2 Copy Number Variation (CNV)
CHAPTER 2. ACCURATE AND STRICT IDENTIFICATION OF PROBIOTIC SPECIES BASED ON COVERAGE OF WHOLE-METAGENOME SHOTGUN SEQUENCING DATA
  2.1 Abstract
  2.2 Introduction
  2.3 Material and methods
  2.4 Results
  2.5 Discussion
CHAPTER 3. MICROBIAL IDENTIFICATION USING RRNA OPERON REGION: DATABASE AND TOOL FOR METATAXONOMICS WITH LONG-READ SEQUENCE
  3.1 Abstract
  3.2 Introduction
  3.3 Materials and Methods
  3.4 Results
  3.5 Discussion
CHAPTER 4. IDENTIFICATION OF COPY NUMBER VARIATION IN DOMESTIC CHICKEN USING WHOLE-GENOME SEQUENCING REVEALS EVIDENCE OF SELECTION IN THE GENOME
  4.1 Abstract
  4.2 Introduction
  4.3 Materials and Methods
  4.4 Results
  4.5 Discussion
REFERENCES
ABSTRACT IN KOREAN
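
The breadth-of-coverage criterion described for Chapter 2 can be sketched in a few lines: breadth is the fraction of a reference genome covered by at least one mapped read, and a species is called present only if that fraction reaches a threshold. This is a minimal sketch, not the dissertation's pipeline. It assumes reads have already been mapped to one representative genome per species and reduced to half-open (start, end) intervals; the species names, genome lengths, alignments, and the 0.3 threshold are all hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) span of one mapped read


def breadth_of_coverage(intervals: List[Interval], genome_length: int) -> float:
    """Fraction of reference positions covered by at least one mapped read."""
    covered = 0
    block_start, block_end = -1, -1  # current merged block; (-1, -1) means none yet
    for start, end in sorted(intervals):
        if start > block_end:
            # Gap before this read: count the finished block and start a new one.
            covered += block_end - block_start
            block_start, block_end = start, end
        else:
            # Overlapping or adjacent read: extend the current block.
            block_end = max(block_end, end)
    covered += block_end - block_start
    return covered / genome_length


def detect_species(alignments: Dict[str, List[Interval]],
                   genome_lengths: Dict[str, int],
                   threshold: float) -> Dict[str, bool]:
    """Call each species present only if its breadth of coverage reaches the threshold."""
    return {
        species: breadth_of_coverage(alignments.get(species, []), length) >= threshold
        for species, length in genome_lengths.items()
    }


# Hypothetical toy data: species_A is broadly covered, species_B only at two spots.
genome_lengths = {"species_A": 1000, "species_B": 1000}
alignments = {
    "species_A": [(0, 150), (100, 250), (400, 700)],
    "species_B": [(10, 60), (900, 950)],
}
print(detect_species(alignments, genome_lengths, threshold=0.3))
# {'species_A': True, 'species_B': False}
```

One plausible reason such a criterion suppresses false positives: reads from a related species tend to pile up on conserved regions, which can give high depth but low breadth on the wrong reference genome.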

    PRINCE: Accurate approximation of the copy number of tandem repeats

    Variable-Number Tandem Repeats (VNTRs) are genomic regions where a short DNA sequence is repeated with no space between repeats. While a fixed set of VNTRs is typically identified for a given species, the copy number at each VNTR varies between individuals within that species. Although VNTRs are found in both prokaryotic and eukaryotic genomes, multi-locus VNTR analysis (MLVA) is widely used to distinguish bacterial strains, to cluster strains that might be epidemiologically related, and to investigate evolutionary rates. We propose PRINCE (Processing Reads to Infer the Number of Copies via Estimation), an algorithm that accurately estimates the copy number of a VNTR given the sequence of a single repeat unit and a set of short reads from a whole-genome sequencing (WGS) experiment. This is a challenging problem, especially when the repeat region is longer than the expected read length. Our method computes a statistical approximation of the local coverage inside the repeat region. This approximation is then mapped to the copy number using a linear function whose parameters are fitted to simulated data. We test PRINCE on three datasets of Mycobacterium tuberculosis strains and show that it is more than twice as accurate as a previous method. An implementation of PRINCE in Python is freely available at https://github.com/WGS-TB/PythonPRINCE
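
The final step described above, mapping a coverage statistic to a copy number through a linear function fitted on simulated data, can be illustrated with an ordinary least-squares fit. This sketch assumes the coverage statistic has already been computed for each repeat region; the simulated training pairs are invented, and the exact statistic and fitting procedure that PRINCE uses are not reproduced here.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_copy_number(coverage_stat: float, slope: float, intercept: float) -> int:
    """Apply the fitted line and round to a non-negative integer copy number."""
    return max(0, round(slope * coverage_stat + intercept))


# Hypothetical simulated training pairs: (coverage statistic, true copy number).
sim_coverage = [1.0, 2.0, 3.0, 4.0]
sim_copies = [2, 4, 6, 8]
slope, intercept = fit_line(sim_coverage, sim_copies)
print(slope, intercept, estimate_copy_number(2.6, slope, intercept))  # 2.0 0.0 5
```

Fitting on simulated data means the true copy number is known for every training point, which is what makes a supervised calibration like this possible.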

    Detection of copy number variations in rice using array-based comparative genomic hybridization

    Background: Copy number variations (CNVs) can create new genes, change gene dosage, reshape gene structures, and modify elements regulating gene expression. As with all types of genetic variation, CNVs may influence phenotypic variation and gene expression, and they are thus considered major sources of genetic variation. Little is known, however, about their contribution to genetic variation in rice. Results: To detect CNVs, we used a set of NimbleGen whole-genome comparative genomic hybridization arrays containing 718,256 oligonucleotide probes with a median probe spacing of 500 bp. We compiled a high-resolution map of CNVs in the rice genome, showing 641 CNVs between the genomes of the rice cultivars 'Nipponbare' (O. sativa ssp. japonica) and 'Guang-lu-ai 4' (O. sativa ssp. indica). The identified CNVs range in size from 1.1 kb to 180.7 kb and together encompass approximately 7.6 Mb of the rice genome. The largest regions showing copy gain and loss are 37.4 kb on chromosome 4 and 180.7 kb on chromosome 8. In addition, 85 DNA segments were identified, including some genic sequences. Contracted genes greatly outnumbered duplicated ones. Many of the contracted genes corresponded either to the same genes or to genes involved in the same biological processes; this was also the case for genes involved in disease and defense. Conclusion: We detected CNVs in rice by array-based comparative genomic hybridization. These CNVs contain known genes. Further discussion of CNVs is important, as they are linked to variation among rice varieties and are likely to contribute to subspecific characteristics.
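
As a rough illustration of how array CGH signal becomes CNV calls, the sketch below computes per-probe log2 ratios, thresholds them, and merges runs of consecutive probes with the same state. The ±0.3 cutoff, the three-probe minimum, and the toy probe values are hypothetical; this is a generic thresholding approach, not the calling procedure used in the rice study.

```python
import math
from typing import List, Tuple


def log2_ratio(test: float, reference: float) -> float:
    """Per-probe log2(test / reference) hybridization intensity ratio."""
    return math.log2(test / reference)


def call_segments(positions: List[int], ratios: List[float],
                  cutoff: float = 0.3, min_probes: int = 3) -> List[Tuple[int, int, str]]:
    """Return (first_probe_pos, last_probe_pos, 'gain' or 'loss') for runs of
    consecutive probes beyond +/-cutoff that contain at least min_probes probes."""
    def state(r: float) -> str:
        if r >= cutoff:
            return "gain"
        if r <= -cutoff:
            return "loss"
        return "normal"

    segments = []
    run_start = 0
    for i in range(1, len(ratios) + 1):
        # Close the current run at the end of the data or when the state changes.
        if i == len(ratios) or state(ratios[i]) != state(ratios[run_start]):
            run_state = state(ratios[run_start])
            if run_state != "normal" and i - run_start >= min_probes:
                segments.append((positions[run_start], positions[i - 1], run_state))
            run_start = i
    return segments


# Hypothetical probes every 500 bp with a three-probe loss and a single-probe blip.
positions = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500]
ratios = [0.0, 0.1, -1.0, -0.9, -1.1, 0.05, 0.6, 0.0]
print(call_segments(positions, ratios))  # [(1000, 2000, 'loss')]
```

Requiring several consecutive probes trades resolution for robustness: the isolated high probe at 3000 bp is not reported.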

    The plastic genome of Bordetella pertussis


    wuHMM: a robust algorithm to detect DNA copy number variation using long oligonucleotide microarray data

    Copy number variants (CNVs) are currently defined as genomic sequences that are polymorphic in copy number and range in length from 1,000 to several million base pairs. Among current array-based CNV detection platforms, long-oligonucleotide arrays promise the highest resolution. However, currently available analytical tools perform poorly on these data because of the lower signal-to-noise ratio inherent in oligonucleotide-based hybridization assays. We have developed wuHMM, an algorithm for mapping CNVs from array comparative genomic hybridization (aCGH) platforms comprising 385,000 to more than 3 million probes. wuHMM is unique in that it can use sequence divergence information to reduce the false positive rate (FPR). We apply wuHMM to 385K-aCGH, 2.1M-aCGH and 3.1M-aCGH experiments comparing the 129X1/SvJ and C57BL/6J inbred mouse genomes. We assess wuHMM's performance on the 385K platform by comparison with the higher-resolution platforms, and we independently validate 10 CNVs. The method requires no training data and is robust to changes in algorithm parameters. At an FPR below 10%, the algorithm can detect CNVs spanning five probes on the 385K platform and three on the 2.1M and 3.1M platforms, giving effective resolutions of 24 kb, 2–5 kb and 1 kb, respectively.
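
wuHMM is built on a hidden Markov model. The sketch below is a generic three-state (loss, normal, gain) HMM with Gaussian emissions, decoded with the Viterbi algorithm in log space. It is not wuHMM itself and does not use the sequence-divergence information described above; the state means, the shared standard deviation, and the self-transition probability are assumed values.

```python
import math
from typing import List

STATES = ["loss", "normal", "gain"]
MEANS = [-0.8, 0.0, 0.6]  # assumed mean log2 ratio for each state
SD = 0.25                 # assumed shared emission standard deviation
STAY = 0.98               # assumed probability of remaining in the same state


def log_gauss(x: float, mu: float, sd: float) -> float:
    """Log density of a normal distribution N(mu, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)


def viterbi(obs: List[float]) -> List[str]:
    """Most likely hidden CNV state for each probe, in probe order."""
    k = len(STATES)
    switch = math.log((1 - STAY) / (k - 1))
    log_trans = [[math.log(STAY) if i == j else switch for j in range(k)] for i in range(k)]
    # Uniform start distribution over states.
    score = [math.log(1 / k) + log_gauss(obs[0], MEANS[s], SD) for s in range(k)]
    backpointers = []
    for x in obs[1:]:
        prev = score
        ptr, score = [], []
        for j in range(k):
            best_i = max(range(k), key=lambda i: prev[i] + log_trans[i][j])
            ptr.append(best_i)
            score.append(prev[best_i] + log_trans[best_i][j] + log_gauss(x, MEANS[j], SD))
        backpointers.append(ptr)
    # Trace back from the best-scoring final state.
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return [STATES[s] for s in reversed(path)]


# Hypothetical probe log2 ratios with a three-probe deletion in the middle.
obs = [0.05, -0.02, 0.0, -0.85, -0.75, -0.9, 0.02, 0.0]
print(viterbi(obs))
# ['normal', 'normal', 'normal', 'loss', 'loss', 'loss', 'normal', 'normal']
```

The transition penalty plays the role of a minimum-evidence requirement: a state change is accepted only when the emission gain over the run outweighs the cost of switching in and out.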

    Genome dynamics of the human embryonic kidney 293 lineage in response to cell biology manipulations

    The HEK293 human cell lineage is widely used in cell biology and biotechnology. Here we use whole-genome resequencing of six 293 cell lines to study the dynamics of this aneuploid genome in response to the manipulations used to generate common 293 cell derivatives, such as transformation and stable clone generation (293T), suspension growth adaptation (293S), and cytotoxic lectin selection (293SG). Remarkably, we observe that copy number alteration detection could identify the genomic region that enabled cell survival under selective conditions (in this case, ricin selection). Furthermore, we present methods to detect human/vector genome breakpoints and a user-friendly visualization tool for the 293 genome data. We also establish that the genome structure composition is in steady state for most of these cell lines under standard cell culturing conditions. This resource enables novel and more informed studies with 293 cells, and we will distribute the sequenced cell lines to this end.
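
Copy number alterations in resequenced genomes are often estimated from read depth in fixed-size bins. The sketch below converts bin depths to copy-number estimates by scaling each bin against the genome-wide median depth, which is assumed to correspond to a baseline ploidy. It ignores GC and mappability correction and is not the method used for the 293 cell lines; because the 293 genome is aneuploid, the baseline ploidy of 3 here is a hypothetical choice, as are the depth values.

```python
import statistics
from typing import List


def copy_number_from_depth(bin_depths: List[float], baseline_ploidy: int) -> List[int]:
    """Scale each bin's read depth by the genome-wide median depth, which is
    assumed to correspond to baseline_ploidy copies, and round to an integer."""
    median_depth = statistics.median(bin_depths)
    return [round(baseline_ploidy * depth / median_depth) for depth in bin_depths]


def altered_bins(copy_numbers: List[int], baseline_ploidy: int) -> List[int]:
    """Indices of bins whose estimated copy number differs from the baseline."""
    return [i for i, cn in enumerate(copy_numbers) if cn != baseline_ploidy]


# Hypothetical mean depths for eight consecutive bins; a baseline ploidy of 3 is assumed.
depths = [30, 31, 29, 30, 50, 51, 10, 30]
cns = copy_number_from_depth(depths, baseline_ploidy=3)
print(cns, altered_bins(cns, baseline_ploidy=3))  # [3, 3, 3, 3, 5, 5, 1, 3] [4, 5, 6]
```

Using the median rather than the mean keeps a few highly amplified bins from shifting the baseline.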

    Arvutuslikud meetodid DNA koopiaarvu määramiseks (Computational methods for determining DNA copy number)

    The electronic version of the dissertation does not include the publications.
Estonian abstract: DNA copy number variants or alterations are differences in human genetic material in which the copy number of a DNA segment differs from the expected copy number of two (one copy of a given DNA sequence on the maternally inherited chromosome and one on the paternally inherited chromosome). A decrease in the number of DNA copies is called deletion, and the corresponding DNA variants are called deletions. An increase in the number of DNA copies is called duplication, and variants with a copy number greater than two are accordingly called duplications. This doctoral thesis studied human DNA copy number variants, their association with various diseases, and the particularities of their emergence and inheritance. Using DNA microarrays (gene chips), we first investigated whether, and which, DNA copy number changes may be associated with mental retardation (MR). Studying families in which one or more members had been diagnosed with MR, we found several copy number changes previously associated with MR, as well as several new copy number variants whose presence may be linked to the development of MR. A similar study was conducted in couples and women with recurrent spontaneous miscarriage. Comparing the copy number changes, and their frequencies, of women in this patient group with those of control individuals who were healthy mothers, we found a statistically and biologically significant difference in a copy-number-altered DNA segment containing the PDZD2 and GOLPH3 genes, where carrying duplications markedly increased women's risk of spontaneous miscarriage. The last part of the thesis studied DNA copy number changes, and their inheritance within families, in individuals without serious disease collected by the Estonian Genome Center of the University of Tartu and the international HapMap project. One of the most interesting results of this study was the under-transmission of deletions from parents to children: deletion-carrying DNA regions occurred in the children's genomes significantly less often than would be expected under normal Mendelian (random) inheritance. Studying duplication regions in families, however, we found that two-thirds of the DNA copies in duplications were not identical (exact copies of one another) but somewhat different, demonstrating a previously unknown degree of allelic variability in DNA duplication regions.
DNA copy number variation is a type of genetic variation in which the number of copies of a particular chromosomal region is altered from its normal state. In the non-repetitive portion of the human genome, the normal haploid copy number is one: one copy of each sequence per chromosome. Accordingly, the normal diploid copy number in humans is two, with one copy inherited from each parent. A copy number variant (CNV) can result from either a loss of copies (most often called a deletion) or a gain of copies (called a duplication or amplification). In this thesis we studied DNA copy number variation in humans: how CNVs emerge and how they are inherited from parents to offspring. We also analysed CNVs in the context of a few different diseases. Using DNA microarrays, we first aimed to determine whether CNVs are associated with mental retardation (MR). For this we studied not only index cases with MR but also larger nuclear families, where we discovered several CNVs already associated with MR and also a few novel CNV regions that are possibly associated with predisposition to MR. A similar study was conducted in couples and females suffering from recurrent miscarriage. By comparing CNVs and their frequencies in the latter group with those of healthy mothers, we discovered a multi-copy duplication at 5p13.3 that disrupts the PDZD2 and GOLPH3 genes and significantly increases maternal risk for pregnancy complications.
In the last part of this thesis we studied how CNVs are inherited in Estonian nuclear families (22 trios and 12 families with multiple siblings) and in HapMap Yoruban trios. We determined that deletion-carrying chromosomal regions were observed in the offspring slightly less frequently than expected by random Mendelian inheritance. By analysing duplication-carrying chromosomal regions in these families, we discovered that in two-thirds of such regions the duplicated copies of the underlying DNA sequence were not exactly identical but somewhat different, allowing us to define alternative allelic copies within these copy number gain-carrying chromosomal regions and demonstrating extensive and to-date unmeasured allelic variability in multi-copy CNV regions of the human genome
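
Under random Mendelian inheritance, a parent heterozygous for a deletion transmits it to each child with probability 1/2, so under-transmission can be checked with a one-sided binomial test on pooled transmission counts. The sketch below computes the exact lower-tail probability; the counts are invented for illustration, and the thesis may have used a different test.

```python
from math import comb


def binom_lower_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X <= k) for X ~ Binomial(n, p); small values suggest under-transmission."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


# Hypothetical counts pooled across families: 100 parent-to-child transmission
# opportunities from heterozygous deletion carriers, with the deletion seen in 40.
transmitted, opportunities = 40, 100
print(transmitted / opportunities, binom_lower_tail(transmitted, opportunities))
```

Treating each parent-child pair as an independent trial is a simplification; siblings share parents, so a real analysis may need to account for family structure.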

    Detection of copy number variation from array intensity and sequencing read depth using a stepwise Bayesian model

    Background: Copy number variants (CNVs) have been shown to occur at high frequency and are now widely believed to contribute significantly to phenotypic variation in human populations. Array-based comparative genomic hybridization (array-CGH) and the newly developed read-depth approach using ultrahigh-throughput genomic sequencing both provide rapid, robust, and comprehensive methods to identify CNVs on a whole-genome scale. Results: We developed a Bayesian statistical analysis algorithm for detecting CNVs from both types of genomic data. The algorithm can analyze data obtained from PCR-based bacterial artificial chromosome arrays, high-density oligonucleotide arrays, and more recently developed high-throughput DNA sequencing. Treating the parameters that define the underlying data-generating process (e.g., the number of CNVs, the position of each CNV, and the data noise level) as random variables, our approach derives the posterior distribution of the genomic CNV structure given the observed data. By sampling from the posterior distribution with a Markov chain Monte Carlo method, we obtain not only best estimates for these unknown parameters but also Bayesian credible intervals for the estimates. We illustrate the characteristics of our algorithm by applying it to both synthetic and experimental data sets in comparison with other segmentation algorithms. Conclusions: In particular, the synthetic data comparison shows that our method is more sensitive than other approaches at low false positive rates. Furthermore, given its Bayesian origin, our method can also be seen as a technique to refine CNVs identified by fast point-estimate methods, and as a framework to integrate array-CGH and sequencing data with other CNV-related biological knowledge through informative priors.
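
The paper samples a joint posterior over the number of CNVs, their positions, and the noise level with MCMC. A much smaller relative of that idea can be computed exactly by enumeration. Assume exactly one CNV, a known mean shift delta over a zero baseline, a known noise level sigma, and a uniform prior over all intervals. Then the posterior of each interval is proportional to the exponential of the summed per-probe log-likelihood ratios, and per-probe marginals play a role loosely similar to credible intervals. This is an illustration under those stated assumptions, not the authors' stepwise model; the data, delta, and sigma are hypothetical.

```python
import math
from typing import List, Tuple


def interval_posterior(y: List[float], delta: float, sigma: float) -> List[Tuple[int, int, float]]:
    """Exact posterior over a single CNV spanning probes [a, b).

    Model (assumed): probes outside the CNV ~ N(0, sigma^2), probes inside
    ~ N(delta, sigma^2), sigma and delta known, uniform prior over all
    non-empty intervals.
    """
    # Per-probe log-likelihood ratio of "shifted" vs "baseline".
    llr = [(delta * yi - delta * delta / 2) / (sigma * sigma) for yi in y]
    prefix = [0.0]
    for v in llr:
        prefix.append(prefix[-1] + v)
    n = len(y)
    log_post = [(a, b, prefix[b] - prefix[a]) for a in range(n) for b in range(a + 1, n + 1)]
    # Normalize in a numerically stable way (subtract the maximum before exp).
    top = max(lp for _, _, lp in log_post)
    weights = [(a, b, math.exp(lp - top)) for a, b, lp in log_post]
    total = sum(w for _, _, w in weights)
    return [(a, b, w / total) for a, b, w in weights]


def probe_marginals(posterior: List[Tuple[int, int, float]], n: int) -> List[float]:
    """Posterior probability that each probe lies inside the CNV."""
    marginals = [0.0] * n
    for a, b, prob in posterior:
        for i in range(a, b):
            marginals[i] += prob
    return marginals


# Hypothetical normalized probe values with a gain over probes 3-5.
y = [0.0, 0.1, -0.1, 1.0, 0.9, 1.1, 0.0, -0.05]
post = interval_posterior(y, delta=1.0, sigma=0.3)
best = max(post, key=lambda t: t[2])
print(best[:2], [round(m, 3) for m in probe_marginals(post, len(y))])
```

Enumeration is quadratic in the number of probes, which is why sampling methods such as MCMC become necessary once the number of CNVs and the noise level are also unknown.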