Systematic dissection of biases in whole-exome and whole-genome sequencing reveals major determinants of coding sequence coverage
The advantages and diagnostic effectiveness of the two most widely used resequencing approaches, whole exome sequencing (WES) and whole genome sequencing (WGS), are often debated. WES has dominated large-scale resequencing projects because of its lower cost and easier data storage and processing. The rapid development of third-generation sequencing methods and novel exome sequencing kits necessitates a robust statistical framework that allows informative and easy performance comparison of the emerging methods. In our study, we developed a set of statistical tools to systematically assess the coverage of coding regions provided by several modern WES platforms, as well as by PCR-free WGS. We identified a substantial problem in most previously published comparisons: they did not account for the mappability limitations of short reads. Using regression analysis and simple machine learning, as well as several novel metrics of coverage evenness, we analyzed the contributions of the major determinants of CDS coverage. Contrary to a common view, most of the observed bias in modern WES stems from the mappability limitations of short reads and from exome probe design rather than from sequence composition. We also identified an approximately 500 kb region of the human exome that could not be effectively characterized using short-read technology and should receive special attention during variant analysis. Using our novel metrics of sequencing coverage, we identified the main determinants of WES and WGS performance. Overall, our study points out avenues for improving enrichment-based methods and for developing novel approaches that would maximize variant discovery at optimal cost.
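The abstract does not define its coverage-evenness metrics, so the sketch below substitutes two generic stand-ins (the coefficient of variation of per-base depth and the fraction of bases at or above a depth threshold) to illustrate why a mappability mask matters when comparing platforms. The function name, the 20x default threshold and the boolean `mappable` mask are assumptions for illustration, not the authors' tools.

```python
from statistics import mean, pstdev


def coverage_metrics(depths, mappable=None, threshold=20):
    """Summarize how evenly a set of coding bases is covered.

    depths:    per-base read depth over the target bases (e.g. all CDS positions)
    mappable:  optional per-base booleans; False marks positions that short reads
               cannot map uniquely (hypothetical mask input for this sketch)
    threshold: minimum depth counted as "covered" (20x is an assumed default)
    """
    if mappable is not None:
        if len(mappable) != len(depths):
            raise ValueError("mask length must match depths")
        # drop positions that short reads cannot cover reliably on any platform
        depths = [d for d, ok in zip(depths, mappable) if ok]
    if not depths:
        raise ValueError("no positions left to summarize")
    mu = mean(depths)
    # coefficient of variation: depth spread relative to the mean (lower = more even)
    cv = pstdev(depths) / mu if mu > 0 else float("inf")
    # fraction of bases reaching the depth threshold
    covered = sum(d >= threshold for d in depths) / len(depths)
    return {"mean": mu, "cv": cv, "frac_covered": covered}


depths = [30, 30, 30, 30, 0, 0]
mask = [True, True, True, True, False, False]  # last two bases treated as unmappable
print(coverage_metrics(depths))        # all bases
print(coverage_metrics(depths, mask))  # mappable bases only
```

In this toy input the two zero-depth bases drag the unmasked summary down; once they are masked as unmappable, the same data look perfectly even. This is the kind of confounding the abstract attributes to comparisons that ignore mappability.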
Identification of Novel Candidate Markers of Type 2 Diabetes and Obesity in Russia by Exome Sequencing with a Limited Sample Size
Type 2 diabetes (T2D) and obesity are common chronic disorders with multifactorial etiology. In our study, we performed exome sequencing of 110 patients of Russian ethnicity and applied a multi-perspective approach based on biologically meaningful filtering criteria to detect novel candidate variants and loci for T2D and obesity. We identified several known single nucleotide polymorphisms (SNPs) as markers for obesity (rs11960429), T2D (rs9379084, rs1126930), and body mass index (BMI) (rs11553746, rs1956549 and rs7195386) (p < 0.05). We show that a method based on scoring case-specific variants together with selecting protein-altering variants allows novel and known candidate markers of T2D and obesity to be interrogated in small samples. Using this method, we identified rs328 in LPL (p = 0.023), rs11863726 in HBQ1 (p = 8 × 10⁻⁵) and rs112984085 in VAV3 (p = 4.8 × 10⁻⁴) for T2D and obesity; rs6271 in DBH (p = 0.043), rs62618693 in QSER1 (p = 0.021), rs61758785 in RAD51B (p = 1.7 × 10⁻⁴), rs34042554 in PCDHA1 (p = 1 × 10⁻⁴) and rs144183813 in PLEKHA5 (p = 1.7 × 10⁻⁴) for obesity; and rs9379084 in RREB1 (p = 0.042), rs2233984 in C6orf15 (p = 0.030), rs61737764 in ITGB6 (p = 0.035), rs17801742 in COL2A1 (p = 8.5 × 10⁻⁵) and rs685523 in ADAMTS13 (p = 1 × 10⁻⁶) for T2D as important susceptibility loci in the Russian population. Our results demonstrate the effectiveness of whole exome sequencing (WES) for identifying novel markers of multifactorial diseases in cohorts of limited size from poorly studied populations.
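The abstract names the two ingredients of its method (scoring case-specific variants and selecting protein-altering variants) but not the scoring rule. A minimal sketch, assuming per-variant allele counts in cases and controls are available and substituting a one-sided Fisher's exact test as the scoring step, might look like this. The consequence set, the `Variant` fields and the `alpha` cutoff are hypothetical choices, not the authors' pipeline.

```python
from dataclasses import dataclass
from math import comb

# Sequence Ontology consequence terms treated as protein-altering (assumed set)
PROTEIN_ALTERING = {
    "missense_variant", "stop_gained", "stop_lost", "start_lost",
    "frameshift_variant", "inframe_insertion", "inframe_deletion",
    "splice_donor_variant", "splice_acceptor_variant",
}


@dataclass
class Variant:
    rsid: str
    consequence: str
    case_alt: int  # alternate alleles observed in cases
    case_ref: int  # reference alleles observed in cases
    ctrl_alt: int  # alternate alleles observed in controls
    ctrl_ref: int  # reference alleles observed in controls


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact p-value that cell a of [[a, b], [c, d]] is enriched.

    Sums hypergeometric probabilities P(X >= a) with row and column totals fixed.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / comb(n, row1)


def candidate_variants(variants, alpha=0.05):
    """Keep protein-altering variants whose alternate allele is enriched in cases."""
    hits = []
    for v in variants:
        if v.consequence not in PROTEIN_ALTERING:
            continue  # selection step: ignore non-protein-altering variants
        p = fisher_greater(v.case_alt, v.case_ref, v.ctrl_alt, v.ctrl_ref)
        if p < alpha:
            hits.append((v.rsid, p))
    return sorted(hits, key=lambda h: h[1])  # most significant first
```

With limited sample sizes an exact test is a natural choice over a chi-square approximation, because expected cell counts for rare alleles are often small.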
Analytical “Bake-Off” of Whole Genome Sequencing Quality for the Genome Russia Project Using a Small Cohort for Autoimmune Hepatitis
A comparative analysis of whole genome sequencing (WGS) and genotype calling was initiated for ten human genome samples sequenced by the St. Petersburg State University Peterhof Sequencing Center and by three commercial sequencing centers outside Russia. Sequence quality and the efficiency of DNA variant and genotype calling were compared across centers and against DNA microarray genotypes for each of the ten study subjects. We assessed the calling of SNPs, indels and copy number variation, as well as the promised speed of WGS throughput. Twenty separate QC analyses showed high similarity in sequence quality and called genotypes. The ten genomes tested by the centers comprised eight American patients with autoimmune hepatitis (AIH) plus the unaffected parents of one case, as a prelude to discovering genetic influences in this rare disease of unknown etiology. The detailed internal replication and parallel analyses showed that two of the eight AIH cases carry a rare-allele genotype in a previously described AIH-associated gene (FTCD) and identified multiple occurrences of known AIH-associated HLA-DRB1 alleles (HLA-DRB1*03:01:01, *13:01:01 and *07:01:01). We also list putative SNVs in other genes that may influence AIH.
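Comparing WGS genotype calls against microarray genotypes is one of the QC comparisons the abstract describes. A minimal sketch, assuming each call set has already been reduced to a dictionary from (chromosome, position) to an unphased genotype tuple of allele indices, reports overall and non-reference concordance. The function name and input format are hypothetical, not the project's pipeline.

```python
def concordance(calls_a, calls_b):
    """Genotype concordance between two call sets over their shared sites.

    calls_a, calls_b: dicts mapping (chrom, pos) to a genotype tuple such as (0, 1);
    missing calls are assumed to have been removed beforehand.
    """
    shared = sorted(calls_a.keys() & calls_b.keys())
    # unphased comparison: (1, 0) and (0, 1) are the same heterozygote
    same = [tuple(sorted(calls_a[s])) == tuple(sorted(calls_b[s])) for s in shared]
    overall = sum(same) / len(shared) if shared else float("nan")
    # non-reference sites: at least one call set reports an alternate allele
    nonref = [ok for s, ok in zip(shared, same) if any(calls_a[s]) or any(calls_b[s])]
    nonref_rate = sum(nonref) / len(nonref) if nonref else float("nan")
    return {"shared_sites": len(shared), "overall": overall, "non_reference": nonref_rate}


wgs = {("chr1", 100): (0, 1), ("chr1", 200): (0, 0), ("chr1", 300): (1, 1), ("chr2", 50): (0, 1)}
array = {("chr1", 100): (1, 0), ("chr1", 200): (0, 0), ("chr1", 300): (0, 1), ("chr3", 10): (0, 0)}
print(concordance(wgs, array))
```

Non-reference concordance is reported separately because homozygous-reference sites dominate most call sets and can make overall concordance look high even when variant calls disagree.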
Cytogenomic Profile of Uterine Leiomyoma: In Vivo vs. In Vitro Comparison
We performed a comparative cytogenomic analysis of cultured and uncultured uterine leiomyoma (UL) samples. The experimental approach included karyotyping, aCGH, verification of the detected chromosomal abnormalities by metaphase and interphase FISH, MED12 mutation analysis, and telomere measurement by Q-FISH. An abnormal karyotype was detected in 12 of 32 cultured UL samples. MED12 mutations were found in five karyotypically abnormal ULs. The chromosomal abnormalities in ULs were mostly complex rearrangements, including chromothripsis. In both karyotypically normal and abnormal ULs, telomeres were ~40% shorter than in the corresponding myometrium, which may be a prerequisite for chromosomal rearrangements. Uncultured samples of six karyotypically abnormal ULs were checked for the detected chromosomal abnormalities by interphase FISH with individually designed DNA probe sets. All chromosomal abnormalities detected in cultured ULs were found in the corresponding uncultured samples. In all tumors, the clonal spectrum consisted of one or more karyotypically abnormal cell clones coexisting with karyotypically normal ones, suggesting that chromosomal abnormalities acted as drivers, rather than triggers, of the neoplastic process. In vitro propagation did not change the spectrum of cell clones, but it altered their ratio compared with the uncultured sample. These alterations were unique to each UL: relative to its uncultured counterpart, the frequency of chromosomally abnormal cells in the cultured sample was higher in some ULs and lower in others. In summary, ULs are characterized by both inter- and intratumor genetic heterogeneity. Regardless of its MED12 status, a tumor may consist of clones with and without chromosomal abnormalities. In contrast to the clonal spectrum, which is unique and constant for each UL, clonal frequencies shift up or down in vitro, most probably because cells with different genetic aberrations differ in their ability to persist outside the body.
Genome-wide sequence analyses of ethnic populations across Russia
The Russian Federation is the largest and one of the most ethnically diverse countries in the world; however, no centralized reference database of genetic variation exists to date. Such data are crucial for medical genetics and essential for studying population history. The Genome Russia Project aims to fill this gap by performing whole genome sequencing and analysis of the peoples of the Russian Federation. Here we report the characterization of genome-wide variation in 264 healthy adults, including 60 newly sequenced samples. People of Russia carry known and novel genetic variants of adaptive, clinical and functional consequence, many of which show allele frequency divergence from neighboring populations. Population genetics analyses revealed six phylogeographic partitions among indigenous ethnicities, corresponding to their geographic locales. This study presents a characterization of population-specific genomic variation in Russia, with results important for medical genetics and for understanding the dynamic population history of the world's largest country.
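Allele frequency divergence between populations is often summarized with F_ST. The abstract does not say which statistic was used, so the sketch below substitutes Hudson's estimator in the form recommended by Bhatia et al. (2013), combining sites as a ratio of averages. The inputs (per-site alternate-allele frequencies and sampled haplotype counts) and the function name are assumptions for illustration.

```python
def hudson_fst(freqs_1, freqs_2, n1, n2):
    """Hudson's F_ST across sites, combined as a ratio of averages.

    freqs_1, freqs_2: alternate-allele frequencies at the same sites in two populations
    n1, n2:           number of sampled haplotypes per population (2 x individuals for diploids)
    """
    if len(freqs_1) != len(freqs_2) or not freqs_1:
        raise ValueError("need equal-length, non-empty frequency lists")
    num = den = 0.0
    for p1, p2 in zip(freqs_1, freqs_2):
        # numerator: squared frequency difference, corrected for finite sample size
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        # denominator: probability that two alleles, one from each population, differ
        den += p1 * (1 - p2) + p2 * (1 - p1)
    if den == 0:
        raise ValueError("all sites monomorphic for the same allele")
    return num / den


print(hudson_fst([0.10, 0.45, 0.80], [0.30, 0.40, 0.20], n1=120, n2=80))
```

Summing numerators and denominators before dividing, rather than averaging per-site ratios, keeps rare variants from dominating the genome-wide estimate.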
- …