Systematic Inference of Copy-Number Genotypes from Personal Genome Sequencing Data Reveals Extensive Olfactory Receptor Gene Content Diversity
Copy-number variations (CNVs) are widespread in the human genome, but comprehensive assignments of integer locus copy-numbers (i.e., copy-number genotypes) that, for example, enable discrimination of homozygous from heterozygous CNVs, have remained challenging. Here we present CopySeq, a novel computational approach with an underlying statistical framework that analyzes the depth-of-coverage of high-throughput DNA sequencing reads, and can incorporate paired-end and breakpoint junction analysis based CNV-analysis approaches, to infer locus copy-number genotypes. We benchmarked CopySeq by genotyping 500 chromosome 1 CNV regions in 150 personal genomes sequenced at low-coverage. The assessed copy-number genotypes were highly concordant with our performed qPCR experiments (Pearson correlation coefficient 0.94), and with the published results of two microarray platforms (95–99% concordance). We further demonstrated the utility of CopySeq for analyzing gene regions enriched for segmental duplications by comprehensively inferring copy-number genotypes in the CNV-enriched >800 olfactory receptor (OR) human gene and pseudogene loci. CopySeq revealed that OR loci display an extensive range of locus copy-numbers across individuals, with zero to two copies in some OR loci, and two to nine copies in others. Among genetic variants affecting OR loci we identified deleterious variants including CNVs and SNPs affecting ∼15% and ∼20% of the human OR gene repertoire, respectively, implying that genetic variants with a possible impact on smell perception are widespread. Finally, we found that for several OR loci the reference genome appears to represent a minor-frequency variant, implying a necessary revision of the OR repertoire for future functional studies. 
CopySeq can ascertain genomic structural variation in specific gene families as well as at a genome-wide scale, where it may enable the quantitative evaluation of CNVs in genome-wide association studies involving high-throughput sequencing
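The depth-of-coverage idea behind copy-number genotyping can be illustrated with a deliberately naive sketch. It is not CopySeq's statistical framework, which the abstract does not specify. It only assumes that read depth scales linearly with copy number and that a diploid baseline depth is known. The function name, parameters, and the cap of 10 copies are hypothetical.

```python
import math


def copy_number_genotype(region_depth, diploid_depth, ploidy=2, max_cn=10):
    """Naive integer copy-number estimate from read depth.

    Assumption (not from the source): depth is proportional to copy number,
    and diploid_depth is the mean depth of regions known to have `ploidy` copies.
    """
    if diploid_depth <= 0:
        raise ValueError("diploid_depth must be positive")
    # Scale the depth ratio by the baseline ploidy to get a fractional copy number.
    fractional_cn = ploidy * region_depth / diploid_depth
    # Round half up to the nearest integer, then clamp to [0, max_cn].
    rounded = math.floor(fractional_cn + 0.5)
    return max(0, min(max_cn, rounded))


# Half the diploid depth suggests a heterozygous deletion (one copy);
# near-zero depth suggests a homozygous deletion (zero copies).
print(copy_number_genotype(15.0, 30.0))
print(copy_number_genotype(0.4, 30.0))
```

In low-coverage data a single rounded ratio would be noisy, which is presumably why the abstract describes a statistical framework that can also draw on paired-end and breakpoint junction evidence.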
Spectrum and prevalence of genetic predisposition in medulloblastoma: a retrospective genetic study and prospective validation in a clinical trial cohort.
BACKGROUND: Medulloblastoma is associated with rare hereditary cancer predisposition syndromes; however, consensus medulloblastoma predisposition genes have not been defined and screening guidelines for genetic counselling and testing for paediatric patients are not available. We aimed to assess and define these genes to provide evidence for future screening guidelines. METHODS: In this international, multicentre study, we analysed patients with medulloblastoma from retrospective cohorts (International Cancer Genome Consortium [ICGC] PedBrain, Medulloblastoma Advanced Genomics International Consortium [MAGIC], and the CEFALO series) and from prospective cohorts from four clinical studies (SJMB03, SJMB12, SJYC07, and I-HIT-MED). Whole-genome sequences and exome sequences from blood and tumour samples were analysed for rare damaging germline mutations in cancer predisposition genes. DNA methylation profiling was done to determine consensus molecular subgroups: WNT (MBWNT), SHH (MBSHH), group 3 (MBGroup3), and group 4 (MBGroup4). Medulloblastoma predisposition genes were predicted on the basis of rare variant burden tests against controls without a cancer diagnosis from the Exome Aggregation Consortium (ExAC). Previously defined somatic mutational signatures were used to further classify medulloblastoma genomes into two groups, a clock-like group (signatures 1 and 5) and a homologous recombination repair deficiency-like group (signatures 3 and 8), and chromothripsis was investigated using previously established criteria. Progression-free survival and overall survival were modelled for patients with a genetic predisposition to medulloblastoma. FINDINGS: We included a total of 1022 patients with medulloblastoma from the retrospective cohorts (n=673) and the four prospective studies (n=349), from whom blood samples (n=1022) and tumour samples (n=800) were analysed for germline mutations in 110 cancer predisposition genes. 
In our rare variant burden analysis, we compared these against 53 105 sequenced controls from ExAC and identified APC, BRCA2, PALB2, PTCH1, SUFU, and TP53 as consensus medulloblastoma predisposition genes, and we estimated that germline mutations accounted for 6% of medulloblastoma diagnoses in the retrospective cohort. The prevalence of genetic predispositions differed between molecular subgroups in the retrospective cohort and was highest for patients in the MBSHH subgroup (20% in the retrospective cohort). These estimates were replicated in the prospective clinical cohort (germline mutations accounted for 5% of medulloblastoma diagnoses, with the highest prevalence [14%] in the MBSHH subgroup). Patients with germline APC mutations developed MBWNT and accounted for most (five [71%] of seven) cases of MBWNT that had no somatic CTNNB1 exon 3 mutations. Patients with germline mutations in SUFU and PTCH1 mostly developed infant MBSHH. Germline TP53 mutations presented only in childhood patients in the MBSHH subgroup and explained more than half (eight [57%] of 14) of all chromothripsis events in this subgroup. Germline mutations in PALB2 and BRCA2 were observed across the MBSHH, MBGroup3, and MBGroup4 molecular subgroups and were associated with mutational signatures typical of homologous recombination repair deficiency. In patients with a genetic predisposition to medulloblastoma, 5-year progression-free survival was 52% (95% CI 40-69) and 5-year overall survival was 65% (95% CI 52-81); these survival estimates differed significantly across patients with germline mutations in different medulloblastoma predisposition genes. INTERPRETATION: Genetic counselling and testing should be used as a standard-of-care procedure in patients with MBWNT and MBSHH because these patients have the highest prevalence of damaging germline mutations in known cancer predisposition genes. 
We propose criteria for routine genetic screening for patients with medulloblastoma based on clinical and molecular tumour characteristics. FUNDING: German Cancer Aid; German Federal Ministry of Education and Research; German Childhood Cancer Foundation (Deutsche Kinderkrebsstiftung); European Research Council; National Institutes of Health; Canadian Institutes for Health Research; German Cancer Research Center; St Jude Comprehensive Cancer Center; American Lebanese Syrian Associated Charities; Swiss National Science Foundation; European Molecular Biology Organization; Cancer Research UK; Hertie Foundation; Alexander and Margaret Stewart Trust; V Foundation for Cancer Research; Sontag Foundation; Musicians Against Childhood Cancer; BC Cancer Foundation; Swedish Council for Health, Working Life and Welfare; Swedish Research Council; Swedish Cancer Society; the Swedish Radiation Protection Authority; Danish Strategic Research Council; Swiss Federal Office of Public Health; Swiss Research Foundation on Mobile Communication; Masaryk University; Ministry of Health of the Czech Republic; Research Council of Norway; Genome Canada; Genome BC; Terry Fox Research Institute; Ontario Institute for Cancer Research; Pediatric Oncology Group of Ontario; The Family of Kathleen Lorette and the Clark H Smith Brain Tumour Centre; Montreal Children's Hospital Foundation; The Hospital for Sick Children: Sonia and Arthur Labatt Brain Tumour Research Centre, Chief of Research Fund, Cancer Genetics Program, Garron Family Cancer Centre, MDT's Garron Family Endowment; BC Childhood Cancer Parents Association; Cure Search Foundation; Pediatric Brain Tumor Foundation; Brainchild; and the Government of Ontario
Building graph models of oncogenesis by using microRNA expression data
MicroRNAs (miRNAs) are a class of small non-coding RNAs that control gene expression by targeting mRNAs and triggering either translation repression or RNA degradation. Several groups have pointed out that miRNAs play a major role in several diseases, including cancer. This is assumed because the expression level of several miRNAs differs between normal and cancerous cells. Further, it has been shown that miRNAs are involved in cell proliferation and cell death. Because of this role it is suspected that miRNAs could serve as biomarkers to improve tumor classification, therapy selection, or prediction of survival. In this context, one open question is whether miRNA deregulations in cancer cells occur according to some pattern or in a rather random order. With this work we contribute to answering this question by adapting two approaches (Beerenwinkel et al. (J Comput Biol, 2005) and Höglund et al. (Gene Chromosome Canc, 2001)), developed to derive graph models of oncogenesis for chromosomal imbalances, to miRNA expression data and applying them to a breast cancer data set. Further, we evaluated the results by comparing them to results derived from randomly altered versions of the data set used. We could show that miRNA deregulations most likely follow a rough temporal order, i.e. some deregulations occur early and some occur late in cancer progression. Thus, it seems possible that the expression level of some miRNAs can be used as an indicator for the stage of a tumor. Further, our results suggest that the overexpression of mir-21 and of mir-102 are initial events in breast cancer oncogenesis. Additionally, we identified a set of miRNAs showing a cluster-like behavior, i.e. their deregulations often occur together in a tumor, but other deregulations are less frequently present. These miRNAs are let-7d, mir-10b, mir-125a, mir-125b, mir-145, mir-206, and mir-210. 
Further, we could confirm the strong relationship between the expression of mir-125a and mir-125b
Algorithm Engineering for Color-Coding with Applications to Signaling Pathway Detection
Color-coding is a technique to design fixed-parameter algorithms for several NP-complete subgraph isomorphism problems. Somewhat surprisingly, not much work has so far been spent on the actual implementation of algorithms that are based on color-coding, despite the elegance of this technique and its wide range of applicability to practically important problems. This work gives various novel algorithmic improvements for color-coding, both from a worst-case perspective as well as under practical considerations. We apply the resulting implementation to the identification of signaling pathways in protein interaction networks, demonstrating that our improvements speed up the color-coding algorithm by orders of magnitude over previous implementations. This allows more complex and larger structures to be identified in reasonable time; many biologically relevant instances can even be solved in seconds where, previously, hours were required
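For readers unfamiliar with the technique, the following is a minimal sketch of textbook color-coding applied to one of the simplest subgraph problems: deciding whether a graph contains a simple path on k vertices. It does not include any of the engineering improvements the abstract refers to. The function names, the adjacency-dictionary input format, and the default trial count are assumptions made for illustration.

```python
import random


def has_colorful_path(adj, colors, k):
    """Dynamic program: is there a path on k vertices with k distinct colors?"""
    # reachable[v] holds bitmasks of color sets used by colorful paths ending at v.
    reachable = {v: {1 << colors[v]} for v in adj}
    for _ in range(k - 1):
        extended = {v: set() for v in adj}
        for u in adj:
            for mask in reachable[u]:
                for v in adj[u]:
                    bit = 1 << colors[v]
                    # Only extend with a color not yet on the path; distinct
                    # colors guarantee distinct vertices, so the path is simple.
                    if not mask & bit:
                        extended[v].add(mask | bit)
        reachable = extended
    return any(reachable[v] for v in adj)


def has_simple_path(adj, k, trials=200, seed=0):
    """Randomized test with one-sided error: True is always correct."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Color every vertex uniformly at random with one of k colors.
        colors = {v: rng.randrange(k) for v in adj}
        if has_colorful_path(adj, colors, k):
            return True
    return False


# A 4-vertex path graph contains a simple path on 4 vertices; a star does not.
path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star_graph = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(has_simple_path(path_graph, 4), has_simple_path(star_graph, 4))
```

A fixed path on k vertices becomes colorful under a random coloring with probability k!/k^k, so the number of trials needed grows exponentially in k but not in the size of the graph, which is what makes the approach fixed-parameter tractable.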
DELLY: structural variant discovery by integrated paired‐end and split‐read analysis. Bioinformatics 28:i333‐i339.
Motivation: The discovery of genomic structural variants (SVs) at high sensitivity and specificity is an essential requirement for characterizing naturally occurring variation and for understanding pathological somatic rearrangements in personal genome sequencing data. Of particular interest are integrated methods that accurately identify simple and complex rearrangements in heterogeneous sequencing datasets at single-nucleotide resolution, as an optimal basis for investigating the formation mechanisms and functional consequences of SVs. Results: We have developed an SV discovery method, called DELLY, that integrates short insert paired-ends, long-range mate-pairs and split-read alignments to accurately delineate genomic rearrangements at single-nucleotide resolution. DELLY is suitable for detecting copy-number variable deletion and tandem duplication events as well as balanced rearrangements such as inversions or reciprocal translocations. DELLY thus makes it possible to ascertain the full spectrum of genomic rearrangements, including complex events. On simulated data, DELLY compares favorably to other SV prediction methods across a wide range of sequencing parameters. On real data, DELLY reliably uncovers SVs from the 1000 Genomes Project and cancer genomes, and validation experiments of randomly selected deletion loci show a high specificity
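The paired-end signal mentioned above can be illustrated with a small sketch. This is not DELLY's algorithm, whose cutoffs and clustering are not described in the abstract. It only shows the general idea that read pairs mapping much farther apart than the library's typical insert size hint at a deletion between them. The function name and the median plus 5 MAD cutoff are assumptions.

```python
from statistics import median


def deletion_supporting_pairs(insert_sizes, n_mads=5.0):
    """Return indices of read pairs with an unusually large mapped insert size.

    Assumption (not from the source): a pair is called discordant when its
    insert size exceeds median + n_mads * MAD of the library.
    """
    med = median(insert_sizes)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median([abs(size - med) for size in insert_sizes])
    # If mad is 0, every pair larger than the median is flagged.
    cutoff = med + n_mads * mad
    return [i for i, size in enumerate(insert_sizes) if size > cutoff]


# Six pairs near 300 bp and one pair spanning about 2.5 kb.
sizes = [300, 310, 290, 305, 295, 300, 2500]
print(deletion_supporting_pairs(sizes))
```

A real caller would also need to cluster discordant pairs by position and orientation, and, as the abstract notes, refine breakpoints with split-read alignments.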
Natural variation in genome architecture among 205 Drosophila melanogaster Genetic Reference Panel lines
The Drosophila melanogaster Genetic Reference Panel (DGRP) is a community resource of 205 sequenced inbred lines, derived to improve our understanding of the effects of naturally occurring genetic variation on molecular and organismal phenotypes. We used an integrated genotyping strategy to identify 4,853,802 single nucleotide polymorphisms (SNPs) and 1,296,080 non-SNP variants. Our molecular population genomic analyses show higher deletion than insertion mutation rates and stronger purifying selection on deletions. Weaker selection on insertions than deletions is consistent with our observed distribution of genome size determined by flow cytometry, which is skewed toward larger genomes. Insertion/deletion and single nucleotide polymorphisms are positively correlated with each other and with local recombination, suggesting that their nonrandom distributions are due to hitchhiking and background selection. Our cytogenetic analysis identified 16 polymorphic inversions in the DGRP. Common inverted and standard karyotypes are genetically divergent and account for most of the variation in relatedness among the DGRP lines. Intriguingly, variation in genome size and many quantitative traits are significantly associated with inversions. Approximately 50% of the DGRP lines are infected with Wolbachia, and four lines have germline insertions of Wolbachia sequences, but effects of Wolbachia infection on quantitative traits are rarely significant. The DGRP complements ongoing efforts to functionally annotate the Drosophila genome. Indeed, 15% of all D. melanogaster genes segregate for potentially damaged proteins in the DGRP, and genome-wide analyses of quantitative traits identify novel candidate genes. The DGRP lines, sequence data, genotypes, quality scores, phenotypes, and analysis and visualization tools are publicly available
Identifying the Unknowns by Aligning Fragmentation Trees
Mass spectrometry allows sensitive, automated, and high-throughput analysis of small molecules. In principle, tandem mass spectrometry allows us to identify “unknown” small molecules not in any database, but the automated interpretation of such data is in its infancy. Fragmentation trees have recently been introduced for the automated analysis of the fragmentation patterns of small molecules. We present a method for the automated comparison of such fragmentation patterns, based on aligning the compounds’ fragmentation trees. We cluster compounds based solely on their fragmentation patterns and show a good agreement with known compound classes. Fragmentation pattern similarities are strongly correlated with the chemical similarity of molecules. We present a tool for searching a database for compounds with fragmentation pattern similar to an unknown sample compound. We apply this tool to metabolites from Icelandic poppy. Our method allows fully automated computational identification of small molecules that cannot be found in any database