Change-point analysis of paired allele-specific copy number variation data
Recent genome-wide allele-specific copy number variation data make it possible to explore two types of genomic information: chromosomal genotype variations and DNA copy number variations. In cancer studies, data are commonly collected for paired normal and tumor samples, which yields two types of paired data for each subject. However, methods for analyzing these four data sequences simultaneously are lacking. In this study, we propose a statistical framework based on change-point analysis. Its validity and usefulness are demonstrated through simulation studies and applications to an experimental data set.
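To make the idea of a change point shared by several sequences concrete, here is a minimal sketch. It is not the framework proposed in the abstract: it assumes a single change point, equal-length sequences, and a plain least-squares cost summed over all sequences. The function name `shared_change_point` and the toy data are hypothetical.

```python
from math import inf


def _prefix_sums(xs):
    """Return running sums of x and x*x so segment costs are O(1)."""
    sums, squares = [0.0], [0.0]
    for x in xs:
        sums.append(sums[-1] + x)
        squares.append(squares[-1] + x * x)
    return sums, squares


def _sse(sums, squares, i, j):
    """Squared error of xs[i:j] around its own mean."""
    total = sums[j] - sums[i]
    return (squares[j] - squares[i]) - total * total / (j - i)


def shared_change_point(sequences):
    """Find the split index k that minimises the summed squared error.

    Assumption (illustrative only): every sequence changes its mean at
    the same position, so each is split into xs[:k] and xs[k:].
    """
    n = len(sequences[0])
    if n < 2 or any(len(seq) != n for seq in sequences):
        raise ValueError("need equal-length sequences with at least 2 points")
    stats = [_prefix_sums(seq) for seq in sequences]
    best_k, best_cost = None, inf
    for k in range(1, n):
        cost = sum(_sse(s, q, 0, k) + _sse(s, q, k, n) for s, q in stats)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost


# Four toy sequences standing in for the paired normal/tumor data.
toy = [
    [0, 0, 0, 0, 5, 5, 5],
    [1, 1, 1, 1, 1, 1, 1],
    [2, 2, 2, 2, 0, 0, 0],
    [3, 3, 3, 3, 3, 3, 3],
]
k, cost = shared_change_point(toy)
```

Summing the cost over sequences is one simple way to let all four inform a single breakpoint; a real analysis would also need a rule for deciding whether any change is present at all.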
Extensive Copy-Number Variation of Young Genes across Stickleback Populations
MM received funding from the Max Planck innovation funds for this project. PGDF was supported by a Marie Curie European Reintegration Grant (proposal nr 270891). CE was supported by German Science Foundation grants (DFG, EI 841/4-1 and EI 841/6-1). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
CLEVER: Clique-Enumerating Variant Finder
Next-generation sequencing techniques have facilitated a large scale analysis
of human genetic variation. Despite the advances in sequencing speeds, the
computational discovery of structural variants is not yet standard. It is
likely that many variants have remained undiscovered in most sequenced
individuals. Here we present a novel internal segment size based approach,
which organizes all reads, including concordant ones, into a read alignment
graph where max-cliques represent maximal contradiction-free groups of
alignments. A specifically engineered algorithm then enumerates all max-cliques
and statistically evaluates them for their potential to reflect insertions or
deletions (indels). For the first time in the literature, we compare a large
range of state-of-the-art approaches using simulated Illumina reads from a
fully annotated genome and present various relevant performance statistics. We
achieve superior performance rates in particular on indels of sizes 20--100,
which have been exposed as a current major challenge in the SV discovery
literature and where prior insert size based approaches have limitations. In
that size range, we outperform even split-read aligners. We also achieve good
results on real data, where a substantial number of correct predictions are
made by our tool alone, complementing the predictions of split-read
aligners. CLEVER is open source (GPL) and available from
http://clever-sv.googlecode.com.
Comment: 30 pages, 8 figures
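As an illustration of enumerating max-cliques over read alignments, the following sketch handles a simplified case. It is not CLEVER's algorithm: it assumes each alignment is reduced to a closed interval on the reference and that two alignments are compatible exactly when their intervals overlap, so the graph is an interval graph and a single sweep suffices. The function name `maximal_cliques` and the coordinates are hypothetical.

```python
def maximal_cliques(intervals):
    """Enumerate maximal cliques of an interval graph with a sweep line.

    intervals: list of (start, end) pairs with start <= end (closed).
    Returns a list of cliques, each a sorted list of interval indices.
    """
    events = []
    for idx, (start, end) in enumerate(intervals):
        events.append((start, 0, idx))  # 0 = start; sorts before an end at the same coordinate
        events.append((end, 1, idx))    # 1 = end
    events.sort()

    active = set()   # intervals currently covering the sweep position
    cliques = []
    grew = False     # True if an interval was added since the last report
    for _, kind, idx in events:
        if kind == 0:
            active.add(idx)
            grew = True
        else:
            # The active set is maximal right before the first removal
            # that follows one or more additions.
            if grew:
                cliques.append(sorted(active))
                grew = False
            active.remove(idx)
    return cliques


# Toy alignments as (start, end) intervals.
groups = maximal_cliques([(0, 10), (5, 15), (12, 20), (30, 40)])
```

In this simplified setting each reported group is a set of mutually overlapping alignments that cannot be extended; the abstract's notion of contradiction-free groups would require a richer compatibility test than overlap alone.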