
Additional file 2: Figure S2 of Identification and classification of small molecule kinases: insights into substrate recognition and specificity

Author: Andrew Neuwald (3461711)
Eric Talevich (235366)
Krishnadev Oruganty (471060)
Natarajan Kannan (41300)

Phylogeny and taxonomic analysis of ELKs. A) Full tree showing the relationships between various ELK groups, based on their core domains. The nodes are colored according to the Fig. 2 coloring scheme. pknBs, which are protein kinases found in bacteria, cluster together with other EPKs, suggesting that they are EPKs rather than ELKs. The branch points of this maximum-likelihood tree are annotated with bootstrap values (out of 100). B) Taxonomic distribution of APH3 families, showing the prevalence of APH3 groups in bacteria, fungi and other eukaryotes. The taxonomic classes are colored according to the scheme given in the top-left corner of the figure. (PNG 1308 kB)

FigShare

Bias corrections reduce the extraneous variation in bin read depths.

Author: A. Hunter Shain (208624)
Boris C. Bastian (241103)
Eric Talevich (235366)
Thomas Botton (360391)

Distributions of the absolute deviation of on- and off-target bins from the final, segmented copy ratio estimates are shown as box plots at each step of bias correction for all samples in the TR and EX sequencing cohorts. At each step, for on- and off-target bins separately, boxes show the median and interquartile range of absolute deviations, and whiskers show the 95% range. Steps shown are the initial median-centered log2 read depth (“Raw”), correction of GC bias (“GC”), correction of on-target density and off-target repeat biases (“Density/Repeat”), and normalization to a pooled reference (“Reference”).

FigShare
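The box-plot statistics described in this caption can be computed from per-bin values with a few percentile calls. The sketch below is a hypothetical helper, not CNVkit code; it assumes each bin's log2 value has already been paired with the segmented copy ratio of the segment that contains it, and that on- and off-target bins are passed in separately.

```python
import numpy as np


def deviation_summary(bin_log2, segment_log2):
    """Summarize absolute deviations of bins from their segment estimates.

    Returns the statistics such a box plot would show: whiskers at the
    2.5th/97.5th percentiles (a 95% range), box edges at the 25th/75th
    percentiles (interquartile range), and the median.
    """
    deviation = np.abs(np.asarray(bin_log2, dtype=float)
                       - np.asarray(segment_log2, dtype=float))
    lo, q1, med, q3, hi = np.percentile(deviation, [2.5, 25, 50, 75, 97.5])
    return {"whisker_lo": lo, "q1": q1, "median": med, "q3": q3, "whisker_hi": hi}


# Hypothetical usage: call once per bias-correction step, separately for
# on-target and off-target bins.
stats = deviation_summary([0.1, -0.2, 0.3, 0.0, 0.5], [0.0] * 5)
print(stats["median"])  # 0.2
```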

CNVkit copy ratios agree with experimental results from array CGH and FISH on cell line DNA.

Author: A. Hunter Shain (208624)
Boris C. Bastian (241103)
Eric Talevich (235366)
Thomas Botton (360391)

A: Whole-genome profiles of log2 copy ratio by CNVkit (top) and array CGH (bottom) are shown. B: Genes additionally assayed by FISH are labeled with the detected absolute copy number. At CDKN2A, log2 ratios below the marked level of -3.58 indicate the site is entirely deleted in the majority of cells.

FigShare
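The -3.58 level can be checked with a short calculation. Assuming the cell line's neutral state is hexaploid (the state given for the C0902 sample in the precision and recall figure listed here) and ignoring any admixture of normal cells, the expected log2 ratio for an average of c copies is log2(c/6). Every cell that retains CDKN2A carries at least one copy, so an average below 0.5 copies is only possible if more than half of the cells carry none, and log2(0.5/6) = log2(1/12) ≈ -3.585. The helper names below are hypothetical.

```python
import math


def log2_ratio(copies, ploidy):
    """Expected log2 copy ratio for an average absolute copy number `copies`
    relative to a neutral state of `ploidy` copies (pure sample assumed)."""
    return math.log2(copies / ploidy)


def absolute_copies(log2_value, ploidy):
    """Inverse: average absolute copy number implied by a log2 ratio."""
    return ploidy * 2 ** log2_value


# Cells that keep CDKN2A carry >= 1 copy, so an average below 0.5 copies
# implies that more than half of the cells carry none.
threshold = log2_ratio(0.5, 6)
print(round(threshold, 2))  # -3.58
```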

Baited region size and spacing affect read depth systematically.

Author: A. Hunter Shain (208624)
Boris C. Bastian (241103)
Eric Talevich (235366)
Thomas Botton (360391)

A: Example of typical coverage observed at a targeted exon, as viewed in IGV, and simplified geometric models of the negative coverage biases (yellow) that can occur as a function of the relative sizes of sequence fragments and the baited region. B: Coverage observed at two neighboring targeted exons, and models of the positive coverage biases (red) that can occur where intervals are separated by less than half the insert size of sequence fragments.

FigShare
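The condition in panel B (neighboring intervals separated by less than half the insert size) can be turned into a simple check on a target list. This is a hypothetical helper: the insert size is an assumed input, for example a library's typical fragment length, and the function only flags candidate pairs rather than modeling the size of the bias.

```python
def close_neighbors(intervals, insert_size):
    """Return pairs of adjacent targeted intervals on the same chromosome whose
    gap is shorter than half the fragment insert size.

    intervals: (chrom, start, end) tuples, half-open and non-overlapping.
    """
    flagged = []
    ordered = sorted(intervals)
    for prev, cur in zip(ordered, ordered[1:]):
        if prev[0] != cur[0]:
            continue  # intervals on different chromosomes are never neighbors
        gap = cur[1] - prev[2]
        if gap < insert_size / 2:
            flagged.append((prev, cur))
    return flagged


exons = [("chr1", 100, 200), ("chr1", 250, 350),
         ("chr1", 1000, 1100), ("chr2", 1110, 1200)]
print(close_neighbors(exons, insert_size=200))
# [(('chr1', 100, 200), ('chr1', 250, 350))]
```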

CNVkit: Genome-Wide Copy Number Detection and Visualization from Targeted DNA Sequencing

Author: A. Hunter Shain (208624)
Boris C. Bastian (241103)
Eric Talevich (235366)
Thomas Botton (360391)
Publication date: 01/01/2016

Germline copy number variants (CNVs) and somatic copy number alterations (SCNAs) are of significant importance in syndromic conditions and cancer. Massively parallel sequencing is increasingly used to infer copy number information from variations in the read depth in sequencing data. However, this approach has limitations in the case of targeted re-sequencing, which leaves gaps in coverage between the regions chosen for enrichment and introduces biases related to the efficiency of target capture and library preparation. We present a method for copy number detection, implemented in the software package CNVkit, that uses both the targeted reads and the nonspecifically captured off-target reads to infer copy number evenly across the genome. This combination achieves both exon-level resolution in targeted regions and sufficient resolution in the larger intronic and intergenic regions to identify copy number changes. In particular, we successfully inferred copy number at a resolution equivalent to 100 kilobases genome-wide from a platform targeting as few as 293 genes. After normalizing read counts to a pooled reference, we evaluated and corrected for three sources of bias that explain most of the extraneous variability in the sequencing read depth: GC content, target footprint size and spacing, and repetitive sequences. We compared the copy number changes detected by CNVkit with those identified by array comparative genomic hybridization. We packaged the components of CNVkit so that it is straightforward to use and provides visualizations, detailed reporting of significant features, and export options for integration into existing analysis pipelines. CNVkit is freely available from https://github.com/etal/cnvkit.

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

FigShare
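The normalization step in this abstract (comparing a sample's read depths with a pooled reference built from other samples) can be sketched as follows. This is a simplified illustration, not CNVkit's implementation: the per-bin median across normal samples and the median-centering are assumptions chosen for brevity, bins are assumed to have positive depths, and the separate corrections for GC content, target footprint size and spacing, and repeats are omitted.

```python
import numpy as np


def pooled_reference(normal_depths):
    """Per-bin reference log2 depth from several normal samples.

    normal_depths: 2-D array-like (samples x bins) of positive read depths.
    Each sample is median-centered before taking the per-bin median.
    """
    log2 = np.log2(np.asarray(normal_depths, dtype=float))
    log2 -= np.median(log2, axis=1, keepdims=True)
    return np.median(log2, axis=0)


def log2_copy_ratios(sample_depths, reference_log2):
    """Median-centered log2 depths of a test sample minus the reference."""
    log2 = np.log2(np.asarray(sample_depths, dtype=float))
    log2 -= np.median(log2)
    return log2 - reference_log2


reference = pooled_reference([[100, 200, 400], [50, 100, 200]])
ratios = log2_copy_ratios([100, 200, 800], reference)
# approximately [0, 0, 1]: only the last bin departs from the reference
```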

Precision and recall of absolute copy number calls.

Author: A. Hunter Shain (208624)
Boris C. Bastian (241103)
Eric Talevich (235366)
Thomas Botton (360391)

CNV calls obtained with each sequencing-based method are compared with those determined by array CGH to calculate precision and recall under several criteria for the C0902 cell line sample. Columns show detection of each copy number level versus the neutral hexaploid state. Rows show the criteria for comparison: all CNVs, CNVs larger than 5 Mb, CNVs smaller than 5 Mb, and all CNV base pairs. Each subplot shows the calculated precision and recall of CNVkit, CopywriteR and CONTRA with each supported reference.

FigShare
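Of the comparison criteria listed in this caption, the base-pair criterion is the simplest to express in code. The sketch below is a hypothetical illustration, not the evaluation code behind the figure: it counts a called base pair as correct when an array CGH segment on the same chromosome with the same copy number state covers it, and it assumes segments within each set do not overlap.

```python
def bp_precision_recall(called, truth):
    """Base-pair precision and recall of CNV calls against reference calls.

    Each input is a list of (chrom, start, end, state) half-open segments,
    assumed non-overlapping within a set.
    """
    def overlap(a, b):
        return max(0, min(a[2], b[2]) - max(a[1], b[1]))

    called_bp = sum(end - start for _, start, end, _ in called)
    truth_bp = sum(end - start for _, start, end, _ in truth)
    # Only overlaps on the same chromosome with matching states count as shared.
    shared = sum(overlap(c, t) for c in called for t in truth
                 if c[0] == t[0] and c[3] == t[3])
    precision = shared / called_bp if called_bp else 0.0
    recall = shared / truth_bp if truth_bp else 0.0
    return precision, recall


called = [("chr1", 0, 100, "gain")]
truth = [("chr1", 50, 150, "gain")]
print(bp_precision_recall(called, truth))  # (0.5, 0.5)
```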

Top predicted unconfirmed mutations.

Author: Eric Talevich (235366)
Khaled Rasheed (554000)
ManChon U (553999)
Natarajan Kannan (41300)
Samiksha Katiyar (379207)

Probability scores and rankings of the top predicted mutations. Scores were calculated with the multiple classifier trained on COSMIC v.50 data. Asterisks indicate the five mutations selected for cell-based assays.

FigShare

Bin read depths are systematically biased by GC content and other factors.

Author: A. Hunter Shain (208624)
Boris C. Bastian (241103)
Eric Talevich (235366)
Thomas Botton (360391)

A: GC coverage bias follows a unimodal distribution in sample TR_37_T. Target bins are sorted according to bin GC fraction (x-axis), and the uncorrected, median-centered log2 bin read depths are plotted (y-axis). A rolling median of the bin log2 read depths in order of GC value is drawn in red, showing a systematic deviation from 0 in the selected sample. B: Trendlines summarize each bias type in each sample. TR and EX samples are shown in the top and bottom rows, respectively. Columns show biases due to GC content in target bins and off-target bins, repeat content in off-target bins, and density bias in target bins.

FigShare
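The red rolling-median trendline in panel A also suggests a way to remove the trend. The sketch below is one plausible GC correction under stated assumptions, not necessarily the one CNVkit applies: bins are ordered by GC fraction, a centered rolling median (the window size is an assumed parameter with an arbitrary default) estimates the bias at each GC value, and that estimate is subtracted from each bin's log2 depth.

```python
import numpy as np


def rolling_median(values, window):
    """Centered rolling median over 2 * (window // 2) + 1 points;
    the window is truncated at both ends of the sequence."""
    half = window // 2
    n = len(values)
    return np.array([np.median(values[max(0, i - half):min(n, i + half + 1)])
                     for i in range(n)])


def correct_gc_bias(log2_depths, gc_fraction, window=101):
    """Subtract the rolling-median trend of log2 depth taken in GC order."""
    log2_depths = np.asarray(log2_depths, dtype=float)
    order = np.argsort(gc_fraction, kind="stable")
    trend = np.empty_like(log2_depths)
    # Compute the trend in GC order, then write it back to each bin's position.
    trend[order] = rolling_median(log2_depths[order], window)
    return log2_depths - trend


gc = np.linspace(0.3, 0.7, 51)
depths = 2 * (gc - 0.5)  # hypothetical bins with a linear GC trend
corrected = correct_gc_bias(depths, gc, window=5)
print(np.abs(corrected).max() < 0.05)  # True
```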

CNVkit workflows.

Author: A. Hunter Shain (208624)
Boris C. Bastian (241103)
Eric Talevich (235366)
Thomas Botton (360391)

The target and off-target bin BED files and reference file are constructed once for a given platform and can be used to process many samples sequenced on the same platform, as shown in the workflow on the left. Steps to construct the off-target bins are shown at the top-right, and construction of the reference is shown at the lower-right.

FigShare
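One simple way to build off-target bins of the kind this workflow refers to is to take the gaps between targets and split each gap into roughly equal bins. This is a hypothetical sketch rather than the steps drawn in the figure: BED parsing is omitted, the bin size is an assumed parameter, and real pipelines may also pad targets, skip assembly gaps, or drop very small gaps, none of which is done here.

```python
def offtarget_bins(targets, chrom_sizes, bin_size):
    """Split the regions between targets into bins of roughly bin_size bp.

    targets: non-overlapping (chrom, start, end) half-open intervals.
    chrom_sizes: dict mapping chromosome name to length.
    """
    by_chrom = {}
    for chrom, start, end in targets:
        by_chrom.setdefault(chrom, []).append((start, end))
    bins = []
    for chrom, length in chrom_sizes.items():
        pos = 0
        # A zero-width sentinel at the chromosome end closes the final gap.
        for start, end in sorted(by_chrom.get(chrom, [])) + [(length, length)]:
            gap = start - pos
            if gap > 0:
                n = max(1, round(gap / bin_size))  # number of bins in this gap
                edges = [pos + gap * k // n for k in range(n + 1)]
                bins.extend((chrom, a, b) for a, b in zip(edges, edges[1:]))
            pos = max(pos, end)
    return bins


print(offtarget_bins([("chr1", 1000, 2000)], {"chr1": 5000}, bin_size=1000))
# [('chr1', 0, 1000), ('chr1', 2000, 3000), ('chr1', 3000, 4000), ('chr1', 4000, 5000)]
```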

Feature values of selected mutations.

Author: Eric Talevich (235366)
Khaled Rasheed (554000)
ManChon U (553999)
Natarajan Kannan (41300)
Samiksha Katiyar (379207)

Feature values of selected mutations.

FigShare