15 research outputs found

Accuracy of phylogenetic inference on simulated copy number data for varying algorithms.

Author: Alejandro A. Schäffer (110833)
Kerstin Heselmeyer-Haddad (606768)
Russell Schwartz (39843)
Salim Akhter Chowdhury (606766)
Stanley E. Shackney (606767)
Thomas Ried (48620)

Variants of our phylogenetic algorithms and two competing methods from the literature were applied to simulated FISH datasets describing evolution by combinations of single-gene (SD), chromosome (CD), and whole-genome (GD) duplication and loss events. Results are reported for inference by our methods from simulated trees, allowing for SD events alone, SD+CD events, SD+GD events, and SD+CD+GD events. We compared these results to inference by neighbor-joining (NJ) and pure maximum parsimony (MP) as implemented in MEGA, version 6. Accuracy is assessed by mean reconstruction error of bipartitions between true and inferred trees. Error bars show plus or minus one standard deviation across the samples for each method.

The Francis Crick Institute
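
The idea of scoring an inferred tree by the bipartitions it shares with the true tree can be illustrated with a short Python sketch. This is not the reconstruction-error formula used in the paper, which is not reproduced in this listing; it assumes the textbook case of leaf-labeled trees on the same taxa, encoded as nested tuples (a hypothetical encoding), and reports the percentage of true bipartitions missing from the inferred tree. The names `leaves`, `bipartitions`, and `bipartition_error` are illustrative.

```python
def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    result = frozenset()
    for child in tree:
        result |= leaves(child)
    return result


def bipartitions(tree):
    """Collect the non-trivial splits of the leaf set induced by internal edges."""
    all_leaves = leaves(tree)
    splits = set()

    def visit(node):
        if not isinstance(node, tuple):
            return
        side = leaves(node)
        other = all_leaves - side
        # Skip trivial splits that separate fewer than two taxa on one side.
        if len(side) > 1 and len(other) > 1:
            splits.add(frozenset([side, other]))
        for child in node:
            visit(child)

    visit(tree)
    return splits


def bipartition_error(true_tree, inferred_tree):
    """Percentage of true bipartitions that the inferred tree fails to recover."""
    true_splits = bipartitions(true_tree)
    if not true_splits:
        return 0.0
    missed = true_splits - bipartitions(inferred_tree)
    return 100.0 * len(missed) / len(true_splits)


true_tree = (("a", "b"), ("c", "d"), "e")
inferred_tree = (("a", "b"), "c", "d", "e")
error = bipartition_error(true_tree, inferred_tree)
```

Averaging such a score over many simulated trees, and taking its standard deviation, would give quantities of the kind plotted as bars and error bars in the figure.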

Comparison of mean percentage reconstruction error (with standard deviation) of different phylogeny models on simulated data for different sampling distributions of the cells.

Author: Alejandro A. Schäffer (110833)
Kerstin Heselmeyer-Haddad (606768)
Russell Schwartz (39843)
Salim Akhter Chowdhury (606766)
Stanley E. Shackney (606767)
Thomas Ried (48620)

Mean percentage reconstruction error on simulated samples is shown for six tree-building models, considering (i) SD, (ii) SD+CD, (iii) SD+GD, (iv) SD+CD+GD, (v) NJ, and (vi) MP, when the sampling distribution of cells is varied.

The Francis Crick Institute

Comparison of mean percentage reconstruction error (with standard deviation) of different phylogeny models on simulated data for different combinations of SD, CD and GD event probabilities.

Author: Alejandro A. Schäffer (110833)
Kerstin Heselmeyer-Haddad (606768)
Russell Schwartz (39843)
Salim Akhter Chowdhury (606766)
Stanley E. Shackney (606767)
Thomas Ried (48620)

Mean percentage reconstruction error on simulated samples is shown for four tree-building models, considering (i) SD, (ii) SD+CD, (iii) SD+GD, and (iv) SD+CD+GD, across five different combinations of SD, CD, and GD probabilities.

The Francis Crick Institute

Wilcoxon signed rank test results for separating primary CC samples from the metastases.

Author: Alejandro A. Schäffer (110833)
Kerstin Heselmeyer-Haddad (606768)
Russell Schwartz (39843)
Salim Akhter Chowdhury (606766)
Stanley E. Shackney (606767)
Thomas Ried (48620)

Wilcoxon signed rank test one-sided p-values for separating the primary CC samples from the metastases across subsets of increasing numbers of randomly selected tumor samples. For each set of tumors, samples were randomly selected from paired CC primary and metastatic tumors, with at least one of each type, and the Wilcoxon signed rank test was then used to calculate the p-values for separating the primary tumors from the metastases based on three different statistics: (A) Shannon index calculated using the distribution of cells across different tree levels, (B) weighted mean depth of the trees, and (C) sum of differences of fractional gain and loss of each gene across the tree edges.

The Francis Crick Institute
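
Statistics (A) and (B) can be illustrated with a minimal Python sketch. It assumes each tree is summarized as a mapping from tree level (depth) to the number of cells observed at that level; that encoding, the use of the natural logarithm, and the reading of "weighted mean depth" as a cell-count-weighted average of level are assumptions made here, not definitions taken from the paper.

```python
import math


def shannon_index(cells_per_level):
    """Shannon index H = -sum(p * ln p) over the fraction of cells at each level."""
    total = sum(cells_per_level.values())
    index = 0.0
    for count in cells_per_level.values():
        if count > 0:
            p = count / total
            index -= p * math.log(p)
    return index


def weighted_mean_depth(cells_per_level):
    """Average tree level, weighting each level by its number of cells (assumed definition)."""
    total = sum(cells_per_level.values())
    return sum(level * count for level, count in cells_per_level.items()) / total


# Hypothetical tree summary: level -> number of cells at that level.
example_levels = {0: 1, 1: 1, 2: 2}
h = shannon_index(example_levels)
depth = weighted_mean_depth(example_levels)
```

Paired per-tumor values of such statistics (primary versus metastasis) are the kind of input a signed rank test operates on; the test itself is not reimplemented here.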

Classification results on the CC dataset.

Author: Alejandro A. Schäffer (110833)
Kerstin Heselmeyer-Haddad (606768)
Russell Schwartz (39843)
Salim Akhter Chowdhury (606766)
Stanley E. Shackney (606767)
Thomas Ried (48620)

Prediction accuracy of an SVM classifier on three different classification tasks for CC samples, using tree-based and cell-based features. Each of the two tree-based features, edge count and tree level cell percentage, is derived from phylogenetic trees built using two different models of tumor progression, namely SD alone and the combination of SD, CD, and GD. Two cell-based features, average gain/loss and maximum copy number of each gene, and two information-theoretic measures of cell heterogeneity, Shannon entropy and Simpson's index, are also used.

The Francis Crick Institute
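
Two of the simpler features can be sketched in Python under stated assumptions. Each cell is represented as a tuple of per-gene copy numbers (a hypothetical encoding), Simpson's index is taken in its common form as the sum of squared cell-type frequencies (other conventions, such as one minus that sum, exist, and the listing does not say which one the authors used), and the function names are illustrative.

```python
from collections import Counter


def simpson_index(cells):
    """Sum of squared frequencies of distinct copy-number profiles (assumed convention)."""
    counts = Counter(cells)
    total = len(cells)
    return sum((count / total) ** 2 for count in counts.values())


def max_copy_number_per_gene(cells):
    """Largest copy number observed for each gene across all cells."""
    return tuple(max(column) for column in zip(*cells))


# Hypothetical sample: four cells, two gene probes each.
sample_cells = [(2, 2), (2, 2), (3, 2), (2, 4)]
simpson = simpson_index(sample_cells)
max_copies = max_copy_number_per_gene(sample_cells)
```

Feature vectors assembled from values like these, one vector per tumor sample, are what a classifier such as an SVM would be trained on.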

Parsimony score comparison on the CC samples.

Author: Alejandro A. Schäffer (110833)
Kerstin Heselmeyer-Haddad (606768)
Russell Schwartz (39843)
Salim Akhter Chowdhury (606766)
Stanley E. Shackney (606767)
Thomas Ried (48620)

Comparison of (A) primary and (B) metastatic CC tumor progression tree weights built considering only SD and combined SD, CD, and GD models. “Total Cell Type” refers to the total number of unique probe copy number configurations in the dataset, providing a lower bound on the minimum possible parsimony score for a given dataset.

The Francis Crick Institute

Example simulated and inferred trees illustrating key terms in the formula for calculating the reconstruction error.

Author: Alejandro A. Schäffer (110833)
Kerstin Heselmeyer-Haddad (606768)
Russell Schwartz (39843)
Salim Akhter Chowdhury (606766)
Stanley E. Shackney (606767)
Thomas Ried (48620)

(A) A hypothetical simulated ground truth tree on the set of taxa . (B) Example inferred tree built on the sampled set of taxa on the dataset resulting from the ground truth tree.

The Francis Crick Institute

Example showing the three mechanisms of copy number changes in a hypothetical cell.

Author: Alejandro A. Schäffer (110833)
Kerstin Heselmeyer-Haddad (606768)
Russell Schwartz (39843)
Salim Akhter Chowdhury (606766)
Stanley E. Shackney (606767)
Thomas Ried (48620)

A copy number profile of four genes is shown as an ordered set for homologous chromosome pairs and respectively, where the gene located on the top position in the chromosome precedes the gene located on the bottom position in the ordering. After the (A) Single gene duplication event, the copy number of a gene located on gets increased by 1. After the (B) Single chromosome duplication event, the chromosome gets duplicated and the cell has one extra copy of that chromosome as chromosome . After the (C) Whole genome duplication event, all the chromosomes are duplicated and the total number of chromosomes in the daughter cell is twice the number of chromosomes in the mother cell.

The Francis Crick Institute
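
The three duplication mechanisms can be sketched on a simplified copy-number profile. The Python sketch below tracks only per-gene counts, so a chromosome duplication is modeled as adding one copy of every gene that sits on the duplicated chromosome; this flattening, the gene and chromosome names, and the function names are assumptions for illustration and do not reproduce the homolog-level representation drawn in the figure. Loss events are omitted.

```python
def single_gene_duplication(profile, gene):
    """SD: increase the copy number of one gene by 1."""
    updated = dict(profile)
    updated[gene] += 1
    return updated


def chromosome_duplication(profile, gene_to_chromosome, chromosome):
    """CD: add one copy of every gene located on the duplicated chromosome."""
    return {
        gene: count + 1 if gene_to_chromosome[gene] == chromosome else count
        for gene, count in profile.items()
    }


def genome_duplication(profile):
    """GD: double the copy number of every gene."""
    return {gene: 2 * count for gene, count in profile.items()}


# Hypothetical diploid cell with four gene probes, two of them on the same chromosome.
locations = {"g1": "chr1", "g2": "chr1", "g3": "chr2", "g4": "chr3"}
cell = {"g1": 2, "g2": 2, "g3": 2, "g4": 2}

after_sd = single_gene_duplication(cell, "g1")
after_cd = chromosome_duplication(cell, locations, "chr1")
after_gd = genome_duplication(cell)
```

Each function returns a new profile rather than mutating its input, which keeps the mother cell available when a daughter cell is derived from it.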

Comparison of mean percentage reconstruction error (with standard deviation) of different phylogeny models on simulated data for two different probe settings.

Author: Alejandro A. Schäffer (110833)
Kerstin Heselmeyer-Haddad (606768)
Russell Schwartz (39843)
Salim Akhter Chowdhury (606766)
Stanley E. Shackney (606767)
Thomas Ried (48620)

Mean percentage reconstruction error on simulated samples is shown for four tree-building models, considering (i) SD, (ii) SD+CD, (iii) SD+GD, and (iv) SD+CD+GD, for two different cases in which the number of chromosomes harboring two genes is 1 or 2.

The Francis Crick Institute

Algorithm 3 pseudocode.

Author: Alejandro A. Schäffer (110833)
Kerstin Heselmeyer-Haddad (606768)
Russell Schwartz (39843)
Salim Akhter Chowdhury (606766)
Stanley E. Shackney (606767)
Thomas Ried (48620)

This figure provides the main steps of the algorithm to generate tumor progression trees; generate_distance_matrix uses Algorithm 2 on each distinct pair of nodes in the set of nodes it is passed. To compute the minimum spanning tree (function mst, called at lines 4 and 16), we implemented Prim's algorithm.

The Francis Crick Institute
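
The two named helpers can be sketched in Python as follows. Algorithm 2 is not reproduced in this listing, so the sketch substitutes a plain L1 distance between copy-number profiles as a stand-in; that substitution, the tuple encoding of nodes, and the exact signatures are assumptions. Only the use of Prim's algorithm for `mst` comes from the caption.

```python
import heapq


def l1_distance(a, b):
    """Stand-in pairwise distance (NOT Algorithm 2): sum of absolute copy-number differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def generate_distance_matrix(nodes, distance=l1_distance):
    """Apply the pairwise distance to every pair of nodes."""
    n = len(nodes)
    return [[distance(nodes[i], nodes[j]) for j in range(n)] for i in range(n)]


def mst(dist):
    """Prim's algorithm on a dense distance matrix; returns a list of (u, v, weight) edges."""
    n = len(dist)
    if n == 0:
        return []
    in_tree = [False] * n
    in_tree[0] = True
    # Candidate edges leaving the growing tree, ordered by weight.
    heap = [(dist[0][j], 0, j) for j in range(1, n)]
    heapq.heapify(heap)
    edges = []
    while heap and len(edges) < n - 1:
        weight, u, v = heapq.heappop(heap)
        if in_tree[v]:
            continue  # stale entry: v was already attached via a cheaper edge
        in_tree[v] = True
        edges.append((u, v, weight))
        for k in range(n):
            if not in_tree[k]:
                heapq.heappush(heap, (dist[v][k], v, k))
    return edges


# Hypothetical copy-number profiles for four cell types, two probes each.
profiles = [(2, 2), (3, 2), (3, 3), (2, 4)]
tree_edges = mst(generate_distance_matrix(profiles))
total_weight = sum(weight for _, _, weight in tree_edges)
```

The heap-based variant is used here for brevity; on a dense matrix an array-based Prim's loop has the same result with O(n^2) running time.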