    svclassify: a method to establish benchmark structural variant calls

    The human genome contains variants ranging in size from small single nucleotide polymorphisms (SNPs) to large structural variants (SVs). High-quality benchmark small variant calls for the pilot National Institute of Standards and Technology (NIST) Reference Material (NA12878) have been developed by the Genome in a Bottle Consortium, but no similar high-quality benchmark SV calls exist for this genome. Since SV callers output highly discordant results, we developed methods to combine multiple forms of evidence from multiple sequencing technologies to classify candidate SVs into likely true or false positives. Our method (svclassify) calculates annotations from one or more aligned bam files from many high-throughput sequencing technologies, and then builds a one-class model using these annotations to classify candidate SVs as likely true or false positives. We first used pedigree analysis to develop a set of high-confidence breakpoint-resolved large deletions. We then used svclassify to cluster and classify these deletions as well as a set of high-confidence deletions from the 1000 Genomes Project and a set of breakpoint-resolved complex insertions from Spiral Genetics. We find that likely SVs cluster separately from likely non-SVs based on our annotations, and that the SVs cluster into different types of deletions. We then developed a supervised one-class classification method that uses a training set of random non-SV regions to determine whether candidate SVs have abnormal annotations different from most of the genome. To test this classification method, we use our pedigree-based breakpoint-resolved SVs, SVs validated by the 1000 Genomes Project, and assembly-based breakpoint-resolved insertions, along with semi-automated visualization using svviz. 
We find that candidate SVs with high scores from multiple technologies have high concordance with PCR validation and an orthogonal consensus method MetaSV (99.7% concordant), and candidate SVs with low scores are questionable. We distribute a set of 2676 high-confidence deletions and 68 high-confidence insertions with high svclassify scores from these call sets for benchmarking SV callers. We expect these methods to be particularly useful for establishing high-confidence SV calls for benchmark samples that have been characterized by multiple technologies. https://doi.org/10.1186/s12864-016-2366-
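
The one-class idea in the abstract above (learn what "normal" looks like from random non-SV regions, then flag candidates whose annotations fall outside that range) can be illustrated with a minimal sketch. This is not the svclassify implementation: the annotation names (`coverage`, `discordant_pairs`), the toy background values, and the tail-fraction scoring rule are all assumptions chosen for illustration.

```python
from bisect import bisect_left, bisect_right


def tail_fraction(sorted_background, value):
    """Two-sided empirical tail fraction of `value` within a sorted background sample."""
    n = len(sorted_background)
    at_or_below = bisect_right(sorted_background, value) / n
    at_or_above = (n - bisect_left(sorted_background, value)) / n
    return min(1.0, 2 * min(at_or_below, at_or_above))


def abnormality_score(background, candidate):
    """Return a score in [0, 1]; higher means the candidate looks less like the background.

    `background` is a list of annotation dicts for random non-SV regions and
    `candidate` is one annotation dict. The score is driven by the single most
    extreme annotation (an assumed, deliberately simple rule).
    """
    smallest_tail = 1.0
    for name, value in candidate.items():
        sample = sorted(region[name] for region in background)
        smallest_tail = min(smallest_tail, tail_fraction(sample, value))
    return 1.0 - smallest_tail


# Hypothetical background: 11 random non-SV regions with typical annotations.
background = [
    {"coverage": cov, "discordant_pairs": disc}
    for cov, disc in zip(range(25, 36), [0, 0, 1, 0, 2, 1, 0, 1, 0, 0, 1])
]

typical_region = {"coverage": 30, "discordant_pairs": 0}
candidate_deletion = {"coverage": 12, "discordant_pairs": 9}

print(abnormality_score(background, typical_region))
print(abnormality_score(background, candidate_deletion))
```

In this toy setup a candidate whose annotations lie entirely outside the background range receives the maximum score, while a region near the middle of the background distribution receives the minimum. A real classifier would need far more background regions and annotations computed from the aligned reads of each technology.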

    UNDERSTANDING CELL DIFFERENTIATION AND MIGRATION WITH MULTIVARIATE CELL SHAPE QUANTIFICATION

    This thesis focuses on developing multivariate methods for quantifying cell shape to facilitate understanding of physiological processes such as cell differentiation and migration. Cell shape reflects complex intracellular and extracellular factors affecting cell function. However, analyses associating cell shape with cell function need to account for the challenges of multivariate interpretation, single-cell heterogeneity, and reproducibility. Specifically, human bone marrow stromal cell (hBMSC) populations in nanofiber scaffolds can undergo osteogenic differentiation without chemical cues. I developed a method based on support vector machines (SVMs) that trains classifiers as boundaries in shape-metric space to identify the day-1 cell shape phenotype of hBMSC populations in nanofiber scaffolds. To reduce the effect of single-cell heterogeneity on population phenotyping, the “supercell” method was introduced to generate average measurements of small groups of cells for SVM training. To overcome the multivariate complexity in biological interpretation, a feature selection process was implemented to select the most significant cell shape metrics. The predictive potential of the resulting classifiers was validated by subsampling. In nanofiber scaffolds, hBMSCs were found to be narrower, with more elongated and dendritic shapes and rougher cell boundaries. Further, I found that an increase in nanofiber density enhanced the osteogenic differentiation potential of hBMSCs. The pre-trained classifiers successfully predicted the effect of nanofiber density on hBMSC fate and single-cell shape. While much can be learned from cell shapes alone, it is important to note that shapes can change with time, especially for migrating cells. The second part of my thesis focuses on the analysis of shape dynamics. 
A method for quantifying cell shape dynamics at the subcellular level was developed to understand the coordination of subcellular myosin localization and cell boundary dynamics in neutrophil migration in vivo. The correlation of myosin localization with positive cell boundary curvature was identified as a unique in vivo neutrophil migration phenotype. Correlations between myosin localization and local cell boundary dynamics in vivo were found to be affected by cell motility and polarization. The analysis framework shown here can also be used to study the link between other subcellular features and neutrophil migration and shape dynamics.
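
The "supercell" step described in the abstract above, averaging measurements over small groups of cells before classifier training, might look roughly like the following sketch. How the thesis forms its groups is not stated here, so random disjoint groups of fixed size are an assumption, as are the metric names (`area`, `aspect_ratio`) and the toy values.

```python
import random
from statistics import mean


def make_supercells(cells, group_size, seed=0):
    """Average each shape metric over random, disjoint groups of `group_size` cells.

    `cells` is a list of dicts mapping metric name to value. Cells left over
    after the last full group are dropped (an assumed convention).
    """
    rng = random.Random(seed)
    shuffled = list(cells)
    rng.shuffle(shuffled)
    supercells = []
    for start in range(0, len(shuffled) - group_size + 1, group_size):
        group = shuffled[start:start + group_size]
        supercells.append(
            {metric: mean(cell[metric] for cell in group) for metric in group[0]}
        )
    return supercells


# Hypothetical single-cell measurements for ten cells.
cells = [{"area": float(i), "aspect_ratio": 1.0 + 0.1 * i} for i in range(1, 11)]
supercells = make_supercells(cells, group_size=5)
print(len(supercells), supercells[0])
```

Each supercell keeps the same metrics as a single cell, so the averaged records can be passed to any classifier in place of the raw per-cell measurements; averaging trades sample count for lower per-sample variability.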