35 research outputs found

    MEDALT: Single-cell copy number lineage tracing enabling gene discovery

    We present a Minimal Event Distance Aneuploidy Lineage Tree (MEDALT) algorithm that infers the evolution history of a cell population based on single-cell copy number (SCCN) profiles, and a statistical routine named lineage speciation analysis (LSA), which facilitates discovery of fitness-associated alterations and genes from SCCN lineage trees. MEDALT appears more accurate than phylogenetics approaches in reconstructing copy number lineage. Using data from 20 triple-negative breast cancer patients, our approaches effectively prioritize genes that are essential for breast cancer cell fitness and predict patient survival, including those implicating convergent evolution. The source code of our study is available at https://github.com/KChen-lab/MEDALT
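
    The abstract does not define the event distance itself, so the following is only a simplified sketch of the general idea: counting the fewest single-copy gains or losses of contiguous segments that turn one copy number profile into another. The function name `event_distance`, the single-chromosome input, and the omission of any biological constraint (for example, that a fully deleted segment cannot be regained) are assumptions made for illustration and are not taken from the MEDALT source code.

```python
from typing import Sequence


def event_distance(parent: Sequence[int], child: Sequence[int]) -> int:
    """Fewest +1/-1 events on contiguous bins that turn `parent` into `child`.

    Simplified illustration: one chromosome of ordered bins, and no
    constraint on regaining segments that have dropped to zero copies.
    """
    if len(parent) != len(child):
        raise ValueError("profiles must cover the same bins")

    boundary_changes = 0
    previous = 0  # copy number difference just left of the first bin
    for p, c in zip(parent, child):
        diff = c - p
        # Each unit of change between neighbouring differences is the
        # start or the end of one event.
        boundary_changes += abs(diff - previous)
        previous = diff
    boundary_changes += abs(previous)  # close any events still open at the end

    # Every event has exactly one start and one end boundary.
    return boundary_changes // 2


if __name__ == "__main__":
    diploid = [2, 2, 2, 2]
    print(event_distance(diploid, [2, 3, 3, 2]))  # one gain spanning two bins
    print(event_distance(diploid, [3, 1, 2, 2]))  # one gain plus one loss
```

    A lineage tree in this spirit could then be built by connecting cells so that the summed distance along the edges is small, but how MEDALT does this is described in the paper and repository rather than here.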

    Direct CD32 T-cell cytotoxicity: implications for breast cancer prognosis and treatment

    The FcγRII (CD32) ligands are IgFc fragments and pentraxins. The existence of additional ligands is unknown. We engineered T cells with human chimeric receptors resulting from the fusion between the CD32 extracellular portion and transmembrane CD8α linked to the CD28/ζ chain intracellular moiety (CD32-CR). Transduced T cells recognized three breast cancer (BC) and one colon cancer cell line among 15 tested in the absence of targeting antibodies. Sensitive BC cell conjugation with CD32-CR T cells induced CD32 polarization and down-regulation, CD107a release, mutual elimination, and proinflammatory cytokine production unaffected by human IgGs but enhanced by cetuximab. CD32-CR T cells protected immunodeficient mice from subcutaneous growth of MDA-MB-468 BC cells. RNAseq analysis identified a 42-gene fingerprint predicting BC cell sensitivity and favorable outcomes in advanced BC. ICAM1 was a major regulator of CD32-CR T cell–mediated cytotoxicity. CD32-CR T cells may help identify cell surface CD32 ligand(s) and novel prognostically relevant transcriptomic signatures and develop innovative BC treatments.

    Reference-free SNP calling: improved accuracy by preventing incorrect calls from repetitive genomic regions

    BACKGROUND: Single nucleotide polymorphisms (SNPs) are the most abundant type of genetic variation in eukaryotic genomes and have recently become the marker of choice in a wide variety of ecological and evolutionary studies. The advent of next-generation sequencing (NGS) technologies has made it possible to efficiently genotype a large number of SNPs in non-model organisms with no or limited genomic resources. Most NGS-based genotyping methods require a reference genome to perform accurate SNP calling. Little effort, however, has yet been devoted to developing or improving algorithms for accurate SNP calling in the absence of a reference genome. RESULTS: Here we describe an improved maximum likelihood (ML) algorithm called iML, which can achieve high genotyping accuracy for SNP calling in non-model organisms without a reference genome. The iML algorithm incorporates the mixed Poisson/normal model to detect composite read clusters and can efficiently prevent incorrect SNP calls resulting from repetitive genomic regions. Through analysis of simulation and real sequencing datasets, we demonstrate that in comparison with ML or a threshold approach, iML can remarkably improve the accuracy of de novo SNP genotyping and is especially powerful for reference-free genotyping in diploid genomes with high repeat contents. CONCLUSIONS: The iML algorithm can efficiently prevent incorrect SNP calls resulting from repetitive genomic regions, and thus outperforms the original ML algorithm by achieving much higher genotyping accuracy. Our algorithm is therefore very useful for accurate de novo SNP genotyping in non-model organisms without a reference genome. REVIEWERS: This article was reviewed by Dr. Richard Durbin, Dr. Liliana Florea (nominated by Dr. Steven Salzberg) and Dr. Arcady Mushegian.

    A scallop IGF binding protein gene: molecular characterization and association of variants with growth traits.

    BACKGROUND: Scallops represent economically important aquaculture shellfish. The identification of genes and genetic variants related to scallop growth could benefit high-yielding scallop breeding. The insulin-like growth factor (IGF) system is essential for growth and development, with IGF binding proteins (IGFBPs) serving as the major regulators of IGF actions. Although an effect of IGF on growth has been detected in bivalves, IGFBP has not been reported, and members of the IGF system have not been characterized in scallops. RESULTS: We cloned and characterized an IGFBP (PyIGFBP) gene from the aquaculture bivalve species, Yesso scallop (Patinopecten yessoensis, Jay, 1857). Its full-length cDNA sequence was 1,445 bp, with an open reading frame of 378 bp, encoding 125 amino acids, and its genomic sequence was 10,193 bp, consisting of three exons and two introns. The amino acid sequence exhibited the characteristics of IGFBPs, including multiple cysteine residues and relatively conserved motifs in the N-terminal and C-terminal domains. Expression analysis indicated that PyIGFBP was expressed in all the tissues and developmental stages examined, with a significantly higher level in the mantle than in other tissues and a significantly higher level in gastrulae and trochophore larvae than in other stages. Furthermore, three single nucleotide polymorphisms (SNPs) were identified in this gene. SNP c.1054A>G was significantly associated with both shell and soft body traits in two populations, with the highest trait values in GG type scallops and the lowest in AG type ones. CONCLUSION: We cloned and characterized an IGFBP gene in a bivalve, and this report also represents the first characterization of an IGF system gene in scallops. A SNP associated with scallop growth for both the shell and soft body was identified in this gene. In addition to providing a candidate marker for scallop breeding, our results also suggest a role of PyIGFBP in scallop growth.

    Estimation of kinship coefficient in structured and admixed populations using sparse sequencing data

    Knowledge of biological relatedness between samples is important for many genetic studies. In large-scale human genetic association studies, the estimated kinship is used to remove cryptic relatedness, control for family structure, and estimate trait heritability. However, estimation of kinship is challenging for sparse sequencing data, such as those from off-target regions in target sequencing studies, where genotypes are largely uncertain or missing. Existing methods often assume accurate genotypes at a large number of markers across the genome. We show that these methods, without accounting for the genotype uncertainty in sparse sequencing data, can yield a strong downward bias in kinship estimation. We develop a computationally efficient method called SEEKIN to estimate kinship for both homogeneous samples and heterogeneous samples with population structure and admixture. Our method models genotype uncertainty and leverages linkage disequilibrium through imputation. We test SEEKIN on a whole exome sequencing (WES) dataset of Singapore Chinese and Malays, which involves substantial population structure and admixture. We show that SEEKIN can accurately estimate kinship coefficient and classify genetic relatedness using off-target sequencing data downsampled to ~0.15X depth. In application to the full WES dataset without downsampling, SEEKIN also outperforms existing methods by properly analyzing shallow off-target data (~0.75X). Using both simulated and real phenotypes, we further illustrate how our method improves estimation of trait heritability for WES studies.
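
    For orientation, the kind of estimator that assumes accurate genotypes in a homogeneous sample can be sketched as follows. This is the textbook moment estimator of the kinship coefficient, not SEEKIN: it does not model genotype uncertainty, imputation, or admixture. The function name `kinship_matrix`, the 0/1/2 genotype coding, and the toy inputs are assumptions made for illustration.

```python
from typing import List, Sequence


def kinship_matrix(
    genotypes: Sequence[Sequence[int]],
    freqs: Sequence[float],
) -> List[List[float]]:
    """Moment estimator of kinship for a homogeneous sample.

    genotypes[i][m] is the alternate-allele count (0, 1 or 2) of
    individual i at SNP m, assumed to be called without error.
    freqs[m] is the alternate-allele frequency of SNP m.
    """
    n = len(genotypes)
    # Monomorphic SNPs carry no information and would divide by zero.
    usable = [m for m, p in enumerate(freqs) if 0.0 < p < 1.0]
    if not usable:
        raise ValueError("no polymorphic SNPs to estimate kinship from")

    phi = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            total = 0.0
            for m in usable:
                p = freqs[m]
                # Centre both genotypes at their expectation 2p and scale
                # so that the average estimates the kinship coefficient.
                total += (
                    (genotypes[i][m] - 2.0 * p)
                    * (genotypes[j][m] - 2.0 * p)
                    / (4.0 * p * (1.0 - p))
                )
            phi[i][j] = phi[j][i] = total / len(usable)
    return phi


if __name__ == "__main__":
    toy_genotypes = [[2, 0], [2, 0], [0, 2]]  # three individuals, two SNPs
    toy_freqs = [0.5, 0.5]
    for row in kinship_matrix(toy_genotypes, toy_freqs):
        print(row)
```

    The diagonal of the returned matrix holds self-kinship estimates, which is the quantity plotted as grey crosses in figures that compare estimators.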

    Systematic decomposition of sequence determinants governing CRISPR/Cas9 specificity

    The specificity of CRISPR/Cas9 genome editing is largely determined by the sequences of guide RNA (gRNA) and the targeted DNA, yet the sequence-dependent rules underlying off-target effects are not fully understood. To systematically explore the sequence determinants governing CRISPR/Cas9 specificity, here we describe a dual-target system to measure the relative cleavage rate between off- and on-target sequences (off-on ratios) of 1902 gRNAs on 13,314 synthetic target sequences, and reveal a set of sequence rules involving two factors in off-targeting: 1) a guide-intrinsic mismatch tolerance (GMT) independent of the mismatch context; 2) an “epistasis-like” combinatorial effect of multiple mismatches, which are associated with the free-energy landscape in R-loop formation and are explainable by a multi-state kinetic model. These sequence rules lead to the development of MOFF, a model-based predictor of Cas9-mediated off-target effects. Moreover, the “epistasis-like” combinatorial effect suggests a strategy of allele-specific genome editing using mismatched guides. With the aid of MOFF prediction, this strategy significantly improves the selectivity and expands the application domain of Cas9-based allele-specific editing, as tested in a high-throughput allele-editing screen on 18 cancer hotspot mutations.

    Performance of heterogeneous kinship estimators in ~0.15X sequencing data of 762 Chinese and Malays.

    In each panel, we compared sequence-based estimates (ϕ_seq, y-axis) with the array-based estimates from PC-Relate (ϕ_array, x-axis). Colored circles represent kinship coefficients between two individuals and different types of relatedness were determined in Fig 2. Grey crosses represent self-kinship coefficients. We evaluated SEEKIN (A, E), PC-Relate (B, F), REAP (C, G), and RelateAdmix (D, H) using the BEAGLE call set (A-D), and the BEAGLE+1KG3 call set (E-H). We only included SNPs overlapping with the SGVP dataset in the analyses, because we used the SGVP dataset as the reference panel to estimate individual-specific allele frequencies for SEEKIN, REAP and RelateAdmix.