    CLEVER: Clique-Enumerating Variant Finder

    Next-generation sequencing techniques have facilitated large-scale analysis of human genetic variation. Despite advances in sequencing speed, the computational discovery of structural variants is not yet standard, and many variants have likely remained undiscovered in most sequenced individuals. Here we present a novel approach based on internal segment size that organizes all reads, including concordant ones, into a read alignment graph in which max-cliques represent maximal contradiction-free groups of alignments. A specifically engineered algorithm then enumerates all max-cliques and statistically evaluates them for their potential to reflect insertions or deletions (indels). For the first time in the literature, we compare a large range of state-of-the-art approaches using simulated Illumina reads from a fully annotated genome and present various relevant performance statistics. We achieve superior performance in particular on indels of sizes 20–100, a range that has been identified as a major current challenge in the SV discovery literature and in which prior insert-size-based approaches have limitations. In that size range, we outperform even split-read aligners. We also achieve good results on real data, where CLEVER is the only tool to make a substantial number of correct predictions, and these complement the predictions of split-read aligners. CLEVER is open source (GPL) and available from http://clever-sv.googlecode.com. Comment: 30 pages, 8 figures
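
    The clique-enumeration step can be illustrated with a minimal Python sketch. It assumes a hypothetical, simplified compatibility rule (two alignments are connected when their internal segments overlap and their lengths differ by at most a fixed `tolerance`) in place of CLEVER's statistical criterion, and it uses the generic Bron–Kerbosch algorithm with pivoting rather than CLEVER's specifically engineered enumeration algorithm. The names `build_read_alignment_graph` and `max_cliques` are illustrative only.

```python
from itertools import combinations


def build_read_alignment_graph(alignments, tolerance):
    """Build an adjacency map over alignments given as (start, end) internal segments.

    Hypothetical rule (not CLEVER's test): connect two alignments if their
    internal segments overlap and their lengths differ by at most `tolerance`.
    """
    adj = {i: set() for i in range(len(alignments))}
    for i, j in combinations(range(len(alignments)), 2):
        (s1, e1), (s2, e2) = alignments[i], alignments[j]
        overlap = s1 < e2 and s2 < e1
        similar = abs((e1 - s1) - (e2 - s2)) <= tolerance
        if overlap and similar:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def max_cliques(adj):
    """Yield every maximal clique as a frozenset (Bron–Kerbosch with pivoting)."""

    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)  # r cannot be extended: it is maximal
            return
        # Choose the pivot with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


if __name__ == "__main__":
    reads = [(100, 400), (150, 450), (200, 500), (1000, 1300), (1100, 1600)]
    graph = build_read_alignment_graph(reads, tolerance=50)
    print(sorted(sorted(c) for c in max_cliques(graph)))  # [[0, 1, 2], [3], [4]]
```

    Each clique is only a candidate group; the statistical evaluation that CLEVER applies to decide whether a clique reflects an indel is not reproduced here.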

    On Tree-Constrained Matchings and Generalizations

    We consider the following Tree-Constrained Bipartite Matching problem: given two rooted trees $T_1=(V_1,E_1)$, $T_2=(V_2,E_2)$ and a weight function $w: V_1\times V_2 \to \mathbb{R}_+$, find a maximum-weight matching $\mathcal{M}$ between nodes of the two trees such that no matched node is an ancestor of another matched node in either tree. This generalization of the classical bipartite matching problem appears, for example, in the computational analysis of live-cell video data. We show that the problem is $\mathcal{APX}$-hard and thus, unless $\mathcal{P} = \mathcal{NP}$, disprove a previous claim that it is solvable in polynomial time. Furthermore, we give a 2-approximation algorithm based on a combination of the local ratio technique and a careful use of the structure of basic feasible solutions of a natural LP relaxation, which we also show to have an integrality gap of $2-o(1)$. In the second part of the paper, we consider a natural generalization of the problem in which trees are replaced by partially ordered sets (posets). We show that the local ratio technique gives a $2k\rho$-approximation for the $k$-dimensional matching generalization of the problem, in which the maximum number of incomparable elements below (or above) any given element in each poset is bounded by $\rho$. Finally, we give an almost matching integrality gap example and an inapproximability result showing that the dependence on $\rho$ is most likely unavoidable.
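
    To make the ancestor constraint concrete, here is a minimal Python sketch that checks feasibility and finds an optimal matching by exhaustive search on a toy instance. It is not the paper's 2-approximation algorithm; the brute force is exponential in the number of candidate pairs and only suitable for tiny inputs. Trees are encoded as parent maps (the root maps to `None`), and all helper names are hypothetical.

```python
from itertools import combinations


def ancestors(parent, v):
    """Return the set of proper ancestors of v in a tree given as a parent map."""
    result = set()
    while parent[v] is not None:
        v = parent[v]
        result.add(v)
    return result


def is_tree_constrained(matching, parent1, parent2):
    """Check that `matching` is a matching and no matched node is an ancestor
    of another matched node in either tree."""
    left = [u for u, _ in matching]
    right = [v for _, v in matching]
    if len(set(left)) != len(left) or len(set(right)) != len(right):
        return False  # some node is matched twice
    for side, parent in ((left, parent1), (right, parent2)):
        for node in side:
            if any(other in ancestors(parent, node) for other in side):
                return False
    return True


def best_matching(weights, parent1, parent2):
    """Exhaustive search over all subsets of positive-weight pairs."""
    pairs = [p for p, w in weights.items() if w > 0]
    best, best_w = (), 0
    for k in range(1, len(pairs) + 1):
        for subset in combinations(pairs, k):
            if is_tree_constrained(subset, parent1, parent2):
                w = sum(weights[p] for p in subset)
                if w > best_w:
                    best, best_w = subset, w
    return best, best_w


if __name__ == "__main__":
    tree1 = {"r": None, "a": "r", "b": "r"}
    tree2 = {"s": None, "c": "s", "d": "s"}
    w = {("r", "s"): 5, ("a", "c"): 3, ("b", "d"): 3, ("a", "d"): 1}
    print(best_matching(w, tree1, tree2))  # ((('a', 'c'), ('b', 'd')), 6)
```

    In the toy instance, matching the two roots alone (weight 5) blocks every other pair, because each root is an ancestor of all other nodes in its tree; the two child pairs together are worth 6.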

    On Tree-Constrained Matchings and Generalizations

    We consider the following Tree-Constrained Bipartite Matching problem: given a bipartite graph $G=(V_1,V_2,E)$ with edge weights $w: E \to \mathbb{R}_+$, a rooted tree $T_1$ on the set $V_1$ and a rooted tree $T_2$ on the set $V_2$, find a maximum-weight matching $M$ in $G$ such that no matched node is an ancestor of another matched node in either tree. This generalization of the classical bipartite matching problem appears, for example, in the computational analysis of live-cell video data. We show that the problem is APX-hard and thus, unless $\mathcal{P} = \mathcal{NP}$, disprove a previous claim that it is solvable in polynomial time. Furthermore, we give a 2-approximation algorithm based on a combination of the local ratio technique and a careful use of the structure of basic feasible solutions of a natural LP relaxation, which we also show to have an integrality gap of $2-o(1)$. In the second part of the paper, we consider a natural generalization of the problem in which trees are replaced by partially ordered sets (posets). We show that the local ratio technique gives a $2k\rho$-approximation for the $k$-dimensional matching generalization of the problem, in which the maximum number of incomparable elements below (or above) any given element in each poset is bounded by $\rho$. Finally, we give an almost matching integrality gap example and an inapproximability result showing that the dependence on $\rho$ is most likely unavoidable.

    CIDANE: Comprehensive Isoform Discovery and Abundance Estimation

    High-throughput sequencing of cellular RNA (RNA-seq) makes it possible to assess the transcriptome, the set of all RNA molecules produced by a cell, at high resolution and under various conditions. The assembly of short sequencing reads into full-length transcripts, however, poses profound challenges to bioinformatics tools.

    Metric multidimensional scaling for large single-cell datasets using neural networks

    Metric multidimensional scaling is one of the classical methods for embedding data into a low-dimensional Euclidean space. It creates the embedding by approximately preserving the pairwise distances between the input points. However, current state-of-the-art approaches scale only to a few thousand data points. For larger datasets, such as those arising in single-cell RNA sequencing experiments, the running time becomes prohibitive, so alternative methods such as PCA are widely used instead. Here, we propose a simple neural-network-based approach to the metric multidimensional scaling problem that is orders of magnitude faster than previous state-of-the-art approaches and hence scales to datasets with up to a few million cells. At the same time, it provides a non-linear mapping between the high- and low-dimensional spaces that can place previously unseen cells in the same embedding.
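
    The objective that metric multidimensional scaling approximately minimizes can be sketched in pure Python. This sketch assumes the raw stress, the sum over pairs of (embedded distance minus input distance) squared, and minimizes it by plain gradient descent on freely placed coordinates. That is a deliberately simple baseline, not the authors' neural-network method; the step size, iteration count and function names are illustrative assumptions.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance matrix of the input (high-dimensional) points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def stress(embedding, target):
    """Raw stress: sum over pairs i < j of (||y_i - y_j|| - d_ij)^2."""
    n = len(embedding)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += (math.dist(embedding[i], embedding[j]) - target[i][j]) ** 2
    return total


def metric_mds(target, dim=2, steps=500, lr=0.01, seed=0):
    """Gradient descent on raw stress, starting from random coordinates."""
    rng = random.Random(seed)
    n = len(target)
    y = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(steps):
        grad = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                diff = [a - b for a, b in zip(y[i], y[j])]
                dist = math.hypot(*diff) or 1e-12  # avoid division by zero
                # d/dy_i of (dist - d_ij)^2 = 2 (dist - d_ij) (y_i - y_j) / dist
                coeff = 2.0 * (dist - target[i][j]) / dist
                for k in range(dim):
                    grad[i][k] += coeff * diff[k]
                    grad[j][k] -= coeff * diff[k]
        for i in range(n):
            for k in range(dim):
                y[i][k] -= lr * grad[i][k]
    return y
```

    Every pass touches all pairs of points, which is one reason this kind of direct optimization struggles beyond a few thousand cells. Because the coordinates here are free variables, new points cannot be embedded without re-optimizing; parameterizing the embedding by a trained network, as the abstract describes, is what provides a mapping for previously unseen cells.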

    Genome wide association analysis in a mouse advanced intercross line

    We are grateful to Heather Lawson at Washington University in St. Louis for providing LG and SM genome sequences. We thank the Gilad Lab and Functional Genomics Facility at the University of Chicago for generating DNA- and RNA-seq data. We wish to acknowledge outstanding technical assistance from Apurva Chitre at UCSD and Mike Jarsulic at the Biological Sciences Division Center for Research Informatics at the University of Chicago. We thank Clarissa Parker, John Novembre, Graham McVicker, Joe Davis, Peter Carbonetto and Shyam Gopalakrishnan for advice, training, and mentorship. Our work was funded by NIDA (AAP: R01DA021336) and NIAMS (AL: R01AR056280). We received additional support from NIGMS (NMG: T32GM007197; MGD: T32GM07281), NIDA (NMG: F31DA03635803), NHGRI (MA: R01 HG002899), and the IMS Elphinstone Scholarship at the University of Aberdeen (AIHC). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. Peer reviewed.

    Genome‐wide off‐target analyses of CRISPR/Cas9‐mediated T‐cell receptor engineering in primary human T cells

    Objectives: Harnessing human T cells for treatment has shaped the current paradigm of emerging immunotherapy strategies. Genetic engineering of the T-cell receptor (TCR) redirects specificity, ablates alloreactivity and brings significant progress and off-the-shelf options to emerging adoptive T-cell transfer (ACT) approaches. Targeted CRISPR/Cas9-mediated double-strand breaks in the DNA enable knockout or knock-in engineering. Methods: Here, we perform CRISPR/Cas9-mediated TCR knockout using a therapeutically relevant ribonucleoprotein (RNP) delivery method to assess the safety of genetically engineered T-cell products. Whole-genome sequencing was performed to analyse whether CRISPR/Cas9-mediated DNA double-strand breaks at the TCR locus are associated with off-target events in human primary T cells. Results: TCRα chain and TCRβ chain knockout leads to high on-target InDel frequency and functional knockout. None of the predicted off-target sites could be confirmed experimentally, whereas whole-genome sequencing and manual Integrative Genomics Viewer (IGV) review revealed 9 potential low-frequency off-target events genome-wide. Subsequent amplification and targeted deep sequencing of all 7 evaluable loci did not confirm these low-frequency InDels. Therefore, these events are unlikely to have been caused by the CRISPR/Cas9 engineering. Conclusion: The combined approach of whole-genome sequencing and targeted deep sequencing confirmed highly specific genetic engineering by CRISPR/Cas9-mediated TCR knockout, without potentially harmful exonic off-target effects.