113 research outputs found

    MPAgenomics : An R package for multi-patients analysis of genomic markers

    MPAgenomics, standing for multi-patients analysis (MPA) of genomic markers, is an R package devoted to (i) efficient segmentation and (ii) genomic marker selection from multi-patient copy number and SNP data profiles. It provides wrappers around commonly used packages to facilitate their repeated (and sometimes difficult) use, offering an easy-to-use pipeline for beginners in R. The segmentation of successive multiple profiles (finding losses and gains) is based on a new automatic choice of influential parameters, since the default ones were misleading in the original packages. Considering multiple profiles at the same time, MPAgenomics wraps efficient penalized regression methods to select relevant markers associated with a given response.

    Fast multiclonal clusterization of V(D)J recombinations from high-throughput sequencing

    BACKGROUND: V(D)J recombinations in lymphocytes are essential for immunological diversity. They are also useful markers of pathologies. In leukemia, they are used to quantify the minimal residual disease during patient follow-up. However, the full breadth of lymphocyte diversity is not fully understood. RESULTS: We propose new algorithms that process high-throughput sequencing (HTS) data to extract unnamed V(D)J junctions and gather them into clones for quantification. This analysis is based on a seed heuristic and is fast and scalable because, in the first phase, no alignment is performed with germline database sequences. The algorithms were applied to TR HTS data from a patient with acute lymphoblastic leukemia, and also to data simulating hypermutations. Our methods identified the main clone, as well as additional clones that were not identified with standard protocols. CONCLUSIONS: The proposed algorithms provide new insight into the analysis of high-throughput sequencing data for leukemia, and also into the quantitative assessment of any immunological profile. The methods described here are implemented in a C++ open-source program called Vidjil.

    Multi-loci diagnosis of acute lymphoblastic leukaemia with high-throughput sequencing and bioinformatics analysis

    High-throughput sequencing (HTS) is considered a technical revolution that has improved our knowledge of lymphoid and autoimmune diseases, changing our approach to leukaemia both at diagnosis and during follow-up. As part of an immunoglobulin/T cell receptor-based minimal residual disease (MRD) assessment of acute lymphoblastic leukaemia patients, we assessed the performance and feasibility of the replacement of the first steps of the approach based on DNA isolation and Sanger sequencing, using a HTS protocol combined with bioinformatics analysis and visualization using the Vidjil software. We prospectively analysed the diagnostic and relapse samples of 34 paediatric patients, thus identifying 125 leukaemic clones with recombinations on multiple loci (TRG, TRD, IGH and IGK), including Dd2/Dd3 and Intron/KDE rearrangements. Sequencing failures were halved (14% vs. 34%, P = 0.0007), enabling more patients to be monitored. Furthermore, more markers per patient could be monitored, reducing the probability of false negative MRD results. The whole analysis, from sample receipt to clinical validation, was shorter than our current diagnostic protocol, with equal resources. V(D)J recombination was successfully assigned by the software, even for unusual recombinations. This study emphasizes the progress that HTS with adapted bioinformatics tools can bring to the diagnosis of leukaemia patients.

    DiNAMO: Exact method for degenerate IUPAC motifs discovery, characterization of sequence-specific errors

    Next generation sequencing technologies are still associated with relatively high error rates, about 1%, which corresponds to thousands of errors at the scale of a complete genome. Each region therefore needs to be sequenced several times, and variants are usually filtered based on depth criteria. The significant number of artifacts that remain in spite of those filters shows the limit of conventional approaches and indicates that some sequencing artifacts are recurrent. This recurrence suggests that sequencing errors can depend on the upstream nucleotide sequence context. Our goal is to search for overrepresented motifs that tend to induce sequencing errors. Previous studies showed that some motifs, such as GGT [1,2], induce sequencing errors in the Illumina technologies. However, these studies were dedicated to exact motifs and did not take approximate motifs into account, limiting the statistical power of such approaches. On the other hand, some tools, such as FIRE [3], DREME [4] and Discrover [5], were developed to search for degenerate motifs over the 15-letter IUPAC alphabet in the context of ChIP-seq studies. However, these tools use greedy algorithms, implying a lack of sensitivity. We therefore developed an exact algorithm that searches for degenerate motifs by enumerating all possible IUPAC motifs. This algorithm is based on mutual information and uses hash tables with a graph data structure to store the motifs. It is independent of the sequencing technology. Experimental results on real data show that there are many overrepresented motifs upstream of sequencing artifacts. These artifacts are identified through the strand bias between forward and reverse reads. The homopolymer CCC, of length 3, seems to be sufficient to induce errors on IonTorrent. On Illumina, motifs are mainly composed of GGC followed by GGT (for example, TGGCNGGT) or of homopolymers. We have also noticed a fall in base quality after the detected motifs. Our exact algorithm requires less than one minute (Intel(R) Core(TM) i5-4570 CPU, 3.20 GHz) and less than 2 GB of RAM to search for fully degenerate motifs of length 6 on a dataset of approximately 24,000 sequences, extracted from 11 exomes sequenced on IonTorrent Proton.
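
    The idea of exhaustively enumerating IUPAC motifs and scoring them with mutual information can be illustrated with a minimal brute-force sketch. It is not the DiNAMO implementation: the function names, the presence/absence contingency table, and the split into "positive" (artifact-associated) and "negative" (control) sequences are assumptions made for illustration, and the hash-table and graph storage mentioned in the abstract is not reproduced.

```python
from itertools import product
from math import log2

# The 15 IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(motif, kmer):
    """True if every base of kmer is allowed by the IUPAC symbol at that position."""
    return len(motif) == len(kmer) and all(
        base in IUPAC[sym] for sym, base in zip(motif, kmer)
    )


def contains(motif, seq):
    """True if the degenerate motif occurs at least once in seq."""
    k = len(motif)
    return any(matches(motif, seq[i:i + k]) for i in range(len(seq) - k + 1))


def mutual_information(a, b, c, d):
    """Mutual information (bits) of a 2x2 table.

    a: positive sequences with the motif, b: positive without,
    c: negative sequences with the motif, d: negative without.
    """
    n = a + b + c + d
    cells = ((a, a + b, a + c), (b, a + b, b + d),
             (c, c + d, a + c), (d, c + d, b + d))
    # Empty cells contribute nothing to the sum.
    return sum(x / n * log2(x * n / (row * col)) for x, row, col in cells if x)


def rank_motifs(positive, negative, k):
    """Score all 15**k IUPAC motifs of length k, best first (assumed workflow)."""
    scores = []
    for symbols in product(IUPAC, repeat=k):
        motif = "".join(symbols)
        a = sum(contains(motif, s) for s in positive)
        c = sum(contains(motif, s) for s in negative)
        mi = mutual_information(a, len(positive) - a, c, len(negative) - c)
        scores.append((motif, mi))
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

    Because the search space grows as 15^k, this naive version is only practical for very short motifs; it is meant to show why an exact enumeration avoids the sensitivity loss of greedy searches, not how to make it fast.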

    Activating mutations in genes related to TCR signaling in angioimmunoblastic and other follicular helper T-cell-derived lymphomas.

    Angioimmunoblastic T-cell lymphoma (AITL) and other lymphomas derived from follicular T-helper cells (TFH) represent a large proportion of peripheral T-cell lymphomas (PTCLs) with poorly understood pathogenesis and unfavorable treatment results. We investigated a series of 85 patients with AITL (n = 72) or other TFH-derived PTCL (n = 13) by targeted deep sequencing of a gene panel enriched in T-cell receptor (TCR) signaling elements. RHOA mutations were identified in 51 of 85 cases (60%) consisting of the highly recurrent dominant negative G17V variant in most cases and a novel K18N in 3 cases, the latter showing activating properties in in vitro assays. Moreover, half of the patients carried virtually mutually exclusive mutations in other TCR-related genes, most frequently in PLCG1 (14.1%), CD28 (9.4%, exclusively in AITL), PI3K elements (7%), CTNNB1 (6%), and GTF2I (6%). Using in vitro assays in transfected cells, we demonstrated that 9 of 10 PLCG1 and 3 of 3 CARD11 variants induced MALT1 protease activity and increased transcription from NFAT or NF-κB response element reporters, respectively. Collectively, the vast majority of variants in TCR-related genes could be classified as gain-of-function. Accordingly, the samples with mutations in TCR-related genes other than RHOA had transcriptomic profiles enriched in signatures reflecting higher T-cell activation. Although no correlation with presenting clinical features nor significant impact on survival was observed, the presence of TCR-related mutations correlated with early disease progression. Thus, targeting of TCR-related events may hold promise for the treatment of TFH-derived lymphomas