    Bioinformatics approaches for the detection and classification of somatic mutations in hematological malignancies

    Sequencing technologies are still evolving. This enables an unprecedented yield of experimental data and makes new experiments possible that were previously not feasible. At the same time, new databases, algorithms and software tools are being developed to analyze these larger amounts of data. Investigating bioinformatic methods for the identification and classification of somatic mutations in hematological malignancies revealed a wide variety of alternative software tools that can be used for each analysis step. There is currently no standard for the efficient analysis of mutations from next-generation sequencing (NGS) data. The different methods and pipelines identify largely the same mutation candidates, but software-specific candidates are not detected consistently by all pipelines. The aim of this dissertation was therefore to develop a user-friendly pipeline that calls candidate mutations uniformly and efficiently. To this end, the essential analysis steps, namely base calling, alignment and variant calling, were investigated first. Available software tools were then tested and evaluated with regard to efficiency and performance, and possible improvements to these tools and to previously performed analyses are presented and discussed. Through participation in consortia such as the Clinical Research Group 216 (KFO 216) and the International Cancer Genome Consortium (ICGC), as well as in internal projects, NGS data sets of multiple myeloma (MM), Burkitt lymphoma (BL) and follicular lymphoma (FL) were generated and analyzed. The selection of suitable software tools and the construction of the pipeline are based on comparative analyses of these data sets and on results and experiences shared in the literature and in online forums. Purpose-built Python scripts made it possible to address biological and clinical questions, for example by standardizing gene names in the annotation step and by generating gene mutation heatmaps from files that do not conform to the Variant Call Format (VCF) syntax. Based on these results, new projects were initiated, such as detailed studies of the distribution of recurrent mutations and functional assays of individual mutation candidates. Extending the pipeline with these scripts improved the efficiency of the analysis and the comparability of the NGS data.
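The abstract does not show how gene names were standardized or how heatmap input was built from non-VCF files. A minimal sketch, assuming a tab-separated annotation table with hypothetical `sample` and `gene` columns and a small illustrative alias table (previous symbol to current HGNC symbol; a real pipeline would load a complete table rather than hard-code entries): every symbol is mapped to one canonical name, then the calls are collapsed into a binary gene × sample matrix that a heatmap routine can plot. The sample names and rows are toy values, not data from the thesis.

```python
import csv
import io
from typing import Dict, List, Set, Tuple

# Illustrative alias table (previous symbol -> current HGNC symbol).
ALIASES: Dict[str, str] = {
    "FAM46C": "TENT5C",
    "MLL2": "KMT2D",
}


def normalize_gene(symbol: str, aliases: Dict[str, str] = ALIASES) -> str:
    """Return the canonical symbol for a gene name, or the stripped input if unknown."""
    name = symbol.strip()
    return aliases.get(name, name)


def mutation_matrix(
    tsv_text: str, sample_col: str = "sample", gene_col: str = "gene"
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Build a binary gene x sample matrix (1 = at least one mutation) from a TSV table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    mutated: Set[Tuple[str, str]] = set()
    for row in reader:
        mutated.add((row[sample_col].strip(), normalize_gene(row[gene_col])))
    samples = sorted({sample for sample, _ in mutated})
    genes = sorted({gene for _, gene in mutated})
    matrix = [[int((sample, gene) in mutated) for sample in samples] for gene in genes]
    return genes, samples, matrix


# Toy input: FAM46C and TENT5C collapse into one row after normalization.
EXAMPLE_TSV = (
    "sample\tgene\tconsequence\n"
    "MM01\tFAM46C\tmissense\n"
    "MM01\tKRAS\tmissense\n"
    "MM02\tTENT5C\tframeshift\n"
    "MM03\tNRAS\tmissense\n"
)

genes, samples, matrix = mutation_matrix(EXAMPLE_TSV)
print("gene", *samples, sep="\t")
for gene, row in zip(genes, matrix):
    print(gene, *row, sep="\t")
```

Without the alias step, FAM46C and TENT5C would appear as two separate heatmap rows for the same gene. The matrix rows can be passed to any plotting routine, for example matplotlib's `imshow`; recording mutation counts instead of presence per cell is an equally simple variant.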
Interpreting the NGS data with the extended pipeline led, for example, to the discovery of three distinct molecular subgroups in MM. In addition, a dedicated Python script made it possible to analyze regions not covered in the NGS data sets.
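The thesis script for uncovered regions is not shown. A minimal sketch of the idea, assuming per-base depth in the three-column layout written by `samtools depth -a` (chromosome, 1-based position, depth; `-a` is needed so zero-depth positions are listed at all): consecutive positions below a depth threshold are merged into BED-style intervals. The function name `low_coverage_regions` and the default `min_depth=10` are hypothetical choices, not taken from the thesis.

```python
from typing import Iterable, List, Optional, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end): BED-style, 0-based, end exclusive


def low_coverage_regions(
    depth_lines: Iterable[str], min_depth: int = 10, min_length: int = 1
) -> List[Interval]:
    """Merge consecutive positions with depth below min_depth into intervals.

    Expects sorted, tab-separated 'chrom<TAB>pos<TAB>depth' lines with 1-based
    positions, as written by `samtools depth -a`.
    """
    regions: List[Interval] = []
    current: Optional[list] = None  # open run as [chrom, start, end]
    for line in depth_lines:
        if not line.strip():
            continue
        chrom, pos_text, depth_text = line.rstrip("\n").split("\t")[:3]
        start = int(pos_text) - 1  # 1-based position -> 0-based start
        is_low = int(depth_text) < min_depth
        contiguous = current is not None and chrom == current[0] and start == current[2]
        # Close the open run when coverage recovers, the chromosome changes or positions jump.
        if current is not None and not (is_low and contiguous):
            regions.append((current[0], current[1], current[2]))
            current = None
        if is_low:
            if current is None:
                current = [chrom, start, start + 1]
            else:
                current[2] = start + 1
    if current is not None:
        regions.append((current[0], current[1], current[2]))
    return [r for r in regions if r[2] - r[1] >= min_length]


# Toy depth lines, not data from the thesis.
EXAMPLE_DEPTH = [
    "chr1\t100\t25",
    "chr1\t101\t3",
    "chr1\t102\t0",
    "chr1\t103\t30",
    "chr2\t5\t1",
]

for chrom, start, end in low_coverage_regions(EXAMPLE_DEPTH):
    print(chrom, start, end, sep="\t")
# chr1	100	102
# chr2	4	5
```

Writing the intervals in BED convention keeps them directly usable for intersecting with target or exon annotations in downstream tools.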

    Comparative Analysis of Plasmids in the Genus Listeria

    Kuenne C, Voget S, Pischimarov J, et al. Comparative Analysis of Plasmids in the Genus Listeria. PLOS ONE. 2010;5(9):e12511.
    Background: We sequenced four plasmids of the genus Listeria, including two novel plasmids from L. monocytogenes strains of serotypes 1/2c and 7 as well as one from the species L. grayi. A comparative analysis together with 10 published Listeria plasmids revealed a common evolutionary background.
    Principal Findings: All analysed plasmids share a common replicon type related to the theta-replicating plasmid pAMbeta1. Nonetheless, the plasmids could be broadly divided into two distinct groups based on replicon diversity and the genetic content of the respective plasmid groups. Listeria plasmids are characterized by a large number of diverse mobile genetic elements and a commonly occurring translesion DNA polymerase, both of which have probably contributed to the evolution of these plasmids. We detected small non-coding RNAs on some plasmids that were homologous to those present on the chromosome of L. monocytogenes EGD-e. Multiple genes involved in heavy metal resistance (cadmium, copper, arsenite) as well as multidrug efflux (MDR, SMR, MATE) were detected on all listerial plasmids. These factors promote bacterial growth and survival in the environment and may have been acquired as a result of selective pressure due to the use of disinfectants in food processing environments. MDR efflux pumps have also recently been shown to promote the transport of cyclic diadenosine monophosphate (c-di-AMP), a secreted molecule able to trigger a cytosolic host immune response following infection.
    Conclusions: The comparative analysis of 14 plasmids of the genus Listeria implied the existence of a common ancestor. Ubiquitously occurring MDR genes on plasmids and their role in listerial infection now deserve further attention.

    Comparative genome-wide analysis of small RNAs of major Gram-positive pathogens: from identification to application

    In recent years, the number of drug-resistant and multidrug-resistant microbial strains has increased rapidly. Identifying innovative approaches for the development of novel anti-infectives and new therapeutic targets is therefore a high priority in global health care. Small RNAs (sRNAs) in bacteria have attracted considerable attention as an emerging class of gene expression regulators. Several experimental technologies for identifying sRNAs have been established for the Gram-negative model organism Escherichia coli. In many respects, sRNA screens in this model system have set a blueprint for the global and functional identification of sRNAs in Gram-positive microbes, but the functional role of sRNAs in colonization and pathogenicity is almost completely unknown for Listeria monocytogenes, Staphylococcus aureus, Streptococcus pyogenes, Enterococcus faecalis and Clostridium difficile. Here, we report the current knowledge about the sRNAs of these socioeconomically relevant Gram-positive pathogens, give an overview of state-of-the-art high-throughput sRNA screening methods and summarize bioinformatics approaches for genome-wide sRNA identification and target prediction. Finally, we discuss the use of modified peptide nucleic acids (PNAs) as a novel tool to inactivate potential sRNAs and their applications in the rapid and specific detection of pathogenic bacteria.

    Rare SNPs in receptor tyrosine kinases are negative outcome predictors in multiple myeloma

    Multiple myeloma (MM) is a plasma cell disorder characterized by great genetic heterogeneity. Recent next-generation sequencing studies revealed an accumulation of tumor-associated mutations in receptor tyrosine kinases (RTKs), which may also contribute to the activation of survival pathways in MM. To investigate the clinical role of RTK mutations in MM, we deep-sequenced the coding sequences of EGFR, EPHA2, ERBB3, IGF1R, NTRK1 and NTRK2, which were previously found to be mutated in MM, in 75 uniformly treated MM patients of the “Deutsche Studiengruppe Multiples Myelom”. Subsequently, we correlated the detected mutations with common cytogenetic alterations and clinical parameters. We identified 11 novel non-synonymous SNVs or rare patient-specific SNPs, not listed in the 1000 Genomes or dbSNP databases, in 10 primary MM cases. The mutations predominantly affected the tyrosine kinase and ligand-binding domains, and no correlation with cytogenetic parameters was found. Interestingly, however, patients with RTK mutations, specifically those with rare patient-specific SNPs, showed significantly lower overall, event-free and progression-free survival. This indicates that RTK SNVs and rare patient-specific RTK SNPs are of prognostic relevance and suggests that MM patients with RTK mutations could potentially benefit from treatment with RTK inhibitors.
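The abstract states that the reported candidates were non-synonymous and absent from 1000 Genomes and dbSNP, but it does not describe how that filter was applied. A minimal sketch, assuming variants are already annotated with a Sequence Ontology consequence term and that the known variants from both databases have been loaded into a set of `(chrom, pos, ref, alt)` keys; `KNOWN`, `CANDIDATES`, the gene names `GENE_A`/`GENE_B` and the coordinates are toy placeholders, and the set of "non-synonymous" terms is an assumption. Exact key matching only works if both sides use the same reference build and the same allele normalization.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position, as in VCF
    ref: str
    alt: str
    gene: str
    consequence: str  # Sequence Ontology term, e.g. "missense_variant"


VariantKey = Tuple[str, int, str, str]

# Assumed set of protein-altering consequence terms counted as "non-synonymous".
NON_SYNONYMOUS = {"missense_variant", "stop_gained", "stop_lost", "start_lost"}


def novel_nonsynonymous(candidates: Iterable[Variant], known: Set[VariantKey]) -> List[Variant]:
    """Keep non-synonymous candidates whose (chrom, pos, ref, alt) is absent from `known`."""
    return [
        v for v in candidates
        if v.consequence in NON_SYNONYMOUS and (v.chrom, v.pos, v.ref, v.alt) not in known
    ]


# Toy placeholders, not real coordinates or study data.
KNOWN: Set[VariantKey] = {("chr1", 100, "A", "G")}
CANDIDATES = [
    Variant("chr1", 100, "A", "G", "GENE_A", "missense_variant"),    # listed in KNOWN -> dropped
    Variant("chr1", 200, "C", "T", "GENE_A", "synonymous_variant"),  # synonymous -> dropped
    Variant("chr2", 300, "G", "A", "GENE_B", "stop_gained"),         # kept
]

for variant in novel_nonsynonymous(CANDIDATES, KNOWN):
    print(variant.gene, variant.chrom, variant.pos, variant.ref, variant.alt, variant.consequence)
```

A set lookup keeps the filter linear in the number of candidates, which matters once the known-variant collection holds millions of entries.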

    Recurrent mutation of the ID3 gene in Burkitt lymphoma identified by integrated genome, exome and transcriptome sequencing

    Burkitt lymphoma is a mature aggressive B-cell lymphoma derived from germinal center B cells. Its cytogenetic hallmark is the Burkitt translocation t(8;14)(q24;q32) and its variants, which juxtapose the MYC oncogene with one of the three immunoglobulin loci. Consequently, MYC is deregulated, resulting in massive perturbation of gene expression. Nevertheless, MYC deregulation alone does not seem to be sufficient to drive Burkitt lymphomagenesis. By whole-genome, whole-exome and transcriptome sequencing of four prototypical Burkitt lymphomas with immunoglobulin gene (IG)-MYC translocation, we identified seven recurrently mutated genes. One of these genes, ID3, mapped to a region of focal homozygous loss in Burkitt lymphoma. In an extended cohort, 36 of 53 molecularly defined Burkitt lymphomas (68%) carried potentially damaging mutations of ID3. These were strongly enriched at somatic hypermutation motifs. Only 6 of 47 other B-cell lymphomas with the IG-MYC translocation (13%) carried ID3 mutations. These findings suggest that cooperation between ID3 inactivation and IG-MYC translocation is a hallmark of Burkitt lymphomagenesis.
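The abstract reports that ID3 mutations were strongly enriched at somatic hypermutation motifs but names neither the motifs nor the statistical test. A minimal sketch, assuming the classic AID hotspot motif RGYW and its reverse complement WRCY (the targeted base is the G of RGYW and the C of WRCY): it checks whether a mutated position lies in such a motif and computes a naive observed/expected ratio against all positions of the sequence. The function names and the toy sequence are hypothetical, and the ratio is an illustration, not the method used in the study.

```python
from typing import List

# IUPAC nucleotide codes needed for the RGYW/WRCY motifs.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "W": "AT"}


def matches_motif(seq: str, start: int, motif: str) -> bool:
    """True if seq[start:start + len(motif)] matches the IUPAC motif."""
    if start < 0 or start + len(motif) > len(seq):
        return False
    return all(seq[start + i] in IUPAC[code] for i, code in enumerate(motif))


def in_shm_hotspot(seq: str, pos: int) -> bool:
    """True if 0-based `pos` is the G of an RGYW motif or the C of a WRCY motif."""
    seq = seq.upper()
    return matches_motif(seq, pos - 1, "RGYW") or matches_motif(seq, pos - 2, "WRCY")


def hotspot_enrichment(seq: str, mutated_positions: List[int]) -> float:
    """Naive observed/expected ratio: hotspot fraction among mutations vs. among all positions."""
    if not mutated_positions:
        raise ValueError("need at least one mutated position")
    expected = sum(in_shm_hotspot(seq, p) for p in range(len(seq))) / len(seq)
    observed = sum(in_shm_hotspot(seq, p) for p in mutated_positions) / len(mutated_positions)
    return observed / expected if expected else float("inf")


# Toy sequence: AGCT matches RGYW (G at index 2) and WRCY (C at index 3).
SEQ = "TAGCTT"
print([p for p in range(len(SEQ)) if in_shm_hotspot(SEQ, p)])  # [2, 3]
print(hotspot_enrichment(SEQ, [0, 2, 3]))
```

A ratio above 1 only indicates that mutations fall into hotspots more often than the sequence composition alone would suggest; judging significance would require a proper statistical test on real data.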