
    New Approaches to Long-Read Assembly under High Error Rates

    The field of genome assembly is concerned with developing algorithms that reconstruct genomes computationally from sequencing data. It first came to public attention in the 1990s with the Human Genome Project. Because only short stretches of the human genome could be read, longer genome sequences had to be reconstructed from these reads afterwards by computer. Almost 20 years after the publication of the human genome sequence, genome assembly remains an essential processing step for sequencing data. Only the throughput, length, and error profile of the reads have changed, and with them the algorithmic requirements. The research field of genome assembly thus complements sequencing technologies, which have continued to develop at enormous speed. Together they make it possible to decode the genomes of a rapidly growing number of organisms, and they form the basis for a large share of research across biology and medicine. Despite the impressive technological and algorithmic advances of recent decades, complete genome sequences have so far been reconstructed only for bacterial genomes. The assembly of the much larger eukaryotic genomes still poses several unsolved algorithmic problems. These problems stem from various repetitive structures that occur in almost all genomes of higher organisms. As a result, eukaryotic genomes are always published as far more disconnected sequences than the respective organisms have chromosomes. The repetitive structures responsible for the gaps in genome sequences can be roughly divided into three classes.
Microsatellites and minisatellites are very short sequences that can repeat thousands or tens of thousands of times in direct succession. This pattern is typical of centromeres and telomeres, which are located in the middle and at the ends of many chromosomes. Interspersed repeats, often also called transposons, are longer sequences that frequently occur in nearly identical form at different locations in the genome. Tandem repeats, by contrast, are longer sequences that can occur several times in direct succession in a genome. Tandem repeats are often gene complexes, that is, clusters of nearly identical protein-coding segments that allow the cell to produce the encoded proteins particularly quickly. Each of these repetitive structures places specific demands on assembly algorithms. In this doctoral thesis we make several contributions toward solving the latter two problems: the assembly of interspersed repeats and the assembly of tandem repeats. In Part 1 we present several data-processing procedures that prepare sequencing data in order to identify the rare differences between genome sequences that occur in multiple copies. These include software for computing and optimizing multiple sequence alignments (MSAs) by dynamic programming, and for statistically modeling and analyzing the differences that the MSA reveals. In Part 2 we build on this analysis and present a software program for assembling interspersed repeats. This program rests on several algorithmic innovations and can effectively assemble transposon families with very long sequences and a very large number of distinct copies. It is the first program of its kind able to assemble transposon families with dozens of copies.
We show that, even for smaller transposon families, it is more accurate and faster than the only previously existing competing program specialized for this assembly problem. In Part 3 we describe an analysis pipeline that enables us to assemble gene complexes consisting of dozens of tandem repeats. This pipeline includes clustering and graph-drawing algorithms. Its core is an error-correction algorithm based on neural networks. We demonstrate the practical value of this pipeline by assembling the Drosophila histone complex. In conclusion, we discuss the possibility of assembling micro- and minisatellites and propose research directions for further improvements in interspersed-repeat and gene-complex assembly.
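The thesis abstract names dynamic programming as the basis for computing multiple sequence alignments. As a hedged illustration only, the sketch below shows the classic pairwise building block, Needleman-Wunsch global alignment with a linear gap penalty. The scoring values (match = 1, mismatch = -1, gap = -1) and the function name `global_align` are assumptions for illustration; this is not the thesis's software or its MSA optimization procedure.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    Scoring parameters are illustrative assumptions.
    """
    n, m = len(a), len(b)

    def sub(x, y):
        # Substitution score for aligning two residues.
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_align("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

An MSA tool would extend this idea across many sequences (for example progressively or iteratively); the pairwise table is shown only because it is the smallest self-contained instance of the dynamic-programming recurrence.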

    LTC: a novel algorithm to improve the efficiency of contig assembly for physical mapping in complex genomes

    Background: Physical maps are the substrate of genome sequencing and map-based cloning, and their construction relies on the accurate assembly of BAC clones into large contigs that are then anchored to genetic maps with molecular markers. High Information Content Fingerprinting has become the method of choice for large and repetitive genomes such as those of maize, barley, and wheat. However, the high level of repeated DNA in these genomes requires very stringent criteria to ensure a reliable assembly with the FingerPrinted Contig (FPC) software, which often results in short contigs (3-5 clones before merging) as well as unreliable assembly in some difficult regions. Difficulties can originate from a non-linear topological structure of clone overlaps, the low power of clone-ordering algorithms, and the absence of tools to identify sources of gaps in Minimal Tiling Paths (MTPs). Results: To address these problems, we propose a novel approach that: (i) reduces the rate of false connections and Q-clones by using a new cutoff calculation method; (ii) obtains reliable clusters that are robust to the exclusion of a single clone or clone overlap; (iii) explores the topological contig structure by treating contigs as networks of clones connected by significant overlaps; (iv) performs iterative clone clustering combined with ordering and order verification using re-sampling methods; and (v) uses global optimization methods for clone ordering and Band Map construction. The elements of this new analytical framework, called Linear Topological Contig (LTC), were applied to datasets previously used to construct the physical map of wheat chromosome 3B with FPC. The performance of LTC vs. FPC was also compared on simulated BAC libraries based on the known genome sequences of rice chromosome 1 and maize chromosome 1.
Conclusions: The results show that, compared to other methods, LTC enables the construction of highly reliable and longer contigs (5-12 clones before merging), the detection of "weak" connections in contigs and their "repair", and the elongation of contigs obtained by other assembly methods.
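The LTC abstract treats contigs as networks of clones connected by significant overlaps and flags "weak" connections. A minimal sketch of that idea, under stated assumptions: each overlap carries a score where lower means more significant (as with FPC-style Sulston scores), edges are kept only at or below a cutoff, connected components serve as candidate contigs, and articulation points (clones whose removal would split a contig) stand in for weak connections. The cutoff value, clone names, and the use of articulation points are illustrative assumptions, not LTC's actual cutoff calculation or clustering procedure.

```python
def build_overlap_graph(overlaps, cutoff):
    """Undirected clone graph; keep an edge only if its score <= cutoff.

    overlaps: iterable of (clone_a, clone_b, score), where a lower score
    means a more significant overlap (assumed, FPC/Sulston-style).
    """
    graph = {}
    for a, b, score in overlaps:
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if score <= cutoff:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Candidate contigs: sets of clones linked by significant overlaps."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            clone = stack.pop()
            component.add(clone)
            for nbr in graph[clone]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        components.append(component)
    return components


def articulation_points(graph):
    """Clones whose removal would split their contig (Tarjan's DFS lowpoints)."""
    disc, low, points = {}, {}, set()
    counter = [0]

    def dfs(v, parent):
        disc[v] = low[v] = counter[0]
        counter[0] += 1
        children = 0
        for w in graph[v]:
            if w not in disc:
                children += 1
                dfs(w, v)
                low[v] = min(low[v], low[w])
                # Non-root v separates w's subtree if that subtree cannot reach above v.
                if parent is not None and low[w] >= disc[v]:
                    points.add(v)
            elif w != parent:
                low[v] = min(low[v], disc[w])  # back edge
        # The DFS root is a cut vertex only if it has two or more DFS children.
        if parent is None and children > 1:
            points.add(v)

    for v in graph:
        if v not in disc:
            dfs(v, None)
    return points


# Hypothetical clone overlaps: (clone, clone, overlap score).
OVERLAPS = [
    ("c1", "c2", 1e-15), ("c2", "c3", 1e-12), ("c3", "c4", 1e-14),
    ("c4", "c2", 1e-13), ("c5", "c6", 1e-3),
]
graph = build_overlap_graph(OVERLAPS, cutoff=1e-10)
print(connected_components(graph))  # one 4-clone contig and two singletons
print(articulation_points(graph))   # c2 alone holds c1 to the rest
```

Articulation points are a deliberately simple proxy: the robustness criterion in the abstract also mentions excluding a single overlap, which would correspond to bridges (cut edges) in this graph.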

    A genome survey of Moniliophthora perniciosa gives new insights into Witches' Broom Disease of cacao

    Background: The basidiomycete fungus Moniliophthora perniciosa is the causal agent of Witches' Broom Disease (WBD) in cacao (Theobroma cacao). It is a hemibiotrophic pathogen that colonizes the apoplast of cacao's meristematic tissues as a biotrophic pathogen, switching to a saprotrophic lifestyle during later stages of infection. M. perniciosa and the related species M. roreri are pathogens of the aerial parts of the plant, an uncommon characteristic in the order Agaricales. A genome survey (1.9× coverage) of M. perniciosa was analyzed to evaluate the overall gene content of this phytopathogen. Results: Genes encoding proteins involved in retrotransposition, reactive oxygen species (ROS) resistance, drug efflux transport, and cell wall degradation were identified. The large number of genes encoding cytochrome P450 monooxygenases (1.15% of gene models) indicates that M. perniciosa has a great potential for detoxification and for the production of toxins and hormones, which may confer a high adaptive ability on the fungus. We also discovered new genes encoding putative cysteine-rich secreted polypeptides, as well as genes related to methylotrophy and plant hormone biosynthesis (gibberellin and auxin). Analysis of gene families indicated that M. perniciosa has amounts of carboxylesterases and repertoires of plant cell wall-degrading enzymes similar to those of other hemibiotrophic fungi. In addition, an approach for normalizing gene family data using incomplete genome data was developed and applied to the M. perniciosa genome survey.
Conclusion: This genome survey gives an overview of the M. perniciosa genome and reveals that a significant portion of it is involved in stress adaptation and plant necrosis, two characteristics a hemibiotrophic fungus needs to complete its infection cycle. Our analysis provides new evidence revealing potential adaptive traits that may play major roles in the mechanisms of pathogenicity in the M. perniciosa/cacao pathosystem.
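As a back-of-the-envelope check on what "1.9× coverage" implies for a genome survey, the idealized Lander-Waterman model (reads placed uniformly at random, so per-base coverage is Poisson with mean c; an assumption not stated in the abstract) gives the expected fraction of the genome covered by at least one read:

$$
P(\text{base covered}) = 1 - e^{-c}, \qquad c = 1.9 \;\Rightarrow\; 1 - e^{-1.9} \approx 1 - 0.150 = 0.850
$$

Under these assumptions roughly 15% of the genome would lack any read, which illustrates why gene-family counts from an incomplete survey call for some form of normalization; the survey's own normalization approach is not described in this abstract and is not reproduced here.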

    Strainer: software for analysis of population variation in community genomic datasets

    Background: Metagenomic analyses of microbial communities that are comprehensive enough to provide multiple samples of most loci in the genomes of the dominant organism types will also reveal patterns of genetic variation within natural populations. New bioinformatics tools will enable visualization and comprehensive analysis of this sequence variation and inference of recent evolutionary and ecological processes. Results: We have developed a software package for analysis and visualization of genetic variation in populations and for reconstruction of strain variants from otherwise co-assembled sequences. Sequencing reads can be clustered by matching patterns of single nucleotide polymorphisms (SNPs) to generate predicted gene and protein variant sequences, identify conserved intergenic regulatory sequences, and determine the quantity and distribution of recombination events. Conclusion: The Strainer software, a first-generation metagenomic bioinformatics tool, facilitates comprehension and analysis of the heterogeneity intrinsic to natural communities. The program reveals the degree of clustering among closely related sequence variants and provides a rapid means to generate gene and protein sequences for functional, ecological, and evolutionary analyses.
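The Strainer abstract says reads can be clustered by matching SNP patterns. A minimal sketch of that idea, assuming a simple greedy rule that is not necessarily Strainer's: each read is represented by its alleles at SNP positions, and a read joins the first cluster whose accumulated profile agrees with it at every shared SNP site (and shares at least `min_shared` sites); otherwise it starts a new putative strain variant. The function names, data layout, and read identifiers are hypothetical.

```python
def compatible(profile, alleles, min_shared=1):
    """True if a read agrees with a cluster profile at every shared SNP site
    and shares at least `min_shared` sites with it."""
    shared = profile.keys() & alleles.keys()
    return len(shared) >= min_shared and all(profile[p] == alleles[p] for p in shared)


def cluster_reads(reads, min_shared=1):
    """Greedy clustering of reads by SNP pattern.

    reads: dict of read_id -> {snp_position: base}
    Returns a list of (profile, member_read_ids) pairs, where each profile
    is the union of alleles seen in that cluster's reads.
    """
    clusters = []
    for read_id, alleles in reads.items():
        for profile, members in clusters:
            if compatible(profile, alleles, min_shared):
                profile.update(alleles)  # extend the cluster's SNP profile
                members.append(read_id)
                break
        else:
            # No compatible cluster: start a new putative strain variant.
            clusters.append((dict(alleles), [read_id]))
    return clusters


# Hypothetical reads covering SNP sites 100, 150 and 210.
READS = {
    "r1": {100: "A", 150: "G"},
    "r2": {150: "G", 210: "T"},
    "r3": {100: "C", 150: "T"},
    "r4": {150: "T", 210: "C"},
}
for profile, members in cluster_reads(READS):
    print(members, dict(sorted(profile.items())))
```

Here r1 and r2 chain into one variant (A-G-T) and r3 and r4 into another (C-T-C). The greedy rule is order-dependent and has no tolerance for sequencing errors, so a practical tool would need a more robust criterion.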

    Genome characterization and population genetic structure of the zoonotic pathogen Streptococcus canis

    Background - Streptococcus canis is an important opportunistic pathogen of dogs and cats that can also infect a wide range of other mammals, including cows, in which it can cause mastitis. It is also an emerging human pathogen. Results - Here we characterize the first genome sequence for this species, strain FSL S3-227 (a milk isolate from a cow with an intramammary infection). A diverse array of putative virulence factors is encoded by the S. canis FSL S3-227 genome. Approximately 75% of these gene sequences were homologous to known streptococcal virulence factors involved in invasion, evasion, and colonization. The genome contains multiple potentially mobile genetic elements (MGEs) [plasmid, phage, integrative conjugative element (ICE)], and comparison with other species provided convincing evidence for lateral gene transfer (LGT) between S. canis and two other bovine mastitis-causing pathogens (Streptococcus agalactiae and Streptococcus dysgalactiae subsp. dysgalactiae), with this transfer possibly contributing to host adaptation. Population structure was explored among isolates obtained from Europe and the USA [bovine = 56, canine = 26, and feline = 1]. Ribotyping of all isolates and multilocus sequence typing (MLST) of a subset of the isolates (n = 45) detected significant differentiation between bovine and canine isolates (Fisher exact test: P = 0.0000 [ribotypes], P = 0.0030 [sequence types]), suggesting possible host adaptation of some genotypes. Concurrently, the ancestral clonal complex (54% of isolates) occurred in many tissue types, in all hosts, and at all geographic locations, suggesting a wide and diverse niche. Conclusion - This study provides evidence highlighting the importance of LGT in the evolution of the bacterium S. canis, specifically its possible role in host adaptation and acquisition of virulence factors. Furthermore, the recent LGT detected between S. canis and human bacteria (Streptococcus urinalis) is cause for concern, as it highlights the possibility of continued acquisition of human virulence factors by this emerging zoonotic pathogen.
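The host-differentiation result above rests on Fisher exact tests. To show what such a test computes, here is a hedged sketch of the simplest 2x2 case: with row and column totals fixed, the top-left count follows a hypergeometric distribution, and the two-sided p-value sums the probabilities of all tables no more likely than the observed one. The ribotype and sequence-type comparisons in the study involve larger tables, which this sketch does not handle, and the counts in the example are hypothetical, not data from the study.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    With row and column totals held fixed, the count in the top-left cell
    follows a hypergeometric distribution; the two-sided p-value sums the
    probabilities of all tables no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability of the table whose top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts (not from the study): one genotype vs. all others, by host.
#              bovine  canine
# genotype X      12       2
# other            8      18
print(fisher_exact_2x2(12, 2, 8, 18))
```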

    Genomic evidence for genes encoding leucine-rich repeat receptors linked to resistance against the eukaryotic extra- and intracellular Brassica napus pathogens Leptosphaeria maculans and Plasmodiophora brassicae

    © 2018 Stotz et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Genes coding for nucleotide-binding leucine-rich repeat (LRR) receptors (NLRs) control resistance against intracellular (cell-penetrating) pathogens. However, evidence for a role of genes coding for proteins with LRR domains in resistance against extracellular (apoplastic) fungal pathogens is limited. Here, the distribution of genes coding for proteins with extracellular LRR (eLRR) domains but lacking kinase domains was determined for the Brassica napus genome. Predictions of signal peptide and transmembrane regions divided these genes into 184 coding for receptor-like proteins (RLPs) and 121 coding for secreted proteins (SPs). Together with previously annotated NLRs, a total of 720 LRR genes were found. Leptosphaeria maculans-induced expression during a compatible interaction with cultivar Topas differed between the RLP, SP and NLR gene families; NLR genes were induced relatively late, during the necrotrophic phase of pathogen colonization. Seven RLP, one SP and two NLR genes were found in the Rlm1 and Rlm3/Rlm4/Rlm7/Rlm9 loci for resistance against L. maculans on chromosome A07 of B. napus. One NLR gene at the Rlm9 locus was positively selected, as was the RLP gene on chromosome A10 whose LepR3 and Rlm2 alleles confer resistance against L. maculans races carrying the corresponding effectors AvrLm1 and AvrLm2, respectively. Known loci for resistance against L. maculans (extracellular hemibiotrophic fungus), Sclerotinia sclerotiorum (necrotrophic fungus) and Plasmodiophora brassicae (intracellular, obligate biotrophic protist) were examined for the presence of RLPs, SPs and NLRs in these regions. Whereas loci for resistance against P. brassicae were enriched for NLRs, no such signature was observed for the other pathogens.
These findings demonstrate the involvement of (i) NLR genes in resistance against the intracellular pathogen P. brassicae and (ii) a putative NLR gene in Rlm9-mediated resistance against the extracellular pathogen L. maculans. Peer reviewed.
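The abstract splits eLRR genes lacking kinase domains into RLPs and SPs using signal-peptide and transmembrane predictions. The exact decision rule is not given, so the sketch below assumes a plausible one for illustration: kinase-containing proteins are excluded, a predicted signal peptide plus at least one transmembrane region marks an RLP, and a signal peptide without transmembrane regions marks an SP. The `LrrProtein` record and gene names are hypothetical; real inputs would come from signal-peptide and transmembrane predictors.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class LrrProtein:
    """Hypothetical per-protein summary of domain and topology predictions."""
    name: str
    has_kinase: bool          # kinase domain present (receptor-kinase-like)
    has_signal_peptide: bool  # predicted N-terminal signal peptide
    tm_helices: int           # number of predicted transmembrane regions


def classify(p: LrrProtein) -> str:
    """Assign an eLRR protein to RLP or SP using an assumed rule."""
    if p.has_kinase:
        return "excluded"      # kinase-containing receptors fall outside this set
    if not p.has_signal_peptide:
        return "unclassified"  # no predicted entry into the secretory pathway
    # Membrane-anchored -> receptor-like protein; otherwise secreted.
    return "RLP" if p.tm_helices >= 1 else "SP"


proteins = [
    LrrProtein("gene_a", False, True, 1),
    LrrProtein("gene_b", False, True, 0),
    LrrProtein("gene_c", True, True, 1),
    LrrProtein("gene_d", False, False, 0),
]
print(Counter(classify(p) for p in proteins))
```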

    The role of clonal communication and heterogeneity in breast cancer

    Background: Cancer is a rapidly evolving, multifactorial disease that accumulates numerous genetic and epigenetic alterations. This results in molecular and phenotypic heterogeneity within the tumor, the complexity of which is further amplified through specific interactions between cancer cells. We aimed to dissect the molecular mechanisms underlying cooperation between different clones. Methods: We produced clonal cell lines derived from the MDA-MB-231 breast cancer cell line using the UbC-StarTrack system, which allowed tracking of multiple clones by color: GFP C3, mKO E10 and Sapphire D7. These clones were characterized by growth rate, cell metabolic activity, wound healing and invasion assays, and genetic and epigenetic arrays. Tumorigenicity was tested by orthotopic and intravenous injections. Clonal cooperation was evaluated by medium complementation, co-culture and co-injection assays. Results: In vitro characterization of these clones revealed clear genetic and epigenetic differences among the cell lines that affected growth rate, cell metabolic activity, morphology and cytokine expression. In vivo, all clonal cell lines were able to form tumors; however, injection of an equal mix of the different clones led to tumors with very few mKO E10 cells. Additionally, the mKO E10 clonal cell line was significantly less able to form lung metastases. These results confirm that heterogeneity is present even in stable cell lines. In vitro, complementation of the growth medium with medium or exosomes from parental or clonal cell lines increased the growth rate of the other clones. Complementation assays, co-growth and co-injection of the mKO E10 and GFP C3 clonal cell lines increased the efficiency of invasion and migration.
Conclusions: These findings support a model in which interplay between clones confers aggressiveness. This model may allow identification of the factors involved in cellular communication that could play a role in clonal cooperation and thus represent new targets for preventing tumor progression.
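The methods list growth rate among the characteristics compared between clones but do not say how it was computed. One common approach, assumed here purely for illustration, fits exponential growth by least squares on log-transformed cell counts and converts the rate r to a doubling time ln(2)/r. The function names and the cell counts below are hypothetical, not data from this study.

```python
import math


def exponential_growth_rate(times_h, counts):
    """Least-squares slope of ln(count) versus time, in 1/hour.

    Assumes the cells are in exponential phase over the measured window.
    """
    n = len(times_h)
    logs = [math.log(c) for c in counts]
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    return sxy / sxx


def doubling_time(rate_per_h):
    """Population doubling time in hours for a given exponential rate."""
    return math.log(2) / rate_per_h


# Hypothetical counts for one clonal line, measured every 24 h.
times = [0, 24, 48, 72]
counts = [10_000, 19_000, 41_000, 79_000]
r = exponential_growth_rate(times, counts)
print(f"rate = {r:.4f} per h, doubling time = {doubling_time(r):.1f} h")
```

Fitting all time points on the log scale, rather than using only the first and last counts, makes the estimate less sensitive to a single noisy measurement.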

    Metagenomics Reveal Triclosan-Induced Changes in the Antibiotic Resistome of Anaerobic Digesters

    Triclosan (TCS) is a broad-spectrum antimicrobial used in a variety of consumer products. Although it was recently banned from hand soaps in the US, it is still a key ingredient in a top-selling toothpaste. TCS is a hydrophobic micropollutant that is recalcitrant under anaerobic digestion, which results in high TCS concentrations in biosolids. The objective of this study was to determine the impact of TCS on the antibiotic resistome and potential cross-protection in lab-scale anaerobic digesters using shotgun metagenomics. We hypothesized that metagenomics would reveal selection for antibiotic resistance genes (ARGs) not previously found in pure-culture studies or in mixed-culture studies using targeted qPCR. Four different levels of TCS were continuously fed to triplicate lab-scale anaerobic digesters to assess the effect of TCS levels on the antibiotic resistance gene profile (resistome). BLAST searches of metagenomic reads against the antibiotic/metal resistance gene database BacMet revealed that ARG diversity and abundance changed along the TCS concentration gradient. Although loss of bacterial diversity and digester function was observed in the digester treated with the highest TCS concentration, FabV, a known TCS resistance gene, increased in this extremely high-TCS environment. The abundance of several other known ARGs or metal resistance genes (MRGs), including corA and arsB, also increased with increasing TCS concentration. Analysis of other functional genes using the SEED database revealed increases in potentially key resistance genes, including different types of transporters and transposons. These results indicate that antimicrobials can alter the abundance of multiple resistance genes in anaerobic digesters even when function (i.e., methane production) is maintained. This study also suggests that enriched ARGs could be released into the environment through land application of biosolids.
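The abstract reports that ARG diversity and abundance shifted along the TCS gradient but does not specify how read hits were normalized or which diversity index was used. As a hedged sketch only, the code below assumes two common choices: scaling hit counts to hits per million reads to correct for sequencing depth, and the Shannon index H = -sum(p_i ln p_i) as the diversity measure. The gene labels fabV, corA and arsB come from the abstract; `other_arg`, all counts, and all sequencing depths are hypothetical.

```python
import math


def per_million(hits, total_reads):
    """Scale raw read-hit counts per gene to hits per million reads."""
    return {gene: count * 1e6 / total_reads for gene, count in hits.items()}


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over genes with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical hit counts for two digesters with different sequencing depth.
low_tcs = {"fabV": 5, "corA": 40, "arsB": 30, "other_arg": 125}
high_tcs = {"fabV": 60, "corA": 90, "arsB": 70, "other_arg": 20}
for label, hits, depth in [("low", low_tcs, 2_000_000), ("high", high_tcs, 1_500_000)]:
    print(label, per_million(hits, depth), round(shannon_diversity(hits.values()), 3))
```

Depth normalization matters because raw hit counts scale with the number of reads sequenced, so comparing unnormalized counts across digesters could confound sequencing effort with biological change.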