124 research outputs found

    Analysis and application of hash-based similarity estimation techniques for biological sequence analysis

    In bioinformatics, a large group of problems requires the computation or estimation of sequence similarity. However, the analysis of biological sequence data faces, among many others, three major challenges: the large amount of generated data, technology-specific errors in that data (which can be mistaken for biological signals), and the possible need to analyze it without access to a reference genome. Locality-sensitive hashing methods make it possible to achieve both efficient estimation of sequence similarity and tolerance of the errors specific to biological data. We developed a variant of the winnowing algorithm for local minimizer computation that is specifically geared to dealing with repetitive regions within biological sequences. By compressing redundant information, we can reduce both the size of the hash tables required to store minimizer sketches and the number of redundant low-quality alignment candidates. By analyzing the distribution of segment lengths generated by this approach, we can better estimate the size of the required data structures and identify hash functions that are feasible for this technique. Our evaluation verified that simple and fast hash functions, even those with small hash value spaces (hash functions with a small codomain), are sufficient to compute compressed minimizers and perform comparably to uniformly randomly chosen hash values. We also outline an index for a taxonomic protein database that uses multiple compressed winnowings to identify alignment candidates. To store MinHash values, we present a cache-optimized implementation of a hash table that uses Hopscotch hashing to resolve collisions. As a biological application of similarity-based analysis, we describe the analysis of double digest restriction site associated DNA sequencing (ddRADseq). We implemented simulation software that models the biological and technological influences of this technology to support better development and testing of ddRADseq analysis software. Using datasets generated by our software, as well as data obtained from population genetic experiments, we developed an analysis workflow for ddRADseq data based on the Stacks software. Since the quality of the results generated by Stacks depends strongly on how well the parameters are adapted to the specific dataset, we developed a Snakemake workflow that automates preprocessing tasks and also allows automatic exploration of different parameter sets. As part of this workflow, we developed a PCR deduplication approach that generates consensus reads incorporating the base quality values (as reported by the sequencing device) without performing an alignment first. As an outlook, we outline a MinHashing approach that can be used for faster and more robust clustering while addressing incomplete digestion and null alleles, two effects specific to ddRADseq that current analysis tools cannot reliably detect.
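For readers unfamiliar with the minimizer idea mentioned in this abstract, the following is a minimal sketch of plain window-based minimizer selection. It is not the compressed variant the abstract describes. The function names `kmer_hash` and `minimizers`, the choice of BLAKE2b as hash, and the leftmost tie-breaking rule are all assumptions made for illustration.

```python
import hashlib


def kmer_hash(kmer: str) -> int:
    """Illustrative 64-bit hash of a k-mer (assumed choice: BLAKE2b)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizers(seq: str, k: int, w: int, hash_fn=kmer_hash):
    """Return (position, k-mer) pairs selected as window minimizers.

    Each window covers w consecutive k-mers. The k-mer with the smallest
    hash value is chosen (leftmost on ties, an arbitrary choice here).
    A position chosen by several consecutive windows is reported once.
    """
    hashes = [hash_fn(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    selected = []
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        # min() returns the first index with the smallest hash value.
        offset = min(range(w), key=window.__getitem__)
        pos = start + offset
        if not selected or selected[-1][0] != pos:
            selected.append((pos, seq[pos:pos + k]))
    return selected


if __name__ == "__main__":
    for pos, kmer in minimizers("ACGTACGTTTTTTTTTACGA", k=4, w=5):
        print(pos, kmer)
```

In this plain scheme, a run of identical k-mers (for example inside a homopolymer) produces a new selected position for every window, because all hash values in the window tie. That redundancy in repetitive regions is the kind of effect the abstract says its compressed variant is designed to reduce.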

    A push for better RDM

    The version control software git and the server software GitLab were designed for software development, but they also enable collaborative work on research data in line with the FAIR principles; both are fundamental challenges in research data management (RDM). Coordinated by the state initiative fdm.nrw, workflows and training courses on RDM with git and GitLab have therefore been trialled and run in recent years. One hurdle to using git and GitLab in RDM can be operating the underlying software through command-line commands. Although this improves understanding of the versioning operations, some researchers prefer intuitive graphical interfaces. In either case, time for familiarization must be planned for adoption. For data-driven research, version control and other digital skills will, in the medium term, have to take on a standing as high as that of "scientific writing" among language skills. The first step is therefore to convey a basic understanding of git and of the ways it can be used in RDM. To complement established training courses, short, concrete examples have been compiled in a best-practice collection: public GitLab projects implement and document individual use cases with the tools provided. This is intended to increase the application and digital skills of researchers, as well as of infrastructure staff and trainers, so that git and GitLab contribute to a better implementation of RDM.

    Root Bacterial Endophytes Alter Plant Phenotype, but not Physiology

    Plant traits, such as root and leaf area, influence how plants interact with their environment, and the diverse microbiota living within plants can influence plant morphology and physiology. Here, we explored how three bacterial strains isolated from the Populus root microbiome influenced plant phenotype. We chose three bacterial strains that differed in predicted metabolic capabilities, plant hormone production and metabolism, and secondary metabolite synthesis. We inoculated each bacterial strain on a single genotype of Populus trichocarpa and measured the response of plant growth-related traits (root:shoot ratio, biomass production, root and leaf growth rates) and physiological traits (chlorophyll content, net photosynthesis, net photosynthesis at saturating light (Asat) and at saturating CO2 (Amax)). Overall, we found that bacterial root endophyte infection increased root growth rate by up to 184% and leaf growth rate by up to 137% relative to non-inoculated control plants, evidence that plants respond to bacteria by modifying morphology. However, endophyte inoculation had no influence on total plant biomass or photosynthetic traits (net photosynthesis, chlorophyll content). In sum, bacterial inoculation did not significantly increase plant carbon fixation and biomass, but the presence of the bacteria altered where and how carbon was allocated in the plant host.

    SPorTs - Semantic Portal Technologies

    The project group Semantic Portal Technologies (SPorTs) started in the summer semester of 2009 at Chair 14 of the Faculty of Computer Science at TU Dortmund. The project group's topic was "code generation and automated integration of semantically described processes, using e-government as an example, with the help of web portal technologies". This final report documents the motivation for the project group, the approach taken to the problem, organizational decisions, and the software developed within the project group. The following sections cover the motivation of the project group.

    Osteosarcopenia, an Asymmetrical Overlap of Two Connected Syndromes: Data from the OsteoSys Study

    Osteoporosis and sarcopenia are two chronic conditions that widely affect older people and share common risk factors. In this prospective observational multicenter study, we investigated the prevalence of low bone mineral density (BMD) and sarcopenia, including the overlap of both conditions (osteosarcopenia), in 572 older hospitalized patients (mean age 75.1 ± 10.8 years, 78% women) with known or suspected osteoporosis. Sarcopenia was assessed according to the revised definition of the European Working Group on Sarcopenia in Older People (EWGSOP2). Low BMD was defined according to the World Health Organization (WHO) recommendations as a T-score < −1.0. Osteosarcopenia was diagnosed when both low BMD and sarcopenia were present. Low BMD was prevalent in 76% and the prevalence of sarcopenia was 9%, with 90% of the sarcopenic patients showing the overlap of osteosarcopenia (8% of the entire population). Conversely, only a few patients with low BMD demonstrated sarcopenia (11%). Osteosarcopenic patients were older and frailer and had lower BMI, fat mass, muscle mass, handgrip strength, and T-score than non-osteosarcopenic patients. We conclude that osteosarcopenia is extremely common in sarcopenic subjects. Considering the increased risk of falls in patients with sarcopenia, they should always be evaluated for osteoporosis.
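The asymmetry in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the rounded percentages quoted above, so its results are approximate. The variable names are illustrative, and it assumes that every osteosarcopenic patient counts towards the low-BMD group, as the stated definition implies.

```python
# Rounded prevalences quoted in the abstract, as fractions of all patients.
low_bmd = 0.76                    # patients with low BMD
sarcopenia = 0.09                 # patients with sarcopenia
overlap_given_sarcopenia = 0.90   # sarcopenic patients who also have low BMD

# Share of all patients who have both conditions (osteosarcopenia).
osteosarcopenia = sarcopenia * overlap_given_sarcopenia

# Share of low-BMD patients who are also sarcopenic.
sarcopenia_given_low_bmd = osteosarcopenia / low_bmd

print(f"osteosarcopenia among all patients: {osteosarcopenia:.1%}")
print(f"sarcopenia among low-BMD patients:  {sarcopenia_given_low_bmd:.1%}")
```

Rounded to whole percentages, the two values agree with the 8% and 11% reported in the abstract: most sarcopenic patients have low BMD, while only a small share of low-BMD patients are sarcopenic.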

    Author Correction: High-resolution cryo-EM structure of urease from the pathogen Yersinia enterocolitica

    A Correction to this paper has been published: https://doi.org/10.1038/s41467-020-19845-z