23 research outputs found

    Ultraplex: A rapid, flexible, all-in-one fastq demultiplexer [version 1; peer review: 1 approved]

    BACKGROUND: The first step of virtually all next generation sequencing analysis involves the splitting of the raw sequencing data into separate files using sample-specific barcodes, a process known as “demultiplexing”. However, we found that existing software for this purpose was either too inflexible or too computationally intensive for fast, streamlined processing of raw, single-end FASTQ files containing combinatorial barcodes. RESULTS: Here, we introduce a fast and uniquely flexible demultiplexer, named Ultraplex, which splits a raw FASTQ file containing barcodes either at a single end or at both 5’ and 3’ ends of reads, trims the sequencing adaptors and low-quality bases, and moves unique molecular identifiers (UMIs) into the read header, allowing subsequent removal of PCR duplicates. Ultraplex is able to perform such single or combinatorial demultiplexing on both single- and paired-end sequencing data, and can process an entire Illumina HiSeq lane, consisting of nearly 500 million reads, in less than 20 minutes. CONCLUSIONS: Ultraplex greatly reduces computational burden and pipeline complexity for the demultiplexing of complex sequencing libraries, such as those produced by various CLIP and ribosome profiling protocols, and is also very user friendly, enabling streamlined, robust data processing. Ultraplex is available on PyPI and Conda and via GitHub.

    LaRA 2: parallel and vectorized program for sequence–structure alignment of RNA sequences

    Background: The function of non-coding RNA sequences is largely determined by their spatial conformation, namely the secondary structure of the molecule, formed by Watson–Crick interactions between nucleotides. Hence, modern RNA alignment algorithms routinely take structural information into account. In order to discover yet unknown RNA families and infer their possible functions, the structural alignment of RNAs is an essential task. This task demands a lot of computational resources, especially for aligning many long sequences, and it therefore requires efficient algorithms that utilize modern hardware when available. A subset of the secondary structures contains overlapping interactions (called pseudoknots), which add additional complexity to the problem and are often ignored in available software. Results: We present the SeqAn-based software LaRA 2 that is significantly faster than comparable software for accurate pairwise and multiple alignments of structured RNA sequences. In contrast to other programs, our approach can handle arbitrary pseudoknots. As an improved re-implementation of the LaRA tool for structural alignments, LaRA 2 uses multi-threading and vectorization for parallel execution and a new heuristic for computing a lower boundary of the solution. Our algorithmic improvements yield a program that is up to 130 times faster than the previous version. Conclusions: With LaRA 2 we provide a tool to analyse large sets of RNA secondary structures in relatively short time, based on structural alignment. The produced alignments can be used to derive structural motifs for the search in genomic databases.

    Studying the virome in psychiatric disease

    An overlooked aspect of current microbiome studies is the role of viruses in human health. Compared to bacterial studies, laboratory and analytical methods to study the entirety of viral communities in clinical samples are rudimentary and need further refinement. In order to address this need, we developed Virobiome-Seq, a sequence capture method and an accompanying bioinformatics analysis pipeline, that identifies viral reads in human samples. Virobiome-Seq is able to enrich for and detect multiple types of viruses in human samples, including novel subtypes that diverge at the sequence level. In addition, Virobiome-Seq is able to detect RNA transcripts from DNA viruses and may provide a sensitive method for detecting viral activity in vivo. Since Virobiome-Seq also yields the viral sequence, it makes it possible to investigate associations between viral genotype and psychiatric illness. In this proof of concept study, we detected HIV1, Torque Teno, Pegi, Herpes and Papilloma virus sequences in Peripheral Blood Mononuclear Cells, plasma and stool samples collected from individuals with psychiatric disorders. We also detected the presence of numerous novel circular RNA viruses but were unable to determine whether these viruses originate from the sample or represent contaminants. Despite this challenge, we demonstrate that our knowledge of viral diversity is incomplete and opportunities for novel virus discovery exist. Virobiome-Seq will enable a more sophisticated analysis of the virome and has the potential of uncovering complex interactions between viral activity and psychiatric disease. (c) 2021 Elsevier B.V. All rights reserved. Peer reviewed.

    Long rDNA amplicon sequencing of insect-infecting nephridiophagids reveals their affiliation to the Chytridiomycota and a potential to switch between hosts

    Nephridiophagids are unicellular eukaryotes that parasitize the Malpighian tubules of numerous insects. Their life cycle comprises multinucleate vegetative plasmodia that divide into oligonucleate and uninucleate cells, and sporogonial plasmodia that form uninucleate spores. Nephridiophagids are poor in morphological characteristics, and although they have been tentatively identified as early-branching fungi based on the SSU rRNA gene sequences of three species, their exact position within the fungal tree of life remained unclear. In this study, we describe two new species of nephridiophagids (Nephridiophaga postici and Nephridiophaga javanicae) from cockroaches. Using long-read sequencing of the nearly complete rDNA operon of numerous further species obtained from cockroaches and earwigs to improve the resolution of the phylogenetic analysis, we found a robust affiliation of nephridiophagids with the Chytridiomycota, a group of zoosporic fungi that comprises parasites of diverse host taxa, such as microphytes, plants, and amphibians. The presence of the same nephridiophagid species in two only distantly related cockroaches indicates that their host specificity is not as strict as generally assumed.

    Genome‑wide insights into population structure and host specificity of Campylobacter jejuni

    The zoonotic pathogen Campylobacter jejuni is among the leading causes of foodborne diseases worldwide. While C. jejuni colonises many wild animals and livestock, persistence mechanisms enabling the bacterium to adapt to host species' guts are not fully understood. In order to identify putative determinants influencing host preferences of distinct lineages, bootstrapping based on stratified random sampling combined with a k-mer-based genome-wide association was conducted on 490 genomes from diverse origins in Germany and Canada. We show a strong association of both the core and the accessory genome characteristics with distinct host animal species, indicating multiple adaptive trajectories defining the evolution of C. jejuni lifestyle preferences in different ecosystems. Here, we demonstrate that adaptation towards a specific host niche ecology is most likely a long evolutionary and multifactorial process, expressed by gene absence or presence and allele variations of core genes. Several host-specific allelic variants from different phylogenetic backgrounds, including dnaE, rpoB, ftsX or pycB play important roles for genome maintenance and metabolic pathways. Thus, variants of genes important for C. jejuni to cope with specific ecological niches or hosts may be useful markers for both surveillance and future pathogen intervention strategies. Peer reviewed.

    A MAFG-lncRNA axis links systemic nutrient abundance to hepatic glucose metabolism

    Obesity and type 2 diabetes mellitus are global emergencies and long noncoding RNAs (lncRNAs) are regulatory transcripts with elusive functions in metabolism. Here we show that a high fraction of lncRNAs, but not protein-coding mRNAs, are repressed during diet-induced obesity (DIO) and refeeding, whilst nutrient deprivation induced lncRNAs in mouse liver. Similarly, lncRNAs are lost in diabetic humans. LncRNA promoter analyses, global cistrome and gain-of-function analyses confirm that increased MAFG signaling during DIO curbs lncRNA expression. Silencing Mafg in mouse hepatocytes and obese mice elicits a fasting-like gene expression profile, improves glucose metabolism, de-represses lncRNAs and impairs mammalian target of rapamycin (mTOR) activation. We find that obesity-repressed LincIRS2 is controlled by MAFG and observe that genetic and RNAi-mediated LincIRS2 loss causes elevated blood glucose, insulin resistance and aberrant glucose output in lean mice. Taken together, we identify a MAFG-lncRNA axis controlling hepatic glucose metabolism in health and metabolic disease.

    Phosphorylation of the ribosomal protein RPL12/uL11 affects translation during mitosis

    Emerging evidence indicates that heterogeneity in ribosome composition can give rise to specialized functions. Until now, research mainly focused on differences in core ribosomal proteins and associated factors. The effect of posttranslational modifications has not been studied systematically. Analyzing ribosome heterogeneity is challenging because individual proteins can be part of different subcomplexes (40S, 60S, 80S, and polysomes). Here we develop polysome proteome profiling to obtain unbiased proteomic maps across ribosomal subcomplexes. Our method combines extensive fractionation by sucrose gradient centrifugation with quantitative mass spectrometry. The high resolution of the profiles allows us to assign proteins to specific subcomplexes. Phosphoproteomics on the fractions reveals that phosphorylation of serine 38 in RPL12/uL11, a known mitotic CDK1 substrate, is strongly depleted in polysomes. Follow-up experiments confirm that RPL12/uL11 phosphorylation regulates the translation of specific subsets of mRNAs during mitosis. Together, our results show that posttranslational modification of ribosomal proteins can regulate translation.

    Chemotherapy-induced transposable elements activate MDA5 to enhance haematopoietic regeneration

    Funder: RCUK | Medical Research Council (MRC); doi: https://doi.org/10.13039/501100000265
    Funder: Max-Planck-Gesellschaft (Max Planck Society); doi: https://doi.org/10.13039/501100004189
    Haematopoietic stem cells (HSCs) are normally quiescent, but have evolved mechanisms to respond to stress. Here, we evaluate haematopoietic regeneration induced by chemotherapy. We detect robust chromatin reorganization followed by increased transcription of transposable elements (TEs) during early recovery. TE transcripts bind to and activate the innate immune receptor melanoma differentiation-associated protein 5 (MDA5) that generates an inflammatory response that is necessary for HSCs to exit quiescence. HSCs that lack MDA5 exhibit an impaired inflammatory response after chemotherapy and retain their quiescence, with consequent better long-term repopulation capacity. We show that the overexpression of ERV and LINE superfamily TE copies in wild-type HSCs, but not in Mda5-/- HSCs, results in their cycling. By contrast, after knockdown of LINE1 family copies, HSCs retain their quiescence. Our results show that TE transcripts act as ligands that activate MDA5 during haematopoietic regeneration, thereby enabling HSCs to mount an inflammatory response necessary for their exit from quiescence.