Gapless provides combined scaffolding, gap filling, and assembly correction with long reads

Author: Robinson Mark D
Schmeing Stephan
Publication venue: Life Science Alliance
Publication date: 01/07/2023

Continuity, correctness, and completeness of genome assemblies are important for many biological projects. Long reads represent a major driver towards delivering high-quality genomes, but not everybody can achieve the necessary coverage for good long read-only assemblies. Therefore, improving existing assemblies with low-coverage long reads is a promising alternative. The improvements include correction, scaffolding, and gap filling. However, most tools perform only one of these tasks and the useful information of reads that supported the scaffolding is lost when running separate programs successively. Therefore, we propose a new tool for combined execution of all three tasks using PacBio or Oxford Nanopore reads. gapless is available at: https://github.com/schmeing/gapless

ZORA

Meta-analysis of (single-cell method) benchmarks reveals the need for extensibility and interoperability

Author: Al-Ajami Ahmad
Crowell Helena L
Fanaswala Imran
Gerber Reto
Germain Pierre-Luc
Gilis Jeroen
Heidari Elyas
Knyazev Sergey
Luetge Almut
Mallona Izaskun
Mangul Serghei
Milosavljevic Stefan
Paul Dominique
Robinson Mark D
Saeys Yvan
Schmeing Stephan
Seurinck Ruth
Sonder Emanuel
Soneson Charlotte
Sonrel Anthony
Publication venue: BioMed Central
Publication date: 17/05/2023

Computational methods represent the lifeblood of modern molecular biology. Benchmarking is important for all methods, but with a focus here on computational methods, benchmarking is critical to dissect important steps of analysis pipelines, formally assess performance across common situations as well as edge cases, and ultimately guide users on what tools to use. Benchmarking can also be important for community building and advancing methods in a principled way. We conducted a meta-analysis of recent single-cell benchmarks to summarize the scope, extensibility, and neutrality, as well as technical features and whether best practices in open data and reproducible research were followed. The results highlight that while benchmarks often make code available and are in principle reproducible, they remain difficult to extend, for example, as new methods and new ways to assess methods emerge. In addition, embracing containerization and workflow systems would enhance reusability of intermediate benchmarking results, thus also driving wider adoption

ZORA

Different substrate-dependent transition states in the active site of the ribosome

Author: A Korostelev
A Sievers
A Weixlbaumer
AC Seila
DA Kingery
DM Quinn
DV Freistroffer
EM Youngman
G Wallin
H Jin
JJ Shaw
JL Brunelle
KBJ Schowen
KS Huang
L Mora
M Amort
M Beringer
M Beringer
M Laurberg
MA Rangelov
Marina V. Rodnina
N Polacek
P Bieling
PK Glasoe
S Petry
S Trobro
S Trobro
Stephan Kuhlenkoetter
TM Schmeing
V Dincbas-Renqvist
VI Katunin
Wolfgang Wintermeyer
Publication venue: Springer Science and Business Media LLC

Crossref

ReSeq simulates realistic Illumina high-throughput sequencing data

Author: Robinson Mark D
Schmeing Stephan
Publication venue: Springer Science and Business Media LLC
Publication date: 01/12/2021

In high-throughput sequencing data, performance comparisons between computational tools are essential for making informed decisions at each step of a project. Simulations are a critical part of method comparisons, but for standard Illumina sequencing of genomic DNA, they are often oversimplified, which leads to optimistic results for most tools. ReSeq improves the authenticity of synthetic data by extracting and reproducing key components from real data. Major advancements are the inclusion of systematic errors, a fragment-based coverage model and sampling-matrix estimates based on two-dimensional margins. These improvements lead to more faithful performance evaluations. ReSeq is available at https://github.com/schmeing/ReSeq

ZORA

Resonance Extraction in Diffractive 3$\pi$ Production using 190 GeV/c $\pi^{-}$ at the COMPASS experiment (CERN)

Author: Schmeing Stephan
Publication date: 11/02/2015

CERN Document Server

Divergent evolution of male-determining loci on proto-Y chromosomes of the housefly

Author: Anvar Seyed Yahya
Beukeboom Leo W.
Geuverink Elzemiek
Kıvanç Ece Naz
Li Xuan
Pippel Martin
Schenkel Martijn A.
Schmeing Stephan
Son Jae Hak
Visser Sander
Wu Yanli
Publication date: 01/01/2024

Houseflies provide a good experimental model to study the initial evolutionary stages of a primary sex-determining locus because they possess different recently evolved proto-Y chromosomes that contain male-determining loci (M) with the same male-determining gene, Mdmd. We investigate M-loci genomically and cytogenetically, revealing distinct molecular architectures among M-loci. M on chromosome V (M^V) has two intact Mdmd copies in a palindrome. M on chromosome III (M^III) has tandem duplications containing 88 Mdmd copies (only one intact) and various repeats, including repeats that are XY-prevalent. M on chromosome II (M^II) and the Y (M^Y) share M^III-like architecture, but with fewer repeats. M^Y additionally shares M^V-specific sequence arrangements. Based on these data and karyograms using two probes, one derived from M^III and one Mdmd-specific, we infer evolutionary histories of polymorphic M-loci, which have arisen from unique translocations of Mdmd, embedded in larger DNA fragments, and diverged independently into regions of varying complexity.

GRO.publications (Univ. Göttingen)

Maleness-on-the-Y (MoY) orchestrates male sex determination in major agricultural fruit fly pests

Author: Arunkumar Kallare P
Bourtzis Kostas
Dalíková Martina
Forlenza Federica
Giordano Ennio
Gravina Andrea
Gregoriou Maria-Eleni
Gucciardino Michela Anna
Hall Brantley
Ippolito Domenica
Koskinioti Panagiota
Marec František
Mathiopoulos Kostas D
Meccariello Angela
Monti Simona Maria
Papathanos Philippos Aris
Perrotta Maryanna Martina
Petrella Valeria
Primo Pasquale
Ragoussis Jiannis
Robinson Mark D
Ruggiero Alessia
Saccone Giuseppe
Salvemini Marco
Schmeing Stephan
Scolari Francesca
Tsoumani Konstantina T
Tu Zhijian
Vitagliano Luigi
Windbichler Nikolai
Publication venue: American Association for the Advancement of Science (AAAS)
Publication date: 27/09/2019

In insects, rapidly evolving primary sex-determining signals are transduced by a conserved regulatory module controlling sexual differentiation. In the agricultural pest Ceratitis capitata (Mediterranean fruit fly, or Medfly), we identified a Y-linked gene, Maleness-on-the-Y (MoY), encoding a small protein that is necessary and sufficient for male development. Silencing or disruption of MoY in XY embryos causes feminization, whereas overexpression of MoY in XX embryos induces masculinization. Crosses between transformed XY females and XX males give rise to males and females, indicating that a Y chromosome can be transmitted by XY females. MoY is Y-linked and functionally conserved in other species of the Tephritidae family, highlighting its potential to serve as a tool for developing more effective control strategies against these major agricultural insect pests

ZORA

Raw and processed (filtered and annotated) scRNAseq data

Author: Anne B. Krug (11705854)
Bhavesh Soni (18325827)
Giovanna Fiore (18325824)
Kerstin Paetzold (9741842)
Llucia Albertí Servera (16610531)
Meher Majety (23784)
Monika Julia Wolf (18325830)
Sabine Hoves (14957760)
Sina Nassiri (14869060)
Steffen Dettling (15005742)
Stephan Schmeing (10171458)
Wolfgang Weckwarth (18325826)
Publication date: 08/04/2024

Single-cell RNA-seq data generated and reported as part of the manuscript entitled "Human CD34+-derived plasmacytoid dendritic cells as surrogates for primary pDCs and potential cancer immunotherapy" by Fiore et al.

Raw and processed (filtered and annotated) data are provided, which can be directly ingested to reproduce the findings of the paper or for ab initio data reuse:

1- raw.h5ad provides the concatenated raw/unfiltered table of counts as obtained from Cell Ranger, along with relevant metadata in the standard H5AD format.
2- processed.h5ad provides raw and normalized counts for those cells that passed QC and were annotated as pDC, along with relevant metadata in the standard H5AD format.

For instance, to load data in R, try:
library(zellkonverter)
raw processed

scRNAseq data generation:
Differentiated CB-DCs (3 independent donors), either left unprimed or primed with IFN, were used to enable characterization of the heterogeneity of the in vitro differentiation protocol. For comparison, primary pan-DCs (3 independent donors) were isolated from PBMCs as described above. CB-DCs and primary pan-DCs were normalized to 10,000 pDCs per well and stimulated with TLR9 or TLR7 agonists for 4 hrs or left untreated. A total of 27 samples were included for scRNAseq. Single-cell RNA-seq was performed using Chromium Connect (10x Genomics). Next GEM Automated Single Cell 5' Reagent Kits v2 (PN-1000290, 10x Genomics, Pleasanton, CA, USA) were used following the manufacturer's protocol. Roughly 8,000–10,000 cells per sample were diluted to a density of 100–800 cells/μL in PBS plus 1% BSA, as determined by a Cellometer Auto 2000 Cell Viability Counter (Nexcelom Bioscience, Lawrence, MA), and were loaded onto the chip.
The quality and concentration of both cDNA and libraries were assessed using an Agilent BioAnalyzer with High Sensitivity kit (#5067–4626, Agilent, Santa Clara, CA, USA) and a Qubit Fluorometer with dsDNA HS assay kit (#Q33230, Thermo Fisher Scientific, Waltham, MA) according to the manufacturer's recommendation. For sequencing, samples were mixed in equimolar fashion and sequenced on an Illumina NovaSeq 6000 with a targeted read depth of 20,000 reads/cell; sequencing parameters were set for Read 1 (26 cycles), i7 Index (10 cycles), i5 Index (10 cycles) and Read 2 (90 cycles). The Cell Ranger mkfastq function was used to convert the output files into FASTQ files.

scRNAseq data analysis:
For data processing and quality control, raw sequencing reads were mapped to the GRCh38 genome using the Cell Ranger Single Cell software (10x Genomics). Raw gene expression matrices generated per sample were merged and analyzed with the besca package. First, low-quality cells and potential multiplets were excluded (minimum 600 genes and 1,000 counts, maximum 6,500 genes and 60,000 counts), resulting in 4,000 to 8,000 cells per sample and a total of 183,398 cells passing quality control for downstream analysis. Counts in filtered cells were normalized to log-transformed UMI counts per 10,000 reads [log(CP10K+1)]. After scaling the gene expression, the most variable genes per sample were calculated (minimum mean expression of 0.0125, maximum mean expression of 3 and minimum dispersion of 0.5), and those shared by at least 50% of the samples, in total 2,208 genes, were used for principal component analysis. Finally, the first 50 PCs were used as input for calculating the 10 nearest neighbors, and the neighborhood graph was then embedded into two-dimensional space using the uniform manifold approximation and projection (UMAP) algorithm. Cell clustering was performed using the Leiden algorithm. Cell type annotation was performed using the Sig-annot semi-automated besca module.
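The filtering, normalization and gene-selection rules quoted above can be illustrated with a minimal pure-Python sketch. This is not the besca implementation: the function names, the dictionary-based cell representation, the inclusive threshold bounds and the use of the natural logarithm are all assumptions made for illustration; only the numeric thresholds and the 50% sharing rule come from the description.

```python
import math
from collections import Counter

# QC window quoted in the description; inclusive bounds are an assumption.
MIN_GENES, MAX_GENES = 600, 6_500
MIN_COUNTS, MAX_COUNTS = 1_000, 60_000


def passes_qc(counts):
    """Return True if a cell (gene -> UMI count) lies inside the QC window."""
    n_genes = sum(1 for c in counts.values() if c > 0)  # genes detected
    n_counts = sum(counts.values())                     # total UMIs
    return (MIN_GENES <= n_genes <= MAX_GENES
            and MIN_COUNTS <= n_counts <= MAX_COUNTS)


def log_cp10k(counts):
    """Scale a cell to 10,000 counts, then apply log(x + 1) (natural log assumed)."""
    total = sum(counts.values())  # non-zero for any cell that passed QC
    return {gene: math.log1p(c / total * 10_000) for gene, c in counts.items()}


def shared_variable_genes(hvgs_per_sample, min_fraction=0.5):
    """Keep genes flagged as highly variable in at least min_fraction of samples."""
    n_samples = len(hvgs_per_sample)
    tally = Counter(g for genes in hvgs_per_sample for g in set(genes))
    return sorted(g for g, k in tally.items() if k / n_samples >= min_fraction)
```

In this sketch a cell is kept only if both the number of detected genes and the total count fall inside the window, and a gene enters the PCA input only if it was selected as variable in at least half of the samples.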
The gene sets used for different cell types can be found under: https://github.com/bedapub/besca/blob/main/besca/datasets/genesets/CellNames_scseqCMs6_sigs.gmt

First, each cluster was assigned to a cell type at different levels of granularity. Subsequently, annotations were manually inspected to resolve cluster mixtures, especially for different DC types. Cell type annotations were further curated by selecting a cluster and applying heuristic cutoffs on a combination of signature scores to reannotate individual cells. The per-cell signature scores were calculated with the scanpy function scanpy.tl.score_genes, using default parameters and besca signatures. Cells annotated as doublets were excluded from downstream analyses. In order to generate visualizations, such as the expression level of selected genes across conditions, custom scripts with mainly besca and scanpy functions were used.

For more details, please refer to the publication.
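The cutoff-based reannotation step can be sketched as follows. This is a simplified stand-in, not the authors' scripts: the score here is the mean expression of the signature genes minus the mean of a caller-supplied reference set, whereas scanpy.tl.score_genes samples its reference genes at random from expression bins. The function names, gene names, labels and cutoff values are hypothetical.

```python
from statistics import mean


def signature_score(expr, signature, reference):
    """Mean expression of signature genes minus mean of reference genes.

    expr maps gene -> normalized expression; genes absent from expr count as 0.
    Both gene lists are assumed to be non-empty.
    """
    sig = [expr.get(g, 0.0) for g in signature]
    ref = [expr.get(g, 0.0) for g in reference]
    return mean(sig) - mean(ref)


def reannotate(expr, current_label, signatures, reference, cutoffs):
    """Relabel a cell with the signature that clears its cutoff by the widest margin.

    If no signature score exceeds its cutoff, the current label is kept.
    """
    best_label, best_margin = current_label, 0.0
    for label, genes in signatures.items():
        margin = signature_score(expr, genes, reference) - cutoffs[label]
        if margin > best_margin:
            best_label, best_margin = label, margin
    return best_label
```

For a cell with `expr = {"GENE_A": 3.0, "GENE_B": 2.0, "REF1": 1.0}`, calling `reannotate(expr, "unknown", {"pDC": ["GENE_A", "GENE_B"]}, ["REF1"], {"pDC": 1.0})` relabels the cell only because its score clears the hypothetical cutoff; raising the cutoff leaves the original label in place.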

FigShare