Expanding the ancient DNA bioinformatics toolbox, and its applications to archeological microbiomes
The 1980s were very prolific years not only for music, but also for molecular biology and genetics, with the first publications on the microbiome and ancient DNA. Several technical revolutions later, the field of ancient metagenomics is now progressing full steam ahead at an unprecedented pace. While generating sequencing data becomes cheaper every year, the bioinformatics methods and the compute power needed to analyze these data are struggling to keep up. In this thesis, I propose new methods to narrow the sequencing-to-analysis gap by introducing scalable, parallelized software for ancient DNA metagenomics analysis. In manuscript A, I first introduce a method for estimating the mixture proportions of different sources in a sequencing sample, a problem known as source tracking. I then apply this method to predict the original sources of paleofeces in manuscript B. In manuscript C, I propose a new method to scale lowest common ancestor (LCA) calling from sequence alignment files, which addresses the computational intractability of fitting ever-growing metagenomic reference database indices in memory. In manuscript D, I present a method to estimate ancient DNA deamination damage statistically and in parallel, and test it in the context of de novo assembly. Finally, in manuscript E, I apply some of the methods developed in this thesis to the analysis of ancient wine fermentation samples, and present the first ancient genomes of fermentation bacteria. Taken together, the tools developed in this thesis will help researchers working in ancient DNA metagenomics scale their analyses to the massive amounts of sequencing data routinely produced nowadays.
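To make "lowest common ancestor calling" concrete, here is a minimal sketch of the per-read idea. It assumes the taxonomy is already available as a child-to-parent map (a small, simplified excerpt is hard-coded here) and that the taxa a read aligns to are already known. In practice, those hits would come from alignment files. The names `PARENT`, `lineage` and `lca` are hypothetical. This is not the implementation from manuscript C.

```python
from typing import Dict, Iterable, List

# Simplified taxonomy excerpt (assumed for illustration): child -> parent; the root points to itself.
PARENT: Dict[str, str] = {
    "root": "root",
    "Bacteria": "root",
    "Pseudomonadota": "Bacteria",
    "Gammaproteobacteria": "Pseudomonadota",
    "Enterobacterales": "Gammaproteobacteria",
    "Enterobacteriaceae": "Enterobacterales",
    "Escherichia": "Enterobacteriaceae",
    "Escherichia coli": "Escherichia",
    "Shigella": "Enterobacteriaceae",
    "Shigella flexneri": "Shigella",
}


def lineage(taxon: str, parent: Dict[str, str]) -> List[str]:
    """Return the path from the root down to `taxon`."""
    path = [taxon]
    while parent[taxon] != taxon:
        taxon = parent[taxon]
        path.append(taxon)
    return path[::-1]


def lca(hits: Iterable[str], parent: Dict[str, str]) -> str:
    """Deepest taxon shared by the lineages of all reference hits of one read."""
    lineages = [lineage(t, parent) for t in hits]
    if not lineages:
        raise ValueError("read has no hits")
    result = lineages[0][0]  # the root is common to every lineage
    # Walk all lineages in lockstep from the root; stop at the first disagreement.
    for nodes in zip(*lineages):
        if all(n == nodes[0] for n in nodes):
            result = nodes[0]
        else:
            break
    return result
```

For example, a read that aligns equally well to *Escherichia coli* and *Shigella flexneri* is assigned to Enterobacteriaceae. Each read's LCA is independent of every other read's, so reads can be processed in parallel chunks. That is one generic way to scale this step, not necessarily the strategy used in the thesis.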
Community-curated and standardised metadata of published ancient metagenomic samples with AncientMetagenomeDir
Ancient DNA and RNA are valuable data sources for a wide range of disciplines. Within the field of ancient metagenomics, the number of published genetic datasets has risen dramatically in recent years, and tracking this data for reuse is particularly important for large-scale ecological and evolutionary studies of individual taxa and communities of both microbes and eukaryotes. AncientMetagenomeDir (archived at https://doi.org/10.5281/zenodo.3980833) is a collection of annotated metagenomic sample lists derived from published studies that provide basic, standardised metadata and accession numbers to allow rapid data retrieval from online repositories. These tables are community-curated and span multiple sub-disciplines to ensure adequate breadth and consensus in metadata definitions, as well as longevity of the database. Internal guidelines and automated checks facilitate compatibility with established sequence-read archives and term-ontologies, and ensure consistency and interoperability for future meta-analyses. This collection will also assist in standardising metadata reporting for future ancient metagenomic studies.
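The automated checks mentioned above can be pictured with a minimal sketch. It assumes a tab-separated sample table with hypothetical column names (`sample_name`, `latitude`, `longitude`, `archive`, `archive_accession`) and a hypothetical set of allowed archive names. It is not AncientMetagenomeDir's own validation code. It only shows the kind of row-level consistency rules such checks enforce.

```python
import csv
import io
from typing import List

# Hypothetical required columns and allowed archive names (assumptions, not the real schema).
REQUIRED = ["sample_name", "latitude", "longitude", "archive", "archive_accession"]
ARCHIVES = {"ENA", "SRA"}


def validate_tsv(text: str) -> List[str]:
    """Return human-readable errors for a tab-separated sample table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]
    errors: List[str] = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # Coordinates must be numeric and within valid geographic ranges.
        try:
            lat, lon = float(row["latitude"]), float(row["longitude"])
        except (TypeError, ValueError):
            errors.append(f"line {line_no}: non-numeric coordinates")
        else:
            if not (-90 <= lat <= 90 and -180 <= lon <= 180):
                errors.append(f"line {line_no}: coordinates out of range")
        # The archive must come from a controlled vocabulary.
        if row["archive"] not in ARCHIVES:
            errors.append(f"line {line_no}: unknown archive {row['archive']!r}")
        # Every sample needs an accession so the data can be retrieved.
        if not (row["archive_accession"] or "").strip():
            errors.append(f"line {line_no}: empty accession")
    return errors
```

Returning a list of messages, rather than stopping at the first problem, lets a contributor fix every issue in a table in one pass.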
Reconstruction of ancient microbial genomes from the human gut
Loss of gut microbial diversity in industrial populations is associated with chronic diseases, underscoring the importance of studying our ancestral gut microbiome. However, relatively little is known about the composition of pre-industrial gut microbiomes. Here we performed a large-scale de novo assembly of microbial genomes from palaeofaeces. From eight authenticated human palaeofaeces samples (1,000–2,000 years old) with well-preserved DNA from southwestern USA and Mexico, we reconstructed 498 medium- and high-quality microbial genomes. Among the 181 genomes with the strongest evidence of being ancient and of human gut origin, 39% represent previously undescribed species-level genome bins. Tip dating suggests an approximate diversification timeline for the key human symbiont Methanobrevibacter smithii. In comparison to 789 present-day human gut microbiome samples from eight countries, the palaeofaeces samples are more similar to non-industrialized than industrialized human gut microbiomes. Functional profiling of the palaeofaeces samples reveals a markedly lower abundance of antibiotic-resistance and mucin-degrading genes, as well as enrichment of mobile genetic elements relative to industrial gut microbiomes. This study facilitates the discovery and characterization of previously undescribed gut microorganisms from ancient microbiomes and the investigation of the evolutionary history of the human gut microbiota through genome reconstruction from palaeofaeces.
Reproducible, portable, and efficient ancient genome reconstruction with nf-core/eager
The broadening utilisation of ancient DNA to address archaeological, palaeontological, and biological questions is resulting in a rising diversity in the size of laboratories and the scale of analyses being performed. In the context of this heterogeneous landscape, we present an entirely redesigned and extended version of the EAGER pipeline for the analysis of ancient genomic data. This Nextflow pipeline aims to address three main themes: accessibility and adaptability to different computing configurations, reproducibility to ensure robust analytical standards, and updating the pipeline to the latest routine ancient genomic practices. The new version of EAGER has been developed within the nf-core initiative to ensure high-quality software development and maintenance support, contributing to a long-term life cycle for the pipeline. nf-core/eager will help ensure that a wider range of ancient DNA analyses can be applied by a diverse range of research groups and fields.
nf-core/taxprofiler: v1.0.0 - Dodgy Dachshund [2023-03-13]
Added
<ul>
<li>Add read quality control (sequencing QC, adapter removal and merging)</li>
<li>Add read complexity filtering</li>
<li>Add host-reads removal step</li>
<li>Add run merging</li>
<li>Add taxonomic classification</li>
<li>Add taxon table standardisation</li>
<li>Add post-classification visualisation</li>
</ul>
<p>Contributed by: @jfy133 @sofstam @Midnighter @ljmesi @MillironX @jianhong @mjamy @rafalstepien @maxibor @talnor</p>
nf-core/taxprofiler: v1.0.1 - Dodgy Dachshund Patch [2023-05-15]
<h3><code>Fixed</code></h3>
<ul>
<li><a href="https://github.com/nf-core/taxprofiler/pull/291">#291</a> - Fix Taxpasta not receiving taxonomy directory (❤️ to SannaAb for reporting, fix by @jfy133)</li>
</ul>
nf-core/eager: 2.5.0 - Bopfingen [2023-11-07]
<h3><code>Added</code></h3>
<ul>
<li><a href="https://github.com/nf-core/eager/issues/1020">#1020</a> Added mapDamage2 as an alternative for damage calculation.</li>
</ul>
<h3><code>Fixed</code></h3>
<ul>
<li><a href="https://github.com/nf-core/eager/issues/1017">#1017</a> Fixed file name collision in niche cases with multiple libraries of multiple UDG treatments.</li>
<li><a href="https://github.com/nf-core/eager/issues/1024">#1024</a> <code>multiqc_general_stats.txt</code> is now generated even if the table is a beeswarm plot in the report.</li>
<li><a href="https://github.com/nf-core/eager/issues/655">#655</a> Updated RG tags for all mappers. RG-id now includes Sample as well as Library ID. Added <code>LB:</code> tag with the library ID.</li>
<li><a href="https://github.com/nf-core/eager/issues/1031">#1031</a> Always index fasta regardless of mapper. This ensures that DamageProfiler and genotyping processes get submitted when using bowtie2 and not providing a fasta index.</li>
</ul>
<h3><code>Dependencies</code></h3>
<ul>
<li><code>multiqc</code>: 1.14 -> 1.16</li>
</ul>
<h3><code>Deprecated</code></h3>
nf-core/taxprofiler: v1.1.0 - Augmented Akita [2023-09-19]
Added
#298 New classifier ganon (added by @jfy133)
#312 New classifier KMCP (added by @sofstam)
#318 New classifier MetaPhlAn4 (MetaPhlAn3 support remains) (added by @LilyAnderssonLee)
#276 Implemented batching in the KrakenUniq samples processing (added by @Midnighter)
#272 Add saving of final 'analysis-ready-reads' to a dedicated directory (❤️ to @alexhbnr for request, added by @jfy133)
#303 Add support for taxpasta profile standardisation in single sample pipeline runs (❤️ to @artur-matysik for request, added by @jfy133)
#308 Add citations and bibliographic information to the MultiQC methods text of tools used in a given pipeline run (added by @jfy133)
#315 Updated to nf-core pipeline template v2.9 (added by @sofstam & @jfy133)
#319 Added support for virus hit expansion in Kaiju (❤️ to @dnlrxn for requesting, added by @jfy133)
#323 Add ability to skip sequencing quality control tools (❤️ to @vinisalazar for requesting, added by @jfy133)
#345 Add simple tutorial to explain how to get up and running with an nf-core/taxprofiler run (added by @jfy133)
#355 Add support for TAXPASTA's --add-rank-lineage to output (❤️ to @MajoroMask for request, added by @Midnighter, @sofstam, @jfy133)
#368 Add the ability to ignore profile errors caused by empty profiles and other validation errors when merging multiple profiles using TAXPASTA (added by @Midnighter and @LilyAnderssonLee)
Fixed
#271 Improved standardised table generation documentation for mOTUs manual database download tutorial (♥ to @prototaxites for reporting, fix by @jfy133)
#269 Reduced output files in AWS full test output due to very large files (fix by @jfy133)
#270 Fixed warning for host removal index parameter, and improved index checks (♥ to @prototaxites for reporting, fix by @jfy133)
#274 Substituted the samtools/bam2fq module with samtools/fastq module (fix by @sofstam)
#275 Replaced the function used for error reporting with a more Nextflow-friendly method (fix by @jfy133)
#285 Fixed overly large log files in Kraken2 output (♥ to @prototaxites for reporting, fix by @Midnighter & @jfy133)
#286 Runtime optimisation of MultiQC step via improved log file processing (fix by @Midnighter & @jfy133)
#289 Pipeline updated to nf-core template 2.8 (fix by @Midnighter & @jfy133)
#290 Minor database input documentation improvements (♥ to @alneberg for reporting, fix by @jfy133)
#305 Fix docker/podman registry definition for tower compatibility (fix by @adamrtalbot, @jfy133)
#304 Correct mistake in kaiju2table documentation, only single rank can be supplied (♥ to @artur-matysik for reporting, fix by @jfy133)
#307 Fix databases being sometimes associated with the wrong tool (e.g. Kaiju) (fix by @jfy133, @Midnighter and @LilyAnderssonLee)
#313 Fix pipeline not providing error when database sheet does not have a header (♥ to @noah472 for reporting, fix by @jfy133)
#330 Added better tagging to distinguish Kraken2 steps run for Kraken2 profiling from those run for Bracken (♥ to @MajoroMask for requesting, added by @jfy133)
#334 Increase the memory of the FALCO process to 4GB (fix by @LilyAnderssonLee)
#332 Improved meta map stability for more robust pipeline resuming (fix by @jfy133)
#338 Fixed the wrong 'out' file going to the Centrifuge kreport module (♥ to @LilyAnderssonLee for reporting, fix by @jfy133)
#342 Fixed docs/usage to correctly list the required database files for Bracken and tips to obtain Kraken2 databases (fix by @husensofteng)
#350 Reorganize the CI tests into separate profiles in preparation for implementation of nf-test (fix by @LilyAnderssonLee)
#364 Add autoMounts to apptainer profile in nextflow.config (♥ to @hkaspersen for reporting, fix by @LilyAnderssonLee)
#372 Update modules to use quay.io nf-core mirrored containers (♥ to @maxulysse for pointing out, fix by @LilyAnderssonLee and @jfy133)
Dependencies
| Tool | Previous version | New version |
| --------- | ---------------- | ----------- |
| MultiQC | 1.13 | 1.15 |
| TAXPASTA | 0.2.3 | 0.6.0 |
| MetaPhlAn | 3.0.12 | 4.0.6 |
| fastp | 0.23.2 | 0.23.4 |
| samtools | 1.16.1 | 1.17 |
Deprecated
#338 Updated the Centrifuge module to not generate (undocumented) SAM alignments by default if --save_centrifuge_reads is supplied, due to a Centrifuge bug that modifies the profile header. SAM alignments can still be generated if --out-fmt is supplied in database.csv (♥ to @LilyAnderssonLee for reporting, fix by @jfy133)
nf-core/taxprofiler: v1.1.2 - Augmented Akita Patch [2023-10-27]
<h3><code>Added</code></h3>
<ul>
<li><a href="https://github.com/nf-core/taxprofiler/pull/408">#408</a> Added preprint citation information to README and manifest (added by @jfy133)</li>
</ul>
<h3><code>Fixed</code></h3>
<ul>
<li><a href="https://github.com/nf-core/taxprofiler/pull/405">#405</a> Fix database to tool mismatching in KAIJU2KRONA input (❤️ to @MajoroMask for reporting, fix by @jfy133)</li>
<li><a href="https://github.com/nf-core/taxprofiler/pull/406">#406</a> Fix overwriting of bracken-derived kraken2 outputs when the database name is shared between Bracken/Kraken2. (❤️ to @MajoroMask for reporting, fix by @jfy133)</li>
<li><a href="https://github.com/nf-core/taxprofiler/pull/409">#409</a> Fix a NullPointerException error occasionally occurring in an older version of MEGAN's rma2info (❤️ to @MajoroMask for reporting, fix by @jfy133)</li>
</ul>
<h3><code>Dependencies</code></h3>
<p>| Tool | Previous version | New version |
| -------------- | ---------------- | ----------- |
| megan/rma2info | 6.21.7 | 6.24.20 |</p>
nf-core/taxprofiler: v1.1.1 - Augmented Akita Patch [2023-10-11]
Added
#379 Added support for previously missing Bracken-corrected Kraken2 report as output (added by @hkaspersen & @jfy133)
#380 Updated to nf-core pipeline template v2.10 (added by @LilyAnderssonLee & @sofstam)
#393 Add validation check for a taxpasta taxonomy directory if --taxpasta_add_* parameters are requested (♥️ to @alimalrashed for reporting, added by @jfy133)
Fixed
#383 Update the module of KrakenUniq to the latest to account for edge case bugs where FASTQ input was mis-detected as wrong format (❤️ to @asafpr for reporting and solution, fixed by @LilyAnderssonLee)
#392 Update the module of Taxpasta to support adding taxa information to results (❤️ to @SannaAb for reporting, fixed by @Midnighter)
Dependencies
| Tool | Previous version | New version |
| ---------- | ---------------- | ----------- |
| KrakenUniq | 1.0.2 | 1.0.4 |
| taxpasta | 0.6.0 | 0.6.1 |
Deprecated