119 research outputs found

    Halvade: scalable sequence analysis with MapReduce

    Motivation: Post-sequencing DNA analysis typically consists of read mapping followed by variant calling. Especially for whole genome sequencing, this computational step is very time-consuming, even when using multithreading on a multi-core machine. Results: We present Halvade, a framework that enables sequencing pipelines to be executed in parallel on a multi-node and/or multi-core compute infrastructure in a highly efficient manner. As an example, a DNA sequencing analysis pipeline for variant calling has been implemented according to the GATK Best Practices recommendations, supporting both whole genome and whole exome sequencing. Using a 15-node computer cluster with 360 CPU cores in total, Halvade processes the NA12878 dataset (human, 100 bp paired-end reads, 50x coverage) in <3 h with very high parallel efficiency. Even on a single, multi-core machine, Halvade attains a significant speedup compared with running the individual tools with multithreading. Availability and implementation: Halvade is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of distributions of Hadoop, including Cloudera and Amazon EMR

    elPrep 4 : a multithreaded framework for sequence analysis

    We present elPrep 4, a reimplementation from scratch of the elPrep framework for processing sequence alignment map files in the Go programming language. elPrep 4 includes multiple new features allowing us to process all of the preparation steps defined by the GATK Best Practice pipelines for variant calling. This includes new and improved functionality for sorting, (optical) duplicate marking, base quality score recalibration, BED and VCF parsing, and various filtering options. The implementations of these options in elPrep 4 faithfully reproduce the outcomes of their counterparts in GATK 4, SAMtools, and Picard, even though the underlying algorithms are redesigned to take advantage of elPrep's parallel execution framework to vastly improve the runtime and resource use compared to these tools. Our benchmarks show that elPrep executes the preparation steps of the GATK Best Practices up to 13x faster on WES data, and up to 7.4x faster for WGS data compared to running the same pipeline with GATK 4, while utilizing fewer compute resources

    Multithreaded variant calling in elPrep 5

    We present elPrep 5, which updates the elPrep framework for processing sequencing alignment/map files with variant calling. elPrep 5 can now execute the full pipeline described by the GATK Best Practices for variant calling, which consists of PCR and optical duplicate marking, sorting by coordinate order, base quality score recalibration, and variant calling using the haplotype caller algorithm. elPrep 5 produces BAM and VCF output identical to that of GATK4 while significantly reducing the runtime by parallelizing and merging the execution of the pipeline steps. Our benchmarks show that elPrep 5 speeds up the variant calling pipeline by a factor of 8-16x on both whole-exome and whole-genome data while using the same hardware resources as GATK4. This makes elPrep 5 a suitable drop-in replacement for GATK4 when faster execution times are needed

    Assessing the impact of beach nourishment on the intertidal food web through the development of a mechanistic-envelope model

    1. Beach nourishment, the placement of sand onto a sediment-starved stretch of coast, is widely applied as a soft coastal protection measure because of its reduced ecological impact relative to hard coastal protection. In order to predict effects on the intertidal sandy beach ecosystem, we developed a simulation model that integrates species envelope-based projections for the dominant macrobenthos species and mechanistic food web modules for higher trophic levels. 2. Species envelopes were estimated by using Bayesian inference of species’ biomass relationships according to the three determining abiotic variables: intertidal elevation, median grain size and total organic matter, obtained from multiple sampling campaigns along the Belgian coast. Maximum potential abundance of higher trophic levels represented by birds, shrimp and flatfish were estimated based on their derived trophic relationship with macrobenthos. 3. After validation, we demonstrated that unlike nourishment slope, sediment grain size strongly determines beach-level species richness and production, with strong deterioration in species richness after nourishment with coarse sediment (>300 μm). Patterns for higher trophic levels do not follow the changes in macrobenthos abundance and biomass. 4. Synthesis and applications. The optimal grain size range for nourishment of fine-grained beaches is 200–300 μm. This modelling approach shows that the impact assessment of beach nourishment needs to include the evaluation of different species richness and biomass variables. Focusing solely on the potential abundance of species from higher trophic levels might lead to deceptive conclusions due to the dominance of opportunistic prey species

    elPrep: high-performance preparation of sequence alignment/map files for variant calling

    elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, reordering contigs, and so on, while producing identical results. What sets elPrep apart is its software architecture that allows executing preparation pipelines by making only a single pass through the data, no matter how many preparation steps are used in the pipeline. elPrep is designed as a multithreaded application that runs entirely in memory, avoids repeated file I/O, and merges the computation of several preparation steps to significantly speed up the execution time. For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1:40 hours, when using a combination of SAMtools and Picard, to about 15 minutes when using elPrep, while utilising the same server resources, here 48 threads and 23 GB of RAM. For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. As a typical clinical study may contain sequencing data for hundreds of patients, elPrep can remove several hundreds of hours of computing time, and thus substantially reduce analysis time and cost

    Prenatal Air Pollution and Newborns' Predisposition to Accelerated Biological Aging.

    Importance: Telomere length is a marker of biological aging that may provide a cellular memory of exposures to oxidative stress and inflammation. Telomere length at birth has been related to life expectancy. An association between prenatal air pollution exposure and telomere length at birth could provide new insights in the environmental influence on molecular longevity. Objective: To assess the association of prenatal exposure to particulate matter (PM) with newborn telomere length as reflected by cord blood and placental telomere length. Design, Setting, and Participants: In a prospective birth cohort (ENVIRONAGE [Environmental Influence on Ageing in Early Life]), a total of 730 mother-newborn pairs were recruited in Flanders, Belgium between February 2010 and December 2014, all with a singleton full-term birth (≥37 weeks of gestation). For statistical analysis, participants with full data on both cord blood and placental telomere lengths were included, resulting in a final study sample size of 641. Exposures: Maternal residential PM2.5 (particles with an aerodynamic diameter ≤2.5 μm) exposure during pregnancy. Main Outcomes and Measures: In the newborns, cord blood and placental tissue relative telomere length were measured. Maternal residential PM2.5 exposure during pregnancy was estimated using a high-resolution spatial-temporal interpolation method. In distributed lag models, both cord blood and placental telomere length were associated with average weekly exposures to PM2.5 during pregnancy, allowing the identification of critical sensitive exposure windows. Results: In 641 newborns, cord blood and placental telomere length were significantly and inversely associated with PM2.5 exposure during midgestation (weeks 12-25 for cord blood and weeks 15-27 for placenta). A 5-μg/m³ increment in PM2.5 exposure during the entire pregnancy was associated with 8.8% (95% CI, -14.1% to -3.1%) shorter cord blood leukocyte telomeres and 13.2% (95% CI, -19.3% to -6.7%) shorter placental telomere length. These associations were controlled for date of delivery, gestational age, maternal body mass index, maternal age, paternal age, newborn sex, newborn ethnicity, season of delivery, parity, maternal smoking status, maternal educational level, pregnancy complications, and ambient temperature. Conclusions and Relevance: Mothers who were exposed to higher levels of PM2.5 gave birth to newborns with shorter telomere length. The observed telomere loss in newborns by prenatal air pollution exposure indicates less buffer for postnatal influences of factors decreasing telomere length during life. Therefore, improvements in air quality may promote molecular longevity from birth onward

    Le projet CRUMBEL et l'apport de la recherche l'archéométrique (The CRUMBEL project and the contribution of archaeometric research)

    The CRUMBEL project aims to investigate the mobility of past populations in Belgium from the Neolithic period until the Early Middle Ages. To reach these research goals, several topics will be studied. In a preliminary phase, the ancient collections of cremated bone will be documented. A selection of these funerary sites will then be studied to understand mobility using archaeometric approaches such as stable isotope analysis and radiocarbon dating, in order to obtain reliable information on earlier mobility in Belgium

    Recommendations for diagnosing and managing individuals with glutaric aciduria type 1: Third revision

    Glutaric aciduria type 1 is a rare inherited neurometabolic disorder of lysine metabolism caused by pathogenic gene variations in GCDH (cytogenetic location: 19p13.13), resulting in deficiency of mitochondrial glutaryl-CoA dehydrogenase (GCDH) and, consequently, accumulation of glutaric acid, 3-hydroxyglutaric acid, glutaconic acid and glutarylcarnitine detectable by gas chromatography/mass spectrometry (organic acids) and tandem mass spectrometry (acylcarnitines). Depending on residual GCDH activity, biochemical high and low excreting phenotypes have been defined. Most untreated individuals present with acute onset of striatal damage before age 3 (to 6) years, precipitated by infectious diseases, fever or surgery, resulting in irreversible, mostly dystonic movement disorder with limited life expectancy. In some patients, striatal damage develops insidiously. In recent years, the clinical phenotype has been extended by the finding of extrastriatal abnormalities and cognitive dysfunction, preferably in the high excreter group, as well as chronic kidney failure. Newborn screening is the prerequisite for pre-symptomatic start of metabolic treatment with low lysine diet, carnitine supplementation and intensified emergency treatment during catabolic episodes, which, in combination, have substantially improved neurologic outcome. In contrast, start of treatment after onset of symptoms cannot reverse existing motor dysfunction caused by striatal damage. Dietary treatment can be relaxed after the vulnerable period for striatal damage, that is, age 6 years. However, impact of dietary relaxation on long-term outcomes is still unclear. This third revision of evidence-based recommendations aims to re-evaluate previous recommendations (Boy et al., J Inherit Metab Dis, 2017;40(1):75-101; Kolker et al., J Inherit Metab Dis 2011;34(3):677-694; Kolker et al., J Inherit Metab Dis, 2007;30(1):5-22) and to implement new research findings on the evolving phenotypic diversity as well as the impact of non-interventional variables and treatment quality on clinical outcomes

    Seniority-based entitlements : extent, policy debates and research

    This publication is based on the contributions of each of the national members that make up the Network of Eurofound Correspondents. For Spain, the contribution was made by Oscar Molina. Seniority systems - schemes that allot improving employment rights or benefits to employees as their length of employment increases - have not been widely studied. This report provides the first comprehensive study comparing the design and spread of seniority-based entitlements (SBEs) in Europe and mapping related policy debates. It is primarily based on contributions from the Network of Eurofound Correspondents, covering the 28 EU Member States and Norway, but also presents aggregate seniority-earnings curves for the EU based on data from the Structure of Earnings Survey. The aim of the report is to take stock of the currently existing different types of SBEs in the private and public sectors. It concludes that despite an obvious trend to remove them from regulations or reform them, a substantial amount of such entitlements is here to stay. Paradoxically, countries which have regulations on seniority pay in place tend to have flatter aggregate seniority-earnings curves than countries without such regulations