Experiences with workflows for automating data-intensive bioinformatics
High-throughput technologies, such as next-generation sequencing, have turned molecular biology into a
data-intensive discipline, requiring bioinformaticians to use high-performance computing resources and carry out
data management and analysis tasks on a large scale. Workflow systems can be useful to simplify construction of
analysis pipelines that automate tasks, support reproducibility and provide measures for fault-tolerance. However,
workflow systems can incur significant development and administration overhead so bioinformatics pipelines are
often still built without them. We present the experiences with workflows and workflow systems within the
bioinformatics community participating in a series of hackathons and workshops of the EU COST action SeqAhead.
The organizations are working on similar problems, but we have addressed them with different strategies and
solutions. This fragmentation of efforts is inefficient and leads to redundant and incompatible solutions. Based on our
experiences we define a set of recommendations for future systems to enable efficient yet simple bioinformatics
workflow construction and execution.
Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program
The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.
Mirror extreme BMI phenotypes associated with gene dosage at the chromosome 16p11.2 locus
Both obesity and being underweight have been associated with increased mortality. Underweight, defined as a body mass index (BMI) ≤ 18.5 kg/m² in adults and ≤ -2 standard deviations from the mean in children, is the main sign of a series of heterogeneous clinical conditions including failure to thrive, feeding and eating disorder and/or anorexia nervosa. In contrast to obesity, few genetic variants underlying these clinical conditions have been reported. We previously showed that hemizygosity of a ∼600-kilobase (kb) region on the short arm of chromosome 16 causes a highly penetrant form of obesity that is often associated with hyperphagia and intellectual disabilities. Here we show that the corresponding reciprocal duplication is associated with being underweight. We identified 138 duplication carriers (including 132 novel cases and 108 unrelated carriers) from individuals clinically referred for developmental or intellectual disabilities (DD/ID) or psychiatric disorders, or recruited from population-based cohorts. These carriers show significantly reduced postnatal weight and BMI. Half of the boys younger than five years are underweight with a probable diagnosis of failure to thrive, whereas adult duplication carriers have an 8.3-fold increased risk of being clinically underweight. We observe a trend towards increased severity in males, as well as a depletion of male carriers among non-medically ascertained cases. These features are associated with an unusually high frequency of selective and restrictive eating behaviours and a significant reduction in head circumference. Each of the observed phenotypes is the converse of one reported in carriers of deletions at this locus. The phenotypes correlate with changes in transcript levels for genes mapping within the duplication but not in flanking regions.
The reciprocal impact of these 16p11.2 copy-number variants indicates that severe obesity and being underweight could have mirror aetiologies, possibly through contrasting effects on energy balance.
Plant-parasitic nematodes of potential phytosanitary importance, their main hosts and reported yield losses
The potential phytosanitary importance of all named plant-parasitic nematode species was determined by evaluating available information on species characteristics, association with economically-important crop hosts, and ability to act as vectors of viruses or form disease complexes with other pathogens. Most named species of plant-parasitic nematodes (PPN) are poorly known, recorded from a single location only, not associated with economically-important crops, and not known to be associated with other plant disease organisms. However, 250 species from 43 genera fulfilled one or more of the criteria to be considered to present a phytosanitary risk. The genera and number of species (in parentheses) considered as posing phytosanitary risk included: Achlysiella (1), Anguina (8), Aphasmatylenchus (1), Aphelenchoides (12), Aphelenchus (1), Belonolaimus (2), Bitylenchus (3), Bursaphelenchus (4), Cactodera (3), Ditylenchus (8), Dolichodorus (1), Globodera (3), Helicotylenchus (7), Hemicriconemoides (3), Hemicycliophora (3), Heterodera (25), Hirschmanniella (5), Hoplolaimus (5), Ibipora (3), Longidorus (10), Macroposthonia (2), Meloidogyne (38), Merlinius (3), Nacobbus (1), Neodolichodorus (2), Paralongidorus (2), Paratrichodorus (11), Paratylenchus (3), Pratylenchus (24), Punctodera (3), Quinisulcius (3), Radopholus (5), Rotylenchulus (3), Rotylenchus (1), Scutellonema (5), Sphaeronema (1), Subanguina (3), Trichodorus (5), Tylenchorhynchus (8), Tylenchulus (2), Vittatidera (1), Xiphinema (15) and Zygotylenchus (1). For each of the 250 species, main hosts and yield loss estimates are provided with an extensive bibliography. Of the 250 species, only 126 species from 33 genera are currently listed as regulated pests in one or more countries worldwide. Almost all of these 250 species were also associated with economically important crops and some also acted as vectors for viruses. © 2013 The Authors.