Accelerating exhaustive pairwise metagenomic comparisons
In this manuscript, we present an optimized, parallel version of our previous work IMSAME, an exhaustive gapped aligner for the accurate pairwise comparison of metagenomes. Parallelization strategies are applied to take advantage of modern multiprocessor architectures, and sequential optimizations reduce CPU time and memory consumption. These algorithmic and computational enhancements enable IMSAME to calculate near-optimal alignments, which are used to assess similarity between metagenomes directly, without requiring reference databases. We show that the overall efficiency of the parallel implementation exceeds 80% and that the implementation remains scalable as the number of parallel cores increases. Moreover, we show that the sequential optimizations yield speedups of up to 8x for scenarios with larger data.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tec
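To illustrate what an exhaustive gapped comparison involves, the following Python sketch scores every query read against every reference read with a Smith-Waterman local alignment (linear gap penalty) and keeps the best hit per query. The scoring parameters, the best-hit policy and all function names are assumptions for illustration, not IMSAME's actual algorithm or API. It also computes parallel efficiency with the standard definition E = T1 / (p · Tp), which is the usual meaning of an efficiency percentage like the one reported.

```python
from __future__ import annotations


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score with a linear gap penalty, O(len(a)*len(b))."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # 0 restarts the local alignment; the other terms open/extend gaps.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def best_hits(queries: list[str], references: list[str]) -> list[tuple[int, int]]:
    """Exhaustively align every query against every reference (hypothetical helper).

    Returns (reference index, score) of the best-scoring reference per query.
    Queries are independent, so this outer loop is the natural unit of work
    to distribute across cores.
    """
    hits = []
    for q in queries:
        scores = [smith_waterman_score(q, r) for r in references]
        j = max(range(len(scores)), key=scores.__getitem__)
        hits.append((j, scores[j]))
    return hits


def parallel_efficiency(t_serial: float, t_parallel: float, cores: int) -> float:
    """Speedup (t_serial / t_parallel) divided by the number of cores."""
    return t_serial / (cores * t_parallel)
```

For example, a serial run of 100 s that takes 30 s on 4 cores has an efficiency of 100 / (4 × 30) ≈ 0.83.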
MetAMOS: A modular and open source metagenomic assembly and analysis pipeline
© 2013 Treangen et al. We describe MetAMOS, an open source and modular metagenomic assembly and analysis pipeline. MetAMOS represents an important step towards fully automated metagenomic analysis, starting with next-generation sequencing reads and producing genomic scaffolds, open-reading frames and taxonomic or functional annotations. MetAMOS can help reduce assembly errors, which are commonly encountered when assembling metagenomic samples, and it improves taxonomic assignment accuracy while also reducing computational cost. MetAMOS can be downloaded from: https://github.com/treangen/MetAMOS
Atmospheric boundary-layer structure observed during a haze event due to forest-fire smoke
During a haze event in Baltimore, U.S.A., from July 6 to 8, 2002, smoke from forest fires in the Québec region (Canada) degraded air quality and affected the local climate, decreasing solar radiation and air temperature. The smoke particles in and above the atmospheric boundary layer (ABL) served as a tracer and provided a unique opportunity to investigate the ABL structure, especially entrainment. Elastic backscatter lidar measurements taken during the haze event distinctly reveal downward sweeps (or wisps) of smoke-laden air from the free atmosphere into the ABL. Visualisations of mechanisms such as dry convection, entrainment, detrainment, coherent entrainment structures, and mixing inside the ABL are presented. Thermals overshooting at the ABL top are shown to create disturbances in the form of gravity waves in the free atmosphere aloft, as evidenced by a corresponding ripple structure at the bottom of the smoke layer. Lidar data, ground-based aerosol measurements and supporting meteorological data are used to link free-atmosphere, mixed-layer and ground-level aerosols. During the peak period of the haze event (July 7, 2002), the correlation between time series of elastic backscatter lidar data within the mixed layer and the scattering coefficient from a nephelometer at ground level was high (R = 0.96 for z = 324 m, and R = 0.89 for z = 504 m). Ground-level aerosol concentration peaked about 2 h after the smoke layer intersected the growing ABL, confirming that the wisps do not initially reach the ground.
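The R values quoted are correlations between two time series, and the roughly 2 h delay between the smoke layer meeting the ABL and the ground-level maximum suggests looking at lagged correlation as well. The Python sketch below computes the Pearson coefficient and the lag (in samples) that maximizes it. The function names and the lag search are assumptions for illustration; the abstract does not say that the authors used a lagged analysis.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a series has zero variance")
    return sxy / math.sqrt(sxx * syy)


def best_lag(lidar, ground, max_lag):
    """Hypothetical helper: find the lag L (in samples) maximizing the
    correlation between lidar[t] and ground[t + L], for 0 <= L <= max_lag."""
    best = None
    for lag in range(max_lag + 1):
        r = pearson_r(lidar[:len(lidar) - lag], ground[lag:])
        if best is None or r > best[1]:
            best = (lag, r)
    return best
```

Multiplying the returned lag by the sampling interval of the series converts it to a time delay.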
Aerosol optical characterization by nephelometer and lidar: the Baltimore Supersite experiment during the Canadian forest fire smoke intrusion
High spatial and temporal resolution elastic backscatter lidar data from Baltimore are analyzed with a near-end approach to estimate vertical profiles of the aerosol extinction coefficient. The near-end approach uses (1) the aerosol scattering coefficient measured at the surface with a nephelometer (0.530 μm), (2) the surface-level particle size distribution, and (3) the refractive index calculated using Mie theory, to estimate the aerosol extinction coefficient boundary condition for the lidar equation. A strong haze event caused by smoke transported from Canadian forest fires produced a broad range of atmospheric turbidity and, consequently, a wide range of observed atmospheric properties. The aerosol refractive index estimated for the entire study period is 1.5 − 0.47i, which is typical of soot. The measured surface-level aerosol scattering coefficient ranged from σp = 0.002 to 0.541 km−1, and the computed aerosol extinction coefficient spanned κp = 0.01 to 1.05 km−1. The derived mass concentration and mass scattering efficiency ranged from 3.96 to 194 μg m−3 and from 0.05 to 3.260 m2 g−1, respectively. The aerosol optical properties were dominated by light absorption by soot.
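The abstract does not spell out the inversion, so, as an assumption, the following Python sketch uses Klett's forward (near-end) solution of the single-scattering elastic lidar equation, with backscatter taken as proportional to extinction raised to a power k and the extinction α0 at the first range gate supplied externally (for instance from the nephelometer, size distribution and Mie calculation described). With S(r) = ln[P(r) r²], the solution is α(r) = exp[(S(r) − S0)/k] / {1/α0 − (2/k) ∫ exp[(S(r') − S0)/k] dr'}, the integral running from r0 to r. The function name and parameters are hypothetical. The forward solution is known to be numerically unstable as its denominator approaches zero, which the sketch reports as an error.

```python
import math


def klett_near_end(ranges, power, alpha0, k=1.0):
    """Forward (near-end) Klett inversion of an elastic lidar profile.

    ranges: increasing ranges r_i in metres; power: background-subtracted
    signal P(r_i); alpha0: extinction (1/m) at ranges[0].
    Assumes backscatter is proportional to extinction**k.
    """
    # Range-corrected log signal S(r) = ln(P r^2), referenced to the near end.
    s = [math.log(p * r * r) for p, r in zip(power, ranges)]
    w = [math.exp((si - s[0]) / k) for si in s]
    alphas = [alpha0]
    integral = 0.0
    for i in range(1, len(ranges)):
        # Trapezoidal integration of w from the near end out to r_i.
        integral += 0.5 * (w[i] + w[i - 1]) * (ranges[i] - ranges[i - 1])
        denom = 1.0 / alpha0 - (2.0 / k) * integral
        if denom <= 0:
            raise ValueError("forward solution became unstable")
        alphas.append(w[i] / denom)
    return alphas
```

A synthetic profile with constant extinction, P(r) ∝ α exp(−2αr) / r², is a convenient check: the inversion should return that same constant at every range.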
Interactive metagenomic visualization in a Web browser
Background: A critical output of metagenomic studies is the estimation of abundances of taxonomical or functional groups. The inherent uncertainty in assignments to these groups makes it important to consider both their hierarchical contexts and their prediction confidence. The current tools for visualizing metagenomic data, however, omit or distort quantitative hierarchical relationships and lack the facility for displaying secondary variables. Results: Here we present Krona, a new visualization tool that allows intuitive exploration of relative abundances and confidences within the complex hierarchies of metagenomic classifications. Krona combines a variant of radial, space-filling displays with parametric coloring and interactive polar-coordinate zooming. The HTML5 and JavaScript implementation enables fully interactive charts that can be explored with any modern Web browser, without the need for installed software or plug-ins. This Web-based architecture also allows each chart to be an independent document, making them easy to share via e-mail or post to a standard Web server. To illustrate Krona's utility, we describe its application to various metagenomic data sets and its compatibility with popular metagenomic analysis tools. Conclusions: Krona is both a powerful metagenomic visualization tool and a demonstration of the potential of HTML5 for highly accessible bioinformatic visualizations. Its rich and interactive displays facilitate more informed interpretations of metagenomic analyses, while its implementation as a browser-based application makes it extremely portable and easily adopted into existing analysis packages. Both the Krona rendering code and conversion tools are freely available under a BSD open-source license, and available from: http://krona.sourceforge.net.
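A radial, space-filling display of a hierarchy assigns each node an angular wedge proportional to its magnitude, nested inside its parent's wedge. The Python sketch below computes such wedges. It is an illustrative assumption, not Krona's HTML5/JavaScript rendering code, and the `Node` class and `wedges` function are hypothetical names. Magnitude assigned directly to an internal node (for example, reads classified no deeper than that level) leaves part of its wedge unfilled by children.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical taxonomy node; `count` is magnitude assigned to this node itself."""
    name: str
    count: float = 0.0
    children: list[Node] = field(default_factory=list)


def magnitude(node: Node) -> float:
    """Total magnitude of a node: its own count plus that of all descendants."""
    return node.count + sum(magnitude(c) for c in node.children)


def wedges(node: Node, start: float = 0.0, end: float = 360.0,
           depth: int = 0, out: list | None = None) -> list:
    """Return (name, depth, start_deg, end_deg) for every node, depth-first.

    Each child's span is proportional to its share of the parent's magnitude.
    """
    if out is None:
        out = []
    out.append((node.name, depth, start, end))
    total = magnitude(node)
    angle = start
    for child in node.children:
        span = (end - start) * magnitude(child) / total if total > 0 else 0.0
        wedges(child, angle, angle + span, depth + 1, out)
        angle += span
    return out
```

Depth maps to the ring (radius) and the two angles to the wedge boundaries; zooming into a node amounts to re-running the layout with that node as the root.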
Impact of the 2002 Canadian Forest Fires on Particulate Matter Air Quality in Baltimore City
With increasing evidence of adverse health effects associated with particulate matter (PM), the exposure impact of natural sources, such as forest fires, has substantial public health relevance. In addition to threatening nearby communities, pollutants released from forest fires can travel thousands of kilometers to heavily populated urban areas. There was a dramatic increase in forest fire activity in the province of Quebec, Canada, during July 2002. The transport of PM released from these forest fires was examined using a combination of a moderate-resolution imaging spectroradiometer satellite image, back-trajectories computed with the hybrid single-particle Lagrangian integrated trajectory model, and local light detection and ranging measurements. Time- and size-resolved PM was evaluated at three ambient and four indoor measurement sites using a combination of direct-reading instruments (laser, time-of-flight aerosol spectrometer, nephelometer, and an oscillating microbalance). The transport and monitoring results consistently identified a forest-fire-related PM episode in Baltimore that occurred on the first weekend of July 2002 and produced as much as a 30-fold increase in ambient fine PM. On the basis of tapered element oscillating microbalance measurements, the 24 h PM2.5 concentration reached 86 μg/m3 on July 7, 2002, exceeding the 24 h national ambient air quality standard. The episode consisted primarily of particles less than 2.5 μm in aerodynamic diameter, highlighting the preferential transport of the fraction of PM that is of greatest health concern. Penetration of the ambient episode indoors was efficient (median indoor-to-outdoor ratio 0.91), so the high ambient levels were similarly experienced indoors. These results are significant in demonstrating the impact of a natural source thousands of kilometers away on ambient levels of, and potential exposures to, air pollution within an urban center.
This research highlights the significance of transboundary air pollution and the need for studies that assess the public health impacts associated with such sources and transport processes.
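The two summary statistics quoted, a 24 h mean PM2.5 concentration compared against a standard and a median indoor-to-outdoor ratio, reduce to simple arithmetic. This Python sketch shows it under stated assumptions: hourly inputs, a 24 h PM2.5 standard of 65 μg/m3 (the U.S. value in force in 2002, supplied here as an assumption rather than taken from the abstract), and hypothetical function names. A single-day comparison like this is not a formal regulatory compliance determination.

```python
from statistics import mean, median

# Assumption: the 24 h PM2.5 NAAQS in force in July 2002 (1997 standard), in ug/m3.
PM25_24H_STANDARD_UG_M3 = 65.0


def daily_average(hourly_ug_m3):
    """24 h mean from 24 hourly concentrations (ug/m3)."""
    if len(hourly_ug_m3) != 24:
        raise ValueError("expected 24 hourly values")
    return mean(hourly_ug_m3)


def exceeds_24h_standard(hourly_ug_m3, standard=PM25_24H_STANDARD_UG_M3):
    """True if the 24 h mean is above the given standard."""
    return daily_average(hourly_ug_m3) > standard


def median_io_ratio(indoor, outdoor):
    """Median of paired indoor/outdoor ratios, skipping non-positive outdoor values."""
    ratios = [i / o for i, o in zip(indoor, outdoor) if o > 0]
    return median(ratios)
```

A median ratio close to 1, like the 0.91 reported, means indoor concentrations tracked the outdoor episode almost fully.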
SHRiMP: Accurate Mapping of Short Color-space Reads
The development of Next Generation Sequencing technologies, capable of sequencing hundreds of millions of short reads (25–70 bp each) in a single run, is opening the door to population genomic studies of non-model species. In this paper we present SHRiMP, the SHort Read Mapping Package: a set of algorithms and methods to map short reads to a genome, even in the presence of a large amount of polymorphism. Our method combines a fast read-mapping technique, separate thorough alignment methods for regular letter-space and AB SOLiD (color-space) reads, and a statistical model for false-positive hits. We use SHRiMP to map reads from a newly sequenced Ciona savignyi individual to the reference genome. We demonstrate that SHRiMP can accurately map reads to this highly polymorphic genome, while confirming high heterozygosity of C. savignyi in this second individual. SHRiMP is freely available at http://compbio.cs.toronto.edu/shrimp
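Color-space reads differ from letter-space reads in that each AB SOLiD color encodes the transition between two adjacent bases. With the common 2-bit coding A=0, C=1, G=2, T=3, the color of a base pair is the XOR of the two codes. The Python sketch below shows encoding and decoding; it is not SHRiMP's implementation, and the function names are hypothetical. It illustrates why color-space reads need dedicated alignment: a true SNP changes two adjacent colors, while a single color error, if decoded naively, corrupts every base after it.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def to_colorspace(seq):
    """Encode DNA as (first base, list of colors); each color is the XOR of adjacent base codes."""
    seq = seq.upper()
    colors = [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colors


def from_colorspace(first, colors):
    """Decode colors back to bases, starting from the known first base."""
    out = [first]
    cur = CODE[first]
    for c in colors:
        cur ^= c  # XOR with the color recovers the next base code
        out.append(BASES[cur])
    return "".join(out)
```

Encoding `ACGT` gives colors `[1, 3, 1]`; the SNP variant `ACTT` gives `[1, 2, 0]` (two adjacent colors differ), whereas flipping only the middle color to `[1, 0, 1]` decodes to `ACCA`, wrong from that position onward.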
Blue Noise Plots
We propose Blue Noise Plots, two-dimensional dot plots that depict data points of univariate data sets. While one-dimensional strip plots are often used to depict such data, one of their main problems is visual clutter resulting from overlap. To reduce this overlap, jitter plots were introduced, whereby an additional, non-encoding plot dimension is introduced along which the dots representing data points are randomly perturbed. Unfortunately, this randomness can suggest non-existent clusters and often leads to visually unappealing plots in which overlap might still occur. To overcome these shortcomings, we introduce Blue Noise Plots, where random jitter along the non-encoding plot dimension is replaced by optimizing all dots to keep a minimum distance in 2D, i.e., Blue Noise. We evaluate the effectiveness as well as the aesthetics of Blue Noise Plots through both a quantitative and a qualitative user study. The Python implementation of Blue Noise Plots is available here. Comment: 9 pages, 16 figures
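The core idea, keeping every dot at least a minimum distance from every other while moving it only along the non-encoding axis, can be sketched as an iterative relaxation. This Python sketch is an assumption for illustration, not the authors' optimization: it starts from ordinary random jitter and repeatedly pushes apart any pair of dots closer than `r_min`, splitting the required vertical correction between the two dots and clamping them to the plot band. The function names and parameters are hypothetical.

```python
import math
import random


def blue_noise_offsets(xs, r_min, height=1.0, iterations=200, seed=0):
    """Return a y offset per dot so dots end up roughly r_min apart in 2D.

    x positions (the encoded values) are never moved; only the non-encoding
    y coordinate is adjusted and clamped to [0, height].
    """
    rng = random.Random(seed)
    ys = [rng.uniform(0.0, height) for _ in xs]  # start from plain random jitter
    n = len(xs)
    for _ in range(iterations):
        any_violation = False
        for i in range(n):
            for j in range(i + 1, n):
                dx = xs[j] - xs[i]
                dy = ys[j] - ys[i]
                if math.hypot(dx, dy) >= r_min:
                    continue
                any_violation = True
                # Vertical gap needed to reach r_min given the fixed dx (|dx| < r_min here).
                target = math.sqrt(r_min * r_min - dx * dx)
                shift = (target - abs(dy)) / 2.0
                sign = 1.0 if dy >= 0 else -1.0
                ys[i] = min(max(ys[i] - sign * shift, 0.0), height)
                ys[j] = min(max(ys[j] + sign * shift, 0.0), height)
        if not any_violation:
            break
    return ys


def min_distance(xs, ys):
    """Smallest pairwise Euclidean distance between dots."""
    n = len(xs)
    return min(math.hypot(xs[j] - xs[i], ys[j] - ys[i])
               for i in range(n) for j in range(i + 1, n))
```

With four identical values and `r_min = 0.2` in a band of height 1, the dots end up stacked roughly 0.2 apart instead of overlapping. Unlike pure jitter, the final spacing no longer depends on chance; a real implementation would use a faster neighbor search than this O(n²) loop.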
Streaming histogram sketching for rapid microbiome analytics
Background: The growth in publicly available microbiome data in recent years has yielded an invaluable resource for genomic research, allowing for the design of new studies, augmentation of novel datasets and reanalysis of published works. This vast amount of microbiome data, as well as the widespread proliferation of microbiome research and the looming era of clinical metagenomics, means there is an urgent need to develop analytics that can process huge amounts of data in a short amount of time. To address this need, we propose a new method for the compact representation of microbiome sequencing data using similarity-preserving sketches of streaming k-mer spectra. These sketches allow for dissimilarity estimation, rapid microbiome catalogue searching and classification of microbiome samples in near real time. Results: We apply streaming histogram sketching to microbiome samples as a form of dimensionality reduction, creating a compressed ‘histosketch’ that can efficiently represent microbiome k-mer spectra. Using public microbiome datasets, we show that histosketches can be clustered by sample type using pairwise Jaccard similarity estimates, consequently allowing for rapid microbiome similarity searches via a locality-sensitive hashing indexing scheme. Furthermore, we use a ‘real life’ example to show that histosketches can train machine learning classifiers to accurately label microbiome samples. Specifically, using a collection of 108 novel microbiome samples from a cohort of premature neonates, we trained and tested a random forest classifier that could accurately predict whether the neonate had received antibiotic treatment (97% accuracy, 96% precision) and could subsequently be used to classify microbiome data streams in less than 3 s. Conclusions: Our method offers a new approach to rapidly process microbiome data streams, allowing samples to be rapidly clustered, indexed and classified.
We also provide our implementation, Histosketching Using Little K-mers (HULK), which can histosketch a typical 2 GB microbiome in 50 s on a standard laptop using four cores, with the sketch occupying 3000 bytes of disk space.
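A weighted Jaccard similarity between two k-mer spectra, and a compact sketch whose matching positions estimate it, can be illustrated in a few lines. As an assumption, the Python sketch below uses exact k-mer counting and Ioffe's improved consistent weighted sampling (ICWS), a technique from the consistent-weighted-sampling family that HistoSketch-style methods build on; it is not HULK's streaming implementation, and all function names, the sketch size and the seed are hypothetical. For each sketch position, the probability that two spectra select the same (k-mer, t) pair equals their weighted Jaccard similarity, so the fraction of matching positions estimates it.

```python
import math
import random
from collections import Counter


def kmer_spectrum(seq, k):
    """Exact k-mer counts (a streaming method would avoid holding all counts)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def weighted_jaccard(a, b):
    """Exact weighted Jaccard: sum of minima over sum of maxima."""
    keys = a.keys() | b.keys()
    num = sum(min(a[x], b[x]) for x in keys)
    den = sum(max(a[x], b[x]) for x in keys)
    return num / den if den else 1.0


def icws_sketch(spectrum, size=64, seed=42):
    """ICWS sketch: one (k-mer, t) sample per position.

    Random variables are seeded by (seed, position, k-mer) so that two
    spectra sharing a k-mer draw identical randomness for it.
    """
    sketch = []
    for pos in range(size):
        best = None
        for kmer, weight in spectrum.items():
            if weight <= 0:
                continue
            rng = random.Random(f"{seed}:{pos}:{kmer}")
            r = rng.gammavariate(2.0, 1.0)
            c = rng.gammavariate(2.0, 1.0)
            beta = rng.random()
            t = math.floor(math.log(weight) / r + beta)
            y = math.exp(r * (t - beta))
            a = c / (y * math.exp(r))
            if best is None or a < best[0]:
                best = (a, kmer, t)
        sketch.append((best[1], best[2]) if best else None)
    return sketch


def sketch_similarity(s1, s2):
    """Fraction of sketch positions where both samples agree."""
    return sum(x == y for x, y in zip(s1, s2)) / len(s1)
```

The sketch has a fixed size regardless of input length, which is what makes storing and comparing many samples cheap; larger sketches give lower-variance similarity estimates.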
Airborne Emissions from 1961 to 2004 of Benzo[a]pyrene from U.S. Vehicles per km of Travel Based on Tunnel Studies
We identified 13 historical measurements of polycyclic aromatic hydrocarbons (PAHs) in U.S. vehicular traffic tunnels that were either directly presented as tailpipe emission factors in μg per vehicle-kilometer or convertible to such a form. Tunnel measurements capture fleet cruise emissions. Emission factors for benzo[a]pyrene (BaP) for a tunnel fleet operating under cruise conditions were highest prior to the 1980s and fell from more than 30 μg per vehicle-km to approximately 2 μg per vehicle-km in the 1990s, an approximately 15-fold decline. Total annual U.S. (cruise) emissions of BaP dropped by a lesser factor, because total annual vehicle-km traveled increased by a factor of 2.7 during the period. Other PAH compounds measured in tunnels over the 40-year period (e.g., benzo[ghi]perylene, coronene) showed comparable reduction factors in emissions. PAH declines were comparable to those measured in tunnels for carbon monoxide, volatile organic compounds, and particulate organic carbon. The historical PAH “source terms” determined from the data are relevant to quantifying the benefits of emissions control technology and can be used in epidemiological studies evaluating the health effects of exposure, such as those undertaken with breast cancer in New York State.
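The "lesser factor" follows from total cruise emissions being the product of the emission factor (EF) and the vehicle-km traveled (VKT). Using only the rounded figures quoted, and treating the 2.7-fold VKT increase as applying between the same two periods (an approximation, since it is stated for the whole study span):

```latex
\frac{E_{\text{1990s}}}{E_{\text{pre-1980s}}}
  = \frac{\mathrm{EF}_{\text{1990s}}}{\mathrm{EF}_{\text{pre-1980s}}}
    \cdot \frac{\mathrm{VKT}_{\text{1990s}}}{\mathrm{VKT}_{\text{pre-1980s}}}
  \approx \frac{2}{30} \cdot 2.7 = 0.18
```

That is, total cruise BaP emissions fell roughly 5.6-fold rather than 15-fold; because the early emission factor was "more than 30" μg per vehicle-km, this figure is only approximate.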
- …