
    Microbiome profiling by Illumina sequencing of combinatorial sequence-tagged PCR products

    We developed a low-cost, high-throughput microbiome profiling method that uses combinatorial sequence tags attached to PCR primers that amplify the rRNA V6 region. Amplified PCR products are sequenced using an Illumina paired-end protocol to generate millions of overlapping reads. Combinatorial sequence tagging can be used to examine hundreds of samples with far fewer primers than are required when sequence tags are incorporated at only a single end. The number of reads generated permitted saturating or near-saturating analysis of samples of the vaginal microbiome. The large number of reads allowed an in-depth analysis of errors, and we found that PCR-induced errors composed the vast majority of non-organism derived species variants, an observation that has significant implications for sequence clustering of similar high-throughput data. We show that the short reads are sufficient to assign organisms to the genus or species level in most cases. We suggest that this method will be useful for the deep sequencing of any short nucleotide region that is taxonomically informative; these include the V3 and V5 regions of the bacterial 16S rRNA genes and the eukaryotic V9 region that is gaining popularity for sampling protist diversity. Comment: 28 pages, 13 figures
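    As an aside for readers unfamiliar with combinatorial (dual-end) tagging, the sketch below illustrates why a small set of forward tags and reverse tags can index many samples: n_f + n_r tagged primers label n_f × n_r tag pairs. This is not the authors' pipeline; the tag sequences and the demultiplexing helper are hypothetical.

```python
# Minimal sketch of combinatorial dual-end tag demultiplexing (illustrative only).
from itertools import product

# Hypothetical 8-nt tags; real designs balance base composition and edit distance.
forward_tags = ["ACGTACGT", "TGCATGCA", "GATCGATC", "CTAGCTAG"]
reverse_tags = ["AACCGGTT", "TTGGCCAA", "GGTTAACC", "CCAATTGG"]

# Each (forward, reverse) tag pair uniquely labels one sample's amplicons.
sample_index = {pair: i for i, pair in enumerate(product(forward_tags, reverse_tags))}

print(f"{len(forward_tags) + len(reverse_tags)} tagged primers "
      f"label {len(sample_index)} samples")

def assign_read(read_fwd_tag: str, read_rev_tag: str):
    """Demultiplex a paired read to a sample by its observed tag pair (None if unknown)."""
    return sample_index.get((read_fwd_tag, read_rev_tag))

print(assign_read("TGCATGCA", "GGTTAACC"))  # -> a sample index, or None if unassigned
```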

    CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment

    Background: Searching for similarities in protein and DNA databases has become a routine procedure in Molecular Biology. The Smith-Waterman algorithm has been available for more than 25 years. It is based on a dynamic programming approach that explores all the possible alignments between two sequences; as a result it returns the optimal local alignment. Unfortunately, the computational cost is very high, requiring a number of operations proportional to the product of the lengths of the two sequences. Furthermore, the exponential growth of protein and DNA databases makes the Smith-Waterman algorithm unrealistic for searching similarities in large sets of sequences. For these reasons heuristic approaches such as those implemented in FASTA and BLAST tend to be preferred, allowing faster execution times at the cost of reduced sensitivity. The main motivation of our work is to exploit the huge computational power of commonly available graphics cards to develop high-performance solutions for sequence alignment. Results: In this paper we present what we believe is the fastest solution of the exact Smith-Waterman algorithm running on commodity hardware. It is implemented in the recently released CUDA programming environment by NVidia. CUDA allows direct access to the hardware primitives of the last-generation Graphics Processing Units (GPU) G80. Speeds of more than 3.5 GCUPS (Giga Cell Updates Per Second) are achieved on a workstation running two GeForce 8800 GTX cards. Exhaustive tests have been done to compare our implementation to SSEARCH and BLAST, running on a 3 GHz Intel Pentium IV processor. Our solution was also compared to a recently published GPU implementation and to a Single Instruction Multiple Data (SIMD) solution. These tests show that our implementation performs from 2 to 30 times faster than any other previous attempt available on commodity hardware. Conclusions: The results show that graphics cards are now sufficiently advanced to be used as efficient hardware accelerators for sequence alignment. Their performance is better than any alternative available on commodity hardware platforms. The solution presented in this paper allows large-scale alignments to be performed at low cost, using the exact Smith-Waterman algorithm instead of the largely adopted heuristic approaches
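    For readers unfamiliar with the underlying dynamic program, here is a minimal single-threaded sketch of Smith-Waterman local alignment scoring. The match/mismatch/gap parameters are illustrative, and none of the GPU-specific parallelization that is the paper's contribution is reproduced here.

```python
# Minimal Smith-Waterman local alignment score: O(len(a) * len(b)) dynamic program.
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # score matrix, first row/column stay 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below 0.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best  # optimal local alignment score

print(smith_waterman("ACACACTA", "AGCACACA"))  # best local score under these example parameters
```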

    Winter Bird Assemblages in Rural and Urban Environments: A National Survey

    Urban development has a marked effect on the ecological and behavioural traits of many living organisms, including birds. In this paper, we analysed differences in the numbers of wintering birds between rural and urban areas in Poland. We also analysed species richness and abundance in relation to longitude, latitude, human population size, and landscape structure. All these parameters were analysed using modern statistical techniques incorporating species detectability. We counted birds in 156 squares (0.25 km2 each) in December 2012 and again in January 2013, in and around 26 urban areas across Poland (in each urban area we surveyed 3 squares, plus 3 squares in nearby rural areas). The influence of twelve potential environmental variables on species abundance and richness was assessed with Generalized Linear Mixed Models, Principal Components Analysis and Detrended Correspondence Analysis. Totals of 72 bird species and 89,710 individual birds were recorded in this study. On average (±SE), 13.3 ± 0.3 species and 288 ± 14 individuals were recorded in each square in each survey. A formal comparison of rural and urban areas revealed that 27 species had a significant preference: 17 for rural areas and 10 for urban areas. Moreover, overall abundance in urban areas was more than double that of rural areas. There was almost complete separation of rural and urban bird communities. Significantly more birds and more bird species were recorded in January than in December. We conclude that differences between rural and urban areas in terms of winter conditions and the availability of resources are reflected in different bird communities in the two environments

    Estimating Animal Abundance in Ground Beef Batches Assayed with Molecular Markers

    Estimating animal abundance in industrial-scale batches of ground meat is important for mapping meat products through the manufacturing process and for effectively tracing the finished product during a food safety recall. The processing of ground beef involves a potentially large number of animals from diverse sources in a single product batch, which produces high heterogeneity in capture probability. In order to estimate animal abundance through DNA profiling of ground beef constituents, two parameter-based statistical models were developed for incidence data. Simulations were used to evaluate the maximum likelihood estimate (MLE) of a joint likelihood function from multiple surveys; compared to other existing models, it was superior in the presence of high capture heterogeneity with small sample sizes and comparable in the presence of low capture heterogeneity with large sample sizes. Our model employs the full information on the pattern of capture-recapture frequencies from multiple samples. We applied the proposed models to estimate animal abundance in six manufacturing beef batches, genotyped using 30 single nucleotide polymorphism (SNP) markers, from a large-scale beef grinding facility. The results show that between 411 and 1,367 animals were present in the six manufacturing beef batches. These estimates are informative as a reference for improving recall processes and tracing finished meat products back to their sources
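    As background, the hedged sketch below implements a standard incidence-based capture-recapture estimator, the bias-corrected Chao2 lower bound, rather than the paper's joint-likelihood MLE. It only illustrates how detection frequencies across multiple surveys (how many genotypes were seen in exactly one or exactly two of m surveys) inform an abundance estimate under heterogeneous capture probabilities; the toy data are invented.

```python
# Bias-corrected Chao2 incidence estimator (a standard baseline, not the paper's model).
def chao2(incidence_counts, m: int) -> float:
    """incidence_counts[k] = number of surveys (out of m) in which genotype k was detected."""
    s_obs = sum(1 for c in incidence_counts if c > 0)
    q1 = sum(1 for c in incidence_counts if c == 1)   # genotypes seen in exactly one survey
    q2 = sum(1 for c in incidence_counts if c == 2)   # genotypes seen in exactly two surveys
    return s_obs + ((m - 1) / m) * q1 * (q1 - 1) / (2 * (q2 + 1))

# Toy data: 5 surveys; most genotypes detected once or twice, a few detected often.
counts = [1] * 40 + [2] * 25 + [3] * 10 + [5] * 5
print(chao2(counts, m=5))  # estimated total number of animals, >= the 80 observed
```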

    Measuring Global Credibility with Application to Local Sequence Alignment

    Computational biology is replete with high-dimensional (high-D) discrete prediction and inference problems, including sequence alignment, RNA structure prediction, phylogenetic inference, motif finding, prediction of pathways, and model selection problems in statistical genetics. Even though prediction and inference in these settings are uncertain, little attention has been focused on the development of global measures of uncertainty. Regardless of the procedure employed to produce a prediction, when a procedure delivers a single answer, that answer is a point estimate selected from the solution ensemble, the set of all possible solutions. For high-D discrete space, these ensembles are immense, and thus there is considerable uncertainty. We recommend the use of Bayesian credibility limits to describe this uncertainty, where a (1−α)%, 0≤α≤1, credibility limit is the minimum Hamming distance radius of a hyper-sphere containing (1−α)% of the posterior distribution. Because sequence alignment is arguably the most extensively used procedure in computational biology, we employ it here to make these general concepts more concrete. The maximum similarity estimator (i.e., the alignment that maximizes the likelihood) and the centroid estimator (i.e., the alignment that minimizes the mean Hamming distance from the posterior weighted ensemble of alignments) are used to demonstrate the application of Bayesian credibility limits to alignment estimators. Application of Bayesian credibility limits to the alignment of 20 human/rodent orthologous sequence pairs and 125 orthologous sequence pairs from six Shewanella species shows that credibility limits of the alignments of promoter sequences of these species vary widely, and that centroid alignments dependably have tighter credibility limits than traditional maximum similarity alignments
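    To make the definition concrete, here is a minimal sketch of computing a credibility radius from posterior draws. It assumes each sampled solution is encoded as a 0/1 vector (e.g. indicators of aligned position pairs); the encoding, the toy posterior, and the centroid-style point estimate are illustrative assumptions, not the paper's implementation.

```python
# Credibility radius: smallest Hamming distance around a point estimate that
# contains (1 - alpha) of the posterior sample.
import numpy as np

def credibility_radius(point_estimate: np.ndarray,
                       posterior_samples: np.ndarray,
                       alpha: float = 0.05) -> int:
    """posterior_samples: (n_samples, dim) 0/1 array of solutions drawn from the posterior."""
    hamming = (posterior_samples != point_estimate).sum(axis=1)
    return int(np.ceil(np.quantile(hamming, 1.0 - alpha)))

rng = np.random.default_rng(0)
samples = rng.integers(0, 2, size=(1000, 50))          # toy posterior draws
centroid = (samples.mean(axis=0) >= 0.5).astype(int)   # centroid-style point estimate
print(credibility_radius(centroid, samples, alpha=0.05))
```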

    PLAST: parallel local alignment search tool for database comparison

    Background: Sequence similarity searching is an important and challenging task in molecular biology, and next-generation sequencing should further strengthen the need for faster algorithms to process such vast amounts of data. At the same time, the internal architecture of current microprocessors is tending towards more parallelism, leading to the use of chips with two, four and more cores integrated on the same die. The main purpose of this work was to design an effective algorithm to fit with the parallel capabilities of modern microprocessors. Results: A parallel algorithm for comparing large genomic banks and targeting mid-range computers has been developed and implemented in the PLAST software. The algorithm exploits two key parallel features of existing and future microprocessors: the SIMD programming model (SSE instruction set) and the multithreading concept (multicore). Compared to multithreaded BLAST software, tests performed on an 8-processor server have shown speedups ranging from 3 to 6 with a similar level of accuracy. Conclusions: A parallel algorithmic approach driven by the knowledge of the internal microprocessor architecture allows significant speedup to be obtained while preserving standard sensitivity for similarity search problems.
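    The coarse-grained (multicore) half of this strategy can be sketched as below; the chunking scheme and the placeholder `score` kernel are illustrative assumptions, not PLAST's seed-and-extend logic, and the fine-grained SSE vectorization is not reproduced.

```python
# Sketch: split a sequence bank into chunks and score a query against each chunk in parallel.
from concurrent.futures import ProcessPoolExecutor

def score(query: str, subject: str) -> int:
    # Placeholder comparison kernel: count subject positions sharing a 4-mer with the query.
    kmers = {query[i:i + 4] for i in range(len(query) - 3)}
    return sum(1 for i in range(len(subject) - 3) if subject[i:i + 4] in kmers)

def score_chunk(args):
    query, chunk = args
    return [score(query, s) for s in chunk]

def parallel_search(query: str, bank, workers: int = 4):
    chunks = [bank[i::workers] for i in range(workers)]   # round-robin split across cores
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = pool.map(score_chunk, [(query, c) for c in chunks])
    return [s for chunk_scores in results for s in chunk_scores]

if __name__ == "__main__":
    bank = ["ACGTACGTAGCT", "TTTTACGTACGA", "GGGGCCCCAAAA"] * 100
    print(max(parallel_search("ACGTACGTAGC", bank)))  # best-scoring subject in the toy bank
```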

    Numerical study of circulation on the inner Amazon Shelf

    Author Posting. © Springer, 2008. This is the author's version of the work. It is posted here by permission of Springer for personal use, not for redistribution. The definitive version was published in Ocean Dynamics 58 (2008): 187-198, doi:10.1007/s10236-008-0139-4. We studied the circulation on the coastal domain of the Amazon Shelf by applying the hydrodynamic module of the Estuarine and Coastal Ocean Model and Sediment Transport (ECOMSED). The first, barotropic, experiment aimed to explain the major bathymetric effects on tides and those generated by anisotropy in sediment distribution. We analyzed the continental shelf response to barotropic tides under a realistic bottom-stress parameterization (drag coefficient Cd), considering sediment granulometry obtained from a faciologic map in which river mud deposits and reworked-sediment areas are well distinguished, among other classes of sediments. Very low Cd values were set in the fluid mud regions off the Amapa coast (1.0 × 10⁻⁴), in contrast to values around 3.5 × 10⁻³ for coarser sediment regions off the Para coast. Three-dimensional experiments represented the Amazon River discharge and trade winds, combined with barotropic tidal influences and the vertical mixing they induce. The quasi-resonant response of the Amazon Shelf to the M2 tide acts on the local hydrodynamics by increasing tidal admittance, along with tidal forcing at the shelf break and extensive fluid mud regions. Harmonic analysis of modeled currents agreed well with analysis of the AMASSEDS observational data set. Tidally induced vertical shear provided strong homogenization of threshold waters, which are subject to a kind of hydraulic control due to the topographic steepness. Ahead of the hydraulic jump, the low-salinity plume is disconnected from the bottom and acquires negative vorticity, turning southeastward. Tides act as a generating mechanism and topography, via hydraulic control, as a maintaining mechanism for positioning the low-salinity frontal zone. The tidally induced southeastward plume path is overwhelmed by the northwestward trade winds, which, along with the background circulation, probably play the most important role in the plume's fate and variability over the Amazon Shelf
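    For orientation, the drag coefficients quoted above enter the standard quadratic bottom-stress law, tau_b = rho * Cd * |u| * u. The small calculation below evaluates it for both values; the seawater density and current speed are assumed for illustration, not taken from the paper.

```python
# Quadratic bottom stress for the two drag coefficients quoted in the abstract.
RHO = 1025.0  # seawater density, kg/m^3 (assumed)

def bottom_stress(cd: float, u: float, rho: float = RHO) -> float:
    """Quadratic bottom stress (N/m^2) for a depth-averaged current speed u (m/s)."""
    return rho * cd * abs(u) * u

u = 1.0  # m/s, an illustrative tidal current magnitude
print(bottom_stress(1.0e-4, u))  # fluid-mud region: ~0.1 N/m^2
print(bottom_stress(3.5e-3, u))  # coarser-sediment region: ~3.6 N/m^2
```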

    3-Methyl-1-butanol production in Escherichia coli: random mutagenesis and two-phase fermentation

    Interest in producing biofuels from renewable sources has escalated due to energy and environmental concerns. Recently, the production of higher-chain alcohols from 2-keto acid pathways has shown significant progress. In this paper, we demonstrate a mutagenesis approach to developing a strain of Escherichia coli for the production of 3-methyl-1-butanol by leveraging selective pressure toward L-leucine biosynthesis and screening for increased alcohol production. Random mutagenesis and selection with 4-aza-D,L-leucine, a structural analogue of L-leucine, resulted in the development of a new strain of E. coli able to produce 4.4 g/L of 3-methyl-1-butanol. Investigation of the host's sensitivity to 3-methyl-1-butanol directed the development of a two-phase fermentation process in which titers reached 9.5 g/L of 3-methyl-1-butanol with a yield of 0.11 g/g glucose after 60 h