    Species abundance information improves sequence taxonomy classification accuracy.

    Popular naive Bayes taxonomic classifiers for amplicon sequences assume that all species in the reference database are equally likely to be observed. We demonstrate that classification accuracy degrades linearly with the degree to which that assumption is violated, and in practice it is always violated. By incorporating environment-specific taxonomic abundance information, we demonstrate a significant increase in the species-level classification accuracy across common sample types. At the species level, overall average error rates decline from 25% to 14%, which is favourably comparable to the error rates that existing classifiers achieve at the genus level (16%). Our findings indicate that for most practical purposes, the assumption that reference species are equally likely to be observed is untenable. q2-clawback provides a straightforward alternative for samples from common environments
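The effect of the uniform-prior assumption described in this abstract can be illustrated with a toy k-mer naive Bayes classifier. This is a minimal sketch, not the q2-clawback implementation: the function names (`kmers`, `train`, `classify`), the reference sequences, and the prior values are all hypothetical and chosen only to show how a class prior enters the score.

```python
import math
from collections import Counter


def kmers(seq, k=3):
    """Return the overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(reference, k=3):
    """Per-species k-mer log-likelihoods with add-one smoothing."""
    vocab = sorted({km for seq in reference.values() for km in kmers(seq, k)})
    model = {}
    for species, seq in reference.items():
        counts = Counter(kmers(seq, k))
        total = sum(counts.values()) + len(vocab)
        model[species] = {km: math.log((counts[km] + 1) / total) for km in vocab}
    return model


def classify(query, model, priors, k=3):
    """Pick the species maximising log prior + summed k-mer log-likelihoods."""
    scores = {}
    for species, loglik in model.items():
        score = math.log(priors[species])
        for km in kmers(query, k):
            if km in loglik:  # k-mers unseen in every reference are skipped
                score += loglik[km]
        scores[species] = score
    return max(scores, key=scores.get)


# Hypothetical reference sequences and an ambiguous query.
reference = {"species_a": "ACGTACGTAA", "species_b": "ACGTACGTCC"}
model = train(reference)
query = "ACGTACG"

# Uniform priors: every reference species is assumed equally likely.
uniform = {"species_a": 0.5, "species_b": 0.5}
# Illustrative environment-specific priors (made-up abundances).
weighted = {"species_a": 0.1, "species_b": 0.9}

print(classify(query, model, uniform))
print(classify(query, model, weighted))
```

When the likelihoods of two species are close, as they are for this query, the prior decides the call, which is why abundance information matters most for closely related species.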

    Gauge-Invariant Resummation Formalism and Unitarity in Non-Commutative QED

    We re-examine the perturbative properties of four-dimensional non-commutative QED by extending the pinch techniques to the theta-deformed case. The explicit independence of the pinched gluon self-energy from gauge-fixing parameters, and the absence of unphysical thresholds in the resummed propagators permit a complete check of the optical theorem for the off-shell two-point function. The known anomalous (tachyonic) dispersion relations are recovered within this framework, as well as their improved version in the (softly broken) SUSY case. These applications should be considered as a first step in constructing gauge-invariant truncations of the Schwinger-Dyson equations in the non-commutative case. An interesting result of our formalism appears when considering the theory in two dimensions: we observe a finite gauge-invariant contribution to the photon mass because of a novel incarnation of IR/UV mixing, which survives the commutative limit when matter is present. Comment: 30 pages, 2 eps figures, uses axodraw. Citations adde

    Level-rank duality of the U(N) WZW model, Chern-Simons theory, and 2d qYM theory

    We study the WZW, Chern-Simons, and 2d qYM theories with gauge group U(N). The U(N) WZW model is only well-defined for odd level K, and this model is shown to exhibit level-rank duality in a much simpler form than that for SU(N). The U(N) Chern-Simons theory on Seifert manifolds exhibits a similar duality, distinct from the level-rank duality of SU(N) Chern-Simons theory on S^3. When q = e^{2 pi i/(N+K)}, the observables of the 2d U(N) qYM theory can be expressed as a sum over a finite subset of U(N) representations. When N and K are odd, the qYM theory exhibits N ↔ K duality, provided q = e^{2 pi i/(N+K)} and theta = 0 mod 2 pi/(N+K). Comment: 19 pages; v2: minor typo corrected, 1 paragraph added, published versio

    On the first Gaussian map for Prym-canonical line bundles

    We prove by degeneration to Prym-canonical binary curves that the first Gaussian map of the Prym-canonical line bundle $\omega_C \otimes A$ is surjective for the general point [C,A] of R_g if g > 11, while it is injective if g < 12. Comment: To appear in Geometriae Dedicata. arXiv admin note: text overlap with arXiv:1105.447

    SitePainter: a tool for exploring biogeographical patterns

    As microbial ecologists take advantage of high-throughput analytical techniques to describe microbial communities across ever-increasing numbers of samples, the need for new analysis tools that reveal the intrinsic spatial patterns and structures of these populations is crucial. Here we present SitePainter, an interactive graphical tool that allows investigators to create or upload pictures of their study site, load diversity analyses data and display both diversity and taxonomy results in a spatial context. Features of SitePainter include: visualizing α-diversity, using taxonomic summaries; visualizing β-diversity, using results from multidimensional scaling methods; and animating relationships among microbial taxa or pathways over time. SitePainter thus increases the visual power and ability to explore spatially explicit studies

    Geography and Location Are the Primary Drivers of Office Microbiome Composition.

    In the United States, humans spend the majority of their time indoors, where they are exposed to the microbiome of the built environment (BE) they inhabit. Despite the ubiquity of microbes in BEs and their potential impacts on health and building materials, basic questions about the microbiology of these environments remain unanswered. We present a study on the impacts of geography, material type, human interaction, location in a room, seasonal variation, and indoor and microenvironmental parameters on bacterial communities in offices. Our data elucidate several important features of microbial communities in BEs. First, under normal office environmental conditions, bacterial communities do not differ on the basis of surface material (e.g., ceiling tile or carpet) but do differ on the basis of the location in a room (e.g., ceiling or floor), two features that are often conflated but that we are able to separate here. We suspect that previous work showing differences in bacterial composition with surface material was likely detecting differences based on different usage patterns. Next, we find that offices have city-specific bacterial communities, such that we can accurately predict which city an office microbiome sample is derived from, but office-specific bacterial communities are less apparent. This differs from previous work, which has suggested office-specific compositions of bacterial communities. We again suspect that the difference from prior work arises from different usage patterns. As has been previously shown, we observe that human skin contributes heavily to the composition of BE surfaces. IMPORTANCE Our study highlights several points that should impact the design of future studies of the microbiology of BEs. First, projects tracking changes in BE bacterial communities should focus sampling efforts on surveying different locations in offices and in different cities but not necessarily different materials or different offices in the same city. 
Next, disturbance due to repeated sampling, though detectable, is small compared to that due to other variables, opening up a range of longitudinal study designs in the BE. Next, studies requiring more samples than can be sequenced on a single sequencing run (which is increasingly common) must control for run effects by including some of the same samples in all of the sequencing runs as technical replicates. Finally, detailed tracking of indoor and material environment covariates is likely not essential for BE microbiome studies, as the normal range of indoor environmental conditions is likely not large enough to impact bacterial communities

    The large-scale blast score ratio (LS-BSR) pipeline: a method to rapidly compare genetic content between bacterial genomes

    Background. As whole genome sequence data from bacterial isolates becomes cheaper to generate, computational methods are needed to correlate sequence data with biological observations. Here we present the large-scale BLAST score ratio (LS-BSR) pipeline, which rapidly compares the genetic content of hundreds to thousands of bacterial genomes, and returns a matrix that describes the relatedness of all coding sequences (CDSs) in all genomes surveyed. This matrix can be easily parsed in order to identify genetic relationships between bacterial genomes. Although pipelines have been published that group peptides by sequence similarity, no other software performs the rapid, large-scale, full-genome comparative analyses carried out by LS-BSR. Results. To demonstrate the utility of the method, the LS-BSR pipeline was tested on 96 Escherichia coli and Shigella genomes; the pipeline ran in 163 min using 16 processors, which is a greater than 7-fold speedup compared to using a single processor. The BSR values for each CDS, which indicate a relative level of relatedness, were then mapped to each genome on an independent core genome single nucleotide polymorphism (SNP) based phylogeny. Comparisons were then used to identify clade specific CDS markers and validate the LS-BSR pipeline based on molecular markers that delineate between classical E. coli pathogenic variant (pathovar) designations. Scalability tests demonstrated that the LS-BSR pipeline can process 1,000 E. coli genomes in 27-57 h, depending upon the alignment method, using 16 processors. Conclusions. LS-BSR is an open-source, parallel implementation of the BSR algorithm, enabling rapid comparison of the genetic content of large numbers of genomes. The results of the pipeline can be used to identify specific markers between user-defined phylogenetic groups, and to identify the loss and/or acquisition of genetic information between bacterial isolates. 
Taxa-specific genetic markers can then be translated into clinical diagnostics, or can be used to identify broadly conserved putative therapeutic candidates
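The BLAST score ratio underlying this pipeline is, in its usual definition, the bit score of a CDS aligned against a target genome divided by the bit score of that CDS aligned against itself. The sketch below builds a small BSR matrix and picks group-specific markers from it. It is an illustration under stated assumptions rather than the LS-BSR code: the function names, the input layout, the bit scores, and the 0.8/0.4 thresholds are all hypothetical.

```python
def blast_score_ratios(self_scores, genome_scores):
    """Build a CDS x genome matrix of BSR values.

    self_scores:   {cds: bit score of the CDS aligned against itself}
    genome_scores: {genome: {cds: best bit score of the CDS in that genome}}
    A CDS with no hit in a genome gets a BSR of 0.0.
    """
    matrix = {}
    for cds, reference_score in self_scores.items():
        if reference_score <= 0:
            raise ValueError(f"self score for {cds} must be positive")
        matrix[cds] = {
            genome: hits.get(cds, 0.0) / reference_score
            for genome, hits in genome_scores.items()
        }
    return matrix


def group_markers(matrix, group, present=0.8, absent=0.4):
    """Return CDSs conserved in every genome of `group` and absent elsewhere.

    The `present` and `absent` thresholds are illustrative assumptions.
    """
    markers = []
    for cds, row in matrix.items():
        inside = [bsr for genome, bsr in row.items() if genome in group]
        outside = [bsr for genome, bsr in row.items() if genome not in group]
        if inside and all(b >= present for b in inside) and all(b < absent for b in outside):
            markers.append(cds)
    return sorted(markers)


# Made-up bit scores for three CDSs in two genomes.
self_scores = {"cds1": 200.0, "cds2": 100.0, "cds3": 150.0}
genome_scores = {
    "genome_1": {"cds1": 200.0, "cds2": 90.0, "cds3": 150.0},
    "genome_2": {"cds1": 50.0, "cds3": 140.0},
}

matrix = blast_score_ratios(self_scores, genome_scores)
print(group_markers(matrix, group={"genome_1"}))
```

A BSR near 1 indicates a CDS that is highly conserved in the target genome, and a value near 0 indicates one that is absent or highly divergent, so filtering the matrix by group is one simple way to look for clade-specific markers.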

    mockrobiota: a Public Resource for Microbiome Bioinformatics Benchmarking.

    Mock communities are an important tool for validating, optimizing, and comparing bioinformatics methods for microbial community analysis. We present mockrobiota, a public resource for sharing, validating, and documenting mock community data resources, available at http://caporaso-lab.github.io/mockrobiota/. The materials contained in mockrobiota include data set and sample metadata, expected composition data (taxonomy or gene annotations or reference sequences for mock community members), and links to raw data (e.g., raw sequence data) for each mock community data set. mockrobiota does not supply physical sample materials directly, but the data set metadata included for each mock community indicate whether physical sample materials are available. At the time of this writing, mockrobiota contains 11 mock community data sets with known species compositions, including bacterial, archaeal, and eukaryotic mock communities, analyzed by high-throughput marker gene sequencing. IMPORTANCE The availability of standard and public mock community data will facilitate ongoing method optimizations, comparisons across studies that share source data, and greater transparency and access and eliminate redundancy. These are also valuable resources for bioinformatics teaching and training. This dynamic resource is intended to expand and evolve to meet the changing needs of the omics community