51,682 research outputs found

    Species-level functional profiling of metagenomes and metatranscriptomes.

    Get PDF
    Functional profiles of microbial communities are typically generated using comprehensive metagenomic or metatranscriptomic sequence read searches, which are time-consuming, prone to spurious mapping, and often limited to community-level quantification. We developed HUMAnN2, a tiered search strategy that enables fast, accurate, and species-resolved functional profiling of host-associated and environmental communities. HUMAnN2 identifies a community's known species, aligns reads to their pangenomes, performs translated search on unclassified reads, and finally quantifies gene families and pathways. Relative to pure translated search, HUMAnN2 is faster and produces more accurate gene family profiles. We applied HUMAnN2 to study clinal variation in marine metabolism, ecological contribution patterns among human microbiome pathways, variation in species' genomic versus transcriptional contributions, and strain profiling. Further, we introduce 'contributional diversity' to explain patterns of ecological assembly across different microbial community types

    Middleware-based Database Replication: The Gaps between Theory and Practice

    Get PDF
    The need for high availability and performance in data management systems has been fueling a long running interest in database replication from both academia and industry. However, academic groups often attack replication problems in isolation, overlooking the need for completeness in their solutions, while commercial teams take a holistic approach that often misses opportunities for fundamental innovation. This has created over time a gap between academic research and industrial practice. This paper aims to characterize the gap along three axes: performance, availability, and administration. We build on our own experience developing and deploying replication systems in commercial and academic settings, as well as on a large body of prior related work. We sift through representative examples from the last decade of open-source, academic, and commercial database replication systems and combine this material with case studies from real systems deployed at Fortune 500 customers. We propose two agendas, one for academic research and one for industrial R&D, which we believe can bridge the gap within 5-10 years. This way, we hope to both motivate and help researchers in making the theory and practice of middleware-based database replication more relevant to each other.Comment: 14 pages. Appears in Proc. ACM SIGMOD International Conference on Management of Data, Vancouver, Canada, June 200

    Distinct genealogies for plasmids and chromosome

    Get PDF
    An earlier perspective on the diversity of conjugative elements in microbes [1] attempted to provide a broad audience with an introductory overview of the arcane biology of mobile genetic elements and their terminologies. It might well have been entitled "Plasmids, ICEs, IMEs, and Other Mobile Elements for Dummies," but common sense prevailed. This perspective introduces two related articles in the current issue of PLOS Genetics [2,3] and might have equally aptly been entitled "Antibiotic-Resistant Plasmids and Their Epidemiology for Dummies.

    A machine learning route between band mapping and band structure

    Get PDF
    The electronic band structure (BS) of solid state materials imprints the multidimensional and multi-valued functional relations between energy and momenta of periodically confined electrons. Photoemission spectroscopy is a powerful tool for its comprehensive characterization. A common task in photoemission band mapping is to recover the underlying quasiparticle dispersion, which we call band structure reconstruction. Traditional methods often focus on specific regions of interests yet require extensive human oversight. To cope with the growing size and scale of photoemission data, we develop a generic machine-learning approach leveraging the information within electronic structure calculations for this task. We demonstrate its capability by reconstructing all fourteen valence bands of tungsten diselenide and validate the accuracy on various synthetic data. The reconstruction uncovers previously inaccessible momentum-space structural information on both global and local scales in conjunction with theory, while realizing a path towards integrating band mapping data into materials science databases

    Categorical Dimensions of Human Odor Descriptor Space Revealed by Non-Negative Matrix Factorization

    Get PDF
    In contrast to most other sensory modalities, the basic perceptual dimensions of olfaction remain unclear. Here, we use non-negative matrix factorization (NMF) – a dimensionality reduction technique – to uncover structure in a panel of odor profiles, with each odor defined as a point in multi-dimensional descriptor space. The properties of NMF are favorable for the analysis of such lexical and perceptual data, and lead to a high-dimensional account of odor space. We further provide evidence that odor dimensions apply categorically. That is, odor space is not occupied homogenously, but rather in a discrete and intrinsically clustered manner. We discuss the potential implications of these results for the neural coding of odors, as well as for developing classifiers on larger datasets that may be useful for predicting perceptual qualities from chemical structures

    High Energy Physics Forum for Computational Excellence: Working Group Reports (I. Applications Software II. Software Libraries and Tools III. Systems)

    Full text link
    Computing plays an essential role in all aspects of high energy physics. As computational technology evolves rapidly in new directions, and data throughput and volume continue to follow a steep trend-line, it is important for the HEP community to develop an effective response to a series of expected challenges. In order to help shape the desired response, the HEP Forum for Computational Excellence (HEP-FCE) initiated a roadmap planning activity with two key overlapping drivers -- 1) software effectiveness, and 2) infrastructure and expertise advancement. The HEP-FCE formed three working groups, 1) Applications Software, 2) Software Libraries and Tools, and 3) Systems (including systems software), to provide an overview of the current status of HEP computing and to present findings and opportunities for the desired HEP computational roadmap. The final versions of the reports are combined in this document, and are presented along with introductory material.Comment: 72 page

    Statistical identification with hidden Markov models of large order splitting strategies in an equity market

    Full text link
    Large trades in a financial market are usually split into smaller parts and traded incrementally over extended periods of time. We address these large trades as hidden orders. In order to identify and characterize hidden orders we fit hidden Markov models to the time series of the sign of the tick by tick inventory variation of market members of the Spanish Stock Exchange. Our methodology probabilistically detects trading sequences, which are characterized by a net majority of buy or sell transactions. We interpret these patches of sequential buying or selling transactions as proxies of the traded hidden orders. We find that the time, volume and number of transactions size distributions of these patches are fat tailed. Long patches are characterized by a high fraction of market orders and a low participation rate, while short patches have a large fraction of limit orders and a high participation rate. We observe the existence of a buy-sell asymmetry in the number, average length, average fraction of market orders and average participation rate of the detected patches. The detected asymmetry is clearly depending on the local market trend. We also compare the hidden Markov models patches with those obtained with the segmentation method used in Vaglica {\it et al.} (2008) and we conclude that the former ones can be interpreted as a partition of the latter ones.Comment: 26 pages, 12 figure
    • …
    corecore