Species-level functional profiling of metagenomes and metatranscriptomes.
Functional profiles of microbial communities are typically generated using comprehensive metagenomic or metatranscriptomic sequence read searches, which are time-consuming, prone to spurious mapping, and often limited to community-level quantification. We developed HUMAnN2, a tiered search strategy that enables fast, accurate, and species-resolved functional profiling of host-associated and environmental communities. HUMAnN2 identifies a community's known species, aligns reads to their pangenomes, performs translated search on unclassified reads, and finally quantifies gene families and pathways. Relative to pure translated search, HUMAnN2 is faster and produces more accurate gene family profiles. We applied HUMAnN2 to study clinal variation in marine metabolism, ecological contribution patterns among human microbiome pathways, variation in species' genomic versus transcriptional contributions, and strain profiling. Further, we introduce 'contributional diversity' to explain patterns of ecological assembly across different microbial community types
Middleware-based Database Replication: The Gaps between Theory and Practice
The need for high availability and performance in data management systems has
been fueling a long running interest in database replication from both academia
and industry. However, academic groups often attack replication problems in
isolation, overlooking the need for completeness in their solutions, while
commercial teams take a holistic approach that often misses opportunities for
fundamental innovation. This has created over time a gap between academic
research and industrial practice.
This paper aims to characterize the gap along three axes: performance,
availability, and administration. We build on our own experience developing and
deploying replication systems in commercial and academic settings, as well as
on a large body of prior related work. We sift through representative examples
from the last decade of open-source, academic, and commercial database
replication systems and combine this material with case studies from real
systems deployed at Fortune 500 customers. We propose two agendas, one for
academic research and one for industrial R&D, which we believe can bridge the
gap within 5-10 years. This way, we hope to both motivate and help researchers
in making the theory and practice of middleware-based database replication more
relevant to each other. Comment: 14 pages. Appears in Proc. ACM SIGMOD International Conference on
Management of Data, Vancouver, Canada, June 200
Distinct genealogies for plasmids and chromosome
An earlier perspective on the diversity of conjugative elements in microbes [1] attempted to provide a broad audience with an introductory overview of the arcane biology of mobile genetic elements and their terminologies. It might well have been entitled "Plasmids, ICEs, IMEs, and Other Mobile Elements for Dummies," but common sense prevailed. This perspective introduces two related articles in the current issue of PLOS Genetics [2,3] and might have equally aptly been entitled "Antibiotic-Resistant Plasmids and Their Epidemiology for Dummies."
A machine learning route between band mapping and band structure
The electronic band structure (BS) of solid state materials imprints the
multidimensional and multi-valued functional relations between energy and
momenta of periodically confined electrons. Photoemission spectroscopy is a
powerful tool for its comprehensive characterization. A common task in
photoemission band mapping is to recover the underlying quasiparticle
dispersion, which we call band structure reconstruction. Traditional methods
often focus on specific regions of interest yet require extensive human
oversight. To cope with the growing size and scale of photoemission data, we
develop a generic machine-learning approach leveraging the information within
electronic structure calculations for this task. We demonstrate its capability
by reconstructing all fourteen valence bands of tungsten diselenide and
validate the accuracy on various synthetic data. The reconstruction uncovers
previously inaccessible momentum-space structural information on both global
and local scales in conjunction with theory, while realizing a path towards
integrating band mapping data into materials science databases
Categorical Dimensions of Human Odor Descriptor Space Revealed by Non-Negative Matrix Factorization
In contrast to most other sensory modalities, the basic perceptual dimensions of olfaction remain unclear. Here, we use non-negative matrix factorization (NMF) – a dimensionality reduction technique – to uncover structure in a panel of odor profiles, with each odor defined as a point in multi-dimensional descriptor space. The properties of NMF are favorable for the analysis of such lexical and perceptual data, and lead to a high-dimensional account of odor space. We further provide evidence that odor dimensions apply categorically. That is, odor space is not occupied homogeneously, but rather in a discrete and intrinsically clustered manner. We discuss the potential implications of these results for the neural coding of odors, as well as for developing classifiers on larger datasets that may be useful for predicting perceptual qualities from chemical structures
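As a rough illustration of the technique named in this abstract, the sketch below factors a small non-negative odor-by-descriptor matrix into two non-negative factors using the standard Lee-Seung multiplicative updates for squared error. The toy `profiles` matrix, the rank, the iteration count, and all function names are assumptions made for illustration; none of them come from the paper, whose data and exact procedure are not given in the abstract.

```python
import random


def matmul(a, b):
    # Plain-Python matrix product of two matrices stored as lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Approximate non-negative v (n x m) as w (n x rank) times h (rank x m).

    Uses Lee-Seung multiplicative updates, which keep every entry of
    w and h non-negative as long as the starting values are positive.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # Update h: h <- h * (w^T v) / (w^T w h), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # Update w: w <- w * (v h^T) / (w h h^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


def squared_error(v, w, h):
    approx = matmul(w, h)
    return sum((a - b) ** 2 for rv, ra in zip(v, approx) for a, b in zip(rv, ra))


# Hypothetical toy data: 4 odors (rows) rated on 4 descriptors (columns).
profiles = [
    [4.0, 4.0, 0.0, 0.0],
    [2.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 3.0],
    [0.0, 0.0, 6.0, 6.0],
]

w, h = nmf(profiles, rank=2)
print("reconstruction error:", squared_error(profiles, w, h))
```

Each row of `h` can be read as one candidate dimension expressed over the descriptors, and each row of `w` as an odor's loading on those dimensions. The non-negativity constraint is what makes such parts-based readings possible.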
High Energy Physics Forum for Computational Excellence: Working Group Reports (I. Applications Software II. Software Libraries and Tools III. Systems)
Computing plays an essential role in all aspects of high energy physics. As
computational technology evolves rapidly in new directions, and data throughput
and volume continue to follow a steep trend-line, it is important for the HEP
community to develop an effective response to a series of expected challenges.
In order to help shape the desired response, the HEP Forum for Computational
Excellence (HEP-FCE) initiated a roadmap planning activity with two key
overlapping drivers -- 1) software effectiveness, and 2) infrastructure and
expertise advancement. The HEP-FCE formed three working groups, 1) Applications
Software, 2) Software Libraries and Tools, and 3) Systems (including systems
software), to provide an overview of the current status of HEP computing and to
present findings and opportunities for the desired HEP computational roadmap.
The final versions of the reports are combined in this document, and are
presented along with introductory material. Comment: 72 page
Statistical identification with hidden Markov models of large order splitting strategies in an equity market
Large trades in a financial market are usually split into smaller parts and
traded incrementally over extended periods of time. We address these large
trades as hidden orders. In order to identify and characterize hidden orders we
fit hidden Markov models to the time series of the sign of the tick by tick
inventory variation of market members of the Spanish Stock Exchange. Our
methodology probabilistically detects trading sequences, which are
characterized by a net majority of buy or sell transactions. We interpret these
patches of sequential buying or selling transactions as proxies of the traded
hidden orders. We find that the size distributions of these patches, measured in
time, volume, and number of transactions, are fat tailed. Long patches are characterized
by a high fraction of market orders and a low participation rate, while short
patches have a large fraction of limit orders and a high participation rate. We
observe the existence of a buy-sell asymmetry in the number, average length,
average fraction of market orders and average participation rate of the
detected patches. The detected asymmetry clearly depends on the local
market trend. We also compare the hidden Markov model patches with those
obtained with the segmentation method used in Vaglica et al. (2008) and
we conclude that the former can be interpreted as a partition of the
latter. Comment: 26 pages, 12 figure
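The idea of reading patches of net buying or selling off a sequence of trade signs can be sketched with a two-state hidden Markov model decoded by the Viterbi algorithm. This is a minimal sketch under stated assumptions, not the authors' method: the abstract describes fitting the models to data, whereas the start, transition, and emission probabilities below are fixed, hypothetical values, and the state and function names are invented for illustration.

```python
import math
from itertools import groupby

STATES = ("buy", "sell")

# Hypothetical parameters chosen by hand, not estimated from any data.
START = {"buy": 0.5, "sell": 0.5}
TRANS = {"buy": {"buy": 0.9, "sell": 0.1}, "sell": {"buy": 0.1, "sell": 0.9}}
# Observations are trade signs: +1 for a buy, -1 for a sell.
EMIT = {"buy": {1: 0.8, -1: 0.2}, "sell": {1: 0.2, -1: 0.8}}


def viterbi(signs):
    """Return the most likely hidden state sequence for a list of +1/-1 signs."""
    # score[s] is the log-probability of the best path that ends in state s.
    score = {s: math.log(START[s]) + math.log(EMIT[s][signs[0]]) for s in STATES}
    back = []
    for x in signs[1:]:
        prev, score, ptr = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            score[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][x])
            ptr[s] = best
        back.append(ptr)
    # Trace the best path backwards from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def patches(path):
    """Collapse a state path into (state, length) runs."""
    return [(state, len(list(run))) for state, run in groupby(path)]


print(patches(viterbi([1] * 6 + [-1] * 6)))
print(patches(viterbi([1, 1, 1, -1, 1, 1, 1])))
```

With these parameters the first sequence decodes into a buy patch of length 6 followed by a sell patch of length 6, while the single sell inside the second sequence is absorbed into one buy patch of length 7. That second case shows how a patch can be defined by a net majority of signs rather than an unbroken run.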