An operational definition of quark and gluon jets
While "quark" and "gluon" jets are often treated as separate, well-defined
objects in both theoretical and experimental contexts, no precise, practical,
and hadron-level definition of jet flavor presently exists. To remedy this
issue, we develop and advocate for a data-driven, operational definition of
quark and gluon jets that is readily applicable at colliders. Rather than
specifying a per-jet flavor label, we aggregately define quark and gluon jets
at the distribution level in terms of measured hadronic cross sections.
Intuitively, quark and gluon jets emerge as the two maximally separable
categories within two jet samples in data. Benefiting from recent work on
data-driven classifiers and topic modeling for jets, we show that the practical
tools needed to implement our definition already exist for experimental
applications. As an informative example, we demonstrate the power of our
operational definition using Z+jet and dijet samples, illustrating that pure
quark and gluon distributions and fractions can be successfully extracted in a
fully well-defined manner.
Comment: 38 pages, 10 figures, 1 table; v2: updated to match JHEP version
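The idea of "two maximally separable categories within two jet samples" can be illustrated with a small sketch. It assumes two normalized histograms of the same observable and a demixing rule in which as much of one sample as possible is subtracted from the other while every bin stays non-negative. The abstract does not state this rule explicitly; the function names, the toy histograms, and the mixing fractions below are all hypothetical.

```python
import numpy as np

def reducibility(p_a, p_b):
    """Largest kappa such that p_a - kappa * p_b is non-negative in every bin."""
    mask = p_b > 0
    return float(np.min(p_a[mask] / p_b[mask]))

def demix(p_a, p_b):
    """Return the two extracted categories and the fraction of category 1 in each sample.

    Assumes the samples are distinct mixtures (kappa < 1); identical samples
    would give kappa = 1 and a division by zero.
    """
    k_ab = reducibility(p_a, p_b)
    k_ba = reducibility(p_b, p_a)
    topic_1 = (p_a - k_ab * p_b) / (1.0 - k_ab)
    topic_2 = (p_b - k_ba * p_a) / (1.0 - k_ba)
    frac_a = (1.0 - k_ab) / (1.0 - k_ab * k_ba)  # share of topic_1 in sample A
    frac_b = k_ba * frac_a                       # share of topic_1 in sample B
    return topic_1, topic_2, frac_a, frac_b

# Toy "pure" distributions (hypothetical), each with a bin the other never populates.
quark = np.array([0.5, 0.3, 0.2, 0.0])
gluon = np.array([0.0, 0.2, 0.3, 0.5])

# Two mixed samples with different, nominally unknown, quark fractions.
sample_a = 0.8 * quark + 0.2 * gluon
sample_b = 0.3 * quark + 0.7 * gluon

topic_1, topic_2, frac_a, frac_b = demix(sample_a, sample_b)
```

In this toy setup the extraction is exact only because each pure distribution has a bin where the other vanishes; with real, finite-statistics histograms the minimum ratio would have to be estimated with care.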
Bayesian learning of joint distributions of objects
There is increasing interest in broad application areas in defining flexible
joint models for data having a variety of measurement scales, while also
allowing data of complex types, such as functions, images and documents. We
consider a general framework for nonparametric Bayes joint modeling through
mixture models that incorporate dependence across data types through a joint
mixing measure. The mixing measure is assigned a novel infinite tensor
factorization (ITF) prior that allows flexible dependence in cluster allocation
across data types. The ITF prior is formulated as a tensor product of
stick-breaking processes. Focusing on a convenient special case corresponding
to a Parafac factorization, we provide basic theory justifying the flexibility
of the proposed prior and resulting asymptotic properties. Focusing on ITF
mixtures of product kernels, we develop a new Gibbs sampling algorithm for
routine implementation relying on slice sampling. The methods are compared with
alternative joint mixture models based on Dirichlet processes and related
approaches through simulations and real data applications.
Comment: Appearing in Proceedings of the 16th International Conference on
Artificial Intelligence and Statistics (AISTATS) 2013, Scottsdale, AZ, USA
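As a rough illustration of the ingredients named in the abstract, the sketch below draws truncated stick-breaking weights and combines them in a Parafac-style sum to obtain a joint cluster-allocation table for two data types. The truncation levels, the Beta(1, alpha) sticks, and the exact way the factors are combined are assumptions made for this example; the abstract does not give the ITF construction in detail.

```python
import numpy as np

def stick_breaking(alpha, k, rng):
    """Truncated stick-breaking weights: w_i = v_i * prod_{j<i} (1 - v_j)."""
    v = rng.beta(1.0, alpha, size=k)
    v[-1] = 1.0  # truncation: the last stick takes all remaining mass
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

rng = np.random.default_rng(0)
n_components = 5  # hypothetical truncation level for the top-level weights

# Top-level weights over latent components.
lam = stick_breaking(1.0, n_components, rng)

# For each component, separate weights over clusters of each data type.
psi_1 = np.stack([stick_breaking(1.0, 4, rng) for _ in range(n_components)])
psi_2 = np.stack([stick_breaking(1.0, 3, rng) for _ in range(n_components)])

# Parafac-style combination: joint[a, b] = sum_h lam[h] * psi_1[h, a] * psi_2[h, b].
joint = np.einsum("h,ha,hb->ab", lam, psi_1, psi_2)
```

With a single component the table factorizes and the two data types cluster independently; additional components are what introduce dependence in cluster allocation across data types.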
Learning Topic Models and Latent Bayesian Networks Under Expansion Constraints
Unsupervised estimation of latent variable models is a fundamental problem
central to numerous applications of machine learning and statistics. This work
presents a principled approach for estimating broad classes of such models,
including probabilistic topic models and latent linear Bayesian networks, using
only second-order observed moments. The sufficient conditions for
identifiability of these models are primarily based on weak expansion
constraints on the topic-word matrix, for topic models, and on the directed
acyclic graph, for Bayesian networks. Because no assumptions are made on the
distribution among the latent variables, the approach can handle arbitrary
correlations among the topics or latent factors. In addition, a tractable
learning method via optimization is proposed and studied in numerical
experiments.
Comment: 38 pages, 6 figures, 2 tables, applications in topic models and
Bayesian networks are studied. Simulation section is added
Modeling Documents as Mixtures of Persons for Expert Finding
In this paper we address the problem of searching for knowledgeable
persons within the enterprise, known as the expert finding (or
expert search) task. We present a probabilistic algorithm based on the assumption
that the terms in a document are produced by the people mentioned
in it. We represent the documents retrieved for a query as mixtures
of the candidate experts' language models. We propose two methods for extracting
personal language models, as well as a way of combining
them with other evidence of expertise. Experiments conducted with the
TREC Enterprise collection demonstrate the superiority of our approach
over the best existing solution.
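To make the phrase "mixtures of candidate experts' language models" concrete, here is a minimal sketch that estimates the mixture weights of one document by expectation-maximization while holding each candidate's language model fixed. This is a generic mixture-fitting routine chosen for illustration, not the paper's algorithm; the vocabulary, the two candidate models, and the term counts are hypothetical.

```python
import numpy as np

def mixture_weights(counts, lang_models, n_iter=50):
    """Fit P(w | d) = sum_e pi[e] * P(w | e) for fixed language models.

    counts:      term counts of the document, shape (V,)
    lang_models: one row per candidate, each row a distribution over V terms
    """
    n_experts = lang_models.shape[0]
    pi = np.full(n_experts, 1.0 / n_experts)  # start from uniform weights
    for _ in range(n_iter):
        # E-step: probability that each term occurrence came from each candidate.
        joint = pi[:, None] * lang_models
        denom = joint.sum(axis=0)
        resp = np.divide(joint, denom, out=np.zeros_like(joint), where=denom > 0)
        # M-step: re-estimate weights from the expected term assignments.
        pi = (resp * counts).sum(axis=1) / counts.sum()
    return pi

# Two hypothetical candidates over a four-term vocabulary.
lang_models = np.array([
    [0.7, 0.1, 0.1, 0.1],  # candidate A favors term 0
    [0.1, 0.1, 0.1, 0.7],  # candidate B favors term 3
])
counts = np.array([8.0, 1.0, 1.0, 2.0])  # the document mostly uses term 0

pi = mixture_weights(counts, lang_models)
```

The resulting weights can be read as how strongly the document's wording is attributed to each candidate, which is one plausible input to an expertise score.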
On the Topic of Jets: Disentangling Quarks and Gluons at Colliders
We introduce jet topics: a framework to identify underlying classes of jets
from collider data. Because of a close mathematical relationship between
distributions of observables in jets and emergent themes in sets of documents,
we can apply recent techniques in "topic modeling" to extract jet topics from
data with minimal or no input from simulation or theory. As a proof of concept
with parton shower samples, we apply jet topics to determine separate quark and
gluon jet distributions for constituent multiplicity. We also determine
separate quark and gluon rapidity spectra from a mixed Z-plus-jet sample. While
jet topics are defined directly from hadron-level multi-differential cross
sections, one can also predict jet topics from first-principles theoretical
calculations, with potential implications for how to define quark and gluon
jets beyond leading-logarithmic accuracy. These investigations suggest that jet
topics will be useful for extracting underlying jet distributions and fractions
in a wide range of contexts at the Large Hadron Collider.
Comment: 8 pages, 4 figures, 1 table. v2: Improved discussion to match PRL
version