Expressive Monotonic Neural Networks
The monotonic dependence of the outputs of a neural network on some of its
inputs is a crucial inductive bias in many scenarios where domain knowledge
dictates such behavior. This is especially important for interpretability and
fairness considerations. In a broader context, scenarios in which monotonicity
is important can be found in finance, medicine, physics, and other disciplines.
It is thus desirable to build neural network architectures that implement this
inductive bias provably. In this work, we propose a weight-constrained
architecture with a single residual connection to achieve exact monotonic
dependence in any subset of the inputs. The weight constraint scheme directly
controls the Lipschitz constant of the neural network and thus provides the
additional benefit of robustness. Compared with existing techniques for
enforcing monotonicity, our method is simpler in implementation and in its
theoretical foundations, has negligible computational overhead, is guaranteed
to produce monotonic dependence, and is highly expressive. We show how the algorithm is
used to train powerful, robust, and interpretable discriminators that achieve
competitive performance compared to current state-of-the-art methods across
various benchmarks, from social applications to the classification of the
decays of subatomic particles produced at the CERN Large Hadron Collider. Comment: 9 pages, 4 figures, ICLR 2023 final submission
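A minimal sketch of the idea the abstract describes: bound each layer's weight norm so the network g has Lipschitz constant at most lam, then add a single residual term lam times the sum of the monotonic inputs, so the derivative with respect to each such input is lam + dg/dx_i >= 0. The class name, norm choice, and ReLU activation are illustrative assumptions, not the authors' exact scheme.

```python
import torch
import torch.nn as nn

class MonotonicLipschitzNet(nn.Module):
    # Weight-constrained MLP g, Lipschitz-bounded by lam w.r.t. the
    # l-inf norm (so |dg/dx_i| <= lam for every input), plus one residual
    # term lam * sum of the monotonic inputs, making the output provably
    # non-decreasing in those inputs.
    def __init__(self, n_in, monotonic_idx, lam=1.0, hidden=64, n_layers=3):
        super().__init__()
        self.monotonic_idx = list(monotonic_idx)
        self.lam = lam
        dims = [n_in] + [hidden] * (n_layers - 1) + [1]
        self.layers = nn.ModuleList(
            nn.Linear(d_in, d_out) for d_in, d_out in zip(dims[:-1], dims[1:])
        )
        # Give each layer an equal share of the total Lipschitz budget.
        self.budget = lam ** (1.0 / len(self.layers))

    @torch.no_grad()
    def project_weights_(self):
        # Rescale any weight matrix whose l-inf operator norm
        # (max absolute row sum) exceeds the per-layer budget.
        for layer in self.layers:
            norm = layer.weight.abs().sum(dim=1).max()
            if norm > self.budget:
                layer.weight.mul_(self.budget / norm)

    def forward(self, x):
        h = x
        for layer in self.layers[:-1]:
            h = torch.relu(layer(h))  # 1-Lipschitz activation
        g = self.layers[-1](h)
        mono = x[:, self.monotonic_idx].sum(dim=1, keepdim=True)
        return g + self.lam * mono  # monotonic in the selected inputs
```

In training, project_weights_() would be called after each optimizer step to re-impose the constraint; the paper's construction uses the same residual idea but its own weight-norm scheme and more expressive activations.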
Finding NEEMo: Geometric Fitting using Neural Estimation of the Energy Mover's Distance
A novel neural architecture was recently developed that enforces an exact
upper bound on the Lipschitz constant of the model by constraining the norm of
its weights in a minimal way, resulting in higher expressiveness compared to
other techniques. We present a new and interesting direction for this
architecture: estimation of the Wasserstein metric (Earth Mover's Distance) in
optimal transport by employing the Kantorovich-Rubinstein duality to enable its
use in geometric fitting applications. Specifically, we focus on the field of
high-energy particle physics, where it has been shown that a metric for the
space of particle-collider events can be defined based on the Wasserstein
metric, referred to as the Energy Mover's Distance (EMD). This metrization has
the potential to revolutionize data-driven collider phenomenology. The work
presented here represents a major step towards realizing this goal by providing
a differentiable way of directly calculating the EMD. We show how the
flexibility that our approach enables can be used to develop novel clustering
algorithms. Comment: 5 pages, 4 figures
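To make the duality concrete, here is a hedged sketch of how a 1-Lipschitz critic could estimate the EMD between two weighted point clouds; the function names are hypothetical, and the critic could be the MonotonicLipschitzNet sketched above with lam=1.0 and monotonic_idx=[] so the residual term vanishes.

```python
import torch

def kr_emd_estimate(critic, x_p, w_p, x_q, w_q):
    # Kantorovich-Rubinstein duality: EMD(P, Q) = sup over 1-Lipschitz f
    # of E_P[f] - E_Q[f]; any feasible critic yields a lower bound.
    # w_p and w_q are nonnegative event weights, each summing to 1.
    f_p = critic(x_p).squeeze(-1)
    f_q = critic(x_q).squeeze(-1)
    return (w_p * f_p).sum() - (w_q * f_q).sum()

def fit_critic(critic, opt, x_p, w_p, x_q, w_q, steps=1000):
    # Gradient ascent on the dual objective, re-projecting the weights
    # after every step so the Lipschitz bound keeps holding.
    for _ in range(steps):
        loss = -kr_emd_estimate(critic, x_p, w_p, x_q, w_q)
        opt.zero_grad()
        loss.backward()
        opt.step()
        critic.project_weights_()
    return kr_emd_estimate(critic, x_p, w_p, x_q, w_q).item()
```

Because the estimate is differentiable in the point-cloud coordinates, it can also be back-propagated into geometric parameters, which is what enables the fitting and clustering applications.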
DiSK: A Diffusion Model for Structured Knowledge
Structured (dictionary-like) data presents challenges for left-to-right
language models, as they can struggle with structured entities for a wide
variety of reasons such as formatting and sensitivity to the order in which
attributes are presented. Tabular generative models suffer from a different set
of limitations such as their lack of flexibility. We introduce Diffusion Models
of Structured Knowledge (DiSK), a new architecture and training approach
specialized for structured data. DiSK handles text, categorical, and continuous
numerical data using a Gaussian mixture model approach, which allows for
improved precision when dealing with numbers. It employs diffusion training to
model relationships between properties. Experiments demonstrate DiSK's
state-of-the-art performance on tabular data modeling, synthesis, and
imputation on over 15 datasets across diverse domains. DiSK provides an
effective inductive bias for generative modeling and manipulation of structured
data. The techniques we propose could open the door to improved knowledge
manipulation in future language models. Comment: 24 pages, 12 figures
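The Gaussian-mixture treatment of numbers mentioned above can be illustrated with a small output head; everything here (names, component count, parameterization) is a hypothetical sketch rather than DiSK's actual architecture.

```python
import torch
import torch.nn as nn

class GMMHead(nn.Module):
    # Mixture-of-Gaussians likelihood for one continuous attribute: the
    # model emits mixture logits, means, and log-scales, which can
    # represent a numeric value more precisely than digit tokens.
    def __init__(self, d_model, n_components=8):
        super().__init__()
        self.proj = nn.Linear(d_model, 3 * n_components)

    def nll(self, h, y):
        # h: (batch, d_model) hidden states; y: (batch,) numeric targets.
        logits, mu, log_sigma = self.proj(h).chunk(3, dim=-1)
        comp = torch.distributions.Normal(mu, log_sigma.exp())
        log_p = comp.log_prob(y.unsqueeze(-1))        # per-component density
        mix = torch.log_softmax(logits, dim=-1)       # mixture weights
        return -torch.logsumexp(mix + log_p, dim=-1)  # (batch,) NLL
```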
NuCLR: Nuclear Co-Learned Representations
We introduce Nuclear Co-Learned Representations (NuCLR), a deep learning
model that predicts various nuclear observables, including binding and decay
energies, and nuclear charge radii. The model is trained using a multi-task
approach with shared representations and obtains state-of-the-art performance,
achieving levels of precision that are crucial for understanding fundamental
phenomena in nuclear (astro)physics. We also report an intriguing finding: the
representation learned by NuCLR prominently exhibits crucial aspects of the
nuclear shell model, namely the shell structure, including the well-known magic
numbers, and the Pauli exclusion principle. This suggests that
the model is capable of capturing the underlying physical principles and that
our approach has the potential to offer valuable insights into nuclear theory. Comment: 5 pages, 3 figures
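A hedged sketch of what a multi-task, shared-representation setup like this could look like; the sizes, task names, and (Z, N) embedding scheme are assumptions for illustration, not NuCLR's published architecture.

```python
import torch
import torch.nn as nn

class SharedNuclearModel(nn.Module):
    # One co-learned trunk maps proton number Z and neutron number N to a
    # shared representation; small per-observable heads read it out.
    def __init__(self, max_z=120, max_n=180, d=64,
                 tasks=("binding_energy", "decay_energy", "charge_radius")):
        super().__init__()
        self.z_emb = nn.Embedding(max_z, d)
        self.n_emb = nn.Embedding(max_n, d)
        self.trunk = nn.Sequential(
            nn.Linear(2 * d, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.heads = nn.ModuleDict({t: nn.Linear(256, 1) for t in tasks})

    def forward(self, z, n, task):
        # z, n: (batch,) integer tensors; task: one of the head names.
        h = torch.cat([self.z_emb(z), self.n_emb(n)], dim=-1)
        return self.heads[task](self.trunk(h))
```

Training would sum the regression losses across tasks; it is the learned z_emb and n_emb tables that one would then inspect for shell-model structure such as magic numbers.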
The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More
Today's best language models still struggle with hallucinations: factually
incorrect generations, which impede their ability to reliably retrieve
information seen during training. The reversal curse, where models cannot
recall information when probed in a different order than was encountered during
training, exemplifies this in information retrieval. We reframe the reversal
curse as a factorization curse: a failure of models to learn the same joint
distribution under different factorizations. Through a series of controlled
experiments with increasing levels of realism, including WikiReversal, a setting
we introduce to closely simulate a knowledge-intensive finetuning task, we find
that the factorization curse is an inherent failure of the next-token
prediction objective used in popular large language models. Moreover, we
demonstrate that reliable information retrieval cannot be solved with scale,
reversed tokens, or even naive bidirectional-attention training. Consequently,
various approaches to finetuning on specialized data would necessarily provide
mixed results on downstream tasks, unless the model has already seen the right
sequence of tokens. Across five tasks of varying levels of complexity, our
results uncover a promising path forward: factorization-agnostic objectives can
significantly mitigate the reversal curse and hint at improved knowledge
storage and planning capabilities. Comment: 18 pages, 7 figures
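One concrete instance of a factorization-agnostic objective, sketched under assumptions: mask a uniformly random fraction of positions and predict them from the remainder, so the model is trained on many factorizations of the same joint distribution rather than only left-to-right. This illustrates the idea, not necessarily the paper's exact training recipe.

```python
import torch
import torch.nn.functional as F

def any_order_loss(model, tokens, mask_id):
    # tokens: (batch, seq) integer ids; model returns (batch, seq, vocab).
    b, t = tokens.shape
    rate = torch.rand(b, 1, device=tokens.device)         # masking rate ~ U(0, 1)
    mask = torch.rand(b, t, device=tokens.device) < rate  # positions to hide
    if not mask.any():
        mask[0, 0] = True                                 # avoid an empty target set
    inputs = tokens.masked_fill(mask, mask_id)
    logits = model(inputs)
    # Score only the hidden positions: predict any subset from the rest.
    return F.cross_entropy(logits[mask], tokens[mask])
```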
From Neurons to Neutrons: A Case Study in Interpretability
Mechanistic Interpretability (MI) promises a path toward fully understanding
how neural networks make their predictions. Prior work demonstrates that even
when trained to perform simple arithmetic, models can implement a variety of
algorithms (sometimes concurrently) depending on initialization and
hyperparameters. Does this mean neuron-level interpretability techniques have
limited applicability? We argue that high-dimensional neural networks can learn
low-dimensional representations of their training data that are useful beyond
simply making good predictions. Such representations can be understood through
the mechanistic interpretability lens and provide insights that are
surprisingly faithful to human-derived domain knowledge. This indicates that
such approaches to interpretability can be useful for deriving a new
understanding of a problem from models trained to solve it. As a case study, we
extract nuclear physics concepts by studying models trained to reproduce
nuclear data. Comment: International Conference on Machine Learning (ICML) 2024
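A minimal sketch of the kind of analysis this implies: project per-nucleus embeddings onto their top principal components and look for known physics (shell structure, magic numbers) in the resulting coordinates. The helper is generic illustration, not code from the paper.

```python
import numpy as np

def principal_components(embeddings, k=2):
    # embeddings: (n_nuclei, d) learned representations, one per nucleus.
    x = embeddings - embeddings.mean(axis=0)          # center the data
    _, _, vt = np.linalg.svd(x, full_matrices=False)  # principal directions
    return x @ vt[:k].T                               # (n_nuclei, k) projection

# Plotting the first component against neutron number N is the sort of view
# in which shell-like discontinuities at magic numbers could show up.
```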
Machine learning discovery of cost-efficient dry cooler designs for concentrated solar power plants
Concentrated solar power (CSP) is one of the few sustainable energy technologies that offers day-to-night energy storage. Recent development of the supercritical carbon dioxide (sCO2) Brayton cycle has made CSP a potentially cost-competitive energy source. However, as CSP plants are most efficient in desert regions, where there is high solar irradiance and low land cost, careful design of a dry cooling system is crucial to making CSP practical. In this work, we present a machine learning system to optimize the factory design and configuration of a dry cooling system for an sCO2 Brayton cycle CSP plant. To this end, we develop a physics-based simulation of the cooling properties of an air-cooled heat exchanger. The simulator can construct a dry cooling system satisfying a wide variety of power cycle requirements (e.g., 10–100 MW) for any surface air temperature. Using this simulator, we leverage recent results in high-dimensional Bayesian optimization to find dry cooler designs that minimize lifetime cost for a given location, reducing this cost by 67% compared to recently proposed designs. Our simulation and optimization framework can increase the development pace of economically viable, sustainable energy generation systems.
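A toy version of the optimization loop described above, under stated assumptions: simulate_cost stands in for the physics-based cooler simulator, the design space is a simple box, and expected improvement over random candidates replaces the paper's high-dimensional Bayesian-optimization machinery.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def bayes_opt_min(simulate_cost, bounds, n_init=10, n_iter=50, seed=0):
    # bounds: list of (low, high) per design variable (fan count, tube
    # geometry, ...); simulate_cost maps a design vector to lifetime cost.
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    X = rng.uniform(lo, hi, size=(n_init, len(bounds)))
    y = np.array([simulate_cost(x) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(n_iter):
        gp.fit(X, y)                                    # GP surrogate of cost
        cand = rng.uniform(lo, hi, size=(2048, len(bounds)))
        mu, sd = gp.predict(cand, return_std=True)
        z = (y.min() - mu) / np.maximum(sd, 1e-9)
        ei = (y.min() - mu) * norm.cdf(z) + sd * norm.pdf(z)  # expected improvement
        x_next = cand[np.argmax(ei)]                    # most promising design
        X = np.vstack([X, x_next])
        y = np.append(y, simulate_cost(x_next))
    return X[np.argmin(y)], y.min()
```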
Study of the decay
The decay is studied in proton-proton collisions at a center-of-mass energy of
13 TeV using data corresponding to an integrated luminosity of 5 fb⁻¹ collected
by the LHCb experiment. A state observed at the BaBar and Belle experiments is
resolved into two narrower states, whose masses and widths are measured; the
first uncertainties are statistical and the second systematic. The results are
consistent with a previous LHCb measurement using a prompt sample. Evidence for
a new state is found, and its mass and width are measured. In addition,
evidence for a new decay mode is found, and its branching fraction relative to
the normalization decay is measured, where the first uncertainty is
statistical, the second systematic, and the third originates from the branching
fractions of charm hadron decays. Comment: All figures and tables, along with
any supplementary material and
additional information, are available at
https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2022-028.html (LHCb
public pages)
Multidifferential study of identified charged hadron distributions in Z-tagged jets in proton-proton collisions at 13 TeV
Jet fragmentation functions are measured for the first time in proton-proton
collisions for charged pions, kaons, and protons within jets recoiling against
a Z boson. The charged-hadron distributions are studied longitudinally and
transversely to the jet direction for jets with transverse momentum above
20 GeV in the pseudorapidity range covered by LHCb. The data sample was
collected with the LHCb experiment at a center-of-mass energy of 13 TeV,
corresponding to an integrated luminosity of 1.64 fb⁻¹. Triple
differential distributions as a function of the hadron longitudinal momentum
fraction, hadron transverse momentum, and jet transverse momentum are also
measured for the first time. This helps constrain transverse-momentum-dependent
fragmentation functions. Differences in the shapes and magnitudes of the
measured distributions for the different hadron species provide insights into
the hadronization process for jets predominantly initiated by light quarks. Comment: All figures and tables, along with machine-readable versions and any
supplementary material and additional information, are available at
https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2022-013.html (LHCb
public pages)
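The symbols stripped from this abstract refer to standard fragmentation variables. For orientation (these are the conventional definitions, not text recovered from the paper), the longitudinal momentum fraction and the transverse momentum of a hadron relative to the jet axis are:

```latex
z = \frac{\vec{p}_h \cdot \vec{p}_{\mathrm{jet}}}{|\vec{p}_{\mathrm{jet}}|^2},
\qquad
j_T = \frac{|\vec{p}_h \times \vec{p}_{\mathrm{jet}}|}{|\vec{p}_{\mathrm{jet}}|}
```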
Measurement of the ratios of branching fractions R(D*) and R(D0)
The ratios of branching fractions R(D*) and R(D0) are measured, assuming
isospin symmetry, using a sample of proton-proton collision data corresponding
to 3.0 fb⁻¹ of integrated luminosity recorded by the LHCb experiment during
2011 and 2012. The tau lepton is identified via its muonic decay. The measured
values are reported with statistical and systematic uncertainties, together
with the correlation between the two measurements. Results are consistent with
the current average of these quantities and lie at a combined 1.9 standard
deviations from the predictions based on lepton flavor universality in the
Standard Model. Comment: All figures and tables, along with any supplementary material and
additional information, are available at
https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2022-039.html (LHCb
public pages)
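For orientation, the measured ratios have the standard schematic form (conventional definitions, not text recovered from the paper), with the tau mode in the numerator and the muon mode in the denominator:

```latex
\mathcal{R}(D^{*}) = \frac{\mathcal{B}(B \to D^{*} \tau \nu_{\tau})}{\mathcal{B}(B \to D^{*} \mu \nu_{\mu})},
\qquad
\mathcal{R}(D^{0}) = \frac{\mathcal{B}(B \to D^{0} \tau \nu_{\tau})}{\mathcal{B}(B \to D^{0} \mu \nu_{\mu})}
```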
