The Behavioral Foundations of Representative Bureaucracy
The article of record as published may be found at https://doi.org/10.1093/ppmgov/gvac013
Representative bureaucracy is a values-based theory of bureaucratic decision making. Its key assumption is that a bureaucrat’s demography shapes her pre-organizational socialization, values, and ultimately her decisions, in a way that can advance the interests of a represented client or group (i.e., active representation). However, scholars have not critically examined the presumed links among these four factors. We review the literature and make an argument for representative bureaucracy scholars to incorporate a psychological perspective to better understand the behavioral mechanisms that influence active representation. We discuss the tripartite classification of the mind, dual-process theories of decision making, identity theory, and the deservingness heuristic as theoretical perspectives scholars can use to investigate the behavioral foundations of representative bureaucracy.
Regularizing Towards Soft Equivariance Under Mixed Symmetries
Datasets often have intrinsic symmetries, and particular deep-learning
models called equivariant or invariant models have been developed to exploit
these symmetries. However, if some or all of these symmetries are only
approximate, which frequently happens in practice, these models may be
suboptimal due to the architectural restrictions imposed on them. We tackle
this issue of approximate symmetries in a setup where symmetries are mixed, i.e., the symmetries are not of a single type but of multiple different types, and the degree of approximation varies across these types. Instead of proposing a new
architectural restriction as in most of the previous approaches, we present a
regularizer-based method for building a model for a dataset with mixed
approximate symmetries. The key component of our method is what we call the equivariance regularizer for a given type of symmetry, which measures how equivariant a model is with respect to the symmetries of that type. Our method is trained with these regularizers, one per symmetry type, and the
strength of the regularizers is automatically tuned during training, leading to
the discovery of the approximation levels of some candidate symmetry types
without explicit supervision. Using synthetic function approximation and motion
forecasting tasks, we demonstrate that our method achieves better accuracy than
prior approaches while discovering the approximate symmetry levels correctly.
Comment: Proceedings of the International Conference on Machine Learning (ICML), 202
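As a rough sketch of the idea (not the paper's exact formulation), an equivariance regularizer for one symmetry type can measure the mismatch between transforming the input before the model and transforming the output after it; the names model, act_in, act_out, and group_actions below are illustrative placeholders.

import torch

def equivariance_regularizer(model, x, group_actions):
    # group_actions: sampled (act_in, act_out) pairs for one symmetry type,
    # where act_in transforms inputs and act_out transforms outputs.
    y = model(x)
    penalty = x.new_zeros(())
    for act_in, act_out in group_actions:
        # Exact equivariance would make model(act_in(x)) equal act_out(model(x)).
        penalty = penalty + ((model(act_in(x)) - act_out(y)) ** 2).mean()
    return penalty / len(group_actions)

Under this sketch, the training loss would take the form task_loss plus a weighted sum of one such regularizer per candidate symmetry type; the weights tuned during training can then be read as estimates of the approximation levels mentioned in the abstract.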
Differentiable Algorithm for Marginalising Changepoints
We present an algorithm for marginalising changepoints in time-series models
that assume a fixed number of unknown changepoints. Our algorithm is
differentiable with respect to its inputs, which are the values of latent
random variables other than changepoints. Also, it runs in time O(mn) where n
is the number of time steps and m the number of changepoints, an improvement
over a naive marginalisation method with O(n^m) time complexity. We derive the
algorithm by identifying quantities related to this marginalisation problem,
showing that these quantities satisfy recursive relationships, and transforming the relationships into an algorithm via dynamic programming. Since our algorithm is differentiable, it can be applied to convert a model that is non-differentiable due to changepoints into a differentiable one, so that the resulting model can be
analysed using gradient-based inference or learning techniques. We empirically
show the effectiveness of our algorithm in this application by tackling the
posterior inference problem on synthetic and real-world data.
Comment: To appear at AAAI 202
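To illustrate the kind of O(mn) dynamic programme described above (under the simplifying assumption that the model factorises into per-step segment log-likelihoods; a sketch, not the authors' exact derivation), the following recursion marginalises the positions of m changepoints and stays differentiable:

import torch

def log_marginal_over_changepoints(ell):
    # ell[t, k]: assumed per-step log-likelihood of observation t under the
    # parameters of segment k (m changepoints => m + 1 segments).
    n, num_segments = ell.shape
    neg_inf = torch.full((1,), float('-inf'), dtype=ell.dtype)
    # A[k] = log-marginal of the prefix so far, given the current step lies in segment k.
    A = torch.cat([ell[0, 0:1],
                   torch.full((num_segments - 1,), float('-inf'), dtype=ell.dtype)])
    for t in range(1, n):
        stay = A                                  # no changepoint just before step t
        jump = torch.cat([neg_inf, A[:-1]])       # a changepoint just before step t
        A = ell[t] + torch.logaddexp(stay, jump)  # O(m) work per step => O(mn) overall
    return A[-1]                                  # the final step must lie in the last segment

Because only differentiable tensor operations are used, gradients flow through the marginalisation to whatever latent variables produced ell.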
Flow-Induced Voltage Generation Over Monolayer Graphene in the Presence of Herringbone Grooves
While flow-induced voltage over a graphene layer has been reported, its origin remains unclear. In our previous study, we suggested different mechanisms for different experimental configurations: a phonon-dragging effect for the parallel alignment and an enhanced out-of-plane phonon mode for the perpendicular alignment (Appl. Phys. Lett. 102:063116, 2011). To further examine the origin of the flow-induced voltage, we introduced a transverse flow component by integrating staggered herringbone grooves in the microchannel. We found that the flow-induced voltage decreased significantly in the presence of herringbone grooves in both the parallel and perpendicular alignments. These results support our previous interpretation.
A Generalization of Hierarchical Exchangeability on Trees to Directed Acyclic Graphs
Motivated by the problem of designing inference-friendly Bayesian
nonparametric models in probabilistic programming languages, we introduce a
general class of partially exchangeable random arrays which generalizes the
notion of hierarchical exchangeability introduced in Austin and Panchenko
(2014). We say that our partially exchangeable arrays are DAG-exchangeable
since their partially exchangeable structure is governed by a collection of
Directed Acyclic Graphs. More specifically, such a random array is indexed by $\mathbb{N}^{|V|}$ for some DAG $G = (V, E)$, and its exchangeability structure is governed by the edge set $E$. We prove a representation theorem for such arrays
which generalizes the Aldous-Hoover and Austin-Panchenko representation
theorems.
Comment: 35 pages, 10 figures. Accepted version before re-formatting.
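For orientation, the classical two-dimensional Aldous-Hoover representation that such results generalize can be recalled informally: a jointly exchangeable array admits, in distribution, a representation driven by iid uniform random variables. The notation below is a standard textbook form, not necessarily the paper's.

% Aldous--Hoover: for a jointly exchangeable array (X_{ij})_{i,j \in \mathbb{N}},
% there are iid Uniform[0,1] variables \alpha, (\xi_i)_i, (\zeta_{\{i,j\}})_{i<j}
% and a measurable function f such that, in distribution,
X_{ij} \stackrel{d}{=} f\bigl(\alpha,\ \xi_i,\ \xi_j,\ \zeta_{\{i,j\}}\bigr).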
On Correctness of Automatic Differentiation for Non-Differentiable Functions
Differentiation lies at the core of many machine-learning algorithms, and is
well-supported by popular autodiff systems, such as TensorFlow and PyTorch.
Originally, these systems were developed to compute derivatives of differentiable functions, but in practice, they are commonly applied to
functions with non-differentiabilities. For instance, neural networks using
ReLU define non-differentiable functions in general, but the gradients of
losses involving those functions are computed using autodiff systems in
practice. This status quo raises a natural question: are autodiff systems
correct in any formal sense when they are applied to such non-differentiable
functions? In this paper, we provide a positive answer to this question. Using
counterexamples, we first point out flaws in often-used informal arguments,
such as the claim that non-differentiabilities arising in deep learning do not cause any issues because they form a measure-zero set. We then investigate a class of
functions, called PAP functions, that includes nearly all (possibly
non-differentiable) functions in deep learning nowadays. For these PAP
functions, we propose a new type of derivatives, called intensional
derivatives, and prove that these derivatives always exist and coincide with
standard derivatives for almost all inputs. We also show that these intensional derivatives are essentially what most autodiff systems compute or try to compute. In this way, we formally establish the correctness of autodiff systems applied to non-differentiable functions.
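A minimal illustration of the phenomenon the paper formalises (using PyTorch; the specific value returned at the kink is a system choice, reported here as observed in recent versions):

import torch

x = torch.tensor(0.0, requires_grad=True)  # ReLU is not differentiable at x = 0
y = torch.relu(x)
y.backward()
print(x.grad)  # tensor(0.): the system still returns a value at the kink

Away from the measure-zero set of kinks, the returned value coincides with the standard derivative; at the kink itself, it is one of the choices an intensional derivative is allowed to make.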
Deep neural networks with dependent weights: Gaussian Process mixture limit, heavy tails, sparsity and compressibility
This article studies the infinite-width limit of deep feedforward neural
networks whose weights are dependent, and modelled via a mixture of Gaussian
distributions. Each hidden node of the network is assigned a nonnegative random
variable that controls the variance of the outgoing weights of that node. We
make minimal assumptions on these per-node random variables: they are iid and
their sum, in each layer, converges to some finite random variable in the
infinite-width limit. Under this model, we show that each layer of the
infinite-width neural network can be characterised by two simple quantities: a
non-negative scalar parameter and a Lévy measure on the positive reals. If the scalar parameters are strictly positive and the Lévy measures are trivial at all hidden layers, then one recovers the classical Gaussian process (GP) limit, obtained with iid Gaussian weights. More interestingly, if the Lévy
measure of at least one layer is non-trivial, we obtain a mixture of Gaussian
processes (MoGP) in the large-width limit. The behaviour of the neural network
in this regime is very different from the GP regime. One obtains correlated
outputs, with non-Gaussian distributions, possibly with heavy tails.
Additionally, we show that, in this regime, the weights are compressible, and
some nodes have asymptotically non-negligible contributions, therefore
representing important hidden features. Many sparsity-promoting neural network
models can be recast as special cases of our approach, and we discuss their
infinite-width limits; we also present an asymptotic analysis of the pruning
error. We illustrate some of the benefits of the MoGP regime over the GP regime
in terms of representation learning and compressibility on simulated, MNIST and
Fashion MNIST datasets.
Comment: 96 pages, 15 figures, 9 tables.
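As a toy illustration of the weight model described above (a sketch only: the per-node gamma scales and the 1/n_in normalisation below are arbitrary stand-ins for the per-node random variables and the summability condition assumed in the article):

import numpy as np

def sample_layer_with_dependent_weights(n_in, n_out, rng):
    # One nonnegative scale per node, shared by all of that node's outgoing weights.
    lam = rng.gamma(shape=0.5, scale=2.0, size=n_in)  # illustrative choice of per-node law
    # Given lam, weights are Gaussian; marginally each weight is a Gaussian mixture,
    # and weights leaving the same node are dependent through lam[j].
    W = rng.normal(0.0, 1.0, size=(n_in, n_out)) * np.sqrt(lam / n_in)[:, None]
    return W

rng = np.random.default_rng(0)
W = sample_layer_with_dependent_weights(n_in=1000, n_out=500, rng=rng)

When a few scales are occasionally much larger than the rest, a few rows of W dominate, which gives an intuition for the heavy tails, sparsity and compressibility discussed in the abstract.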