
    The Behavioral Foundations of Representative Bureaucracy

    The article of record as published may be found at https://doi.org/10.1093/ppmgov/gvac013
    Representative bureaucracy is a values-based theory of bureaucratic decision making. Its key assumption is that a bureaucrat’s demography shapes her pre-organizational socialization, values, and ultimately her decisions, in a way that can advance the interests of a represented client or group (i.e., active representation). However, scholars have not critically examined the presumed links among these four factors. We review the literature and argue that representative bureaucracy scholars should incorporate a psychological perspective to better understand the behavioral mechanisms that influence active representation. We discuss the tripartite classification of the mind, dual-process theories of decision making, identity theory, and the deservingness heuristic as theoretical perspectives scholars can use to investigate the behavioral foundations of representative bureaucracy.

    Regularizing Towards Soft Equivariance Under Mixed Symmetries

    Datasets often have intrinsic symmetries, and particular deep-learning models called equivariant or invariant models have been developed to exploit these symmetries. However, if some or all of these symmetries are only approximate, which frequently happens in practice, these models may be suboptimal due to the architectural restrictions imposed on them. We tackle this issue of approximate symmetries in a setup where symmetries are mixed, i.e., they are symmetries not of a single type but of multiple different types, and the degree of approximation varies across these types. Instead of proposing a new architectural restriction as in most previous approaches, we present a regularizer-based method for building a model for a dataset with mixed approximate symmetries. The key component of our method is what we call an equivariance regularizer for a given type of symmetries, which measures how equivariant a model is with respect to the symmetries of that type. Our method is trained with these regularizers, one per symmetry type, and the strength of the regularizers is automatically tuned during training, leading to the discovery of the approximation levels of some candidate symmetry types without explicit supervision. Using synthetic function approximation and motion forecasting tasks, we demonstrate that our method achieves better accuracy than prior approaches while discovering the approximate symmetry levels correctly.
    Comment: Proceedings of the International Conference on Machine Learning (ICML), 202
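
    As a rough illustration of the idea (a generic sketch, not the authors' implementation; the model f, the group actions act_in/act_out, and the sampled elements are all placeholder names), an equivariance regularizer can penalise the average violation of f(g·x) = g·f(x) over sampled group elements g:

        import torch

        def equivariance_regularizer(f, x, act_in, act_out, group_elements):
            # Generic sketch: average squared violation of f(g . x) = g . f(x)
            # over sampled group elements (e.g. rotation angles); all names are
            # placeholders, not the paper's API.
            penalty = 0.0
            for g in group_elements:
                lhs = f(act_in(x, g))      # f(g . x)
                rhs = act_out(f(x), g)     # g . f(x)
                penalty = penalty + ((lhs - rhs) ** 2).mean()
            return penalty / len(group_elements)

        # Training sketch: one such regularizer per candidate symmetry type, each
        # with a trainable log-strength, so the degree of approximate equivariance
        # of each type can be tuned (and thus discovered) during training:
        # loss = task_loss + sum(torch.exp(log_lam[k]) *
        #                        equivariance_regularizer(model, x, a_in[k], a_out[k], gs[k])
        #                        for k in range(num_symmetry_types))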

    Differentiable Algorithm for Marginalising Changepoints

    We present an algorithm for marginalising changepoints in time-series models that assume a fixed number of unknown changepoints. Our algorithm is differentiable with respect to its inputs, which are the values of latent random variables other than the changepoints. It also runs in O(mn) time, where n is the number of time steps and m the number of changepoints, an improvement over a naive marginalisation method with O(n^m) time complexity. We derive the algorithm by identifying quantities related to this marginalisation problem, showing that these quantities satisfy recursive relationships, and transforming the relationships into an algorithm via dynamic programming. Since our algorithm is differentiable, it can be used to convert a model that is non-differentiable due to changepoints into a differentiable one, so that the resulting model can be analysed using gradient-based inference or learning techniques. We empirically show the effectiveness of our algorithm in this application by tackling the posterior inference problem on synthetic and real-world data.
    Comment: To appear at AAAI 202
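
    As a loose illustration of how an O(mn) marginalisation over changepoint positions can be organised by dynamic programming (a generic forward recursion, not necessarily the paper's exact algorithm), assume a hypothetical tensor log_emission whose entry [t, k] is the differentiable log-likelihood of observation t under the parameters of segment k, with m changepoints separating m+1 segments:

        import torch

        VERY_NEG = -1e30   # stands in for log(0); avoids NaN gradients that exact -inf can cause

        def log_marginal_over_changepoints(log_emission):
            # log_emission: (n, m+1) tensor. Returns the log of the sum, over all
            # placements of the m changepoints, of the corresponding likelihoods,
            # computed in O(mn) instead of enumerating the O(n^m) placements.
            n, num_segments = log_emission.shape
            # alpha[k] = log-probability of x_{1:t} with observation t in segment k
            alpha = torch.cat([log_emission[0, :1],
                               torch.full((num_segments - 1,), VERY_NEG)])
            pad = torch.full((1,), VERY_NEG)
            for t in range(1, n):
                stay = alpha                               # observation t stays in segment k
                advance = torch.cat([pad, alpha[:-1]])     # a changepoint moves k-1 -> k
                alpha = torch.logaddexp(stay, advance) + log_emission[t]
            return alpha[-1]   # require all m changepoints to have occurred by the end

    Because the recursion uses only differentiable operations on log_emission, gradients can flow through the marginalisation, which is what allows the surrounding model to be trained or analysed with gradient-based techniques.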

    Flow-Induced Voltage Generation Over Monolayer Graphene in the Presence of Herringbone Grooves

    While flow-induced voltage over a graphene layer has been reported, its origin remains unclear. In our previous study, we suggested different mechanisms for different experimental configurations: a phonon-dragging effect for the parallel alignment and an enhanced out-of-plane phonon mode for the perpendicular alignment (Appl. Phys. Lett. 102:063116, 2011). To further examine the origin of the flow-induced voltage, we introduced a transverse flow component by integrating staggered herringbone grooves into the microchannel. We found that the flow-induced voltage decreased significantly in the presence of the herringbone grooves in both the parallel and perpendicular alignments. These results support our previous interpretation.

    A Generalization of Hierarchical Exchangeability on Trees to Directed Acyclic Graphs

    Motivated by the problem of designing inference-friendly Bayesian nonparametric models in probabilistic programming languages, we introduce a general class of partially exchangeable random arrays which generalizes the notion of hierarchical exchangeability introduced in Austin and Panchenko (2014). We say that our partially exchangeable arrays are DAG-exchangeable since their partially exchangeable structure is governed by a collection of Directed Acyclic Graphs. More specifically, such a random array is indexed by $\mathbb{N}^{|V|}$ for some DAG $G=(V,E)$, and its exchangeability structure is governed by the edge set $E$. We prove a representation theorem for such arrays which generalizes the Aldous-Hoover and Austin-Panchenko representation theorems.
    Comment: 35 pages, 10 figures. Accepted version before re-formatting.
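
    For orientation (standard background, not a statement taken from the paper): in the simplest special case, the Aldous-Hoover theorem represents a jointly exchangeable array indexed by $\mathbb{N}^2$ as

        % Aldous-Hoover representation: for some measurable f and i.i.d.
        % Uniform[0,1] variables U, (U_i)_i, (U_{\{i,j\}})_{i<j}, jointly in distribution,
        X_{ij} \overset{d}{=} f\bigl(U,\, U_i,\, U_j,\, U_{\{i,j\}}\bigr).
        % The paper proves an analogous representation when the index set is
        % $\mathbb{N}^{|V|}$ and the exchangeability structure is governed by the
        % edges of a DAG.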

    On Correctness of Automatic Differentiation for Non-Differentiable Functions

    Differentiation lies at the core of many machine-learning algorithms and is well supported by popular autodiff systems such as TensorFlow and PyTorch. These systems were originally developed to compute derivatives of differentiable functions, but in practice they are commonly applied to functions with non-differentiabilities. For instance, neural networks using ReLU define non-differentiable functions in general, but the gradients of losses involving those functions are computed using autodiff systems in practice. This status quo raises a natural question: are autodiff systems correct in any formal sense when they are applied to such non-differentiable functions? In this paper, we provide a positive answer to this question. Using counterexamples, we first point out flaws in often-used informal arguments, such as the claim that non-differentiabilities arising in deep learning do not cause any issues because they form a measure-zero set. We then investigate a class of functions, called PAP functions, that includes nearly all (possibly non-differentiable) functions used in deep learning today. For these PAP functions, we propose a new type of derivative, called intensional derivatives, and prove that these derivatives always exist and coincide with standard derivatives for almost all inputs. We also show that these intensional derivatives are essentially what most autodiff systems compute or try to compute. In this way, we formally establish the correctness of autodiff systems applied to non-differentiable functions.
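
    A small, self-contained illustration (not taken from the paper) of the status quo being formalised: autodiff systems return a value for the derivative of ReLU even at its kink, where the function is not differentiable.

        import torch

        x = torch.tensor([-1.0, 0.0, 1.0], requires_grad=True)
        torch.relu(x).sum().backward()
        print(x.grad)   # tensor([0., 0., 1.])
        # Away from 0 this matches the true derivative (0 for x < 0, 1 for x > 0).
        # At the non-differentiable point x = 0, PyTorch returns 0, one of several
        # defensible choices; the paper's intensional derivatives make such choices
        # precise and agree with the standard derivative for almost every input.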

    Deep neural networks with dependent weights: Gaussian Process mixture limit, heavy tails, sparsity and compressibility

    This article studies the infinite-width limit of deep feedforward neural networks whose weights are dependent and modelled via a mixture of Gaussian distributions. Each hidden node of the network is assigned a nonnegative random variable that controls the variance of the outgoing weights of that node. We make minimal assumptions on these per-node random variables: they are iid, and their sum in each layer converges to some finite random variable in the infinite-width limit. Under this model, we show that each layer of the infinite-width neural network can be characterised by two simple quantities: a nonnegative scalar parameter and a Lévy measure on the positive reals. If the scalar parameters are strictly positive and the Lévy measures are trivial at all hidden layers, then one recovers the classical Gaussian process (GP) limit, obtained with iid Gaussian weights. More interestingly, if the Lévy measure of at least one layer is non-trivial, we obtain a mixture of Gaussian processes (MoGP) in the large-width limit. The behaviour of the neural network in this regime is very different from that in the GP regime: one obtains correlated outputs with non-Gaussian distributions, possibly with heavy tails. Additionally, we show that, in this regime, the weights are compressible, and some nodes have asymptotically non-negligible contributions, therefore representing important hidden features. Many sparsity-promoting neural network models can be recast as special cases of our approach, and we discuss their infinite-width limits; we also present an asymptotic analysis of the pruning error. We illustrate some of the benefits of the MoGP regime over the GP regime in terms of representation learning and compressibility on simulated, MNIST and Fashion MNIST datasets.
    Comment: 96 pages, 15 figures, 9 tables.
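
    A hedged sketch of the weight model described above (the function names and the particular scale distribution are illustrative choices, not the authors'): each hidden node draws a nonnegative variance, and all of its outgoing weights are Gaussian with that variance, so the weights are conditionally Gaussian, dependent within a node, and marginally a Gaussian mixture.

        import numpy as np

        rng = np.random.default_rng(0)

        def sample_layer_weights(width_in, width_out, sample_scales):
            # One nonnegative variance per input (hidden) node, iid across nodes.
            scales = sample_scales(width_in)                        # shape (width_in,)
            # Outgoing weights of node i are N(0, scales[i]); rows share a scale,
            # so weights in a row are dependent and marginally non-Gaussian.
            return rng.normal(0.0, np.sqrt(scales)[:, None],
                              size=(width_in, width_out))

        # Illustrative scale law (an assumption): inverse-gamma variances give
        # heavy-tailed, Student-t-like marginal weights. Per the abstract, the
        # distribution of these per-node variances is what determines whether the
        # GP or the MoGP limit arises as the width grows.
        W = sample_layer_weights(512, 512, lambda k: 1.0 / rng.gamma(2.0, 1.0, size=k))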