Interpreting Neural Networks through the Polytope Lens
Mechanistic interpretability aims to explain what a neural network has
learned at a nuts-and-bolts level. What are the fundamental primitives of
neural network representations? Previous mechanistic descriptions have used
individual neurons or their linear combinations to understand the
representations a network has learned. But there are clues that neurons and
their linear combinations are not the correct fundamental units of description:
directions cannot describe how neural networks use nonlinearities to structure
their representations. Moreover, many instances of individual neurons and their
combinations are polysemantic (i.e. they have multiple unrelated meanings).
Polysemanticity makes interpreting the network in terms of neurons or
directions challenging since we can no longer assign a specific feature to a
neural unit. In order to find a basic unit of description that does not suffer
from these problems, we zoom in beyond just directions to study the way that
piecewise linear activation functions (such as ReLU) partition the activation
space into numerous discrete polytopes. We call this perspective the polytope
lens. The polytope lens makes concrete predictions about the behavior of neural
networks, which we evaluate through experiments on both convolutional image
classifiers and language models. Specifically, we show that polytopes can be
used to identify monosemantic regions of activation space (while directions are
not in general monosemantic) and that the density of polytope boundaries
reflects semantic boundaries. We also outline a vision for what mechanistic interpretability might look like through the polytope lens.
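A minimal sketch of the core idea (not code from the paper): in a ReLU network, the binary on/off pattern of the units identifies the linear region, i.e. the polytope, that an input falls in. The toy network, weights, and the polytope_code helper below are illustrative assumptions.

```python
# Sketch: inputs sharing a ReLU activation pattern lie in the same polytope
# and are processed by the same affine map; crossing a polytope boundary
# flips at least one unit's on/off state.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 2)), rng.normal(size=8)   # toy 2-8-8 ReLU MLP
W2, b2 = rng.normal(size=(8, 8)), rng.normal(size=8)

def polytope_code(x):
    """Binary activation pattern labelling the polytope containing x."""
    h1 = W1 @ x + b1
    h2 = W2 @ np.maximum(h1, 0) + b2
    return tuple((h1 > 0).astype(int)) + tuple((h2 > 0).astype(int))

x, x_near, x_far = np.array([0.1, 0.2]), np.array([0.1, 0.21]), np.array([3.0, -2.0])
print(polytope_code(x) == polytope_code(x_near))  # often True: same polytope
print(polytope_code(x) == polytope_code(x_far))   # usually False: boundaries crossed
```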
A Family of Latent Variable Convex Relaxations for IBM Model 2
Introduced in 1993, the IBM translation models were the first generation of Statistical Machine Translation systems. Among the IBM Models, only IBM Model 1 is a convex optimization problem, meaning that we can initialize all its probabilistic parameters to uniform values and still converge to a good solution via Expectation Maximization (EM). In this thesis we discuss a mechanism for generating an infinite supply of nontrivial convex relaxations for IBM Model 2 and detail an Exponentiated Subgradient algorithm to solve them. We also describe some interesting relaxations that admit an easy EM algorithm requiring no tuning of a learning rate. Based on the geometric mean of two variables, this last set of convex models can be seamlessly integrated into the open-source GIZA++ word-alignment library. Finally, we show other applications of the method, including a more powerful strictly convex IBM Model 1 and a convex HMM surrogate that improves on the performance of the previous convex IBM Model 2 variants.
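For context, a minimal sketch of the standard IBM Model 1 EM loop that the thesis builds on; Model 1's convexity is what makes the uniform initialization below safe. The toy corpus and variable names are illustrative assumptions, and this is not the thesis's relaxation code.

```python
# Sketch of IBM Model 1 EM: t[(f, e)] estimates the probability that source
# word e translates to target word f, starting from a uniform table.
from collections import defaultdict

corpus = [(["the", "house"], ["das", "haus"]),
          (["the", "book"], ["das", "buch"])]
e_vocab = {e for es, _ in corpus for e in es}
f_vocab = {f for _, fs in corpus for f in fs}
t = {(f, e): 1.0 / len(f_vocab) for f in f_vocab for e in e_vocab}  # uniform init

for _ in range(10):                       # EM iterations
    count = defaultdict(float)            # expected counts c(f, e)
    total = defaultdict(float)            # normalizers c(e)
    for es, fs in corpus:
        for f in fs:                      # E-step: posterior over alignments of f
            z = sum(t[(f, e)] for e in es)
            for e in es:
                p = t[(f, e)] / z
                count[(f, e)] += p
                total[e] += p
    for (f, e) in t:                      # M-step: renormalize the table
        if total[e] > 0:
            t[(f, e)] = count[(f, e)] / total[e]

print(max(t, key=t.get))                  # most confident (f, e) pair
```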
Optimization for Image Segmentation
Image segmentation, i.e., assigning each pixel a discrete label, is an essential task in computer vision with many applications. Major segmentation techniques include Markov Random Fields (MRF), Kernel Clustering (KC), and the now-popular Convolutional Neural Networks (CNN). In this work, we focus on optimization for image segmentation. MRF, KC, and CNN methods optimize MRF energies, KC criteria, and CNN losses respectively, and the corresponding optimization problems are very different. We are interested in the synergy and complementary benefits of MRF, KC, and CNN for interactive segmentation and semantic segmentation. Our first contribution is pseudo-bound optimization for binary MRF energies that are high-order or non-submodular. Second, we propose Kernel Cut, a novel formulation for segmentation that combines MRF regularization with Kernel Clustering; we show why to combine KC with MRF and how to optimize the joint objective. In the third part, we discuss how deep CNN segmentation can benefit from non-deep (i.e., shallow) methods like MRF and KC. In particular, we propose regularized losses for weakly supervised CNN segmentation, in which MRF energies or KC criteria can be integrated as part of the loss. Minimizing regularized losses is, in general, a principled approach to semi-supervised learning. Our regularized-loss method is very simple and accommodates different kinds of regularization losses for CNN segmentation. We also study the optimization of regularized losses beyond gradient descent. Our approach achieves state-of-the-art accuracy in semantic segmentation, with near full-supervision quality.
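A hedged illustration of the regularized-loss idea for weakly supervised segmentation: partial cross-entropy on the few scribbled pixels plus a shallow pairwise smoothness term on the soft predictions. The function name, the simple 4-neighbour Potts-style term, and all tensors below are illustrative assumptions, not the thesis's implementation.

```python
# Sketch: supervise only labeled (scribble) pixels with cross-entropy and
# regularize all pixels with a differentiable Potts-like neighbour penalty,
# so the shallow regularizer trains the CNN end-to-end via backprop.
import torch
import torch.nn.functional as F

def regularized_loss(logits, scribbles, weight=0.1):
    """logits: (B, C, H, W); scribbles: (B, H, W), -1 marks unlabeled pixels."""
    probs = F.softmax(logits, dim=1)
    # Partial cross-entropy over scribbled pixels only.
    ce = F.cross_entropy(logits, scribbles.clamp(min=0), reduction="none")
    mask = (scribbles >= 0).float()
    ce = (ce * mask).sum() / mask.sum().clamp(min=1)
    # Potts-style regularizer: penalize disagreement between 4-neighbours.
    dx = (probs[:, :, :, 1:] - probs[:, :, :, :-1]).abs().mean()
    dy = (probs[:, :, 1:, :] - probs[:, :, :-1, :]).abs().mean()
    return ce + weight * (dx + dy)

logits = torch.randn(1, 2, 8, 8, requires_grad=True)
scribbles = torch.full((1, 8, 8), -1, dtype=torch.long)
scribbles[0, 0, 0], scribbles[0, 7, 7] = 0, 1   # two scribbled pixels
regularized_loss(logits, scribbles).backward()   # gradients flow to the CNN
```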
Proceedings of the 18th Irish Conference on Artificial Intelligence and Cognitive Science
These proceedings contain the papers accepted for publication at AICS-2007, the 18th Annual Conference on Artificial Intelligence and Cognitive Science, held at the Technological University Dublin, Dublin, Ireland, from 29 to 31 August 2007. AICS is the annual conference of the Artificial Intelligence Association of Ireland (AIAI).