The Information Bottleneck EM Algorithm
Learning with hidden variables is a central challenge in probabilistic
graphical models that has important implications for many real-life problems.
The classical approach uses the Expectation Maximization (EM) algorithm.
This algorithm, however, can get trapped in local maxima. In this paper we
explore a new approach that is based on the Information Bottleneck principle.
In this approach, we view the learning problem as a tradeoff between two
information theoretic objectives. The first is to make the hidden variables
uninformative about the identity of specific instances. The second is to make
the hidden variables informative about the observed attributes. By exploring
different tradeoffs between these two objectives, we can gradually converge on
a high-scoring solution. As we show, the resulting Information Bottleneck
Expectation Maximization (IB-EM) algorithm finds solutions that are
superior to standard EM methods.
Comment: Appears in Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI2003).
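To make the tradeoff concrete, here is a minimal sketch of an annealed, IB-flavored EM for a one-dimensional Gaussian mixture. The parameter beta interpolates between making the hidden variable uninformative about instance identity (beta near 0 gives near-uniform responsibilities) and standard maximum-likelihood EM (beta = 1). The tempered E-step and the annealing schedule are illustrative assumptions, not the paper's exact procedure.

```python
# Annealed, IB-flavored EM for a 1-D Gaussian mixture (illustrative sketch).
import numpy as np

def ib_em(x, k=2, betas=np.linspace(0.05, 1.0, 20), iters=25, seed=0):
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k)                  # component means
    var = np.full(k, x.var())                   # component variances
    pi = np.full(k, 1.0 / k)                    # mixing weights
    for beta in betas:                          # slowly sharpen the tradeoff
        for _ in range(iters):
            # E-step: responsibilities tempered by beta
            logp = (np.log(pi) - 0.5 * np.log(2 * np.pi * var)
                    - 0.5 * (x[:, None] - mu) ** 2 / var)
            logq = beta * logp
            logq -= logq.max(axis=1, keepdims=True)
            q = np.exp(logq)
            q /= q.sum(axis=1, keepdims=True)
            # M-step: standard weighted updates
            n = q.sum(axis=0) + 1e-12
            mu = (q * x[:, None]).sum(axis=0) / n
            var = (q * (x[:, None] - mu) ** 2).sum(axis=0) / n + 1e-6
            pi = n / n.sum()
    return mu, var, pi

x = np.concatenate([np.random.normal(-3, 1, 200), np.random.normal(3, 1, 200)])
print(ib_em(x))
```

At small beta every instance is assigned almost uniformly, which washes out the local maxima that trap standard EM; raising beta gradually recovers the usual likelihood objective.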
Learning the Dimensionality of Hidden Variables
A serious problem in learning probabilistic models is the presence of hidden
variables. These variables are not observed, yet interact with several of the
observed variables. Detecting hidden variables poses two problems: determining
the relations to other variables in the model and determining the number of
states of the hidden variable. In this paper, we address the latter problem in
the context of Bayesian networks. We describe an approach that utilizes a
score-based agglomerative state-clustering. As we show, this approach allows us
to efficiently evaluate models with a range of cardinalities for the hidden
variable. We show how to extend this procedure to deal with multiple
interacting hidden variables. We demonstrate the effectiveness of this approach
by evaluating it on synthetic and real-life data. We show that our approach
learns models with hidden variables that generalize better and have better
structure than previous approaches.
Comment: Appears in Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI2001).
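The following is a minimal sketch of score-based agglomerative state clustering for choosing a hidden variable's cardinality: start with many states, greedily merge the pair whose pooled statistics cost the least score, and keep the cardinality that scores best. The count-based likelihood and BIC-style score are illustrative assumptions, not the paper's exact scoring function.

```python
# Greedy agglomerative merging of hidden-variable states (illustrative sketch).
import numpy as np

def loglik(counts):
    n = counts.sum()
    p = counts / n
    nz = counts > 0
    return float((counts[nz] * np.log(p[nz])).sum())

def bic(states, n_total):
    ll = sum(loglik(c) for c in states)
    n_params = len(states) * (len(states[0]) - 1)
    return ll - 0.5 * n_params * np.log(n_total)

def agglomerate(states):
    n_total = sum(c.sum() for c in states)
    best = (bic(states, n_total), [c.copy() for c in states])
    while len(states) > 1:
        # try every pairwise merge; keep the one that scores highest
        pairs = [(i, j) for i in range(len(states)) for j in range(i + 1, len(states))]
        scored = []
        for i, j in pairs:
            merged = states[:i] + states[i + 1:j] + states[j + 1:] + [states[i] + states[j]]
            scored.append((bic(merged, n_total), merged))
        score, states = max(scored, key=lambda t: t[0])
        if score > best[0]:
            best = (score, [c.copy() for c in states])
    return best  # (best score, state count vectors at the best cardinality)

# each state holds counts over a binary observed child
states = [np.array([9., 1.]), np.array([8., 2.]), np.array([1., 9.])]
print(agglomerate(states))
```

In this toy example the two similar states are merged while the dissimilar one survives, so the procedure settles on two hidden states.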
"Ideal Parent" Structure Learning for Continuous Variable Networks
In recent years, there has been growing interest in learning Bayesian networks
with continuous variables. Learning the structure of such networks is a
computationally expensive procedure, which limits most applications to
parameter learning. This problem is even more acute when learning networks with
hidden variables. We present a general method for significantly speeding up the
structure search algorithm for continuous variable networks with common
parametric distributions. Importantly, our method facilitates the addition of
new hidden variables into the network structure efficiently. We demonstrate the
method on several data sets, both for learning structure on fully observable
data, and for introducing new hidden variables during structure search.
Comment: Appears in Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI2004).
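Here is a minimal sketch of the intuition for linear Gaussian networks: the profile of a hypothetical "ideal" new parent for a child is, up to scale, the child's current residual, so candidate parents can be ranked cheaply by similarity to that residual instead of exactly scoring every candidate. The cosine-similarity ranking is an illustrative assumption.

```python
# Ranking candidate parents by similarity to the child's residual
# (illustrative sketch of the ideal-parent idea for linear Gaussian CPDs).
import numpy as np

def ideal_parent_ranking(child, parents, candidates):
    # residual of the child given its current parents (least-squares fit)
    if parents.shape[1] > 0:
        w, *_ = np.linalg.lstsq(parents, child, rcond=None)
        residual = child - parents @ w
    else:
        residual = child - child.mean()
    # rank candidates by |cosine similarity| with the ideal-parent profile
    sims = []
    for name, x in candidates.items():
        c = x - x.mean()
        sim = abs(c @ residual) / (np.linalg.norm(c) * np.linalg.norm(residual) + 1e-12)
        sims.append((sim, name))
    return sorted(sims, reverse=True)

rng = np.random.default_rng(0)
a, b = rng.normal(size=500), rng.normal(size=500)
child = 2.0 * a + 0.5 * b + rng.normal(scale=0.1, size=500)
print(ideal_parent_ranking(child, a[:, None], {"b": b, "noise": rng.normal(size=500)}))
```

The true missing parent "b" ranks far above the noise candidate, and only the top-ranked candidates would need full scoring.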
Convex Point Estimation using Undirected Bayesian Transfer Hierarchies
When related learning tasks are naturally arranged in a hierarchy, an
appealing approach for coping with scarcity of instances is that of transfer
learning using a hierarchical Bayes framework. As fully Bayesian computations
can be difficult and computationally demanding, it is often desirable to use
posterior point estimates that facilitate (relatively) efficient prediction.
However, the hierarchical Bayes framework does not always lend itself naturally
to this maximum a posteriori goal. In this work we propose an undirected
reformulation of hierarchical Bayes that relies on priors in the form of
similarity measures. We introduce the notion of "degree of transfer" weights on
components of these similarity measures, and show how they can be automatically
learned within a joint probabilistic framework. Importantly, our reformulation
results in a convex objective for many learning problems, thus facilitating
optimal posterior point estimation using standard optimization techniques. In
addition, we no longer require proper priors, allowing for flexible and
straightforward specification of joint distributions over transfer hierarchies.
We show that our framework is effective for learning models that are part of
transfer hierarchies for two real-life tasks: object shape modeling using
Gaussian density estimation and document classification.
Comment: Appears in Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI2008).
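The following is a minimal sketch of the undirected formulation for Gaussian mean estimation: each node's parameters are tied to its parent through a squared similarity penalty scaled by a "degree of transfer" weight, giving a single convex objective over all nodes. The two-level tree and the fixed transfer weight are illustrative assumptions; the paper also learns the weights jointly.

```python
# Convex point estimation over a small transfer hierarchy (illustrative sketch).
import numpy as np
from scipy.optimize import minimize

data = {"dogs": np.array([2.1, 1.9, 2.3]),   # leaf tasks with few instances
        "cats": np.array([1.2])}
children = list(data)

def objective(theta, dot=5.0):
    root, leaves = theta[0], theta[1:]
    # data term: squared loss of each leaf against its own instances
    loss = sum(((data[c] - leaves[i]) ** 2).sum() for i, c in enumerate(children))
    # transfer term: similarity penalty pulling leaves toward the root
    transfer = dot * ((leaves - root) ** 2).sum()
    return loss + transfer

res = minimize(objective, x0=np.zeros(1 + len(children)))
print(dict(zip(["root"] + children, res.x.round(3))))
```

Because both terms are convex quadratics, any standard optimizer finds the global posterior point estimate; the data-poor "cats" leaf is pulled toward the shared root while the data-rich "dogs" leaf stays near its own instances.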
Learning Rules-First Classifiers
Complex classifiers may exhibit "embarrassing" failures in cases where humans
can easily provide a justified classification. Avoiding such failures is
obviously of key importance. In this work, we focus on one such setting, where
a label is perfectly predictable if the input contains certain features, or
rules, and otherwise it is predictable by a linear classifier. We define a
hypothesis class that captures this notion and determine its sample complexity.
We also give evidence that efficient algorithms cannot achieve this sample
complexity. We then derive a simple and efficient algorithm and show that its
sample complexity is close to optimal, among efficient algorithms. Experiments
on synthetic and sentiment analysis data demonstrate the efficacy of the
method, both in terms of accuracy and interpretability.
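A minimal sketch of the setting: features that perfectly predict the label whenever they fire are promoted to rules, and a linear model handles the remaining examples. The perfect-purity rule test and the logistic-regression fallback are illustrative assumptions, not the paper's algorithm.

```python
# Rules-first classification: pure features become rules, a linear model
# covers the rest (illustrative sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_rules_first(X, y):
    rules = {}
    for j in range(X.shape[1]):
        fired = X[:, j] > 0
        if fired.any() and len(np.unique(y[fired])) == 1:
            rules[j] = int(y[fired][0])          # feature j perfectly predicts y
    covered = np.zeros(len(y), bool)
    if rules:
        covered = np.any(X[:, list(rules)] > 0, axis=1)
    linear = LogisticRegression().fit(X[~covered], y[~covered])
    return rules, linear

def predict(X, rules, linear):
    out = linear.predict(X)
    for j, label in rules.items():               # rules override the linear model
        out[X[:, j] > 0] = label
    return out

X = np.array([[1, 0, 0], [1, 0, 1], [0, 1, 0], [0, 0, 1], [0, 1, 1], [0, 0, 0]])
y = np.array([1, 1, 0, 0, 0, 1])
rules, linear = fit_rules_first(X, y)
print(rules, predict(X, rules, linear))
```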
Convex Nonparanormal Regression
Quantifying uncertainty in predictions or, more generally, estimating the
posterior conditional distribution, is a core challenge in machine learning and
statistics. We introduce Convex Nonparanormal Regression (CNR), a conditional
nonparanormal approach for coping with this task. CNR involves a convex
optimization of a posterior defined via a rich dictionary of pre-defined
nonlinear transformations of Gaussians. It can fit an arbitrary conditional
distribution, including multimodal and non-symmetric posteriors. For the
special but powerful case of a piecewise linear dictionary, we provide a closed
form of the posterior mean which can be used for point-wise predictions.
Finally, we demonstrate the advantages of CNR over classical competitors using
synthetic and real-world data.
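To illustrate why a piecewise linear dictionary admits a closed-form mean: if y is a weighted sum of hinges of a Gaussian variable, each hinge has a known Gaussian expectation, so the posterior mean needs no sampling. The dictionary below is an illustrative assumption, not CNR's learned posterior.

```python
# Closed-form mean of a piecewise linear function of a Gaussian
# (illustrative sketch of the mechanism behind CNR's point predictions).
import numpy as np
from scipy.stats import norm

def hinge_mean(mu, sigma, b):
    # E[max(z - b, 0)] for z ~ N(mu, sigma^2)
    a = (mu - b) / sigma
    return (mu - b) * norm.cdf(a) + sigma * norm.pdf(a)

def posterior_mean(mu, sigma, weights, knots):
    return sum(w * hinge_mean(mu, sigma, b) for w, b in zip(weights, knots))

# sanity check against Monte Carlo
mu, sigma, w, b = 0.5, 1.0, [1.0, -0.5], [-1.0, 0.0]
z = np.random.default_rng(0).normal(mu, sigma, 200_000)
mc = np.mean(w[0] * np.maximum(z - b[0], 0) + w[1] * np.maximum(z - b[1], 0))
print(posterior_mean(mu, sigma, w, b), mc)
```

The analytic value and the Monte Carlo estimate agree to a few decimal places, confirming the closed form.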
MadNet: Using a MAD Optimization for Defending Against Adversarial Attacks
This paper is concerned with the defense of deep models against adversarial
attacks. Inspired by the certificate defense approach, we propose a maximal
adversarial distortion (MAD) optimization method for robustifying deep
networks. MAD captures the idea of increasing separability of class clusters in
the embedding space while decreasing the network sensitivity to small
distortions. Given a deep neural network (DNN) for a classification problem, an
application of MAD optimization results in MadNet, a version of the original
network, now equipped with an adversarial defense mechanism. MAD optimization
is intuitive, effective and scalable, and the resulting MadNet can improve the
original accuracy. We present an extensive empirical study demonstrating that
MadNet improves adversarial robustness performance compared to state-of-the-art
methods.
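A minimal PyTorch sketch of a MAD-flavored training objective: push class centers apart in embedding space while penalizing the network's sensitivity to small input distortions via the input-gradient norm. The tiny MLP, the penalty weights, and the specific penalty forms are illustrative assumptions, not the MadNet recipe.

```python
# Separability-plus-robustness training loss (illustrative sketch).
import torch
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self, d=20, h=64, k=3):
        super().__init__()
        self.embed = torch.nn.Sequential(torch.nn.Linear(d, h), torch.nn.ReLU())
        self.head = torch.nn.Linear(h, k)
    def forward(self, x):
        z = self.embed(x)
        return self.head(z), z

def mad_loss(model, x, y, lam=0.1, gamma=0.01, k=3):
    x = x.clone().requires_grad_(True)
    logits, z = model(x)
    ce = F.cross_entropy(logits, y)
    # sensitivity: norm of the loss gradient w.r.t. the input
    (g,) = torch.autograd.grad(ce, x, create_graph=True)
    sensitivity = g.norm(dim=1).mean()
    # separability: mean pairwise distance between class centers
    centers = torch.stack([z[y == c].mean(0) for c in range(k)])
    sep = torch.pdist(centers).mean()
    return ce + lam * sensitivity - gamma * sep

model = Net()
x, y = torch.randn(96, 20), torch.randint(0, 3, (96,))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = mad_loss(model, x, y)
loss.backward()
opt.step()
print(float(loss))
```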
Learning Max-Margin Tree Predictors
Structured prediction is a powerful framework for coping with joint
prediction of interacting outputs. A central difficulty in using this framework
is that often the correct label dependence structure is unknown. At the same
time, we would like to avoid an overly complex structure that will lead to
intractable prediction. In this work we address the challenge of learning tree
structured predictive models that achieve high accuracy while facilitating
efficient (linear-time) inference. We start by proving that this
task is in general NP-hard, and then suggest an approximate alternative.
Briefly, our CRANK approach relies on a novel Circuit-RANK regularizer that
penalizes non-tree structures and that can be optimized using a CCCP procedure.
We demonstrate the effectiveness of our approach on several domains and show
that, despite the relative simplicity of the structure, prediction accuracy is
competitive with a fully connected model that is computationally costly at
prediction time.
Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013).
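For context, here is one simpler way to pick a tree over output labels: estimate pairwise mutual information between labels and take a maximum spanning tree, Chow-Liu style. This is a baseline for the same goal, not the paper's CRANK approach, which instead relaxes the tree constraint into a circuit-rank penalty optimized with CCCP.

```python
# Chow-Liu-style tree over binary output labels (baseline sketch, not CRANK).
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def pairwise_mi(Y):
    n, m = Y.shape
    mi = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            joint = np.histogram2d(Y[:, i], Y[:, j], bins=2)[0] / n
            pi_, pj = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
            nz = joint > 0
            mi[i, j] = (joint[nz] * np.log(joint[nz] / (pi_ @ pj)[nz])).sum()
    return mi

Y = (np.random.default_rng(0).random((1000, 4)) > 0.5).astype(int)
Y[:, 1] = Y[:, 0] ^ (np.random.default_rng(1).random(1000) > 0.9)  # correlated pair
tree = minimum_spanning_tree(-pairwise_mi(Y))    # negate weights: max spanning tree
print(np.array(np.nonzero(tree.toarray())).T)    # edges of the learned label tree
```

A tree found this way always admits linear-time inference, which is exactly the tractability the paper aims to preserve while optimizing predictive accuracy directly.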
DNF-Net: A Neural Architecture for Tabular Data
A challenging open question in deep learning is how to handle tabular data.
Unlike domains such as image and natural language processing, where deep
architectures prevail, there is still no widely accepted neural architecture
that dominates tabular data. As a step toward bridging this gap, we present
DNF-Net, a novel generic architecture whose inductive bias elicits models whose
structure corresponds to logical Boolean formulas in disjunctive normal form
(DNF) over affine soft-threshold decision terms. In addition, DNF-Net promotes
localized decisions that are taken over small subsets of the features. We
present an extensive empirical study showing that DNF-Nets significantly and
consistently outperform fully connected networks (FCNs) on tabular data. With relatively few
hyperparameters, DNF-Nets open the door to practical end-to-end handling of
tabular data using neural networks. We present ablation studies, which justify
the design choices of DNF-Net including the three inductive bias elements,
namely, Boolean formulation, locality, and feature selection.
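A minimal sketch of a soft DNF block in this spirit: literals are affine soft thresholds (tanh), a conjunction is a soft AND over a fixed subset of literals, and the block output is a soft OR over conjunctions. The layer sizes, the random literal subsets, and the exact soft AND/OR forms are illustrative assumptions.

```python
# Differentiable disjunctive-normal-form block (illustrative sketch).
import torch

class SoftDNF(torch.nn.Module):
    def __init__(self, d_in, n_literals=12, n_conj=4):
        super().__init__()
        self.affine = torch.nn.Linear(d_in, n_literals)    # soft literals
        mask = (torch.rand(n_conj, n_literals) > 0.5).float()
        self.register_buffer("mask", mask)                 # fixed literal subsets
    def forward(self, x):
        lits = torch.tanh(self.affine(x))                  # in (-1, 1)
        # soft AND: high only if all of a conjunction's literals are high
        conj = torch.tanh(lits @ self.mask.t() - self.mask.sum(1) + 1.5)
        # soft OR: high if any conjunction is high
        return torch.tanh(conj.sum(dim=1, keepdim=True) + conj.shape[1] - 1.5)

x = torch.randn(8, 10)
print(SoftDNF(10)(x).shape)   # torch.Size([8, 1])
```

The bias shifts (the 1.5 offsets) make each tanh behave like an AND or OR gate on near-binary inputs while keeping the whole block differentiable end to end.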
Towards Global Remote Discharge Estimation: Using the Few to Estimate The Many
Learning hydrologic models for accurate riverine flood prediction at scale is
a challenge of great importance. One of the key difficulties is the need to
rely on in-situ river discharge measurements, which can be quite scarce and
unreliable, particularly in regions where floods cause the most damage every
year. Accordingly, in this work we tackle the problem of river discharge
estimation at different river locations. A core characteristic of the data at
hand (e.g., satellite measurements) is that we have few measurements for many
locations, all sharing the same physics that underlie the water discharge. We
capture this scenario in a simple but powerful common mechanism regression
(CMR) model with a local component as well as a shared one which captures the
global discharge mechanism. The resulting learning objective is non-convex, but
we show that we can find its global optimum by pooling local measurements
across sites. In particular, using a spectral initialization
with provable near-optimal accuracy, we can find the optimum using standard
descent methods. We demonstrate the efficacy of our approach for the problem of
discharge estimation using simulations.
Comment: A 4-page paper submitted to the NeurIPS 2018 AI for Social Good workshop.
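The following is a minimal sketch of a common-mechanism regression fit: each site i has responses y_i = X_i (w_shared + u_i) + noise, and we alternate least-squares updates of the shared mechanism and ridge-regularized updates of the small local components. The alternating scheme and the ridge penalty are illustrative assumptions; the paper instead uses a spectral initialization with provable near-optimal accuracy followed by standard descent.

```python
# Shared-plus-local regression across many data-poor sites (illustrative sketch).
import numpy as np

def fit_cmr(Xs, ys, lam=10.0, iters=50):
    d = Xs[0].shape[1]
    w = np.zeros(d)                               # shared (global) mechanism
    us = [np.zeros(d) for _ in Xs]                # small local components
    for _ in range(iters):
        # shared step: pool residuals from all sites
        A = sum(X.T @ X for X in Xs)
        b = sum(X.T @ (y - X @ u) for X, y, u in zip(Xs, ys, us))
        w = np.linalg.solve(A, b)
        # local step: ridge-regularized fit of each site's residual
        us = [np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ (y - X @ w))
              for X, y in zip(Xs, ys)]
    return w, us

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
Xs = [rng.normal(size=(8, 2)) for _ in range(30)]   # few measurements, many sites
ys = [X @ (w_true + 0.1 * rng.normal(size=2)) for X in Xs]
w, _ = fit_cmr(Xs, ys)
print(w.round(2))   # recovers the shared mechanism despite tiny per-site samples
```

No single site has enough data to estimate its own coefficients reliably, but pooling across sites pins down the shared mechanism, which is the core leverage the abstract describes.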