Smooth Monotonic Networks
Monotonicity constraints are powerful regularizers in statistical modelling.
They can support fairness in computer-supported decision-making and increase
plausibility in data-driven scientific models. The seminal min-max (MM) neural
network architecture ensures monotonicity, but often gets stuck in undesired
local optima during training because of vanishing gradients. We propose a
simple modification of the MM network using strictly increasing smooth
non-linearities that alleviates this problem. The resulting smooth min-max
(SMM) network module inherits the asymptotic approximation properties from the
MM architecture. It can be used within larger deep learning systems trained
end-to-end. The SMM module is considerably simpler and less computationally
demanding than state-of-the-art neural networks for monotonic modelling. Still,
in our experiments, it compared favorably to alternative neural and non-neural
approaches in terms of generalization performance.
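As a rough sketch of the idea (not the authors' code), the block below assumes that the smooth non-linearity is a log-sum-exp (LSE) replacement for max and min with a sharpness parameter beta, and that non-negative weights come from exponentiating unconstrained parameters. For simplicity every input is treated as monotone, and the names `smooth_max`, `smooth_min`, and `smm_forward` are hypothetical. Each linear unit is strictly increasing in every input and LSE is strictly increasing in each argument, so the whole module is strictly increasing. Unlike a hard max or min, LSE passes a non-zero gradient to every unit, not only the selected one.

```python
import math
from typing import List, Sequence


def smooth_max(values: Sequence[float], beta: float) -> float:
    """Log-sum-exp soft maximum: (1/beta) * log(sum(exp(beta * v)))."""
    m = max(values)  # shift by the max for numerical stability
    return m + math.log(sum(math.exp(beta * (v - m)) for v in values)) / beta


def smooth_min(values: Sequence[float], beta: float) -> float:
    """Soft minimum via the identity min(v) = -max(-v)."""
    return -smooth_max([-v for v in values], beta)


def smm_forward(
    x: Sequence[float],
    raw_weights: List[List[List[float]]],  # hypothetical layout: [group][unit][input]
    biases: List[List[float]],             # hypothetical layout: [group][unit]
    beta: float = 1.0,
) -> float:
    """Smooth min over groups of the smooth max over each group's linear units."""
    group_outputs = []
    for group_w, group_b in zip(raw_weights, biases):
        # exp() makes every effective weight positive, so each unit is
        # strictly increasing in every input.
        units = [
            sum(math.exp(w) * xi for w, xi in zip(w_vec, x)) + b
            for w_vec, b in zip(group_w, group_b)
        ]
        group_outputs.append(smooth_max(units, beta))
    return smooth_min(group_outputs, beta)
```

Because LSE lies between max(z) and max(z) + log(n)/beta for n arguments, letting beta grow recovers the hard min-max network, while small beta gives smoother gradients.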
Certified Monotonic Neural Networks
Learning monotonic models with respect to a subset of the inputs is a
desirable feature to effectively address the fairness, interpretability, and
generalization issues in practice. Existing methods for learning monotonic
neural networks either require specifically designed model structures to ensure
monotonicity, which can be too restrictive or complicated, or enforce monotonicity
by adjusting the learning process, which cannot provably guarantee that the learned
model is monotonic on selected features. In this work, we propose to certify
the monotonicity of general piecewise-linear neural networks by solving a
mixed integer linear programming problem. This provides a new general approach
for learning monotonic neural networks with arbitrary model structures. Our
method allows us to train neural networks with heuristic monotonicity
regularizations, and we can gradually increase the regularization magnitude
until the learned network is certified monotonic. Compared to prior works, our
approach does not require human-designed constraints on the weight space and
also yields more accurate approximations. Empirical studies on various datasets
demonstrate the efficiency of our approach over state-of-the-art methods,
such as Deep Lattice Networks.
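The paper certifies monotonicity exactly with a mixed integer linear program. As a lighter-weight illustration, the sketch below (not the authors' method) uses an interval-bound relaxation of the same question for an assumed one-hidden-layer ReLU network f(x) = sum_j v_j * relu(w_j . x + b_j) on a box domain. The partial derivative with respect to feature l is sum_j v_j * W[j][l] * a_j(x), where a_j is 1 when unit j is active. Units that are provably always on or always off contribute a fixed term, and units whose activation can flip are charged their worst case. A non-negative bound certifies monotonicity, but because the relaxation ignores correlations between units it can fail to certify networks that are in fact monotone; the exact MILP avoids this. `monotonicity_penalty` is a hypothetical heuristic regularizer of the kind the abstract mentions: it averages how far the derivative goes negative at sampled points. All function names are illustrative.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def preactivation_bounds(
    W: Matrix, b: Sequence[float], lo: Sequence[float], hi: Sequence[float]
) -> List[Tuple[float, float]]:
    """Interval bounds of w_j . x + b_j over the box lo <= x <= hi."""
    bounds = []
    for w_row, bj in zip(W, b):
        low = bj + sum(min(w * l, w * h) for w, l, h in zip(w_row, lo, hi))
        up = bj + sum(max(w * l, w * h) for w, l, h in zip(w_row, lo, hi))
        bounds.append((low, up))
    return bounds


def derivative_lower_bound(
    W: Matrix, b: Sequence[float], v: Sequence[float], feature: int,
    lo: Sequence[float], hi: Sequence[float],
) -> float:
    """Sound (but incomplete) lower bound on df/dx[feature] over the box."""
    total = 0.0
    for (low, up), w_row, vj in zip(preactivation_bounds(W, b, lo, hi), W, v):
        contribution = vj * w_row[feature]
        if up <= 0.0:     # unit is never active on the box
            continue
        if low > 0.0:     # unit is always active on the box
            total += contribution
        else:             # activation can flip: assume the worst case
            total += min(0.0, contribution)
    return total


def is_certified_monotone(W, b, v, feature, lo, hi) -> bool:
    """True only if the relaxation proves f is non-decreasing in x[feature]."""
    return derivative_lower_bound(W, b, v, feature, lo, hi) >= 0.0


def monotonicity_penalty(
    W: Matrix, b: Sequence[float], v: Sequence[float], feature: int,
    samples: Sequence[Sequence[float]],
) -> float:
    """Mean of max(0, -df/dx[feature]) over sample points (heuristic regularizer)."""
    penalty = 0.0
    for x in samples:
        grad = sum(
            vj * w_row[feature]
            for w_row, bj, vj in zip(W, b, v)
            if sum(w * xi for w, xi in zip(w_row, x)) + bj > 0.0  # active units only
        )
        penalty += max(0.0, -grad)
    return penalty / len(samples)
```

In a training loop one could add `lam * monotonicity_penalty(...)` to the loss and raise the hypothetical weight `lam` until the certificate passes, mirroring the schedule the abstract describes.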
Counterexample-Guided Learning of Monotonic Neural Networks
The widespread adoption of deep learning is often attributed to its automatic
feature construction with minimal inductive bias. However, in many real-world
tasks, the learned function is intended to satisfy domain-specific constraints.
We focus on monotonicity constraints, which are common and require that the
function's output increases with increasing values of specific input features.
We develop a counterexample-guided technique to provably enforce monotonicity
constraints at prediction time. Additionally, we propose a technique to use
monotonicity as an inductive bias for deep learning. It works by iteratively
incorporating monotonicity counterexamples into the learning process. Unlike
prior work in monotonic learning, we target general ReLU neural networks and do
not further restrict the hypothesis space. We have implemented these techniques
in a tool called COMET. Experiments on real-world datasets demonstrate that our
approach achieves state-of-the-art results compared to existing monotonic
learners, and can improve the model quality compared to those that were trained
without taking monotonicity constraints into account.
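COMET finds counterexamples with a solver and uses them to enforce monotonicity provably at prediction time. The block below is only a grid-based stand-in for that idea and proves nothing. `find_counterexample` scans one monotonic feature for a pair of inputs where raising the feature lowers the output. `monotone_envelope` returns the largest output over grid values of the feature up to the query value; it is non-decreasing in that feature by construction, because the set being maximized over only grows. This upper envelope is one way a prediction-time fix can be realized, and whether it matches COMET's exact construction is an assumption. The grid size, the function names, and the restriction to a single feature are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def _grid(lo: float, hi: float, steps: int) -> List[float]:
    """Evenly spaced values from lo to hi inclusive."""
    return [lo + (hi - lo) * i / steps for i in range(steps + 1)]


def find_counterexample(
    f: Model, x: Sequence[float], feature: int, lo: float, hi: float, steps: int = 100
) -> Optional[Tuple[float, float]]:
    """Return (t_low, t_high) with t_low < t_high but f lower at t_high, else None."""
    prev_t: Optional[float] = None
    prev_val: Optional[float] = None
    for t in _grid(lo, hi, steps):
        probe = list(x)
        probe[feature] = t
        val = f(probe)
        if prev_t is not None and prev_val is not None and val < prev_val:
            return prev_t, t  # adjacent grid points where f decreases
        prev_t, prev_val = t, val
    return None


def monotone_envelope(
    f: Model, x: Sequence[float], feature: int, lo: float, hi: float, steps: int = 100
) -> float:
    """Max of f over grid values of x[feature] up to the query value (needs lo <= x[feature])."""
    best = -math.inf
    for t in _grid(lo, hi, steps):
        if t > x[feature]:
            break
        probe = list(x)
        probe[feature] = t
        best = max(best, f(probe))
    return best
```

In a counterexample-guided training loop, the pairs returned by `find_counterexample` could be added to the training data as extra constraints; here they only illustrate what a violation looks like.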
Constrained Monotonic Neural Networks
Wider adoption of neural networks in many critical domains such as finance
and healthcare is being hindered by the need to explain their predictions and
to impose additional constraints on them. The monotonicity constraint is one of the
most requested properties in real-world scenarios and is the focus of this
paper. One of the oldest ways to construct a monotonic fully connected neural
network is to constrain signs on its weights. Unfortunately, this construction
does not work with popular non-saturated activation functions, as it can only
approximate convex functions. We show that this shortcoming can be fixed by
constructing two additional activation functions from a typical non-saturated
monotonic activation function and applying each of them to a portion of the
neurons. Our experiments show that this approach to building monotonic neural
networks has better accuracy when compared to other state-of-the-art methods,
while being the simplest one in the sense of having the fewest
parameters and not requiring any modifications to the learning procedure or
post-learning steps. Finally, we prove it can approximate any continuous
monotone function on a compact subset of
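A minimal sketch of the construction, assuming ReLU as the typical non-saturated activation: its point reflection -relu(-z) is concave, and a saturated version is assembled from the convex piece for negative inputs and the concave piece for positive ones (for ReLU it reduces to clipping to [-1, 1]). We believe this matches the paper's construction, but the exact formulas, the use of absolute values to keep weights non-negative, and the names `monotone_dense`, `n_convex`, and `n_concave` are assumptions. With non-negative weights and only the convex ReLU, every unit is a convex non-decreasing function of convex inputs, so the whole network stays convex; this is why the classic construction can only approximate convex functions. Mixing in concave and saturated units removes that restriction while keeping every unit non-decreasing in every input.

```python
from typing import List, Sequence


def relu(z: float) -> float:
    return max(0.0, z)


def concave_relu(z: float) -> float:
    """Point reflection of ReLU, -relu(-z) = min(z, 0): concave and non-decreasing."""
    return -relu(-z)


def saturated_relu(z: float) -> float:
    """Convex piece for z < 0, concave piece for z >= 0; for ReLU this is clip(z, -1, 1)."""
    if z < 0.0:
        return relu(z + 1.0) - relu(1.0)
    return concave_relu(z - 1.0) + relu(1.0)


def monotone_dense(
    x: Sequence[float],
    W: List[List[float]],  # raw weights [unit][input]; abs() is applied below
    b: Sequence[float],
    n_convex: int,
    n_concave: int,
) -> List[float]:
    """Dense layer non-decreasing in every input; remaining units use the saturated activation."""
    out = []
    for j, (w_row, bj) in enumerate(zip(W, b)):
        z = sum(abs(w) * xi for w, xi in zip(w_row, x)) + bj  # non-negative weights
        if j < n_convex:
            out.append(relu(z))
        elif j < n_convex + n_concave:
            out.append(concave_relu(z))
        else:
            out.append(saturated_relu(z))
    return out
```

Stacking such layers keeps the network non-decreasing, since a composition of non-decreasing maps is non-decreasing.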
Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) Network
Because of their effectiveness in a broad range of practical applications, LSTM networks
have received a wealth of coverage in scientific journals, technical blogs, and
implementation guides. However, in most articles, the inference formulas for
the LSTM network and its parent, RNN, are stated axiomatically, while the
training formulas are omitted altogether. In addition, the technique of
"unrolling" an RNN is routinely presented without justification throughout the
literature. The goal of this paper is to explain the essential RNN and LSTM
fundamentals in a single document. Drawing from concepts in signal processing,
we formally derive the canonical RNN formulation from differential equations.
We then propose and prove a precise statement, which yields the RNN unrolling
technique. We also review the difficulties with training the standard RNN and
address them by transforming the RNN into the "Vanilla LSTM" network through a
series of logical arguments. We provide all equations pertaining to the LSTM
system together with detailed descriptions of its constituent entities. Albeit
unconventional, our choice of notation and our method for presenting the LSTM
system emphasize ease of understanding. As part of the analysis, we identify
new opportunities to enrich the LSTM system and incorporate these extensions
into the Vanilla LSTM network, producing the most general LSTM variant to date.
The target reader has already been exposed to RNNs and LSTM networks through
numerous available resources and is open to an alternative pedagogical
approach. A machine learning practitioner seeking guidance for implementing our
new augmented LSTM model in software for experimentation and research will find
the insights and derivations in this tutorial valuable as well. Comment: 43 pages, 10 figures, 78 references
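To make the inference formulas concrete, here is a minimal sketch of one common LSTM formulation (input, forget, and output gates plus a tanh candidate, with no peephole connections) in plain Python. It does not use the paper's notation or its augmented variant, and the parameter layout in `params` and all function names are hypothetical. `unroll` shows what unrolling means operationally: the same cell, with the same weights, is applied once per time step, carrying the hidden state h and the cell state c forward.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Params = Dict[str, Tuple[Matrix, Matrix, Vector]]  # gate -> (W_x, W_h, bias)


def matvec(M: Matrix, v: Sequence[float]) -> Vector:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lstm_step(x: Sequence[float], h: Vector, c: Vector, params: Params) -> Tuple[Vector, Vector]:
    """One time step; params has keys 'i', 'f', 'o' (gates) and 'g' (candidate)."""
    def pre(gate: str) -> Vector:
        W, U, b = params[gate]
        return [a + r + bb for a, r, bb in zip(matvec(W, x), matvec(U, h), b)]

    i = [sigmoid(z) for z in pre("i")]    # input gate
    f = [sigmoid(z) for z in pre("f")]    # forget gate
    o = [sigmoid(z) for z in pre("o")]    # output gate
    g = [math.tanh(z) for z in pre("g")]  # candidate cell update
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def unroll(xs: Sequence[Sequence[float]], params: Params, hidden_size: int) -> List[Vector]:
    """Apply the same cell at every time step, starting from zero states."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs
```

The forget gate multiplies the previous cell state directly, so when it saturates near 1 the cell state can carry information across many steps; this additive path is the usual explanation for why the LSTM eases the gradient problems of the standard RNN.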