Robust Model Compression Using Deep Hypotheses
Machine Learning models should ideally be compact and robust. Compactness
provides efficiency and comprehensibility whereas robustness provides
resilience. Both topics have been studied in recent years but in isolation.
Here we present a robust model compression scheme that is model-agnostic: it can
compress ensembles, neural networks, and other model classes into small models
of diverse types. The main building block is the notion of depth, derived from
robust statistics. Originally, depth was introduced as a
measure of the centrality of a point in a sample such that the median is the
deepest point. This concept was extended to classification functions, making it
possible to define the depth of a hypothesis and the median hypothesis.
Algorithms have been suggested to approximate the median, but they have been
limited to binary classification. In this study, we present a new algorithm,
the Multiclass Empirical Median Optimization (MEMO) algorithm, which finds a
deep hypothesis in multi-class tasks, and we prove its correctness. This
leads to our Compact Robust Estimated Median Belief Optimization (CREMBO)
algorithm for robust model compression. We demonstrate the success of this
algorithm empirically by compressing neural networks and random forests into
small decision trees, which are interpretable models, and show that they are
more accurate and robust than those produced by comparable methods. In
addition, our empirical study shows that our method outperforms Knowledge
Distillation on DNN-to-DNN compression.
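The general shape of the compression task described above (a large teacher model distilled into a small, interpretable decision tree) can be sketched with a plain teacher-student baseline. Note this is only the generic distillation idea, not the CREMBO algorithm itself; the dataset, models, and hyperparameters below are illustrative assumptions.

```python
# Generic teacher-to-student distillation into a small decision tree:
# the student is fit to the teacher's predicted labels, not the ground truth.
# (A sketch of the general idea, not the CREMBO algorithm from the paper.)
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Teacher: a large random forest trained on the true labels.
teacher = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Student: a depth-limited (hence interpretable) tree that mimics the teacher.
student = DecisionTreeClassifier(max_depth=4, random_state=0)
student.fit(X, teacher.predict(X))

# Fraction of inputs on which the compact student agrees with the teacher.
agreement = (student.predict(X) == teacher.predict(X)).mean()
```

CREMBO differs from this baseline by targeting a deep (median) hypothesis rather than simply imitating the teacher's labels, which is what yields the robustness claimed in the abstract.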
Graph Trees with Attention
When dealing with tabular data, models based on regression and decision trees
are a popular choice due to the high accuracy they provide on such tasks and
their ease of application as compared to other model classes. Yet, when it
comes to graph-structured data, current tree-learning algorithms do not provide
tools for exploiting the structure of the data beyond feature engineering. In
this work we address this gap and introduce Graph Trees
with Attention (GTA), a new family of tree-based learning algorithms that are
designed to operate on graphs. GTA leverages both the graph structure and the
features at the vertices and employs an attention mechanism that allows
decisions to concentrate on sub-structures of the graph. We analyze GTA models
and show that they are strictly more expressive than plain decision trees. We
also demonstrate the benefits of GTA empirically on multiple graph and node
prediction benchmarks. In these experiments, GTA always outperformed other
tree-based models and often outperformed other types of graph-learning
algorithms such as Graph Neural Networks (GNNs) and Graph Kernels. Finally, we
also provide an explainability mechanism for GTA, and demonstrate that it can
provide intuitive explanations.
Large-Scale and Streaming Time Series Segmentation and Piece-Wise Approximation (Extended Version)
Segmenting a time series, or approximating it with a piecewise-linear function, is often needed when handling data in the time domain to detect outliers, clean data, detect events, and more. The data ranges from ECG signals and traffic monitors to stock prices and sensor networks. Modern datasets of this type are large and in many cases infinite, in the sense that the data is a stream rather than a finite sample. Therefore, in order to segment it, an algorithm has to scale gracefully with the size of the data. Dynamic Programming (DP) can find the optimal segmentation; however, the DP approach has a complexity of O(T²) and thus cannot handle datasets with millions of elements, nor can it handle streaming data. Therefore, various heuristics are used in practice. This study shows that if the approximation measure has an inverse triangle inequality property (ITIP), the solution of the dynamic program can be computed in linear time, and streaming data can be handled too. The ITIP is shown to hold in many cases of interest. The speedup due to the new algorithms is evaluated on a variety of datasets to be in the range of 8–8200x over the DP solution, without sacrificing accuracy. Confidence intervals for segmentations are derived as well.
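The O(T²) dynamic program that the abstract uses as its baseline can be sketched as follows, for the common special case of a piecewise-constant fit under squared error. This is only the classical DP baseline; the paper's linear-time ITIP-based algorithm is not reproduced here, and the function names are illustrative.

```python
# Classical O(k*T^2) dynamic program for optimal k-segmentation of a series
# under squared error with a piecewise-constant fit. This is the baseline the
# abstract refers to, not the paper's linear-time ITIP algorithm.
import numpy as np

def segment_cost(prefix, prefix_sq, i, j):
    # Sum of squared errors when x[i:j] is approximated by its mean,
    # computed in O(1) from prefix sums: sum(x^2) - (sum(x))^2 / n.
    n = j - i
    s = prefix[j] - prefix[i]
    sq = prefix_sq[j] - prefix_sq[i]
    return sq - s * s / n

def optimal_segmentation(x, k):
    """Split x into k contiguous segments minimizing total squared error."""
    T = len(x)
    prefix = np.concatenate(([0.0], np.cumsum(x)))
    prefix_sq = np.concatenate(([0.0], np.cumsum(np.square(x))))
    dp = np.full((k + 1, T + 1), np.inf)   # dp[m][j]: best cost of x[:j], m segments
    cut = np.zeros((k + 1, T + 1), dtype=int)
    dp[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, T + 1):
            for i in range(m - 1, j):      # last segment is x[i:j]
                c = dp[m - 1][i] + segment_cost(prefix, prefix_sq, i, j)
                if c < dp[m][j]:
                    dp[m][j] = c
                    cut[m][j] = i
    # Recover the segment end-points by walking the cut table backwards.
    bounds, j = [], T
    for m in range(k, 0, -1):
        bounds.append(j)
        j = cut[m][j]
    return dp[k][T], sorted(bounds)

cost, bounds = optimal_segmentation(np.array([0.0, 0.0, 0.0, 5.0, 5.0, 5.0]), 2)
```

The triple loop makes the O(T²) scaling (per segment count) concrete; the paper's contribution is showing that, under the ITIP, the inner minimization can be pruned to yield linear-time and streaming variants.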
Graph Neural Networks Use Graphs When They Shouldn't
Predictions over graphs play a crucial role in various domains, including
social networks, molecular biology, medicine, and more. Graph Neural Networks
(GNNs) have emerged as the dominant approach for learning on graph data.
Instances of graph labeling problems consist of the graph-structure (i.e., the
adjacency matrix), along with node-specific feature vectors. In some cases,
this graph-structure is non-informative for the predictive task. For instance,
molecular properties such as molar mass depend solely on the constituent atoms
(node features), and not on the molecular structure. While GNNs have the
ability to ignore the graph-structure in such cases, it is not clear that they
will. In this work, we show that GNNs actually tend to overfit the
graph-structure in the sense that they use it even when a better solution can
be obtained by ignoring it. We examine this phenomenon with respect to
different graph distributions and find that regular graphs are more robust to
this overfitting. We then provide a theoretical explanation for this
phenomenon, via analyzing the implicit bias of gradient-descent-based learning
of GNNs in this setting. Finally, based on our empirical and theoretical
findings, we propose a graph-editing method to mitigate the tendency of GNNs to
overfit graph-structures that should be ignored. We show that this method
indeed improves the accuracy of GNNs across multiple benchmarks.
- …