Robust Model Compression Using Deep Hypotheses
Machine Learning models should ideally be compact and robust. Compactness
provides efficiency and comprehensibility whereas robustness provides
resilience. Both topics have been studied in recent years, but in isolation.
Here we present a robust model compression scheme which is independent of model
types: it can compress ensembles, neural networks and other types of models
into diverse types of small models. The main building block is the notion of
depth derived from robust statistics. Originally, depth was introduced as a
measure of the centrality of a point in a sample such that the median is the
deepest point. This concept was later extended to classification functions,
making it possible to define the depth of a hypothesis and the median
hypothesis. Algorithms have been proposed to approximate the median, but they
have been limited to binary classification. In this study, we present a new
algorithm, Multiclass Empirical Median Optimization (MEMO), which finds a deep
hypothesis in multi-class tasks, and we prove its correctness. This
leads to our Compact Robust Estimated Median Belief Optimization (CREMBO)
algorithm for robust model compression. We demonstrate the success of this
algorithm empirically by compressing neural networks and random forests into
small decision trees, which are interpretable models, and show that they are
more accurate and robust than other comparable methods. In addition, our
empirical study shows that our method outperforms Knowledge Distillation on
DNN-to-DNN compression.
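As a rough illustration, the sketch below shows the generic teacher-to-student compression pipeline the abstract describes: a large model labels the data, and a small decision tree is fit to those labels. It deliberately omits the depth/median-hypothesis machinery of MEMO and CREMBO, which the abstract does not specify; the dataset, model sizes, and scikit-learn estimators are illustrative assumptions.

```python
# A minimal sketch of teacher-to-tree compression, NOT the CREMBO algorithm
# itself; the depth/median-hypothesis steps are omitted.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Teacher: a large ensemble (could equally be a neural network).
teacher = RandomForestClassifier(n_estimators=200, random_state=0)
teacher.fit(X_train, y_train)

# Student: a small, interpretable tree trained on the teacher's predictions
# instead of the raw labels.
student = DecisionTreeClassifier(max_depth=4, random_state=0)
student.fit(X_train, teacher.predict(X_train))

print("teacher accuracy:", teacher.score(X_test, y_test))
print("student accuracy:", student.score(X_test, y_test))
```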
Graph Trees with Attention
When dealing with tabular data, models based on regression and decision trees
are a popular choice due to the high accuracy they provide on such tasks and
their ease of application as compared to other model classes. Yet, when it
comes to graph-structured data, current tree learning algorithms do not provide
tools to manage the structure of the data other than relying on feature
engineering. In this work we address the above gap, and introduce Graph Trees
with Attention (GTA), a new family of tree-based learning algorithms that are
designed to operate on graphs. GTA leverages both the graph structure and the
features at the vertices and employs an attention mechanism that allows
decisions to concentrate on sub-structures of the graph. We analyze GTA models
and show that they are strictly more expressive than plain decision trees. We
also demonstrate the benefits of GTA empirically on multiple graph and node
prediction benchmarks. In these experiments, GTA always outperformed other
tree-based models and often outperformed other types of graph-learning
algorithms such as Graph Neural Networks (GNNs) and Graph Kernels. Finally, we
also provide an explainability mechanism for GTA and demonstrate that it can
provide intuitive explanations.
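The abstract does not spell out how GTA's attention enters the tree splits, so the following is only a hedged sketch of one plausible ingredient: an attention-weighted aggregate of neighbor features that an ordinary threshold split could then act on. The softmax form and the query vector are assumptions, not the GTA construction.

```python
# A hypothetical attention-weighted neighborhood feature for a graph-aware
# tree split; the actual GTA split mechanism is not specified in the abstract.
import numpy as np

def attention_aggregate(node_feats, adj, query):
    """For each vertex, softmax-attend over its neighbors' features.

    node_feats: (n, d) float vertex-feature matrix
    adj:        (n, n) binary adjacency matrix
    query:      (d,) attention direction (assumed given/learned elsewhere)
    """
    scores = node_feats @ query                        # relevance per vertex
    agg = np.zeros_like(node_feats)
    for v in range(adj.shape[0]):
        nbrs = np.flatnonzero(adj[v])
        if nbrs.size == 0:
            continue
        w = np.exp(scores[nbrs] - scores[nbrs].max())  # stable softmax
        w /= w.sum()
        agg[v] = w @ node_feats[nbrs]                  # weighted neighbor mean
    return agg  # (n, d) derived features a tree split can threshold on

# Toy usage: a 4-cycle with random features.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
print(attention_aggregate(X, A, query=np.ones(3)))
```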
On Lightweight Privacy-Preserving Collaborative Learning for IoT Objects
The Internet of Things (IoT) will be a major data-generation infrastructure
for achieving better system intelligence. This paper considers the design and
implementation of a practical privacy-preserving collaborative learning scheme,
in which a curious learning coordinator trains a better machine learning model
based on the data samples contributed by a number of IoT objects, while the
confidentiality of the raw forms of the training data is protected against the
coordinator. Existing distributed machine learning and data encryption
approaches incur significant computation and communication overhead, rendering
them ill-suited for resource-constrained IoT objects. We study an approach that
applies independent Gaussian random projection at each IoT object to obfuscate
data and trains a deep neural network at the coordinator based on the projected
data from the IoT objects. This approach introduces light computation overhead
to the IoT objects and moves most of the workload to the coordinator, which can have
sufficient computing resources. Although the independent projections performed
by the IoT objects address the potential collusion between the curious
coordinator and some compromised IoT objects, they significantly increase the
complexity of the projected data. In this paper, we leverage the superior
learning capability of deep learning in capturing sophisticated patterns to
maintain good learning performance. Extensive comparative evaluation shows that
this approach outperforms other lightweight approaches that apply additive
noisification for differential privacy and/or support vector machines for
learning, in applications with low data-pattern complexity.
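A minimal sketch of the obfuscation step follows, assuming each IoT object draws its own secret Gaussian projection matrix and transmits only the projected samples. Dimensions, seeds, and the toy data are illustrative assumptions, and the coordinator-side deep network is omitted.

```python
# Independent per-device Gaussian random projection; a sketch of the paper's
# obfuscation step under assumed dimensions, not the authors' exact setup.
import numpy as np

def project_at_device(X, out_dim, seed):
    """Gaussian random projection performed locally at one IoT object."""
    rng = np.random.default_rng(seed)  # seed/matrix never leave the device
    d = X.shape[1]
    R = rng.normal(0.0, 1.0 / np.sqrt(out_dim), size=(d, out_dim))
    return X @ R  # only the projection is sent to the coordinator

# Each object uses its own secret seed, so the projections are independent
# and one compromised object's matrix reveals nothing about the others'.
devices = {0: np.random.rand(100, 32), 1: np.random.rand(80, 32)}
projected = {i: project_at_device(X, out_dim=16, seed=1000 + i)
             for i, X in devices.items()}
# The coordinator would then train a deep model on the pooled projected
# samples, e.g. np.vstack(list(projected.values())).
```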
Large Scale and Streaming Time Series Segmentation and Piece-Wise Approximation (Extended Version)
Segmenting a time series, or approximating it with a piecewise linear function, is often needed when handling data in the time domain to detect outliers, clean data, detect events, and more. The data varies from ECG signals and traffic monitors to stock prices and sensor networks. Modern datasets of this type are large and in many cases infinite, in the sense that the data is a stream rather than a finite sample. Therefore, in order to segment it, an algorithm has to scale gracefully with the size of the data. Dynamic Programming (DP) can find the optimal segmentation; however, the DP approach has a complexity of O(T^2) and thus can handle neither datasets with millions of elements nor streaming data, so various heuristics are used in practice. This study shows that if the approximation measure has an inverse triangle inequality property (ITIP), the solution of the dynamic program can be computed in linear time, and streaming data can be handled too. The ITIP is shown to hold in many cases of interest. The speedup due to the new algorithms is evaluated on a variety of datasets to be in the range of 8-8,200x over the DP solution, without sacrificing accuracy. Confidence intervals for segmentations are derived as well.
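For concreteness, here is a minimal sketch of the O(T^2)-per-piece dynamic-programming baseline the abstract refers to, segmenting a 1-D series into k pieces under a squared-error measure with a constant fit per segment. The linear-time ITIP-based algorithm itself is not reproduced; the constant-fit segment model and cost are assumptions for illustration.

```python
# Baseline DP segmentation (O(k*T^2)); the paper's linear-time ITIP algorithm
# is not shown here.
import numpy as np

def segment_dp(x, k):
    """Optimal k-piece constant segmentation of x under squared error."""
    T = len(x)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    csq = np.concatenate(([0.0], np.cumsum(x ** 2)))

    def sse(i, j):  # squared error of fitting x[i..j] (inclusive) by its mean
        n = j - i + 1
        s = csum[j + 1] - csum[i]
        return (csq[j + 1] - csq[i]) - s * s / n

    INF = float("inf")
    dp = np.full((k + 1, T), INF)   # dp[m][j]: best cost for x[0..j], m pieces
    cut = np.zeros((k + 1, T), dtype=int)
    for j in range(T):
        dp[1][j] = sse(0, j)
    for m in range(2, k + 1):
        for j in range(m - 1, T):
            for i in range(m - 2, j):          # last piece is x[i+1..j]
                c = dp[m - 1][i] + sse(i + 1, j)
                if c < dp[m][j]:
                    dp[m][j], cut[m][j] = c, i
    bounds, j = [], T - 1                       # recover piece start indices
    for m in range(k, 1, -1):
        j = cut[m][j]
        bounds.append(j + 1)
    return sorted(bounds), dp[k][T - 1]

# Toy usage: three flat levels are recovered exactly (boundaries 3 and 6).
print(segment_dp(np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 9.0, 9.0]), 3))
```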