14,247 research outputs found
A Cost-based Optimizer for Gradient Descent Optimization
As the use of machine learning (ML) permeates into diverse application
domains, there is an urgent need to support a declarative framework for ML.
Ideally, a user will specify an ML task in a high-level and easy-to-use
language and the framework will invoke the appropriate algorithms and system
configurations to execute it. An important observation towards designing such a
framework is that many ML tasks can be expressed as mathematical optimization
problems, which take a specific form. Furthermore, these optimization problems
can be efficiently solved using variations of the gradient descent (GD)
algorithm. Thus, to decouple a user specification of an ML task from its
execution, a key component is a GD optimizer. We propose a cost-based GD
optimizer that selects the best GD plan for a given ML task. To build our
optimizer, we introduce a set of abstract operators for expressing GD
algorithms and propose a novel approach to estimate the number of iterations a
GD algorithm requires to converge. Extensive experiments on real and synthetic
datasets show that our optimizer not only chooses the best GD plan but also
allows for optimizations that achieve orders of magnitude performance speed-up.Comment: Accepted at SIGMOD 201
MLCapsule: Guarded Offline Deployment of Machine Learning as a Service
With the widespread use of machine learning (ML) techniques, ML as a service
has become increasingly popular. In this setting, an ML model resides on a
server and users can query it with their data via an API. However, if the
user's input is sensitive, sending it to the server is undesirable and
sometimes even legally not possible. Equally, the service provider does not
want to share the model by sending it to the client for protecting its
intellectual property and pay-per-query business model.
In this paper, we propose MLCapsule, a guarded offline deployment of machine
learning as a service. MLCapsule executes the model locally on the user's side
and therefore the data never leaves the client. Meanwhile, MLCapsule offers the
service provider the same level of control and security of its model as the
commonly used server-side execution. In addition, MLCapsule is applicable to
offline applications that require local execution. Beyond protecting against
direct model access, we couple the secure offline deployment with defenses
against advanced attacks on machine learning models such as model stealing,
reverse engineering, and membership inference
PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development
This paper describes PlinyCompute, a system for development of
high-performance, data-intensive, distributed computing tools and libraries. In
the large, PlinyCompute presents the programmer with a very high-level,
declarative interface, relying on automatic, relational-database style
optimization to figure out how to stage distributed computations. However, in
the small, PlinyCompute presents the capable systems programmer with a
persistent object data model and API (the "PC object model") and associated
memory management system that has been designed from the ground-up for high
performance, distributed, data-intensive computing. This contrasts with most
other Big Data systems, which are constructed on top of the Java Virtual
Machine (JVM), and hence must at least partially cede performance-critical
concerns such as memory management (including layout and de/allocation) and
virtual method/function dispatch to the JVM. This hybrid approach---declarative
in the large, trusting the programmer's ability to utilize PC object model
efficiently in the small---results in a system that is ideal for the
development of reusable, data-intensive tools and libraries. Through extensive
benchmarking, we show that implementing complex objects manipulation and
non-trivial, library-style computations on top of PlinyCompute can result in a
speedup of 2x to more than 50x or more compared to equivalent implementations
on Spark.Comment: 48 pages, including references and Appendi
- …