CountSketches, Feature Hashing and the Median of Three
In this paper, we revisit the classic CountSketch method, which is a sparse,
random projection that transforms a (high-dimensional) Euclidean vector v to
a vector of dimension (2t+1)b, where t and b are integer parameters. It
is known that even for t = 1, a CountSketch allows estimating coordinates of
v with variance bounded by ‖v‖₂²/b. For t > 1, the estimator takes
the median of 2t+1 independent estimates, and the probability that the
estimate is off by more than 2‖v‖₂/√b is exponentially small in
t. This suggests choosing t to be logarithmic in a desired inverse failure
probability. However, implementations of CountSketch often use a small,
constant t. Previous work only predicts a constant factor improvement in this
setting.
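The scheme described above can be sketched in plain Python. This is a toy illustration only, not the authors' implementation; the names `make_countsketch`, `sketch`, and `estimate`, and the parameters `reps` (number of independent repetitions) and `b` (number of buckets per repetition), are our own:

```python
import random
import statistics

def make_countsketch(n, b, reps, seed=0):
    """Draw hash and sign functions for `reps` independent CountSketch rows."""
    rng = random.Random(seed)
    buckets = [[rng.randrange(b) for _ in range(n)] for _ in range(reps)]
    signs = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(reps)]
    return buckets, signs

def sketch(v, b, buckets, signs):
    """Project v into `reps` rows of b signed counters each."""
    table = [[0.0] * b for _ in range(len(buckets))]
    for r in range(len(buckets)):
        for i, x in enumerate(v):
            table[r][buckets[r][i]] += signs[r][i] * x
    return table

def estimate(table, i, buckets, signs):
    """Estimate coordinate i as the median, over all rows, of the
    signed counter that coordinate i was hashed into."""
    return statistics.median(signs[r][i] * table[r][buckets[r][i]]
                             for r in range(len(table)))
```

Each row gives an unbiased estimate whose error comes from the other coordinates colliding in the same bucket; taking the median across rows suppresses the occasional large collision error.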
Our main contribution is a new analysis of CountSketch, showing an
improvement in variance to O(min{‖v‖₁²/b², ‖v‖₂²/b}) when t > 1.
That is, the variance decreases proportionally to 1/b², asymptotically for
large enough b. We also study the variance in the setting where an inner
product is to be estimated from two CountSketches. This finding suggests that
the Feature Hashing method, which is essentially identical to CountSketch but
does not make use of the median estimator, can be made more reliable at a small
cost in settings where using a median estimator is possible.
We confirm our theoretical findings in experiments and thereby help justify
why a small constant number of estimates often suffices in practice. Our
improved variance bounds are based on new general theorems about the variance
and higher moments of the median of i.i.d. random variables that may be of
independent interest.
Local Learning Strategies for Data Management Components
In a world with an ever-increasing amount of data to process, providing tools for high-quality and fast data processing is imperative. Database Management Systems (DBMSs) are complex adaptive systems that supply reliable and fast data analysis and storage capabilities. To boost the usability of DBMSs even further, a core research area of databases is performance optimization, especially for query processing.
With the successful application of Artificial Intelligence (AI) and Machine Learning (ML) in other research areas, the question arises in the database community if ML can also be beneficial for better data processing in DBMSs. This question has spawned various works successfully replacing DBMS components with ML models.
However, these global models share four drawbacks rooted in their large, complex, and inflexible one-size-fits-all structures: highly complex model architectures, lower prediction quality, slow training, and slow forward passes. All of these drawbacks stem from the core expectation of solving an entire problem with one large model at once. The full potential of ML models as DBMS components cannot be reached with a global model because the model’s complexity is outmatched by the problem’s complexity.
Therefore, we present a novel general strategy for using ML models to solve data management problems and to replace DBMS components. The strategy is built on four advantages derived from the four disadvantages of global learning strategies. In essence, our local learning strategy uses divide-and-conquer to place less complex but more expressive models on sub-problems of a data management problem: it splits the problem space into less complex parts that can be solved with lightweight models, circumventing the one-size-fits-all characteristics and drawbacks of global models. We show that this approach and the lower complexity of the specialized local models lead to better problem-solving quality and DBMS performance.
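As an illustration of the divide-and-conquer idea, the following toy regressor (our own sketch, not the dissertation's code; the class name `LocalLearner` and the equal-width segmentation rule are assumptions for the example) splits a one-dimensional problem space into segments and fits one lightweight linear model per segment:

```python
import bisect

class LocalLearner:
    """Divide-and-conquer regressor: split the 1-D input range into
    equal-width segments and fit one tiny linear model per segment."""

    def __init__(self, xs, ys, n_segments):
        lo, hi = min(xs), max(xs)
        # Interior segment boundaries; routing uses binary search over them.
        self.bounds = [lo + (hi - lo) * k / n_segments
                       for k in range(1, n_segments)]
        groups = [([], []) for _ in range(n_segments)]
        for x, y in zip(xs, ys):
            gx, gy = groups[bisect.bisect_right(self.bounds, x)]
            gx.append(x)
            gy.append(y)
        self.models = [self._fit(gx, gy) for gx, gy in groups]

    @staticmethod
    def _fit(xs, ys):
        # Closed-form least squares for y = a*x + b on one segment.
        n = len(xs)
        if n == 0:
            return (0.0, 0.0)
        mx, my = sum(xs) / n, sum(ys) / n
        var = sum((x - mx) ** 2 for x in xs)
        a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
             if var else 0.0)
        return (a, my - a * mx)

    def predict(self, x):
        # Route the query to its segment's specialized model.
        a, b = self.models[bisect.bisect_right(self.bounds, x)]
        return a * x + b
```

A globally linear model cannot fit a piecewise-linear target at all, while each simple local model fits its own segment exactly; this mirrors the claim that specialized lightweight models can outperform one complex global model.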
The local learning strategy is applied and evaluated in three crucial use cases to replace DBMS components with ML models: cardinality estimation, query optimizer hinting, and integer algorithm selection. In all three applications, the benefits of the local learning strategy are demonstrated and compared to related work. We also generalize the strategy’s usability for broader application and formulate best practices with instructions for others.
Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks
The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components. Like their biological counterparts, sparse networks generalize just as well as, and sometimes even better than, the original dense networks. Sparsity promises to reduce the memory footprint of regular networks to fit mobile devices, as well as shorten training time for ever-growing networks. In this paper, we survey prior work on sparsity in deep learning and provide an extensive tutorial of sparsification for both inference and training. We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice. Our work distills ideas from more than 300 research papers and provides guidance to practitioners who wish to utilize sparsity today, as well as to researchers whose goal is to push the frontier forward. We include the necessary background on mathematical methods in sparsification, describe phenomena such as early structure adaptation and the intricate relations between sparsity and the training process, and show techniques for achieving acceleration on real hardware. We also define a metric of pruned parameter efficiency that could serve as a baseline for comparison of different sparse networks. We close by speculating on how sparsity can improve future workloads and outline major open problems in the field.
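One of the simplest pruning techniques in the family the survey covers is unstructured magnitude pruning: remove the weights with the smallest absolute values. The following is a minimal sketch of that idea (the function name and threshold rule are ours, not a canonical implementation from the survey):

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of weights.

    Returns a new list; ties at the threshold are broken by position
    so that exactly floor(len(weights) * sparsity) weights are removed.
    """
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # k-th smallest absolute value becomes the pruning threshold.
    threshold = sorted(abs(w) for w in weights)[k - 1]
    pruned, removed = [], 0
    for w in weights:
        if abs(w) <= threshold and removed < k:
            pruned.append(0.0)
            removed += 1
        else:
            pruned.append(w)
    return pruned
```

In practice such a mask is applied per layer or globally across a network, often iteratively with retraining between pruning steps, and the resulting zeros only yield speedups when the hardware or kernel library can exploit the sparse structure.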
Fundamentals
Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, summarization, and clustering to the different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to their resource requirements and to how scalability can be enhanced on diverse computing architectures, ranging from embedded systems to large computing clusters.