43,515 research outputs found
Distributed top-k aggregation queries at large
Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially for distributed settings, when the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network
Scalable and Sustainable Deep Learning via Randomized Hashing
Current deep learning architectures are growing larger in order to learn from
complex datasets. These architectures require giant matrix multiplication
operations to train millions of parameters. Conversely, there is another
growing trend to bring deep learning to low-power, embedded devices. The matrix
operations, associated with both training and testing of deep networks, are
very expensive from a computational and energy standpoint. We present a novel
hashing based technique to drastically reduce the amount of computation needed
to train and test deep networks. Our approach combines recent ideas from
adaptive dropouts and randomized hashing for maximum inner product search to
select the nodes with the highest activation efficiently. Our new algorithm for
deep learning reduces the overall computational cost of forward and
back-propagation by operating on significantly fewer (sparse) nodes. As a
consequence, our algorithm uses only 5% of the total multiplications, while
keeping on average within 1% of the accuracy of the original model. A unique
property of the proposed hashing based back-propagation is that the updates are
always sparse. Due to the sparse gradient updates, our algorithm is ideally
suited for asynchronous and parallel training leading to near linear speedup
with increasing number of cores. We demonstrate the scalability and
sustainability (energy efficiency) of our proposed algorithm via rigorous
experimental evaluations on several real datasets
Measuring and Managing Answer Quality for Online Data-Intensive Services
Online data-intensive services parallelize query execution across distributed
software components. Interactive response time is a priority, so online query
executions return answers without waiting for slow running components to
finish. However, data from these slow components could lead to better answers.
We propose Ubora, an approach to measure the effect of slow running components
on the quality of answers. Ubora randomly samples online queries and executes
them twice. The first execution elides data from slow components and provides
fast online answers; the second execution waits for all components to complete.
Ubora uses memoization to speed up mature executions by replaying network
messages exchanged between components. Our systems-level implementation works
for a wide range of platforms, including Hadoop/Yarn, Apache Lucene, the
EasyRec Recommendation Engine, and the OpenEphyra question answering system.
Ubora computes answer quality much faster than competing approaches that do not
use memoization. With Ubora, we show that answer quality can and should be used
to guide online admission control. Our adaptive controller processed 37% more
queries than a competing controller guided by the rate of timeouts.Comment: Technical Repor
Pheromone-based In-Network Processing for wireless sensor network monitoring systems
Monitoring spatio-temporal continuous fields using wireless sensor networks (WSNs) has emerged as a novel solution. An efficient data-driven routing mechanism for sensor querying and information gathering in large-scale WSNs is a challenging problem. In particular, we consider the case of how to query the sensor network information with the minimum energy cost in scenarios where a small subset of sensor nodes has relevant readings. In order to deal with this problem, we propose a Pheromone-based In-Network Processing (PhINP) mechanism. The proposal takes advantages of both a pheromone-based iterative strategy to direct queries towards nodes with relevant information and query- and response-based in-network filtering to reduce the number of active nodes. Additionally, we apply reinforcement learning to improve the performance. The main contribution of this work is the proposal of a simple and efficient mechanism for information discovery and gathering. It can reduce the messages exchanged in the network, by allowing some error, in order to maximize the network lifetime. We demonstrate by extensive simulations that using PhINP mechanism the query dissemination cost can be reduced by approximately 60% over flooding, with an error below 1%, applying the same in-network filtering strategy.Fil: Riva, Guillermo Gaston. Universidad Nacional de Córdoba. Facultad de Ciencias Exactas, Físicas y Naturales; Argentina. Universidad Tecnológica Nacional; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba; ArgentinaFil: Finochietto, Jorge Manuel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Estudios Avanzados en Ingeniería y Tecnología. Universidad Nacional de Córdoba. Facultad de Ciencias Exactas Físicas y Naturales. Instituto de Estudios Avanzados en Ingeniería y Tecnología; Argentin
- …