Auto-tuning Distributed Stream Processing Systems using Reinforcement Learning
Fine-tuning distributed systems is considered a craft, relying on intuition and
experience. This becomes even more challenging when the systems need to react
in near real time, as streaming engines must to maintain pre-agreed service
quality metrics. In this article, we present an automated approach that builds
on a combination of supervised and reinforcement learning methods to recommend
the most appropriate lever configurations based on previous load. With this,
streaming engines can be tuned automatically, without requiring a human to
determine the right settings and the proper time to deploy them. This opens the
door to configurations that are not applied today because the complexity of
managing these systems has surpassed the abilities of human experts. We show
how reinforcement learning systems can find substantially better configurations
in less time than their human counterparts and adapt to changing workloads.
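The idea of recommending lever configurations from observed load can be sketched as a simple learning loop. The following is an illustrative epsilon-greedy bandit, not the paper's method: the lever values, load buckets, latency target, and reward shape are all assumptions chosen for the sketch.

```python
import random
from collections import defaultdict

# Hypothetical sketch: an epsilon-greedy bandit picks a parallelism "lever"
# per observed load bucket, rewarding configurations that kept latency under
# a service-quality target. All names and values are illustrative.

LEVERS = [2, 4, 8, 16]   # candidate parallelism settings (assumed)
EPSILON = 0.1            # exploration rate

q = defaultdict(float)   # (load_bucket, lever) -> running mean reward
counts = defaultdict(int)

def recommend(load_bucket):
    """Pick a lever for the current load, mostly exploiting past reward."""
    if random.random() < EPSILON:
        return random.choice(LEVERS)
    return max(LEVERS, key=lambda l: q[(load_bucket, l)])

def update(load_bucket, lever, latency_ms, target_ms=100.0):
    """Reward configurations whose observed latency met the target."""
    reward = 1.0 if latency_ms <= target_ms else -1.0
    key = (load_bucket, lever)
    counts[key] += 1
    q[key] += (reward - q[key]) / counts[key]  # incremental mean
```

In a real deployment, the load bucket would come from a supervised model over recent metrics, and the reward from the engine's service-quality monitoring.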
Versatile stochastic dot product circuits based on nonvolatile memories for high performance neurocomputing and neurooptimization.
The key operation in stochastic neural networks, which have become the state-of-the-art approach for solving problems in machine learning, information theory, and statistics, is the stochastic dot product. While there have been many demonstrations of dot-product circuits and, separately, of stochastic neurons, an efficient hardware implementation combining both functionalities is still missing. Here we report compact, fast, energy-efficient, and scalable stochastic dot-product circuits based on either passively integrated metal-oxide memristors or embedded floating-gate memories. The circuit's high performance is due to its mixed-signal implementation, while the efficient stochastic operation is achieved by utilizing the circuit's noise, intrinsic and/or extrinsic to the memory cell array. The dynamic scaling of weights, enabled by analog memory devices, allows for efficient realization of different annealing approaches to improve functionality. The proposed approach is experimentally verified for two representative applications, namely by implementing a neural network for solving a four-node graph-partitioning problem, and a Boltzmann machine with 10 input and 8 hidden neurons.
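A software analogue helps make the operation concrete: the array computes a dot product, and circuit noise turns the neuron's thresholding into a sigmoid-shaped firing probability, with an effective temperature playing the role of the dynamic weight scaling used for annealing. This is a minimal behavioral sketch, not the paper's circuit; the temperature parameter and scales are assumptions.

```python
import math
import random

def stochastic_neuron(weights, inputs, T=1.0):
    """Fire with probability sigmoid(s / T), where s is the dot product.

    T mimics annealing via dynamic weight scaling: small T makes the
    neuron nearly deterministic, large T makes it nearly random.
    """
    s = sum(w * x for w, x in zip(weights, inputs))  # dot product
    p_fire = 1.0 / (1.0 + math.exp(-s / T))          # noise -> sigmoid
    return 1 if random.random() < p_fire else 0
```

Chaining such neurons with symmetric weights gives the Boltzmann-machine behavior mentioned above, with T lowered over time for annealing.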
Comparative Analysis of Decision Tree Algorithms for Data Warehouse Fragmentation
One of the main problems faced by Data Warehouse designers is fragmentation. Several studies have proposed data mining-based horizontal fragmentation methods. However, no horizontal fragmentation technique exists that uses a decision tree. This paper presents an analysis of different decision tree algorithms to select the best one to implement the fragmentation method. The analysis was performed under version 3.9.4 of Weka, considering four evaluation metrics (Precision, ROC Area, Recall, and F-measure) for different selected data sets using the Star Schema Benchmark. The results showed that the two best algorithms were J48 and Random Forest in most cases; nevertheless, J48 was selected because it is more efficient in building the model.
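The four evaluation metrics Weka reports can be computed directly from predictions. The following stdlib-only sketch assumes a binary task (Weka reports these per class) and computes ROC Area as the probability that a positive example outranks a negative one; it is an illustration of the metrics, not of the fragmentation method itself.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, Recall, and F-measure for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def roc_area(y_true, scores):
    """ROC AUC: probability a positive example scores above a negative one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```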
Qd-tree: Learning Data Layouts for Big Data Analytics
Corporations today collect data at an unprecedented and accelerating scale,
making the need to run queries on large datasets increasingly important.
Technologies such as columnar block-based data organization and compression
have become standard practice in most commercial database systems. However, the
problem of best assigning records to data blocks on storage is still open. For
example, today's systems usually partition data by arrival time into row
groups, or range/hash partition the data based on selected fields. For a given
workload, however, such techniques are unable to optimize for the important
metric of the number of blocks accessed by a query. This metric directly
relates to the I/O cost, and therefore performance, of most analytical queries.
Further, they are unable to exploit additional available storage to drive this
metric down further.
In this paper, we propose a new framework called a query-data routing tree,
or qd-tree, to address this problem, and propose two algorithms for their
construction based on greedy and deep reinforcement learning techniques.
Experiments over benchmark and real workloads show that a qd-tree can provide
physical speedups of more than an order of magnitude compared to current
blocking schemes, and can reach within 2X of the lower bound for data skipping
based on selectivity, while providing complete semantic descriptions of created
blocks.
Comment: ACM SIGMOD 202
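The routing-tree idea can be illustrated with a toy sketch: records are routed down predicate cuts into leaf "blocks", and a range query then reads only the leaves it cannot skip. The cut choice below is a simple median cut per column, far simpler than the paper's greedy or deep-RL search; the block-size knob is an assumption.

```python
MIN_BLOCK = 2  # stop splitting below this many records (assumed knob)

def build(records, cols):
    """Recursively cut `records` (list of dicts) into leaf blocks."""
    if len(records) <= MIN_BLOCK or not cols:
        return {"leaf": records}
    col = cols[0]
    vals = sorted(r[col] for r in records)
    cut = vals[len(vals) // 2]          # toy predicate: median cut
    left = [r for r in records if r[col] < cut]
    right = [r for r in records if r[col] >= cut]
    if not left or not right:
        return {"leaf": records}
    return {"col": col, "cut": cut,
            "lo": build(left, cols[1:] + [col]),
            "hi": build(right, cols[1:] + [col])}

def blocks_for(node, col, lo, hi):
    """Leaves a range query lo <= col <= hi must read; the rest are skipped."""
    if "leaf" in node:
        return [node["leaf"]]
    out = []
    if node["col"] != col or lo < node["cut"]:
        out += blocks_for(node["lo"], col, lo, hi)
    if node["col"] != col or hi >= node["cut"]:
        out += blocks_for(node["hi"], col, lo, hi)
    return out
```

The metric the paper optimizes, blocks accessed per query, is exactly `len(blocks_for(...))` summed over the workload; the greedy and RL algorithms search for the cuts that minimize it.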
Image-Based Modeling of Bridges and Its Applications to Evaluating Resiliency of Transportation Networks
Modern urban areas are heavily dependent on transportation networks to sustain their economic life. Hence, when vital components of a regional network are disrupted, economic losses are inevitable. As evidenced by the 1989 Loma Prieta and 1994 Northridge earthquakes, the seismic damage experienced by bridges alone results in extensive traffic delays and rerouting, not only hindering emergency response but also causing indirect economic losses that far surpass the direct cost of damage to infrastructure. Nevertheless, in many areas of the U.S., transportation networks lack the resilience required to sustain the potential demands of natural hazards. Traditional hazard assessment methods, in theory, provide the tools required for predicting the vulnerabilities associated with natural hazards. Nonetheless, due to their abstractions of the complex infrastructure and the coupled regional behavior, they often fall short of that expectation. This study proposes a semi-automated image-based model generation framework for producing structure-specific models and fragility functions of bridges. The framework fuses geometric and semantic information extracted from Google Street View images with centerline curve geometry, surface topology, and various relevant metadata to construct highly accurate geometric representations of bridges. Then, using class statistics available in the literature for bridge structural properties, the framework generates structural models. Both the performance of the geometry extraction procedure and the structural modeling method proposed here are validated by comparison against the structural model of a real-life bridge developed from as-built drawings. In principle, these models can be utilized to assess physical damage for any type of hazard, but in this study, the focus is limited to seismic applications.
Thus, to relate ground-shaking demands to the resulting damage, bridge-specific fragility functions are developed for 100 bridge structures in the immediate surroundings of the Ports of Los Angeles and Long Beach. Using these fragility curves, the physical damage resulting from a magnitude 7.3 scenario earthquake on the Palos Verdes fault is predicted. Subsequently, the effects of bridge infrastructure damage on transportation patterns in the Los Angeles metropolitan area are investigated in terms of various resilience metrics.
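Fragility functions of this kind are commonly modeled as a lognormal CDF of a ground-motion intensity measure: P(damage ≥ state | IM = im) = Φ(ln(im/θ)/β), with median capacity θ and dispersion β. The sketch below implements that standard form; the parameter values in the test are illustrative placeholders, not values from this study.

```python
import math

def fragility(im, theta, beta):
    """P(reaching a damage state | intensity measure = im).

    Lognormal fragility: Phi(ln(im / theta) / beta), where theta is the
    median capacity and beta the log-standard-deviation (dispersion).
    """
    z = math.log(im / theta) / beta
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))  # standard normal CDF
```

Evaluating such a curve at the IM predicted for each of the 100 bridges under the scenario earthquake gives the per-bridge damage probabilities that feed the network resilience analysis.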