Identifying the Major Sources of Variance in Transaction Latencies: Towards More Predictable Databases
Decades of research have sought to improve transaction processing performance
and scalability in database management systems (DBMSs). However, significantly
less attention has been dedicated to the predictability of performance: how
often do individual transactions exhibit execution latency far from the mean?
Performance predictability is vital when transaction processing lies on the
critical path of complex enterprise software or an interactive web service,
as well as in emerging database-as-a-service markets where customers contract
for guaranteed levels of performance. In this paper, we take several steps
towards achieving more predictable database systems. First, we propose a
profiling framework called VProfiler that, given the source code of a DBMS, is
able to identify the dominant sources of variance in transaction latency.
VProfiler automatically instruments the DBMS source code to deconstruct the
overall variance of transaction latencies into variances and covariances of the
execution time of individual functions, which in turn provide insight into the
root causes of variance. Second, we use VProfiler to analyze MySQL and Postgres
- two of the most popular and complex open-source database systems. Our case
studies reveal that the primary causes of variance in MySQL and Postgres are
lock scheduling and centralized logging, respectively. Finally, based on
VProfiler's findings, we further focus on remedying the performance variance of
MySQL by (1) proposing a new lock scheduling algorithm, called Variance-Aware
Transaction Scheduling (VATS), (2) enhancing the buffer pool replacement
policy, and (3) identifying tuning parameters that can reduce variance
significantly. Our experimental results show that our schemes reduce overall
transaction latency variance by 37% on average (and up to 64%) without
compromising throughput or mean latency.
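The decomposition VProfiler performs rests on a standard identity: the variance of a sum equals the sum of the variances plus twice the pairwise covariances. A minimal sketch of that decomposition, using hypothetical per-function timings (the function names are illustrative, not actual MySQL internals):

```python
# Var(T) = sum_i Var(f_i) + 2 * sum_{i<j} Cov(f_i, f_j), where transaction
# latency T is the sum of the execution times of functions f_1..f_n.
from itertools import combinations

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def decompose_variance(per_function_times):
    """per_function_times: {name: [time per transaction]} -> term -> value."""
    names = list(per_function_times)
    terms = {f"Var({n})": cov(per_function_times[n], per_function_times[n])
             for n in names}
    for a, b in combinations(names, 2):
        terms[f"2*Cov({a},{b})"] = 2 * cov(per_function_times[a],
                                           per_function_times[b])
    return terms

# Hypothetical timings (ms) of two instrumented functions over 4 transactions:
# all the latency variance comes from lock waiting, none from log writes.
times = {"lock_wait": [1.0, 5.0, 1.0, 5.0], "log_write": [2.0, 2.0, 2.0, 2.0]}
terms = decompose_variance(times)
total = [a + b for a, b in zip(times["lock_wait"], times["log_write"])]
# The terms sum back to the overall latency variance.
assert abs(sum(terms.values()) - cov(total, total)) < 1e-9
```

Ranking the resulting terms by magnitude is what points at the dominant sources of variance (here, `Var(lock_wait)` accounts for everything).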
PipeGen: Data Pipe Generator for Hybrid Analytics
We develop a tool called PipeGen for efficient data transfer between database
management systems (DBMSs). PipeGen targets data analytics workloads on
shared-nothing engines. It supports scenarios where users seek to perform
different parts of an analysis in different DBMSs or want to combine and
analyze data stored in different systems. The systems may be colocated in the
same cluster or may be in different clusters. To achieve high performance,
PipeGen leverages the ability of all DBMSs to export, possibly in parallel,
data into a common data format, such as CSV or JSON. It automatically extends
these import and export functions with efficient binary data transfer
capabilities that avoid materializing the transmitted data on the file system.
We implement a prototype of PipeGen and evaluate it by automatically generating
data pipes between five different DBMSs. Our experiments show that PipeGen
delivers speedups up to 3.8x compared with manually exporting and importing
data across systems using CSV.
Comment: 12 pages, 15 figures
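The core idea, sending length-prefixed binary records through a pipe rather than materializing and re-parsing a CSV file, can be illustrated in miniature. The record layout and function names below are illustrative assumptions, not PipeGen's actual wire format:

```python
import io
import struct

def export_binary(rows, sink):
    # Each row: one little-endian 32-bit int plus a length-prefixed UTF-8 string.
    for num, text in rows:
        data = text.encode("utf-8")
        sink.write(struct.pack("<iI", num, len(data)) + data)

def import_binary(source):
    rows = []
    while True:
        header = source.read(8)
        if len(header) < 8:
            break  # end of stream
        num, length = struct.unpack("<iI", header)
        rows.append((num, source.read(length).decode("utf-8")))
    return rows

pipe = io.BytesIO()  # stands in for a socket between the two DBMSs
export_binary([(1, "a"), (2, "bc")], pipe)
pipe.seek(0)
assert import_binary(pipe) == [(1, "a"), (2, "bc")]
```

Because the stream never touches the file system and needs no text parsing on the receiving side, this is where the speedup over CSV export/import comes from.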
MITHRIL: Mining Sporadic Associations for Cache Prefetching
The growing pressure on cloud application scalability has accentuated storage
performance as a critical bottleneck. Although cache replacement algorithms
have been extensively studied, cache prefetching - reducing latency by
retrieving items before they are actually requested - remains an underexplored
area. Existing approaches to history-based prefetching, in particular, provide
too little benefit to real systems for the resources they cost. We propose
MITHRIL, a prefetching layer that efficiently exploits historical patterns in
cache request associations. MITHRIL is inspired by sporadic association rule
mining and only relies on the timestamps of requests. Through evaluation of 135
block-storage traces, we show that MITHRIL is effective, giving an average of a
55% hit ratio increase over LRU and PROBABILITY GRAPH, and a 36% hit ratio
gain over AMP, at reasonable cost. We further show that MITHRIL can supplement
any cache replacement algorithm and be readily integrated into existing
systems. Furthermore, we demonstrate that the improvement comes from MITHRIL
being able to capture mid-frequency blocks.
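A toy version of timestamp-based association mining conveys the flavor: blocks whose requests repeatedly fall close together in time are treated as associated, and a hit on one triggers a prefetch of the other. The window and support parameters, and the one-rule-per-block simplification, are illustrative assumptions, not the paper's exact algorithm:

```python
from collections import defaultdict

def mine_associations(trace, window=2, min_support=2):
    """trace: time-ordered list of (timestamp, block).
    Returns {block: block to prefetch when the key is requested}."""
    pair_counts = defaultdict(int)
    for i, (t1, b1) in enumerate(trace):
        for t2, b2 in trace[i + 1:]:
            if t2 - t1 > window:
                break  # requests too far apart in time to associate
            if b2 != b1:
                pair_counts[(b1, b2)] += 1
    rules = {}
    for (a, b), n in pair_counts.items():
        if n >= min_support and a not in rules:
            rules[a] = b  # keep one association per leading block
    return rules

trace = [(0, "A"), (1, "B"), (5, "C"), (10, "A"), (11, "B"), (20, "D")]
rules = mine_associations(trace)
# A and B co-occur within the window twice, so a request for A prefetches B;
# the one-off neighbors of C and D never reach the support threshold.
assert rules == {"A": "B"}
```

Note that the miner consumes only request timestamps, which is what lets such a layer sit beside any replacement algorithm.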
A Survey and Taxonomy of Urban Traffic Management: Towards Vehicular Networks
Urban Traffic Management (UTM) topics have been studied for a long time,
mainly by civil engineers and by city planners. The introduction of new
communication technologies - such as cellular systems, satellite positioning
systems and inter-vehicle communications - has significantly changed the way
researchers deal with UTM issues. In this survey, we provide a review and a
classification of how UTM has been addressed in the literature. We start from
the recent achievements of "classical" approaches to urban traffic estimation
and optimization, including methods based on the analysis of data collected by
fixed sensors (e.g., cameras and radars), as well as methods based on
information provided by mobile phones, such as Floating Car Data (FCD).
Afterwards, we discuss urban traffic optimization, presenting the most recent
works on traffic signal control and vehicle routing control. Then, after
recalling the main concepts of Vehicular Ad-Hoc Networks (VANETs), we classify
the different VANET-based approaches to UTM, according to three categories
("pure" VANETs, hybrid vehicular-sensor networks and hybrid vehicular-cellular
networks), while illustrating the major research issues for each of them. The
main objective of this survey is to provide a comprehensive view on UTM to
researchers with focus on VANETs, in order to pave the way for the design and
development of novel techniques for mitigating urban traffic problems, based on
inter-vehicle communications.
SeeDB: Efficient Data-Driven Visualization Recommendations to Support Visual Analytics.
Data analysts often build visualizations as the first step in their analytical workflow. However, when working with high-dimensional datasets, identifying visualizations that show relevant or desired trends in data can be laborious. We propose SeeDB, a visualization recommendation engine to facilitate fast visual analysis: given a subset of data to be studied, SeeDB intelligently explores the space of visualizations, evaluates promising visualizations for trends, and recommends those it deems most "useful" or "interesting". The two major obstacles in recommending interesting visualizations are (a) scale: evaluating a large number of candidate visualizations while responding within interactive time scales, and (b) utility: identifying an appropriate metric for assessing interestingness of visualizations. For the former, SeeDB introduces pruning optimizations to quickly identify high-utility visualizations and sharing optimizations to maximize sharing of computation across visualizations. For the latter, as a first step, we adopt a deviation-based metric for visualization utility, while indicating how we may be able to generalize it to other factors influencing utility. We implement SeeDB as a middleware layer that can run on top of any DBMS. Our experiments show that our framework can identify interesting visualizations with high accuracy. Our optimizations lead to multiple orders of magnitude speedup on relational row and column stores and provide recommendations at interactive time scales. Finally, we demonstrate via a user study the effectiveness of our deviation-based utility metric and the value of recommendations in supporting visual analytics
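The deviation-based utility metric can be sketched concretely: a candidate visualization is a (group-by, aggregate) pair, and its utility is the distance between the normalized aggregate distribution on the target subset and on the reference dataset. The column names and the choice of Euclidean distance below are illustrative assumptions:

```python
from collections import defaultdict
from math import sqrt

def aggregate(rows, group_col, measure_col):
    # SUM(measure) GROUP BY group, normalized into a probability distribution.
    sums = defaultdict(float)
    for row in rows:
        sums[row[group_col]] += row[measure_col]
    total = sum(sums.values()) or 1.0
    return {g: v / total for g, v in sums.items()}

def utility(target_rows, reference_rows, group_col, measure_col):
    p = aggregate(target_rows, group_col, measure_col)
    q = aggregate(reference_rows, group_col, measure_col)
    keys = set(p) | set(q)
    return sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))

reference = [{"region": "east", "sales": 50}, {"region": "west", "sales": 50}]
target = [{"region": "east", "sales": 90}, {"region": "west", "sales": 10}]
# The target subset deviates strongly from the reference, so this
# (region, SUM(sales)) view would rank as "interesting".
score = utility(target, reference, "region", "sales")
assert score > 0.5
```

A recommender in this spirit would compute such a score for every candidate (dimension, measure, aggregate) triple and return the top-k, which is exactly where the pruning and sharing optimizations earn their keep.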
An Efficient Hybrid I/O Caching Architecture Using Heterogeneous SSDs
SSDs are emerging storage devices which, unlike HDDs, have no mechanical
parts and therefore offer superior performance. Due to the
high cost of SSDs, entirely replacing HDDs with SSDs is not economically
justified. Additionally, SSDs can endure a limited number of writes before
failing. To mitigate the shortcomings of SSDs while taking advantage of their
high performance, SSD caching is practiced in both academia and industry.
Previously proposed caching architectures have only focused on either
performance or endurance and neglected to address both parameters in suggested
architectures. Moreover, the cost, reliability, and power consumption of such
architectures are not evaluated. This paper proposes a hybrid I/O caching
architecture that offers higher performance than previous studies while also
improving power consumption within a similar budget. The proposed
architecture uses DRAM, Read-Optimized SSD, and Write-Optimized SSD in a
three-level cache hierarchy and tries to efficiently redirect read requests to
either DRAM or RO-SSD while sending writes to WO-SSD. To provide high
reliability, dirty pages are written to at least two devices which removes any
single point of failure. The power consumption is also managed by reducing the
number of accesses issued to SSDs. The proposed architecture reconfigures
itself between performance- and endurance-optimized policies based on the
workload characteristics to maintain an effective tradeoff between performance
and endurance. We have implemented the proposed architecture on a server
equipped with industrial SSDs and HDDs. The experimental results show that as
compared to state-of-the-art studies, the proposed architecture improves
performance and power consumption by an average of 8% and 28%, respectively,
and reduces the cost by 5%, with a 4.7% increase in endurance cost and a
negligible reliability penalty.
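A minimal sketch of the routing policy described above, with eviction, sizing, and the workload-driven reconfiguration omitted (the class and tier names are illustrative): reads are served from DRAM first, then the RO-SSD; writes go to the WO-SSD; and every dirty page is kept on two devices so no single device failure loses it.

```python
class HybridCache:
    def __init__(self):
        # Three cache tiers in front of the HDD array.
        self.dram, self.ro_ssd, self.wo_ssd = {}, {}, {}

    def write(self, page, data):
        # Dirty page stored on two devices (WO-SSD plus a DRAM copy),
        # removing the single point of failure.
        self.wo_ssd[page] = data
        self.dram[page] = data

    def read(self, page):
        # Serve reads from DRAM, then RO-SSD, then WO-SSD.
        for tier in (self.dram, self.ro_ssd, self.wo_ssd):
            if page in tier:
                if tier is self.wo_ssd:
                    # Promote so future reads avoid the write-optimized SSD.
                    self.ro_ssd[page] = tier[page]
                return tier[page]
        return None  # miss: would fall through to the HDD

cache = HybridCache()
cache.write("p1", b"v1")
assert cache.read("p1") == b"v1"
assert "p1" in cache.dram and "p1" in cache.wo_ssd  # two copies of dirty page
```

Steering reads away from the WO-SSD is also how the design reduces the number of SSD accesses, which is where the power and endurance savings come from.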
Technical Report: Accelerating Dynamic Graph Analytics on GPUs
As graph analytics often involves compute-intensive operations, GPUs have
been extensively used to accelerate the processing. However, in many
applications such as social networks, cyber security, and fraud detection,
their representative graphs evolve frequently and one has to perform a rebuild
of the graph structure on GPUs to incorporate the updates. Hence, rebuilding
the graphs becomes the bottleneck of processing high-speed graph streams. In
this paper, we propose a GPU-based dynamic graph storage scheme to support
existing graph algorithms easily. Furthermore, we propose parallel update
algorithms to support efficient stream updates so that the maintained graph is
immediately available for high-speed analytic processing on GPUs. Our extensive
experiments with three streaming applications on large-scale real and synthetic
datasets demonstrate the superior performance of our proposed approach.
Comment: 34 pages, 18 figures
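The contrast the abstract draws, applying a stream batch as an incremental update instead of rebuilding the whole graph structure, can be illustrated with a deliberately simplified host-side sketch (the paper's GPU storage scheme and parallel update algorithms are far more involved):

```python
from collections import defaultdict

class DynamicGraph:
    def __init__(self):
        self.adj = defaultdict(set)  # adjacency kept mutable in place

    def insert_edges(self, edges):
        # Incremental update: only the touched adjacency sets change, so the
        # graph is immediately available for the next analytic pass -- no
        # full rebuild per batch of stream updates.
        for u, v in edges:
            self.adj[u].add(v)

    def degree(self, u):
        return len(self.adj[u])

g = DynamicGraph()
g.insert_edges([(0, 1), (0, 2)])   # initial graph
g.insert_edges([(0, 3)])           # stream batch applied without a rebuild
assert g.degree(0) == 3
```

On a GPU the same principle applies per-thread across many edges at once; the point is that update cost scales with the batch, not with the graph.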
Performance Optimization of Network Coding Based Communication and Reliable Storage in Internet of Things
Internet of Things (IoT) is changing our daily life rapidly. Although new
technologies are emerging every day and expanding their influence in this
rapidly growing area, many classic theories can still find their places. In
this paper, we study the important applications of the classic network coding
theory in two important components of Internet of things, including the IoT
core network, where data is sensed and transmitted, and the distributed cloud
storage, where the data generated by the IoT core network is stored. First we
propose an adaptive network coding (ANC) scheme in the IoT core network to
improve the transmission efficiency. We demonstrate the efficacy of the scheme
and the performance advantage over existing schemes through simulations.
Next we introduce the optimal storage allocation problem in the network coding
based distributed cloud storage, which aims at searching for the most reliable
allocation that distributes the data components into data centers,
given the failure probability of each data center. Then we propose a
polynomial-time optimal storage allocation (OSA) scheme to solve the problem.
Both the theoretical analysis and the simulation results show that the storage
reliability could be greatly improved by the OSA scheme.
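Evaluating the reliability of one candidate allocation can be sketched under the usual (k, n) coding assumption: the object is coded into n components, one per chosen data center, and it survives if at least k components remain. The failure probabilities below are illustrative; the paper's OSA scheme is the polynomial-time search for the allocation maximizing this quantity, whereas the brute-force enumeration here is exponential and only meant to define it.

```python
from itertools import product

def allocation_reliability(fail_probs, k):
    """fail_probs: failure probability of each data center holding one
    component. Returns P(at least k components survive)."""
    reliable = 0.0
    for outcome in product([0, 1], repeat=len(fail_probs)):  # 1 = survives
        p = 1.0
        for survives, f in zip(outcome, fail_probs):
            p *= (1 - f) if survives else f
        if sum(outcome) >= k:
            reliable += p
    return reliable

# Three data centers, any 2 of 3 components suffice to reconstruct the data.
r = allocation_reliability([0.1, 0.1, 0.1], k=2)
assert abs(r - (0.9**3 + 3 * 0.9**2 * 0.1)) < 1e-12  # 0.972
```

Spreading components over more, independently failing centers raises this probability, which is the intuition behind searching the allocation space at all.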
Wireless Network Design for Control Systems: A Survey
Wireless networked control systems (WNCS) are composed of spatially
distributed sensors, actuators, and controllers communicating through
wireless networks instead of conventional point-to-point wired connections. Due
to their main benefits in the reduction of deployment and maintenance costs,
large flexibility and possible enhancement of safety, WNCS are becoming a
fundamental infrastructure technology for critical control systems in
automotive electrical systems, avionics control systems, building management
systems, and industrial automation systems. The main challenge in WNCS is to
jointly design the communication and control systems considering their tight
interaction to improve the control performance and the network lifetime. In
this survey, we make an exhaustive review of the literature on wireless network
design and optimization for WNCS. First, we discuss what we call the critical
interactive variables including sampling period, message delay, message
dropout, and network energy consumption. The mutual effects of these
communication and control variables motivate their joint tuning. We discuss the
effect of controllable wireless network parameters at all layers of the
communication protocols on the probability distribution of these interactive
variables. We also review the current wireless network standardization for WNCS
and their corresponding methodology for adapting the network parameters.
Moreover, we discuss the analysis and design of control systems taking into
account the effect of the interactive variables on the control system
performance. Finally, we present the state-of-the-art wireless network design
and optimization for WNCS, while highlighting the tradeoff between the
achievable performance and complexity of various approaches. We conclude the
survey by highlighting major research issues and identifying future research
directions.
Comment: 37 pages, 17 figures, 4 tables
Large-Scale Time-Shifted Streaming Delivery
An attractive new feature of connected TV systems is to allow users
to access past portions of the TV channel. This feature, called time-shifted
streaming, is now used by millions of TV viewers. We address in this paper the
design of a large-scale delivery system for time-shifted streaming. We
highlight the characteristics of time-shifted streaming that prevent known
video delivery systems from being used. Then, we present two proposals that meet the
demand for two radically different types of TV operator. First, the
Peer-Assisted Catch-Up Streaming system, namely PACUS, aims at reducing the
load on the server of a large TV broadcaster without losing control of the
TV delivery. Second, the turntable structure is an overlay of nodes that allows
an independent content delivery network or a small independent TV broadcaster
to ensure that all past TV programs are stored and as available as possible. We
show through extensive simulations that our objectives are reached, with a
reduction of up to three quarters of the traffic for PACUS and a 100%
guaranteed availability for the turntable structure. We also compare our
proposals to the main previous works in the area.