Identifying the Major Sources of Variance in Transaction Latencies: Towards More Predictable Databases
Decades of research have sought to improve transaction processing performance
and scalability in database management systems (DBMSs). However, significantly
less attention has been dedicated to the predictability of performance: how
often do individual transactions exhibit execution latency far from the mean?
Performance predictability is vital when transaction processing lies on the
critical path of complex enterprise software or an interactive web service,
as well as in emerging database-as-a-service markets where customers contract
for guaranteed levels of performance. In this paper, we take several steps
towards achieving more predictable database systems. First, we propose a
profiling framework called VProfiler that, given the source code of a DBMS, is
able to identify the dominant sources of variance in transaction latency.
VProfiler automatically instruments the DBMS source code to deconstruct the
overall variance of transaction latencies into variances and covariances of the
execution time of individual functions, which in turn provide insight into the
root causes of variance. Second, we use VProfiler to analyze MySQL and Postgres
- two of the most popular and complex open-source database systems. Our case
studies reveal that the primary causes of variance in MySQL and Postgres are
lock scheduling and centralized logging, respectively. Finally, based on
VProfiler's findings, we further focus on remedying the performance variance of
MySQL by (1) proposing a new lock scheduling algorithm, called Variance-Aware
Transaction Scheduling (VATS), (2) enhancing the buffer pool replacement
policy, and (3) identifying tuning parameters that can reduce variance
significantly. Our experimental results show that our schemes reduce overall
transaction latency variance by 37% on average (and up to 64%) without
compromising throughput or mean latency.
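The decomposition VProfiler performs rests on a standard identity: the variance of a sum equals the sum of the variances plus twice the pairwise covariances. A minimal sketch of that decomposition, using hypothetical per-function timings (the function names are illustrative, not actual MySQL internals):

```python
# Var(T) = sum_i Var(f_i) + 2 * sum_{i<j} Cov(f_i, f_j), where transaction
# latency T is the sum of the execution times of functions f_1..f_n.
from itertools import combinations

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def decompose_variance(per_function_times):
    """per_function_times: {name: [time per transaction]} -> term -> value."""
    names = list(per_function_times)
    terms = {f"Var({n})": cov(per_function_times[n], per_function_times[n])
             for n in names}
    for a, b in combinations(names, 2):
        terms[f"2*Cov({a},{b})"] = 2 * cov(per_function_times[a],
                                           per_function_times[b])
    return terms

# Hypothetical timings (ms) of two instrumented functions over 4 transactions:
# all the latency variance comes from lock waiting, none from log writes.
times = {"lock_wait": [1.0, 5.0, 1.0, 5.0], "log_write": [2.0, 2.0, 2.0, 2.0]}
terms = decompose_variance(times)
total = [a + b for a, b in zip(times["lock_wait"], times["log_write"])]
# The terms sum back to the overall latency variance.
assert abs(sum(terms.values()) - cov(total, total)) < 1e-9
```

Ranking the resulting terms by magnitude is what points at the dominant sources of variance (here, `Var(lock_wait)` accounts for everything).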
PipeGen: Data Pipe Generator for Hybrid Analytics
We develop a tool called PipeGen for efficient data transfer between database
management systems (DBMSs). PipeGen targets data analytics workloads on
shared-nothing engines. It supports scenarios where users seek to perform
different parts of an analysis in different DBMSs or want to combine and
analyze data stored in different systems. The systems may be colocated in the
same cluster or may be in different clusters. To achieve high performance,
PipeGen leverages the ability of all DBMSs to export, possibly in parallel,
data into a common data format, such as CSV or JSON. It automatically extends
these import and export functions with efficient binary data transfer
capabilities that avoid materializing the transmitted data on the file system.
We implement a prototype of PipeGen and evaluate it by automatically generating
data pipes between five different DBMSs. Our experiments show that PipeGen
delivers speedups up to 3.8x compared with manually exporting and importing
data across systems using CSV.
Comment: 12 pages, 15 figures
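The core idea, sending length-prefixed binary records through a pipe rather than materializing and re-parsing a CSV file, can be illustrated in miniature. The record layout and function names below are illustrative assumptions, not PipeGen's actual wire format:

```python
import io
import struct

def export_binary(rows, sink):
    # Each row: one little-endian 32-bit int plus a length-prefixed UTF-8 string.
    for num, text in rows:
        data = text.encode("utf-8")
        sink.write(struct.pack("<iI", num, len(data)) + data)

def import_binary(source):
    rows = []
    while True:
        header = source.read(8)
        if len(header) < 8:
            break  # end of stream
        num, length = struct.unpack("<iI", header)
        rows.append((num, source.read(length).decode("utf-8")))
    return rows

pipe = io.BytesIO()  # stands in for a socket between the two DBMSs
export_binary([(1, "a"), (2, "bc")], pipe)
pipe.seek(0)
assert import_binary(pipe) == [(1, "a"), (2, "bc")]
```

Because the stream never touches the file system and needs no text parsing on the receiving side, this is where the speedup over CSV export/import comes from.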
MITHRIL: Mining Sporadic Associations for Cache Prefetching
The growing pressure on cloud application scalability has accentuated storage
performance as a critical bottleneck. Although cache replacement algorithms
have been extensively studied, cache prefetching - reducing latency by
retrieving items before they are actually requested - remains an underexplored
area. Existing approaches to history-based prefetching, in particular, provide
too little benefit to real systems for the resources they cost. We propose
MITHRIL, a prefetching layer that efficiently exploits historical patterns in
cache request associations. MITHRIL is inspired by sporadic association rule
mining and only relies on the timestamps of requests. Through evaluation of 135
block-storage traces, we show that MITHRIL is effective, giving an average of a
55% hit ratio increase over LRU and PROBABILITY GRAPH, and a 36% hit ratio
gain over AMP, at reasonable cost. We further show that MITHRIL can supplement
any cache replacement algorithm and be readily integrated into existing
systems. Furthermore, we demonstrate that the improvement comes from MITHRIL
being able to capture mid-frequency blocks.
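A toy version of timestamp-based association mining conveys the flavor: blocks whose requests repeatedly fall close together in time are treated as associated, and a hit on one triggers a prefetch of the other. The window and support parameters, and the one-rule-per-block simplification, are illustrative assumptions, not the paper's exact algorithm:

```python
from collections import defaultdict

def mine_associations(trace, window=2, min_support=2):
    """trace: time-ordered list of (timestamp, block).
    Returns {block: block to prefetch when the key is requested}."""
    pair_counts = defaultdict(int)
    for i, (t1, b1) in enumerate(trace):
        for t2, b2 in trace[i + 1:]:
            if t2 - t1 > window:
                break  # requests too far apart in time to associate
            if b2 != b1:
                pair_counts[(b1, b2)] += 1
    rules = {}
    for (a, b), n in pair_counts.items():
        if n >= min_support and a not in rules:
            rules[a] = b  # keep one association per leading block
    return rules

trace = [(0, "A"), (1, "B"), (5, "C"), (10, "A"), (11, "B"), (20, "D")]
rules = mine_associations(trace)
# A and B co-occur within the window twice, so a request for A prefetches B;
# the one-off neighbors of C and D never reach the support threshold.
assert rules == {"A": "B"}
```

Note that the miner consumes only request timestamps, which is what lets such a layer sit beside any replacement algorithm.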
A Survey and Taxonomy of Urban Traffic Management: Towards Vehicular Networks
Urban Traffic Management (UTM) topics have been studied for a long time,
mainly by civil engineers and by city planners. The introduction of new
communication technologies - such as cellular systems, satellite positioning
systems and inter-vehicle communications - has significantly changed the way
researchers deal with UTM issues. In this survey, we provide a review and a
classification of how UTM has been addressed in the literature. We start from
the recent achievements of "classical" approaches to urban traffic estimation
and optimization, including methods based on the analysis of data collected by
fixed sensors (e.g., cameras and radars), as well as methods based on
information provided by mobile phones, such as Floating Car Data (FCD).
Afterwards, we discuss urban traffic optimization, presenting the most recent
works on traffic signal control and vehicle routing control. Then, after
recalling the main concepts of Vehicular Ad-Hoc Networks (VANETs), we classify
the different VANET-based approaches to UTM, according to three categories
("pure" VANETs, hybrid vehicular-sensor networks and hybrid vehicular-cellular
networks), while illustrating the major research issues for each of them. The
main objective of this survey is to provide a comprehensive view on UTM to
researchers with focus on VANETs, in order to pave the way for the design and
development of novel techniques for mitigating urban traffic problems, based on
inter-vehicle communications.
SeeDB: Efficient Data-Driven Visualization Recommendations to Support Visual Analytics.
Data analysts often build visualizations as the first step in their analytical workflow. However, when working with high-dimensional datasets, identifying visualizations that show relevant or desired trends in data can be laborious. We propose SeeDB, a visualization recommendation engine to facilitate fast visual analysis: given a subset of data to be studied, SeeDB intelligently explores the space of visualizations, evaluates promising visualizations for trends, and recommends those it deems most "useful" or "interesting". The two major obstacles in recommending interesting visualizations are (a) scale: evaluating a large number of candidate visualizations while responding within interactive time scales, and (b) utility: identifying an appropriate metric for assessing interestingness of visualizations. For the former, SeeDB introduces pruning optimizations to quickly identify high-utility visualizations and sharing optimizations to maximize sharing of computation across visualizations. For the latter, as a first step, we adopt a deviation-based metric for visualization utility, while indicating how we may be able to generalize it to other factors influencing utility. We implement SeeDB as a middleware layer that can run on top of any DBMS. Our experiments show that our framework can identify interesting visualizations with high accuracy. Our optimizations lead to multiple orders of magnitude speedup on relational row and column stores and provide recommendations at interactive time scales. Finally, we demonstrate via a user study the effectiveness of our deviation-based utility metric and the value of recommendations in supporting visual analytics
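The deviation-based utility metric can be sketched concretely: a candidate visualization is a (group-by, aggregate) pair, and its utility is the distance between the normalized aggregate distribution on the target subset and on the reference dataset. The column names and the choice of Euclidean distance below are illustrative assumptions:

```python
from collections import defaultdict
from math import sqrt

def aggregate(rows, group_col, measure_col):
    # SUM(measure) GROUP BY group, normalized into a probability distribution.
    sums = defaultdict(float)
    for row in rows:
        sums[row[group_col]] += row[measure_col]
    total = sum(sums.values()) or 1.0
    return {g: v / total for g, v in sums.items()}

def utility(target_rows, reference_rows, group_col, measure_col):
    p = aggregate(target_rows, group_col, measure_col)
    q = aggregate(reference_rows, group_col, measure_col)
    keys = set(p) | set(q)
    return sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))

reference = [{"region": "east", "sales": 50}, {"region": "west", "sales": 50}]
target = [{"region": "east", "sales": 90}, {"region": "west", "sales": 10}]
# The target subset deviates strongly from the reference, so this
# (region, SUM(sales)) view would rank as "interesting".
score = utility(target, reference, "region", "sales")
assert score > 0.5
```

A recommender in this spirit would compute such a score for every candidate (dimension, measure, aggregate) triple and return the top-k, which is exactly where the pruning and sharing optimizations earn their keep.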
An Efficient Hybrid I/O Caching Architecture Using Heterogeneous SSDs
SSDs are emerging storage devices which, unlike HDDs, have no mechanical
parts and therefore offer superior performance. Due to the
high cost of SSDs, entirely replacing HDDs with SSDs is not economically
justified. Additionally, SSDs can endure a limited number of writes before
failing. To mitigate the shortcomings of SSDs while taking advantage of their
high performance, SSD caching is practiced in both academia and industry.
Previously proposed caching architectures have only focused on either
performance or endurance and neglected to address both parameters in suggested
architectures. Moreover, the cost, reliability, and power consumption of such
architectures are not evaluated. This paper proposes a hybrid I/O caching
architecture that offers higher performance than previous studies while also
improving power consumption within a similar budget. The proposed
architecture uses DRAM, Read-Optimized SSD, and Write-Optimized SSD in a
three-level cache hierarchy and tries to efficiently redirect read requests to
either DRAM or RO-SSD while sending writes to WO-SSD. To provide high
reliability, dirty pages are written to at least two devices which removes any
single point of failure. The power consumption is also managed by reducing the
number of accesses issued to SSDs. The proposed architecture reconfigures
itself between performance- and endurance-optimized policies based on the
workload characteristics to maintain an effective tradeoff between performance
and endurance. We have implemented the proposed architecture on a server
equipped with industrial SSDs and HDDs. The experimental results show that as
compared to state-of-the-art studies, the proposed architecture improves
performance and power consumption by an average of 8% and 28%, respectively,
and reduces the cost by 5%, with a 4.7% increase in endurance cost and a
negligible reliability penalty.
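A minimal sketch of the routing policy described above, with eviction, sizing, and the workload-driven reconfiguration omitted (the class and tier names are illustrative): reads are served from DRAM first, then the RO-SSD; writes go to the WO-SSD; and every dirty page is kept on two devices so no single device failure loses it.

```python
class HybridCache:
    def __init__(self):
        # Three cache tiers in front of the HDD array.
        self.dram, self.ro_ssd, self.wo_ssd = {}, {}, {}

    def write(self, page, data):
        # Dirty page stored on two devices (WO-SSD plus a DRAM copy),
        # removing the single point of failure.
        self.wo_ssd[page] = data
        self.dram[page] = data

    def read(self, page):
        # Serve reads from DRAM, then RO-SSD, then WO-SSD.
        for tier in (self.dram, self.ro_ssd, self.wo_ssd):
            if page in tier:
                if tier is self.wo_ssd:
                    # Promote so future reads avoid the write-optimized SSD.
                    self.ro_ssd[page] = tier[page]
                return tier[page]
        return None  # miss: would fall through to the HDD

cache = HybridCache()
cache.write("p1", b"v1")
assert cache.read("p1") == b"v1"
assert "p1" in cache.dram and "p1" in cache.wo_ssd  # two copies of dirty page
```

Steering reads away from the WO-SSD is also how the design reduces the number of SSD accesses, which is where the power and endurance savings come from.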
Technical Report: Accelerating Dynamic Graph Analytics on GPUs
As graph analytics often involves compute-intensive operations, GPUs have
been extensively used to accelerate the processing. However, in many
applications such as social networks, cyber security, and fraud detection,
their representative graphs evolve frequently and one has to perform a rebuild
of the graph structure on GPUs to incorporate the updates. Hence, rebuilding
the graphs becomes the bottleneck of processing high-speed graph streams. In
this paper, we propose a GPU-based dynamic graph storage scheme to support
existing graph algorithms easily. Furthermore, we propose parallel update
algorithms to support efficient stream updates so that the maintained graph is
immediately available for high-speed analytic processing on GPUs. Our extensive
experiments with three streaming applications on large-scale real and synthetic
datasets demonstrate the superior performance of our proposed approach.
Comment: 34 pages, 18 figures
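The contrast the abstract draws, applying a stream batch as an incremental update instead of rebuilding the whole graph structure, can be illustrated with a deliberately simplified host-side sketch (the paper's GPU storage scheme and parallel update algorithms are far more involved):

```python
from collections import defaultdict

class DynamicGraph:
    def __init__(self):
        self.adj = defaultdict(set)  # adjacency kept mutable in place

    def insert_edges(self, edges):
        # Incremental update: only the touched adjacency sets change, so the
        # graph is immediately available for the next analytic pass -- no
        # full rebuild per batch of stream updates.
        for u, v in edges:
            self.adj[u].add(v)

    def degree(self, u):
        return len(self.adj[u])

g = DynamicGraph()
g.insert_edges([(0, 1), (0, 2)])   # initial graph
g.insert_edges([(0, 3)])           # stream batch applied without a rebuild
assert g.degree(0) == 3
```

On a GPU the same principle applies per-thread across many edges at once; the point is that update cost scales with the batch, not with the graph.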
Performance Optimization of Network Coding Based Communication and Reliable Storage in Internet of Things
Internet of Things (IoT) is changing our daily life rapidly. Although new
technologies are emerging every day and expanding their influence in this
rapidly growing area, many classic theories can still find their places. In
this paper, we study the important applications of the classic network coding
theory in two important components of Internet of things, including the IoT
core network, where data is sensed and transmitted, and the distributed cloud
storage, where the data generated by the IoT core network is stored. First we
propose an adaptive network coding (ANC) scheme in the IoT core network to
improve the transmission efficiency. We demonstrate the efficacy of the scheme
and the performance advantage over existing schemes through simulations.
Next we introduce the optimal storage allocation problem in the network coding
based distributed cloud storage, which aims at searching for the most reliable
allocation that distributes the data components into data centers,
given the failure probability of each data center. Then we propose a
polynomial-time optimal storage allocation (OSA) scheme to solve the problem.
Both the theoretical analysis and the simulation results show that the storage
reliability could be greatly improved by the OSA scheme.
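Evaluating the reliability of one candidate allocation can be sketched under the usual (k, n) coding assumption: the object is coded into n components, one per chosen data center, and it survives if at least k components remain. The failure probabilities below are illustrative; the paper's OSA scheme is the polynomial-time search for the allocation maximizing this quantity, whereas the brute-force enumeration here is exponential and only meant to define it.

```python
from itertools import product

def allocation_reliability(fail_probs, k):
    """fail_probs: failure probability of each data center holding one
    component. Returns P(at least k components survive)."""
    reliable = 0.0
    for outcome in product([0, 1], repeat=len(fail_probs)):  # 1 = survives
        p = 1.0
        for survives, f in zip(outcome, fail_probs):
            p *= (1 - f) if survives else f
        if sum(outcome) >= k:
            reliable += p
    return reliable

# Three data centers, any 2 of 3 components suffice to reconstruct the data.
r = allocation_reliability([0.1, 0.1, 0.1], k=2)
assert abs(r - (0.9**3 + 3 * 0.9**2 * 0.1)) < 1e-12  # 0.972
```

Spreading components over more, independently failing centers raises this probability, which is the intuition behind searching the allocation space at all.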
Wireless Network Design for Control Systems: A Survey
Wireless networked control systems (WNCS) are composed of spatially
distributed sensors, actuators, and controllers communicating through
wireless networks instead of conventional point-to-point wired connections. Due
to their main benefits in the reduction of deployment and maintenance costs,
large flexibility and possible enhancement of safety, WNCS are becoming a
fundamental infrastructure technology for critical control systems in
automotive electrical systems, avionics control systems, building management
systems, and industrial automation systems. The main challenge in WNCS is to
jointly design the communication and control systems considering their tight
interaction to improve the control performance and the network lifetime. In
this survey, we make an exhaustive review of the literature on wireless network
design and optimization for WNCS. First, we discuss what we call the critical
interactive variables including sampling period, message delay, message
dropout, and network energy consumption. The mutual effects of these
communication and control variables motivate their joint tuning. We discuss the
effect of controllable wireless network parameters at all layers of the
communication protocols on the probability distribution of these interactive
variables. We also review the current wireless network standardization for WNCS
and their corresponding methodology for adapting the network parameters.
Moreover, we discuss the analysis and design of control systems taking into
account the effect of the interactive variables on the control system
performance. Finally, we present the state-of-the-art wireless network design
and optimization for WNCS, while highlighting the tradeoff between the
achievable performance and complexity of various approaches. We conclude the
survey by highlighting major research issues and identifying future research
directions.
Comment: 37 pages, 17 figures, 4 tables
Large-Scale Time-Shifted Streaming Delivery
An attractive new feature of connected TV systems is to allow users
to access past portions of the TV channel. This feature, called time-shifted
streaming, is now used by millions of TV viewers. We address in this paper the
design of a large-scale delivery system for time-shifted streaming. We
highlight the characteristics of time-shifted streaming that prevent known
video delivery systems from being used. Then, we present two proposals that meet the
demand for two radically different types of TV operator. First, the
Peer-Assisted Catch-Up Streaming system, namely PACUS, aims at reducing the
load on the server of a large TV broadcaster without losing control of the
TV delivery. Second, the turntable structure is an overlay of nodes that allows
an independent content delivery network or a small independent TV broadcaster
to ensure that all past TV programs are stored and as available as possible. We
show through extensive simulations that our objectives are reached, with a
reduction of up to three quarters of the traffic for PACUS and a 100%
guaranteed availability for the turntable structure. We also compare our
proposals to the main previous works in the area.