
    Identifying the Major Sources of Variance in Transaction Latencies: Towards More Predictable Databases

    Decades of research have sought to improve transaction processing performance and scalability in database management systems (DBMSs). However, significantly less attention has been dedicated to the predictability of performance: how often do individual transactions exhibit execution latencies far from the mean? Performance predictability is vital when transaction processing lies on the critical path of complex enterprise software or an interactive web service, as well as in emerging database-as-a-service markets where customers contract for guaranteed levels of performance. In this paper, we take several steps towards achieving more predictable database systems. First, we propose a profiling framework called VProfiler that, given the source code of a DBMS, is able to identify the dominant sources of variance in transaction latency. VProfiler automatically instruments the DBMS source code to deconstruct the overall variance of transaction latencies into variances and covariances of the execution times of individual functions, which in turn provide insight into the root causes of variance. Second, we use VProfiler to analyze MySQL and Postgres - two of the most popular and complex open-source database systems. Our case studies reveal that the primary causes of variance in MySQL and Postgres are lock scheduling and centralized logging, respectively. Finally, based on VProfiler's findings, we further focus on remedying the performance variance of MySQL by (1) proposing a new lock scheduling algorithm, called Variance-Aware Transaction Scheduling (VATS), (2) enhancing the buffer pool replacement policy, and (3) identifying tuning parameters that can reduce variance significantly. Our experimental results show that our schemes reduce overall transaction latency variance by 37% on average (and up to 64%) without compromising throughput or mean latency.
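
    The variance deconstruction described above follows the standard identity Var(sum_i X_i) = sum_{i,j} Cov(X_i, X_j): the end-to-end latency variance is the sum of all per-function variances and pairwise covariances. The following is a minimal sketch of that decomposition, assuming per-transaction, per-function timing samples; the data layout and names are illustrative, not VProfiler's actual interface.

```python
# Minimal sketch of the decomposition the abstract describes:
# Var(sum_i X_i) = sum_{i,j} Cov(X_i, X_j), where X_i is the time spent
# in instrumented function i. Names and layout are illustrative only.
import numpy as np

def decompose_latency_variance(samples, names):
    """samples[t, f] = time spent in function f during transaction t."""
    cov = np.cov(samples, rowvar=False)            # f x f covariance matrix
    total_var = samples.sum(axis=1).var(ddof=1)    # variance of end-to-end latency
    # each function's contribution: its own variance plus its covariances
    # with every other function; the contributions sum back to total_var
    contributions = {name: cov[i, :].sum() for i, name in enumerate(names)}
    return total_var, contributions

# toy usage with three hypothetical instrumented functions over 1000 transactions
rng = np.random.default_rng(0)
samples = rng.exponential([1.0, 0.3, 2.0], size=(1000, 3))
total, per_fn = decompose_latency_variance(samples, ["lock_wait", "log_write", "io_read"])
print(total, per_fn)    # the per-function terms add up to the total variance
```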

    PipeGen: Data Pipe Generator for Hybrid Analytics

    We develop a tool called PipeGen for efficient data transfer between database management systems (DBMSs). PipeGen targets data analytics workloads on shared-nothing engines. It supports scenarios where users seek to perform different parts of an analysis in different DBMSs or want to combine and analyze data stored in different systems. The systems may be colocated in the same cluster or may be in different clusters. To achieve high performance, PipeGen leverages the ability of all DBMSs to export data, possibly in parallel, into a common format such as CSV or JSON. It automatically extends these import and export functions with efficient binary data transfer capabilities that avoid materializing the transmitted data on the file system. We implement a prototype of PipeGen and evaluate it by automatically generating data pipes between five different DBMSs. Our experiments show that PipeGen delivers speedups of up to 3.8x compared with manually exporting and importing data across systems using CSV.
    Comment: 12 pages, 15 figures
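
    The core idea is to stream exported rows straight into the destination engine instead of materializing a file on disk. The sketch below illustrates that pattern, using in-memory sqlite3 databases as stand-ins for the two DBMSs and a bounded in-memory queue as the data pipe; PipeGen itself generates binary transfer code inside each engine's import/export path rather than relying on such an external relay.

```python
# Illustrative only: stream rows from one engine's exporter directly into
# another's importer, so the transferred data never touches the file system.
import sqlite3, threading, queue

def export_rows(src, sql, pipe):
    for row in src.execute(sql):
        pipe.put(row)                     # rows flow through memory, not a CSV file
    pipe.put(None)                        # end-of-stream marker

def import_rows(dst, insert_sql, pipe):
    while (row := pipe.get()) is not None:
        dst.execute(insert_sql, row)
    dst.commit()

src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:", check_same_thread=False)
src.execute("CREATE TABLE t(a INTEGER, b TEXT)")
src.executemany("INSERT INTO t VALUES (?, ?)", [(i, f"row{i}") for i in range(1000)])
dst.execute("CREATE TABLE t(a INTEGER, b TEXT)")

pipe = queue.Queue(maxsize=128)           # bounded buffer provides back-pressure
consumer = threading.Thread(target=import_rows,
                            args=(dst, "INSERT INTO t VALUES (?, ?)", pipe))
consumer.start()
export_rows(src, "SELECT a, b FROM t", pipe)
consumer.join()
print(dst.execute("SELECT COUNT(*) FROM t").fetchone())  # (1000,)
```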

    MITHRIL: Mining Sporadic Associations for Cache Prefetching

    The growing pressure on cloud application scalability has accentuated storage performance as a critical bottleneck. Although cache replacement algorithms have been extensively studied, cache prefetching - reducing latency by retrieving items before they are actually requested - remains an underexplored area. Existing approaches to history-based prefetching, in particular, provide too little benefit to real systems for the resources they cost. We propose MITHRIL, a prefetching layer that efficiently exploits historical patterns in cache request associations. MITHRIL is inspired by sporadic association rule mining and relies only on the timestamps of requests. Through an evaluation of 135 block-storage traces, we show that MITHRIL is effective, giving an average hit ratio increase of 55% over LRU and PROBABILITY GRAPH, and a 36% hit ratio gain over AMP, at reasonable cost. We further show that MITHRIL can supplement any cache replacement algorithm and be readily integrated into existing systems. Furthermore, we demonstrate that the improvement comes from MITHRIL's ability to capture mid-frequency blocks.
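
    As a rough illustration of history-based prefetching of the kind described above, the sketch below turns blocks that repeatedly appear close together in the request stream into simple one-to-one rules and prefetches the partner on a later hit. The window size and support threshold are placeholders for exposition, not MITHRIL's actual mining procedure.

```python
# Toy history-based prefetcher: blocks that repeatedly co-occur within a
# short window of the request stream become association rules, and a later
# request for one triggers a prefetch suggestion for its partner.
from collections import defaultdict, deque

class AssociationPrefetcher:
    def __init__(self, window=4, min_support=2):
        self.window = deque(maxlen=window)      # most recent block ids
        self.counts = defaultdict(int)          # (a, b) -> co-occurrence count
        self.rules = {}                         # a -> b once support is reached
        self.min_support = min_support

    def on_request(self, block):
        prefetch = self.rules.get(block)        # rule fires: suggest partner block
        for earlier in self.window:
            if earlier != block:
                self.counts[(earlier, block)] += 1
                if self.counts[(earlier, block)] >= self.min_support:
                    self.rules[earlier] = block
        self.window.append(block)
        return prefetch

p = AssociationPrefetcher()
trace = [1, 9, 1, 9, 5, 1, 7]
print([p.on_request(b) for b in trace])  # once (1, 9) recurs, a request for 1 suggests 9
```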

    A Survey and Taxonomy of Urban Traffic Management: Towards Vehicular Networks

    Urban Traffic Management (UTM) topics have been tackled for a long time, mainly by civil engineers and city planners. The introduction of new communication technologies - such as cellular systems, satellite positioning systems, and inter-vehicle communications - has significantly changed the way researchers deal with UTM issues. In this survey, we provide a review and a classification of how UTM has been addressed in the literature. We start from the recent achievements of "classical" approaches to urban traffic estimation and optimization, including methods based on the analysis of data collected by fixed sensors (e.g., cameras and radars), as well as methods based on information provided by mobile phones, such as Floating Car Data (FCD). Afterwards, we discuss urban traffic optimization, presenting the most recent works on traffic signal control and vehicle routing control. Then, after recalling the main concepts of Vehicular Ad-Hoc Networks (VANETs), we classify the different VANET-based approaches to UTM according to three categories ("pure" VANETs, hybrid vehicular-sensor networks, and hybrid vehicular-cellular networks), while illustrating the major research issues for each of them. The main objective of this survey is to provide a comprehensive view of UTM to researchers with a focus on VANETs, in order to pave the way for the design and development of novel techniques for mitigating urban traffic problems based on inter-vehicle communications.

    An Efficient Hybrid I/O Caching Architecture Using Heterogeneous SSDs

    SSDs are emerging storage devices which, unlike HDDs, have no mechanical parts and therefore offer superior performance. Due to the high cost of SSDs, entirely replacing HDDs with SSDs is not economically justified. Additionally, SSDs can endure a limited number of writes before failing. To mitigate the shortcomings of SSDs while taking advantage of their high performance, SSD caching is practiced in both academia and industry. Previously proposed caching architectures have focused on either performance or endurance, but not both. Moreover, the cost, reliability, and power consumption of such architectures are not evaluated. This paper proposes a hybrid I/O caching architecture that offers higher performance than previous studies while also improving power consumption at a similar budget. The proposed architecture uses DRAM, a Read-Optimized SSD (RO-SSD), and a Write-Optimized SSD (WO-SSD) in a three-level cache hierarchy and tries to efficiently redirect read requests to either DRAM or the RO-SSD while sending writes to the WO-SSD. To provide high reliability, dirty pages are written to at least two devices, which removes any single point of failure. Power consumption is also managed by reducing the number of accesses issued to SSDs. The proposed architecture reconfigures itself between performance- and endurance-optimized policies based on workload characteristics to maintain an effective tradeoff between performance and endurance. We have implemented the proposed architecture on a server equipped with industrial SSDs and HDDs. The experimental results show that, compared to state-of-the-art studies, the proposed architecture improves performance and power consumption by an average of 8% and 28%, respectively, and reduces cost by 5% while increasing the endurance cost by 4.7%, with a negligible reliability penalty.
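
    The request routing described above (reads served from DRAM or the RO-SSD, writes absorbed by the WO-SSD, and every dirty page kept on two devices) can be sketched as follows. Eviction, destaging, and the adaptive policy switching are omitted; the class is a minimal illustration under those assumptions, not the paper's implementation.

```python
# Minimal sketch of the three-level routing: reads check DRAM, then the
# read-optimized SSD, then the write-optimized SSD before going to the HDD
# tier; dirty data is placed on two devices so no single failure loses it.
class HybridCache:
    def __init__(self):
        self.dram, self.ro_ssd, self.wo_ssd = {}, {}, {}

    def read(self, page, backend_read):
        for level in (self.dram, self.ro_ssd, self.wo_ssd):
            if page in level:
                return level[page]
        data = backend_read(page)           # miss: fetch from the HDD tier
        self.ro_ssd[page] = data            # cache clean data on the RO-SSD
        return data

    def write(self, page, data):
        # dirty data goes to two devices (DRAM + WO-SSD): no single point of
        # failure before it is eventually destaged to the HDD tier
        self.dram[page] = data
        self.wo_ssd[page] = data

cache = HybridCache()
cache.write(7, b"dirty")
print(cache.read(7, backend_read=lambda p: b"from-hdd"))   # served from DRAM copy
print(cache.read(3, backend_read=lambda p: b"from-hdd"))   # miss, then cached on RO-SSD
```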

    Technical Report: Accelerating Dynamic Graph Analytics on GPUs

    As graph analytics often involves compute-intensive operations, GPUs have been extensively used to accelerate the processing. However, in many applications such as social networks, cyber security, and fraud detection, the underlying graphs evolve frequently, and one has to rebuild the graph structure on GPUs to incorporate the updates. Hence, rebuilding the graphs becomes the bottleneck of processing high-speed graph streams. In this paper, we propose a GPU-based dynamic graph storage scheme that supports existing graph algorithms easily. Furthermore, we propose parallel update algorithms to support efficient stream updates so that the maintained graph is immediately available for high-speed analytic processing on GPUs. Our extensive experiments with three streaming applications on large-scale real and synthetic datasets demonstrate the superior performance of our proposed approach.
    Comment: 34 pages, 18 figures
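
    One generic way to avoid the full-rebuild bottleneck mentioned above is to keep each vertex's adjacency in small blocks with slack, so that a batch of insertions touches only the affected vertices instead of repacking the whole structure. The CPU sketch below illustrates that idea under that assumption; it is for exposition only and is not the paper's GPU data structure.

```python
# Illustrative only: per-vertex edge blocks with slack avoid rebuilding a
# packed CSR array on every update batch; insertions grow only the blocks
# of the vertices they touch.
BLOCK = 4  # edges per block; kept small for illustration

class DynamicGraph:
    def __init__(self, num_vertices):
        self.blocks = [[[]] for _ in range(num_vertices)]    # vertex -> list of edge blocks

    def insert_batch(self, edges):
        # on a GPU this loop would be a parallel kernel over the batch;
        # no global rebuild of the structure is required
        for u, v in edges:
            last = self.blocks[u][-1]
            if len(last) == BLOCK:
                last = []
                self.blocks[u].append(last)                  # grow only this vertex
            last.append(v)

    def neighbors(self, u):
        for block in self.blocks[u]:
            yield from block

g = DynamicGraph(5)
g.insert_batch([(0, 1), (0, 2), (1, 3), (0, 4), (0, 3), (0, 2)])
print(list(g.neighbors(0)))  # [1, 2, 4, 3, 2]
```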

    Performance Optimization of Network Coding Based Communication and Reliable Storage in Internet of Things

    The Internet of Things (IoT) is changing our daily life rapidly. Although new technologies are emerging every day and expanding their influence in this rapidly growing area, many classic theories can still find their place. In this paper, we study important applications of classic network coding theory in two components of the Internet of Things: the IoT core network, where data is sensed and transmitted, and the distributed cloud storage, where the data generated by the IoT core network is stored. First, we propose an adaptive network coding (ANC) scheme in the IoT core network to improve transmission efficiency. We demonstrate the efficacy of the scheme and its performance advantage over existing schemes through simulations. Next, we introduce the optimal storage allocation problem in network coding based distributed cloud storage, which aims at searching for the most reliable allocation that distributes the n data components into N data centers, given the failure probability p of each data center. Then we propose a polynomial-time optimal storage allocation (OSA) scheme to solve the problem. Both the theoretical analysis and the simulation results show that storage reliability can be greatly improved by the OSA scheme.
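
    The allocation objective can be made concrete under one added assumption that the abstract does not state: the data is coded so that any k of the n components suffice for recovery (MDS-style). The brute-force sketch below evaluates every split of n components over N centers under independent center failures; the paper's OSA scheme finds the optimum in polynomial time, which this enumeration does not attempt.

```python
# Illustrative only: pick the split of n coded components across N data
# centers (each failing independently with probability p) that maximizes
# the probability of still holding at least k components.
from itertools import combinations, product

def recovery_probability(allocation, p, k):
    """allocation[i] = number of components stored in data center i."""
    prob = 0.0
    for alive in product([0, 1], repeat=len(allocation)):    # enumerate failure patterns
        surviving = sum(a * x for a, x in zip(allocation, alive))
        if surviving >= k:
            prob += p ** alive.count(0) * (1 - p) ** alive.count(1)
    return prob

def best_allocation(n, N, p, k):
    best = None
    # stars-and-bars enumeration of all ways to split n components over N centers
    for cuts in combinations(range(n + N - 1), N - 1):
        alloc, prev = [], -1
        for c in list(cuts) + [n + N - 1]:
            alloc.append(c - prev - 1)
            prev = c
        r = recovery_probability(alloc, p, k)
        if best is None or r > best[1]:
            best = (alloc, r)
    return best

# most reliable split of 4 components over 3 centers and its recovery probability
print(best_allocation(n=4, N=3, p=0.1, k=2))
```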

    Wireless Network Design for Control Systems: A Survey

    Wireless networked control systems (WNCS) are composed of spatially distributed sensors, actuators, and controllers communicating through wireless networks instead of conventional point-to-point wired connections. Due to their main benefits of reduced deployment and maintenance costs, greater flexibility, and possible safety enhancements, WNCS are becoming a fundamental infrastructure technology for critical control systems in automotive electrical systems, avionics control systems, building management systems, and industrial automation systems. The main challenge in WNCS is to jointly design the communication and control systems, considering their tight interaction, to improve the control performance and the network lifetime. In this survey, we provide an exhaustive review of the literature on wireless network design and optimization for WNCS. First, we discuss what we call the critical interactive variables, including sampling period, message delay, message dropout, and network energy consumption. The mutual effects of these communication and control variables motivate their joint tuning. We discuss the effect of controllable wireless network parameters at all layers of the communication protocols on the probability distribution of these interactive variables. We also review the current wireless network standardization for WNCS and the corresponding methodology for adapting the network parameters. Moreover, we discuss the analysis and design of control systems that take into account the effect of the interactive variables on control system performance. Finally, we present state-of-the-art wireless network design and optimization for WNCS, while highlighting the tradeoff between the achievable performance and the complexity of various approaches. We conclude the survey by highlighting major research issues and identifying future research directions.
    Comment: 37 pages, 17 figures, 4 tables

    Large-Scale Time-Shifted Streaming Delivery

    An attractive new feature of connected TV systems is to let users access past portions of a TV channel. This feature, called time-shifted streaming, is now used by millions of TV viewers. In this paper, we address the design of a large-scale delivery system for time-shifted streaming. We highlight the characteristics of time-shifted streaming that prevent known video delivery systems from being used. Then, we present two proposals that meet the demand of two radically different types of TV operator. First, the Peer-Assisted Catch-Up Streaming system, PACUS, aims at reducing the load on the servers of large TV broadcasters without losing control of the TV delivery. Second, the turntable structure is an overlay of nodes that allows an independent content delivery network or a small independent TV broadcaster to ensure that all past TV programs are stored and remain as available as possible. We show through extensive simulations that our objectives are reached, with a reduction of up to three quarters of the traffic for PACUS and a 100% guaranteed availability for the turntable structure. We also compare our proposals to the main previous works in the area.