
    Magpie: Automatically Tuning Static Parameters for Distributed File Systems using Deep Reinforcement Learning

    Distributed file systems are widely used nowadays, yet running them with their default configurations is often not optimal. At the same time, tuning configuration parameters is typically challenging and time-consuming: it demands expertise, and tuning operations can be expensive. This is especially the case for static parameters, whose changes take effect only after a restart of the system or its workloads. We propose a novel approach, Magpie, which utilizes deep reinforcement learning to tune static parameters by strategically exploring and exploiting the configuration parameter space. To boost the tuning of static parameters, our method employs both server and client metrics of the distributed file system to capture the relationship between static parameters and performance. Our empirical evaluation shows that Magpie can noticeably improve the performance of the distributed file system Lustre: after tuning towards a single performance indicator, our approach achieves an average throughput gain of 91.8% over the default configuration and 39.7% more throughput than the baseline. Comment: Accepted at the IEEE International Conference on Cloud Engineering (IC2E) 202
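    The tuning loop described above pairs an agent that proposes configurations with measurements of the resulting file-system performance. The sketch below illustrates that explore/exploit loop in miniature; the parameter names, value ranges, and simulated throughput function are illustrative assumptions, and a simple epsilon-greedy bandit stands in for the deep reinforcement learning agent Magpie actually uses.

```python
import random
from itertools import product

# Hypothetical discrete search space over two static parameters; the names and
# value ranges are illustrative, not taken from the paper.
PARAM_SPACE = {
    "stripe_count": [1, 2, 4, 8],
    "stripe_size_mb": [1, 4, 16, 64],
}
CONFIGS = list(product(*PARAM_SPACE.values()))

def measure_throughput(config):
    """Stand-in for restarting the file system with `config`, replaying the
    workload, and aggregating server- and client-side throughput metrics."""
    stripe_count, stripe_size_mb = config
    return stripe_count * 10 + stripe_size_mb * 0.5 + random.gauss(0, 2)

# Epsilon-greedy explore/exploit loop over the configuration space; Magpie
# itself uses a deep reinforcement learning agent, this only shows the loop.
q_values = {c: 0.0 for c in CONFIGS}
counts = {c: 0 for c in CONFIGS}
epsilon = 0.2

for step in range(200):
    if random.random() < epsilon:
        config = random.choice(CONFIGS)            # explore a random config
    else:
        config = max(q_values, key=q_values.get)   # exploit the best so far
    reward = measure_throughput(config)
    counts[config] += 1
    q_values[config] += (reward - q_values[config]) / counts[config]

best = max(q_values, key=q_values.get)
print("best configuration found:", dict(zip(PARAM_SPACE, best)))
```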

    Towards High-Performance Big Data Processing Systems

    The amount of generated and stored data has been growing rapidly. It is estimated that 2.5 quintillion bytes of data are generated every day, and that 90% of the data in the world today has been created in the last two years. How to address these big data challenges has become a hot topic in both industry and academia. Due to the complexity of big data platforms, we stratify them into four layers: the storage layer, the resource management layer, the computing layer, and the methodology layer. This dissertation proposes new approaches to improve the performance of big data platforms such as Hadoop and Spark at each of these layers.

    We first present an improved HDFS design called SMARTH, which optimizes the storage layer. It utilizes asynchronous multi-pipeline data transfers instead of a single-pipeline stop-and-wait mechanism. SMARTH records the actual transfer speed of data blocks and sends this information to the namenode along with periodic heartbeat messages. The namenode sorts datanodes according to their past performance and tracks this information continuously. When a client initiates an upload request, the namenode sends it a list of "high-performance" datanodes that it expects will yield the highest throughput for that client. By choosing higher-performance datanodes relative to each client and by taking advantage of the multi-pipeline design, our experiments show that SMARTH significantly improves the performance of data write operations compared to HDFS. Specifically, SMARTH improves data transfer throughput by 27-245% in a heterogeneous virtual cluster on Amazon EC2.

    Secondly, we propose an optimized Hadoop extension called MRapid, which significantly speeds up the execution of short jobs at the resource management layer. It is completely backward compatible with Hadoop and imposes negligible overhead. Our experiments on the Microsoft Azure public cloud show that MRapid can improve performance by up to 88% compared to the original Hadoop.

    Thirdly, we introduce an efficient 3-level sampling performance model, called Hedgehog, which focuses on the relationship between resources and performance. This is a brand-new white-box model for Spark, which is more complex and challenging than Hadoop. In our tool, we employ ASM, a Java bytecode manipulation and analysis framework, to reduce the profiling overhead dramatically.

    Fourthly, at the computing layer, we optimize the current implementation of SGD in Spark's MLlib by reusing data partitions multiple times within a single iteration to find better candidate weights more efficiently. Whether to use multiple local iterations within each partition is decided dynamically by the 68-95-99.7 rule. We also design a variant of the momentum algorithm to optimize the step size in every iteration. This method uses a new adaptive rule that decreases the step size whenever neighboring gradients show significantly differing directions. Experiments show that our adaptive algorithm is more efficient and can be 7 times faster than the original MLlib SGD.

    Finally, at the application layer, we present a scalable and distributed geographic information system, called Dart, based on Hadoop and HBase. Dart provides a hybrid table schema to store spatial data in HBase so that the Reduce stage can be omitted for operations such as calculating the mean center and the median center. It employs careful pre-splitting and hashing techniques to avoid data imbalance and hot-region problems. It also supports large-scale spatial data analysis such as K-Nearest Neighbors (KNN) and Geometric Median Distribution. In our experiments, we evaluate the performance of Dart by processing 160 GB of Twitter data on an Amazon EC2 cluster. The experimental results show that Dart is scalable and efficient.
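    As a rough illustration of the SMARTH idea summarized above, the following minimal sketch shows a namenode that tracks datanode transfer speeds reported via heartbeats and ranks datanodes for a client write. Class names, the smoothing rule, and the replication factor are illustrative assumptions, not the dissertation's actual implementation, which extends HDFS itself.

```python
from dataclasses import dataclass

# Sketch only: datanodes piggyback measured block-transfer speeds on their
# heartbeats, and the namenode ranks them so a client write is directed to
# the nodes expected to give the highest throughput.

@dataclass
class DatanodeStats:
    node_id: str
    avg_mbps: float = 0.0   # smoothed past transfer speed
    samples: int = 0

class Namenode:
    def __init__(self):
        self.stats = {}

    def heartbeat(self, node_id, reported_mbps):
        """Record the transfer speed a datanode reports with its heartbeat."""
        s = self.stats.setdefault(node_id, DatanodeStats(node_id))
        s.samples += 1
        alpha = 0.3   # exponential moving average so old measurements fade out
        s.avg_mbps = reported_mbps if s.samples == 1 else (
            alpha * reported_mbps + (1 - alpha) * s.avg_mbps)

    def pick_pipeline(self, replication=3):
        """Return the datanodes expected to give the highest write throughput."""
        ranked = sorted(self.stats.values(), key=lambda s: s.avg_mbps, reverse=True)
        return [s.node_id for s in ranked[:replication]]

nn = Namenode()
nn.heartbeat("dn1", 80.0)
nn.heartbeat("dn2", 120.0)
nn.heartbeat("dn3", 40.0)
print(nn.pick_pipeline())   # -> ['dn2', 'dn1', 'dn3']
```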

    CONFPROFITT: A CONFIGURATION-AWARE PERFORMANCE PROFILING, TESTING, AND TUNING FRAMEWORK

    Modern computer software systems are complicated. Developers can change the behavior of a software system through software configurations, but the large number of configuration options and their interactions makes software tuning, testing, and debugging very challenging. Performance is one of the key non-functional qualities, and performance bugs can cause significant performance degradation and lead to poor user experience. However, performance bugs are difficult to expose, primarily because detecting them requires specific inputs as well as specific configurations. While researchers have developed techniques to analyze, quantify, detect, and fix performance bugs, many of these techniques are not effective in highly configurable systems. To improve the non-functional qualities of configurable software systems, testing engineers need to be able to understand the performance influence of configuration options, adjust the performance of a system under different configurations, and detect configuration-related performance bugs. This research provides an automated framework that allows engineers to effectively analyze performance-influencing configuration options, detect performance bugs in highly configurable software systems, and adjust configuration options to achieve higher long-term performance gains.

    To understand real-world performance bugs in highly configurable software systems, we first perform a study of performance bug characteristics in three large-scale open-source projects. Many researchers have studied the characteristics of performance bugs from bug reports, but few have reported on the experience of replicating confirmed performance bugs from the perspective of non-domain experts such as researchers. This study reports the challenges of replicating confirmed performance bugs and potential workarounds, and shares a performance benchmark of real-world performance bugs for evaluating future performance testing techniques.

    Inspired by our performance bug study, we propose a performance profiling approach that helps developers understand how configuration options and their interactions influence the performance of a system. The approach uses a combination of dynamic analysis and machine learning techniques, together with configuration sampling, to profile program execution and identify configuration options relevant to performance. Next, the framework leverages natural language processing and information retrieval techniques to automatically generate test inputs and configurations that expose performance bugs. Finally, the framework combines reinforcement learning and dynamic state-reduction techniques to guide the subject application towards higher long-term performance gains.
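    The profiling step described above combines configuration sampling with machine learning to estimate how options and their interactions influence performance. Below is a minimal sketch of that idea, assuming scikit-learn is available; the option names, the random sampling strategy, and the simulated measurement function are illustrative stand-ins, not the framework's actual components.

```python
import random
from sklearn.ensemble import RandomForestRegressor

# Minimal sketch of learning a performance-influence model from sampled
# configurations. Option names and the measurement stub are assumptions.

OPTIONS = ["cache_enabled", "compression", "encryption", "async_io", "verbose_log"]

def measure_runtime(config):
    """Stand-in for actually running the workload under `config` and timing it."""
    t = 10.0
    t -= 3.0 * config["cache_enabled"]
    t += 2.0 * config["compression"] * config["encryption"]  # interaction effect
    t += 0.5 * config["verbose_log"]
    return t + random.gauss(0, 0.2)

# 1. Sample configurations (plain random sampling here; a real framework could
#    use more sophisticated sampling strategies).
samples = [{o: random.randint(0, 1) for o in OPTIONS} for _ in range(200)]
X = [[c[o] for o in OPTIONS] for c in samples]
y = [measure_runtime(c) for c in samples]

# 2. Fit a model and rank options by how strongly they influence performance.
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
for name, score in sorted(zip(OPTIONS, model.feature_importances_),
                          key=lambda p: p[1], reverse=True):
    print(f"{name:15s} importance={score:.3f}")
```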

    A Monte Carlo tree search algorithm for optimization of load scalability in database systems

    A thesis submitted in partial fulfilment of the requirements for the Degree of Doctor of Philosophy in Information Technology at Strathmore University.

    Variable environmental conditions and runtime phenomena require developers of complex business information systems to expose configuration parameters to system administrators. This allows system administrators to intervene by tuning bottleneck configuration parameters in response to current changes, or in anticipation of future changes, in order to keep the system's performance at an optimum level. However, these manual performance tuning interventions are prone to error and lack standardization, owing to varying levels of expertise and over-reliance on inaccurate predictions of future states of a business information system. The purpose of this research was therefore to investigate how to design an algorithm that proactively reconfigures bottleneck parameters without over-relying on an accurate model of a stochastic environment. This was done using a comparative experimental research design that involved quantitative data collection through simulations of different algorithm variants.

    The research built on the theoretical concepts of control theory and decision theory, coupled with the estimation of unknown quantities using principles of simulation-based inferential statistics. Subsequently, Monte Carlo Tree Search, with a variant of the selection stage, was used as the foundation of the designed algorithm. The selection stage was varied by applying a "lean Last Good Reply with Forgetting" (lean-LGRF) strategy and first tested in the context of the strategy board game Reversi. Over 1,000 playouts, the lean-LGRF selection strategy recorded the highest number of wins against the baseline Upper Confidence Bound applied to Trees (UCT) selection strategy. By contrast, the Progressive Bias selection strategy had a win rate of 45.8% against the UCT selection strategy, and, as expected, the UCT selection strategy had a win rate of 49.7% (almost 50-50) against itself. The results were then subjected to a Chi-square (χ²) test, which provided evidence that the variation applied in the selection stage of the algorithm had a significantly positive impact on its performance.

    The superior selection variant was then applied in the context of a distributed database system. This also produced compelling results: applying the algorithm in a distributed database system yielded a response-time latency 27% lower than the average response-time latency and a transaction throughput 17% higher than the average transaction throughput.
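    Since the thesis uses UCT as its baseline selection strategy, a minimal sketch of UCT-style child selection and backpropagation may help ground the comparison. The node structure, exploration constant, and random playout rewards below are illustrative; the lean-LGRF variant described above would modify exactly this selection stage.

```python
import math
import random

# Minimal sketch of the UCT (Upper Confidence Bound applied to Trees)
# selection rule used as the baseline in the work above.

class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

def uct_select(node, c=math.sqrt(2)):
    """Pick the child maximizing exploitation + exploration."""
    def score(child):
        if child.visits == 0:
            return float("inf")          # always try unvisited children first
        exploit = child.total_reward / child.visits
        explore = c * math.sqrt(math.log(node.visits) / child.visits)
        return exploit + explore
    return max(node.children, key=score)

def backpropagate(node, reward):
    """Propagate a playout result back up the tree."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent

# Tiny usage example with random playout rewards standing in for game results.
root = Node()
root.children = [Node(parent=root) for _ in range(3)]
for _ in range(100):
    child = uct_select(root)
    backpropagate(child, random.random())
print([c.visits for c in root.children])
```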

    Deep Learning Data and Indexes in a Database

    A database is used to store and retrieve data and is a critical component of any software application. Databases require configuration to run efficiently, yet there are tens of configuration parameters, and manually configuring a database is a challenging task. Furthermore, a database must be reconfigured on a regular basis to keep up with newer data and workloads. The goal of this thesis is to use the query workload history to autonomously configure the database and improve its performance. We carry out the proposed work in four stages: (i) we develop an index recommender using deep reinforcement learning for a standalone database, and evaluate the effectiveness of our algorithm by comparing it with several state-of-the-art approaches; (ii) we build a real-time index recommender that can dynamically create and remove indexes for better performance in response to sudden changes in the query workload; (iii) we develop a database advisor framework that learns latent patterns from a workload and is able to enhance a query, recommend interesting queries, and summarize a workload; (iv) we develop LinkSocial, a fast, scalable, and accurate framework to gain deeper insights from heterogeneous data.
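    Stage (i) above frames index recommendation as a reinforcement learning problem. The sketch below illustrates that framing in miniature with a tabular, epsilon-greedy learner whose reward is the reduction in simulated workload cost after creating or dropping an index; the column names, cost model, and learner are illustrative assumptions, not the thesis's deep reinforcement learning recommender.

```python
import random

# Sketch: actions create or drop an index, and the reward is the reduction in
# (simulated) workload cost. A real system would use optimizer cost estimates
# or measured query latencies instead of this stand-in cost model.

COLUMNS = ["user_id", "created_at", "status", "country"]

def workload_cost(indexes):
    """Pretend cost of the query workload given the current set of indexes."""
    cost = 100.0
    if "user_id" in indexes:    cost -= 40
    if "created_at" in indexes: cost -= 25
    if "status" in indexes:     cost -= 5
    cost += 2.0 * len(indexes)            # maintenance overhead per index
    return cost + random.gauss(0, 1)

q = {}            # Q-values keyed by (frozenset(indexes), action)
indexes = set()
epsilon, alpha = 0.2, 0.5

for step in range(500):
    state = frozenset(indexes)
    actions = [("create", c) for c in COLUMNS if c not in indexes] + \
              [("drop", c) for c in indexes]
    if random.random() < epsilon:
        action = random.choice(actions)
    else:
        action = max(actions, key=lambda a: q.get((state, a), 0.0))
    before = workload_cost(indexes)
    if action[0] == "create":
        indexes.add(action[1])
    else:
        indexes.discard(action[1])
    reward = before - workload_cost(indexes)   # cost reduction as reward
    key = (state, action)
    q[key] = q.get(key, 0.0) + alpha * (reward - q.get(key, 0.0))

print("indexes after training:", sorted(indexes))
```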

    Prefetching techniques for client server object-oriented database systems

    The performance of many object-oriented database applications suffers from page fetch latency, which is determined by the expense of disk access. In this work we suggest several prefetching techniques to avoid, or at least reduce, page fetch latency. In practice no prediction technique is perfect, and no prefetching technique can entirely eliminate the delay due to page fetch latency. Therefore we are interested in the trade-off between the level of accuracy required to obtain good results, in terms of elapsed-time reduction, and the processing overhead needed to achieve that level of accuracy. If prefetching accuracy is high, the total elapsed time of an application can be reduced significantly; if prefetching accuracy is low, many incorrect pages are prefetched, and the extra load on the client, network, server, and disks degrades overall system performance. Access patterns of object-oriented databases are often complex and usually hard to predict accurately. The ..
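    The abstract does not name a specific prediction technique, so purely as an illustration of the accuracy/overhead trade-off it discusses, here is a minimal sketch of a first-order Markov predictor that prefetches the most likely next pages. It is an assumed example, not one of the techniques proposed in the work.

```python
from collections import defaultdict, Counter

# Sketch: learn page-to-page transition counts from observed fetches and
# prefetch the most likely successors of the page just accessed.

class MarkovPrefetcher:
    def __init__(self, depth=2):
        self.transitions = defaultdict(Counter)  # page -> Counter of next pages
        self.last_page = None
        self.depth = depth                       # how many pages to prefetch

    def record_access(self, page):
        """Update the model after each page fetch observed on the client."""
        if self.last_page is not None:
            self.transitions[self.last_page][page] += 1
        self.last_page = page

    def predict_next(self):
        """Return the most likely next pages to prefetch (may be empty)."""
        history = self.transitions.get(self.last_page)
        if not history:
            return []
        return [p for p, _ in history.most_common(self.depth)]

prefetcher = MarkovPrefetcher()
trace = [1, 2, 3, 1, 2, 4, 1, 2, 3]
for page in trace:
    prefetcher.record_access(page)
print(prefetcher.predict_next())   # -> [1]: page 1 usually follows page 3
```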

    Cognitive GPR for subsurface sensing based on edge computing and deep reinforcement learning

    Ground penetrating radars (GPRs) have been extensively used in many industrial applications, such as coal mining, structural health monitoring, subsurface utility detection and localization, and autonomous driving. Most existing GPR systems are human-operated, because operating configurations must be chosen by experienced users based on their interpretation of the collected GPR data. To achieve the best subsurface sensing performance, it is desirable to design an autonomous GPR system that can operate adaptively under varying sensing conditions. In this research, first, a generic architecture for cognitive GPRs based on edge computing is studied, and the operation of cognitive GPRs under this architecture is formulated as a sequential decision process. Then a cognitive GPR based on 2D B-Scan image analysis and a deep Q-learning network (DQN) is investigated. A novel entropy-based reward function is designed for the DQN model using the results of subsurface object detection (via region-of-interest identification) and recognition (via classification). Furthermore, to acquire a global view of subsurface objects with complex shape configurations, 2D B-Scan image analysis is extended to 3D GPR data analysis, termed "Scan Cloud." A scan-cloud-enabled cognitive GPR is studied based on an advanced deep reinforcement learning method, deep deterministic policy gradient (DDPG), with a new reward function derived from 3D GPR data. The proposed methods are evaluated using GPR modeling and simulation software called GprMax. Simulation results show that our proposed cognitive GPRs outperform other GPR systems in terms of detection accuracy, operating time, and object reconstruction.
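    The entropy-based reward mentioned above ties the agent's reward to how certain the detection and recognition results become. A minimal sketch of one way such a reward could be formulated is shown below; the exact formulation in the research is derived from its own detection and recognition pipeline, so treat this as an assumed illustration.

```python
import math

# Sketch: lower entropy in the classifier's output over a detected region of
# interest means the radar is more certain about the buried object, so the
# agent is rewarded for actions that reduce that uncertainty.

def entropy(probs):
    """Shannon entropy of a class-probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_reward(probs_before, probs_after):
    """Reward = reduction in classification entropy after the GPR adjusts its
    operating parameters (e.g., antenna position or frequency)."""
    return entropy(probs_before) - entropy(probs_after)

# Before the action the classifier is unsure; afterwards it is confident.
before = [0.4, 0.3, 0.3]      # probabilities over object classes
after = [0.9, 0.05, 0.05]
print(round(entropy_reward(before, after), 3))   # positive -> informative action
```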