6,946 research outputs found

    A Survey on Array Storage, Query Languages, and Systems

    Since scientific investigation is one of the most important providers of massive amounts of ordered data, there is renewed interest in array data processing in the context of Big Data. To the best of our knowledge, a unified resource that summarizes and analyzes array processing research over its long existence is currently missing. In this survey, we provide a guide for past, present, and future research in array processing. The survey is organized along three main topics. Array storage discusses all aspects related to array partitioning into chunks. The identification of a reduced set of array operators to form the foundation for an array query language is analyzed across multiple such proposals. Lastly, we survey real systems for array processing. The result is a thorough survey on array data storage and processing that should be consulted by anyone interested in this research topic, independent of experience level. The survey is not complete, though. We greatly appreciate pointers towards any work we might have forgotten to mention. Comment: 44 pages
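
The chunking idea at the heart of the array-storage topic can be made concrete with a small, self-contained sketch. The example below is not taken from the survey; it simply partitions a dense 2-D NumPy array into regular (fixed-size) chunks and maps a cell coordinate to its chunk index. All names and the row-major chunk layout are illustrative assumptions.

```python
# Minimal sketch of regular (fixed-size) chunking for a dense 2-D array.
# Function names and the chunk layout are illustrative, not the survey's notation.
import numpy as np


def chunk_array(array: np.ndarray, chunk_shape: tuple) -> dict:
    """Split a 2-D array into regular chunks keyed by their chunk index."""
    rows, cols = array.shape
    cr, cc = chunk_shape
    return {
        (i // cr, j // cc): array[i:i + cr, j:j + cc]
        for i in range(0, rows, cr)
        for j in range(0, cols, cc)
    }


def cell_to_chunk(cell: tuple, chunk_shape: tuple) -> tuple:
    """Map a cell coordinate to the index of the chunk that stores it."""
    return (cell[0] // chunk_shape[0], cell[1] // chunk_shape[1])


if __name__ == "__main__":
    a = np.arange(36).reshape(6, 6)
    chunks = chunk_array(a, (3, 3))          # four 3x3 chunks
    print(sorted(chunks))                    # [(0, 0), (0, 1), (1, 0), (1, 1)]
    print(cell_to_chunk((4, 1), (3, 3)))     # (1, 0)
```

Real array stores add further concerns on top of this basic scheme, such as chunk size selection and placement; that design space is what the storage part of the survey maps out.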

    Scaling In-Memory databases on multicores

    Current computer systems have evolved from featuring only a single processing unit and limited RAM, on the order of kilobytes or a few megabytes, to featuring several multicore processors, offering on the order of several tens of concurrent execution contexts, and main memory on the order of several tens to hundreds of gigabytes. This makes it possible to keep all the data of many applications in main memory, leading to the development of in-memory databases. Compared to disk-backed databases, in-memory databases (IMDBs) are expected to provide better performance by incurring less I/O overhead. In this dissertation, we present a scalability study of two general purpose IMDBs on multicore systems. The results show that current general purpose IMDBs do not scale on multicores, due to contention among threads running concurrent transactions. In this work, we explore different directions to overcome the scalability issues of IMDBs on multicores, while enforcing strong isolation semantics. First, we present a solution that requires no modification to either the database systems or the applications, called MacroDB. MacroDB replicates the database among several engines, using a master-slave replication scheme, where update transactions execute on the master, while read-only transactions execute on the slaves. This reduces contention, allowing MacroDB to offer scalable performance under read-only workloads, while update-intensive workloads suffer a performance loss compared to the standalone engine. Second, we delve into the database engine and identify the concurrency control mechanism used by the storage sub-component as a scalability bottleneck. We then propose a new locking scheme that allows the removal of such mechanisms from the storage sub-component. This modification offers performance improvements under all workloads compared to the standalone engine, while scalability remains limited to read-only workloads. Next, we address the scalability limitations for update-intensive workloads and propose reducing the locking granularity from the table level to the attribute level. This further improves performance for intensive and moderate update workloads, at a slight cost for read-only workloads, while scalability remains limited to read-intensive and read-only workloads. Finally, we investigate the impact applications have on the performance of database systems, by studying how the order of operations inside transactions influences database performance. We then propose a Read before Write (RbW) interaction pattern, under which transactions perform all read operations before executing write operations. The RbW pattern allows TPC-C to achieve scalable performance on our modified engine for all workloads. Additionally, the RbW pattern allows our modified engine to achieve scalable performance on multicores, almost up to the total number of cores, while enforcing strong isolation.
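
To make the MacroDB replication scheme easier to picture, here is a rough, hypothetical sketch of its core routing rule: update transactions go to a single master engine, while read-only transactions are spread over replica engines. The class and method names, the round-robin replica choice, and the synchronous propagation of updates are simplifying assumptions for illustration, not the dissertation's implementation.

```python
# Hypothetical sketch of master-slave transaction routing in the spirit of MacroDB.
# Engines are abstract objects; update propagation is shown synchronously for simplicity.
import itertools
from typing import Any, Callable


class ReplicatedDB:
    def __init__(self, master: Any, replicas: list):
        self.master = master                    # the only engine that executes updates
        self.replicas = replicas                # engines that serve read-only work
        self._next = itertools.cycle(replicas)  # round-robin replica selection

    def execute(self, txn: Callable[[Any], Any], read_only: bool) -> Any:
        """Run a transaction on one replica (reads) or on the master (updates)."""
        if read_only:
            # Read-only transactions never contend with updaters on the master.
            return txn(next(self._next))
        # Update transactions serialize on the master ...
        result = txn(self.master)
        # ... and are then replayed on the slaves to keep them in sync.
        for replica in self.replicas:
            txn(replica)
        return result
```

Even this toy version hints at the behavior reported above: read-only work spreads across replicas and scales, while every update must still be applied by the master and by each replica.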

    Cross-Layer Cloud Performance Monitoring, Analysis and Recovery

    The basic idea of Cloud computing is to offer software and hardware resources as services. These services are provided at different layers: Software (Software as a Service: SaaS), Platform (Platform as a Service: PaaS) and Infrastructure (Infrastructure as a Service: IaaS). In such a complex environment, performance issues are quite likely and rather the norm than the exception. Consequently, performance-related problems may frequently occur at all layers. Thus, it is necessary to monitor all Cloud layers and analyze their performance parameters to detect and rectify related problems. This thesis presents a novel cross-layer reactive performance monitoring approach for Cloud computing environments, based on the methodology of Complex Event Processing (CEP). The proposed approach is called CEP4Cloud. It analyzes monitored events to detect performance-related problems and performs actions to fix them. The proposal is based on the use of (1) a novel multi-layer monitoring approach, (2) a new cross-layer analysis approach and (3) a novel recovery approach. The proposed monitoring approach operates at all Cloud layers, while collecting related parameters. It makes use of existing monitoring tools and a new monitoring approach for Cloud services at the SaaS layer. The proposed SaaS monitoring approach is called AOP4CSM. It is based on aspect-oriented programming and monitors quality-of-service parameters of the SaaS layer in a non-invasive manner. AOP4CSM neither modifies the server implementation nor the client implementation. The defined cross-layer analysis approach is called D-CEP4CMA. It is based on the methodology of Complex Event Processing (CEP). Instead of having to manually specify continuous queries on monitored event streams, CEP queries are derived from analyzing the correlations between monitored metrics across multiple Cloud layers. The results of the correlation analysis allow us to reduce the number of monitored parameters and enable us to perform a root cause analysis to identify the causes of performance-related problems. The derived analysis rules are implemented as queries in a CEP engine. D-CEP4CMA is designed to dynamically switch between different centralized and distributed CEP architectures depending on the load/memory of the CEP machine and network traffic conditions in the observed Cloud environment. The proposed recovery approach is based on a novel action manager framework. It applies recovery actions at all Cloud layers. The novel action manager framework assigns a set of repair actions to each performance-related problem and checks the success of the applied action. The results of several experiments illustrate the merits of the reactive performance monitoring approach and its main components (i.e., monitoring, analysis and recovery). First, experimental results show the efficiency of AOP4CSM (very low overhead). Second, obtained results demonstrate the benefits of the analysis approach in terms of precision and recall compared to threshold-based methods. They also show the accuracy of the analysis approach in identifying the causes of performance-related problems. Furthermore, experiments illustrate the efficiency of D-CEP4CMA and its performance in terms of precision and recall compared to centralized and distributed CEP architectures. Moreover, experimental results indicate that the time needed to fix a performance-related problem is reasonably short. They also show that the CPU overhead of using CEP4Cloud is negligible. 
Finally, experimental results demonstrate the merits of CEP4Cloud in terms of speeding up repair and reducing the number of triggered alarms compared to baseline methods.
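
As a loose illustration of two ideas named in the abstract (aspect-style, non-invasive monitoring in the spirit of AOP4CSM, and rule-based analysis of the resulting event stream in the spirit of D-CEP4CMA), the sketch below wraps a service call in a decorator that records latency events and then applies a trivial threshold rule to them. All names, the in-memory event list, and the threshold are assumptions made for illustration; the thesis itself uses a CEP engine and derives its rules from cross-layer metric correlations.

```python
# Illustrative sketch: a decorator plays the role of a monitoring aspect (no change
# to the service body), and a simple rule stands in for a CEP query over the events.
import functools
import time

events = []  # in-memory stand-in for the monitored event stream fed to a CEP engine


def monitor_latency(service_name: str):
    """Record the latency of each call without touching the service implementation."""
    def wrap(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                events.append({"service": service_name,
                               "latency_s": time.perf_counter() - start})
        return wrapper
    return wrap


def high_latency_alarms(stream, threshold_s: float = 0.5):
    """Toy analysis rule: flag events whose latency exceeds a fixed threshold."""
    return [e for e in stream if e["latency_s"] > threshold_s]


@monitor_latency("checkout")
def checkout_service():
    time.sleep(0.01)  # stand-in for real SaaS-layer work


checkout_service()
print(high_latency_alarms(events))  # [] here; a slow call would produce an alarm
```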

    Numerical aerodynamic simulation facility feasibility study, executive summary

    Three major issues were examined in the feasibility study. First, the ability of the proposed system architecture to support the anticipated workload was evaluated. Second, the throughput of the computational engine (the flow model processor) was studied using real application programs. Third, the availability, reliability, and maintainability of the system were modeled. The evaluations were based on the baseline systems. The results show that the implementation of the Numerical Aerodynamic Simulation Facility, in the form considered, would indeed be a feasible project with an acceptable level of risk. The technology required (both hardware and software) either already exists or, in the case of a few parts, is expected to be announced this year.