7 research outputs found

    Memory aware query scheduling in a database cluster

    Get PDF
    Query throughput is one of the primary optimization goals in interactive web-based information systems in order to achieve the performance necessary to serve large user communities. Queries in this application domain differ significantly from those in traditional database applications: they are of lower complexity and almost exclusively read-only. The architecture we propose here is specifically tailored to take advantage of the query characteristics. It is based on a large parallel shared-nothing database cluster where each node runs a separate server with a fully replicated copy of the database. A query is assigned and entirely executed on one single node avoiding network contention or synchronization effects. However, the actual key to enhanced throughput is a resource efficient scheduling of the arriving queries. We develop a simple and robust scheduling scheme that takes the currently memory resident data at each server into account and trades off memory re-use and execution time, reordering queries as necessary. Our experimental evaluation demonstrates the effectiveness when scaling the system beyond hundreds of nodes showing super-linear speedup

    Optimisation of partitioned temporal joins

    Get PDF

    A shared-disk parallel cluster file system

    Get PDF
    Dissertação apresentada para obtenção do Grau de Doutor em Informática Pela Universidade Nova de Lisboa, Faculdade de Ciências e TecnologiaToday, clusters are the de facto cost effective platform both for high performance computing (HPC) as well as IT environments. HPC and IT are quite different environments and differences include, among others, their choices on file systems and storage: HPC favours parallel file systems geared towards maximum I/O bandwidth, but which are not fully POSIX-compliant and were devised to run on top of (fault prone) partitioned storage; conversely, IT data centres favour both external disk arrays (to provide highly available storage) and POSIX compliant file systems, (either general purpose or shared-disk cluster file systems, CFSs). These specialised file systems do perform very well in their target environments provided that applications do not require some lateral features, e.g., no file locking on parallel file systems, and no high performance writes over cluster-wide shared files on CFSs. In brief, we can say that none of the above approaches solves the problem of providing high levels of reliability and performance to both worlds. Our pCFS proposal makes a contribution to change this situation: the rationale is to take advantage on the best of both – the reliability of cluster file systems and the high performance of parallel file systems. We don’t claim to provide the absolute best of each, but we aim at full POSIX compliance, a rich feature set, and levels of reliability and performance good enough for broad usage – e.g., traditional as well as HPC applications, support of clustered DBMS engines that may run over regular files, and video streaming. pCFS’ main ideas include: · Cooperative caching, a technique that has been used in file systems for distributed disks but, as far as we know, was never used either in SAN based cluster file systems or in parallel file systems. As a result, pCFS may use all infrastructures (LAN and SAN) to move data. · Fine-grain locking, whereby processes running across distinct nodes may define nonoverlapping byte-range regions in a file (instead of the whole file) and access them in parallel, reading and writing over those regions at the infrastructure’s full speed (provided that no major metadata changes are required). A prototype was built on top of GFS (a Red Hat shared disk CFS): GFS’ kernel code was slightly modified, and two kernel modules and a user-level daemon were added. In the prototype, fine grain locking is fully implemented and a cluster-wide coherent cache is maintained through data (page fragments) movement over the LAN. Our benchmarks for non-overlapping writers over a single file shared among processes running on different nodes show that pCFS’ bandwidth is 2 times greater than NFS’ while being comparable to that of the Parallel Virtual File System (PVFS), both requiring about 10 times more CPU. And pCFS’ bandwidth also surpasses GFS’ (600 times for small record sizes, e.g., 4 KB, decreasing down to 2 times for large record sizes, e.g., 4 MB), at about the same CPU usage.Lusitania, Companhia de Seguros S.A, Programa IBM Shared University Research (SUR

    Much Ado About Shared-Nothing

    No full text
    In a `shared-nothing' parallel computer, each processor has its own memory and disks and processors communicate by passing messages through an interconnect. Many academic researchers, and some vendors, assert that shared-nothingness is the `consensus' architecture for parallel DBMSs. This alleged consensus is used as a justification for simulation models, algorithms, research prototypes and even marketing campaigns. We argue that shared-nothingness is no longer the consensus hardware architecture and that hardware resource sharing is a poor basis for categorising parallel DBMS software architectures if one wishes to compare the performance characteristics of parallel DBMS products. 1 Introduction A parallel computer has a `shared-nothing' architecture if each processor has its own memory and its own disk sub-system; pro- Tel. + 44 131 553 3313, Fax + 44 131 555 1178, Email: [email protected] y King's Buildings, JCMB, Edinburgh EH9 3JZ, Scotland, Tel. + 44 131 6505133, Fax ..

    Much ado about shared-nothing

    No full text
    corecore