
    New approaches to data access in large-scale distributed system

    Mención Internacional en el título de doctor (International Mention in the doctoral degree). A great number of scientific projects need supercomputing resources, such as those carried out in physics, astrophysics, chemistry, pharmacology, and other fields. Most of them also generate a great amount of data; for example, an experiment lasting a few minutes in a particle accelerator generates several terabytes of data. In recent years, high-performance computing environments have evolved towards large-scale distributed systems such as Grids, Clouds, and Volunteer Computing environments. Managing a large volume of data in these environments poses a significant additional problem, since the data have to travel from one site to another over the internet. In this work, a novel generic I/O architecture for large-scale distributed systems used for high-performance and high-throughput computing is proposed. This solution is based on applying parallel I/O techniques to remote data access. Novel replication and data-search schemes are also proposed which, combined with the above techniques, improve the performance of applications executing in these environments. In addition, simulation tools are developed that make it possible to test these and other ideas without resorting to real platforms, given their technical and logistical limitations. An initial prototype of this solution has been evaluated and the results show a noteworthy improvement in data access compared to existing solutions.
    Programa Oficial de Doctorado en Ciencia y Tecnología Informática. Committee: President, David Expósito Singh; Secretary, María de los Santos Pérez Hernández; Member, Juan Manuel Tirado Mart
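    A minimal sketch of the core idea of applying parallel I/O techniques to remote data access, assuming a file replicated on several servers that support HTTP range requests; the replica URLs, chunk size, and helper names below are illustrative assumptions, not part of the thesis prototype.

    # Sketch (assumptions): a large remote object is replicated on several servers;
    # it is split into byte ranges and the ranges are fetched in parallel, with
    # requests spread round-robin over the replicas. REPLICAS and fetch_range are
    # illustrative names only.
    from concurrent.futures import ThreadPoolExecutor
    import urllib.request

    REPLICAS = ["http://site-a.example/data.bin", "http://site-b.example/data.bin"]
    CHUNK = 4 * 1024 * 1024  # 4 MiB per request

    def fetch_range(args):
        index, start, end = args
        url = REPLICAS[index % len(REPLICAS)]          # round-robin over replicas
        req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(req) as resp:
            return index, resp.read()

    def parallel_read(total_size, workers=8):
        # Range headers use inclusive end offsets, hence the "- 1".
        ranges = [(i, off, min(off + CHUNK, total_size) - 1)
                  for i, off in enumerate(range(0, total_size, CHUNK))]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            parts = sorted(pool.map(fetch_range, ranges))
        return b"".join(chunk for _, chunk in parts)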

    Coping at the User-Level with Resource Limitations in the Cray Message Passing Toolkit MPI at Scale: How Not to Spend Your Summer Vacation

    ABSTRACT: As the number of processor cores available in Cray XT series computers has rapidly grown, users have increasingly encountered instances where an MPI code that previously worked for years unexpectedly fails at high core counts ("at scale") due to resource limitations being exceeded within the MPI implementation. Here, we examine several examples drawn from user experiences and discuss strategies for working around these difficulties at the user level.
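    The abstract does not spell out the individual workarounds, so the following is only a hedged illustration, written with mpi4py rather than the C codes the paper discusses, of one generic user-level strategy: throttling how many ranks may send to a root at once so that internal MPI buffers are not exhausted at scale. The tags, threshold, and payload are assumptions, not the paper's recipes.

    # Sketch (assumptions): rather than letting every rank send to rank 0 at the
    # same time, rank 0 hands out "go" tokens so that at most MAX_INFLIGHT senders
    # are active at any moment.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    MAX_INFLIGHT = 64          # illustrative throttle, tune per system
    GO, DATA = 1, 2            # message tags

    if rank == 0:
        results = {}
        pending = list(range(1, size))
        active = []
        while pending or active:
            while pending and len(active) < MAX_INFLIGHT:
                dest = pending.pop()
                comm.send(None, dest=dest, tag=GO)      # allow this rank to send
                active.append(dest)
            status = MPI.Status()
            payload = comm.recv(source=MPI.ANY_SOURCE, tag=DATA, status=status)
            results[status.Get_source()] = payload
            active.remove(status.Get_source())
    else:
        comm.recv(source=0, tag=GO)                     # wait for permission
        comm.send({"rank": rank}, dest=0, tag=DATA)     # then send the real payload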

    October 3, 2008, Ohio University Board of Trustees Meeting Minutes

    Meeting minutes document the activities of Ohio University's Board of Trustees.

    A reliable and resource aware framework for data dissemination in wireless sensor networks

    Distinctive from traditional wireless ad hoc networks, wireless sensor networks (WSN) comprise a large number of low-cost miniaturized nodes, each acting autonomously and equipped with a short-range wireless communication mechanism, limited memory, limited processing power, and a physical sensing capability. Since sensor networks are resource constrained in terms of power, bandwidth, and computational capability, an optimal system design radically changes the performance of the sensor network. Here, a comprehensive information dissemination scheme for wireless sensor networks is presented. Two main research issues are considered: (1) a collaborative flow of information packets from the source to the sink and (2) energy efficiency of the sensor nodes and of the entire system. For the first issue, we designed and evaluated a reactive, on-demand routing paradigm for distributed sensing applications. We name this scheme IDLF (Information Dissemination via Label Forwarding). IDLF incorporates point-to-point data transmission in which the source initiates the routing scheme and disseminates the information toward the sink (destination) node. Prior to transmission of the actual data packets, a data tunnel is formed: the source node issues small label information to its neighbors locally, and these labels are in turn disseminated in the network. By using small labels, IDLF avoids generating unnecessary network traffic and transmitting duplicate packets to nodes. To study the impact of node failures and to improve the reliability of the network, we developed another scheme that extends IDLF. This new scheme, RM-IDLF (Reliable Multipath Information Dissemination by Label Forwarding), employs alternate disjoint paths. This alternate-path scheme (RM-IDLF) may have a higher path cost in terms of energy consumption, but it is more reliable in terms of data packet delivery to the sink than the single-path scheme (IDLF). In RM-IDLF, the protocol establishes multiple (alternate) disjoint paths from source to destination with negligible control overhead, balancing the load of heavy data traffic among the intermediate nodes between source and destination. Another point of interest in this framework is the study of trade-offs between the routing reliability achieved with multiple disjoint paths and the extra energy consumed by the additional paths. The effect of failed nodes on network performance is also evaluated within the sensor system. Performance of the label dissemination scheme is evaluated and compared with classic flooding and SPIN. (Abstract shortened by UMI.)
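    A toy sketch of the general label-forwarding idea as described above, not the actual IDLF protocol: a small label is disseminated hop by hop before any data move, each node remembering the neighbor it first heard the label from, which yields the path (the "data tunnel") that the full data packets then follow. The topology and function names are illustrative assumptions.

    # Toy sketch (assumptions): not the real IDLF, only the general idea of
    # establishing a path with small labels before sending the payload.
    from collections import deque

    NEIGHBORS = {              # illustrative topology, node -> reachable neighbors
        "src": ["a", "b"], "a": ["src", "c"], "b": ["src", "c"],
        "c": ["a", "b", "sink"], "sink": ["c"],
    }

    def disseminate_label(source):
        """Propagate a tiny label; each node records where it heard it first."""
        heard_from = {source: None}
        frontier = deque([source])
        while frontier:
            node = frontier.popleft()
            for nbr in NEIGHBORS[node]:
                if nbr not in heard_from:      # first time this node sees the label
                    heard_from[nbr] = node
                    frontier.append(nbr)
        return heard_from

    def tunnel(source, sink):
        """Reverse the recorded hops to get the path the data packets will take."""
        heard_from = disseminate_label(source)
        path, node = [], sink
        while node is not None:
            path.append(node)
            node = heard_from[node]
        return list(reversed(path))

    print(tunnel("src", "sink"))   # e.g. ['src', 'a', 'c', 'sink']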

    Development of new data partitioning and allocation algorithms for query optimization of distributed data warehouse systems

    Distributed databases, and distributed data warehousing in particular, are becoming an increasingly important technology for information integration and data analysis. Data Warehouse (DW) systems are used by decision makers for performance measurement and decision support. However, although data warehousing and on-line analytical processing (OLAP) are essential elements of decision support, OLAP query response time is strongly affected by the volume of data that needs to be accessed from storage disks. Data partitioning is one of the physical design techniques that may be used to optimize query processing cost in DWs. It is a non-redundant optimization technique because it does not replicate data, contrary to redundant techniques like materialized views and indexes. The warehouse partitioning problem is concerned with determining the set of dimension tables to be partitioned and using them to generate the fact table fragments. In this work, an enhanced grouping algorithm that avoids the limitations of some existing vertical partitioning algorithms is proposed. Furthermore, a static partitioning algorithm that allows fragmentation at early stages of schema design is presented. The thesis also investigates the performance of the data warehouse after implementing a combination of Genetic Algorithm (GA) and Simulated Annealing (SA) techniques to horizontally partition the data warehouse star schema, and then presents the experimentation and implementation results of the proposed algorithm. This research also presents different approaches to optimizing fragment allocation cost, using a greedy mathematical model and a combination of simulated annealing and a genetic algorithm to determine the site-by-site allocation, leading to optimal solutions for fragment distribution. Throughout this thesis, the terms fragmentation and partitioning are used interchangeably.
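    As an illustration of the simulated-annealing half of the proposed GA/SA combination, the following toy assigns fragments to sites under a deliberately simplified cost matrix; the cost model, data, and parameters are assumptions made for illustration, not the thesis's actual model.

    # Sketch (assumptions): ACCESS_COST[fragment][site] is a made-up cost of
    # placing a fragment on a site; simulated annealing searches for a low-cost
    # site-by-site allocation.
    import math, random

    ACCESS_COST = [
        [1.0, 4.0, 3.0],
        [5.0, 1.5, 2.0],
        [2.5, 2.5, 0.5],
        [4.0, 1.0, 3.5],
    ]

    def total_cost(assign):
        return sum(ACCESS_COST[f][s] for f, s in enumerate(assign))

    def anneal(n_sites=3, temp=10.0, cooling=0.995, steps=5000):
        assign = [random.randrange(n_sites) for _ in ACCESS_COST]
        cost = total_cost(assign)
        best, best_cost = list(assign), cost
        for _ in range(steps):
            f = random.randrange(len(assign))           # perturb: move one fragment
            old_site = assign[f]
            assign[f] = random.randrange(n_sites)
            new_cost = total_cost(assign)
            # accept improvements always, worse moves with a temperature-dependent chance
            if new_cost < cost or random.random() < math.exp((cost - new_cost) / temp):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = list(assign), cost
            else:
                assign[f] = old_site                    # reject the move
            temp *= cooling
        return best, best_cost

    print(anneal())   # e.g. ([0, 1, 2, 1], 4.0)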

    Light-Weight Hierarchical Clustering Middleware for Public-Resource Computing

    The goal of this work was to investigate ways to implement and improve a public-resource computing middleware: specifically, to make hosting a public-resource computing project logistically simpler and to examine the effect of hierarchical clustering on bandwidth utilization at the central server. To this end, we present the architecture of our cross-platform, multithreaded public-resource computing middleware. Implementing and debugging the middleware proved far more challenging than initially anticipated. As hard as debugging multithreaded programs is, our experience has shown that multithreading can be leveraged to simplify system components. Our main contribution is the final system architecture.
    Computer Science Department

    An Enhanced Hardware Description Language Implementation for Improved Design-Space Exploration in High-Energy Physics Hardware Design

    Detectors in High-Energy Physics (HEP) have increased tremendously in accuracy, speed, and integration. Consequently, HEP experiments are confronted with an immense amount of data to be read out, processed, and stored. Originally, low-level processing was accomplished in hardware, while more elaborate algorithms were executed on large computing farms. Field-Programmable Gate Arrays (FPGAs) meet HEP's need for ever higher real-time processing performance by providing programmable yet fast digital logic resources. With the fast move of HEP digital signal processing (DSP) applications into the domain of FPGAs, related design tools are crucial to realise the potential performance gains. This work reviews Hardware Description Languages (HDLs) with respect to the special needs of the HEP digital hardware design process. It is especially concerned with the question of how features outside the scope of mainstream digital hardware design can be implemented efficiently in HDLs. It argues that functional languages are especially suitable for the implementation of domain-specific languages, including HDLs. Case studies examining the implementation complexity of HEP-specific language extensions to the functional HDCaml HDL demonstrate the viability of the suggested approach.
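    HDCaml itself is an OCaml library, so the following is only a language-neutral toy, sketched in Python, of the underlying embedded-DSL idea: operators on "signal" objects build a netlist instead of computing values, so domain-specific constructs such as a half adder can be added as ordinary host-language functions. The class and function names are illustrative assumptions, not HDCaml's API.

    # Toy sketch (assumptions): a minimal hardware-description DSL embedded in a
    # host language; expressions record structure rather than compute results.
    class Signal:
        def __init__(self, op, *args):
            self.op, self.args = op, args
        def __and__(self, other):  return Signal("and", self, other)
        def __xor__(self, other):  return Signal("xor", self, other)
        def __repr__(self):
            if self.op == "input":
                return self.args[0]
            return f"({self.op} {' '.join(map(repr, self.args))})"

    def input_sig(name):
        return Signal("input", name)

    def half_adder(a, b):
        """A domain-specific construct defined as an ordinary function."""
        return a ^ b, a & b          # (sum, carry) as netlist expressions

    s, c = half_adder(input_sig("a"), input_sig("b"))
    print(s, c)    # (xor a b) (and a b)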