
    New approaches to data access in large-scale distributed system

    Mención Internacional en el título de doctor (International Mention in the doctoral degree). A great number of scientific projects need supercomputing resources, such as those carried out in physics, astrophysics, chemistry, pharmacology, and other fields. Most of them also generate a great amount of data; for example, an experiment lasting a few minutes in a particle accelerator generates several terabytes of data. In recent years, high-performance computing environments have evolved towards large-scale distributed systems such as Grids, Clouds, and Volunteer Computing environments. Managing a large volume of data in these environments poses a significant additional problem, since the data have to travel from one site to another over the internet. In this work, a novel generic I/O architecture for large-scale distributed systems used for high-performance and high-throughput computing is proposed. This solution is based on applying parallel I/O techniques to remote data access. Novel replication and data-search schemes are also proposed which, combined with the above techniques, improve the performance of applications executing in these environments. In addition, simulation tools are developed that make it possible to test these and other ideas without resorting to real platforms, given their technical and logistical limitations. An initial prototype of this solution has been evaluated and the results show a noteworthy improvement in data access compared to existing solutions.
    Programa Oficial de Doctorado en Ciencia y Tecnología Informática. Committee: President, David Expósito Singh; Secretary, María de los Santos Pérez Hernández; Member, Juan Manuel Tirado Mart
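    A minimal sketch of the core idea of applying parallel I/O techniques to remote data access, assuming a file replicated on several servers that support HTTP range requests; the replica URLs, chunk size, and helper names below are illustrative assumptions, not part of the thesis prototype.

    # Sketch (assumptions): a large remote object is replicated on several servers;
    # it is split into byte ranges and the ranges are fetched in parallel, with
    # requests spread round-robin over the replicas. REPLICAS and fetch_range are
    # illustrative names only.
    from concurrent.futures import ThreadPoolExecutor
    import urllib.request

    REPLICAS = ["http://site-a.example/data.bin", "http://site-b.example/data.bin"]
    CHUNK = 4 * 1024 * 1024  # 4 MiB per request

    def fetch_range(args):
        index, start, end = args
        url = REPLICAS[index % len(REPLICAS)]          # round-robin over replicas
        req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(req) as resp:
            return index, resp.read()

    def parallel_read(total_size, workers=8):
        # Range headers use inclusive end offsets, hence the "- 1".
        ranges = [(i, off, min(off + CHUNK, total_size) - 1)
                  for i, off in enumerate(range(0, total_size, CHUNK))]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            parts = sorted(pool.map(fetch_range, ranges))
        return b"".join(chunk for _, chunk in parts)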

    Coping at the User-Level with Resource Limitations in the Cray Message Passing Toolkit MPI at Scale: How Not to Spend Your Summer Vacation

    ABSTRACT: As the number of processor cores available in Cray XT series computers has rapidly grown, users have increasingly encountered instances where an MPI code that previously worked for years unexpectedly fails at high core counts ("at scale") due to resource limitations being exceeded within the MPI implementation. Here, we examine several examples drawn from user experiences and discuss strategies for working around these difficulties at the user level.
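    The abstract does not spell out the individual workarounds, so the following is only a hedged illustration, written with mpi4py rather than the C codes the paper discusses, of one generic user-level strategy: throttling how many ranks may send to a root at once so that internal MPI buffers are not exhausted at scale. The tags, threshold, and payload are assumptions, not the paper's recipes.

    # Sketch (assumptions): rather than letting every rank send to rank 0 at the
    # same time, rank 0 hands out "go" tokens so that at most MAX_INFLIGHT senders
    # are active at any moment.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    MAX_INFLIGHT = 64          # illustrative throttle, tune per system
    GO, DATA = 1, 2            # message tags

    if rank == 0:
        results = {}
        pending = list(range(1, size))
        active = []
        while pending or active:
            while pending and len(active) < MAX_INFLIGHT:
                dest = pending.pop()
                comm.send(None, dest=dest, tag=GO)      # allow this rank to send
                active.append(dest)
            status = MPI.Status()
            payload = comm.recv(source=MPI.ANY_SOURCE, tag=DATA, status=status)
            results[status.Get_source()] = payload
            active.remove(status.Get_source())
    else:
        comm.recv(source=0, tag=GO)                     # wait for permission
        comm.send({"rank": rank}, dest=0, tag=DATA)     # then send the real payload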

    October 3, 2008, Ohio University Board of Trustees Meeting Minutes

    Meeting minutes document the activities of Ohio University's Board of Trustees.

    A reliable and resource aware framework for data dissemination in wireless sensor networks

    Distinctive from traditional wireless ad hoc networks, wireless sensor networks (WSN) comprise a large number of low-cost miniaturized nodes, each acting autonomously and equipped with a short-range wireless communication mechanism, limited memory, limited processing power, and a physical sensing capability. Since sensor networks are resource constrained in terms of power, bandwidth, and computational capability, an optimal system design radically changes the performance of the sensor network. Here, a comprehensive information dissemination scheme for wireless sensor networks is presented. Two main research issues are considered: (1) a collaborative flow of information packets from the source to the sink and (2) energy efficiency of the sensor nodes and of the entire system. For the first issue, we designed and evaluated a reactive, on-demand routing paradigm for distributed sensing applications. We name this scheme IDLF (Information Dissemination via Label Forwarding). IDLF incorporates point-to-point data transmission in which the source initiates the routing scheme and disseminates the information toward the sink (destination) node. Prior to transmission of the actual data packets, a data tunnel is formed: the source node issues small label information to its neighbors locally, and these labels are in turn disseminated in the network. By using small labels, IDLF avoids generating unnecessary network traffic and transmitting duplicate packets to nodes. To study the impact of node failures and to improve the reliability of the network, we developed another scheme that extends IDLF. This new scheme, RM-IDLF (Reliable Multipath Information Dissemination by Label Forwarding), employs alternate disjoint paths. This alternate-path scheme (RM-IDLF) may have a higher path cost in terms of energy consumption, but it is more reliable in terms of data packet delivery to the sink than the single-path scheme (IDLF). In RM-IDLF, the protocol establishes multiple (alternate) disjoint paths from source to destination with negligible control overhead, balancing the load of heavy data traffic among the intermediate nodes between source and destination. Another point of interest in this framework is the study of trade-offs between the routing reliability achieved with multiple disjoint paths and the extra energy consumed by the additional paths. The effect of failed nodes on network performance is also evaluated within the sensor system. Performance of the label dissemination scheme is evaluated and compared with classic flooding and SPIN. (Abstract shortened by UMI.)
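    A toy sketch of the general label-forwarding idea as described above, not the actual IDLF protocol: a small label is disseminated hop by hop before any data move, each node remembering the neighbor it first heard the label from, which yields the path (the "data tunnel") that the full data packets then follow. The topology and function names are illustrative assumptions.

    # Toy sketch (assumptions): not the real IDLF, only the general idea of
    # establishing a path with small labels before sending the payload.
    from collections import deque

    NEIGHBORS = {              # illustrative topology, node -> reachable neighbors
        "src": ["a", "b"], "a": ["src", "c"], "b": ["src", "c"],
        "c": ["a", "b", "sink"], "sink": ["c"],
    }

    def disseminate_label(source):
        """Propagate a tiny label; each node records where it heard it first."""
        heard_from = {source: None}
        frontier = deque([source])
        while frontier:
            node = frontier.popleft()
            for nbr in NEIGHBORS[node]:
                if nbr not in heard_from:      # first time this node sees the label
                    heard_from[nbr] = node
                    frontier.append(nbr)
        return heard_from

    def tunnel(source, sink):
        """Reverse the recorded hops to get the path the data packets will take."""
        heard_from = disseminate_label(source)
        path, node = [], sink
        while node is not None:
            path.append(node)
            node = heard_from[node]
        return list(reversed(path))

    print(tunnel("src", "sink"))   # e.g. ['src', 'a', 'c', 'sink']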

    Development of new data partitioning and allocation algorithms for query optimization of distributed data warehouse systems

    Distributed databases, and distributed data warehousing in particular, are becoming an increasingly important technology for information integration and data analysis. Data Warehouse (DW) systems are used by decision makers for performance measurement and decision support. However, although data warehousing and on-line analytical processing (OLAP) are essential elements of decision support, OLAP query response time is strongly affected by the volume of data that needs to be accessed from storage disks. Data partitioning is one of the physical design techniques that may be used to optimize query processing cost in DWs. It is a non-redundant optimization technique because it does not replicate data, contrary to redundant techniques like materialized views and indexes. The warehouse partitioning problem is concerned with determining the set of dimension tables to be partitioned and using them to generate the fact table fragments. In this work, an enhanced grouping algorithm that avoids the limitations of some existing vertical partitioning algorithms is proposed. Furthermore, a static partitioning algorithm that allows fragmentation at early stages of schema design is presented. The thesis also investigates the performance of the data warehouse after implementing a combination of Genetic Algorithm (GA) and Simulated Annealing (SA) techniques to horizontally partition the data warehouse star schema, and then presents the experimentation and implementation results of the proposed algorithm. This research also presents different approaches to optimizing fragment allocation cost, using a greedy mathematical model and a combination of simulated annealing and a genetic algorithm to determine the site-by-site allocation, leading to optimal solutions for fragment distribution. Throughout this thesis, the terms fragmentation and partitioning are used interchangeably.
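    As an illustration of the simulated-annealing half of the proposed GA/SA combination, the following toy assigns fragments to sites under a deliberately simplified cost matrix; the cost model, data, and parameters are assumptions made for illustration, not the thesis's actual model.

    # Sketch (assumptions): ACCESS_COST[fragment][site] is a made-up cost of
    # placing a fragment on a site; simulated annealing searches for a low-cost
    # site-by-site allocation.
    import math, random

    ACCESS_COST = [
        [1.0, 4.0, 3.0],
        [5.0, 1.5, 2.0],
        [2.5, 2.5, 0.5],
        [4.0, 1.0, 3.5],
    ]

    def total_cost(assign):
        return sum(ACCESS_COST[f][s] for f, s in enumerate(assign))

    def anneal(n_sites=3, temp=10.0, cooling=0.995, steps=5000):
        assign = [random.randrange(n_sites) for _ in ACCESS_COST]
        cost = total_cost(assign)
        best, best_cost = list(assign), cost
        for _ in range(steps):
            f = random.randrange(len(assign))           # perturb: move one fragment
            old_site = assign[f]
            assign[f] = random.randrange(n_sites)
            new_cost = total_cost(assign)
            # accept improvements always, worse moves with a temperature-dependent chance
            if new_cost < cost or random.random() < math.exp((cost - new_cost) / temp):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = list(assign), cost
            else:
                assign[f] = old_site                    # reject the move
            temp *= cooling
        return best, best_cost

    print(anneal())   # e.g. ([0, 1, 2, 1], 4.0)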

    Light-Weight Hierarchical Clustering Middleware for Public-Resource Computing

    The goal of this work was to investigate ways to implement and improve a public-resource computing middleware: specifically, to make hosting a public-resource computing project logistically simpler and to examine the effect of hierarchical clustering on bandwidth utilization at the central server. To this end, we present the architecture of our cross-platform, multithreaded public-resource computing middleware. Implementing and debugging the middleware proved far more challenging than initially anticipated. As hard as debugging multithreaded programs is, our experience has shown that multithreading can be leveraged to simplify system components. Our main contribution is the final system architecture.
    Computer Science Department

    An Enhanced Hardware Description Language Implementation for Improved Design-Space Exploration in High-Energy Physics Hardware Design

    Detectors in High-Energy Physics (HEP) have increased tremendously in accuracy, speed, and integration. Consequently, HEP experiments are confronted with an immense amount of data to be read out, processed, and stored. Originally, low-level processing was accomplished in hardware, while more elaborate algorithms were executed on large computing farms. Field-Programmable Gate Arrays (FPGAs) meet HEP's need for ever higher real-time processing performance by providing programmable yet fast digital logic resources. With the fast move of HEP digital signal processing (DSP) applications into the domain of FPGAs, related design tools are crucial to realise the potential performance gains. This work reviews Hardware Description Languages (HDLs) with respect to the special needs of the HEP digital hardware design process. It is especially concerned with the question of how features outside the scope of mainstream digital hardware design can be implemented efficiently in HDLs. It argues that functional languages are especially suitable for the implementation of domain-specific languages, including HDLs. Case studies examining the implementation complexity of HEP-specific language extensions to the functional HDCaml HDL demonstrate the viability of the suggested approach.
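    HDCaml itself is an OCaml library, so the following is only a language-neutral toy, sketched in Python, of the underlying embedded-DSL idea: operators on "signal" objects build a netlist instead of computing values, so domain-specific constructs such as a half adder can be added as ordinary host-language functions. The class and function names are illustrative assumptions, not HDCaml's API.

    # Toy sketch (assumptions): a minimal hardware-description DSL embedded in a
    # host language; expressions record structure rather than compute results.
    class Signal:
        def __init__(self, op, *args):
            self.op, self.args = op, args
        def __and__(self, other):  return Signal("and", self, other)
        def __xor__(self, other):  return Signal("xor", self, other)
        def __repr__(self):
            if self.op == "input":
                return self.args[0]
            return f"({self.op} {' '.join(map(repr, self.args))})"

    def input_sig(name):
        return Signal("input", name)

    def half_adder(a, b):
        """A domain-specific construct defined as an ordinary function."""
        return a ^ b, a & b          # (sum, carry) as netlist expressions

    s, c = half_adder(input_sig("a"), input_sig("b"))
    print(s, c)    # (xor a b) (and a b)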