341 research outputs found

Reading list of selected PASM-related publications

Author: Siegel Howard Jay
Young Dalton
Publication venue: 'Springer Publishing Company'
Publication date: 01/01/2010

Prepared for a chapter to be published in the forthcoming Encyclopedia of Parallel Computing by Springer Publishing Company. The Encyclopedia will contain a broad coverage of the field and will include entries on machine organization, programming, algorithms, and applications. The broad coverage, together with extensive pointers to the literature for in-depth study, is expected to make the Encyclopedia a useful reference tool in parallel computing

Mountain Scholar (Digital Collections of Colorado and Wyoming)

Topology Agnostic Methods for Routing, Reconfiguration and Virtualization of Interconnection Networks

Author: Solheim Åshild Grønstad
Publication date: 01/01/2012

Modern computing systems, such as supercomputers, data centers and multicore chips, generally require efficient communication between their different system units; tolerance towards component faults; flexibility to expand or merge; and a high utilization of their resources. Interconnection networks are used in a variety of such computing systems in order to enable communication between their diverse system units. Investigation and proposal of new or improved solutions to topology agnostic routing and reconfiguration of interconnection networks are the main objectives of this thesis. In addition, topology agnostic routing and reconfiguration algorithms are utilized in the development of new and flexible approaches to processor allocation. The thesis aims to present versatile solutions that can be used for the interconnection networks of a number of different computing systems. No particular routing algorithm was specified for an interconnection network technology which is now incorporated in Dolphin Express. The thesis states a set of criteria for a suitable routing algorithm, evaluates a number of existing routing algorithms, and recommends that one of the algorithms, which fulfils all of the criteria, be used. Further investigations demonstrate how this routing algorithm inherently supports fault-tolerance, and how it can be optimized for some network topologies. These considerations are also relevant for the InfiniBand interconnection network technology. Reconfiguration of interconnection networks (change of routing function) is a deadlock prone process. Some existing reconfiguration strategies include deadlock avoidance mechanisms that significantly reduce the network service offered to running applications. The thesis expands the area of application for one of the most versatile and efficient reconfiguration algorithms available in the literature, and proposes an optimization of this algorithm that improves the network service offered to running applications. 
Moreover, a new reconfiguration algorithm is presented that supports a replacement of the routing function without causing performance penalties. Processor allocation strategies that guarantee traffic-containment commonly pose strict requirements on the shape of partitions, and thus achieve only a limited utilization of a system’s computing resources. The thesis introduces two new approaches that are more flexible. Both approaches utilize the properties of a topology agnostic routing algorithm in order to enforce traffic-containment within arbitrarily shaped partitions. Consequently, a high resource utilization as well as isolation of traffic between different partitions is achieved

NORA - Norwegian Open Research Archives

High-speed, economical design implementation of transit network router

Author: Hara Kazuhiro
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1995

Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995. Includes bibliographical references (p. 88-90). By Kazuhiro Hara. M.S.

DSpace@MIT

Efficient mechanisms to provide fault tolerance in interconnection networks for pc clusters

Author: Montañana Aliaga José Miguel
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 21/07/2008

PC clusters are currently a cost-effective alternative to parallel computers. In these systems, thousands of components (processors and/or hard disks) are connected through high-performance interconnection networks. Among the network technologies currently available for building clusters, InfiniBand (IBA) has emerged as a new interconnect standard for clusters. Indeed, it has been adopted by many of the most powerful systems built today (Top500 list). As the number of nodes in these systems increases, the interconnection network grows as well. Along with the growing number of components, the probability of failures increases dramatically, so fault tolerance in the system as a whole, and in the interconnection network in particular, becomes a necessity. Unfortunately, most of the fault-tolerant routing strategies proposed for massively parallel computers cannot be applied, because routing and virtual channel transitions are deterministic in IBA, which prevents packets from avoiding faults. New strategies for tolerating faults are therefore needed. This thesis accordingly focuses on providing adequate levels of fault tolerance for PC clusters, and for IBA networks in particular. In this thesis we propose and evaluate several mechanisms suited to cluster interconnection networks. The first mechanism for providing fault tolerance in IBA (which we refer to as transition-based fault-tolerant routing, TFTR) uses several disjoint routes between each source-destination pair of nodes and selects the appropriate route at the source node using the APM mechanism provided by IBA. It works by migrating the routes affected by the fault to the fault-free alternative routes. 
However, to this end, an efficient routing algorithm is needed that is capable of providing sufficient
Montañana Aliaga, JM. (2008). Efficient mechanisms to provide fault tolerance in interconnection networks for pc clusters [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/2603

Crossref

RiuNet

Resource-Aware Deployment, Configuration, and Adaptation for Fault-tolerant Distributed Real-time Embedded Systems

Author: Balasubramanian Jaiganesh
Publication venue: VANDERBILT

Multi-Modular Integral Pressurized Water Reactor Control and Operational Reconfiguration for a Flow Control Loop

Author: Perillo Sergio Ricardo Pereira
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/12/2010

This dissertation focused on the IRIS design, since it will likely be one of the designs of choice for future deployment in the U.S. and developing countries. With a net output of 335 MWe, the novel IRIS design falls in the “medium” size category and is a potential candidate for so-called modular reactors, which may be appropriate for base-load electricity generation, especially in regions with smaller electricity grids, and are particularly well suited to more specialized non-electrical energy applications such as district heating and process steam for desalination. The first objective of this dissertation is to evaluate and quantify the performance of a Nuclear Power Plant (NPP) comprising two IRIS reactor modules operating simultaneously with a common steam header, which in turn is connected to a single turbine, resulting in a steam-mixing control problem with respect to “load-following” scenarios, such as varying load during the day or reduced consumption during the weekend. To solve this problem, a single-module IRIS SIMULINK model previously developed by another researcher was modified to include a second module and used to quantify the responses of both modules. In order to develop research related to instrumentation and control, and equipment and sensor monitoring, the second objective is to build a two-tank multivariate loop in the Nuclear Engineering Department at the University of Tennessee. This loop provides the framework necessary to investigate and test control strategies and fault detection in sensors, equipment and actuators. The third objective is to experimentally develop and demonstrate a fault-tolerant control strategy using this loop. Using six correlated variables in a single-tank configuration, five inferential models and one Auto-Associative Kernel Regression (AAKR) model were developed to detect faults in process sensors. 
Once detected, the faulty measurements were successfully replaced with predicted values, which would provide the flexibility and time needed to find the source of the discrepancy and resolve it, as would be required in an operating power plant. Finally, using the same empirical models, an actuator failure was simulated; once it was detected, control was automatically transferred and reconfigured from one tank to another, providing survivability to the system

University of Tennessee, Knoxville: Trace

Dependable Embedded Systems

Author: Dutt Nikil
Henkel Jörg
Publication venue: 'Springer Science and Business Media LLC'

This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. The book presents the most prominent reliability concerns from today’s point of view and briefly recapitulates the community's progress so far. Unlike other books that focus on a single abstraction level, such as the circuit level or the system level alone, this book deals with reliability challenges across different levels, from the physical level all the way to the system level (cross-layer approaches). It aims to demonstrate how new hardware/software co-design solutions can be proposed to effectively mitigate reliability degradation caused by transistor aging, processor variation, temperature effects, soft errors, etc. Provides readers with the latest insights into novel, cross-layer methods and models with respect to dependability of embedded systems; Describes cross-layer approaches that can leverage reliability through techniques that are proactively designed with respect to techniques at other layers; Explains run-time adaptation and concepts/means of self-organization, in order to achieve error resiliency in complex, future many-core systems

OAPEN Library

Cloud computing: survey on energy efficiency

Author: Anghel I.
Apparao P.
Ariel Oleksiak
Armand F.
Athanasios V. Vasilakos
Borovyi A.
Cavdar D.
de Langen P.
Fakhar Faiza
Feeney L. M.
Ge Chang
Gupta M.
Hankendi C.
Heller Brandon
Holger Claussen
Ivona Brandic
Jean-Marc Pierson
Jin Yichao
Le K.
Ma Kun
Mastelic Toni
Megalingam R. K.
Moisan Francois
Patterson Michael K.
Pianese F.
Raghavan Praveen
Snyder G. Jeffrey
Steinder M.
Stoess Jan
Sundararajan K. T.
Toni Mastelic
Zer Emre
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2015

Cloud computing is today’s most emphasized Information and Communications Technology (ICT) paradigm, used directly or indirectly by almost every online user. However, such significance depends on a large supporting infrastructure that includes data centers comprising thousands of server units and other supporting equipment. These data centers account for between 1.1% and 1.5% of total electricity use worldwide, a share that is projected to rise further. Such alarming numbers demand rethinking the energy efficiency of such infrastructures. However, before making any changes to infrastructure, an analysis of the current status is required. In this article, we perform a comprehensive analysis of an infrastructure supporting the cloud computing paradigm with regard to energy efficiency. First, we define a systematic approach for analyzing the energy efficiency of the most important data center domains, including server and network equipment, as well as cloud management systems and appliances consisting of software utilized by end users. Second, we utilize this approach for analyzing available scientific and industrial literature on state-of-the-art practices in data centers and their equipment. Finally, we extract existing challenges and highlight future research directions

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

Transistor-Level Defect-Tolerant Techniques for Reliable Design at the Nanoscale

Author: Khan Farhan
Publication date: 26/09/2009

KFUPM ePrints