
    A proactive fault tolerance framework for high performance computing (HPC) systems in the cloud

    High Performance Computing (HPC) systems have been widely used by scientists and researchers in both industry and university laboratories to solve advanced computation problems. Most advanced computation problems are either data-intensive or computation-intensive, and they may take hours, days or even weeks to complete; some traditional HPC computations, for example, run on 100,000 processors for weeks. Consequently, traditional HPC systems often require huge capital investments, and scientists and researchers sometimes have to wait in long queues to access shared, expensive HPC systems. Cloud computing, on the other hand, offers new computing paradigms, capacity, and flexible solutions for both business and HPC applications. Some of the computation-intensive applications that are usually executed in traditional HPC systems can now be executed in the cloud, and the cloud pricing model eliminates the need for huge up-front capital investment. However, even for cloud-based HPC systems, fault tolerance remains an issue of growing concern. The large number of virtual machines and electronic components, as well as software complexity and overall system reliability, availability and serviceability (RAS), are factors with which HPC systems in the cloud must contend. The reactive fault tolerance approach of checkpoint/restart, which is commonly used in HPC systems, does not scale well in the cloud because of resource sharing and the distributed nature of cloud networks. Hence, the need for reliable, fault-tolerant HPC systems is even greater in a cloud environment. In this thesis we present a proactive fault tolerance approach for HPC systems in the cloud that reduces wall-clock execution time, as well as dollar cost, in the presence of hardware failure. We have developed a generic fault tolerance algorithm for HPC systems in the cloud, as well as a cost model for executing computation-intensive applications on such systems. Our experimental results, obtained from a real cloud execution environment, show that the wall-clock execution time and cost of running computation-intensive applications in the cloud can be considerably reduced compared to the checkpoint and redundancy techniques used in traditional HPC systems.
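    The abstract summarizes the cost model only at a high level; as a rough, hypothetical illustration of the kind of comparison it describes, the sketch below totals the billed VM-hours of a cloud job under reactive checkpoint/restart versus proactive migration. All prices, overheads and failure counts are invented placeholders, not figures from the thesis.

        # Illustrative only: a toy cost comparison between reactive checkpoint/restart
        # and proactive migration for an HPC job running on cloud VMs.
        # Every number below is a hypothetical placeholder, not data from the thesis.

        def cloud_cost(base_hours, n_vms, price_per_vm_hour, overhead_hours):
            """Dollar cost = billed VM-hours including fault-tolerance overhead."""
            wall_clock = base_hours + overhead_hours
            return wall_clock * n_vms * price_per_vm_hour, wall_clock

        # Hypothetical job: 100 hours of pure compute on 32 VMs at $0.50/VM-hour.
        BASE_HOURS, N_VMS, PRICE = 100.0, 32, 0.50

        # Reactive checkpoint/restart: periodic checkpoints plus lost work after failures.
        ckpt_overhead = 100 * 0.05 + 3 * 2.0   # 5% checkpoint overhead + 3 failures losing ~2 h each
        # Proactive migration: a short pause whenever a failure is predicted in advance.
        mig_overhead = 3 * 0.25                # 3 predicted failures, ~15 min migration each

        for name, overhead in [("checkpoint/restart", ckpt_overhead),
                               ("proactive migration", mig_overhead)]:
            cost, hours = cloud_cost(BASE_HOURS, N_VMS, PRICE, overhead)
            print(f"{name:20s} wall-clock = {hours:6.1f} h, cost = ${cost:,.2f}")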

    Characterisation of single event effects and total ionising dose effects of an Intel Atom microprocessor

    The rapid advancements of COTS microprocessors compared to radiation-hardened microprocessors have attracted the interest of system designers within the aerospace sector. COTS microprocessors offer higher performance with lower energy requirements, both of which are desired characteristics for microprocessors used in spacecraft. COTS microprocessors are, however, much more susceptible to radiation damage; therefore, their SEE and TID responses need to be evaluated before they can be incorporated into spacecraft. This thesis presents the process followed to evaluate these characteristics for a COTS Intel Atom E3815 microprocessor mounted on a DE3815TYBE single-board PC. Evaluation of the SEE response was carried out at NRF iThemba Labs in Cape Town, South Africa, where the device was irradiated with a 55.58 MeV proton beam at varying beam currents. The device showed a higher sensitivity to functional interrupts when running with the onboard cache on than with the cache off, as would be expected; the corresponding cross-sections are 4.5 × 10⁻¹⁰ cm² and 2.8 × 10⁻¹⁰ cm², respectively. TID testing, on the other hand, was carried out in the irradiation chamber of FruitFly Africa in Stellenbosch, South Africa. The test device was irradiated by gamma radiation from a Cobalt-60 source at a dose rate of 9.7 kRad/h to a total dose of 67.25 kRad. Noticeable TID degradation, in the form of leakage currents, was observed once a total dose of about 20 kRad was absorbed, and the device completely failed once a total dose of approximately 32 kRad was absorbed. These results suggest that the E3815 microprocessor would not be suitable for long-term missions that require higher TID survivability. The processor could, however, be considered for short-term missions launched into polar or high-inclination orbits, where the dose rate is relatively low and the mission is capable of tolerating functional interrupts.
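    For background, an SEE cross-section is conventionally computed as the number of observed events divided by the particle fluence. The sketch below applies that standard definition; the event counts and the fluence are hypothetical values chosen only so the results match the cross-sections quoted above, since the abstract does not report the underlying fluences.

        # Standard SEE cross-section definition: sigma = N_events / fluence,
        # with fluence in particles/cm^2, giving sigma in cm^2.
        # The counts and fluences below are hypothetical, for illustration only.

        def see_cross_section(n_events: int, fluence_per_cm2: float) -> float:
            """Return the event cross-section in cm^2."""
            return n_events / fluence_per_cm2

        # Hypothetical runs: functional interrupts observed with cache on vs cache off.
        fluence = 1.0e11                                  # protons/cm^2 delivered (assumed)
        sigma_cache_on = see_cross_section(45, fluence)   # -> 4.5e-10 cm^2
        sigma_cache_off = see_cross_section(28, fluence)  # -> 2.8e-10 cm^2
        print(f"cache on : {sigma_cache_on:.1e} cm^2")
        print(f"cache off: {sigma_cache_off:.1e} cm^2")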

    Multi-Tenant Cloud FPGA: A Survey on Security

    With the exponentially increasing demand for performance and scalability in cloud applications and systems, data center architectures have evolved to integrate heterogeneous computing fabrics that leverage CPUs, GPUs, and FPGAs. FPGAs differ from traditional processing platforms such as CPUs and GPUs in that they are reconfigurable at run-time, providing increased and customized performance, flexibility, and acceleration. FPGAs can perform large-scale search optimization, acceleration, and signal processing tasks with advantages in power, latency, and processing speed. Many public cloud giants, including Amazon, Huawei, Microsoft, and Alibaba, have already started integrating FPGA-based cloud acceleration services. While FPGAs in cloud applications enable customized acceleration with low power consumption, they also introduce new security challenges that still need to be reviewed. Allowing cloud users to reconfigure the hardware design after deployment could open backdoors for malicious attackers, potentially putting the cloud platform at risk. Because of these security risks, public cloud providers still do not offer multi-tenant FPGA services. This paper analyzes the security concerns of multi-tenant cloud FPGAs, gives a thorough description of the security problems associated with them, and discusses upcoming challenges in this field of study.

    Resistive Switching in Silicon-rich Silicon Oxide

    Over the past decade, many different concepts for new emerging memories have been proposed. Examples include ferroelectric random access memories (FeRAMs), phase-change RAMs (PRAMs), resistive RAMs (RRAMs), magnetic RAMs (MRAMs), and nano-crystal floating-gate flash memories, among others. The ultimate goal of any of these memories is to overcome the limitations of dynamic random access memory (DRAM) and flash memory. Non-volatile memories exploiting resistive switching (resistive RAM, or RRAM, devices) offer the possibility of low programming energy per bit, rapid switching, and very high levels of integration, potentially in 3D. Resistive switching in a silicon-based material offers a compelling alternative to existing metal oxide-based devices, both in ease of fabrication and in enhanced device performance. In this thesis I demonstrate a redox-based resistive switch exploiting the formation of conductive filaments in a bulk silicon-rich silicon oxide. My devices exhibit multi-level switching and analogue modulation of resistance as well as standard two-level switching. I demonstrate different operational modes (bipolar and unipolar switching) that make it possible to dynamically adjust device properties, in particular two highly desirable properties: non-linearity and self-rectification. Scanning tunnelling microscopy (STM), atomic force microscopy (AFM), and conductive atomic force microscopy (C-AFM) measurements provide a more detailed insight into both the location and the dimensions of the conductive filaments. I discuss aspects of the conduction and switching mechanisms and propose a physical model of resistive switching. I demonstrate room-temperature quantisation of conductance in silicon oxide resistive switches, implying ballistic transport of electrons through a quantum constriction associated with an individual silicon filament in the SiOx bulk. I develop a stochastic method to simulate the microscopic formation and rupture of conductive filaments inside an oxide matrix, and use the model to discuss switching properties such as endurance and switching uniformity.
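    The room-temperature conductance quantisation referred to above is quantisation in integer multiples of the conductance quantum G0 = 2e²/h. As background only (standard physics, not data from the thesis), the sketch below computes G0 and the resistance expected for the first few quantised levels.

        # Background calculation (standard physics, not data from the thesis):
        # conductance through a ballistic quantum constriction is quantised in
        # integer multiples of G0 = 2e^2 / h.

        E_CHARGE = 1.602176634e-19   # elementary charge, C
        PLANCK_H = 6.62607015e-34    # Planck constant, J*s

        G0 = 2 * E_CHARGE**2 / PLANCK_H          # ~7.75e-5 S (about 77.5 uS)
        print(f"G0 = {G0:.4e} S  (R0 = {1/G0/1000:.2f} kOhm)")

        # Resistance of a constriction carrying n conducting channels.
        for n in range(1, 5):
            print(f"n = {n}: G = {n*G0:.3e} S, R = {1/(n*G0)/1000:.2f} kOhm")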

    Exploiting Hardware Abstraction for Parallel Programming Framework: Platform and Multitasking

    With the help of the parallelism provided by their fine-grained architecture, hardware accelerators on Field Programmable Gate Arrays (FPGAs) can significantly improve the performance of many applications. However, designers are required to have excellent hardware programming skills and unique optimization techniques to fully explore the potential of FPGA resources. Intermediate frameworks above hardware circuits have been proposed to improve either performance or productivity by leveraging parallel programming models beyond the multi-core era. In this work, we propose the PolyPC (Polymorphic Parallel Computing) framework, which targets enhancing productivity without losing performance. It helps designers develop parallelized applications and implement them on FPGAs. The PolyPC framework implements a custom hardware platform on which programs written in an OpenCL-like programming model can launch. Additionally, the PolyPC framework extends vendor-provided tools to provide a complete development environment, including an intermediate software framework and automatic system builders. Designers' programs can be either synthesized as hardware processing elements (PEs) or compiled to executable files running on software PEs. Benefiting from re-loadable PEs and independent group-level schedulers, multitasking is enabled for both software and hardware PEs to improve the efficiency of hardware resource utilization. The PolyPC framework is evaluated with regard to performance, area efficiency, and multitasking. The results show a maximum speedup of 66 times over a dual-core ARM processor and 1043 times over a high-performance MicroBlaze, with 125 times the area efficiency. With priority-aware scheduling, the framework delivers a significant improvement in the response time of high-priority tasks. The overheads of multitasking are evaluated to analyze the trade-offs. With the help of the design flow, OpenCL application programs are converted into executables through front-end source-to-source transformation and back-end synthesis/compilation to run on PEs, and the framework is generated from users' specifications.
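    PolyPC's scheduling policy is not detailed in the abstract; the sketch below is a generic priority-queue dispatcher for a mixed pool of hardware and software PEs, written purely to illustrate what priority-aware multitasking across PEs means. The PE names, task fields and policy are assumptions, not part of the framework.

        # Generic priority-aware dispatcher for a pool of processing elements (PEs).
        # Purely illustrative; the PE types, task fields and policy are assumptions,
        # not the scheduling algorithm used by PolyPC.
        import heapq
        from dataclasses import dataclass, field
        from itertools import count

        @dataclass(order=True)
        class Task:
            priority: int                 # lower value = higher priority
            seq: int                      # tie-breaker keeping FIFO order within a priority
            name: str = field(compare=False)

        class Scheduler:
            def __init__(self, pes):
                self.ready = []           # min-heap of runnable tasks
                self.pes = list(pes)      # idle PEs, e.g. ["hwPE0", "hwPE1", "swPE0"]
                self._seq = count()

            def submit(self, name, priority):
                heapq.heappush(self.ready, Task(priority, next(self._seq), name))

            def dispatch(self):
                """Assign the highest-priority ready tasks to idle PEs."""
                while self.ready and self.pes:
                    task, pe = heapq.heappop(self.ready), self.pes.pop(0)
                    print(f"{task.name} (prio {task.priority}) -> {pe}")

        sched = Scheduler(["hwPE0", "hwPE1", "swPE0"])
        for name, prio in [("fft", 5), ("matmul", 1), ("filter", 3), ("logger", 9)]:
            sched.submit(name, prio)
        sched.dispatch()   # matmul, filter and fft run first; logger waits for a free PE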

    FLEXIBLE LOW-COST HW/SW ARCHITECTURES FOR TEST, CALIBRATION AND CONDITIONING OF MEMS SENSOR SYSTEMS

    Over the last few years, smart sensors based on Micro-Electro-Mechanical Systems (MEMS) have spread widely across fields such as automotive, biomedical, optical and consumer applications, and they now represent the state of the art. The reason for their diffusion is their ability to measure physical and chemical quantities using miniaturized components. Because of the heterogeneity of their components, developing this kind of architecture requires a very complex design flow, since it combines the mechanical parts typical of the MEMS sensor with the electronic components for interfacing and conditioning. In such systems, testing activities take on considerable importance and concern various phases of the life-cycle of a MEMS-based system. From the design phase of the sensor onwards, validating the design by extracting its characteristic parameters is important, because these parameters are needed to design the sensor interface circuit. Moreover, this kind of architecture requires techniques for the calibration and evaluation of the whole system, in addition to the traditional methods for testing the control circuitry. The first part of this research work addresses testing optimization through the development of different hardware/software architectures for the different testing stages of the development flow of a MEMS-based system. A flexible, low-cost platform for the characterization and prototyping of MEMS sensors has been developed to provide an environment that also supports the design of the sensor interface. To reduce the re-engineering time required during verification testing, a universal client-server architecture has been designed to provide a single framework for testing different kinds of devices, using different development environments and programming languages. Because the use of ATE during the engineering phase of the calibration algorithm is expensive in terms of ATE occupation time, since it requires interrupting the production process, a flexible and easily adaptable low-cost hardware/software architecture for calibration and performance evaluation has been developed; it allows the calibration algorithm to be developed in a user-friendly environment that is also suitable for small- and medium-volume production. The second part of the research work deals with a topic that is becoming ever more important in the field of MEMS sensor applications: the ability to combine information extracted from different types of sensors (typically accelerometers, gyroscopes and magnetometers) to obtain more complex information. In this context, two different sensor fusion algorithms have been analyzed and developed. The first is a fully software algorithm that has been used to estimate how much errors in MEMS sensor data affect the parameters computed by a sensor fusion algorithm; the second is a sensor fusion algorithm based on a simplified Kalman filter. Starting from this algorithm, a bit-true model in Mathworks Simulink(TM) has been created as a system study for the implementation of the algorithm on chip.
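    The simplified Kalman filter itself is not described in the abstract; as a generic illustration of MEMS sensor fusion, the sketch below runs a one-state Kalman filter that predicts a tilt angle by integrating a gyroscope rate and corrects it with an accelerometer-derived angle. The noise parameters and simulated signals are assumptions, not the thesis's algorithm.

        # Generic 1-state Kalman filter fusing a gyroscope rate with an
        # accelerometer-derived tilt angle. Illustrative only: the noise values and
        # the simulated signals are assumptions, not the thesis's simplified filter.
        import math, random

        DT = 0.01            # sample period, s
        Q = 0.001            # process noise (gyro drift) variance
        R = 0.03             # measurement noise (accelerometer angle) variance

        angle, P = 0.0, 1.0  # state estimate (rad) and its variance

        def kalman_step(angle, P, gyro_rate, acc_angle):
            # Predict: integrate the gyroscope rate.
            angle += gyro_rate * DT
            P += Q
            # Update: correct with the accelerometer angle measurement.
            K = P / (P + R)
            angle += K * (acc_angle - angle)
            P *= (1.0 - K)
            return angle, P

        true_angle = 0.0
        for k in range(500):
            true_rate = 0.5 * math.sin(0.02 * k)          # simulated motion, rad/s
            true_angle += true_rate * DT
            gyro = true_rate + random.gauss(0.0, 0.05)    # noisy gyroscope rate
            acc = true_angle + random.gauss(0.0, 0.2)     # noisy accelerometer angle
            angle, P = kalman_step(angle, P, gyro, acc)

        print(f"true angle = {true_angle:+.3f} rad, fused estimate = {angle:+.3f} rad")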

    DEHYDROGENASES


    Scalability of RAID systems

    RAID systems (Redundant Arrays of Inexpensive Disks) have dominated back-end storage systems for more than two decades and have grown continuously in size and complexity. Currently they face unprecedented challenges from data-intensive applications such as image processing, transaction processing and data warehousing. As the size of RAID systems increases, designers are faced with both performance and reliability challenges. These challenges include limited back-end network bandwidth, physical interconnect failures, correlated disk failures and long disk reconstruction times. This thesis studies the scalability of RAID systems in terms of both performance and reliability through simulation, using a discrete event driven simulator for RAID systems (SIMRAID) developed as part of this project. SIMRAID incorporates two benchmark workload generators, based on the SPC-1 and Iometer benchmark specifications. Each component of SIMRAID is highly parameterised, enabling it to explore a large design space. To improve the simulation speed, SIMRAID uses a set of abstraction techniques to extract the behaviour of the interconnection protocol without losing accuracy. Finally, to meet the technology trend toward heterogeneous storage architectures, SIMRAID provides a framework that allows easy modelling of different types of devices and interconnection techniques. Simulation experiments were first carried out on the performance aspects of scalability. They were designed to answer two questions: (1) given a number of disks, which factors affect back-end network bandwidth requirements; (2) given an interconnection network, how many disks can be connected to the system. The results show that the bandwidth requirement per disk is primarily determined by workload features and stripe unit size (a smaller stripe unit size has better scalability than a larger one), with cache size and RAID algorithm having very little effect on this value. The maximum number of disks is limited, as would be expected, by the back-end network bandwidth. Studies of reliability have led to three proposals to improve the reliability and scalability of RAID systems. Firstly, a novel data layout called PCDSDF is proposed. PCDSDF combines the advantages of orthogonal data layouts and parity declustering data layouts, so that it can not only survive multiple disk failures caused by physical interconnect failures or correlated disk failures, but also has good degraded and rebuild performance. The generating process of PCDSDF is deterministic and time-efficient, and the number of stripes per rotation (namely the number of stripes needed to achieve rebuild workload balance) is small. Analysis shows that the PCDSDF data layout can significantly improve system reliability. Simulations performed on SIMRAID confirm the good performance of PCDSDF, which is comparable to other parity declustering data layouts, such as RELPR. Secondly, a system architecture and rebuilding mechanism have been designed, aimed at fast disk reconstruction. This architecture is based on parity declustering data layouts and a disk-oriented reconstruction algorithm. It uses stripe groups instead of stripes as the basic distribution unit so that it can make use of the sequential nature of the rebuilding workload. The design space of system factors such as parity declustering ratio, chunk size, private buffer size of surviving disks and free buffer size is explored to provide guidelines for storage system design.
Thirdly, an efficient distributed hot spare allocation and assignment algorithm for general parity declustering data layouts has been developed. This algorithm avoids conflict problems in the process of assigning distributed spare space for the units on the failed disk. Simulation results show that it effectively solves the write bottleneck problem while introducing only a small increase in the average response time to user requests.
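    The PCDSDF construction itself is not given in the abstract; to show why parity declustering balances the rebuild load in the first place, the sketch below maps each stripe's units onto a pseudo-random subset of a larger disk array (a simple random-declustering stand-in, not PCDSDF) and counts how many units each surviving disk must read to rebuild a single failed disk.

        # Toy "declustered" placement: the K units of each stripe are mapped to a
        # pseudo-random subset of the N disks (random declustering). A generic
        # illustration of rebuild-load balancing, not the PCDSDF layout of the thesis.
        import random
        from collections import Counter

        N_DISKS = 10        # disks in the array
        STRIPE_WIDTH = 4    # data + parity units per stripe (K < N gives declustering)
        N_STRIPES = 10000   # stripes simulated

        def disks_of_stripe(s):
            """Pick the K disks holding stripe s with a per-stripe seeded RNG."""
            return random.Random(s).sample(range(N_DISKS), STRIPE_WIDTH)

        failed = 0
        rebuild_reads = Counter()
        for s in range(N_STRIPES):
            members = disks_of_stripe(s)
            if failed in members:            # this stripe lost one unit
                for d in members:
                    if d != failed:          # each surviving member supplies one read
                        rebuild_reads[d] += 1

        for d in sorted(rebuild_reads):
            print(f"disk {d}: {rebuild_reads[d]} units read during rebuild")
        # With K < N the rebuild reads are spread almost evenly over all surviving
        # disks, instead of hammering the K-1 partners of a conventional RAID group.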

    Characterisation and Design of Architectures Based on Phase Change Memories

    Semiconductor memory has always been an indispensable component of modern electronic systems. The increasing demand for highly scaled memory devices has led to the development of reliable non-volatile memories that are used in computing systems for permanent data storage and are capable of achieving high data rates, with the same or lower power dissipation than current advanced memory solutions. Among the emerging non-volatile memory technologies, Phase Change Memory (PCM) is the most promising candidate to replace conventional Flash memory technology. PCM offers a wide variety of features, such as fast read and write access, excellent scalability potential, baseline CMOS compatibility and exceptional high-temperature data retention and endurance, and can therefore pave the way for applications not only in memory devices, but also in energy-demanding, high-performance computer systems. However, some reliability issues still need to be addressed in order for PCM to establish itself as a competitive Flash memory replacement. This work focuses on the study of embedded Phase Change Memory in order to optimize device performance and propose solutions to overcome the key bottlenecks of the technology, targeting high-temperature applications. To enhance the reliability of the technology, the stoichiometry of the phase change material was appropriately engineered and dopants were added, resulting in optimized thermal stability of the device. A decrease in the programming speed of the memory technology was also reported, along with a residual resistivity drift of the low-resistance state towards higher resistance values over time. A novel programming technique was introduced, thanks to which the programming speed of the devices was improved and, at the same time, the resistance drift phenomenon could be successfully addressed. Moreover, an algorithm for programming PCM devices to multiple bits per cell using a single-pulse procedure was also presented. A pulse generator dedicated to providing the desired voltage pulses at its output was designed and experimentally tested, fitting the programming demands of a wide variety of materials under study and enabling the accurate programming needed to optimize the performance of the technology.
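    The resistance drift mentioned above is commonly modelled in the PCM literature by the empirical power law R(t) = R0·(t/t0)^ν, where ν is a state-dependent drift exponent. The sketch below evaluates that standard model with invented parameters, purely to illustrate the drift of resistance towards higher values over time; none of the numbers come from this thesis.

        # Empirical PCM resistance-drift law from the literature: R(t) = R0 * (t/t0)**nu.
        # The resistance levels and drift exponents below are illustrative assumptions,
        # not measurements from this work.

        def drifted_resistance(r0_ohm, t_s, t0_s=1.0, nu=0.05):
            """Resistance after t seconds, given the value r0 measured at t0."""
            return r0_ohm * (t_s / t0_s) ** nu

        levels = {"SET (low R)": (1.0e4, 0.02),     # low-resistance state drifts weakly
                  "RESET (high R)": (1.0e6, 0.10)}  # amorphous state drifts more strongly

        for name, (r0, nu) in levels.items():
            for t in (1.0, 1e3, 1e6):               # 1 s, ~17 min, ~11.6 days
                r = drifted_resistance(r0, t, nu=nu)
                print(f"{name:15s} t = {t:>9.0e} s -> R = {r:10.0f} Ohm")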