Electronic computing systems have evolved over the past decades with breathtaking speed and unrelenting acceleration. They have become omnipresent and irreplaceable in every aspect of human life, including reliability-critical applications in healthcare and ambient assisted living, automotive and transportation, infrastructure, communication, and security. Constant advances in manufacturing yield and field reliability have been important enabling factors for electronic devices pervading our lives. The exponential growth of functionality, performance, and complexity has relied on Moore's Law, which has held for half a century, as the technology escalator for the industry. Both embedded and large-scale computing systems are now being combined into the Internet-of-Things and cyber-physical systems, targeting a new ultimate experience of the computing continuum that is already on the doorstep. Today, however, this steady race is being disrupted, and a new strategy is urgently needed to maintain reliability alongside continuous performance scaling in computing systems for life.
MARIA K. MICHAEL , SALVATORE PONTARELLI, AND OMER KHAN
At the dawn of the information technology revolution, the reliability of vacuum tubes was very low, as was the probability that a system composed of thousands of tubes would work properly for extended periods of time. The introduction of the transistor, followed by miniaturization and integration, led to the current amazing levels of functional density because manufacturing yield and reliability have kept pace, thus allowing today's pervasive and ubiquitous presence of electronic devices. While CMOS is expected to remain the dominant semiconductor technology for the immediate future, for reasons that are both technological and financial, alternative technological solutions to conventional CMOS-based devices are attracting significant attention from researchers. To keep the current trend of density growth sustainable and to overcome cost and reliability limitations, a large variety of post-CMOS devices have been proposed, including carbon nanotube (CNT) field-effect transistors (CNT-FETs), graphene field-effect transistors (GFETs), tunnel transistors, graphene nanoribbon tunnel field-effect transistors (GNR-TFETs), quantum dots, and single-electron devices (SETs). Newer memory technologies such as resistive random-access memory (Re-RAM) have already become commercial, while memristors and spin-transfer torque random-access memory (STT-RAM) technologies are progressing at a rapid pace.
At the architectural level, a similar shift has already taken place. The shift from increasing core clock frequencies to exploiting parallelism and multicore chip architectures has been the main design driver across all application domains in the electronics and computing industry. The introduction of multicore chips has allowed a constant increase in delivered performance that would otherwise have been impossible to achieve. However, continuous performance scaling, which nowadays needs to be delivered within a given power envelope, faces two major chip development challenges, namely, manufacturability and dependability. Manufacturability refers to issues arising during production, while dependability deals with the correctness of operation in the field. Hence, besides new technological solutions to address the reliability of CMOS-based and alternative-to-CMOS devices at manufacturing time, there is an increasing need to consider the in-field dependability of current computing systems, given the increasing number of transient, intermittent, and permanent faults caused by aging and wear-out of electronics. This points towards dependability solutions at higher abstraction levels, such as the system and (micro)architecture levels, as well as the memory, communication infrastructure, and software levels.
Such technological and architectural advances require manufacturability and dependability to keep pace so that future systems can be adopted, which makes these aspects key research issues for new technologies and architectures worldwide. This is the driving motivation of this special section of the IEEE Transactions on Emerging Topics in Computing, which, after a highly competitive and large collective effort, has resulted in the selection of eight papers covering reliability and dependability topics in emerging technologies, new circuit-/logic-level solutions, multicore architectures and reconfigurable techniques, as well as approaches in the communication fabric.
The guest editors would like to thank the reviewers for the quality and timeliness of their reviews. Their feedback has helped to raise the quality of the final submissions in this issue. We thank the authors for their patience, diligence, and dedication at all stages of the review process. Finally, we are grateful to the Editor-in-Chief of the IEEE Transactions on Emerging Topics in Computing, Dr. Fabrizio Lombardi, for making this special section possible.
The article "A Parity-Preserving Reversible QCA Gate with Self-Checking Cascadable Resiliency", authored by Arman Roohi, Ramtin Zand, Shaahin Angizi, and Ronald F. Demara, proposes a novel gate for reversible computation using Quantum-dot Cellular Automata (QCA). The parity-preserving property enables a wide set of fault tolerance features and allows the design of extremely low-energy computations. The paper presents both a rich set of simulations for standard combinational Boolean functions and a formalized procedure to obtain fault detection and isolation properties. Testing Null Convention Logic (NCL) gates is the focus of the paper "Clock-Less DFT-Less Test Strategy for Null Convention Logic" by Nastaran Nemati, Paul Beckett, Mark C. Reed, and Karl M. Fant. They propose a clock-less, self-timed ATPG that detects all of the faults on the inputs of NCL gates and almost all of the faults on the Gate Internal Feedback (GIF) of NCL gates. They then study the effectiveness of the IDDQ (quiescent current) test for detecting stuck-at faults on the GIF of NCL gates. The proposed IDDQ test method, combined with the self-timed ATPG, achieves an average fault coverage of 98 percent for static and semi-static NCL circuits.
The paper "Generalized Numerical Entanglement For Reliable Linear, Sesquilinear and Bijective Operations On Integer Data Streams" by Mohammad Ashraful Anam, Ijeoma Jane Frances Anarado, and Yiannis Andreopoulos proposes a new technique for the mitigation of fail-stop failures and/or silent data corruptions (SDCs) within linear, sesquilinear, or bijective (LSB) operations on M integer data streams (M ≥ 3). The M input streams are linearly combined (entangled) and stored or transmitted in place of the original streams. The output results can be extracted from a subset of M - K output streams by means of additions and arithmetic shifts, while K fail-stop failures can be corrected, or up to K SDCs can be detected. The proposed approach has an overhead that is up to two orders of magnitude smaller than that of the equivalent checksum-based method. Martin Omana, Tusharasandeep Edara, and Cecilia Metra present a paper entitled "Low-Cost Strategy to Mitigate the Impact of Aging on Latches' Robustness", in which they address the problem of aging in robust latches. The paper proposes a strategy to reduce the impact of bias temperature instability (BTI) on the soft error rate (SER) of standard and low-cost robust latches. Compared with the original latches, the proposed approach reduces the SER increase due to BTI during circuit lifetime by approximately 50 percent.
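As background for the checksum-based baseline mentioned in the numerical entanglement summary above, the following is a minimal sketch of how a checksum stream can protect a linear operation. It is not the entanglement scheme from the paper. The function names, the example kernel, and the sample data are all hypothetical, and the sketch assumes a single extra checksum stream that can detect a silent data corruption or recover one missing output stream.

```python
from typing import List, Optional, Sequence


def linear_op(stream: Sequence[int], kernel: Sequence[int] = (1, 2, 1)) -> List[int]:
    """Hypothetical linear operation: a sliding weighted sum with integer weights."""
    k = len(kernel)
    return [
        sum(kernel[j] * stream[i + j] for j in range(k))
        for i in range(len(stream) - k + 1)
    ]


def checksum_stream(streams: Sequence[Sequence[int]]) -> List[int]:
    """Element-wise sum of all streams."""
    return [sum(column) for column in zip(*streams)]


def detect_sdc(outputs: Sequence[Sequence[int]], checksum_output: Sequence[int]) -> bool:
    """Return True if the outputs disagree with the processed checksum stream.

    By linearity, the sum of the outputs must equal linear_op(checksum input).
    """
    return checksum_stream(outputs) != list(checksum_output)


def recover_missing(
    outputs: Sequence[Optional[Sequence[int]]], checksum_output: Sequence[int]
) -> List[int]:
    """Rebuild the single output marked None from the checksum output."""
    present = [o for o in outputs if o is not None]
    partial = checksum_stream(present)
    return [c - p for c, p in zip(checksum_output, partial)]


# Illustrative data (assumed): three integer input streams.
streams = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
outputs = [linear_op(s) for s in streams]
check_out = linear_op(checksum_stream(streams))

corrupted = [list(o) for o in outputs]
corrupted[1][0] += 1  # inject a silent data corruption

print(detect_sdc(outputs, check_out))    # fault-free run
print(detect_sdc(corrupted, check_out))  # corrupted run
print(recover_missing([outputs[0], None, outputs[2]], check_out) == outputs[1])
```

The cost of this baseline is one additional full pass of the operation over the checksum stream, which is the kind of overhead the entanglement approach is reported to reduce.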
At the architectural level, the paper "Efficient Performance Evaluation of Multi-Core SIMT Processors with Hot Redundancy", authored by Seyyed Hasan Mazafari and Brett Meyer, proposes estimation techniques applicable during the design space exploration phase to evaluate the performance of multicore processors utilizing hot spare components. The new estimation techniques offer significant reductions in simulation time while preserving high estimation accuracy, allowing designers to evaluate different redundancy levels for a system. The reliability of reconfigurable systems using SRAM-based FPGA technology deployed in harsh environments, such as space missions, is the topic of the paper "OLT(RE)²: An On-Line On-Demand Testing Approach for Permanent Radiation Effects in REconfigurable systems" by Luca Cassano, Dario Cozzi, Sebastian Korf, Jens Hagemeyer, Andrea Domenici, Cinzia Bernardeschi, Mario Porrmann, and Luca Sterpone. This solution proposes on-line detection of permanent faults in unprogrammed FPGA resources using a test circuit and ad-hoc designed place-and-route algorithms. Without interfering with the normal operation of the system, it identifies faulty resources, which can consequently be excluded from consideration as possible functional units during system reconfiguration.
The last two papers in this special section examine fault tolerance in multi-core systems with Network-on-Chip (NoC) as the underlying communication infrastructure. The paper "A Hierarchical and Distributed Fault Tolerant Proposal for NoC-Based MPSoCs", authored by Fernando Moraes, Eduardo Wachter, Vinicius Fochi, Francisco Barreto, and Alexandre Amory, proposes a system-level methodology to tolerate faults both in the communication fabric and in the processing elements of the NoC. The fault-tolerant method is organized in a hierarchical and distributed manner, ensuring the correct execution of applications in the presence of multiple transient or permanent faults. The method is able to reestablish communication between processing elements in less than 50 microseconds for a fault occurring in the network and in less than one millisecond for faults detected in the processing elements. Finally, the article "An Energy-Efficient NoC Router with Adaptive Fault-Tolerance Using Channel Slicing and On-Demand TMR" by Cheng Li, Mo Yang, and Paul Ampadu proposes an energy-efficient and fault-tolerant NoC router that leverages channel slicing. Channel slicing allows power gating to be applied to improve energy efficiency and also enables resource sharing to enhance fault tolerance. Moreover, the use of on-demand TMR can further increase the level of fault tolerance. Experimental results reported in the paper show that the proposed router can tolerate 180 percent more gate faults than the state-of-the-art fault-tolerant NoC router architecture, with a 28-64 percent energy improvement, depending on the fault rate.
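For readers less familiar with triple modular redundancy (TMR), the general idea can be sketched as follows. This is a generic software illustration of 2-of-3 majority voting, not the router design from the paper. The `fault_mask` parameter, the `increment` function, and the 8-bit data width are assumptions made only for this example.

```python
from typing import Callable


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority over three redundant results."""
    return (a & b) | (a & c) | (b & c)


def tmr(compute: Callable[[int], int], x: int, fault_mask: int = 0) -> int:
    """Run three replicas of `compute` and vote on the results.

    `fault_mask` is a hypothetical fault-injection hook: its set bits are
    flipped in the result of the first replica only.
    """
    r0 = compute(x) ^ fault_mask  # replica 0, possibly faulty
    r1 = compute(x)               # replica 1, fault-free
    r2 = compute(x)               # replica 2, fault-free
    return majority_vote(r0, r1, r2)


def increment(x: int) -> int:
    """Example 8-bit function to protect (assumed for illustration)."""
    return (x + 1) & 0xFF


print(tmr(increment, 41))                     # no fault injected
print(tmr(increment, 41, fault_mask=0b1000))  # single-replica bit flip is masked
```

Because the voter needs only two agreeing replicas, any fault confined to a single replica is masked. Faults that hit the same bit in two replicas are not, which is one reason TMR is typically combined with other mechanisms.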
Sincerely 
