
    Resource efficient redundancy using quorum-based cycle routing in optical networks

    In this paper we propose a cycle redundancy technique that provides optical networks with nearly fault-tolerant point-to-point and multipoint-to-multipoint communications. More importantly, the technique is shown to approximately halve the light-trail resources required in the network while maintaining the fault tolerance and dependability expected from cycle-based routing. For efficiency and distributed control, it is common in distributed systems and algorithms to group nodes into intersecting sets referred to as quorum sets. Optimal communication quorum sets forming optical cycles based on light-trails have been shown to flexibly and efficiently route both point-to-point and multipoint-to-multipoint traffic requests. Cycle routing techniques commonly use pairs of cycles to achieve both routing and fault tolerance, which consumes substantial resources and creates the potential for underutilization. Instead, we intentionally exploit redundancy within the quorum cycles for fault tolerance, such that almost every point-to-point communication occurs in more than one cycle. The result is a set of cycles with 96.60% to 99.37% fault coverage while using 42.9% to 47.18% fewer resources.
    Comment: 17th International Conference on Transparent Optical Networks (ICTON), 5-9 July 2015. arXiv admin note: substantial text overlap with arXiv:1608.05172, arXiv:1608.0516
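    To make the pair-redundancy idea concrete, the following is a minimal Python sketch (not from the paper) that treats each quorum cycle as a list of nodes and measures how many node pairs are covered by at least one cycle and by more than one cycle; the function names and the example cycles are purely illustrative.

```python
from itertools import combinations
from collections import Counter

def pair_coverage(cycles):
    """Count, for each unordered node pair, how many quorum cycles contain both nodes."""
    counts = Counter()
    for cycle in cycles:
        for pair in combinations(sorted(set(cycle)), 2):
            counts[pair] += 1
    return counts

def coverage_stats(cycles, nodes):
    """Fraction of node pairs routable by >= 1 cycle and redundantly covered by >= 2 cycles."""
    counts = pair_coverage(cycles)
    pairs = list(combinations(sorted(nodes), 2))
    routable  = sum(1 for p in pairs if counts[p] >= 1) / len(pairs)
    redundant = sum(1 for p in pairs if counts[p] >= 2) / len(pairs)
    return routable, redundant

# Illustrative quorum cycles over six nodes (not the paper's construction).
cycles = [[0, 1, 2, 3], [2, 3, 4, 5], [0, 2, 4, 1], [1, 3, 5, 0]]
print(coverage_stats(cycles, range(6)))
```

    A pair covered by two or more cycles survives the loss of either cycle, which is the redundancy the abstract exploits in place of dedicated backup cycles.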

    Unidirectional Quorum-based Cycle Planning for Efficient Resource Utilization and Fault-Tolerance

    In this paper, we propose a greedy cycle direction heuristic to improve the generalized $\mathbf{R}$ redundancy quorum cycle technique. When applied using only single cycles rather than the standard paired cycles, the generalized $\mathbf{R}$ redundancy technique has been shown to almost halve the light-trail resources required in the network. Our greedy heuristic improves this cycle-based routing technique's fault tolerance and dependability. For efficiency and distributed control, it is common in distributed systems and algorithms to group nodes into intersecting sets referred to as quorum sets. Optimal communication quorum sets forming optical cycles based on light-trails have been shown to flexibly and efficiently route both point-to-point and multipoint-to-multipoint traffic requests. Cycle routing techniques commonly use pairs of cycles to achieve both routing and fault tolerance, which consumes substantial resources and creates the potential for underutilization. Instead, we use a single cycle and intentionally exploit $\mathbf{R}$ redundancy within the quorum cycles such that every point-to-point communication pair occurs in at least $\mathbf{R}$ cycles. Without the paired cycles, the direction of the quorum cycles becomes critical to fault-tolerance performance. For this we developed a greedy cycle direction heuristic, and our single-fault network simulations show a reduction of missing pairs by greater than 30%, which translates to significant improvements in fault coverage.
    Comment: Computer Communication and Networks (ICCCN), 2016 25th International Conference on. arXiv admin note: substantial text overlap with arXiv:1608.05172, arXiv:1608.05168, arXiv:1608.0517
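    The abstract does not spell out the heuristic itself, so the sketch below is only an illustrative reconstruction under simple assumptions: each quorum cycle is routed unidirectionally, a single failed directed link leaves a directed path, and each cycle's direction is chosen greedily to maximize the ordered pairs that stay covered either by the broken cycle or by cycles already oriented. All names are hypothetical.

```python
def reachable_pairs(cycle, broken=None):
    """Ordered (src, dst) pairs deliverable on a unidirectional cycle; if the
    directed link `broken` = (u, v) fails, only pairs along the surviving
    directed path remain reachable."""
    n = len(cycle)
    links = [(cycle[i], cycle[(i + 1) % n]) for i in range(n)]
    if broken is None:
        return {(a, b) for a in cycle for b in cycle if a != b}
    start = links.index(broken)
    path = [links[(start + 1 + k) % n][0] for k in range(n)]  # nodes downstream of the break
    return {(path[i], path[j]) for i in range(n) for j in range(i + 1, n)}

def greedy_orient(cycles):
    """Pick each cycle's direction (as given or reversed) to maximize pairs
    still covered under any single-link fault, given cycles oriented so far."""
    oriented = []
    for cycle in cycles:
        covered_by_others = set().union(*map(reachable_pairs, oriented)) if oriented else set()
        best, best_score = None, -1
        for cand in (cycle, list(reversed(cycle))):
            links = [(cand[i], cand[(i + 1) % len(cand)]) for i in range(len(cand))]
            score = sum(len(reachable_pairs(cand, link) | covered_by_others) for link in links)
            if score > best_score:
                best, best_score = cand, score
        oriented.append(best)
    return oriented

# Example: orient two illustrative quorum cycles that share nodes 2 and 3.
print(greedy_orient([[0, 1, 2, 3], [2, 3, 4, 5]]))
```

    The "missing pairs" metric in the abstract would then correspond to the pairs not reachable in any oriented cycle after a fault.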

    Impact of Structural Faults on Neural Network Performance

    Deep Learning (DL), a subset of Artificial Intelligence (AI), is growing rapidly, with possible applications in domains such as speech recognition and computer vision. The Deep Neural Network (DNN), the backbone of DL algorithms, is a directed graph containing multiple layers, each with a different number of neurons. The use of these networks has increased in the last few years due to the availability of large data sets and enormous computational power. As DNNs have grown in size, researchers have developed specialized hardware accelerators to reduce inference compute time. One example of such a domain-specific architecture designed for neural network acceleration is the Tensor Processing Unit (TPU), which outperforms GPUs in the inference stage of DNN execution. The heart of this inference engine is a matrix multiplication unit based on a systolic array architecture. The TPU's systolic array is a grid-like structure made of individual processing elements that can be extended along rows and columns. Due to external environmental factors or internal semiconductor scaling, these systems are often prone to faults, which lead to improper calculations and thereby to inaccurate decisions by the DNN. Although much prior work has addressed the computing array implementation and its reliability concerns, the fault-tolerance behavior of such arrays for DNN applications is not well understood, and it is not even clear what impact various faults have on accuracy. In this work, we first study possible mapping strategies for implementing convolution and dense layer weights on a TPU systolic array. Next, we consider various fault scenarios that may occur in the array. We divide these fault scenarios into low and high row and column fault modes with respect to the multiplication unit (Fig. 1(a) pictorially represents column faults). We then study the impact of these fault models on the overall accuracy of DNN inference on a faulty TPU unit. The goal is to study the resiliency and overcome the limitations of earlier work. Previous work, which used pruning of weights (removing weights or connections in the DNN) plus retraining to mask faults on the array, was very effective against random faults; however, it failed in the case of column faults, as clearly shown in Fig. 1(b). We also propose techniques to mitigate or bypass the row and column faults. Our mapping strategy follows physical_x(i) = i % N and physical_y(j) = j % N, where (i, j) is the index of the dense (FC) weight matrix and (physical_x(i), physical_y(j)) indicates the actual physical location on an array of size N. The convolution filters are linearized with respect to every channel so as to convert them into a proper weight matrix, and are mapped according to the same policy. We show that DNNs can tolerate a certain number of faults in the array while retaining the original accuracy (low row faults). The accuracy of the network decreases with even a single column fault if that column is in use. The results show that, for the same number of row and column faults, column faults have the greater impact on network accuracy, because pruning an input neuron has far less effect than pruning an output neuron. We experimented with three different networks and found the influence of these fault types to be the same.
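    The modulo mapping and the row/column fault modes described above can be sketched in a few lines of NumPy; the stuck-at-zero fault model and the helper names below are illustrative assumptions, not necessarily the model used in the paper.

```python
import numpy as np

def physical_location(i, j, N):
    """Modulo mapping from weight index (i, j) to the N x N systolic array:
    physical_x(i) = i % N, physical_y(j) = j % N."""
    return i % N, j % N

def apply_structural_faults(W, N, faulty_rows=(), faulty_cols=()):
    """Zero out every weight whose physical location lands on a faulty array
    row or column (an assumed stuck-at-zero fault model, for illustration)."""
    rows, cols = np.indices(W.shape)
    bad = np.isin(rows % N, list(faulty_rows)) | np.isin(cols % N, list(faulty_cols))
    return np.where(bad, 0.0, W)

# A 256x128 dense layer on a 128x128 array: one faulty column silences an
# entire output neuron, while one faulty row only drops two of the 256
# input contributions to each output -- the asymmetry discussed above.
W = np.random.randn(256, 128)
x = np.random.randn(256)
y_col = x @ apply_structural_faults(W, 128, faulty_cols=[5])  # output neuron 5 is zeroed
y_row = x @ apply_structural_faults(W, 128, faulty_rows=[5])  # all outputs merely perturbed
print(y_col[5], np.abs(y_row - x @ W).max())
```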
These faults can be mitigated using techniques such as Matrix Transpose and Array Reduction, which do not require retraining of weights. For low row faults, the original mapping policy can be retained so that weights are mapped at their exact locations, which does not affect accuracy. Low column faults can be converted into low row faults by transposing the matrix. In the case of high row (column) faults, the entire row (column) has to be avoided to completely bypass the faulty locations. Static mapping of weights along with retraining the network on the array can be effective in the case of random faults. Adapting the mapping in the case of structured faults can reduce the burden of retraining, which happens outside the TPU.
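    As a rough illustration of the two mitigations named above (the exact procedures are not given here), the sketch below transposes the weight matrix when column faults dominate, so that they become row faults, and otherwise simply excludes the faulty rows and columns from scheduling (array reduction); the function and variable names are assumptions.

```python
import numpy as np

def mitigate_structural_faults(W, N, faulty_rows=(), faulty_cols=()):
    """Matrix-transpose trick plus array reduction (illustrative only).
    Returns the weight matrix to load, whether it was transposed, and the
    physical rows/columns that remain usable."""
    faulty_rows, faulty_cols = set(faulty_rows), set(faulty_cols)
    transposed = False
    if len(faulty_cols) > len(faulty_rows):
        # Transposing swaps the roles of inputs and outputs on the array,
        # turning the more damaging column faults into row faults.
        W, transposed = W.T, True
        faulty_rows, faulty_cols = faulty_cols, faulty_rows
    usable_rows = [r for r in range(N) if r not in faulty_rows]  # array reduction
    usable_cols = [c for c in range(N) if c not in faulty_cols]
    return W, transposed, usable_rows, usable_cols

W = np.random.randn(256, 128)
W_new, transposed, rows, cols = mitigate_structural_faults(W, 128, faulty_cols=[5, 7])
print(transposed, len(rows), len(cols))  # True, 126, 128
```

    If the matrix is transposed, the surrounding dataflow has to stream activations along the other dimension; the sketch only shows the mapping decision, not the rescheduling itself.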