
    On-Disk Data Processing: Issues and Future Directions

    In this paper, we present a survey of "on-disk" data processing (ODDP). ODDP, a form of near-data processing, refers to a computing arrangement in which the secondary storage drives themselves have data processing capability. Proposed ODDP schemes vary widely in their data processing capability, target applications, architecture, and the kind of storage drive employed. Some ODDP schemes provide only a specific but heavily used operation such as sort, whereas others provide a full range of operations. Recently, with the advent of Solid State Drives, powerful and extensive ODDP solutions have been proposed. We present a thorough review of architectures developed for the different on-disk processing approaches, discuss current and future challenges, and identify the directions ODDP can take. Comment: 24 pages, 17 Figures, 3 Tables

    Resource efficient redundancy using quorum-based cycle routing in optical networks

    In this paper we propose a cycle redundancy technique that provides optical networks with nearly fault-tolerant point-to-point and multipoint-to-multipoint communications. More importantly, the technique is shown to approximately halve the light-trail resources required in the network while maintaining the fault tolerance and dependability expected from cycle-based routing. For efficiency and distributed control, it is common in distributed systems and algorithms to group nodes into intersecting sets referred to as quorum sets. Optimal communication quorum sets forming optical cycles based on light-trails have been shown to route both point-to-point and multipoint-to-multipoint traffic requests flexibly and efficiently. Cycle routing techniques commonly use pairs of cycles to achieve both routing and fault tolerance, which consumes substantial resources and creates the potential for underutilization. Instead, we intentionally exploit redundancy within the quorum cycles for fault tolerance so that almost every point-to-point communication occurs in more than one cycle. The result is a set of cycles with 96.60%-99.37% fault coverage while using 42.9%-47.18% fewer resources. Comment: 17th International Conference on Transparent Optical Networks (ICTON), 5-9 July 2015. arXiv admin note: substantial text overlap with arXiv:1608.05172, arXiv:1608.0516
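    The redundancy property described above amounts to checking that node pairs appear together in more than one quorum cycle. A minimal sketch of such a coverage check, in Python (the node set, cycles, and function names are illustrative assumptions, not the paper's data or code):

```python
from itertools import combinations

def pair_coverage(cycles, nodes):
    """Count, for every unordered node pair, how many cycles contain both nodes.

    cycles: list of node sequences, each representing one quorum cycle.
    nodes:  iterable of all network nodes.
    Returns a dict {(u, v): count} and the fraction of pairs covered at least once.
    """
    counts = {pair: 0 for pair in combinations(sorted(nodes), 2)}
    for cycle in cycles:
        for pair in combinations(sorted(set(cycle)), 2):
            if pair in counts:
                counts[pair] += 1
    covered = sum(1 for c in counts.values() if c >= 1)
    return counts, covered / len(counts)

# Illustrative 5-node network with four quorum cycles.
nodes = [1, 2, 3, 4, 5]
cycles = [[1, 2, 3], [3, 4, 5], [5, 1, 4], [2, 4, 1]]
counts, coverage = pair_coverage(cycles, nodes)
print(f"pair coverage: {coverage:.2%}")                              # pairs in >= 1 cycle
print(f"redundant pairs: {sum(1 for c in counts.values() if c >= 2)}")  # pairs in > 1 cycle
```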

    Logic element architecture for generic logic chains in programmable devices

    A reconfigurable device includes an arrangement of a plurality of cells and routing resources for transmitting signals between the cells. The plurality of cells comprises carry-select reuse cells, each configured to perform non-arithmetic operations using a reuse arithmetic carry chain interconnecting adjacent cells.

    Survivability and Traffic Grooming in WDM Optical Networks

    The advent of fiber optic transmission systems and wavelength division multiplexing (WDM) has led to a dramatic increase in the usable bandwidth of single fiber systems. This book provides detailed coverage of survivability (dealing with the risk of losing large volumes of traffic data due to the failure of a node or a single fiber span) and traffic grooming (managing the increased complexity of smaller user requests over high capacity data pipes), both of which are key issues in modern optical networks. A framework is developed to deal with these problems in wide-area networks, where the topology used to service various high-bandwidth (but still small in relation to the capacity of the fiber) systems evolves toward a general mesh. Effective solutions, exploiting complex optimization techniques and heuristic methods, are presented to keep network problems tractable. Newer networking technologies and efficient design methodologies are also described.

    Addressing multiple bit/symbol errors in DRAM subsystem

    As DRAM technology continues to evolve toward smaller feature sizes and higher densities, faults in the DRAM subsystem are becoming more severe. Current servers mostly use CHIPKILL-based schemes to tolerate up to one or two symbol errors per DRAM beat. Such schemes may not detect multiple symbol errors arising from faults in multiple devices, the data bus, or the address bus. In this article, we introduce Single Symbol Correction Multiple Symbol Detection (SSCMSD), a novel error handling scheme that corrects single-symbol errors and detects multi-symbol errors. Our scheme uses a hash in combination with an Error Correcting Code (ECC) to avoid silent data corruptions (SDCs). We develop a design that deploys a 32-bit CRC along with a Reed-Solomon code to implement SSCMSD for a ×4-based DDR4 system. Simulation-based experiments show that our scheme effectively guards against device, data-bus, and address-bus errors, limited only by the aliasing probability of the hash. Our design achieves this without introducing additional READ latency, and requires 19 chips per rank, 76 data-bus lines, and additional hash logic at the memory controller.
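    The core idea is that a hash computed over the data catches multi-symbol errors that slip past (or are miscorrected by) single-symbol ECC. A minimal sketch of that detection flow, using Python's built-in CRC-32 and a placeholder for the Reed-Solomon decode (the real symbol layout, codec, and bus mapping are not reproduced here):

```python
import zlib

def write_beat(data: bytes):
    """On write: store the data together with its CRC-32 (the hash in SSCMSD)."""
    return data, zlib.crc32(data)

def read_beat(stored_data: bytes, stored_crc: int, ecc_correct):
    """On read: let the ECC attempt single-symbol correction, then use the hash
    to catch multi-symbol errors that the ECC miscorrected or missed entirely."""
    corrected = ecc_correct(stored_data)        # placeholder for Reed-Solomon decode
    if zlib.crc32(corrected) != stored_crc:
        raise RuntimeError("multi-symbol error detected (hash mismatch)")
    return corrected

# Illustration: an identity "ECC" and a two-symbol corruption the hash still catches.
data, crc = write_beat(b"\x11\x22\x33\x44\x55\x66\x77\x88")
corrupted = b"\xAA\xBB" + data[2:]              # two corrupted symbols
try:
    read_beat(corrupted, crc, ecc_correct=lambda d: d)
except RuntimeError as e:
    print(e)
```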

    Parallel Computing Solution for Capacity Expansion Network Flow Optimization Problems

    In classical linear network flow (LNF) problems, a network consists of multiple source and sink nodes, where each node is either a sink or a source, but not both. Usually there is only one kind of commodity flow, and the goal is to find flow schedules and routes such that all sink nodes' flow demands are satisfied and the total flow transmission cost is minimized. We develop a capacity expansion multicommodity network flow (CEMNF) problem in which the total commodity supply is less than the total commodity demand. There is more than one kind of commodity, and each node is both a commodity flow generator and a consumer. The commodity generation capacity at each node and the commodity flow capacity of each arc may be expanded so that more flow can be transmitted among nodes. Thus, CEMNF is not only a commodity flow routing problem but also a commodity generation and flow planning problem, in which increasing commodity demands must be satisfied through generation and transmission capacity expansions. The goal of a CEMNF problem is to find flow routes and capacity expansion plans such that all flow demands are satisfied and the total cost of routing and planning is minimized. High-performance distributed computing algorithms have been proposed for classical LNF problems: LNF problems can be formulated as linear programming models and solved efficiently on distributed computing platforms. However, the constraints of CEMNF problems do not allow them to be solved with the same methodology. In this paper, we develop a transformation method that converts CEMNF problems into LNF problems in polynomial time and space so that they can be solved efficiently on distributed computing platforms. The results show that we can solve CEMNF problems with high performance.
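    One common way to express capacity expansion inside an ordinary flow model is to add parallel arcs whose per-unit cost includes the expansion cost; the abstract does not give the paper's exact construction, so the sketch below is only an illustration of that general idea (arc names, costs, and the gadget are assumptions):

```python
def expand_to_lnf(arcs, expandable):
    """Turn a capacity-expansion instance into a plain LNF instance.

    arcs:        list of (u, v, capacity, unit_cost) for existing arcs.
    expandable:  dict {(u, v): (extra_capacity, expansion_unit_cost)} for arcs
                 whose capacity may be expanded at a per-unit expansion cost.
    Returns a list of (u, v, capacity, unit_cost) arcs; flow routed on the
    expensive parallel arc encodes how much expansion the plan "buys".
    """
    lnf_arcs = list(arcs)
    for (u, v), (extra_cap, exp_cost) in expandable.items():
        base_cost = next(c for (a, b, _, c) in arcs if (a, b) == (u, v))
        # Parallel arc: its use corresponds to expanded capacity on (u, v).
        lnf_arcs.append((u, v, extra_cap, base_cost + exp_cost))
    return lnf_arcs

# Illustration: arc A->B may be expanded by 5 units at +2 cost per unit.
arcs = [("A", "B", 10, 1), ("B", "C", 15, 1)]
expandable = {("A", "B"): (5, 2)}
for arc in expand_to_lnf(arcs, expandable):
    print(arc)   # feed the expanded arc list to any min-cost-flow / LNF solver
```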

    Impact of Structural Faults on Neural Network Performance

    Deep Learning (DL), a subset of Artificial Intelligence (AI), is growing rapidly, with possible applications in domains such as speech recognition and computer vision. The Deep Neural Network (DNN), the backbone of DL algorithms, is a directed graph containing multiple layers, with a different number of neurons residing in each layer. The use of these networks has increased in the last few years due to the availability of large data sets and enormous computation power. As DNNs have grown over the years, researchers have developed specialized hardware accelerators to reduce inference compute time. An example of such a domain-specific architecture designed for Neural Network acceleration is the Tensor Processing Unit (TPU), which outperforms GPUs in the inference stage of DNN execution. The heart of this inference engine is a matrix multiplication unit based on a systolic array architecture. The TPU's systolic array is a grid-like structure of individual processing elements that can be extended along rows and columns. Due to external environmental factors or internal semiconductor scaling, these systems are prone to faults, which lead to improper calculations and thereby inaccurate decisions by the DNN. Although much work has been done in the past on computing array implementations and their reliability concerns, their fault-tolerance behavior for DNN applications is not well understood, nor is it clear what impact different faults have on accuracy. In this work, we first study possible mapping strategies to implement convolution and dense layer weights on the TPU systolic array. Next, we consider various fault scenarios that may occur in the array. We divide these fault scenarios into low and high row and column fault modes with respect to the multiplication unit (Fig. 1(a) pictorially represents column faults). We then study the impact of these fault models on the overall accuracy of DNN inference on a faulty TPU. The goal is to study resiliency and overcome the limitations of earlier work. Previous work was very effective at masking random faults by pruning weights (removing weights or connections in the DNN) and retraining, but it failed in the case of column faults, as clearly shown in Fig. 1(b). We also propose techniques to mitigate or bypass the row and column faults. Our mapping strategy follows physical_x(i) = i % N and physical_y(j) = j % N, where (i, j) is the index of the dense (FC) weight matrix and (physical_x(i), physical_y(j)) is the actual physical location on the array of size N. The convolution filters are linearized with respect to every channel to convert them into a proper weight matrix, which is then mapped according to the same policy. We show that DNNs can tolerate a certain number of faults in the array while retaining the original accuracy (low row faults), whereas network accuracy decreases with even a single column fault if that column is in use. The results show that, for the same number of row and column faults, column faults have the greater impact on network accuracy because pruning an input neuron has far less effect than pruning an output neuron. We experimented with three different networks and found the influence of these faults to be the same.
    These faults can be mitigated using techniques such as Matrix Transpose and Array Reduction, which do not require retraining of weights. For low row faults, the original mapping policy can be retained so that weights are mapped to their exact locations, which does not affect accuracy. Low column faults can be converted into low row faults by transposing the matrix. In the case of high row (column) faults, the entire row (column) has to be avoided to completely bypass the faulty locations. Static mapping of weights along with retraining the network on the array can be effective in the case of random faults, while adapting the mapping in the case of structured faults reduces the burden of retraining, which happens outside the TPU.
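    A minimal sketch of the modulo mapping stated above and of the transpose-based mitigation for column faults (the array size, fault representation, and helper names are illustrative assumptions):

```python
import numpy as np

N = 8  # illustrative systolic-array dimension

def map_weight(i, j, n=N):
    """Mapping policy from the abstract: physical_x(i) = i % N, physical_y(j) = j % N."""
    return i % n, j % n

def place_weights(W, faulty_columns, n=N):
    """Place a dense-layer weight matrix on the array; if any mapped column is
    faulty, transpose the matrix so the column faults become row faults, which
    the network tolerates better (no retraining needed)."""
    cols_used = {map_weight(i, j, n)[1]
                 for i in range(W.shape[0]) for j in range(W.shape[1])}
    if cols_used & set(faulty_columns):
        W = W.T                      # matrix-transpose mitigation
    return W

W = np.arange(12).reshape(3, 4)
print(place_weights(W, faulty_columns={2}).shape)   # (4, 3): transposed to sidestep the faulty column
```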

    Managing contamination delay to improve Timing Speculation architectures

    Timing Speculation (TS) is a widely known method for realizing better-than-worst-case systems. Aggressive clocking, realizable through TS, enables systems to operate beyond their specified safe frequency limits to effectively exploit data-dependent circuit delay. However, the range of aggressive clocking for performance enhancement under TS is restricted by short paths. In this paper, we show that increasing the lengths of a circuit's short paths increases the effectiveness of TS, leading to performance improvement. We also propose an algorithm to efficiently add delay buffers to selected short paths while keeping the area penalty down. We present results of our algorithm on the ISCAS-85 suite and show that it is possible to increase the circuit contamination delay by up to 30% without affecting the propagation delay. We also explore increasing short-path delays further by relaxing the constraint on propagation delay and analyze the performance impact.
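    A minimal sketch of the general idea of padding short paths with delay buffers under a propagation-delay budget (the greedy selection, delay values, and path model are illustrative assumptions, not the paper's algorithm, which works on the circuit netlist rather than independent paths):

```python
def pad_short_paths(paths, target_contamination, buffer_delay, max_propagation):
    """Greedily add delay buffers to paths shorter than the target contamination
    delay, skipping any insertion that would push a path past the propagation
    (worst-case) delay budget.

    paths: dict {path_id: current_delay}.
    Returns (padded_delays, buffers_added_per_path).
    """
    padded, buffers = dict(paths), {}
    for pid, _ in sorted(paths.items(), key=lambda kv: kv[1]):   # shortest paths first
        added = 0
        while padded[pid] < target_contamination and \
              padded[pid] + buffer_delay <= max_propagation:
            padded[pid] += buffer_delay
            added += 1
        buffers[pid] = added
    return padded, buffers

# Illustration: raise the contamination delay toward 5.0 ns without exceeding 10.0 ns.
paths = {"p1": 2.0, "p2": 3.5, "p3": 9.5}
print(pad_short_paths(paths, target_contamination=5.0, buffer_delay=1.0, max_propagation=10.0))
```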

    Unidirectional Quorum-based Cycle Planning for Efficient Resource Utilization and Fault-Tolerance

    In this paper, we propose a greedy cycle direction heuristic to improve the generalized R redundancy quorum cycle technique. When applied using only single cycles rather than the standard paired cycles, the generalized R redundancy technique has been shown to almost halve the necessary light-trail resources in the network. Our greedy heuristic improves this cycle-based routing technique's fault tolerance and dependability. For efficiency and distributed control, it is common in distributed systems and algorithms to group nodes into intersecting sets referred to as quorum sets. Optimal communication quorum sets forming optical cycles based on light-trails have been shown to route both point-to-point and multipoint-to-multipoint traffic requests flexibly and efficiently. Cycle routing techniques commonly use pairs of cycles to achieve both routing and fault tolerance, which consumes substantial resources and creates the potential for underutilization. Instead, we use a single cycle and intentionally exploit R redundancy within the quorum cycles so that every point-to-point communication pair occurs in at least R cycles. Without the paired cycles, the direction of the quorum cycles becomes critical to fault-tolerance performance. For this we developed a greedy cycle direction heuristic; our single-fault network simulations show a reduction of missing pairs by more than 30%, which translates into significant improvements in fault coverage. Comment: Computer Communication and Networks (ICCCN), 2016 25th International Conference on. arXiv admin note: substantial text overlap with arXiv:1608.05172, arXiv:1608.05168, arXiv:1608.0517
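    A minimal sketch of a greedy direction choice in this spirit (the coverage metric, failure model, and data layout are illustrative assumptions, not the paper's heuristic): on a unidirectional cycle a single link failure leaves a directed path, so only pairs ordered along that path survive, and the sketch orients each cycle to minimize the worst-case number of pairs left uncovered by other cycles.

```python
def surviving_pairs(cycle, failed_edge_idx):
    """Ordered pairs still connected on a unidirectional cycle after one link fails.
    The broken cycle is a directed path; (u, v) survives only if u precedes v."""
    n = len(cycle)
    path = [cycle[(failed_edge_idx + 1 + k) % n] for k in range(n)]
    return {(path[i], path[j]) for i in range(n) for j in range(i + 1, n)}

def pick_direction(cycle, pairs_covered_elsewhere):
    """Greedy choice: orient the cycle the way that leaves fewer uncovered
    ordered pairs in its worst single-link failure."""
    def worst_missing(c):
        all_pairs = {(u, v) for u in c for v in c if u != v}
        return max(len(all_pairs - surviving_pairs(c, e) - pairs_covered_elsewhere)
                   for e in range(len(c)))
    forward, reverse = list(cycle), list(reversed(cycle))
    return forward if worst_missing(forward) <= worst_missing(reverse) else reverse

# Illustration: pairs already covered by other cycles favor the forward direction here.
print(pick_direction([1, 2, 3, 4], pairs_covered_elsewhere={(2, 1), (3, 2), (4, 3), (1, 4)}))
```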

    Reduction of green house gas emission by clean power trading

    It is well known that the CO2 emitted by fossil energy is one of the major contributors to global warming. How to reduce CO2 emissions through an investment plan for clean power systems remains an open question. In this paper, we propose a clean power trading method among neighboring regions that reduces CO2 emissions over a large region and reduces the imbalance between power demand and supply within a region caused by the fluctuation of clean energy. Taking five wind-rich states in America as an example, we use quantitative results from a modeling framework of our own design to show that the proposed clean power trading method helps reduce CO2 emissions and achieve balance.
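    A minimal sketch of the balancing idea behind such trading (the region names, surplus figures, and greedy matching rule are illustrative assumptions, not the paper's model): regions with a clean-power surplus export to neighboring regions in deficit, shrinking the residual imbalance that would otherwise be met by fossil generation.

```python
def trade_clean_power(balance, neighbors):
    """Greedy neighbor-to-neighbor trading of surplus clean power.

    balance:   dict {region: MW}, positive = clean-power surplus, negative = deficit.
    neighbors: dict {region: list of adjacent regions}.
    Returns (trades, residual_balance).
    """
    trades, residual = [], dict(balance)
    for region, _ in sorted(balance.items(), key=lambda kv: -kv[1]):  # largest surplus first
        for nb in neighbors.get(region, []):
            if residual[region] <= 0:
                break
            if residual[nb] < 0:
                amount = min(residual[region], -residual[nb])
                residual[region] -= amount
                residual[nb] += amount
                trades.append((region, nb, amount))
    return trades, residual

# Illustrative regions and balances (MW); names and numbers are made up.
balance = {"IA": 400, "MN": -150, "SD": 120, "NE": -200, "ND": -100}
neighbors = {"IA": ["MN", "NE", "SD"], "SD": ["ND", "NE", "MN"]}
print(trade_clean_power(balance, neighbors))
```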