108 research outputs found

    Mechanistic modeling of architectural vulnerability factor

    Get PDF
    Reliability to soft errors is a significant design challenge in modern microprocessors owing to an exponential increase in the number of transistors on chip and the reduction in operating voltages with each process generation. Architectural Vulnerability Factor (AVF) modeling using microarchitectural simulators enables architects to make informed performance, power, and reliability tradeoffs. However, such simulators are time-consuming and do not reveal the microarchitectural mechanisms that influence AVF. In this article, we present an accurate first-order mechanistic analytical model to compute AVF, developed using the first principles of an out-of-order superscalar execution. This model provides insight into the fundamental interactions between the workload and microarchitecture that together influence AVF. We use the model to perform design space exploration, parametric sweeps, and workload characterization for AVF

    The Design, modeling and simulation of switching fabrics: For an ATM network switch

    Get PDF
    The requirements of today\u27s telecommunication systems to support high bandwidth and added flexibility brought about the expansion of (Asynchronous Transfer Mode) ATM as a new method of high-speed data transmission. Various analytical and simulation methods may be used to estimate the performance of ATM switches. Analytical methods considerably limit the range of parameters to be evaluated due to extensive formulae used and time consuming iterations. They are not as effective for large networks because of excessive computations that do not scale linearly with network size. One the other hand, simulation-based methods allow determining a bigger range of performance parameters in a shorter amount of time even for large networks. A simulation model, however, is more elaborate in terms of implementation. Instead of using formulae to obtain results, it has to operate software or hardware modules requiring a certain amount of effort to create. In this work simulation is accomplished by utilizing the ATM library - an object oriented software tool, which uses software chips for building ATM switches. The distinguishing feature of this approach is cut-through routing realized on the bit level abstraction treating ATM protocol data units, called cells, as groups of 424 bits. The arrival events of cells to the system are not instantaneous contrary to commonly used methods of simulation that consider cells as instant messages. The simulation was run for basic multistage interconnection network types with varying source arrival rate and buffer sizes producing a set of graphs of cell delays, throughput, cell loss probability, and queue sizes. The techniques of rearranging and sorting were considered in the simulation. The results indicate that better performance is always achieved by bringing additional stages of elements to the switching system

    An Experiment in the complexity of load balancing algorithms

    Get PDF
    Not provided

    Improving GPU SIMD Control Flow Efficiency via Hybrid Warp Size Mechanism

    Get PDF
    High single instruction multiple data (SIMD) efficiency and low power consumption have made graphic processing units (GPUs) an ideal platform for many complex computational applications. Thousands of threads can be created by programmers and grouped into fixed-size SIMD batches, known as warps. High throughput is then achieved by concurrently executing such warps with minimal control overhead. However, if a branch instruction occurs, which assigns different paths to different threads, this warp will be broken into multiple warps that have to be executed serially, consequently reducing the efficiency advantage of SIMD. In this thesis, the contemporary fixed-size warp design is abandoned and a hybrid warp size (HWS) mechanism is proposed. Mixed-size warps are generated according to HWS and are scheduled and issued flexibly. Once a branch divergence occurs, split warps are squeezed according to the proposed algorithm, and warp sizes are downscaled wherever applicable. Based on updated warp sizes, warp schedulers calculate the number of cycles the current warp needs and issue the next warp accordingly. As a result, hybrid warps are pushed into pipelines as soon as possible and more pipeline stages are overlapped. The simulation results show that this mechanism yields an average speedup of 1.20 over the baseline architecture for a wide variety of general purpose GPU applications. This work also integrates HWS with dynamic warp formation (DWF), which is a well-known branch handling mechanism aimed at improving SIMD utilization by forming new warps out of split warps in real time. The warp forming policy is modified to better tolerate warp conflicts. Also, squeeze operations are added before a warp merges with other warps. The simulation shows that the combination of DWF and HWS generates an average speedup of 1.27 over the DWF-only platform for the same set of GPU benchmarks

    Liquid stream processing on the web: a JavaScript framework

    Get PDF
    The Web is rapidly becoming a mature platform to host distributed applications. Pervasive computing application running on the Web are now common in the era of the Web of Things, which has made it increasingly simple to integrate sensors and microcontrollers in our everyday life. Such devices are of great in- terest to Makers with basic Web development skills. With them, Makers are able to build small smart stream processing applications with sensors and actuators without spending a fortune and without knowing much about the technologies they use. Thanks to ongoing Web technology trends enabling real-time peer-to- peer communication between Web-enabled devices, Web browsers and server- side JavaScript runtimes, developers are able to implement pervasive Web ap- plications using a single programming language. These can take advantage of direct and continuous communication channels going beyond what was possible in the early stages of the Web to push data in real-time. Despite these recent advances, building stream processing applications on the Web of Things remains a challenging task. On the one hand, Web-enabled devices of different nature still have to communicate with different protocols. On the other hand, dealing with a dynamic, heterogeneous, and volatile environment like the Web requires developers to face issues like disconnections, unpredictable workload fluctuations, and device overload. To help developers deal with such issues, in this dissertation we present the Web Liquid Streams (WLS) framework, a novel streaming framework for JavaScript. Developers implement streaming operators written in JavaScript and may interactively and dynamically define a streaming topology. The framework takes care of deploying the user-defined operators on the available devices and connecting them using the appropriate data channel, removing the burden of dealing with different deployment environments from the developers. Changes in the semantic of the application and in its execution environment may be ap- plied at runtime without stopping the stream flow. Like a liquid adapts its shape to the one of its container, the Web Liquid Streams framework makes streaming topologies flow across multiple heterogeneous devices, enabling dynamic operator migration without disrupting the data flow. By constantly monitoring the execution of the topology with a hierarchical controller infrastructure, WLS takes care of parallelising the operator execution across multiple devices in case of bottlenecks and of recovering the execution of the streaming topology in case one or more devices disconnect, by restarting lost operators on other available devices

    Compiler-Directed Energy Savings in Superscalar Processors

    Get PDF
    Institute for Computing Systems ArchitectureSuperscalar processors contain large, complex structures to hold data and instructions as they wait to be executed. However, many of these structures consume large amounts of energy, making them hotspots requiring sophisticated cooling systems. With the trend towards larger, more complex processors, this will become more of a problem, having important implications for future technology. This thesis uses compiler-based optimisation schemes to target the issue queue and register file. These are two of the most energy consuming structures in the processor. The algorithms and hardware techniques developed in this work dynamically adapt the processor's resources to the changing program phases, turning off parts of each structure when they are unused to save dynamic and static energy. To optimise the issue queue, the compiler analysis tracks data dependences through each program procedure. It identifies the critical path through each program region and informs the hardware of the minimum number of queue entries required to prevent it slowing down. This reduces the occupancy of the queue and increases the opportunities to save energy. With just a 1.3% performance loss, 26% dynamic and 32% static energy savings are achieved. Registers can be idle for many cycles after they are last read, before they are released and put back on the free-list to be reused by another instruction. Alternatively, they can be turned off for energy savings. Early register releasing can be used to perform this operation sooner than usual, but hardware schemes must wait for the instruction redefining the relevant logical register to enter the pipeline. This thesis presents an exploration of compiler-directed early register releasing. The compiler can exactly identify the last use of each register and pass the information to the hardware, based on simple data-flow and liveness analysis. The best scheme achieves 15% dynamic and 19% static energy savings. Finally, the issue queue limiting and early register releasing schemes are combined for energy savings in both processor structures. Four different configurations are evaluated bringing 25% to 31% dynamic and 19% to 34% static issue queue energy savings and reductions of 18% to 25% dynamic and 20% to 21% static energy in the register file

    Optimal Decision Making for Capacitated Reverse Logistics Networks with Quality Variations

    Get PDF
    Increasing concerns about the environmental impact of production, product take-back laws and dwindling natural resources have heightened the need to address the impact of disposing end-of-life (EOL) products. To cope this challenge, manufacturers have integrated reverse logistics into their supply chain or chosen to outsource product recovery activities to third party firms. The uncertain quality of returns as well as uncertainty in return flow limit the effectiveness of planning, control and monitoring of reverse logistics networks. In addition, there are different recovery routes for each returned product such as reuse, repair, disassembling, remanufacturing and recycling. To determine the most profitable option for EOL product management, remanufacturers must consider the quality of returns and other limitations such as inventory size, demand and quantity of returns. The work in this dissertation addresses these pertinent aspects using two models that have been motivated by two remanufacturing facilities whereby there are uncertainties in the quality and quantity of return and capacitated inventories. In the first case, a disposition decision making model is developed for a remanufacturing process in which the inventory capacity of recoverable returns is limited and where there\u27s a constant demand to be met, for remanufactured products that meet a minimum quality threshold. It is assumed that the quality of returns is uncertain and remanufacturing cost is dependent on the quality grade. In this model, remanufacturing takes place when there is demand for remanufactured products. Accepted returns that meet the minimum quality threshold undergo the remanufacturing processes, and any unacceptable returns are salvaged. A continuous time Markov chain (CTMC) is presented as the modeling approach. The Matrix-Geometric solution methodology is applied to evaluate several key performance metrics for this system, to result in the optimal disposition policy. The numerical study shows an intricate trade-off between the acceptable quality threshold value and the recoverable product inventory capacity. Particularly, there are periodic system starvation whenever there is a mis-match between these two system metrics. In addition, the sensitivity analysis indicates that changes to the demand rate for remanufactured products necessitates the need to re-evaluate the existing system configuration. In the second case, a general framework is presented for a third party remanufacturer, where the remanufacturer has the alternative of salvaging EOL products and supplying parts to external suppliers, or remanufacture the disassembled parts to \u27as new\u27 conditions. The remanufacturing processes of reusable products and parts is studied in the context of other process variables such as the cost and demand of remanufactured products and parts. The goal of this model is to determine the return quality thresholds for a multi-product, multi-period remanufacturing setting. The problem is formulated as a mixed integer non-linear programming (MINLP) problem, which involves a discretization technique that turns the problem turns into a quadratic mixed integer programming (QMIP) problem. Finally, a numerical analysis using a personal computer (PC) remanufacturing facility data is used to test the extent to which the minimum acceptance quality threshold is dependent on the inventory level capacities of the EOL product management sites, varying operational costs and the upper bound of disposal rate
    • 

    corecore