108 research outputs found

Mechanistic modeling of architectural vulnerability factor

Author: Chen Jian; Eeckhout Lieven; Eyerman Stijn; John Lizy; Nair Arun
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2015

Reliability to soft errors is a significant design challenge in modern microprocessors owing to an exponential increase in the number of transistors on chip and the reduction in operating voltages with each process generation. Architectural Vulnerability Factor (AVF) modeling using microarchitectural simulators enables architects to make informed performance, power, and reliability tradeoffs. However, such simulators are time-consuming and do not reveal the microarchitectural mechanisms that influence AVF. In this article, we present an accurate first-order mechanistic analytical model to compute AVF, developed from the first principles of out-of-order superscalar execution. This model provides insight into the fundamental interactions between the workload and microarchitecture that together influence AVF. We use the model to perform design space exploration, parametric sweeps, and workload characterization for AVF.

Ghent University Academic Bibliography

The Design, modeling and simulation of switching fabrics: For an ATM network switch

Author: Molokov Dmitriy
Publication venue: RIT Scholar Works
Publication date: 01/08/2000

The requirements of today's telecommunication systems to support high bandwidth and added flexibility brought about the expansion of Asynchronous Transfer Mode (ATM) as a new method of high-speed data transmission. Various analytical and simulation methods may be used to estimate the performance of ATM switches. Analytical methods considerably limit the range of parameters that can be evaluated because of the extensive formulae and time-consuming iterations involved. They are not as effective for large networks because of excessive computations that do not scale linearly with network size. On the other hand, simulation-based methods can determine a wider range of performance parameters in a shorter amount of time, even for large networks. A simulation model, however, is more elaborate in terms of implementation. Instead of using formulae to obtain results, it has to operate software or hardware modules that require a certain amount of effort to create. In this work, simulation is accomplished by utilizing the ATM library, an object-oriented software tool that uses software chips for building ATM switches. The distinguishing feature of this approach is cut-through routing realized at the bit level of abstraction, treating ATM protocol data units, called cells, as groups of 424 bits. Cell arrivals are therefore not instantaneous, in contrast to commonly used simulation methods that treat cells as instant messages. The simulation was run for basic multistage interconnection network types with varying source arrival rates and buffer sizes, producing a set of graphs of cell delays, throughput, cell loss probability, and queue sizes. The techniques of rearranging and sorting were considered in the simulation. The results indicate that better performance is always achieved by adding further stages of elements to the switching system.
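
The bit-level view of a cell described in this abstract can be illustrated with a little arithmetic. The following is a minimal sketch, not the thesis's ATM library: the function names are hypothetical, the 40-bit header is the standard 5-byte ATM cell header (not stated in the abstract), and the 155.52 Mbit/s line rate is only an example value.

```python
# Illustrative sketch only; names and the example line rate are assumptions.

CELL_BITS = 424    # from the abstract: a cell is treated as a group of 424 bits
HEADER_BITS = 40   # assumption: standard 5-byte ATM header (routing information)


def cell_arrival_time(line_rate_bps: float) -> float:
    """Seconds needed for every bit of one cell to arrive on a link."""
    if line_rate_bps <= 0:
        raise ValueError("line rate must be positive")
    return CELL_BITS / line_rate_bps


def cut_through_head_start(line_rate_bps: float,
                           header_bits: int = HEADER_BITS) -> float:
    """Time saved per hop if forwarding begins once the header has arrived,
    compared with waiting for the whole cell (store-and-forward)."""
    if line_rate_bps <= 0:
        raise ValueError("line rate must be positive")
    if not 0 <= header_bits <= CELL_BITS:
        raise ValueError("header must fit inside the cell")
    return (CELL_BITS - header_bits) / line_rate_bps


if __name__ == "__main__":
    rate = 155.52e6  # example line rate in bit/s (assumption)
    print(f"cell arrival time:      {cell_arrival_time(rate) * 1e6:.3f} us")
    print(f"cut-through head start: {cut_through_head_start(rate) * 1e6:.3f} us")
```

At the example rate a cell takes roughly 2.7 microseconds to arrive, which is the non-zero arrival duration that an instant-message simulation model would ignore.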

RIT Scholar Works

An Experiment in the complexity of load balancing algorithms

Author: Carlino Charles
Publication venue: RIT Scholar Works
Publication date: 01/01/1991

Not provided

RIT Scholar Works

Improving GPU SIMD Control Flow Efficiency via Hybrid Warp Size Mechanism

Author: Jin Xingxing
Publication venue: University of Saskatchewan Library

High single instruction multiple data (SIMD) efficiency and low power consumption have made graphics processing units (GPUs) an ideal platform for many complex computational applications. Thousands of threads can be created by programmers and grouped into fixed-size SIMD batches, known as warps. High throughput is then achieved by concurrently executing such warps with minimal control overhead. However, if a branch instruction assigns different paths to different threads, the warp is broken into multiple warps that have to be executed serially, reducing the efficiency advantage of SIMD. In this thesis, the contemporary fixed-size warp design is abandoned and a hybrid warp size (HWS) mechanism is proposed. Mixed-size warps are generated according to HWS and are scheduled and issued flexibly. Once a branch divergence occurs, split warps are squeezed according to the proposed algorithm, and warp sizes are downscaled wherever applicable. Based on updated warp sizes, warp schedulers calculate the number of cycles the current warp needs and issue the next warp accordingly. As a result, hybrid warps are pushed into pipelines as soon as possible and more pipeline stages are overlapped. The simulation results show that this mechanism yields an average speedup of 1.20 over the baseline architecture for a wide variety of general purpose GPU applications. This work also integrates HWS with dynamic warp formation (DWF), a well-known branch handling mechanism aimed at improving SIMD utilization by forming new warps out of split warps in real time. The warp forming policy is modified to better tolerate warp conflicts. Also, squeeze operations are added before a warp merges with other warps. The simulation shows that the combination of DWF and HWS generates an average speedup of 1.27 over the DWF-only platform for the same set of GPU benchmarks.
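
The cost of branch divergence in a fixed-size warp, which motivates this thesis, can be sketched with a toy model. This is not the HWS algorithm; it is a simplified illustration in which a diverged warp runs its two paths one after the other while occupying all lanes, and the function name and cycle counts are hypothetical.

```python
# Toy model of fixed-size warp divergence; not the HWS mechanism itself.
from typing import Sequence


def simd_efficiency(outcomes: Sequence[bool],
                    taken_cycles: int,
                    not_taken_cycles: int) -> float:
    """Fraction of lane-cycles doing useful work for one warp at one branch.

    outcomes[i] is True if thread i takes the branch. If both outcomes occur,
    the two paths are executed serially and idle lanes are masked off.
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("warp must contain at least one thread")
    if taken_cycles <= 0 or not_taken_cycles <= 0:
        raise ValueError("path lengths must be positive")

    taken = sum(1 for o in outcomes if o)
    not_taken = n - taken

    # Cycles the warp occupies the pipeline: each path that has at least one
    # thread on it must be executed in full.
    total_cycles = (taken_cycles if taken else 0) + \
                   (not_taken_cycles if not_taken else 0)

    # Lane-cycles in which a thread is actually executing its own path.
    useful = taken * taken_cycles + not_taken * not_taken_cycles
    return useful / (n * total_cycles)


if __name__ == "__main__":
    uniform = [True] * 32
    diverged = [True] * 16 + [False] * 16
    print(simd_efficiency(uniform, 10, 10))
    print(simd_efficiency(diverged, 10, 10))
```

In this model a uniform warp keeps every lane busy, while an evenly split warp with equal path lengths wastes half of its lane-cycles; mechanisms such as the ones proposed in the thesis aim to recover part of that loss.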

eCommons@USASK

University of Saskatchewan Research Archive

Liquid stream processing on the web: a JavaScript framework

Author: Babazadeh Masiar; Pautasso Cesare
Publication date: 16/02/2018

The Web is rapidly becoming a mature platform to host distributed applications. Pervasive computing applications running on the Web are now common in the era of the Web of Things, which has made it increasingly simple to integrate sensors and microcontrollers in our everyday life. Such devices are of great interest to Makers with basic Web development skills. With them, Makers are able to build small smart stream processing applications with sensors and actuators without spending a fortune and without knowing much about the technologies they use. Thanks to ongoing Web technology trends enabling real-time peer-to-peer communication between Web-enabled devices, Web browsers and server-side JavaScript runtimes, developers are able to implement pervasive Web applications using a single programming language. These can take advantage of direct and continuous communication channels, going beyond what was possible in the early stages of the Web, to push data in real time. Despite these recent advances, building stream processing applications on the Web of Things remains a challenging task. On the one hand, Web-enabled devices of different nature still have to communicate with different protocols. On the other hand, dealing with a dynamic, heterogeneous, and volatile environment like the Web requires developers to face issues like disconnections, unpredictable workload fluctuations, and device overload. To help developers deal with such issues, in this dissertation we present the Web Liquid Streams (WLS) framework, a novel streaming framework for JavaScript. Developers implement streaming operators written in JavaScript and may interactively and dynamically define a streaming topology. The framework takes care of deploying the user-defined operators on the available devices and connecting them using the appropriate data channel, removing the burden of dealing with different deployment environments from the developers. Changes in the semantics of the application and in its execution environment may be applied at runtime without stopping the stream flow. Like a liquid adapts its shape to that of its container, the Web Liquid Streams framework makes streaming topologies flow across multiple heterogeneous devices, enabling dynamic operator migration without disrupting the data flow. By constantly monitoring the execution of the topology with a hierarchical controller infrastructure, WLS takes care of parallelising the operator execution across multiple devices in case of bottlenecks and of recovering the execution of the streaming topology in case one or more devices disconnect, by restarting lost operators on other available devices.

RERO DOC Digital Library

Improving the Performance of Wide Area Networks

Author: Holt Alan Gene
Publication date: 01/01/1999

Research into the performance of wide area data networks is described in this thesis. A model of wide area network packet delays is developed and used to direct the research into methods of improving performance. Wide area networks are slow and expensive compared to the computer systems that rely on them for communication. Typically, data networks are packet switched in order to make efficient use of resources. This can lead to contention, and the mechanisms for resolving contention can bring about further delays when demand for resources is high. In this thesis, network users are viewed as interacting decision makers with conflicting interests, and Game Theory is used to analyse the effects users have on each other’s performance. It is asserted in this thesis that wide area network performance is an ethical issue as well as a technical one. Compression is examined as a technique for reducing network traffic load. While load reductions can reduce the time packets spend waiting in buffer queues, experimental results show that the compression process itself can present a bottleneck if CPU resources are limited. The other inhibiting factor with regard to wide area network performance is the time it takes for a signal to propagate through a transmission medium. Propagation delays are bounded by the speed of light and become significant as the distance between computer systems increases. Mirrors and Caches are methods of bringing data closer to the user, thereby reducing propagation delays and capping traffic loads on long haul communication facilities. The performance benefits of replicating data within a wide area network environment are studied in this thesis.
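
The speed-of-light bound on propagation delay mentioned in this abstract can be made concrete with a short calculation. This is a minimal sketch and not the delay model developed in the thesis: the function names are hypothetical, queueing and processing delays are ignored, and the velocity factor of the medium is an assumed parameter.

```python
# Illustrative sketch only; not the thesis's packet delay model.

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum


def propagation_delay(distance_m: float, velocity_factor: float = 1.0) -> float:
    """Lower bound on one-way signal travel time over a given distance.

    velocity_factor is the fraction of the vacuum speed of light at which
    the signal travels in the medium (assumption: 0 < factor <= 1).
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    if not 0 < velocity_factor <= 1:
        raise ValueError("velocity factor must be in (0, 1]")
    return distance_m / (SPEED_OF_LIGHT_M_PER_S * velocity_factor)


def one_way_latency(size_bits: float, bandwidth_bps: float,
                    distance_m: float, velocity_factor: float = 1.0) -> float:
    """Transmission time plus propagation delay for a single packet."""
    if size_bits < 0:
        raise ValueError("packet size must be non-negative")
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth must be positive")
    return size_bits / bandwidth_bps + propagation_delay(distance_m,
                                                         velocity_factor)


if __name__ == "__main__":
    # Hypothetical comparison: a distant origin server versus a nearby mirror.
    far = one_way_latency(12_000, 10e6, 6_000_000, velocity_factor=0.67)
    near = one_way_latency(12_000, 10e6, 100_000, velocity_factor=0.67)
    print(f"distant server: {far * 1e3:.2f} ms, nearby mirror: {near * 1e3:.2f} ms")
```

The propagation term grows linearly with distance and is unaffected by adding bandwidth, which is why moving data closer to the user with mirrors and caches addresses it.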

Open Research Online (The Open University)

OpenGrey Repository

Compiler-Directed Energy Savings in Superscalar Processors

Author: Jones Timothy M
Publication venue: University of Edinburgh. College of Science and Engineering. School of Informatics.
Publication date: 01/01/2006

Institute for Computing Systems Architecture

Superscalar processors contain large, complex structures to hold data and instructions as they wait to be executed. However, many of these structures consume large amounts of energy, making them hotspots requiring sophisticated cooling systems. With the trend towards larger, more complex processors, this will become more of a problem, having important implications for future technology. This thesis uses compiler-based optimisation schemes to target the issue queue and register file. These are two of the most energy-consuming structures in the processor. The algorithms and hardware techniques developed in this work dynamically adapt the processor's resources to the changing program phases, turning off parts of each structure when they are unused to save dynamic and static energy. To optimise the issue queue, the compiler analysis tracks data dependences through each program procedure. It identifies the critical path through each program region and informs the hardware of the minimum number of queue entries required to prevent it slowing down. This reduces the occupancy of the queue and increases the opportunities to save energy. With just a 1.3% performance loss, 26% dynamic and 32% static energy savings are achieved. Registers can be idle for many cycles after they are last read, before they are released and put back on the free-list to be reused by another instruction. Alternatively, they can be turned off for energy savings. Early register releasing can be used to perform this operation sooner than usual, but hardware schemes must wait for the instruction redefining the relevant logical register to enter the pipeline. This thesis presents an exploration of compiler-directed early register releasing. The compiler can exactly identify the last use of each register and pass the information to the hardware, based on simple data-flow and liveness analysis. The best scheme achieves 15% dynamic and 19% static energy savings. Finally, the issue queue limiting and early register releasing schemes are combined for energy savings in both processor structures. Four different configurations are evaluated, bringing 25% to 31% dynamic and 19% to 34% static issue queue energy savings and reductions of 18% to 25% dynamic and 20% to 21% static energy in the register file.

CiteSeerX

Edinburgh Research Archive

Optimising data centre operation by removing the transport bottleneck

Author: Moncaster Tobias
Publication venue: University of Cambridge
Publication date: 04/04/2018

Data centres lie at the heart of almost every service on the Internet. Data centres are used to provide search results, to power social media, to store and index email, to host “cloud” applications, for online retail and to provide a myriad of other web services. Consequently, the more efficient they can be made, the better for all of us. The power of modern data centres is in combining commodity off-the-shelf server hardware and network equipment to provide what Google’s Barroso and Hölzle describe as “warehouse scale” computers. Data centres rely on TCP, a transport protocol that was originally designed for use in the Internet. Like other such protocols, TCP has been optimised to maximise throughput, usually by filling up queues at the bottleneck. However, for most applications within a data centre, network latency is more critical than throughput. Consequently, the choice of transport protocol becomes a bottleneck for performance. My thesis is that the solution to this is to move away from the use of one-size-fits-all transport protocols towards ones that have been designed to reduce latency across the data centre and which can dynamically respond to the needs of the applications. This dissertation focuses on optimising the transport layer in data centre networks. In particular, I address the question of whether any single transport mechanism can be flexible enough to cater to the needs of all data centre traffic. I show that one leading protocol (DCTCP) has been heavily optimised for certain network conditions. I then explore approaches that seek to minimise latency for applications that care about it while still allowing throughput-intensive applications to receive a good level of service. My key contributions to this are Silo and Trevi. Trevi is a novel transport system for storage traffic that utilises fountain coding to maximise throughput and minimise latency while being agnostic to drop, thus allowing storage traffic to be pushed out of the way when latency-sensitive traffic is present in the network. Silo is an admission control system that is designed to give tenants of a multi-tenant data centre guaranteed low latency network performance. Both of these were developed in collaboration with others.

Apollo (Cambridge)

Optimal Decision Making for Capacitated Reverse Logistics Networks with Quality Variations

Author: Farahani Sajjad
Publication venue: UWM Digital Commons
Publication date: 01/05/2018

Increasing concerns about the environmental impact of production, product take-back laws and dwindling natural resources have heightened the need to address the impact of disposing of end-of-life (EOL) products. To cope with this challenge, manufacturers have integrated reverse logistics into their supply chain or chosen to outsource product recovery activities to third-party firms. The uncertain quality of returns as well as uncertainty in return flow limit the effectiveness of planning, control and monitoring of reverse logistics networks. In addition, there are different recovery routes for each returned product, such as reuse, repair, disassembling, remanufacturing and recycling. To determine the most profitable option for EOL product management, remanufacturers must consider the quality of returns and other limitations such as inventory size, demand and quantity of returns. The work in this dissertation addresses these pertinent aspects using two models motivated by two remanufacturing facilities in which there are uncertainties in the quality and quantity of returns and capacitated inventories. In the first case, a disposition decision making model is developed for a remanufacturing process in which the inventory capacity of recoverable returns is limited and where there is a constant demand to be met for remanufactured products that meet a minimum quality threshold. It is assumed that the quality of returns is uncertain and that remanufacturing cost is dependent on the quality grade. In this model, remanufacturing takes place when there is demand for remanufactured products. Accepted returns that meet the minimum quality threshold undergo the remanufacturing processes, and any unacceptable returns are salvaged. A continuous time Markov chain (CTMC) is presented as the modeling approach. The Matrix-Geometric solution methodology is applied to evaluate several key performance metrics for this system and so arrive at the optimal disposition policy. The numerical study shows an intricate trade-off between the acceptable quality threshold value and the recoverable product inventory capacity. In particular, there is periodic system starvation whenever there is a mismatch between these two system metrics. In addition, the sensitivity analysis indicates that changes to the demand rate for remanufactured products make it necessary to re-evaluate the existing system configuration. In the second case, a general framework is presented for a third-party remanufacturer, where the remanufacturer has the alternative of salvaging EOL products and supplying parts to external suppliers, or remanufacturing the disassembled parts to 'as new' condition. The remanufacturing processes of reusable products and parts are studied in the context of other process variables such as the cost and demand of remanufactured products and parts. The goal of this model is to determine the return quality thresholds for a multi-product, multi-period remanufacturing setting. The problem is formulated as a mixed integer non-linear programming (MINLP) problem, and a discretization technique turns it into a quadratic mixed integer programming (QMIP) problem. Finally, a numerical analysis using data from a personal computer (PC) remanufacturing facility is used to test the extent to which the minimum acceptance quality threshold is dependent on the inventory level capacities of the EOL product management sites, varying operational costs and the upper bound of the disposal rate.

University of Wisconsin-Milwaukee