68 research outputs found

    TechNews digests: Jan - Mar 2010

    Get PDF
    TechNews is a technology news and analysis service aimed at anyone in the education sector keen to stay informed about technology developments, trends and issues. TechNews focuses on emerging technologies and other technology news. The TechNews service published digests from September 2004 to May 2010, combining analysis pieces and news items every two to three months.

    3rd Many-core Applications Research Community (MARC) Symposium. (KIT Scientific Reports ; 7598)

    Get PDF
    This manuscript collects recent scientific work on the Intel Single-Chip Cloud Computer and describes novel approaches to programming and run-time organization.

    Dynamic partitioned global address spaces for high-efficiency computing

    Get PDF
    The current trend of ever larger clusters and data centers has coincided with a dramatic increase in the cost and power of these installations. While many efficiency improvements have focused on processor power and cooling costs, reducing the cost and power consumption of high-performance memory has mostly been overlooked. This thesis proposes a new address translation model called Dynamic Partitioned Global Address Space (DPGAS) that extends the ideas of NUMA and software-based approaches to create a high-performance hardware model that can be used to reduce the overall cost and power of memory in larger server installations. A memory model and hardware implementation of DPGAS is developed, and simulations of memory-intensive workloads are used to show potential cost and power reductions when DPGAS is integrated into a server environment. M.S. Committee Chair: Yalamanchili, Sudhakar; Committee Member: Riley, George; Committee Member: Schimmel, David
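    The following is a minimal, hypothetical sketch (not the DPGAS hardware design itself) of the general idea behind a partitioned global address space: high-order bits of a global address select the home node and the remaining bits form the local offset, with the split point adjustable to model dynamic re-partitioning. The bit widths and function names are assumptions for illustration.

```python
# Illustrative sketch only: a simplified view of how a partitioned global
# address space can map a global address to a (node, local offset) pair.
# The field widths and the re-partitioning step are assumptions, not the
# DPGAS hardware design from the thesis.

GLOBAL_ADDR_BITS = 48   # assumed width of a global physical address

def translate(global_addr: int, node_bits: int) -> tuple[int, int]:
    """Split a global address into (home node ID, local offset).

    node_bits is the number of high-order bits currently assigned to node
    selection; changing it models re-partitioning the address space.
    """
    offset_bits = GLOBAL_ADDR_BITS - node_bits
    node_id = global_addr >> offset_bits
    local_offset = global_addr & ((1 << offset_bits) - 1)
    return node_id, local_offset

# Example: with 8 node bits, address 0x123456789ABC is homed on node 0x12.
print(translate(0x123456789ABC, node_bits=8))
```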

    Energy-efficient electrical and silicon-photonic networks in many core systems

    Full text link
    Thesis (Ph.D.)--Boston University. During the past decade, the very large scale integration (VLSI) community has migrated towards incorporating multiple cores on a single chip to sustain the historic performance improvement in computing systems. As the core count continuously increases, the performance of the network-on-chip (NoC), which is responsible for the communication between cores, caches and memory controllers, is increasingly becoming critical for sustaining this performance improvement. In this dissertation, we propose several methods to improve the energy efficiency of both electrical and silicon-photonic NoCs. First, for the electrical NoC, we propose a flow control technique, Express Virtual Channel with Taps (EVC-T), to transmit both broadcast and data packets efficiently in a mesh network. A low-latency notification tree network is included to maintain the order of broadcast packets. The EVC-T technique improves NoC latency by 24% and system energy efficiency in terms of energy-delay product (EDP) by 13%. In the near future, silicon-photonic links are projected to replace electrical links for global on-chip communication due to their lower data-dependent power and higher bandwidth density, but the high laser power can more than offset these advantages. Therefore, we propose a silicon-photonic multi-bus NoC architecture and a methodology that can reduce the laser power by 49% on average through bandwidth reconfiguration at runtime based on variations in the bandwidth requirements of applications. We also propose a technique to reduce laser power by dynamically activating/deactivating the 12 cache banks and switching ON/OFF the corresponding silicon-photonic links in a crossbar NoC. This cache-reconfiguration based technique saves laser power by 23.8% and improves system EDP by 5.52% on average. In addition, we propose a methodology for placing and sharing on-chip laser sources by jointly considering the bandwidth requirements, thermal constraints and physical layout constraints, which further reduces laser power. Beyond reducing laser power to improve the energy efficiency of silicon-photonic NoCs, we propose to leverage the large bandwidth provided by silicon-photonic NoCs to share computing resources. The global sharing of floating-point units can save system area by 13.75% and system power by 10%.
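    As a point of reference for the energy-delay product (EDP) figures quoted above, the snippet below shows how an EDP improvement percentage is computed from energy and delay; the numbers are made up for illustration and are not results from the dissertation.

```python
# Illustrative only: energy-delay product (EDP) is energy * delay; lower is
# better. The values below are hypothetical, used only to show how a
# percentage improvement figure of this kind is derived.

def edp(energy_joules: float, delay_seconds: float) -> float:
    return energy_joules * delay_seconds

baseline = edp(energy_joules=2.0, delay_seconds=1.0e-3)   # hypothetical baseline run
improved = edp(energy_joules=1.9, delay_seconds=0.92e-3)  # hypothetical improved run

improvement = (baseline - improved) / baseline
print(f"EDP improvement: {improvement:.1%}")  # ~12.6% with these made-up numbers
```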

    Energy Awareness and Scheduling in Mobile Devices and High End Computing

    Get PDF
    In the context of the big picture, as energy demands rise due to growing economies and growing populations, there will be greater emphasis on sustainable supply, conservation, and efficient usage of this vital resource. Even at a smaller scale, the need to minimize energy consumption continues to be compelling in embedded, mobile, and server systems such as handheld devices, robots, spaceships, laptops, cluster servers, and sensors. This is due to the direct impact of constrained energy sources, such as battery size and weight, as well as cooling expenses in cluster-based systems to reduce heat dissipation. Energy management therefore plays a paramount role not only in hardware design but also in user-application, middleware and operating system design. At a higher level, datacenters are sprouting everywhere due to the exponential growth of Big Data in every aspect of human life; the buzzword these days is cloud computing. This dissertation focuses on techniques, specifically algorithmic ones, to scale down energy needs whenever system performance can be relaxed. We examine the significance and relevance of this research and develop a methodology to study this phenomenon. Specifically, the research studies energy-aware resource reservation algorithms that satisfy both performance needs and energy constraints. Many energy management schemes focus on a single resource that is dedicated to real-time or non-real-time processing. Unfortunately, in many practical systems the combination of hard and soft real-time periodic tasks, aperiodic real-time tasks, interactive tasks and batch tasks must be supported, and each task may also require access to multiple resources. Therefore, this research tackles the NP-hard problem of providing timely and simultaneous access to multiple resources through practical abstractions and near-optimal heuristics aided by cooperative scheduling. We provide an elegant EAS model that works across this spectrum and uses a run-profile based approach to scheduling. We apply this model to significant applications such as BLAT and the assembly of gene sequences in the bioinformatics domain. We also provide a simulation that extends this model to cloud computing to answer "what if" scenario questions for consumers and operators of cloud resources, helping to address questions of deadlines, single vs. distributed cluster use, and impact analysis of energy-index and availability against revenue and ROI
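    In the spirit of the run-profile based approach mentioned above, the sketch below shows a generic "lowest-energy profile that still meets the deadline" selection; the profiles and the selection rule are illustrative assumptions, not the dissertation's EAS model.

```python
# Hedged sketch: pick the run profile with the lowest energy among those that
# still meet the deadline. Profile values are made up for illustration.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RunProfile:
    name: str
    exec_time_s: float   # expected execution time under this profile
    energy_j: float      # expected energy consumed under this profile

def pick_profile(profiles: List[RunProfile], deadline_s: float) -> Optional[RunProfile]:
    feasible = [p for p in profiles if p.exec_time_s <= deadline_s]
    if not feasible:
        return None  # no profile meets the deadline
    return min(feasible, key=lambda p: p.energy_j)

profiles = [
    RunProfile("high-freq", exec_time_s=2.0, energy_j=40.0),
    RunProfile("mid-freq",  exec_time_s=3.5, energy_j=28.0),
    RunProfile("low-freq",  exec_time_s=6.0, energy_j=22.0),
]
# The slowest (lowest-energy) profile misses the 4 s deadline, so mid-freq wins.
print(pick_profile(profiles, deadline_s=4.0))
```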

    Distribution of Low Latency Machine Learning Algorithm

    Get PDF
    Mobile networks are evolving towards centralization and cloudification while bringing computing power to the edge, opening their scope to a new range of applications. Ultra-low latency is one of the requirements of such applications in the next generation of mobile networks (5G), where deep learning is expected to play a big role. Hence, to enable the use of deep learning solutions on the edge cloud, ultra-low latency inference must be investigated. The study presented here relies on an in-house framework (CRUN) that enables the distribution of acceleration in a data center environment. The objective of this thesis is to identify the best solution for the inference of a machine learning algorithm for an anomaly detection application using neural networks in the edge cloud context. To evaluate the results obtained with CRUN, a comparison was also carried out: five inference solutions were compared using CPU, GPU and FPGA. The results show superior performance in terms of latency for all CRUN experiments, which comprise three cases: the first uses the RTL anomaly detection neural network as a baseline solution, the second uses the same baseline code but unrolls the largest layer to reduce latency, and the third distributes the neural network across two FPGAs. The requirements for this solution were a latency between 20 μs and 40 μs for inference time and at least 20,000 inferences per second. These goals were fully met by all CRUN experiments, which provided 30 μs latency on average, while the second-best solution provided 272 μs
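    A quick sanity check of the figures quoted above, assuming a fully serial (non-pipelined) execution model: at 30 μs per inference, back-to-back requests already exceed the 20,000 inferences-per-second target.

```python
# Back-of-the-envelope check of the numbers quoted in the abstract, assuming a
# fully serial execution model (a simplification; real deployments may pipeline).

latency_s = 30e-6                    # 30 us average latency reported for CRUN
required_rate = 20_000               # required inferences per second
serial_rate = 1.0 / latency_s        # ~33,333 inferences/s with back-to-back requests

print(f"serial throughput at 30 us: {serial_rate:,.0f} inf/s "
      f"({'meets' if serial_rate >= required_rate else 'misses'} the 20k inf/s target)")
```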

    Energy consumption in cloud computing environments

    Get PDF
    The conference aimed to support and stimulate active, productive research that strengthens the technical foundations of engineers and scientists on the continent, developing strong technical foundations and skills and leading to new small to medium enterprises within the African sub-continent. It also sought to encourage the emergence of functionally skilled technocrats within the continent. Datacentres are becoming indispensable infrastructure for supporting the services offered by cloud computing. Unfortunately, they consume a great deal of energy, accounting for 3% of global electrical energy consumption. As a result, cloud providers experience high operating costs, which lead to an increased Total Cost of Ownership (TCO) of datacentre infrastructure. Moreover, the increased carbon dioxide emissions affect the environment. This paper presents a survey of the various ways in which energy is consumed in datacentre infrastructure. The factors that influence energy consumption within a datacentre are presented as well. Strathmore University; Institute of Electrical and Electronics Engineers (IEEE)
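    To make the link between energy consumption and operating cost concrete, the sketch below turns an assumed continuous power draw into annual energy use and electricity cost, the kind of operating cost that feeds into datacentre TCO; the load, PUE and price figures are illustrative assumptions, not data from the paper.

```python
# Illustrative arithmetic only (all figures are assumptions, not from the paper):
# continuous power draw -> annual energy -> annual electricity cost.

it_load_kw = 500.0          # assumed average IT load of a small datacentre
pue = 1.6                   # assumed power usage effectiveness (facility power / IT power)
price_per_kwh = 0.12        # assumed electricity price in $/kWh

facility_kw = it_load_kw * pue
annual_kwh = facility_kw * 24 * 365
annual_cost = annual_kwh * price_per_kwh

print(f"annual energy: {annual_kwh:,.0f} kWh, annual cost: ${annual_cost:,.0f}")
```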

    Memory-Aware Scheduling for Fixed Priority Hard Real-Time Computing Systems

    Get PDF
    As a major component of a computing system, memory has been a key performance and power consumption bottleneck in computer system design. While processor speeds have kept rising dramatically, the overall performance improvement of the entire system is limited by how fast the memory can feed instructions/data to the processing units (the so-called memory wall problem). The increasing transistor density and surging access demands from a rapidly growing number of processing cores have also significantly elevated the power consumption of the memory system. In addition, the interference of memory accesses from different applications and processing cores significantly degrades computation predictability, which is essential to ensuring timing specifications in real-time system design. Recent IC technologies (such as 3D-IC technology) and emerging data-intensive real-time applications (such as Virtual Reality/Augmented Reality, Artificial Intelligence, and the Internet of Things) further amplify these challenges. We believe that it is not simply desirable but necessary to adopt a joint CPU/memory resource management framework to deal with these grave challenges. In this dissertation, we focus on studying how to schedule fixed-priority hard real-time tasks with memory impacts taken into consideration. We target the fixed-priority real-time scheduling scheme since it is one of the most commonly used strategies for practical real-time applications. Specifically, we first develop an approach that takes into consideration not only the execution time variations with cache allocations but also the task period relationships, showing a significant improvement in the feasibility of the system. We further study the problem of how to guarantee timing constraints for hard real-time systems under CPU and memory thermal constraints. We first study the problem under an architecture model with a single core and its main memory individually packaged. We develop a thermal model that can capture the thermal interaction between the processor and memory, and incorporate the periodic resource server model into our scheduling framework to guarantee both the timing and thermal constraints. We further extend our research to multi-core architectures with processing cores and memory devices integrated into a single 3D platform. To the best of our knowledge, this is the first research that can guarantee hard deadline constraints for real-time tasks under temperature constraints for both processing cores and memory devices. Extensive simulation results demonstrate that our proposed scheduling can significantly improve the feasibility of hard real-time systems under thermal constraints
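    For context on fixed-priority feasibility, the sketch below implements the classic response-time analysis for fixed-priority preemptive tasks; it is the standard textbook test, not the cache-aware or thermal-aware analyses developed in the dissertation.

```python
# Minimal sketch of classic response-time analysis (RTA) for fixed-priority
# preemptive tasks: R_i = C_i + sum over higher-priority tasks j of
# ceil(R_i / T_j) * C_j, iterated to a fixed point. Standard textbook test only.

import math
from typing import List, Optional, Tuple

def response_time(c_i: float, hp_tasks: List[Tuple[float, float]], deadline: float) -> Optional[float]:
    """hp_tasks: (C_j, T_j) pairs for all higher-priority tasks.
    Returns the worst-case response time, or None if it exceeds the deadline."""
    r = c_i
    while True:
        r_next = c_i + sum(math.ceil(r / t_j) * c_j for c_j, t_j in hp_tasks)
        if r_next == r:
            return r
        if r_next > deadline:
            return None  # this task (and hence the task set) is infeasible
        r = r_next

# Example: task with C=2 and deadline 10, preempted by one task with C=1, T=4.
print(response_time(2.0, [(1.0, 4.0)], deadline=10.0))  # -> 3.0
```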