Improving scalability of task-based programs
In the multi-core era, parallel programming allows further performance improvements, but at a significant programmability cost. We envision that the best approach to parallel programming, one that can overcome the programmability, parallelism, power, memory, and reliability walls in computer architecture, is a run-time approach.
Many traditional computer architecture concepts can be revisited and applied at the runtime layer in a way that is completely transparent to the programmer. The goal of this work is to take the computer architecture concepts of value prediction and data prefetching and apply them inside a runtime environment such as OmpSs.
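To make the idea concrete, the sketch below shows last-value prediction as a task runtime might use it to start a dependent task speculatively; it is a minimal Python illustration, and the class, helper function, and speculation protocol are assumptions for exposition, not OmpSs code.

# Minimal sketch of a last-value predictor that a task runtime could
# consult before a producer task finishes (hypothetical API, not OmpSs itself).

class LastValuePredictor:
    def __init__(self):
        self._last = {}      # task identifier -> last observed output value

    def predict(self, task_id):
        """Return the previously observed output, or None if never seen."""
        return self._last.get(task_id)

    def update(self, task_id, actual_value):
        """Record the real output once the producer task commits."""
        self._last[task_id] = actual_value


predictor = LastValuePredictor()

def run_consumer_speculatively(task_id, consumer, producer):
    """Start the consumer early with a predicted input; re-run if wrong."""
    guess = predictor.predict(task_id)
    speculative_result = consumer(guess) if guess is not None else None

    # In a real runtime the producer and the speculative consumer would
    # overlap; here they run sequentially to keep the sketch simple.
    actual = producer()
    predictor.update(task_id, actual)

    if guess == actual and speculative_result is not None:
        return speculative_result       # speculation succeeded, latency hidden
    return consumer(actual)             # misprediction: redo with the real value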
Staring into the abyss: An evaluation of concurrency control with one thousand cores
Computer architectures are moving towards an era dominated by many-core machines with dozens or even hundreds of cores on a single chip. This unprecedented level of on-chip parallelism introduces a new dimension to scalability that current database management systems (DBMSs) were not designed for. In particular, as the number of cores increases, the problem of concurrency control becomes extremely challenging. With hundreds of threads running in parallel, the complexity of coordinating competing accesses to data will likely diminish the gains from increased core counts.
To better understand just how unprepared current DBMSs are for future CPU architectures, we performed an evaluation of concurrency control for on-line transaction processing (OLTP) workloads on many-core chips. We implemented seven concurrency control algorithms in a main-memory DBMS and, using computer simulations, scaled our system to 1024 cores. Our analysis shows that all algorithms fail to scale to this magnitude, but for different reasons. In each case, we identify fundamental bottlenecks that are independent of the particular database implementation and argue that even state-of-the-art DBMSs suffer from these limitations. We conclude that rather than pursuing incremental solutions, many-core chips may require a completely redesigned DBMS architecture that is built from the ground up and is tightly coupled with the hardware.
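For context, one of the classic algorithm families such an evaluation covers is basic timestamp ordering; the Python sketch below shows its read/write admission test in single-threaded form, purely as an illustration and not the paper's many-core implementation.

# Minimal sketch of basic timestamp-ordering (T/O) concurrency control.
# Illustrative only; real many-core engines add far more machinery.

class Record:
    def __init__(self, value):
        self.value = value
        self.read_ts = 0    # largest timestamp that has read this record
        self.write_ts = 0   # largest timestamp that has written this record

class AbortTransaction(Exception):
    pass

def to_read(record, ts):
    """Allow the read only if no younger transaction already wrote the record."""
    if ts < record.write_ts:
        raise AbortTransaction(f"read by ts={ts} too late (write_ts={record.write_ts})")
    record.read_ts = max(record.read_ts, ts)
    return record.value

def to_write(record, ts, value):
    """Allow the write only if no younger transaction already read or wrote it."""
    if ts < record.read_ts or ts < record.write_ts:
        raise AbortTransaction(f"write by ts={ts} too late")
    record.write_ts = ts
    record.value = value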
Energy consumption in networks on chip: efficiency and scaling
Computer architecture design is in a new era where performance is increased by replicating processing cores on a chip rather than making CPUs larger and faster. This design strategy is motivated by the superior energy efficiency of the multi-core architecture compared to the traditional monolithic CPU. If the trend continues as expected, the number of cores on a chip is predicted to grow exponentially over time as the density of transistors on a die increases. A major challenge to the efficiency of multi-core chips is the energy used for communication among cores over a Network on Chip (NoC). As the number of cores increases, this energy also increases, imposing serious constraints on the design and performance of both applications and architectures. Therefore, understanding the impact of different design choices on NoC power and energy consumption is crucial to the success of multi- and many-core designs. This dissertation proposes methods for modeling and optimizing energy consumption in multi- and many-core chips, with special focus on the energy used for communication on the NoC. We present a number of tools and models to optimize energy consumption and model its scaling behavior as the number of cores increases. We use synthetic traffic patterns and full system simulations to test and validate our methods. Finally, we take a step back and look at the evolution of computer hardware in the last 40 years and, using a scaling theory from biology, present a predictive theory for power-performance scaling in microprocessor systems.
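A toy version of the kind of analytical model involved might look like the following Python sketch; the per-hop energy constants and the 2D-mesh hop-count approximation are assumptions for illustration, not the dissertation's calibrated figures.

import math

# Toy analytical model of NoC communication energy on a 2D mesh.
# The per-hop energy constants below are placeholders, not measured values.
E_ROUTER_PER_FLIT = 1.0e-12   # joules per flit per router traversal (assumed)
E_LINK_PER_FLIT   = 0.5e-12   # joules per flit per link traversal (assumed)

def mesh_avg_hops(num_cores):
    """Approximate average hop count for uniform random traffic on a sqrt(n) x sqrt(n) mesh."""
    side = math.sqrt(num_cores)
    return 2 * side / 3          # standard approximation for uniform traffic

def communication_energy(num_cores, flits_sent):
    """Total NoC energy: each flit pays (hops+1) router and hops link traversals on average."""
    hops = mesh_avg_hops(num_cores)
    per_flit = (hops + 1) * E_ROUTER_PER_FLIT + hops * E_LINK_PER_FLIT
    return flits_sent * per_flit

# Energy per flit grows roughly with the square root of the core count on a mesh,
# which is one reason NoC energy becomes a first-order design constraint.
for n in (16, 64, 256, 1024):
    print(n, communication_energy(n, flits_sent=1))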
RETROSPECTIVE: Corona: System Implications of Emerging Nanophotonic Technology
The 2008 Corona effort was inspired by a pressing need for more of everything, as demanded by the salient problems of the day. Dennard scaling was no longer in effect. A lot of computer architecture research was in the doldrums. Papers often showed incremental subsystem performance improvements, but at incommensurate cost and complexity. The many-core era was moving rapidly, and the approach of many simpler cores was at odds with the better and more complex subsystem publications of the day. Core counts were doubling every 18 months, while per-pin bandwidth was expected to double, at best, over the next decade. Memory bandwidth and capacity had to increase to keep pace with ever more powerful multi-core processors. With increasing core counts per die, inter-core communication bandwidth and latency became more important. At the same time, the area and power of electrical networks-on-chip were increasingly problematic: to be reliably received, any signal that traverses a wire spanning a full reticle-sized die would need significant equalization, re-timing, and multiple clock cycles. This additional time, area, and power was the crux of the concern, and things looked set to get worse in the future.
Silicon nanophotonics was of particular interest and seemed to be improving rapidly. This led us to consider taking advantage of 3D packaging, where one die in the 3D stack would be a photonic network layer. Our focus was on a system that could be built about a decade out. Thus, we tried to predict how the technologies and the system performance requirements would converge in about 2018. Corona was the result of this exercise; now, 15 years later, it is interesting to look back at the effort.
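The scaling mismatch mentioned above can be made concrete with back-of-the-envelope arithmetic, using only the growth rates quoted in the abstract.

# Back-of-the-envelope view of the gap the Corona team faced:
# cores doubling every 18 months vs. per-pin bandwidth doubling once per decade.

years = 10
core_doublings = years * 12 / 18          # ~6.7 doublings in a decade
core_growth = 2 ** core_doublings         # roughly 100x more cores
pin_bw_growth = 2                         # at best 2x per-pin bandwidth

bandwidth_per_core = pin_bw_growth / core_growth
print(f"cores grow ~{core_growth:.0f}x, per-pin bandwidth ~{pin_bw_growth}x")
print(f"-> off-chip bandwidth per core shrinks to ~{bandwidth_per_core:.1%} of today's")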
ERA: A Framework for Economic Resource Allocation for the Cloud
Cloud computing has reached significant maturity from a systems perspective, but currently deployed solutions rely on rather basic economic mechanisms that yield suboptimal allocation of costly hardware resources. In this paper we present Economic Resource Allocation (ERA), a complete framework for scheduling and pricing cloud resources, aimed at increasing the efficiency of cloud resource usage by allocating resources according to economic principles. The ERA architecture carefully abstracts the underlying cloud infrastructure, enabling the development of scheduling and pricing algorithms independently of the concrete lower-level cloud infrastructure and its concerns. Specifically, ERA is designed as a flexible layer that can sit on top of any cloud system and interfaces with both the cloud resource manager and the users who reserve resources to run their jobs. Jobs are scheduled based on prices that are dynamically calculated according to predicted demand. Additionally, ERA provides a key internal API to pluggable algorithmic modules that include scheduling, pricing, and demand prediction. We provide proof-of-concept software and demonstrate the effectiveness of the architecture by testing ERA over both public and private cloud systems, namely Microsoft's Azure Batch and Hadoop/YARN. A broader intent of our work is to foster collaboration between the economics and systems communities. To that end, we have developed a simulation platform via which economics and systems experts can test their algorithmic implementations.
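The pluggable-module design described above can be pictured with a small interface sketch in Python; every class and method name below is a hypothetical stand-in, not ERA's actual API.

# Sketch of the kind of pluggable interfaces a framework like ERA exposes:
# scheduling, pricing, and demand prediction behind one internal API.
# All names here are illustrative, not ERA's actual API.

from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Job:
    job_id: str
    cores: int
    hours: float
    bid: float            # maximum price the user is willing to pay

class DemandPredictor(ABC):
    @abstractmethod
    def predict(self, history: list) -> float:
        """Forecast upcoming resource demand from recent utilization."""

class Pricer(ABC):
    @abstractmethod
    def price(self, predicted_demand: float, capacity: float) -> float:
        """Set the current price per core-hour from predicted demand."""

class Scheduler(ABC):
    @abstractmethod
    def admit(self, job: Job, current_price: float) -> bool:
        """Decide whether the job runs now at the current price."""

# One trivial implementation of each plug-in, wired together.
class MeanPredictor(DemandPredictor):
    def predict(self, history):
        return sum(history) / len(history) if history else 0.0

class LinearPricer(Pricer):
    def price(self, predicted_demand, capacity):
        base = 0.05                                   # assumed base price per core-hour
        return base * (1.0 + predicted_demand / max(capacity, 1.0))

class GreedyScheduler(Scheduler):
    def admit(self, job, current_price):
        return job.bid >= current_price * job.cores * job.hours

predictor, pricer, scheduler = MeanPredictor(), LinearPricer(), GreedyScheduler()
p = pricer.price(predictor.predict([60, 80, 100]), capacity=128)
print(scheduler.admit(Job("j1", cores=4, hours=2.0, bid=1.50), p))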