6,923 research outputs found
Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from Concrete Concurrency Models
The upcoming many-core architectures require software developers to exploit
concurrency to utilize available computational power. Today's high-level
language virtual machines (VMs), which are a cornerstone of software
development, do not provide sufficient abstraction for concurrency concepts. We
analyze concrete and abstract concurrency models and identify the challenges
they impose for VMs. To provide sufficient concurrency support in VMs, we
propose to integrate concurrency operations into VM instruction sets.
Since there will always be VMs optimized for special purposes, our goal is to
develop a methodology to design instruction sets with concurrency support.
Therefore, we also propose a list of trade-offs that have to be investigated to
advise the design of such instruction sets.
As a first experiment, we implemented one instruction set extension for
shared memory and one for non-shared memory concurrency. From our experimental
results, we derived a list of requirements for a full-grown experimental
environment for further research
Datacenter Traffic Control: Understanding Techniques and Trade-offs
Datacenters provide cost-effective and flexible access to scalable compute
and storage resources necessary for today's cloud computing needs. A typical
datacenter is made up of thousands of servers connected with a large network
and usually managed by one operator. To provide quality access to the variety
of applications and services hosted on datacenters and maximize performance, it
deems necessary to use datacenter networks effectively and efficiently.
Datacenter traffic is often a mix of several classes with different priorities
and requirements. This includes user-generated interactive traffic, traffic
with deadlines, and long-running traffic. To this end, custom transport
protocols and traffic management techniques have been developed to improve
datacenter network performance.
In this tutorial paper, we review the general architecture of datacenter
networks, various topologies proposed for them, their traffic properties,
general traffic control challenges in datacenters and general traffic control
objectives. The purpose of this paper is to bring out the important
characteristics of traffic control in datacenters and not to survey all
existing solutions (as it is virtually impossible due to massive body of
existing research). We hope to provide readers with a wide range of options and
factors while considering a variety of traffic control mechanisms. We discuss
various characteristics of datacenter traffic control including management
schemes, transmission control, traffic shaping, prioritization, load balancing,
multipathing, and traffic scheduling. Next, we point to several open challenges
as well as new and interesting networking paradigms. At the end of this paper,
we briefly review inter-datacenter networks that connect geographically
dispersed datacenters which have been receiving increasing attention recently
and pose interesting and novel research problems.Comment: Accepted for Publication in IEEE Communications Surveys and Tutorial
Thread-Modular Static Analysis for Relaxed Memory Models
We propose a memory-model-aware static program analysis method for accurately
analyzing the behavior of concurrent software running on processors with weak
consistency models such as x86-TSO, SPARC-PSO, and SPARC-RMO. At the center of
our method is a unified framework for deciding the feasibility of inter-thread
interferences to avoid propagating spurious data flows during static analysis
and thus boost the performance of the static analyzer. We formulate the
checking of interference feasibility as a set of Datalog rules which are both
efficiently solvable and general enough to capture a range of hardware-level
memory models. Compared to existing techniques, our method can significantly
reduce the number of bogus alarms as well as unsound proofs. We implemented the
method and evaluated it on a large set of multithreaded C programs. Our
experiments showthe method significantly outperforms state-of-the-art
techniques in terms of accuracy with only moderate run-time overhead.Comment: revised version of the ESEC/FSE 2017 pape
Program Transformations for Asynchronous and Batched Query Submission
The performance of database/Web-service backed applications can be
significantly improved by asynchronous submission of queries/requests well
ahead of the point where the results are needed, so that results are likely to
have been fetched already when they are actually needed. However, manually
writing applications to exploit asynchronous query submission is tedious and
error-prone. In this paper we address the issue of automatically transforming a
program written assuming synchronous query submission, to one that exploits
asynchronous query submission. Our program transformation method is based on
data flow analysis and is framed as a set of transformation rules. Our rules
can handle query executions within loops, unlike some of the earlier work in
this area. We also present a novel approach that, at runtime, can combine
multiple asynchronous requests into batches, thereby achieving the benefits of
batching in addition to that of asynchronous submission. We have built a tool
that implements our transformation techniques on Java programs that use JDBC
calls; our tool can be extended to handle Web service calls. We have carried
out a detailed experimental study on several real-life applications, which
shows the effectiveness of the proposed rewrite techniques, both in terms of
their applicability and the performance gains achieved.Comment: 14 page
Domain-specific Architectures for Data-intensive Applications
Graphs' versatile ability to represent diverse relationships, make them effective for a wide range of applications. For instance, search engines use graph-based applications to provide high-quality search results. Medical centers use them to aid in patient diagnosis. Most recently, graphs are also being employed to support the management of viral pandemics. Looking forward, they are showing promise of being critical in unlocking several other opportunities, including combating the spread of fake content in social networks, detecting and preventing fraudulent online transactions in a timely fashion, and in ensuring collision avoidance in autonomous vehicle navigation, to name a few. Unfortunately, all these applications require more computational power than what can be provided by conventional computing systems. The key reason is that graph applications present large working sets that fail to fit in the small on-chip storage of existing computing systems, while at the same time they access data in seemingly unpredictable patterns, thus cannot draw benefit from traditional on-chip storage.
In this dissertation, we set out to address the performance limitations of existing computing systems so to enable emerging graph applications like those described above. To achieve this, we identified three key strategies: 1) specializing memory architecture, 2) processing data near its storage, and 3) message coalescing in the network. Based on these strategies, this dissertation develops several solutions: OMEGA, which employs specialized on-chip storage units, with co-located specialized compute engines to accelerate the computation; MessageFusion, which coalesces messages in the interconnect; and Centaur, providing an architecture that optimizes the processing of infrequently-accessed data. Overall, these solutions provide 2x in performance improvements, with negligible hardware overheads, across a wide range of applications.
Finally, we demonstrate the applicability of our strategies to other data-intensive domains, by exploring an acceleration solution for MapReduce applications, which achieves a 4x performance speedup, also with negligible area and power overheads.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/163186/1/abrahad_1.pd
Low Power system Design techniques for mobile computers
Portable products are being used increasingly. Because these systems are battery powered, reducing power consumption is vital. In this report we give the properties of low power design and techniques to exploit them on the architecture of the system. We focus on: min imizing capacitance, avoiding unnecessary and wasteful activity, and reducing voltage and frequency. We review energy reduction techniques in the architecture and design of a hand-held computer and the wireless communication system, including error control, sys tem decomposition, communication and MAC protocols, and low power short range net works
- …