Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors
Asymmetric multicore processors (AMPs) have recently emerged as an appealing
technology for severely energy-constrained environments, especially in mobile
appliances where heterogeneity in applications is mainstream. In addition,
given the growing interest in low-power high-performance computing, this type
of architecture is also being investigated as a means to improve the
throughput-per-Watt of complex scientific applications.
In this paper, we design and embed several architecture-aware optimizations
into a multi-threaded general matrix multiplication (gemm), a key operation of
the BLAS, in order to obtain a high performance implementation for ARM
big.LITTLE AMPs. Our solution is based on the reference implementation of gemm
in the BLIS library, and integrates a cache-aware configuration as well as
asymmetric static and dynamic scheduling strategies that carefully tune and
distribute the operation's micro-kernels among the big and LITTLE cores of the
target processor. The experimental results on a Samsung Exynos 5422, a
system-on-chip with ARM Cortex-A15 and Cortex-A7 clusters that implements the
big.LITTLE model, show that our cache-aware versions of gemm with asymmetric
scheduling attain important gains in performance with respect to their
architecture-oblivious counterparts while exploiting all the resources of the
AMP to deliver considerable energy efficiency.
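As a rough illustration of the asymmetric static scheduling idea described in this abstract, the sketch below splits the iterations of one gemm loop between the big and LITTLE clusters in proportion to an assumed per-core speed ratio, then divides each cluster's share evenly among its cores. The function names and the `speed_ratio` parameter are hypothetical and are not taken from the paper or from BLIS.

```python
def asymmetric_static_partition(n_iters, big_cores, little_cores, speed_ratio):
    """Split n_iters loop iterations between the big and LITTLE clusters.

    speed_ratio is an ASSUMED throughput of one big core relative to one
    LITTLE core (a hypothetical tuning parameter, not a value from the paper).
    Returns (iterations for the big cluster, iterations for the LITTLE cluster).
    """
    big_weight = big_cores * speed_ratio
    little_weight = float(little_cores)
    big_share = round(n_iters * big_weight / (big_weight + little_weight))
    return big_share, n_iters - big_share


def split_evenly(n_iters, n_cores):
    """Divide a cluster's share among its cores; earlier cores take the remainder."""
    base, extra = divmod(n_iters, n_cores)
    return [base + (1 if i < extra else 0) for i in range(n_cores)]


# Example: 100 iterations, 4 big + 4 LITTLE cores, big assumed 3x faster.
big_iters, little_iters = asymmetric_static_partition(100, 4, 4, 3.0)
per_big_core = split_evenly(big_iters, 4)
per_little_core = split_evenly(little_iters, 4)
```

A dynamic variant would instead let every core pull chunks from a shared queue at run time, so that the faster cores naturally end up processing more chunks without a fixed ratio.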
Coverage Protocols for Wireless Sensor Networks: Review and Future Directions
The coverage problem in wireless sensor networks (WSNs) can be generally
defined as a measure of how effectively a network field is monitored by its
sensor nodes. This problem has attracted a lot of interest over the years and
as a result, many coverage protocols were proposed. In this survey, we first
propose a taxonomy for classifying coverage protocols in WSNs. Then, we
classify the coverage protocols into three categories (i.e., coverage-aware
deployment protocols, sleep scheduling protocols for flat networks, and
cluster-based sleep scheduling protocols) based on the network stage where the
coverage is optimized. For each category, relevant protocols are thoroughly
reviewed and classified based on the adopted coverage techniques. Finally, we
discuss open issues (and recommend future directions to resolve them)
associated with the design of realistic coverage protocols. Issues such as
realistic sensing models, realistic energy consumption models, realistic
connectivity models and sensor localization are covered.
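To make the notion of coverage as "a measure of how effectively a network field is monitored" concrete, here is a minimal sketch that estimates area coverage on a sampling grid. It assumes the simplest binary disk sensing model (a point is covered if it lies within a fixed radius of some sensor), which is exactly the kind of idealization the survey lists as an open issue. The function name and parameters are illustrative and do not come from the survey.

```python
import math


def coverage_ratio(sensors, radius, width, height, cells=50):
    """Estimate the fraction of a width x height field covered by sensors.

    ASSUMPTION: binary disk sensing model. A grid-cell centre counts as
    covered if it is within `radius` of at least one (x, y) sensor position.
    """
    covered = 0
    for i in range(cells):
        for j in range(cells):
            x = (i + 0.5) * width / cells
            y = (j + 0.5) * height / cells
            if any(math.hypot(x - sx, y - sy) <= radius for sx, sy in sensors):
                covered += 1
    return covered / (cells * cells)


# Example: a single sensor in the middle of a 10 x 10 field.
ratio = coverage_ratio([(5.0, 5.0)], radius=5.0, width=10.0, height=10.0)
```

A sleep scheduling protocol could use such a measure to check whether coverage stays above a target after a candidate node is switched off.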
Improving DRAM Performance by Parallelizing Refreshes with Accesses
Modern DRAM cells are periodically refreshed to prevent data loss due to
leakage. Commodity DDR DRAM refreshes cells at the rank level. This degrades
performance significantly because it prevents an entire rank from serving
memory requests while being refreshed. DRAM designed for mobile platforms,
LPDDR DRAM, supports an enhanced mode, called per-bank refresh, that refreshes
cells at the bank level. This enables a bank to be accessed while another in
the same rank is being refreshed, alleviating part of the negative performance
impact of refreshes. However, there are two shortcomings of per-bank refresh.
First, the per-bank refresh scheduling scheme does not exploit the full
potential of overlapping refreshes with accesses across banks because it
restricts the banks to be refreshed in a sequential round-robin order. Second,
accesses to a bank that is being refreshed have to wait.
To mitigate the negative performance impact of DRAM refresh, we propose two
complementary mechanisms, DARP (Dynamic Access Refresh Parallelization) and
SARP (Subarray Access Refresh Parallelization). The goal is to address the
drawbacks of per-bank refresh by building more efficient techniques to
parallelize refreshes and accesses within DRAM. First, instead of issuing
per-bank refreshes in a round-robin order, DARP issues per-bank refreshes to
idle banks in an out-of-order manner. Furthermore, DARP schedules refreshes
during intervals when a batch of writes is draining to DRAM. Second, SARP
exploits the existence of mostly-independent subarrays within a bank. With
minor modifications to DRAM organization, it allows a bank to serve memory
accesses to an idle subarray while another subarray is being refreshed.
Extensive evaluations show that our mechanisms improve system performance and
energy efficiency compared to state-of-the-art refresh policies and the benefit
increases as DRAM density increases.
Comment: The original paper published in the International Symposium on
High-Performance Computer Architecture (HPCA) contains an error. The arxiv
version has an erratum that describes the error and the fix for i
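The out-of-order refresh idea behind DARP can be sketched as a small selection routine: among the banks that still owe a refresh, prefer one with no outstanding requests instead of following strict round-robin order. This is a simplified illustration only. The function name, the data layout, and the fallback rules for the case where no bank is idle are assumptions, and DRAM timing constraints are ignored.

```python
def pick_refresh_bank(pending_refresh, queue_depth, rr_next):
    """Choose which bank to refresh next.

    pending_refresh: set of bank ids that still owe a refresh
    queue_depth:     list with the number of outstanding requests per bank
    rr_next:         bank a round-robin baseline would refresh next
    Returns a bank id, or None if no refresh is pending.
    """
    if not pending_refresh:
        return None
    # Out-of-order step: refresh an idle bank so busy banks keep serving.
    idle = [b for b in sorted(pending_refresh) if queue_depth[b] == 0]
    if idle:
        return idle[0]
    # ASSUMED fallback: keep the round-robin choice if it is still pending,
    # otherwise take the least-loaded pending bank.
    if rr_next in pending_refresh:
        return rr_next
    return min(sorted(pending_refresh), key=lambda b: queue_depth[b])


# Example: bank 1 is idle, so it is refreshed ahead of round-robin bank 0.
choice = pick_refresh_bank({0, 1, 2}, [3, 0, 5], rr_next=0)
```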
Unified clustering and communication protocol for wireless sensor networks
In this paper we present an energy-efficient cross-layer protocol for providing application-specific reservations in wireless sensor networks, called the “Unified Clustering and Communication Protocol” (UCCP). Our modular cross-layer framework satisfies three wireless sensor network requirements, namely, the QoS requirement of heterogeneous applications, energy-aware clustering, and data forwarding by relay sensor nodes. Our unified design approach is motivated by providing an integrated and viable solution for self-organization and end-to-end communication in wireless sensor networks. Dynamic QoS-based reservation guarantees are provided using a reservation-based TDMA approach. Our novel energy-efficient clustering approach employs a multi-objective optimization technique based on OR (operations research) practices. We adopt a simple hierarchy in which relay nodes forward data messages from the cluster head to the sink, thus eliminating the overhead needed to maintain a routing protocol. Simulation results demonstrate that UCCP provides an energy-efficient and scalable solution to meet the application-specific QoS demands in resource-constrained sensor nodes. Index Terms: wireless sensor networks, unified communication, optimization, clustering and quality of service