1,872 research outputs found
AirSync: Enabling Distributed Multiuser MIMO with Full Spatial Multiplexing
The enormous success of advanced wireless devices is pushing the demand for
higher wireless data rates. Denser spectrum reuse through the deployment of
more access points per square mile has the potential to successfully meet the
increasing demand for more bandwidth. In theory, the best approach to density
increase is via distributed multiuser MIMO, where several access points are
connected to a central server and operate as a large distributed multi-antenna
access point, ensuring that all transmitted signal power serves the purpose of
data transmission, rather than creating "interference." In practice, while
enterprise networks offer a natural setup in which distributed MIMO might be
possible, there are serious implementation difficulties, the primary one being
the need to eliminate phase and timing offsets between the jointly coordinated
access points.
In this paper we propose AirSync, a novel scheme which provides not only time
but also phase synchronization, thus enabling distributed MIMO with full
spatial multiplexing gains. AirSync locks the phase of all access points using
a common reference broadcasted over the air in conjunction with a Kalman filter
which closely tracks the phase drift. We have implemented AirSync as a digital
circuit in the FPGA of the WARP radio platform. Our experimental testbed,
comprised of two access points and two clients, shows that AirSync is able to
achieve phase synchronization within a few degrees, and allows the system to
nearly achieve the theoretical optimal multiplexing gain. We also discuss MAC
and higher layer aspects of a practical deployment. To the best of our
knowledge, AirSync offers the first ever realization of the full multiuser MIMO
gain, namely the ability to increase the number of wireless clients linearly
with the number of jointly coordinated access points, without reducing the per
client rate.Comment: Submitted to Transactions on Networkin
Code transformations based on speculative SDC scheduling
Code motion and speculations are usually exploited in the High Level Synthesis of control dominated applications to improve the performances of the synthesized designs. Selecting the transformations to be applied is not a trivial task: their effects can indeed indirectly spread across the whole design, potentially worsening the quality of the results.
In this paper we propose a code transformation flow, based on a new extension of the System of Difference Constraints (SDC) scheduling algorithm, which introduces a large number of transformations, whose profitability is guaranteed by SDC formulation. Experimental results show that the proposed technique in average reduces the execution time of control dominated applications by 37% with respect to a commercial tool without increasing the area usage
Advances in ILP-based Modulo Scheduling for High-Level Synthesis
In today's heterogenous computing world, field-programmable gate arrays (FPGA) represent the energy-efficient alternative to generic processor cores and graphics accelerators. However, due to their radically different computing model, automatic design methods, such as high-level synthesis (HLS), are needed to harness their full power. HLS raises the abstraction level to behavioural descriptions of algorithms, thus freeing designers from dealing with tedious low-level concerns, and enabling a rapid exploration of different microarchitectures for the same input specification. In an HLS tool, scheduling is the most influential step for the performance of the generated accelerator. Specifically, modulo schedulers enable a pipelined execution, which is a key technique to speed up the computation by extracting more parallelism from the input description.
In this thesis, we make a case for the use of integer linear programming (ILP) as a framework for modulo scheduling approaches.
First, we argue that ILP-based modulo schedulers are practically usable in the HLS context. Secondly, we show that the ILP framework enables a novel approach for the automatic design of FPGA accelerators.
We substantiate the first claim by proposing a new, flexible ILP formulation for the modulo scheduling problem, and evaluate it experimentally with a diverse set of realistic test instances. While solving an ILP may incur an exponential runtime in the worst case, we observe that simple countermeasures, such as setting a time limit, help to contain the practical impact of outlier instances. Furthermore, we present an algorithm to compress problems before the actual scheduling.
An HLS-generated microarchitecture is comprised of operators, i.e. single-purpose functional units such as a floating-point multiplier. Usually, the allocation of operators is determined before scheduling, even though both problems are interdependent. To that end, we investigate an extension of the modulo scheduling problem that combines both concerns in a single model. Based on the extension, we present a novel multi-loop scheduling approach capable of finding the fastest microarchitecture that still fits on a given FPGA device - an optimisation problem that current commercial HLS tools cannot solve. This proves our second claim
Optimality of Treating Interference as Noise: A Combinatorial Perspective
For single-antenna Gaussian interference channels, we re-formulate the
problem of determining the Generalized Degrees of Freedom (GDoF) region
achievable by treating interference as Gaussian noise (TIN) derived in [3] from
a combinatorial perspective. We show that the TIN power control problem can be
cast into an assignment problem, such that the globally optimal power
allocation variables can be obtained by well-known polynomial time algorithms.
Furthermore, the expression of the TIN-Achievable GDoF region (TINA region) can
be substantially simplified with the aid of maximum weighted matchings. We also
provide conditions under which the TINA region is a convex polytope that relax
those in [3]. For these new conditions, together with a channel connectivity
(i.e., interference topology) condition, we show TIN optimality for a new class
of interference networks that is not included, nor includes, the class found in
[3].
Building on the above insights, we consider the problem of joint link
scheduling and power control in wireless networks, which has been widely
studied as a basic physical layer mechanism for device-to-device (D2D)
communications. Inspired by the relaxed TIN channel strength condition as well
as the assignment-based power allocation, we propose a low-complexity
GDoF-based distributed link scheduling and power control mechanism (ITLinQ+)
that improves upon the ITLinQ scheme proposed in [4] and further improves over
the heuristic approach known as FlashLinQ. It is demonstrated by simulation
that ITLinQ+ provides significant average network throughput gains over both
ITLinQ and FlashLinQ, and yet still maintains the same level of implementation
complexity. More notably, the energy efficiency of the newly proposed ITLinQ+
is substantially larger than that of ITLinQ and FlashLinQ, which is desirable
for D2D networks formed by battery-powered devices.Comment: A short version has been presented at IEEE International Symposium on
Information Theory (ISIT 2015), Hong Kon
High-level synthesis of fine-grained weakly consistent C concurrency
High-level synthesis (HLS) is the process of automatically compiling high-level programs into a netlist (collection of gates). Given an input program, HLS tools exploit its inherent parallelism and pipelining opportunities to generate efficient customised hardware. C-based programs are the most popular input for HLS tools, but these tools historically only synthesise sequential C programs. As the appeal for software concurrency rises, HLS tools are beginning to synthesise concurrent C programs, such as C/C++ pthreads and OpenCL. Although supporting software concurrency leads to better hardware parallelism, shared memory synchronisation is typically serialised to ensure correct memory behaviour, via locks. Locks are safety resources that ensure exclusive access of shared memory, eliminating data races and providing synchronisation guarantees for programmers.Ā
As an alternative to lock-based synchronisation, the C memory model also defines the possibility of lock-free synchronisation via fine-grained atomic operations (`atomics'). However, most HLS tools either do not support atomics at all or implement atomics using locks. Instead, we treat the synthesis of atomics as a scheduling problem. We show that we can augment the intra-thread memory constraints during memory scheduling of concurrent programs to support atomics. On average, hardware generated by our method is 7.5x faster than the state-of-the-art, for our set of experiments.
Our method of synthesising atomics enables several unique possibilities. Chiefly, we are capable of supporting weakly consistent (`weak') atomics, which necessitate fewer ordering constraints compared to sequentially consistent (SC) atomics. However, implementing weak atomics is complex and error-prone and hence we formally verify our methods via automated model checking to ensure our generated hardware is correct. Furthermore, since the C memory model defines memory behaviour globally, we can globally analyse the entire program to generate its memory constraints. Additionally, we can also support loop pipelining by extending our methods to generate inter-iteration memory constraints. On average, weak atomics, global analysis and loop pipelining improve performance by 1.6x, 3.4x and 1.4x respectively, for our set of experiments. Finally, we present a case study of a real-world example via an HLS-based Google PageRank algorithm, whose performance improves by 4.4x via lock-free streaming and work-stealing.Open Acces
Autonomous Vehicle Coordination with Wireless Sensor and Actuator Networks
A coordinated team of mobile wireless sensor and actuator nodes can bring numerous benefits for various applications in the field of cooperative surveillance, mapping unknown areas, disaster management, automated highway and space exploration. This article explores the idea of mobile nodes using vehicles on wheels, augmented with wireless, sensing, and control capabilities. One of the vehicles acts as a leader, being remotely driven by the user, the others represent the followers. Each vehicle has a low-power wireless sensor node attached, featuring a 3D accelerometer and a magnetic compass. Speed and orientation are computed in real time using inertial navigation techniques. The leader periodically transmits these measures to the followers, which implement a lightweight fuzzy logic controller for imitating the leader's movement pattern. We report in detail on all development phases, covering design, simulation, controller tuning, inertial sensor evaluation, calibration, scheduling, fixed-point computation, debugging, benchmarking, field experiments, and lessons learned
- ā¦