Parallel programming has grown steadily in importance since the power wall popularized
multi-core processors and, with them, shared-memory parallel programming models. In
particular, task-based programming models, such as the OpenMP 4.0 standard, have become
increasingly important. They let the programmer describe a set of data dependences per task,
which the runtime uses to order task execution. This order is computed from shared graphs that
all threads update, but only under exclusive access enforced by synchronization mechanisms
(locks), to ensure that dependences are handled correctly. Although exclusive access is
necessary to avoid data races, it may cause contention that limits application parallelism.
This becomes critical in many-core systems because several threads may waste computational
resources while waiting to access the runtime structures.
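
The lock-protected dependence graph described above can be illustrated with a minimal sketch.
It is not taken from any real runtime: the class and method names are hypothetical, and it
assumes a simplified model that tracks only read-after-write and write-after-write
dependences, leaving anti-dependences out for brevity.

```python
import threading
from collections import defaultdict


class LockedDependenceGraph:
    """Toy task dependence graph guarded by one lock (hypothetical names)."""

    def __init__(self):
        self._lock = threading.Lock()         # every thread contends on this lock
        self._last_writer = {}                # data item -> last task that writes it
        self._pending = {}                    # task -> number of unfinished predecessors
        self._successors = defaultdict(list)  # task -> tasks waiting for it
        self._finished = set()

    def submit(self, task, reads=(), writes=()):
        """Register a task; return True if it is immediately ready to run."""
        with self._lock:
            preds = set()
            for item in list(reads) + list(writes):
                writer = self._last_writer.get(item)
                if writer is not None and writer not in self._finished:
                    preds.add(writer)
            for pred in preds:
                self._successors[pred].append(task)
            self._pending[task] = len(preds)
            for item in writes:
                self._last_writer[item] = task
            return not preds

    def finish(self, task):
        """Mark a task as finished; return the tasks that became ready."""
        with self._lock:
            self._finished.add(task)
            ready = []
            for succ in self._successors.pop(task, []):
                self._pending[succ] -= 1
                if self._pending[succ] == 0:
                    ready.append(succ)
            return ready


graph = LockedDependenceGraph()
graph.submit("A", writes=["x"])               # ready: no predecessors
graph.submit("B", reads=["x"], writes=["y"])  # must wait for A
released = graph.finish("A")                  # B becomes ready once A finishes
```

Every call to `submit` or `finish` serializes on the same lock, which is the source of the
contention discussed above when many threads create and complete tasks concurrently.
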
This master's thesis introduces the concept of asynchronous runtime management for task-based
programming model runtimes. The proposal is based on the asynchronous management of runtime
structures such as task dependence graphs: application threads request actions from the
runtime instead of performing the required modifications themselves.
The requests are then handled by a runtime manager, which can be implemented in different ways.
This master's thesis presents an extension to a previously implemented centralized runtime
manager and a novel implementation of a distributed runtime manager. On the one hand,
the runtime design based on a centralized manager [1] is extended to dynamically adapt the
runtime behavior to the manager load, with the goal of running as fast as possible.
On the other hand, a novel runtime design based on a distributed manager is
proposed to overcome the limitations observed in the centralized design. The distributed
implementation allows any thread to become a runtime manager thread whenever doing so helps
exploit the application's parallelism. This is achieved through a new runtime feature, also
implemented in this master's thesis, that dispatches runtime functionality through a callback system.
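
As an illustration of request-based management with callback dispatching, the following is a
minimal sketch and not the implementation evaluated in this thesis. All names (`AsyncRuntime`,
`register`, `request`, `try_manage`) are hypothetical, and the sketch assumes that at most one
thread holds the manager role at any time, which the actual distributed design may not require.

```python
import queue
import threading


class AsyncRuntime:
    """Toy asynchronous runtime: threads post requests, a manager drains them."""

    def __init__(self):
        self._requests = queue.Queue()         # thread-safe FIFO of pending requests
        self._manager_role = threading.Lock()  # assumption: one manager at a time
        self._callbacks = {}                   # request kind -> handler function

    def register(self, kind, callback):
        """Associate a request kind with the runtime function that handles it."""
        self._callbacks[kind] = callback

    def request(self, kind, *args):
        """Called by application threads: enqueue the action instead of doing it."""
        self._requests.put((kind, args))

    def try_manage(self):
        """Any thread may call this; it manages only if the role is free."""
        if not self._manager_role.acquire(blocking=False):
            return 0  # someone else is managing; return to application work
        handled = 0
        try:
            while True:
                try:
                    kind, args = self._requests.get_nowait()
                except queue.Empty:
                    break
                self._callbacks[kind](*args)  # dispatch through the callback table
                handled += 1
        finally:
            self._manager_role.release()
        return handled


runtime = AsyncRuntime()
edges = []
runtime.register("add_edge", lambda src, dst: edges.append((src, dst)))
runtime.request("add_edge", "A", "B")
runtime.request("add_edge", "B", "C")
handled = runtime.try_manage()  # the calling thread acts as manager for this drain
```

In this sketch, only the thread that currently holds the manager role touches the shared
structure, so application threads never block on it: they either enqueue a request or, when
the role is free, drain the queue themselves.
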
The proposals are evaluated on different many-core architectures, and their performance is
compared against the baseline runtimes used to implement the asynchronous versions. Results
show that the centralized manager extension can overcome the hard limitations of the initial
basic implementation, that the distributed manager fixes the problems observed in the previous
implementation, and that the proposed asynchronous organization achieves significantly higher
speedups than the original runtime on real benchmarks.