Task-based programming models have gained a lot of attention for being able to explore high parallelism over multicore and manycore, while hiding the difficulties of parallel programming. For applications with moderate size tasks, performance gains are assured by using these programming models. While for more parallelism by using smaller and more tasks, the performance degrades as a result of runtime overheads. To speed up the runtime, we present a hardware accelerator, Picos Hardware to accelerate task dependence management and scheduling. In this work, we show the performance of the first Picos Hardware prototype realized in a Zynq 7000 All-Programmable SoC by using real benchmarks. Results show that our hardware support greatly outperforms the software-only implementation currentlyavailable in the runtime system for fine-grained tasks