Embedded systems are increasingly used in distributed computing to provide more
computational power without changing the whole system or the programming model. We
propose a DataFlow Execution Engine (DEE) that spawns asynchronous, data-driven threads among
embedded cores, distributing threads seamlessly without requiring a distributed
programming model. Our approach relies on a hardware scheduler that can handle the creation,
thread dependencies, and locality of many fine-grained tasks. We present an initial evaluation of our DEE,
which is suited to FPGA implementation. Our initial results show the importance of hardware-based
support for such a thread execution model.