Power-efficient architectures have become the most important feature required for future embedded systems. Modern
designs, like those released on mobile devices, reveal that clusterization is the way to improve energy efficiency. However, such
architectures are still limited by the memory subsystem (i.e., memory latency problems). This work investigates an alternative
approach that exploits on-chip data locality to a large extent, through distributed shared memory systems that permit efficient
reuse of on-chip mapped data in clusterized many-core architectures. First, this work reviews the current literature on memory
allocations and explore the limitations of cluster-based many-core architectures. Then, several memory allocations are introduced
and benchmarked scalability, performance and energy-wise, compared to the conventional centralized shared memory solution to
reveal which memory allocation is the most appropriate for future mobile architectures. Our results show that distributed shared
memory allocations bring performance gains and opportunities to reduce energy consumption