Distributed applications tend to have complex designs because of issues such as concurrency, synchronization, and communication. Researchers have proposed simpler abstractions to hide these complexities. However, many of the proposed techniques rely on messaging protocols that incur high overhead and do not scale well. To address these limitations, in our previous work, we proposed an efficient Distributed Data Sharing Substrate (DDSS) that uses the features of high-speed networks. In this paper, we propose several design optimizations for DDSS on multi-core systems, such as a combination of shared memory and message queues for inter-process communication and a dedicated thread for communication progress and for onloading DDSS operations such as get and put. Our micro-benchmark results not only show very low latency for DDSS operations but also demonstrate the scalability of DDSS with an increasing number of processes. Application evaluations with R-Tree and B-Tree query processing and distributed STORM show improvements of up to 56%, 45%, and 44%, respectively, compared to the traditional implementations, while evaluations with application checkpointing using DDSS demonstrate scalability with an increasing number of checkpointing applications. Further, our evaluations demonstrate the portability of DDSS across multiple modern interconnects such as InfiniBand and iWARP-capable 10-Gigabit Ethernet networks (applicable to both LAN and WAN environments).