202 research outputs found

    Simulation models of shared-memory multiprocessor systems

    Get PDF

    Reduction of false sharing by using process affinity in page-based distributed shared memory mutiprocessor systems

    Get PDF
    In page-based distributed shared memory systems, a large page size makes efficient use of interconnection network, but increases the chance of false sharing, while a small page size reduces the level of false sharing but results in an inefficient use of the network. This paper proposes a technique that uses process affinity to achieve data pages clustering so as to optimize the temporal data locality on DSM systems, and therefore reduces the chance of false sharing and improves the data locality. To quantify the degree of process affinity for a piece of data, a measure called process affinity index is used that indicates the closeness between this piece of data and the process. Simulation results show that process affinity technique improves the execution performance as page size increases due to the effective reduction of fair sharing. In the best case an order of magnitude performance improvement is achieved.published_or_final_versio

    Simulation Of Multi-core Systems And Interconnections And Evaluation Of Fat-Mesh Networks

    Get PDF
    Simulators are very important in computer architecture research as they enable the exploration of new architectures to obtain detailed performance evaluation without building costly physical hardware. Simulation is even more critical to study future many-core architectures as it provides the opportunity to assess currently non-existing computer systems. In this thesis, a multiprocessor simulator is presented based on a cycle accurate architecture simulator called SESC. The shared L2 cache system is extended into a distributed shared cache (DSC) with a directory-based cache coherency protocol. A mesh network module is extended and integrated into SESC to replace the bus for scalable inter-processor communication. While these efforts complete an extended multiprocessor simulation infrastructure, two interconnection enhancements are proposed and evaluated. A novel non-uniform fat-mesh network structure similar to the idea of fat-tree is proposed. This non-uniform mesh network takes advantage of the average traffic pattern, typically all-to-all in DSC, to dedicate additional links for connections with heavy traffic (e.g., near the center) and fewer links for lighter traffic (e.g., near the periphery). Two fat-mesh schemes are implemented based on different routing algorithms. Analytical fat-mesh models are constructed by presenting the expressions for the traffic requirements of personalized all-to-all traffic. Performance improvements over the uniform mesh are demonstrated in the results from the simulator. A hybrid network consisting of one packet switching plane and multiple circuit switching planes is constructed as the second enhancement. The circuit switching planes provide fast paths between neighbors with heavy communication traffic. A compiler technique that abstracts the symbolic expressions of benchmarks' communication patterns can be used to help facilitate the circuit establishment

    Multigrain shared memory

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998.Includes bibliographical references (p. 197-203).by Donald Yeung.Ph.D

    Exploring the value of supporting multiple DSM protocols in Hardware DSM Controllers

    Get PDF
    Journal ArticleThe performance of a hardware distributed shared memory (DSM) system is largely dependent on its architect's ability to reduce the number of remote memory misses that occur. Previous attempts to solve this problem have included measures such as supporting both the CC-NUMA and S-COMA architectures is the same machine and providing a programmable DSM controller that can emulate any DSM mechanism. In this paper we first present the design of a DSM controller that supports multiple DSM protocols in custom hardware, and allows the programmer or compiler to specify on a per-variable basis what protocol to use to keep that variable coherent. This simulated performance of this DSM controller compares favorably with that of conventional single-protocol custom hardware designs, often outperforming the conventional systems by a factor of two. To achieve these promising results, that multi-protocol DSM controller needed to support only two DSM architectures (CC-NUMA and S-COMA) and three coherency protocols (both release and sequentially consistent write invalidate and release consistent write update). This work demonstrates the value of supporting a degree of flexibility in one's DSM controller design and suggests what operations such a flexible DSM controller should support

    Khazana An infrastructure for building distributed services

    Get PDF
    technical reportEssentially all distributed systems?? applications?? and services at some level boil down to the problem of man aging distributed shared state Unfortunately?? while the problem of managing distributed shared state is shared by many applications?? there is no common means of managing the data every application devises its own solution We have developed Khazana?? a distributed service exporting the abstraction of a distributed per sistent globally shared store that applications can use to store their shared state Khazana is responsible for performing many of the common operations needed by distributed applications?? including replication?? consis tency management?? fault recovery?? access control?? and location management Using Khazana as a form of middleware?? distributed applications can be quickly de veloped from corresponding uniprocessor applications through the insertion of Khazana data access and syn chronization operation

    Distributed data structure for factored operating systems

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012.Cataloged from PDF version of thesis.Includes bibliographical references (p. 151-158).Future computer architectures will likely exhibit increased parallelism through the addition of more processor cores. Architectural trends such as exponentially increasing parallelism and the possible lack of scalable shared memory motivate the reevaluation of operating system design. This thesis work takes place in the context of Factored Operating Systems which leverage distributed system ideas to increase the scalability of multicore processor operating systems. fos, a Factored Operating System, explores a new design point for operating systems where traditional low-level operating system services are fine-grain parallelized while internally only using explicit message passing for communication. fos factors an operating system first by system service and then further parallelizes inside of the system service by splitting the service into a fleet of server processes which communicate via messaging. Constructing parallel low-level operating system services which only internally use messaging is challenging because shared resources must be partitioned across servers and the services must provide scalable performance when met with uneven demand. To ease the construction of parallel fos system services, this thesis develops the dPool distributed data structure. The dPool data structure provides concurrent access to an unordered collection of elements by server processes within a fos fleet. Internal to a single dPool instance, all communication between different portions of a dPool is done via messaging. This thesis uses the dPool data structure within the parallel fos Physical Memory Allocation fleet and demonstrates that it is possible to use a dPool to manage shared state in a factored operating system's physical page allocator. This thesis begins by presenting the design of the prototype fos operating system. In the context of fos system service fleets, this thesis describes the dPool data structure, its design, different implementations, and interfaces. The dPool data structure is shown to achieve scalability across even and uneven micro-benchmark workloads. This thesis shows that common parallel and distributed programming techniques apply to the creation of dPool and that background threads within a dPool can increase performance. Finally, this thesis evaluates different dPool implementations and demonstrates that intelligently pushing elements between dPool parts can increase scalability.by David Wentzlaff.Ph.D

    Khazana an infrastructure for building distributed services

    Get PDF
    technical reportEssentially all distributed systems, applications and service at some level boil down to the problem of managing distributed shared state. Unfortunately, while the problem of managing distributed shared state is shared by man applications, there is no common means of managing the data - every application devises its own solution. We have developed Khazana, a distributed service exporting the abstraction of a distributed persistent globally hared store that applications can use to store their shared state. Khazana is responsible for performing many of the common operations needed by distributed applications, including replication, consistency management, fault recovery, access control, and location management. Using Khazana as a form of middleware, distributed applications can be quickly developed from corresponding uniprocessor applications through the insertion of Khazana data access and synchronization operations
    corecore