    Interval simulation: raising the level of abstraction in architectural simulation

    Detailed architectural simulators suffer from a long development cycle and extremely long evaluation times. This longstanding problem is further exacerbated in the multi-core processor era. Existing solutions address the simulation problem by either sampling the simulated instruction stream or by mapping the simulation models on FPGAs; these approaches achieve substantial simulation speedups while simulating performance in a cycle-accurate manner. This paper proposes interval simulation, which takes a completely different approach: interval simulation raises the level of abstraction and replaces the core-level cycle-accurate simulation model by a mechanistic analytical model. The analytical model estimates core-level performance by analyzing intervals, or the timing between two miss events (branch mispredictions and TLB/cache misses); the miss events are determined through simulation of the memory hierarchy, cache coherence protocol, interconnection network and branch predictor. By raising the level of abstraction, interval simulation reduces both development time and evaluation time. Our experimental results using the SPEC CPU2000 and PARSEC benchmark suites and the M5 multi-core simulator show good accuracy up to eight cores (average error of 4.6% and max error of 11% for the multi-threaded full-system workloads), while achieving a one order of magnitude simulation speedup compared to cycle-accurate simulation. Moreover, interval simulation is easy to implement: our implementation of the mechanistic analytical model incurs only one thousand lines of code. Its high accuracy, fast simulation speed and ease-of-use make interval simulation a useful complement to the architect's toolbox for exploring system-level and high-level micro-architecture trade-offs.
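
The interval idea in the abstract above can be illustrated with a toy calculation. This is only a minimal sketch under stated assumptions, not the authors' model: the `Interval` record, the `estimate_cycles` function, the fixed `dispatch_width`, and the per-event penalty values are all hypothetical names and numbers chosen for illustration. It assumes that, between miss events, the core dispatches instructions at a steady rate, and that each miss event adds a fixed stall penalty.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """Hypothetical record: a run of instructions ended by one miss event."""
    instructions: int    # instructions dispatched before the miss event
    penalty_cycles: int  # assumed stall cycles charged to that miss event


def estimate_cycles(intervals, dispatch_width):
    """Toy estimate: steady-state dispatch time plus miss penalty per interval."""
    total = 0
    for iv in intervals:
        # Ceiling division: a partly filled dispatch group still costs a cycle.
        base = -(-iv.instructions // dispatch_width)
        total += base + iv.penalty_cycles
    return total


# Illustrative, made-up intervals (not measured data).
intervals = [Interval(400, 10), Interval(120, 200), Interval(80, 0)]
cycles = estimate_cycles(intervals, dispatch_width=4)
ipc = sum(iv.instructions for iv in intervals) / cycles
print(cycles, round(ipc, 2))
```

In this sketch the miss events and their penalties are simply given as inputs; in the approach the abstract describes, they would come from simulating the memory hierarchy, coherence protocol, interconnect and branch predictor.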

    Exploring Task Mappings on Heterogeneous MPSoCs using a Bias-Elitist Genetic Algorithm

    Exploration of task mappings plays a crucial role in achieving high performance in heterogeneous multi-processor system-on-chip (MPSoC) platforms. The problem of optimally mapping a set of tasks onto a set of given heterogeneous processors for maximal throughput has been known, in general, to be NP-complete. The problem is further exacerbated when multiple applications (i.e., bigger task sets) and the communication between tasks are also considered. Previous research has shown that Genetic Algorithms (GA) typically are a good choice to solve this problem when the solution space is relatively small. However, when the size of the problem space increases, classic genetic algorithms still suffer from the problem of long evolution times. To address this problem, this paper proposes a novel bias-elitist genetic algorithm that is guided by domain-specific heuristics to speed up the evolution process. Experimental results reveal that our proposed algorithm is able to handle large-scale task mapping problems and produces high-quality mapping solutions in only a short time period. Comment: 9 pages, 11 figures, uses algorithm2e.st

    Simulation Of Multi-core Systems And Interconnections And Evaluation Of Fat-Mesh Networks

    Simulators are very important in computer architecture research as they enable the exploration of new architectures to obtain detailed performance evaluation without building costly physical hardware. Simulation is even more critical to study future many-core architectures as it provides the opportunity to assess currently non-existing computer systems. In this thesis, a multiprocessor simulator is presented based on a cycle-accurate architecture simulator called SESC. The shared L2 cache system is extended into a distributed shared cache (DSC) with a directory-based cache coherency protocol. A mesh network module is extended and integrated into SESC to replace the bus for scalable inter-processor communication. While these efforts complete an extended multiprocessor simulation infrastructure, two interconnection enhancements are proposed and evaluated. A novel non-uniform fat-mesh network structure, similar to the idea of a fat-tree, is proposed. This non-uniform mesh network takes advantage of the average traffic pattern, typically all-to-all in DSC, to dedicate additional links for connections with heavy traffic (e.g., near the center) and fewer links for lighter traffic (e.g., near the periphery). Two fat-mesh schemes are implemented based on different routing algorithms. Analytical fat-mesh models are constructed by presenting the expressions for the traffic requirements of personalized all-to-all traffic. Performance improvements over the uniform mesh are demonstrated in the results from the simulator. A hybrid network consisting of one packet switching plane and multiple circuit switching planes is constructed as the second enhancement. The circuit switching planes provide fast paths between neighbors with heavy communication traffic. A compiler technique that abstracts the symbolic expressions of benchmarks' communication patterns can be used to help facilitate the circuit establishment.

    QUERIES SERVICE TIME RESEARCH AND ESTIMATION DURING INFORMATION EXCHANGE IN MULTIPROCESSOR SYSTEMS WITH “UNI BUS” INTERFACE AND SHARED MEMORY

    Abstract. This article analyzes the problem of estimating the service time of queries (transactions) during information exchange in multiprocessor systems with a unibus interface and shared memory. Its aim is to develop and study models of the "processor-memory" subsystem based on queueing systems and queueing networks, and to estimate query service time during information exchange in multiprocessor systems with shared memory. The subject of the study is the analysis of time delays caused by conflicts that occur during interprocessor exchange when many processors access the shared unibus and memory. The object of the research is the "processor-memory" subsystem of existing multiprocessor systems and well-known versions of the architecture of this subsystem. The main task set by the authors is to develop and study mathematical models of the "processor-memory" subsystem of these systems and to estimate the processing time of incoming queries during information exchange in systems with shared memory. Mathematical models for studying query service time are proposed. Equations for determining the main probabilistic-temporal characteristics of the "processor-memory" subsystem are presented. These probabilistic-temporal models were developed using the theory of queueing networks and probability theory. In conclusion, the authors summarize the main findings of the work. The mathematical models studied in the article make it possible to estimate the main probabilistic-temporal characteristics of multiprocessor systems without building real models or prototypes. As a result, the characteristics of new multiprocessor computer systems can be estimated, and the best design chosen, without building an expensive real system.
    Keywords: simulation, analytical model, imitation model, queueing network system, transaction, read-operation, record-operation, multiprocessor system, “processor-memory” subsystem, memory architecture, memory bandwidth, memory controller, memory latency, buffer element