188 research outputs found

    Mascot: Microarchitecture Synthesis of Control Paths

    Get PDF
    This paper presents MASCOT (MicroArchitecture Synthesis of ConTrol paths). This synthesis system constructs the optimal microarchitecture for a control path of an instruction set processor. Input to the system is the behavioural specification of a control path. This specification is in finite state machine form which is mapped initially onto a single programmed logic array (PLA) microarchitecture. The synthesis strategy then applies a sequence of decompositions on this initial microarchitecture. This strategy follows a decision scheme until all design objectives are met. It transforms the initial microarchitecture into a complex microarchitecture of several PLAs and ROMs. Where it is impossible to meet the design objectives, the system constructs a microarchitecture which comes as close as possible to given design objectives. Design objectives are allowed on floorplan dimensions and delay. Our strategy integrates a number of known optimization methods for specific microarchitectures. Therefore this synthesis method explores a larger part of the design space than do other control path synthesis methods. Other methods are mostly bound to one microarchitecture which they optimize. Our system is not only very flexible in microarchitecture construction but also open for extension by other optimizations

    Stepwise decomposition in controlpath synthesis

    Get PDF
    A method is presented for the synthesis of the microarchitecture of controlpaths. This method is called stepwise decomposition. It focuses primarily on controlpaths of instruction set processors, however it is also applicable for more general Finite State Machine synthesis. Many of the current controlpath synthesis algorithms are based on a fixed microarchitecture, and an optimization of that microarchitecture. This stepwise decomposition method is able to synthesize microarchitectures in a range from a single PLA to multiple PLA/ROM configurations and optionally further down to hardwired, which makes it more flexible and better suited to a wider range of controlpaths than current synthesis methods. A sequence of decomposition steps, from coarse to detailed, is performed on the design to move it to the area of the design space where all constraints on space, floorplan and delay are satisfied. The method is currently implemented in APL

    Instruction replication for clustered microarchitectures

    Get PDF
    This work presents a new compilation technique that uses instruction replication in order to reduce the number of communications executed on a clustered microarchitecture. For such architectures, the need to communicate values between clusters can result in a significant performance loss. Inter-cluster communications can be reduced by selectively replicating an appropriate set of instructions. However, instruction replication must be done carefully since it may also degrade performance due to the increased contention it can place on processor resources. The proposed scheme is built on top of a previously proposed state-of-the-art modulo scheduling algorithm that effectively reduces communications. Results show that the number of communications can decrease using replication, which results in significant speed-ups. IPC is increased by 25% on average for a 4-cluster microarchitecture and by as mush as 70% for selected programs.Peer ReviewedPostprint (published version

    Modulo scheduling for a fully-distributed clustered VLIW architecture

    Get PDF
    Clustering is an approach that many microprocessors are adopting in recent times in order to mitigate the increasing penalties of wire delays. We propose a novel clustered VLIW architecture which has all its resources partitioned among clusters, including the cache memory. A modulo scheduling scheme for this architecture is also proposed. This algorithm takes into account both register and memory inter-cluster communications so that the final schedule results in a cluster assignment that favors cluster locality in cache references and register accesses. It has been evaluated for both 2- and 4-cluster configurations and for differing numbers and latencies of inter-cluster buses. The proposed algorithm produces schedules with very low communication requirements and outperforms previous cluster-oriented schedulers.Peer ReviewedPostprint (published version

    An automatic microprogramming system.

    Get PDF
    by Wu Kam-wah.Bibliography: leaves [129]-[130]Thesis (M.Ph.)--Chinese University of Hong Kong, 198

    Memory Utilization for a Dynamically Microprogrammed Computer

    Get PDF
    A Particular, Dynamically Microprogrammed Computer (Proposed by Tucker and Flynn in Commun. of ACM, April 1971) is Considered with Respect to Main Memory and Micro-Memory Utilization. a Dependency is Shown between Memory Utilization and Utilization of the Arithmetic and Logic Unit

    The effectiveness of loop unrolling for modulo scheduling in clustered VLIW architectures

    Get PDF
    Clustered organizations are becoming a common trend in the design of VLIW architectures. In this work we propose a novel modulo scheduling approach for such architectures. The proposed technique performs the cluster assignment and the instruction scheduling in a single pass, which is shown to be more effective than doing first the assignment and later the scheduling. We also show that loop unrolling significantly enhances the performance of the proposed scheduler especially when the communication channel among clusters is the main performance bottleneck. By selectively unrolling some loops, we can obtain the best performance with the minimum increase in code size. Performance evaluation for the SPECfp95 shows that the clustered architecture achieves about the same IPC (Instructions Per Cycle) as a unified architecture with the same resources. Moreover when the cycle time is taken into account, a 4-cluster configurations is 3.6 times faster than the unified architecture.Peer ReviewedPostprint (published version

    MIDL: a microinstruction description language : (preprint)

    Get PDF
    • …
    corecore