2,627 research outputs found

    Quarc: an architecture for efficient on-chip communication

    The exponential downscaling of feature size has forced a paradigm shift from computation-based design to communication-based design in system on chip development. Buses, the traditional communication architecture in systems on chip, are incapable of addressing the increasing bandwidth requirements of future large systems. Networks on chip have emerged as an interconnection architecture offering unique solutions to the technological and design issues related to communication in future systems on chip. The transition from buses as a shared medium to networks on chip as a segmented medium has given rise to new challenges in the system on chip realm. By leveraging the shared nature of the communication medium, buses have been highly efficient in delivering multicast communication. The segmented nature of networks, however, prevents networks on chip from delivering multicast messages as efficiently. Relying on extensive research on multicast communication in parallel computers, several network on chip architectures have offered mechanisms to perform the operation while conforming to the resource constraints of the network on chip paradigm. Multicast communication in the majority of these networks on chip is implemented by establishing a connection between the source and all multicast destinations before message transmission commences. Establishing the connections incurs an overhead and is therefore undesirable, particularly in latency-sensitive services such as cache coherence. To address high performance multicast communication, this research presents Quarc, a novel network on chip architecture. The Quarc architecture targets an area-efficient, low power, high performance implementation. The thesis gives a detailed description of the building blocks of the architecture, including the topology, router and network interface. A cost and performance comparison of the Quarc architecture against other network on chip architectures reveals that Quarc is highly efficient. Moreover, the thesis introduces novel performance models of complex traffic patterns, including multicast and quality of service-aware communication.

    Easy Cases of Deadlock Detection in Train Scheduling

    A deadlock occurs when two or more trains are preventing each other from moving forward by occupying the required tracks. Deadlocks are rare but pernicious events in railroad operations and, in most cases, are caused by human errors. Recovering is a time-consuming and costly operation, producing large delays and often requiring crew rescheduling and complex switching moves. In practice, most deadlocks involve only two long trains missing their last potential meet location. In this paper, we prove that, for any network configuration, the identification of two-train deadlocks can be performed in polynomial time. This is the first exact polynomial algorithm for such a practically relevant combinatorial problem. We also develop a pseudo-polynomial but efficient oracle that allows real-time early detection and prevention of any (potential) two-train deadlock in the Union Pacific (a U.S. class 1 rail company) railroad network. A deadlock prevention module based on the work in this paper will be put in place at Union Pacific to prevent all deadlocks of this kind.
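
The idea of two long trains "missing their last potential meet location" can be illustrated with a toy check. This is only a minimal sketch under strong assumptions, not the polynomial algorithm or the oracle from the paper: the network is a single straight track, the names `Siding`, `Train` and `is_two_train_deadlock` are hypothetical, and a meet is assumed possible whenever a siding strictly between the two opposing trains can hold the shorter one.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Siding:
    position: float  # location along the line (hypothetical units, e.g. km)
    length: float    # longest train the siding can hold


@dataclass(frozen=True)
class Train:
    position: float  # current location of the train
    length: float


def can_meet(eastbound: Train, westbound: Train, sidings: Sequence[Siding]) -> bool:
    """True if a siding strictly between the trains can hold the shorter train."""
    shorter = min(eastbound.length, westbound.length)
    return any(
        eastbound.position < s.position < westbound.position and s.length >= shorter
        for s in sidings
    )


def is_two_train_deadlock(eastbound: Train, westbound: Train,
                          sidings: Sequence[Siding]) -> bool:
    """Toy rule: opposing trains with no usable meet location left are deadlocked."""
    if eastbound.position >= westbound.position:
        return False  # the trains have already passed each other
    return not can_meet(eastbound, westbound, sidings)


sidings = [Siding(position=10.0, length=1.5), Siding(position=20.0, length=3.0)]
a = Train(position=5.0, length=2.5)  # heading east

# Only the short siding at 10.0 lies between the trains: neither train fits.
print(is_two_train_deadlock(a, Train(position=15.0, length=2.8), sidings))  # True

# The siding at 20.0 is still ahead of both trains and can hold train a.
print(is_two_train_deadlock(a, Train(position=25.0, length=2.8), sidings))  # False
```

A real railroad network has branches, multiple tracks and more than one way to route a train, which is why the general two-train case calls for the analysis developed in the paper.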

    Design tradeoffs for simplicity and efficient verification in the Execution Migration Machine

    As transistor technology continues to scale, the architecture community has experienced exponential growth in design complexity and significantly increasing implementation and verification costs. Moreover, Moore's law has led to a ubiquitous trend of an increasing number of cores on a single chip. Often, these large-core-count chips provide a shared memory abstraction via directories and coherence protocols, which have become notoriously error-prone and difficult to verify because of subtle data races and state space explosion. Although a very simple hardware shared memory implementation can be achieved by simply not allowing ad-hoc data replication and relying on remote accesses for remotely cached data (i.e., requiring no directories or coherence protocols), such remote-access-based directoryless architectures cannot take advantage of any data locality, and therefore suffer in both performance and energy. Our recently taped-out 110-core shared-memory processor, the Execution Migration Machine (EM²), establishes a new design point. On the one hand, EM² supports shared memory but does not automatically replicate data, and thus preserves the simplicity of directoryless architectures. On the other hand, it significantly improves performance and energy over remote-access-only designs by exploiting data locality at remote cores via fast hardware-level thread migration. In this paper, we describe the design choices made in the EM² chip as well as our choice of design methodology, and discuss how they combine to achieve design simplicity and verification efficiency. Even though EM² is a fairly large design (110 cores using a total of 357 million transistors), the entire chip design and implementation process (RTL, verification, physical design, tapeout) took only 18 man-months.

    The Tick Formulation for deadlock detection and avoidance in railways traffic control

    Wrong dispatching decisions may lead to deadlocks, where trains reciprocally block resources necessary to reach their destinations. It is crucial to develop tools to detect such potential deadlocks in time, in order to reverse the decisions previously taken by dispatchers or to take recovery actions. In this paper we present a new 0,1 linear formulation for detecting deadlocks and optimally parking the involved trains to reduce congestion around the affected area. We discuss computational results on some realistic randomly generated instances to show the validity of the approach, as well as its limits.

    Accommodating Transient Connectivity in Ad Hoc and Mobile Settings

    Much of the work on networking and communications is based on the premise that components interact in one of two ways: either they are connected via a stable wired or wireless network, or they make use of persistent storage repositories accessible to the communicating parties. A new generation of networks raises serious questions about the validity of these fundamental assumptions. In mobile ad hoc wireless networks connections are transient and availability of persistent storage is rare. This paper is concerned with achieving communication among mobile devices that may never find themselves in direct or indirect contact with each other at any point in time. A unique feature of our contribution is the idea of exploiting information associated with the motion and availability profiles of the devices making up the ad hoc network. This is the starting point for an investigation into a range of possible solutions whose essential features are controlled by the manner in which motion profiles are acquired and the extent to which such knowledge is available across an ad hoc network.

    Solving the single-track train scheduling problem via Deep Reinforcement Learning

    Every day, railways experience small inconveniences, both on the network and the fleet side, affecting the stability of rail traffic. When a disruption occurs, delays propagate through the network, resulting in demand mismatching and, in the long run, demand loss. When a critical situation arises, human dispatchers distributed over the line have the duty to do their best to minimize the impact of the disruptions. Unfortunately, human operators have a limited perception of how events in distant areas of the network may affect their control zone. In recent years, decision science has focused on developing methods to solve the problem automatically, to improve the capabilities of human operators. In this paper, machine learning-based methods are investigated for the train dispatching problem. In particular, two different Deep Q-Learning methods are proposed. Numerical results show the superiority of these techniques with respect to the classical linear Q-Learning based on matrices.
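
For readers unfamiliar with the matrix-based baseline mentioned above, the standard tabular Q-learning update can be sketched as follows. This is a generic illustration, not the paper's dispatching model: the state and action indices, the reward values, and the parameters `alpha` (learning rate) and `gamma` (discount factor) are all assumptions chosen for the example.

```python
def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning step in place and return the new value.

    q is a matrix (list of rows): q[state][action] estimates the long-run
    return of taking `action` in `state`.
    """
    best_next = max(q[next_state])              # greedy value of the next state
    td_target = reward + gamma * best_next      # one-step bootstrapped target
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Hypothetical toy problem: 3 states, 2 actions, all estimates start at zero.
q = [[0.0, 0.0] for _ in range(3)]

# Taking action 1 in state 0 yields reward 1.0 and leads to state 1.
print(q_update(q, state=0, action=1, reward=1.0, next_state=1))  # 0.5
```

The table needs one entry per state-action pair, which becomes impractical as the number of states grows. Deep Q-Learning methods replace the table with a neural network that approximates the same values.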

    Scalable directoryless shared memory coherence using execution migration

    We introduce the concept of deadlock-free migration-based coherent shared memory to the NUCA family of architectures. Migration-based architectures move threads among cores to guarantee sequential semantics in large multicores. Using an execution migration (EM) architecture, we achieve performance comparable to directory-based architectures without using directories: avoiding automatic data replication significantly reduces cache miss rates, while a fast network-level thread migration scheme takes advantage of shared data locality to reduce remote cache accesses that limit traditional NUCA performance. EM area and energy consumption are very competitive, and, on average, it outperforms a directory-based MOESI baseline by 6.8% and a traditional S-NUCA design by 9.2%. We argue that, with EM, scaling performance has much lower cost and design complexity than in directory-based coherence and traditional NUCA architectures: by merely scaling network bandwidth from 128 to 256 (512) bit flits, the performance of our architecture improves by an additional 8% (12%), while the baselines show negligible improvement.