117 research outputs found

    NOC-Out: Microarchitecting a Scale-Out Processor

    Scale-out server workloads benefit from many-core processor organizations that enable high throughput thanks to abundant request-level parallelism. A key characteristic of these workloads is the large instruction footprint that exceeds the capacity of private caches. While a shared last-level cache (LLC) can capture the instruction working set, it necessitates a low-latency interconnect fabric to minimize the core stall time on instruction fetches serviced by the LLC. Many-core processors with a mesh interconnect sacrifice performance on scale-out workloads due to NOC-induced delays. Low-diameter topologies can overcome the performance limitations of meshes through rich inter-node connectivity, but at a high area expense. To address the drawbacks of existing designs, this work introduces NOC-Out, a many-core processor organization that affords low LLC access delays at a small area cost. NOC-Out is tuned to accommodate the bilateral core-to-cache access pattern, characterized by minimal coherence activity and lack of inter-core communication, that is dominant in scale-out workloads. Optimizing for the bilateral access pattern, NOC-Out segregates cores and LLC banks into distinct network regions and reduces costly network connectivity by eliminating the majority of inter-core links. NOC-Out further simplifies the interconnect through the use of low-complexity tree-based topologies. A detailed evaluation targeting a 64-core CMP and a set of scale-out workloads reveals that NOC-Out improves system performance by 17% and reduces network area by 28% over a tiled mesh-based design. Compared to a design with a richly connected flattened butterfly topology, NOC-Out reduces network area by 9x while matching the performance.

    Time-of-arrival formalism for the relativistic particle

    A suitable operator for the time-of-arrival at a detector is defined for the free relativistic particle in 3+1 dimensions. For each detector position, there exists a subspace of detected states in the Hilbert space of solutions to the Klein-Gordon equation. Orthogonality and completeness of the eigenfunctions of the time-of-arrival operator apply inside this subspace, opening up a standard probabilistic interpretation. Comment: 16 pages, no figures, uses LaTeX. The section "Interpretation" has been completely rewritten and some errors corrected.

    Measurement of Time-of-Arrival in Quantum Mechanics

    It is argued that the time-of-arrival cannot be precisely defined and measured in quantum mechanics. By constructing explicit toy models of a measurement, we show that for a free particle it cannot be measured more accurately than $\Delta t_A \sim 1/E_k$, where $E_k$ is the initial kinetic energy of the particle. With a better accuracy, particles reflect off the measuring device, and the resulting probability distribution becomes distorted. It is shown that a time-of-arrival operator cannot exist, and that approximate time-of-arrival operators do not correspond to the measurements considered here. Comment: References added. To appear in Phys. Rev.

    FADE: A programmable filtering accelerator for instruction-grain monitoring

    Instruction-grain monitoring is a powerful approach that enables a wide spectrum of bug-finding tools. As existing software approaches incur prohibitive runtime overhead, researchers have focused on hardware support for instruction-grain monitoring. A recurring theme in recent work is the use of hardware-assisted filtering so as to elide costly software analysis. This work generalizes and extends prior point solutions into a programmable filtering accelerator affording vast flexibility and at-speed event filtering. The pipelined microarchitecture of the accelerator affords a peak filtering rate of one application event per cycle, which suffices to keep up with an aggressive OoO core running the monitored application. A unique feature of the proposed design is the ability to dynamically resolve dependencies between unfilterable events and subsequent events, eliminating data-dependent stalls and maximizing the accelerator's performance. Our evaluation results show a monitoring slowdown of just 1.2-1.8x across a diverse set of monitoring tools.

    Fat Caches for Scale-Out Servers

    The authors propose a high-capacity cache architecture that leverages emerging high-bandwidth memory modules. High-capacity caches capture the secondary data working sets of scale-out workloads while uncovering significant spatiotemporal locality across data objects. Unlike state-of-the-art DRAM caches employing in-memory block-level metadata, the proposed cache is organized in pages, enabling a practical tag array, which can be implemented in the logic die of the high-bandwidth memory modules.

    Time-of-Arrival States

    Although one can show formally that a time-of-arrival operator cannot exist, one can modify the low-momentum behaviour of the operator slightly so that it is self-adjoint. We show that such a modification results in the difficulty that the eigenstates are drastically altered. In an eigenstate of the modified time-of-arrival operator, the particle, at the predicted time-of-arrival, is found far away from the point of arrival with probability 1/2. Comment: 15 pages, 2 figures.

    Relational evolution of the degrees of freedom of generally covariant quantum theories

    We study the classical and quantum dynamics of generally covariant theories with a vanishing Hamiltonian and with a finite number of degrees of freedom. In particular, the geometric meaning of the full solution of the relational evolution of the degrees of freedom is displayed, which means the determination of the total number of evolving constants of motion required. Also, a method to find evolving constants is proposed. The generalized Heisenberg picture needs M time variables, as opposed to the Heisenberg picture of standard quantum mechanics, where one time variable t is enough. As an application, we study the parameterized harmonic oscillator and the SL(2,R) model with one physical degree of freedom that mimics the constraint structure of general relativity, where a Schrödinger equation emerges in its quantum dynamics. Comment: 25 pages, no figures, LaTeX file. Revised version.

    Bohmian arrival time without trajectories

    The computation of detection probabilities and arrival time distributions within Bohmian mechanics in general needs the explicit knowledge of a relevant sample of trajectories. Here it is shown how, for one-dimensional systems and rigid inertial detectors, these quantities can be computed without calculating any trajectories. An expression in terms of the wave function and its spatial derivative, both restricted to the boundary of the detector's spacetime volume, is derived for the general case, where the probability current at the detector's boundary may vary its sign. Comment: 20 pages, 12 figures; v2: reference added, extended introduction, published version.

    CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers

    Manycore chips are emerging as the architecture of choice to provide power efficiency and improve performance, while riding Moore's Law. In these architectures, on-chip interconnects play a pivotal role in ensuring power and performance scalability. As supply voltages begin to level off in future technologies, chip designs in general and interconnects in particular will require specialization to meet power and performance objectives. In this work, we make the observation that cache-coherent manycore server chips exhibit a duality in on-chip network traffic. Request traffic largely consists of simple control messages, while response traffic often carries cache-block-sized payloads. We present Cache-Coherence Network-on-Chip (CCNoC), a design that specializes the NoC to fit the demands of server workloads via a pair of asymmetric networks tuned to the type of traffic traversing them. The networks differ in their datapath width, router microarchitecture, flow control strategy, and delay. The resulting heterogeneous CCNoC architecture enables significant gains in power efficiency over conventional NoC designs at similar performance levels. Our evaluation reveals that a 4x4 mesh-based chip multiprocessor with the proposed CCNoC organization running commercial server workloads is 15-28% more energy efficient than various state-of-the-art single- and dual-network organizations.

    Relational time in generally covariant quantum systems: four models

    We analyze the relational quantum evolution of generally covariant systems in terms of Rovelli's evolving constants of motion and the generalized Heisenberg picture. In order to have a well-defined evolution, and a consistent quantum theory, evolving constants must be self-adjoint operators. We show that this condition imposes strong restrictions on the choices of the clock variables. We analyze four cases. The first one is non-relativistic quantum mechanics in parametrized form. We show that, for the free particle case, the standard choice of time is the only one leading to self-adjoint evolving constants. Secondly, we study the relativistic case. We show that the resulting quantum theory is the free particle representation of the Klein-Gordon equation in which the position is a perfectly well-defined quantum observable. The admissible choices of clock variables are the ones leading to space-like simultaneity surfaces. In order to mimic the structure of General Relativity, we study the SL(2,R) model with two Hamiltonian constraints. The evolving constants depend in this case on three independent variables. We show that it is possible to find clock variables and inner products leading to a consistent quantum theory. Finally, we discuss the quantization of a constrained model having a compact constraint surface. All the models considered may be consistently quantized, although some of them do not admit any time choice such that the equal time surfaces are transversal to the orbits. Comment: 18 pages, revtex file.