117 research outputs found
NOC-Out: Microarchitecting a Scale-Out Processor
Scale-out server workloads benefit from many-core processor organizations that enable high throughput thanks to abundant request-level parallelism. A key characteristic of these workloads is the large instruction footprint that exceeds the capacity of private caches. While a shared last-level cache (LLC) can capture the instruction working set, it necessitates a low-latency interconnect fabric to minimize the core stall time on instruction fetches serviced by the LLC. Many-core processors with a mesh interconnect sacrifice performance on scale-out workloads due to NOC-induced delays. Low-diameter topologies can overcome the performance limitations of meshes through rich inter-node connectivity, but at a high area expense. To address the drawbacks of existing designs, this work introduces NOC-Out, a many-core processor organization that affords low LLC access delays at a small area cost. NOC-Out is tuned to accommodate the bilateral core-to-cache access pattern, characterized by minimal coherence activity and lack of inter-core communication, that is dominant in scale-out workloads. Optimizing for the bilateral access pattern, NOC-Out segregates cores and LLC banks into distinct network regions and reduces costly network connectivity by eliminating the majority of inter-core links. NOC-Out further simplifies the interconnect through the use of low-complexity tree-based topologies. A detailed evaluation targeting a 64-core CMP and a set of scale-out workloads reveals that NOC-Out improves system performance by 17% and reduces network area by 28% over a tiled mesh-based design. Compared to a design with a richly-connected flattened butterfly topology, NOC-Out reduces network area by 9x while matching the performance.
Time-of-arrival formalism for the relativistic particle
A suitable operator for the time-of-arrival at a detector is defined for the
free relativistic particle in 3+1 dimensions. For each detector position, there
exists a subspace of detected states in the Hilbert space of solutions to the
Klein Gordon equation. Orthogonality and completeness of the eigenfunctions of
the time-of-arrival operator apply inside this subspace, opening up a standard
probabilistic interpretation. Comment: 16 pages, no figures, uses LaTeX. The section "Interpretation" has
been completely rewritten and some errors corrected
Measurement of Time-of-Arrival in Quantum Mechanics
It is argued that the time-of-arrival cannot be precisely defined and
measured in quantum mechanics. By constructing explicit toy models of a
measurement, we show that for a free particle it cannot be measured more
accurately than $\hbar/E_k$, where $E_k$ is the initial kinetic
energy of the particle. With a better accuracy, particles reflect off the
measuring device, and the resulting probability distribution becomes distorted.
It is shown that a time-of-arrival operator cannot exist, and that approximate
time-of-arrival operators do not correspond to the measurements considered
here. Comment: References added. To appear in Phys. Rev.
FADE: A programmable filtering accelerator for instruction-grain monitoring
Instruction-grain monitoring is a powerful approach that enables a wide spectrum of bug-finding tools. As existing software approaches incur prohibitive runtime overhead, researchers have focused on hardware support for instruction-grain monitoring. A recurring theme in recent work is the use of hardware-assisted filtering so as to elide costly software analysis. This work generalizes and extends prior point solutions into a programmable filtering accelerator affording vast flexibility and at-speed event filtering. The pipelined microarchitecture of the accelerator affords a peak filtering rate of one application event per cycle, which suffices to keep up with an aggressive OoO core running the monitored application. A unique feature of the proposed design is the ability to dynamically resolve dependencies between unfilterable events and subsequent events, eliminating data-dependent stalls and maximizing the accelerator's performance. Our evaluation results show a monitoring slowdown of just 1.2-1.8x across a diverse set of monitoring tools.
Fat Caches for Scale-Out Servers
The authors propose a high-capacity cache architecture that leverages emerging high-bandwidth memory modules. High-capacity caches capture the secondary data working sets of scale-out workloads while uncovering significant spatiotemporal locality across data objects. Unlike state-of-the-art DRAM caches employing in-memory block-level metadata, the proposed cache is organized in pages, enabling a practical tag array, which can be implemented in the logic die of the high-bandwidth memory modules.
Time-of-Arrival States
Although one can show formally that a time-of-arrival operator cannot exist,
one can modify the low momentum behaviour of the operator slightly so that it
is self-adjoint. We show that such a modification results in the difficulty
that the eigenstates are drastically altered. In an eigenstate of the modified
time-of-arrival operator, the particle, at the predicted time-of-arrival, is
found far away from the point of arrival with probability 1/2. Comment: 15 pages, 2 figures
Relational evolution of the degrees of freedom of generally covariant quantum theories
We study the classical and quantum dynamics of generally covariant theories
with a vanishing Hamiltonian and with a finite number of degrees of freedom. In
particular, the geometric meaning of the full solution of the relational
evolution of the degrees of freedom is displayed, which means the determination
of the total number of evolving constants of motion required. Also a method to
find evolving constants is proposed. The generalized Heisenberg picture needs
M time variables, as opposed to the Heisenberg picture of standard quantum
mechanics where one time variable t is enough. As an application, we study the
parameterized harmonic oscillator and the SL(2,R) model with one physical
degree of freedom that mimics the constraint structure of general relativity
where a Schrodinger equation emerges in its quantum dynamics. Comment: 25 pages, no figures, LaTeX file. Revised version
Bohmian arrival time without trajectories
The computation of detection probabilities and arrival time distributions
within Bohmian mechanics in general needs the explicit knowledge of a relevant
sample of trajectories. Here it is shown how for one-dimensional systems and
rigid inertial detectors these quantities can be computed without calculating
any trajectories. An expression in terms of the wave function and its spatial
derivative, both restricted to the boundary of the detector's spacetime volume,
is derived for the general case, where the probability current at the
detector's boundary may vary its sign. Comment: 20 pages, 12 figures; v2: reference added, extended introduction,
published version
CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers
Manycore chips are emerging as the architecture of choice to provide power efficiency and improve performance, while riding Moore's Law. In these architectures, on-chip interconnects play a pivotal role in ensuring power and performance scalability. As supply voltages begin to level off in future technologies, chip designs in general and interconnects in particular will require specialization to meet power and performance objectives. In this work, we make the observation that cache-coherent manycore server chips exhibit a duality in on-chip network traffic. Request traffic largely consists of simple control messages, while response traffic often carries cache-block-sized payloads. We present Cache-Coherence Network-on-Chip (CCNoC), a design that specializes the NoC to fit the demands of server workloads via a pair of asymmetric networks tuned to the type of traffic traversing them. The networks differ in their datapath width, router microarchitecture, flow control strategy, and delay. The resulting heterogeneous CCNoC architecture enables significant gains in power efficiency over conventional NoC designs at similar performance levels. Our evaluation reveals that a 4x4 mesh-based chip multiprocessor with the proposed CCNoC organization running commercial server workloads is 15-28% more energy efficient than various state-of-the-art single- and dual-network organizations.
Relational time in generally covariant quantum systems: four models
We analyze the relational quantum evolution of generally covariant systems in
terms of Rovelli's evolving constants of motion and the generalized Heisenberg
picture. In order to have a well defined evolution, and a consistent quantum
theory, evolving constants must be self-adjoint operators. We show that this
condition imposes strong restrictions on the choices of the clock variables. We
analyze four cases. The first one is non-relativistic quantum mechanics in
parametrized form. We show that, for the free particle case, the standard
choice of time is the only one leading to self-adjoint evolving constants.
Secondly, we study the relativistic case. We show that the resulting quantum
theory is the free particle representation of the Klein Gordon equation in
which the position is a perfectly well defined quantum observable. The
admissible choices of clock variables are the ones leading to space-like
simultaneity surfaces. In order to mimic the structure of General Relativity we
study the SL(2,R) model with two Hamiltonian constraints. The evolving constants
depend in this case on three independent variables. We show that it is possible
to find clock variables and inner products leading to a consistent quantum
theory. Finally, we discuss the quantization of a constrained model having a
compact constraint surface. All the models considered may be consistently
quantized, although some of them do not admit any time choice such that the
equal time surfaces are transversal to the orbits. Comment: 18 pages, revtex file