Memory consistency directed cache coherence protocols for scalable multiprocessors
The memory consistency model, which formally specifies the behavior of the
memory system, is used by programmers to reason about parallel programs. From a
hardware design perspective, weaker consistency models permit various optimizations
in a multiprocessor system: this thesis focuses on designing and optimizing the cache
coherence protocol for a given target memory consistency model.
Traditional directory coherence protocols are designed to be compatible with the
strictest memory consistency model, sequential consistency (SC). When they are used
for chip multiprocessors (CMPs) that provide more relaxed memory consistency models,
such protocols turn out to be unnecessarily strict. This strictness usually comes at the
cost of scalability: sharer tracking requires per-core storage, which becomes a problem as
the number of cores grows in today's CMPs, most of which are no longer sequentially
consistent. The recent convergence towards programming-language-based relaxed
memory consistency models has sparked renewed interest in lazy cache coherence
protocols. These protocols exploit synchronization information by enforcing coherence
only at synchronization boundaries via self-invalidation. As a result, they do
not require sharer tracking, which benefits scalability. On the downside, such protocols
are only readily applicable to a restricted set of consistency models, such as Release
Consistency (RC), which expose synchronization information explicitly. In particular,
existing architectures with stricter consistency models (such as x86) cannot readily
make use of lazy coherence protocols without either adapting the protocol to satisfy
the stricter consistency model, or changing the architecture's consistency model to (a
variant of) RC, typically at the expense of backward compatibility. The first part of
this thesis explores both of these options, with a focus on a practical approach that
preserves backward compatibility.
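As a rough illustration of enforcing coherence only at synchronization boundaries, the Python sketch below models private L1 caches under a lazy, RC-style protocol. It is a simplified assumption, not the thesis's protocol: the class `LazyL1`, the write-back-on-release policy and the dict standing in for the shared level are all hypothetical. Note that the shared level keeps no sharer list; a stale copy is simply tolerated until its holder self-invalidates on an acquire.

```python
class LazyL1:
    """Hypothetical private L1 cache under a lazy, RC-style protocol (illustrative only)."""

    def __init__(self, shared_mem: dict):
        self.mem = shared_mem   # shared last-level storage: address -> value
        self.lines = {}         # locally cached copies: address -> value
        self.dirty = set()      # addresses written locally since the last release

    def load(self, addr):
        if addr not in self.lines:            # miss: fetch from the shared level
            self.lines[addr] = self.mem.get(addr, 0)
        return self.lines[addr]               # hit: may be stale until the next acquire

    def store(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)                  # no invalidations are sent to other cores

    def release(self):
        # Make all prior local writes visible before the release completes.
        for addr in self.dirty:
            self.mem[addr] = self.lines[addr]
        self.dirty.clear()

    def acquire(self):
        # Self-invalidate clean copies so later loads re-fetch released data.
        for addr in list(self.lines):
            if addr not in self.dirty:
                del self.lines[addr]


mem = {}
p0, p1 = LazyL1(mem), LazyL1(mem)
p1.load("data")           # p1 caches data == 0
p0.store("data", 42)
p0.release()              # p0 writes back at its release
stale = p1.load("data")   # 0: no invalidation reached p1
p1.acquire()              # p1 self-invalidates at its acquire
fresh = p1.load("data")   # 42
```

The design choice this highlights is the trade-off named above: the protocol saves sharer-tracking storage, but correctness now depends on acquires and releases being visible to the hardware.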
Because of the wide adoption of Total Store Order (TSO) and its variants in x86 and
SPARC processors, and existing parallel programs written for these architectures, we
first propose TSO-CC, a lazy cache coherence protocol for the TSO memory consistency
model. TSO-CC does not track sharers and instead relies on self-invalidation and
detection of potential acquires (in the absence of explicit synchronization) using
per-cache-line timestamps to efficiently and lazily satisfy the TSO memory consistency
model. Our results show that TSO-CC achieves, on average, performance comparable
to a MESI directory protocol, while TSO-CC’s storage overhead per cache line scales
logarithmically with increasing core count.
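To make "detection of potential acquires" with timestamps more concrete, here is a heavily simplified Python sketch of a check an L1 might perform when it fills a line written by another core. It is not TSO-CC's actual state machine: the names `TimestampL1`, `Line` and `last_seen`, and the rule "self-invalidate all clean shared copies when a newer write from some core is observed", are assumptions for illustration. The intuition is that under TSO a core's writes become visible in order, so observing a newer write from core *w* (for example a flag) may be an acquire, after which older cached copies must not be reused.

```python
from dataclasses import dataclass, field


@dataclass
class Line:
    value: int
    writer: int   # id of the core that last wrote the line
    ts: int       # that writer's timestamp when it performed the write


@dataclass
class TimestampL1:
    """Hypothetical L1 that treats a newer write from another core as a potential acquire."""
    core_id: int
    clock: int = 0                                  # local write timestamp
    last_seen: dict = field(default_factory=dict)   # writer id -> newest timestamp observed
    shared: dict = field(default_factory=dict)      # addr -> Line (clean, untracked copies)

    def write_stamp(self) -> tuple:
        """Stamp a local store with (writer id, timestamp)."""
        self.clock += 1
        return self.core_id, self.clock

    def on_fill(self, addr, line: Line) -> bool:
        """Install a line fetched on a miss; return True if it caused self-invalidation."""
        potential_acquire = (
            line.writer != self.core_id
            and line.ts > self.last_seen.get(line.writer, 0)
        )
        if potential_acquire:
            self.shared.clear()                     # drop possibly stale copies
            self.last_seen[line.writer] = line.ts
        self.shared[addr] = line
        return potential_acquire


writer, reader = TimestampL1(core_id=0), TimestampL1(core_id=1)
wid, t_data = writer.write_stamp()   # core 0 writes data (timestamp 1)
_, t_flag = writer.write_stamp()     # ... then writes flag (timestamp 2)
first = reader.on_fill("data", Line(42, wid, t_data))   # True: newer write from core 0
flag = reader.on_fill("flag", Line(1, wid, t_flag))     # True: flag write is newer still
again = reader.on_fill("data", Line(42, wid, t_data))   # False: nothing newer observed
```

Refilling `data` after seeing the flag does not trigger another self-invalidation, which is the kind of filtering that keeps such a scheme from invalidating on every miss.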
Next, we propose an approach for the x86-64 architecture, which is a compromise
between retaining the original consistency model and using a more storage-efficient
lazy coherence protocol. First, we propose a mechanism to convey synchronization
information via a simple ISA extension, while retaining backward compatibility with
legacy code and older microarchitectures. Second, we propose RC3 (based on TSO-CC),
a scalable cache coherence protocol for RCtso, the resulting memory consistency
model. RC3 does not track sharers and relies on self-invalidation on acquires. To
satisfy RCtso efficiently, the protocol reduces self-invalidations transitively using per-L1
timestamps only. RC3 outperforms a conventional lazy RC protocol by 12%, achieving
performance comparable to a MESI directory protocol for RC-optimized programs.
RC3’s storage overhead per cache line scales logarithmically with increasing core count
and reduces on-chip coherence storage overheads by 45% compared to TSO-CC.
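For intuition about the logarithmic scaling claim, the Python arithmetic below compares two kinds of per-line metadata: a full-map sharer vector (one presence bit per core, as in a conventional directory) and a single core identifier, which needs ⌈log2 N⌉ bits. This is a simplifying assumption: it ignores state bits, the timestamps that TSO-CC and RC3 actually keep, and directory organization, so it shows only linear versus logarithmic growth and does not reproduce the 45% figure reported above.

```python
import math


def full_map_bits(cores: int) -> int:
    """Per-line bits for a full-map sharer vector: one presence bit per core."""
    return cores


def core_id_bits(cores: int) -> int:
    """Per-line bits for a single core identifier (e.g. an owner or last writer)."""
    return max(1, math.ceil(math.log2(cores)))


for n in (8, 16, 32, 64, 128):
    print(f"{n:4d} cores: sharer vector {full_map_bits(n):4d} bits, "
          f"core id {core_id_bits(n):2d} bits")
```

At 128 cores the sharer vector needs 128 bits per line, while a core identifier needs only 7.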
Finally, it is imperative that hardware adheres to the promised memory consistency
model. However, consistency-directed coherence protocols cannot be verified against
conventional coherence definitions (e.g. SWMR, the single-writer/multiple-reader
invariant), and few existing verification methodologies apply. Furthermore, because
the full consistency model serves as the specification, the protocols' interaction with
other components of a system (e.g. the pipeline) must not be neglected in the
verification process. Verifying a system with such protocols in the context of its
interacting components is therefore even more important than before. One common
way to do this is by executing tests: specific threads of instruction sequences are
generated, and their executions are checked for adherence to the consistency model.
It would be extremely beneficial to execute such tests under simulation, i.e. while the
functional design implementation of the hardware is being prototyped. Most prior
verification methodologies, however, target post-silicon environments and would be too
slow if used for simulation-based memory consistency verification.
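A toy version of this test-based checking is sketched below in Python. It exhaustively enumerates the final register states of the classic store-buffering (SB) litmus test under SC and under a simple TSO model with per-thread FIFO store buffers, then checks an observed outcome against the allowed set. The program encoding, the function names `outcomes` and `allowed`, and the store-buffer model are illustrative assumptions, not McVerSi's checker; exhaustive enumeration like this is only feasible for tiny tests.

```python
from typing import FrozenSet, List, Tuple

Op = Tuple[str, str, object]   # ("st", var, value) or ("ld", var, register)
Outcome = Tuple[Tuple[str, int], ...]

# Store-buffering (SB) litmus test: each thread stores to one variable, then loads the other.
SB: List[List[Op]] = [
    [("st", "x", 1), ("ld", "y", "r0")],   # thread 0
    [("st", "y", 1), ("ld", "x", "r1")],   # thread 1
]


def outcomes(prog: List[List[Op]], tso: bool) -> FrozenSet[Outcome]:
    """Enumerate final register values; with tso=True each thread gets a FIFO store buffer."""
    results = set()

    def explore(pcs, mem, bufs, regs):
        if all(pc == len(t) for pc, t in zip(pcs, prog)) and not any(bufs):
            results.add(tuple(sorted(regs.items())))
            return
        for tid, thread in enumerate(prog):
            # Step 1: drain this thread's oldest buffered store to memory.
            if bufs[tid]:
                (var, val), rest = bufs[tid][0], bufs[tid][1:]
                explore(pcs, {**mem, var: val}, bufs[:tid] + (rest,) + bufs[tid + 1:], regs)
            # Step 2: execute this thread's next instruction.
            if pcs[tid] < len(thread):
                kind, var, arg = thread[pcs[tid]]
                nxt = pcs[:tid] + (pcs[tid] + 1,) + pcs[tid + 1:]
                if kind == "st" and tso:      # buffered: visible to others only after draining
                    buf = bufs[tid] + ((var, arg),)
                    explore(nxt, mem, bufs[:tid] + (buf,) + bufs[tid + 1:], regs)
                elif kind == "st":            # SC: immediately visible
                    explore(nxt, {**mem, var: arg}, bufs, regs)
                else:                         # load: forward from own buffer, else memory
                    fwd = [v for (w, v) in bufs[tid] if w == var]
                    val = fwd[-1] if fwd else mem.get(var, 0)
                    explore(nxt, mem, bufs, {**regs, arg: val})

    explore(tuple(0 for _ in prog), {}, tuple(() for _ in prog), {})
    return frozenset(results)


def allowed(observed: dict, prog: List[List[Op]], tso: bool) -> bool:
    """Check an observed execution's final registers against the model's allowed outcomes."""
    return tuple(sorted(observed.items())) in outcomes(prog, tso)


relaxed = {"r0": 0, "r1": 0}
print(allowed(relaxed, SB, tso=False))   # False: forbidden under SC
print(allowed(relaxed, SB, tso=True))    # True: allowed under TSO (store buffering)
```

The outcome r0 = r1 = 0 is exactly the one that separates TSO from SC, so observing it on an SC system (or a forbidden outcome on a TSO system) would flag a bug.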
We propose McVerSi, a test generation framework for fast memory consistency
verification of a full-system design implementation under simulation. Our primary
contribution is a Genetic Programming (GP) based approach to memory consistency test
generation, which relies on a novel crossover function that prioritizes memory operations
contributing to non-determinism, thereby increasing the probability of uncovering
memory consistency bugs. To guide tests towards exercising as much logic as possible,
the simulator’s reported coverage is used as the fitness function. Furthermore, we
increase test throughput by making the test workload simulation-aware. We evaluate
our proposed framework using the Gem5 cycle-accurate simulator in full-system mode
with Ruby, in configurations that use either Gem5's MESI protocol or our proposed
TSO-CC, together with an out-of-order pipeline. We discover 2 new bugs in the MESI
protocol due to the faulty interaction of the pipeline and the cache coherence protocol,
highlighting that even conventional protocols should be verified rigorously in the
context of a full-system. Crucially, these bugs would not have been discovered through
individual verification of the pipeline or the coherence protocol. We study 11 bugs
in total. Our GP-based test generation approach finds all of them consistently,
providing much stronger guarantees than the alternative approaches (pseudo-random
test generation and litmus tests).
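The abstract does not spell out McVerSi's crossover function, so the following Python sketch only illustrates the stated idea of prioritizing memory operations that contribute to non-determinism. A test is modelled as a flat list of hypothetical `(thread, kind, address)` tuples, and an operation counts as "racy" if its address is accessed by at least two threads with at least one store. The representation, the position-wise scheme and the `keep_prob` parameter are assumptions, and the coverage-based fitness function is not modelled.

```python
import random
from typing import List, Set, Tuple

MemOp = Tuple[int, str, int]   # (thread id, "ld" or "st", address)


def racy_addresses(test: List[MemOp]) -> Set[int]:
    """Addresses accessed by at least two threads, with at least one store."""
    accessors, written = {}, set()
    for tid, kind, addr in test:
        accessors.setdefault(addr, set()).add(tid)
        if kind == "st":
            written.add(addr)
    return {a for a, tids in accessors.items() if len(tids) > 1 and a in written}


def crossover(p1: List[MemOp], p2: List[MemOp], rng: random.Random,
              keep_prob: float = 0.9) -> List[MemOp]:
    """Hypothetical position-wise crossover biased towards potentially racing ops."""
    racy = racy_addresses(p1) | racy_addresses(p2)
    child = []
    for a, b in zip(p1, p2):
        a_racy, b_racy = a[2] in racy, b[2] in racy
        if a_racy != b_racy:
            # Exactly one candidate may contribute to non-determinism: usually keep it.
            preferred, other = (a, b) if a_racy else (b, a)
            child.append(preferred if rng.random() < keep_prob else other)
        else:
            child.append(a if rng.random() < 0.5 else b)   # otherwise pick uniformly
    return child


p1 = [(0, "st", 0), (1, "ld", 0), (0, "ld", 5)]
p2 = [(0, "ld", 7), (1, "st", 8), (1, "st", 0)]
child = crossover(p1, p2, random.Random(1), keep_prob=1.0)
print(child)   # [(0, 'st', 0), (1, 'ld', 0), (1, 'st', 0)]
```

With `keep_prob=1.0` the choice is deterministic: every operation on the shared address 0 survives into the child, concentrating accesses where different interleavings can produce different results.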
Verification of a lazy cache coherence protocol against a weak memory model
In this paper, we verify a modern lazy cache coherence protocol, TSO-CC,
against the memory consistency model it was designed for, TSO. We achieve this
by first showing a weak simulation relation between TSO-CC (with a fixed number
of processors) and a novel finite-state operational model which exhibits the
laziness of TSO-CC and satisfies TSO. We then extend this by an existing
parameterisation technique, allowing verification for an unlimited number of
processors. The approach is executed entirely within a model checker: no
external tool is required, and the verifier needs very little in-depth knowledge of
formal verification methods.
Comment: 10 pages
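To illustrate what a weak simulation relation means (not how the paper encodes it in a model checker), the Python sketch below computes the greatest weak simulation between two small, explicit labelled transition systems. Internal steps are labelled `tau` and may be matched by zero or more internal steps of the specification, which is what allows a lazy implementation with extra internal bookkeeping to still be simulated by a more abstract model. The state names and the `read`/`write` labels are hypothetical.

```python
from typing import Dict, List, Set, Tuple

LTS = Dict[str, List[Tuple[str, str]]]   # state -> [(label, successor)]
TAU = "tau"                              # internal, unobservable step


def tau_closure(lts: LTS, start: str) -> Set[str]:
    """States reachable from `start` via zero or more tau steps."""
    seen, stack = {start}, [start]
    while stack:
        for label, nxt in lts.get(stack.pop(), []):
            if label == TAU and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def weak_successors(lts: LTS, s: str, label: str) -> Set[str]:
    """Targets of s ==label==>: tau* label tau* for visible labels, tau* for tau."""
    before = tau_closure(lts, s)
    if label == TAU:
        return before
    result = set()
    for b in before:
        for lab, nxt in lts.get(b, []):
            if lab == label:
                result |= tau_closure(lts, nxt)
    return result


def states(lts: LTS) -> Set[str]:
    return set(lts) | {n for edges in lts.values() for _, n in edges}


def weakly_simulated(impl: LTS, i0: str, spec: LTS, s0: str) -> bool:
    """Greatest-fixpoint check that `spec` (from s0) weakly simulates `impl` (from i0)."""
    rel = {(p, q) for p in states(impl) | {i0} for q in states(spec) | {s0}}
    changed = True
    while changed:
        changed = False
        for p, q in list(rel):
            for label, p2 in impl.get(p, []):
                if not any((p2, q2) in rel for q2 in weak_successors(spec, q, label)):
                    rel.discard((p, q))   # q cannot match this step of p
                    changed = True
                    break
    return (i0, s0) in rel


# Hypothetical example: a "lazy" implementation with internal (tau) steps vs. a spec.
spec = {"s0": [("write", "s1")], "s1": [("read", "s2")]}
lazy = {"i0": [(TAU, "i1")], "i1": [("write", "i2")], "i2": [(TAU, "i3")], "i3": [("read", "i4")]}
eager_read = {"j0": [("read", "j1")]}
print(weakly_simulated(lazy, "i0", spec, "s0"))        # True
print(weakly_simulated(eager_read, "j0", spec, "s0"))  # False
```

Explicit fixpoints like this only work for small, fixed state spaces; that limitation is why a parameterisation technique is needed to cover an unlimited number of processors.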
Business consulting – grupo Flesan del Perú
This work presents a business consultancy for the Flesan del Perú group, a company that
provides services related to the construction of projects and whose objective is to
consolidate its new business units and expand its client portfolio. The consultancy
evaluated the company's business model, its current mission and vision, and the internal
and external context of the organization. Its objective was to identify the company's main
problem, starting from a diagnosis of the organization's different areas, in order to find
the gaps that prevent or hinder it from achieving its objectives. From this diagnosis, the
causes of the main problem were identified, and action plans were proposed to overcome
them. The evaluations showed that the main problem is related to the weakness of the
company's processes, specifically in the supply area. Improving this area was also of
great interest to the organization, because it supports all of its business units, so
strengthening it would impact the entire organization. After several meetings with the
company's team, and an evaluation of each of the proposed solution alternatives, the most
viable alternative for solving the main problem was found to be redesigning the supply
process and implementing the Vendor Managed Inventory (VMI) methodology, which involves
collaborative engagement for the benefit of all members of the supply chain.