196 research outputs found
Processing techniques development, volume 3. Part 2: Data preprocessing and information extraction techniques
There are no author-identified significant results in this report
Memory consistency directed cache coherence protocols for scalable multiprocessors
The memory consistency model, which formally specifies the behavior of the
memory system, is used by programmers to reason about parallel programs. From a
hardware design perspective, weaker consistency models permit various optimizations
in a multiprocessor system: this thesis focuses on designing and optimizing the cache
coherence protocol for a given target memory consistency model.
Traditional directory coherence protocols are designed to be compatible with the
strictest memory consistency model, sequential consistency (SC). When they are used
for chip multiprocessors (CMPs) that provide more relaxed memory consistency models,
such protocols turn out to be unnecessarily strict. Usually, this comes at the cost of
scalability, in terms of per-core storage due to sharer tracking, which poses a problem
with increasing number of cores in todayâs CMPs, most of which no longer are sequentially
consistent. The recent convergence towards programming language based relaxed
memory consistency models has sparked renewed interest in lazy cache coherence
protocols. These protocols exploit synchronization information by enforcing coherence
only at synchronization boundaries via self-invalidation. As a result, such protocols do
not require sharer tracking which benefits scalability. On the downside, such protocols
are only readily applicable to a restricted set of consistency models, such as Release
Consistency (RC), which expose synchronization information explicitly. In particular,
existing architectures with stricter consistency models (such as x86) cannot readily
make use of lazy coherence protocols without either: adapting the protocol to satisfy
the stricter consistency model; or changing the architectureâs consistency model to (a
variant of) RC, typically at the expense of backward compatibility. The first part of
this thesis explores both these options, with a focus on a practical approach satisfying
backward compatibility.
Because of the wide adoption of Total Store Order (TSO) and its variants in x86 and
SPARC processors, and existing parallel programs written for these architectures, we
first propose TSO-CC, a lazy cache coherence protocol for the TSO memory consistency
model. TSO-CC does not track sharers and instead relies on self-invalidation and
detection of potential acquires (in the absence of explicit synchronization) using per
cache line timestamps to efficiently and lazily satisfy the TSO memory consistency
model. Our results show that TSO-CC achieves, on average, performance comparable
to a MESI directory protocol, while TSO-CCâs storage overhead per cache line scales
logarithmically with increasing core count.
Next, we propose an approach for the x86-64 architecture, which is a compromise
between retaining the original consistency model and using a more storage efficient
lazy coherence protocol. First, we propose a mechanism to convey synchronization
information via a simple ISA extension, while retaining backward compatibility with
legacy codes and older microarchitectures. Second, we propose RC3 (based on TSOCC),
a scalable cache coherence protocol for RCtso, the resulting memory consistency
model. RC3 does not track sharers and relies on self-invalidation on acquires. To
satisfy RCtso efficiently, the protocol reduces self-invalidations transitively using per-L1
timestamps only. RC3 outperforms a conventional lazy RC protocol by 12%, achieving
performance comparable to a MESI directory protocol for RC optimized programs.
RC3âs storage overhead per cache line scales logarithmically with increasing core count
and reduces on-chip coherence storage overheads by 45% compared to TSO-CC.
Finally, it is imperative that hardware adheres to the promised memory consistency
model. Indeed, consistency directed coherence protocols cannot use conventional coherence
definitions (e.g. SWMR) to be verified against, and few existing verification
methodologies apply. Furthermore, as the full consistency model is used as a specification,
their interaction with other components (e.g. pipeline) of a system must not be
neglected in the verification process. Therefore, verifying a system with such protocols
in the context of interacting components is even more important than before. One
common way to do this is via executing tests, where specific threads of instruction
sequences are generated and their executions are checked for adherence to the consistency
model. It would be extremely beneficial to execute such tests under simulation,
i.e. when the functional design implementation of the hardware is being prototyped.
Most prior verification methodologies, however, target post-silicon environments, which
when used for simulation-based memory consistency verification would be too slow.
We propose McVerSi, a test generation framework for fast memory consistency
verification of a full-system design implementation under simulation. Our primary
contribution is a Genetic Programming (GP) based approach to memory consistency test
generation, which relies on a novel crossover function that prioritizes memory operations
contributing to non-determinism, thereby increasing the probability of uncovering
memory consistency bugs. To guide tests towards exercising as much logic as possible,
the simulatorâs reported coverage is used as the fitness function. Furthermore, we
increase test throughput by making the test workload simulation-aware. We evaluate
our proposed framework using the Gem5 cycle accurate simulator in full-system mode
with Ruby (with configurations that use Gem5âs MESI protocol, and our proposed
TSO-CC together with an out-of-order pipeline). We discover 2 new bugs in the MESI
protocol due to the faulty interaction of the pipeline and the cache coherence protocol,
highlighting that even conventional protocols should be verified rigorously in the
context of a full-system. Crucially, these bugs would not have been discovered through
individual verification of the pipeline or the coherence protocol. We study 11 bugs
in total. Our GP-based test generation approach finds all bugs consistently, therefore
providing much higher guarantees compared to alternative approaches (pseudo-random
test generation and litmus tests)
A contribution to characterizing and calibrating the pointing control system of the SOFIA telescope
SOFIA, the Stratospheric Observatory for Infrared Astronomy, is an airborne observatory that will study the universe in the infrared spectrum. A Boeing 747-SP aircraft will carry a 2.5 m telescope designed to make sensitive infrared measurements of a wide range of astronomical objects. It will fly at and above 12 km, where the telescope collects radiation in the waveÆ length range from 0.3 micrometers to 1.6 millimeters of the electromagnetic spectrum. During flight, a door will be opened to allow clear optical observations from the cavity environment where the telescope is mounted. The telescope pointing control is achieved during science observations by an array of sensors including three imagers, gyroscopes and accelerometers. In addition, throughout alignment and calibration of the telescope assembly, the High-speed Imaging Photometer for Occultation (HIPO) is used as a reference instrument. A theoretical concept has been developed to compensate the perturbations in the airborne environment and to correct them within the attitude control loop. A set of Cartesian reference frames is established to describe and manipulate the orientations of the various subsystems, sensor and pointing orientations.thesi
Low-Latency Sliding Window Algorithms for Formal Languages
Low-latency sliding window algorithms for regular and context-free languages are studied, where latency refers to the worst-case time spent for a single window update or query. For every regular language L it is shown that there exists a constant-latency solution that supports adding and removing symbols independently on both ends of the window (the so-called two-way variable-size model). We prove that this result extends to all visibly pushdown languages. For deterministic 1-counter languages we present a ?(log n) latency sliding window algorithm for the two-way variable-size model where n refers to the window size. We complement these results with a conditional lower bound: there exists a fixed real-time deterministic context-free language L such that, assuming the OMV (online matrix vector multiplication) conjecture, there is no sliding window algorithm for L with latency n^(1/2-?) for any ? > 0, even in the most restricted sliding window model (one-way fixed-size model). The above mentioned results all refer to the unit-cost RAM model with logarithmic word size. For regular languages we also present a refined picture using word sizes ?(1), ?(log log n), and ?(log n)
A Model-Based Development and Verification Framework for Distributed System-on-Chip Architecture
The capabilities and thus, design complexity of VLSI-based embedded systems have increased tremendously in recent years, riding the wave of Mooreâs law. The time-to-market requirements are also shrinking, imposing challenges to the designers, which in turn, seek to adopt new design methods to increase their productivity. As an answer to these new pressures, modern day systems have moved towards on-chip multiprocessing technologies. New architectures have emerged in on-chip multiprocessing in order to utilize the tremendous advances of fabrication technology.
Platform-based design is a possible solution in addressing these challenges. The principle behind the approach is to separate the functionality of an application from the organization and communication architecture of hardware platform at several levels of abstraction. The existing design methodologies pertaining to platform-based design approach donât provide full automation at every level of the design processes, and sometimes, the co-design of platform-based systems lead to sub-optimal systems. In addition, the design productivity gap in multiprocessor systems remain a key challenge due to existing design methodologies.
This thesis addresses the aforementioned challenges and discusses the creation of a development framework for a platform-based system design, in the context of the SegBus platform - a distributed communication architecture. This research aims to provide automated procedures for platform design and application mapping. Structural verification support is also featured thus ensuring correct-by-design platforms. The solution is based on a model-based process. Both the platform and the application are modeled using the Unified Modeling Language. This thesis develops a Domain Specific Language to support platform modeling based on a corresponding UML profile. Object Constraint Language constraints are used to support structurally correct platform construction. An emulator is thus introduced to allow as much as possible accurate performance estimation of the solution, at high abstraction levels. VHDL code is automatically generated, in the form of âsnippetsâ to be employed in the arbiter modules of the platform, as required by the application. The resulting framework is applied in building an actual design solution for an MP3 stereo audio decoder application.Siirretty Doriast
The Application of Hybridized Genetic Algorithms to the Protein Folding Problem
The protein folding problem consists of attempting to determine the native conformation of a protein given its primary structure. This study examines various methods of hybridizing a genetic algorithm implementation in order to minimize an energy function and predict the conformation (structure) of Met-enkephalin. Genetic Algorithms are semi-optimal algorithms designed to explore and exploit a search space. The genetic algorithm uses selection, recombination, and mutation operators on populations of strings which represent possible solutions to the given problem. One step in solving the protein folding problem is the design of efficient energy minimization techniques. A conjugate gradient minimization technique is described and tested with different replacement frequencies. Baidwinian, Lamarckian, and probabilistic Lamarckian evolution are all tested. Another extension of simple genetic algorithms can be accomplished with niching. Niching works by de-emphasizing solutions based on their proximity to other solutions in the space. Several variations of niching are tested. Experiments are conducted to determine the benefits of each hybridization technique versus each other and versus the genetic algorithm by itself. The experiments are geared toward trying to find the lowest possible energy and hence the minimum conformation of Met-enkephalin. In the experiments, probabilistic Lamarckian strategies were successful in achieving energies below that of the published minimum in QUANTA
- âŠ