CBR and MBR techniques: review for an application in the emergencies domain
The purpose of this document is to provide an in-depth analysis of current reasoning engine practice and of the integration strategies for Case-Based Reasoning (CBR) and Model-Based Reasoning (MBR) that will be used in the design and development of the RIMSAT system.
RIMSAT (Remote Intelligent Management Support and Training) is a European Commission-funded project designed to:
a. Provide an innovative, 'intelligent', knowledge-based solution aimed at improving the quality of critical decisions;
b. Enhance the competencies and responsiveness of individuals and organisations involved in highly complex, safety-critical incidents, irrespective of their location.
In other words, RIMSAT aims to design and implement a decision support system that applies Case-Based Reasoning and Model-Based Reasoning technology to the management of emergency situations.
This document is part of a deliverable for the RIMSAT project; although it was written with the project's requirements in mind, it provides an overview broad enough to serve as a state of the art in integration strategies between CBR and MBR technologies.
Scalable hardware memory disambiguation
This dissertation deals with one of the long-standing problems in Computer Architecture: the problem of memory disambiguation. Microprocessors typically reorder memory instructions during execution to improve concurrency. Such microprocessors use hardware memory structures for memory disambiguation, known as Load-Store Queues (LSQs), to ensure that memory instruction dependences are satisfied even when the memory instructions execute out of order. A typical LSQ implementation (circa 2006) holds all in-flight memory instructions in a physically centralized LSQ and performs a fully associative search on all buffered instructions to ensure that memory dependences are satisfied. These LSQ implementations do not scale because they use large, fully associative structures, which are known to be slow and power hungry. The increasing trend towards distributed microarchitectures further exacerbates these problems. As on-chip wire delays increase and high-performance processors become necessarily distributed, centralized structures such as the LSQ can limit scalability.
This dissertation describes techniques to create scalable LSQs in both centralized and distributed microarchitectures. The problems and solutions described in this thesis are motivated and validated by real system designs. The dissertation starts with a description of the partitioned primary memory system of the TRIPS processor, of which the LSQ is an important component, and then, through a series of optimizations, describes how the power, area, and centralization problems of the LSQ can be solved with minor performance losses (if any) even for large numbers of in-flight memory instructions. The four solutions described in this dissertation (partitioning, filtering, late binding, and efficient overflow management) enable power- and area-efficient, distributed, scalable LSQs, which in turn enable aggressive large-window processors capable of simultaneously executing thousands of instructions.
To mitigate the power problem, we replaced the power-hungry, fully associative search with a power-efficient hash-table lookup using a simple address-based Bloom filter. Bloom filters are probabilistic data structures used for testing set membership; they can be used to quickly check whether an instruction with the same data address is likely to be found in the LSQ without performing the associative search. Bloom filters typically eliminate more than 80% of the associative searches, and they are highly effective because, in most programs, it is uncommon for loads and stores to have the same data address and to be in execution simultaneously. A sketch of the idea follows.
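As a rough illustration of the idea (not the TRIPS hardware itself), the following Python sketch models an address-based Bloom filter guarding the associative LSQ search; the filter size, hash choice, and LSQ interface are assumptions made for the example.

```python
import hashlib

class AddressBloomFilter:
    """Minimal Bloom filter over data addresses (illustrative, not the TRIPS design)."""
    def __init__(self, num_bits=1024, num_hashes=2):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _indexes(self, address):
        # Derive k bit positions from the address; real hardware would use
        # simple XOR-folding of address bits rather than a software hash.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(f"{address}:{i}".encode(), digest_size=4).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def insert(self, address):
        for idx in self._indexes(address):
            self.bits[idx] = True

    def may_contain(self, address):
        # False => definitely absent (skip the associative search).
        # True  => possibly present (fall back to the full search).
        return all(self.bits[idx] for idx in self._indexes(address))

# Usage: only perform the associative LSQ search when the filter reports
# a possible address match; real designs periodically clear the filter.
lsq_filter = AddressBloomFilter()
lsq_filter.insert(0x1000)          # a buffered store to address 0x1000
for load_addr in (0x1000, 0x2000):
    if lsq_filter.may_contain(load_addr):
        print(hex(load_addr), "-> associative search required")
    else:
        print(hex(load_addr), "-> search skipped")
```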
To rectify the area problem, we observe that only a small fraction of all memory instructions are dependent, that only such dependent instructions need to be buffered in the LSQ, and that these instructions need to be in the LSQ only for certain parts of the pipelined execution. We propose two mechanisms to exploit these observations. The first mechanism, area filtering, is a hardware mechanism that couples Bloom filters and dependence predictors to dynamically identify and buffer only those instructions that are likely to be dependent. The second mechanism, late binding, reduces the occupancy, and hence the size, of the LSQ. Together, these optimizations allow the number of LSQ slots to be reduced by up to one half compared to a traditional organization without any performance degradation. A sketch of the filtering decision follows.
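To make the gating decision concrete, here is a hedged sketch (reusing the AddressBloomFilter above) of how a dependence predictor and the address filter might jointly decide whether an instruction needs an LSQ slot; the predictor interface and allocation policy are assumptions, not the dissertation's exact hardware.

```python
class DependencePredictor:
    """Toy PC-indexed predictor: remembers instructions that previously caused
    memory-order violations and predicts they will be dependent again."""
    def __init__(self):
        self.flagged_pcs = set()

    def train_on_violation(self, pc):
        self.flagged_pcs.add(pc)

    def predicts_dependent(self, pc):
        return pc in self.flagged_pcs

def needs_lsq_slot(pc, address, predictor, addr_filter):
    # Buffer the instruction only if it is likely to participate in a
    # memory dependence: either the predictor has flagged its PC, or a
    # possibly matching address is already buffered in the LSQ.
    return predictor.predicts_dependent(pc) or addr_filter.may_contain(address)
```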
Finally, we describe a new decentralized LSQ design for handling LSQ structural hazards in distributed microarchitectures. Decentralized LSQs, and to a large extent distributed microarchitectures with memory speculation, have proved impractical because of the high performance penalties associated with the mechanisms for dealing with such hazards. To solve this problem, we applied classic flow-control techniques from interconnection networks to handle resource conflicts. The first method, memory-side buffering, buffers the overflowing instructions in a separate buffer near the LSQs. The second scheme, execution-side NACKing, sends the overflowing instruction back to the issue window, from which it is later re-issued. The third scheme, network buffering, uses the buffers in the interconnection network between the execution units and memory to hold instructions when the LSQ is full, and uses virtual-channel flow control to avoid deadlocks. The network buffering scheme is the most robust of the overflow schemes and shows less than 1% performance degradation due to overflows for a subset of the SPEC CPU 2000 and EEMBC benchmarks on a cycle-accurate simulator that closely models the TRIPS processor.
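As an informal illustration of execution-side NACKing (the scheme's hardware details are not given in the abstract, so the queue size and retry policy below are assumptions), the following sketch rejects instructions when the LSQ partition is full and returns them to the issue window for later re-issue.

```python
from collections import deque

LSQ_CAPACITY = 4  # assumed partition size for the example

lsq = []                 # in-flight memory instructions held by this partition
issue_window = deque(["ld A", "st B", "ld C", "st D", "ld E", "ld F"])

def try_dispatch(instr):
    """Return True if accepted; False models a NACK back to the issue window."""
    if len(lsq) < LSQ_CAPACITY:
        lsq.append(instr)
        return True
    return False  # structural hazard: NACK instead of stalling

while issue_window:
    instr = issue_window.popleft()
    if not try_dispatch(instr):
        # NACKed: requeue for a later re-issue attempt. To keep the toy
        # model terminating, the oldest LSQ entry also commits here,
        # standing in for instructions completing over time.
        issue_window.append(instr)
        lsq.pop(0)
```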
The techniques proposed in this dissertation are independent and architecture-neutral, and their cumulative benefits result in LSQs that can be partitioned at a fine granularity and have low design complexity. Each of these partitions selectively buffers only memory instructions with true dependences and can be closely coupled with the execution units, thus minimizing power, area, and latency. Such LSQ designs with near-ideal characteristics are well suited for microarchitectures with thousands of instructions in flight and may enable even more aggressive microarchitectures in the future.
Efficient Execution of Sequential Instruction Streams by Physical Machines
Any computational model which relies on a physical system is likely to be subject to the fact that information density and speed have intrinsic, ultimate limits. The RAM model, and in particular its underlying assumption that memory accesses can be carried out in time independent of memory size, is not physically implementable.
This work develops in the field of limiting-technology machines, in which it is somewhat provocatively assumed that technology has reached its physical limits. The ultimate goal is to tackle the problem of the intrinsic latencies of physical systems by encouraging scalable organizations for processors and memories.
An algorithmic study is presented that describes the implementation of high-concurrency programs for SP and SPE, sequential machine models able to compute direct-flow programs in optimal time.
Then, a novel pipelined, hierarchical memory organization is presented, with optimal latency and bandwidth for a physical system.
In order both to take full advantage of the memory's capabilities and to exploit the available instruction-level parallelism of the code to be executed, a novel processor model is developed. Particular care is taken in devising an efficient information flow within the processor itself.
Both designs are extremely scalable, as they are based on fixed-capacity, fixed-size nodes connected as a multidimensional array.
Performance analysis of the resulting machine design has led to the discovery that latencies internal to the processor can be the dominating source of complexity in instruction-flow execution, adding to the effects of processor-memory interaction. A characterization of instruction flows is then developed, based on the topology induced by instruction dependences; a toy version of this dependence-topology view is sketched below.
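As a loose illustration of characterizing an instruction flow by its dependence topology (the thesis's actual metrics are not given in the abstract, so the representation below is an assumption), one can build the DAG induced by register dependences and measure its critical-path length, which bounds the achievable instruction-level parallelism.

```python
# Each instruction: (destination register, (source registers...)).
program = [
    ("r1", ()),            # 0: load constant
    ("r2", ()),            # 1: load constant
    ("r3", ("r1", "r2")),  # 2: depends on 0 and 1
    ("r4", ("r3",)),       # 3: depends on 2
    ("r5", ("r2",)),       # 4: depends on 1 (independent of 2-3)
]

def dependence_depths(instrs):
    """Depth of each instruction in the dependence DAG (1 = no predecessors)."""
    last_writer = {}   # register -> index of its most recent producer
    depth = []
    for i, (dest, srcs) in enumerate(instrs):
        preds = [last_writer[s] for s in srcs if s in last_writer]
        depth.append(1 + max((depth[p] for p in preds), default=0))
        last_writer[dest] = i
    return depth

depths = dependence_depths(program)
print("critical path length:", max(depths))            # 3
print("mean parallelism:", len(program) / max(depths)) # ~1.67
```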
Automatic Performance Optimization of Stencil Codes
Stencil codes are a widely used class of codes. Their general structure is very simple: data points in a large grid are repeatedly recomputed from neighboring values. This predefined neighborhood is the so-called stencil. Despite their very simple structure, stencil codes are hard to optimize since only a few computations are performed while a comparatively large number of values have to be accessed; that is, stencil codes usually have a very low computational intensity. Moreover, the set of optimizations and their parameters also depend on the hardware on which the code is executed.
In short, current production compilers are not able to fully optimize this class of codes, and optimizing each application by hand is not practical. As a remedy, we propose a set of optimizations and describe how they can be applied automatically by a code generator for the domain of stencil codes. A combination of space and time tiling is able to increase the data locality, which significantly reduces the memory-bandwidth requirements: a standard three-dimensional 7-point Jacobi stencil (sketched below) can be accelerated by a factor of 3. This optimization can target essentially any stencil code, while others are more specialized. For example, support for arbitrary linear data-layout transformations is especially beneficial for colored kernels, such as a Red-Black Gauss-Seidel smoother. On the one hand, an optimized data layout for such kernels reduces the bandwidth requirements; on the other hand, it simplifies an explicit vectorization.
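For reference, here is a minimal NumPy version of the 7-point Jacobi stencil the abstract uses as its model problem; the grid size and iteration count are arbitrary, and the vectorized slicing stands in for the generated, tiled code rather than reproducing it.

```python
import numpy as np

def jacobi_7pt(u, iterations):
    """Repeatedly recompute interior points as the average of their six
    neighbors (the classic 3D 7-point Jacobi stencil)."""
    v = u.copy()  # copy preserves the boundary values
    for _ in range(iterations):
        v[1:-1, 1:-1, 1:-1] = (
            u[:-2, 1:-1, 1:-1] + u[2:, 1:-1, 1:-1] +
            u[1:-1, :-2, 1:-1] + u[1:-1, 2:, 1:-1] +
            u[1:-1, 1:-1, :-2] + u[1:-1, 1:-1, 2:]
        ) / 6.0
        u, v = v, u  # swap so u always holds the latest sweep
    return u

grid = np.random.rand(66, 66, 66)   # 64^3 interior points plus boundary
result = jacobi_7pt(grid, iterations=10)
```

Each sweep reads six neighbor values for every single write, which is exactly the low computational intensity that the space-time tiling described above attacks.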
Other noticeable optimizations described in detail are redundancy-elimination techniques that eliminate common subexpressions both within a sequence of statements and across loop boundaries, arithmetic simplifications and normalizations, and the vectorization mentioned previously. In combination, these optimizations are able to increase the performance not only of the model problem given by Poisson's equation, but also of real-world applications: an optical flow simulation and the simulation of a non-isothermal, non-Newtonian fluid flow.
Radio emission from cosmic ray air showers: Monte Carlo simulations
We present time-domain Monte Carlo simulations of radio emission from cosmic
ray air showers in the scheme of coherent geosynchrotron radiation. Our model
takes into account the important air shower characteristics such as the lateral
and longitudinal particle distributions, the particle track length and energy
distributions, a realistic magnetic field geometry and the shower evolution as
a whole. The Monte Carlo approach allows us to retain the full polarisation
information and to carry out the calculations without the need for any
far-field approximations. We demonstrate the strategies developed to tackle the
computational effort associated with the simulation of a huge number of
particles for a great number of observer bins and illustrate the robustness and
accuracy of these techniques. We predict the emission pattern, the radial and
the spectral dependence of the radiation from a prototypical 10^17 eV vertical
air shower and find good agreement with our analytical results (Huege & Falcke
2003) and the available historical data. Track-length effects in combination
with magnetic field effects surprisingly wash out any significant asymmetry in
the total field strength emission pattern in spite of the magnetic field
geometry. While statistics of total field strengths alone can therefore not
prove the geomagnetic origin, the predicted high degree of polarisation in the
direction perpendicular to the shower and magnetic field axes allows a direct
test of the geomagnetic emission mechanism with polarisation-sensitive
experiments such as LOPES. Our code provides a robust, yet flexible basis for detailed studies of the dependence of the radio emission on specific shower parameters and for the inclusion of additional radiation mechanisms in the future.
The Sunyaev Zel'dovich effect: simulation and observation
The Sunyaev-Zel'dovich effect (SZ effect) is a complete probe of ionized baryons, the majority of which are likely hiding in the intergalactic medium. We ran a ΛCDM simulation using a moving-mesh hydro code to compute statistics of the thermal and kinetic SZ effect such as the power spectra and measures of non-Gaussianity. The thermal SZ power spectrum has a very broad peak, with temperature fluctuations at the μK level. The power spectrum is consistent with available observations and suggests a high σ8 and a possible role of non-gravitational heating. The non-Gaussianity is significant and substantially increases the cosmic variance of the power spectrum.
We explore optimal drift-scan survey strategies for the AMIBA CMB interferometer and their dependence on cosmology. For SZ power spectrum estimation, we find that the optimal sky coverage for 1000 hours of integration time is several hundred square degrees. One achieves an accuracy better than 40% in the measurement of the SZ power spectrum and better than 20% in the cross-correlation with Sloan galaxies. For cluster searches, the optimal scan rate is around 280 hours per square degree, with a cluster detection rate of 1 every 7 hours, allowing for a false-positive rate of 20% and better than 30% accuracy in the measurement of the cluster SZ distribution function.
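To illustrate the kind of statistic being computed (this is not the paper's pipeline; the map, pixel scale, and binning below are assumptions), here is a minimal flat-sky angular power spectrum estimator for a simulated SZ temperature map.

```python
import numpy as np

def flat_sky_power_spectrum(tmap, pixel_rad, nbins=20):
    """Bin |FFT|^2 of a square map into an angular power spectrum C_ell
    (flat-sky approximation, adequate for small SZ maps)."""
    n = tmap.shape[0]
    fmap = np.fft.fft2(tmap) * pixel_rad**2          # continuous-FT normalization
    power2d = np.abs(fmap)**2 / (n * pixel_rad)**2   # divide by map area
    # Multipole ell = 2*pi*|frequency| on the flat sky.
    freq2 = np.add.outer(np.fft.fftfreq(n, d=pixel_rad)**2,
                         np.fft.fftfreq(n, d=pixel_rad)**2)
    ell = 2 * np.pi * np.sqrt(freq2)
    bins = np.linspace(ell[ell > 0].min(), ell.max(), nbins + 1)
    cl = [power2d[(ell >= lo) & (ell < hi)].mean()
          for lo, hi in zip(bins, bins[1:])]
    return 0.5 * (bins[:-1] + bins[1:]), np.array(cl)

# Example on a white-noise map with 1 arcmin pixels (assumed values).
pixel = np.radians(1 / 60.0)
ells, cls = flat_sky_power_spectrum(np.random.randn(256, 256) * 1e-5, pixel)
```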
Diffusive hidden Markov model characterization of DNA looping dynamics in tethered particle experiments
In many biochemical processes, proteins bound to DNA at distant sites are
brought into close proximity by loops in the underlying DNA. For example, the
function of some gene-regulatory proteins depends on such DNA looping
interactions. We present a new technique for characterizing the kinetics of
loop formation in vitro, as observed using the tethered particle method, and
apply it to experimental data on looping induced by lambda repressor. Our
method uses a modified (diffusive) hidden Markov analysis that directly
incorporates the Brownian motion of the observed tethered bead. We compare
looping lifetimes found with our method (which we find are consistent over a
range of sampling frequencies) to those obtained via the traditional
threshold-crossing analysis (which can vary depending on how the raw data are
filtered in the time domain). Our method does not involve any time filtering
and can detect sudden changes in looping behavior. For example, we show how our
method can identify transitions between long-lived, kinetically distinct states
that would otherwise be difficult to discern.
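As a stripped-down illustration of the hidden-Markov idea (the paper's diffusive variant additionally models the bead's Brownian motion; the two-state Gaussian-emission model and all parameter values below are assumptions), here is a forward-algorithm likelihood for a looped/unlooped state sequence.

```python
import numpy as np

def forward_log_likelihood(obs, trans, means, stds, init):
    """Scaled forward algorithm for a 2-state HMM with Gaussian emissions.
    States: 0 = unlooped (large bead excursion), 1 = looped (small excursion)."""
    def emission(x):
        return np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))
    alpha = init * emission(obs[0])
    total = alpha.sum()
    log_like = np.log(total)
    alpha /= total                     # rescale to avoid underflow
    for x in obs[1:]:
        alpha = (alpha @ trans) * emission(x)
        total = alpha.sum()
        log_like += np.log(total)
        alpha /= total
    return log_like

# Assumed parameters: mean excursion ~200 nm unlooped vs ~140 nm looped.
trans = np.array([[0.99, 0.01],
                  [0.02, 0.98]])       # per-frame switching probabilities
means = np.array([200.0, 140.0])
stds = np.array([30.0, 30.0])
init = np.array([0.5, 0.5])

obs = np.concatenate([np.random.normal(200, 30, 500),
                      np.random.normal(140, 30, 500)])
print(forward_log_likelihood(obs, trans, means, stds, init))
```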
Unified Polyhedral Modeling of Temporal and Spatial Locality
Despite decades of work in this area, the construction of effective loop nest optimizers and parallelizers continues to be challenging due to the increasing diversity of both loop-intensive application workloads and complex memory/computation hierarchies in modern processors. The lack of a systematic approach to optimizing locality and parallelism, with a well-founded data locality model, is a major obstacle to the design of optimizing compilers that cope with this variety of software and hardware. Acknowledging the conflicting demands on loop nest optimization, we propose a new unified algorithm for optimizing parallelism and locality in loop nests that is capable of modeling the temporal and spatial effects of multiprocessors and accelerators with deep memory hierarchies and multiple levels of parallelism. It orchestrates a collection of parameterizable optimization problems for locality and parallelism objectives over a polyhedral space of semantics-preserving transformations. The overall problem is not convex and is constrained only by semantics preservation. We discuss the rationale for this unified algorithm and validate it on a collection of representative computational kernels and benchmarks.
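As a toy example of the kind of semantics-preserving transformation such an optimizer searches over (this is not the paper's algorithm; the tile size and kernel are assumptions), the sketch below rewrites a transpose-style traversal with loop tiling to improve spatial and temporal locality.

```python
import numpy as np

def transpose_naive(a, out):
    n = a.shape[0]
    for i in range(n):
        for j in range(n):
            out[j, i] = a[i, j]   # strided writes: poor spatial locality

def transpose_tiled(a, out, tile=32):
    # Same iteration domain, executed tile by tile so that both the reads
    # of `a` and the writes of `out` stay within cache-sized blocks.
    n = a.shape[0]
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, n)):
                    out[j, i] = a[i, j]

a = np.arange(256 * 256, dtype=np.float64).reshape(256, 256)
out1, out2 = np.empty_like(a), np.empty_like(a)
transpose_naive(a, out1)
transpose_tiled(a, out2)
assert np.array_equal(out1, out2)   # the transformation preserves semantics
```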
- âŠ