CBR and MBR techniques: review for an application in the emergencies domain
The purpose of this document is to provide an in-depth analysis of current reasoning engine practice and of the integration strategies for Case-Based Reasoning (CBR) and Model-Based Reasoning (MBR) that will be used in the design and development of the RIMSAT system.
RIMSAT (Remote Intelligent Management Support and Training) is a European Commission-funded project designed to:
a. Provide an innovative, 'intelligent', knowledge-based solution aimed at improving the quality of critical decisions;
b. Enhance the competencies and responsiveness of individuals and organisations involved in highly complex, safety-critical incidents, irrespective of their location.
In other words, RIMSAT aims to design and implement a decision support system that applies Case-Based Reasoning and Model-Based Reasoning technology to the management of emergency situations.
This document is part of a deliverable for the RIMSAT project; although it was written with the project's requirements in mind, it provides an overview broad enough to serve as a state of the art in integration strategies between CBR and MBR technologies.
Scalable hardware memory disambiguation
This dissertation deals with one of the long-standing problems in Computer Architecture: the problem of memory disambiguation. Microprocessors typically reorder memory instructions during execution to improve concurrency. Such microprocessors use hardware memory structures for memory disambiguation, known as Load-Store Queues (LSQs), to ensure that memory instruction dependences are satisfied even when the memory instructions execute out of order. A typical LSQ implementation (circa 2006) holds all in-flight memory instructions in a physically centralized LSQ and performs a fully associative search on all buffered instructions to ensure that memory dependences are satisfied. These LSQ implementations do not scale because they use large, fully associative structures, which are known to be slow and power hungry. The increasing trend towards distributed microarchitectures further exacerbates these problems. As on-chip wire delays increase and high-performance processors become necessarily distributed, centralized structures such as the LSQ can limit scalability.
This dissertation describes techniques to create scalable LSQs in both centralized and distributed microarchitectures. The problems and solutions described in this thesis are motivated and validated by real system designs. The dissertation starts with a description of the partitioned primary memory system of the TRIPS processor, of which the LSQ is an important component, and then, through a series of optimizations, describes how the power, area, and centralization problems of the LSQ can be solved with minor performance losses (if any) even for large numbers of in-flight memory instructions. The four solutions described in this dissertation (partitioning, filtering, late binding, and efficient overflow management) enable power- and area-efficient, distributed, scalable LSQs, which in turn enable aggressive large-window processors capable of simultaneously executing thousands of instructions.
To mitigate the power problem, we replaced the power-hungry, fully associative search with a power-efficient hash-table lookup using a simple address-based Bloom filter. Bloom filters are probabilistic data structures used for testing set membership; they can be used to quickly check whether an instruction with the same data address is likely to be found in the LSQ without performing the associative search. Bloom filters typically eliminate more than 80% of the associative searches, and they are highly effective because, in most programs, it is uncommon for loads and stores to have the same data address and to be in execution simultaneously. A sketch of the idea follows.
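As a rough illustration of the idea (not the TRIPS hardware itself), the following Python sketch models an address-based Bloom filter guarding the associative LSQ search; the filter size, hash choice, and LSQ interface are assumptions made for the example.

```python
import hashlib

class AddressBloomFilter:
    """Minimal Bloom filter over data addresses (illustrative, not the TRIPS design)."""
    def __init__(self, num_bits=1024, num_hashes=2):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _indexes(self, address):
        # Derive k bit positions from the address; real hardware would use
        # simple XOR-folding of address bits rather than a software hash.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(f"{address}:{i}".encode(), digest_size=4).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def insert(self, address):
        for idx in self._indexes(address):
            self.bits[idx] = True

    def may_contain(self, address):
        # False => definitely absent (skip the associative search).
        # True  => possibly present (fall back to the full search).
        return all(self.bits[idx] for idx in self._indexes(address))

# Usage: only perform the associative LSQ search when the filter reports
# a possible address match; real designs periodically clear the filter.
lsq_filter = AddressBloomFilter()
lsq_filter.insert(0x1000)          # a buffered store to address 0x1000
for load_addr in (0x1000, 0x2000):
    if lsq_filter.may_contain(load_addr):
        print(hex(load_addr), "-> associative search required")
    else:
        print(hex(load_addr), "-> search skipped")
```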
To rectify the area problem, we observe that only a small fraction of all memory instructions are dependent, that only such dependent instructions need to be buffered in the LSQ, and that these instructions need to be in the LSQ only for certain parts of the pipelined execution. We propose two mechanisms to exploit these observations. The first mechanism, area filtering, is a hardware mechanism that couples Bloom filters and dependence predictors to dynamically identify and buffer only those instructions that are likely to be dependent. The second mechanism, late binding, reduces the occupancy, and hence the size, of the LSQ. Together, these optimizations allow the number of LSQ slots to be reduced by up to one half compared to a traditional organization without any performance degradation. A sketch of the filtering decision follows.
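To make the gating decision concrete, here is a hedged sketch (reusing the AddressBloomFilter above) of how a dependence predictor and the address filter might jointly decide whether an instruction needs an LSQ slot; the predictor interface and allocation policy are assumptions, not the dissertation's exact hardware.

```python
class DependencePredictor:
    """Toy PC-indexed predictor: remembers instructions that previously caused
    memory-order violations and predicts they will be dependent again."""
    def __init__(self):
        self.flagged_pcs = set()

    def train_on_violation(self, pc):
        self.flagged_pcs.add(pc)

    def predicts_dependent(self, pc):
        return pc in self.flagged_pcs

def needs_lsq_slot(pc, address, predictor, addr_filter):
    # Buffer the instruction only if it is likely to participate in a
    # memory dependence: either the predictor has flagged its PC, or a
    # possibly matching address is already buffered in the LSQ.
    return predictor.predicts_dependent(pc) or addr_filter.may_contain(address)
```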
Finally, we describe a new decentralized LSQ design for handling LSQ structural hazards in distributed microarchitectures. Decentralized LSQs, and to a large extent distributed microarchitectures with memory speculation, have proved impractical because of the high performance penalties associated with the mechanisms for dealing with such hazards. To solve this problem, we applied classic flow-control techniques from interconnection networks to handle resource conflicts. The first method, memory-side buffering, buffers the overflowing instructions in a separate buffer near the LSQs. The second scheme, execution-side NACKing, sends the overflowing instruction back to the issue window, from which it is later re-issued. The third scheme, network buffering, uses the buffers in the interconnection network between the execution units and memory to hold instructions when the LSQ is full, and uses virtual-channel flow control to avoid deadlocks. The network buffering scheme is the most robust of the overflow schemes and shows less than 1% performance degradation due to overflows for a subset of the SPEC CPU 2000 and EEMBC benchmarks on a cycle-accurate simulator that closely models the TRIPS processor.
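As an informal illustration of execution-side NACKing (the scheme's hardware details are not given in the abstract, so the queue size and retry policy below are assumptions), the following sketch rejects instructions when the LSQ partition is full and returns them to the issue window for later re-issue.

```python
from collections import deque

LSQ_CAPACITY = 4  # assumed partition size for the example

lsq = []                 # in-flight memory instructions held by this partition
issue_window = deque(["ld A", "st B", "ld C", "st D", "ld E", "ld F"])

def try_dispatch(instr):
    """Return True if accepted; False models a NACK back to the issue window."""
    if len(lsq) < LSQ_CAPACITY:
        lsq.append(instr)
        return True
    return False  # structural hazard: NACK instead of stalling

while issue_window:
    instr = issue_window.popleft()
    if not try_dispatch(instr):
        # NACKed: requeue for a later re-issue attempt. To keep the toy
        # model terminating, the oldest LSQ entry also commits here,
        # standing in for instructions completing over time.
        issue_window.append(instr)
        lsq.pop(0)
```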
The techniques proposed in this dissertation are independent and architecture-neutral, and their cumulative benefits result in LSQs that can be partitioned at a fine granularity and have low design complexity. Each of these partitions selectively buffers only memory instructions with true dependences and can be closely coupled with the execution units, thus minimizing power, area, and latency. Such LSQ designs with near-ideal characteristics are well suited for microarchitectures with thousands of instructions in flight and may enable even more aggressive microarchitectures in the future.
Efficient Execution of Sequential Instruction Streams by Physical Machines
Any computational model which relies on a physical system is likely to be subject to the fact that information density and speed have intrinsic, ultimate limits. The RAM model, and in particular its underlying assumption that memory accesses can be carried out in time independent of memory size, is not physically implementable.
This work develops in the field of limiting-technology machines, in which it is somewhat provocatively assumed that technology has reached its physical limits. The ultimate goal is to tackle the problem of the intrinsic latencies of physical systems by encouraging scalable organizations for processors and memories.
An algorithmic study is presented that describes the implementation of high-concurrency programs for SP and SPE, sequential machine models able to compute direct-flow programs in optimal time.
Then, a novel pipelined, hierarchical memory organization is presented, with optimal latency and bandwidth for a physical system.
In order both to take full advantage of the memory's capabilities and to exploit the available instruction-level parallelism of the code to be executed, a novel processor model is developed. Particular care is taken in devising an efficient information flow within the processor itself.
Both designs are extremely scalable, as they are based on fixed-capacity, fixed-size nodes connected as a multidimensional array.
Performance analysis of the resulting machine design has led to the discovery that latencies internal to the processor can be the dominating source of complexity in instruction-flow execution, adding to the effects of processor-memory interaction. A characterization of instruction flows is then developed, based on the topology induced by instruction dependences; a toy version of this dependence-topology view is sketched below.
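As a loose illustration of characterizing an instruction flow by its dependence topology (the thesis's actual metrics are not given in the abstract, so the representation below is an assumption), one can build the DAG induced by register dependences and measure its critical-path length, which bounds the achievable instruction-level parallelism.

```python
# Each instruction: (destination register, (source registers...)).
program = [
    ("r1", ()),            # 0: load constant
    ("r2", ()),            # 1: load constant
    ("r3", ("r1", "r2")),  # 2: depends on 0 and 1
    ("r4", ("r3",)),       # 3: depends on 2
    ("r5", ("r2",)),       # 4: depends on 1 (independent of 2-3)
]

def dependence_depths(instrs):
    """Depth of each instruction in the dependence DAG (1 = no predecessors)."""
    last_writer = {}   # register -> index of its most recent producer
    depth = []
    for i, (dest, srcs) in enumerate(instrs):
        preds = [last_writer[s] for s in srcs if s in last_writer]
        depth.append(1 + max((depth[p] for p in preds), default=0))
        last_writer[dest] = i
    return depth

depths = dependence_depths(program)
print("critical path length:", max(depths))            # 3
print("mean parallelism:", len(program) / max(depths)) # ~1.67
```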
Automatic Performance Optimization of Stencil Codes
Stencil codes are a widely used class of codes. Their general structure is very simple: data points in a large grid are repeatedly recomputed from neighboring values. This predefined neighborhood is the so-called stencil. Despite their very simple structure, stencil codes are hard to optimize since only a few computations are performed while a comparatively large number of values have to be accessed; that is, stencil codes usually have a very low computational intensity. Moreover, the set of optimizations and their parameters also depend on the hardware on which the code is executed.
In short, current production compilers are not able to fully optimize this class of codes, and optimizing each application by hand is not practical. As a remedy, we propose a set of optimizations and describe how they can be applied automatically by a code generator for the domain of stencil codes. A combination of space and time tiling is able to increase the data locality, which significantly reduces the memory-bandwidth requirements: a standard three-dimensional 7-point Jacobi stencil (sketched below) can be accelerated by a factor of 3. This optimization can target essentially any stencil code, while others are more specialized. For example, support for arbitrary linear data-layout transformations is especially beneficial for colored kernels, such as a Red-Black Gauss-Seidel smoother. On the one hand, an optimized data layout for such kernels reduces the bandwidth requirements; on the other hand, it simplifies an explicit vectorization.
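For reference, here is a minimal NumPy version of the 7-point Jacobi stencil the abstract uses as its model problem; the grid size and iteration count are arbitrary, and the vectorized slicing stands in for the generated, tiled code rather than reproducing it.

```python
import numpy as np

def jacobi_7pt(u, iterations):
    """Repeatedly recompute interior points as the average of their six
    neighbors (the classic 3D 7-point Jacobi stencil)."""
    v = u.copy()  # copy preserves the boundary values
    for _ in range(iterations):
        v[1:-1, 1:-1, 1:-1] = (
            u[:-2, 1:-1, 1:-1] + u[2:, 1:-1, 1:-1] +
            u[1:-1, :-2, 1:-1] + u[1:-1, 2:, 1:-1] +
            u[1:-1, 1:-1, :-2] + u[1:-1, 1:-1, 2:]
        ) / 6.0
        u, v = v, u  # swap so u always holds the latest sweep
    return u

grid = np.random.rand(66, 66, 66)   # 64^3 interior points plus boundary
result = jacobi_7pt(grid, iterations=10)
```

Each sweep reads six neighbor values for every single write, which is exactly the low computational intensity that the space-time tiling described above attacks.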
Other noticeable optimizations described in detail are redundancy-elimination techniques that eliminate common subexpressions both within a sequence of statements and across loop boundaries, arithmetic simplifications and normalizations, and the vectorization mentioned previously. In combination, these optimizations are able to increase the performance not only of the model problem given by Poisson's equation, but also of real-world applications: an optical flow simulation and the simulation of a non-isothermal, non-Newtonian fluid flow.
Radio emission from cosmic ray air showers: Monte Carlo simulations
We present time-domain Monte Carlo simulations of radio emission from cosmic
ray air showers in the scheme of coherent geosynchrotron radiation. Our model
takes into account the important air shower characteristics such as the lateral
and longitudinal particle distributions, the particle track length and energy
distributions, a realistic magnetic field geometry and the shower evolution as
a whole. The Monte Carlo approach allows us to retain the full polarisation
information and to carry out the calculations without the need for any
far-field approximations. We demonstrate the strategies developed to tackle the
computational effort associated with the simulation of a huge number of
particles for a great number of observer bins and illustrate the robustness and
accuracy of these techniques. We predict the emission pattern, the radial and
the spectral dependence of the radiation from a prototypical 10^17 eV vertical
air shower and find good agreement with our analytical results (Huege & Falcke
2003) and the available historical data. Track-length effects in combination
with magnetic field effects surprisingly wash out any significant asymmetry in
the total field strength emission pattern in spite of the magnetic field
geometry. While statistics of total field strengths alone can therefore not
prove the geomagnetic origin, the predicted high degree of polarisation in the
direction perpendicular to the shower and magnetic field axes allows a direct
test of the geomagnetic emission mechanism with polarisation-sensitive
experiments such as LOPES. Our code provides a robust, yet flexible basis for detailed studies of the dependence of the radio emission on specific shower parameters and for the inclusion of additional radiation mechanisms in the future.
The Sunyaev Zel'dovich effect: simulation and observation
The Sunyaev-Zel'dovich effect (SZ effect) is a complete probe of ionized baryons, the majority of which are likely hiding in the intergalactic medium. We ran a ΛCDM simulation using a moving-mesh hydro code to compute statistics of the thermal and kinetic SZ effect such as the power spectra and measures of non-Gaussianity. The thermal SZ power spectrum has a very broad peak, with temperature fluctuations at the μK level. The power spectrum is consistent with available observations and suggests a high σ8 and a possible role of non-gravitational heating. The non-Gaussianity is significant and substantially increases the cosmic variance of the power spectrum.
We explore optimal drift-scan survey strategies for the AMIBA CMB interferometer and their dependence on cosmology. For SZ power spectrum estimation, we find that the optimal sky coverage for 1000 hours of integration time is several hundred square degrees. One achieves an accuracy better than 40% in the measurement of the SZ power spectrum and better than 20% in the cross-correlation with Sloan galaxies. For cluster searches, the optimal scan rate is around 280 hours per square degree, with a cluster detection rate of 1 every 7 hours, allowing for a false-positive rate of 20% and better than 30% accuracy in the measurement of the cluster SZ distribution function.
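To illustrate the kind of statistic being computed (this is not the paper's pipeline; the map, pixel scale, and binning below are assumptions), here is a minimal flat-sky angular power spectrum estimator for a simulated SZ temperature map.

```python
import numpy as np

def flat_sky_power_spectrum(tmap, pixel_rad, nbins=20):
    """Bin |FFT|^2 of a square map into an angular power spectrum C_ell
    (flat-sky approximation, adequate for small SZ maps)."""
    n = tmap.shape[0]
    fmap = np.fft.fft2(tmap) * pixel_rad**2          # continuous-FT normalization
    power2d = np.abs(fmap)**2 / (n * pixel_rad)**2   # divide by map area
    # Multipole ell = 2*pi*|frequency| on the flat sky.
    freq2 = np.add.outer(np.fft.fftfreq(n, d=pixel_rad)**2,
                         np.fft.fftfreq(n, d=pixel_rad)**2)
    ell = 2 * np.pi * np.sqrt(freq2)
    bins = np.linspace(ell[ell > 0].min(), ell.max(), nbins + 1)
    cl = [power2d[(ell >= lo) & (ell < hi)].mean()
          for lo, hi in zip(bins, bins[1:])]
    return 0.5 * (bins[:-1] + bins[1:]), np.array(cl)

# Example on a white-noise map with 1 arcmin pixels (assumed values).
pixel = np.radians(1 / 60.0)
ells, cls = flat_sky_power_spectrum(np.random.randn(256, 256) * 1e-5, pixel)
```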
Diffusive hidden Markov model characterization of DNA looping dynamics in tethered particle experiments
In many biochemical processes, proteins bound to DNA at distant sites are
brought into close proximity by loops in the underlying DNA. For example, the
function of some gene-regulatory proteins depends on such DNA looping
interactions. We present a new technique for characterizing the kinetics of
loop formation in vitro, as observed using the tethered particle method, and
apply it to experimental data on looping induced by lambda repressor. Our
method uses a modified (diffusive) hidden Markov analysis that directly
incorporates the Brownian motion of the observed tethered bead. We compare
looping lifetimes found with our method (which we find are consistent over a
range of sampling frequencies) to those obtained via the traditional
threshold-crossing analysis (which can vary depending on how the raw data are
filtered in the time domain). Our method does not involve any time filtering
and can detect sudden changes in looping behavior. For example, we show how our
method can identify transitions between long-lived, kinetically distinct states
that would otherwise be difficult to discern.
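As a stripped-down illustration of the hidden-Markov idea (the paper's diffusive variant additionally models the bead's Brownian motion; the two-state Gaussian-emission model and all parameter values below are assumptions), here is a forward-algorithm likelihood for a looped/unlooped state sequence.

```python
import numpy as np

def forward_log_likelihood(obs, trans, means, stds, init):
    """Scaled forward algorithm for a 2-state HMM with Gaussian emissions.
    States: 0 = unlooped (large bead excursion), 1 = looped (small excursion)."""
    def emission(x):
        return np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))
    alpha = init * emission(obs[0])
    total = alpha.sum()
    log_like = np.log(total)
    alpha /= total                     # rescale to avoid underflow
    for x in obs[1:]:
        alpha = (alpha @ trans) * emission(x)
        total = alpha.sum()
        log_like += np.log(total)
        alpha /= total
    return log_like

# Assumed parameters: mean excursion ~200 nm unlooped vs ~140 nm looped.
trans = np.array([[0.99, 0.01],
                  [0.02, 0.98]])       # per-frame switching probabilities
means = np.array([200.0, 140.0])
stds = np.array([30.0, 30.0])
init = np.array([0.5, 0.5])

obs = np.concatenate([np.random.normal(200, 30, 500),
                      np.random.normal(140, 30, 500)])
print(forward_log_likelihood(obs, trans, means, stds, init))
```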
Unified Polyhedral Modeling of Temporal and Spatial Locality
Despite decades of work in this area, the construction of effective loop nest optimizers and parallelizers continues to be challenging due to the increasing diversity of both loop-intensive application workloads and complex memory/computation hierarchies in modern processors. The lack of a systematic approach to optimizing locality and parallelism, with a well-founded data locality model, is a major obstacle to the design of optimizing compilers that cope with this variety of software and hardware. Acknowledging the conflicting demands on loop nest optimization, we propose a new unified algorithm for optimizing parallelism and locality in loop nests that is capable of modeling the temporal and spatial effects of multiprocessors and accelerators with deep memory hierarchies and multiple levels of parallelism. It orchestrates a collection of parameterizable optimization problems for locality and parallelism objectives over a polyhedral space of semantics-preserving transformations. The overall problem is not convex and is constrained only by semantics preservation. We discuss the rationale for this unified algorithm and validate it on a collection of representative computational kernels and benchmarks.
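As a toy example of the kind of semantics-preserving transformation such an optimizer searches over (this is not the paper's algorithm; the tile size and kernel are assumptions), the sketch below rewrites a transpose-style traversal with loop tiling to improve spatial and temporal locality.

```python
import numpy as np

def transpose_naive(a, out):
    n = a.shape[0]
    for i in range(n):
        for j in range(n):
            out[j, i] = a[i, j]   # strided writes: poor spatial locality

def transpose_tiled(a, out, tile=32):
    # Same iteration domain, executed tile by tile so that both the reads
    # of `a` and the writes of `out` stay within cache-sized blocks.
    n = a.shape[0]
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, n)):
                    out[j, i] = a[i, j]

a = np.arange(256 * 256, dtype=np.float64).reshape(256, 256)
out1, out2 = np.empty_like(a), np.empty_like(a)
transpose_naive(a, out1)
transpose_tiled(a, out2)
assert np.array_equal(out1, out2)   # the transformation preserves semantics
```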
- âŠ