5 research outputs found

    Parallel processing and expert systems

    Get PDF
    Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited

    Performance visualization for parallel programs: task-based, object-oriented approach

    Get PDF
    Developing and analyzing the performance of concurrent programs on distributed memory concurrent systems is normally a challenging task. Recently, performance visualization gains its importance as a critical tool for programmers. Programmers can have an insight into the development of parallel programs through a performance visualization. Most of the visualization tool to date have been developed for ad-hoc environments in hardware and software, and therefore its lifetime is limited. Since, however, new architectures keep emerging and application domains for distributed memory concurrent computer systems keep growing, the visualization tool should be flexible enough to accommodate unknown future demands of users (eg. new performance perspectives, application-specific views and disparate trace record formats);The Concurrent Object-Oriented ParaGraph (COOPG) is a prototype, general-purpose performance visualization package developed using an object-oriented approach. An object-oriented approach, both in design and implementation, provides a mechanism to build a simple, flexible, effective, and extensible performance visualization tool. The salient features of the COOPG include its flexible adaptability to disparate trace record formats and the incremental extensibility for incorporating user\u27s special-purpose views

    Model-Based Robot Control and Multiprocessor Implementation

    Get PDF
    Model-based control of robot manipulators has been gaining momentum in recent years. Unfortunately there are very few experimental validations to accompany simulation results and as such majority of conclusions drawn lack the credibility associated with the real control implementation

    Approche efficace pour la conception des architectures multiprocesseurs sur puce électronique

    Full text link
    Les systèmes multiprocesseurs sur puce électronique (On-Chip Multiprocessor [OCM]) sont considérés comme les meilleures structures pour occuper l'espace disponible sur les circuits intégrés actuels. Dans nos travaux, nous nous intéressons à un modèle architectural, appelé architecture isométrique de systèmes multiprocesseurs sur puce, qui permet d'évaluer, de prédire et d'optimiser les systèmes OCM en misant sur une organisation efficace des nœuds (processeurs et mémoires), et à des méthodologies qui permettent d'utiliser efficacement ces architectures. Dans la première partie de la thèse, nous nous intéressons à la topologie du modèle et nous proposons une architecture qui permet d'utiliser efficacement et massivement les mémoires sur la puce. Les processeurs et les mémoires sont organisés selon une approche isométrique qui consiste à rapprocher les données des processus plutôt que d'optimiser les transferts entre les processeurs et les mémoires disposés de manière conventionnelle. L'architecture est un modèle maillé en trois dimensions. La disposition des unités sur ce modèle est inspirée de la structure cristalline du chlorure de sodium (NaCl), où chaque processeur peut accéder à six mémoires à la fois et où chaque mémoire peut communiquer avec autant de processeurs à la fois. Dans la deuxième partie de notre travail, nous nous intéressons à une méthodologie de décomposition où le nombre de nœuds du modèle est idéal et peut être déterminé à partir d'une spécification matricielle de l'application qui est traitée par le modèle proposé. Sachant que la performance d'un modèle dépend de la quantité de flot de données échangées entre ses unités, en l'occurrence leur nombre, et notre but étant de garantir une bonne performance de calcul en fonction de l'application traitée, nous proposons de trouver le nombre idéal de processeurs et de mémoires du système à construire. Aussi, considérons-nous la décomposition de la spécification du modèle à construire ou de l'application à traiter en fonction de l'équilibre de charge des unités. Nous proposons ainsi une approche de décomposition sur trois points : la transformation de la spécification ou de l'application en une matrice d'incidence dont les éléments sont les flots de données entre les processus et les données, une nouvelle méthodologie basée sur le problème de la formation des cellules (Cell Formation Problem [CFP]), et un équilibre de charge de processus dans les processeurs et de données dans les mémoires. Dans la troisième partie, toujours dans le souci de concevoir un système efficace et performant, nous nous intéressons à l'affectation des processeurs et des mémoires par une méthodologie en deux étapes. Dans un premier temps, nous affectons des unités aux nœuds du système, considéré ici comme un graphe non orienté, et dans un deuxième temps, nous affectons des valeurs aux arcs de ce graphe. Pour l'affectation, nous proposons une modélisation des applications décomposées en utilisant une approche matricielle et l'utilisation du problème d'affectation quadratique (Quadratic Assignment Problem [QAP]). Pour l'affectation de valeurs aux arcs, nous proposons une approche de perturbation graduelle, afin de chercher la meilleure combinaison du coût de l'affectation, ceci en respectant certains paramètres comme la température, la dissipation de chaleur, la consommation d'énergie et la surface occupée par la puce. Le but ultime de ce travail est de proposer aux architectes de systèmes multiprocesseurs sur puce une méthodologie non traditionnelle et un outil systématique et efficace d'aide à la conception dès la phase de la spécification fonctionnelle du système.On-Chip Multiprocessor (OCM) systems are considered to be the best structures to occupy the abundant space available on today integrated circuits (IC). In our thesis, we are interested on an architectural model, called Isometric on-Chip Multiprocessor Architecture (ICMA), that optimizes the OCM systems by focusing on an effective organization of cores (processors and memories) and on methodologies that optimize the use of these architectures. In the first part of this work, we study the topology of ICMA and propose an architecture that enables efficient and massive use of on-chip memories. ICMA organizes processors and memories in an isometric structure with the objective to get processed data close to the processors that use them rather than to optimize transfers between processors and memories, arranged in a conventional manner. ICMA is a mesh model in three dimensions. The organization of our architecture is inspired by the crystal structure of sodium chloride (NaCl), where each processor can access six different memories and where each memory can communicate with six processors at once. In the second part of our work, we focus on a methodology of decomposition. This methodology is used to find the optimal number of nodes for a given application or specification. The approach we use is to transform an application or a specification into an incidence matrix, where the entries of this matrix are the interactions between processors and memories as entries. In other words, knowing that the performance of a model depends on the intensity of the data flow exchanged between its units, namely their number, we aim to guarantee a good computing performance by finding the optimal number of processors and memories that are suitable for the application computation. We also consider the load balancing of the units of ICMA during the specification phase of the design. Our proposed decomposition is on three points: the transformation of the specification or application into an incidence matrix, a new methodology based on the Cell Formation Problem (CFP), and load balancing processes in the processors and data in memories. In the third part, we focus on the allocation of processor and memory by a two-step methodology. Initially, we allocate units to the nodes of the system structure, considered here as an undirected graph, and subsequently we assign values to the arcs of this graph. For the assignment, we propose modeling of the decomposed application using a matrix approach and the Quadratic Assignment Problem (QAP). For the assignment of the values to the arcs, we propose an approach of gradual changes of these values in order to seek the best combination of cost allocation, this under certain metric constraints such as temperature, heat dissipation, power consumption and surface occupied by the chip. The ultimate goal of this work is to propose a methodology for non-traditional, systematic and effective decision support design tools for multiprocessor system architects, from the phase of functional specification

    SCHEDULING REAL-TIME GRAPH-BASED WORKLOADS

    Get PDF
    Developments in the semiconductor industry in the previous decades have made possible computing platforms with very large computing capacities that, in turn, have stimulated the rapid progress of computationally intensive computer vision (CV) algorithms with highly parallelizable structure (often represented as graphs). Applications using such algorithms are the foundation for the transformation of semi-autonomous systems (e.g., advanced driver-assist systems) to future fully-autonomous systems (e.g., self-driving cars). Enabling mass-produced safety-critical systems with full autonomy requires real-time execution guarantees as a part of system certification.Since multiple CV applications may need to share the same hardware platform due to size, weight, power, and cost constraints, system component isolation is necessary to avoid explosive interference growth that breaks all execution guarantees. Existing software certification processes achieve component isolation through time partitioning, which can be broken by accelerator usage, which is essential for high-efficacy CV algorithms.The goal of this dissertation is to make a first step towards providing real-time guarantees for safety-critical systems by analyzing the scheduling of highly parallel accelerator-using workloads isolated in system components. The specific contributions are threefold.First, a general method for graph-based workloads’ response-time-bound reduction through graph structure modifications is introduced, leading to significant response-time-bound reductions. Second, a generalized real-time task model is introduced that enables real-time response-time bounds for a wider range of graph-based workloads. A proposed response-time analysis for the introduced model accounts for potential accelerator usage within tasks. Third, a scheduling approach for graph-based workloads in a single system component is proposed that ensures the temporal isolation of system components. A response-time analysis for workloads with accelerator usage is presented alongside a non-mandatory schedulability-improvement step. This approach can help to enable component-wise certification in the considered systems.Doctor of Philosoph
    corecore