3,117 research outputs found

Evaluation of the parallel computational capabilities of embedded platforms for critical systems

Author: Jover-Alvarez Alvaro
Publication venue: Universitat Politècnica de Catalunya
Publication date: 28/10/2021

Modern critical systems need higher performance than the simple architectures used so far can deliver. The latest embedded architectures feature multi-cores and GPUs, which can be used to satisfy this need. In this thesis we parallelise relevant applications from multiple critical domains represented in the GPU4S benchmark suite, and compare the parallel capabilities of candidate platforms for use in critical systems. In particular, we port the open source GPU4S Bench benchmarking suite to the OpenMP programming model, and we benchmark the candidate embedded heterogeneous multi-core platforms of the H2020 UP2DATE project (NVIDIA TX2, NVIDIA Xavier and Xilinx Zynq Ultrascale+) in order to drive the selection of the research platform that will be used in the next phases of the project. Our results indicate that, in terms of CPU and GPU performance, the NVIDIA Xavier is the highest performing platform.

UPCommons. Portal del coneixement obert de la UPC

Developing and applying heterogeneous phylogenetic models with XRate

Author: A Heger
A Siepel
A Varadarajan
AJ Drummond
B Knudsen
Christos A. Ouzounis
D Ayres
DB Searls
E Birney
G Lunter
GSC Slater
Ian Holmes
IM Meyer
J Felsenstein
J Goecks
J Watts
JS Pedersen
L Stein
M Garber
M Hasegawa
M Kimura
M Zuker
ME Skinner
N Saitou
O Penn
Oscar Westesson
PS Klosterman
RK Bradley
SR Eddy
TH Jukes
WJ Kent
Z Yang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 16/02/2012

Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models which take account of the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, manual coding of models becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models which would have previously been far more labor-intensive. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well-suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package: http://biowiki.org/DART. Comment: 34 pages, 3 figures, glossary of XRate model terminology

arXiv.org e-Print Archive

Crossref

PubMed Central

FigShare

Adaptive optimization for OpenCL programs on embedded heterogeneous systems

Author: Cho Y.
Garzón E.
Grewe D.
Imes C.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 21/06/2017

Heterogeneous multi-core architectures consisting of CPUs and GPUs are commonplace in today's embedded systems. These architectures offer potential for energy-efficient computing if the application task is mapped to the right core. Realizing such potential is challenging due to the complex and evolving nature of hardware and applications. This paper presents an automatic approach to map OpenCL kernels onto heterogeneous multi-cores for a given optimization criterion: faster runtime, lower energy consumption, or a trade-off between the two. This is achieved by developing a machine-learning-based approach to predict which processor to use to run the OpenCL kernel and the host program, and at what frequency the processor should operate. Instead of hand-tuning a model for each optimization metric, we use machine learning to develop a unified framework that first automatically learns the optimization heuristic for each metric off-line, then uses the learned knowledge to schedule OpenCL kernels at runtime based on code and runtime information of the program. We apply our approach to a set of representative OpenCL benchmarks and evaluate it on an ARM big.LITTLE mobile platform. Our approach achieves over 93% of the performance delivered by a perfect predictor. We obtain, on average, 1.2x, 1.6x, and 1.8x improvement for runtime, energy consumption and the energy delay product, respectively, when compared to a comparative heterogeneous-aware OpenCL task mapping scheme.

Crossref

Lancaster E-Prints

Information-Theoretic Control of Multiple Sensor Platforms

Author: Grocholsky Ben
Publication venue: Faculty of Engineering and Information Technologies, School of Aerospace, Mechanical and Mechatronic Engineering
Publication date: 01/01/2002

This thesis is concerned with the development of a consistent, information-theoretic basis for understanding coordination and cooperation in decentralised multi-sensor multi-platform systems. Autonomous systems composed of multiple sensors and multiple platforms potentially have significant importance in applications such as defence, search and rescue, mining or intelligent manufacturing. However, the effective use of multiple autonomous systems requires that an understanding be developed of the mechanisms of coordination and cooperation between component systems in pursuit of a common goal. A fundamental, quantitative understanding of coordination and cooperation between decentralised autonomous systems is the main goal of this thesis. This thesis focuses on the problem of coordination and cooperation for teams of autonomous systems engaged in information gathering and data fusion tasks. While this is a subset of the general cooperative autonomous systems problem, it still encompasses a range of possible applications in picture compilation, navigation, searching and map building problems. The great advantage of restricting the domain of interest in this way is that an underlying mathematical model for coordination and cooperation can be based on the use of information-theoretic models of platform and sensor abilities. The information-theoretic approach builds on the established principles and architecture previously developed for decentralised data fusion systems. In the decentralised control problem addressed in this thesis, each platform and sensor system is considered to be a distinct decision maker with an individual information-theoretic utility measure capturing both local objectives and the inter-dependencies among the decisions made by other members of the team. Together these information-theoretic utilities constitute the team objective.
The key contributions of this thesis lie in the quantification and study of cooperative control between sensors and platforms using information as a common utility measure. In particular:
* The problem of information gathering is formulated as an optimal control problem by identifying formal measures of information with utility or pay-off.
* An information-theoretic utility model of coupling and coordination between decentralised decision makers is elucidated. This is used to describe how the information gathering strategies of a team of autonomous systems are coupled.
* Static and dynamic information structures for team members are defined. It is shown that the use of static information structures can lead to efficient, although sub-optimal, decentralised control strategies for the team.
* Significant examples in decentralised control of a team of sensors are developed. These include the multi-vehicle multi-target bearings-only tracking problem, and the area coverage or exploration problem for multiple vehicles. These examples demonstrate the range of non-trivial problems to which the theory in this thesis can be employed.

Estudo Geral

Sydney eScholarship

An analysis of key generation efficiency of RSA cryptosystem in distributed environments

Author: Çağrıcı Gökhan
Publication venue: Izmir Institute of Technology
Publication date: 01/01/2005

Thesis (Master)--Izmir Institute of Technology, Computer Engineering, Izmir, 2005. Includes bibliographical references (leaves: 68). Text in English; abstract in Turkish and English. ix, 74 leaves. As the volume of communication over networks, and especially over the Internet, grew, so did the need to secure these connections. Symmetric and asymmetric cryptosystems formed a good complementary approach for providing this security. While asymmetric cryptosystems were a perfect solution for distributing the keys used by the communicating parties, they were very slow for the actual encryption and decryption of the data flowing between them. Symmetric cryptosystems therefore filled this gap and were used for encryption and decryption once the session keys had been exchanged securely. Parallelism is an active research topic in many different fields and is used to deal with problems whose solutions take a considerable amount of time. Cryptography is no exception: computer scientists have found that parallelism can be used to make the algorithms for asymmetric cryptosystems faster, and the experimental results have shown good promise so far. This thesis is based on the parallelization of a well-known public-key algorithm, namely RSA.

Next-generation lane centering assist system : design and implementation of a lane centering assist system, using NXP-Bluebox

Author: Ismail R.
Publication venue: Technische Universiteit Eindhoven
Publication date: 31/10/2017

Pure OAI Repository

Parallel source code transformation techniques using design patterns

Author: Río Astorga David del
Publication venue
Publication date: 01/01/2018

International Mention in the doctoral degree. In recent years, the traditional approaches to improving performance, such as increasing the clock frequency, have reached a dead end. To tackle this issue, parallel architectures, such as multi-/many-core processors, have been envisioned to increase performance by providing greater processing capabilities. However, programming efficiently for these architectures demands considerable effort to transform sequential applications into parallel ones and to optimize them. Compared to sequential programming, designing and implementing parallel applications for modern hardware poses a number of new challenges to developers, such as data races, deadlocks and load imbalance. To pave the way, parallel design patterns provide a way to encapsulate algorithmic aspects, allowing users to implement robust, readable and portable solutions with such high-level abstractions. Basically, these patterns instantiate parallelism while hiding away the complexity of concurrency mechanisms, such as thread management, synchronization or data sharing. Nonetheless, frameworks following this philosophy do not share the same interface, and users must understand different libraries and their capabilities, not only to decide which best fits their purposes but also to leverage them properly. Furthermore, in order to parallelize these applications, it is necessary to analyze the sequential code to detect the regions that can be parallelized, which is a time-consuming and complex task. Additionally, different libraries targeted at specific devices provide algorithm implementations that are already parallel and highly tuned. In these situations, it is also necessary to analyze and determine which routine implementation is the most suitable for a given problem.
To tackle these issues, this thesis aims at simplifying and minimizing the effort needed to transform sequential applications into parallel ones. This way, the resulting codes will improve their performance by fully exploiting the available resources, while the development effort will be considerably reduced. In this thesis, we make the following contributions. First, we propose a technique to detect potential parallel patterns in sequential code. Second, we provide a novel generic C++ interface for parallel patterns which acts as a switch among existing frameworks. Third, we implement a framework that is able to transform sequential code into parallel code using the proposed pattern discovery technique and pattern interface. Finally, we propose mechanisms that are able to select the most suitable device and routine implementation to solve a given problem based on previous performance information. The evaluation demonstrates that the proposed techniques can minimize the refactoring and optimization time while improving the performance of the resulting applications with respect to the original code.
Official Doctoral Programme in Computer Science and Technology. President: David Expósito Singh. Secretary: Rafael Asenjo Plaza. Member: Marco Aldinucc

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo