5 research outputs found

    Development of MATLAB/Simulink tools for designing control devices of FPGA-based control systems

    This article addresses the problem of creating a design environment for rapid prototyping and co-simulation in the field of control systems. The purposes, techniques, and advantages of the rapid prototyping approach are covered in detail. The co-simulation approach is covered in conjunction with rapid prototyping, and the advantages of using the two in combination are presented. As part of the solution, a universal method of communication between the simulation environment and the hardware prototype is proposed. The study resulted in the development of a program that allows hardware prototypes of control devices implemented in FPGAs to be used for co-simulation with a system model created in the MATLAB/Simulink environment.
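The "universal method of communication" between simulator and FPGA prototype that the abstract mentions is typically a byte-level framing protocol over a link such as TCP or a serial port. The sketch below shows one plausible framing for exchanging fixed-point samples per co-simulation step; the frame layout, names, and Q16.16 format are illustrative assumptions, not the article's actual protocol.

```python
import struct

# Hypothetical frame format for a simulator <-> FPGA-prototype link:
# each co-simulation step exchanges one frame of fixed-point samples.
SCALE = 1 << 16  # Q16.16 fixed point, a common choice on FPGAs


def encode_frame(step, samples):
    """Pack a step index and a list of float samples into a byte frame.

    Layout (big-endian): uint32 step | int16 count | int32 samples...
    """
    fixed = [int(round(s * SCALE)) for s in samples]
    return struct.pack(f">Ih{'i' * len(fixed)}", step, len(fixed), *fixed)


def decode_frame(frame):
    """Unpack a byte frame back into (step, samples)."""
    step, n = struct.unpack_from(">Ih", frame)
    fixed = struct.unpack_from(f">{'i' * n}", frame, 6)  # 6 = header size
    return step, [f / SCALE for f in fixed]
```

In a real setup the encoded frame would be written to a socket or UART connected to the FPGA board, and the decoded reply fed back into the Simulink model at each solver step.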

    Methods of parallelism transformation in high-level VLSI synthesis

    In this paper, methods for increasing the efficiency of VLSI development based on architecture-independent design are proposed. The route of high-level VLSI synthesis is considered, and the principle of constructing a VLSI hardware model based on the functional data-flow programming paradigm is stated. The paper presents the results of developing methods and algorithms for transforming functional-parallel programs into programs in hardware description languages that support the design process of digital chips. The principles of assessment are considered, and the classes of resources required for the analysis of design solutions are identified. Reduction coefficients and methods of calculating them for each resource class are introduced, together with an algorithm for computing the reduction coefficients and estimating the required resources. An algorithm for transforming parallelism is proposed that takes into account the specified constraints of the target platform, and a mechanism for exchanging metrics with the architecture-dependent level has been developed. Examples of parallelism reduction for the FPGA platform and a practical implementation of FFT algorithms on Virtex® UltraScale FPGAs are given. The developed methods and algorithms make it possible to use architecture-independent synthesis to port VLSI projects to various architectures by changing the parallelism of the circuit and by equivalent transformations of parallel programs. The proposed approach yields many hardware-solution options for implementation on various target platforms.
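The reduction coefficients described above can be illustrated with a small sketch: for each resource class (LUTs, DSP slices, block RAM, ...) a coefficient says how many times the fully parallel circuit must be folded to fit the target platform, and the circuit as a whole must satisfy the worst class. The function names, resource figures, and the max-of-ceilings formula are illustrative assumptions, not the paper's exact method.

```python
import math

def reduction_coefficients(required, available):
    """Per-class reduction coefficient: how many folds each resource
    class forces when the fully parallel design exceeds the platform."""
    return {cls: math.ceil(required[cls] / available[cls]) for cls in required}

def overall_reduction(required, available):
    """The circuit must fit every class simultaneously, so the overall
    parallelism reduction is the worst-case (maximum) coefficient."""
    return max(reduction_coefficients(required, available).values())

# Hypothetical demand of a fully parallel design vs. a target FPGA.
demo_required = {"lut": 900_000, "dsp": 12_000, "bram": 3_000}
demo_available = {"lut": 300_000, "dsp": 3_000, "bram": 2_000}
```

Here the DSP class is the bottleneck (12 000 needed vs. 3 000 available), so the design would be folded fourfold, trading throughput for area.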

    Architectural explorations for streaming accelerators with customized memory layouts

    The basic concept behind the architecture of a general-purpose CPU core conforms well to a serial programming model. The integration of more cores on a single chip helped CPUs run parts of a program in parallel. However, the huge parallelism available in many high-performance applications and their data is hard to exploit on these general-purpose multi-cores. Streaming accelerators and the corresponding programming models improve this situation by providing throughput-oriented architectures. The basic idea behind the design of these architectures matches the ever-increasing requirements of processing huge data sets. These high-performance, throughput-oriented devices enable fast processing of data through efficient parallel computation and streaming-based communication. Throughput-oriented streaming accelerators, like other processors, consist of numerous micro-architectural components, including memory structures, compute units, control units, I/O channels, and I/O controls. However, the throughput requirements add some special features and impose other restrictions for performance reasons. These devices normally offer a large number of compute resources but require applications to arrange parallel and maximally independent data sets to feed the compute resources in the form of streams. Arranging data into independent sets of parallel streams is not a simple task: it may require changing the structure of an algorithm as a whole, or even rewriting the algorithm from scratch for the target application. Even then, all these efforts to rearrange an application's data access patterns may not achieve optimal performance, because of micro-architectural constraints of the target platform: the hardware prefetching mechanisms, the size and granularity of the local storage, and the flexibility of data marshaling inside the local storage. These constraints of a general-purpose streaming platform on prefetching, storing, and maneuvering data into parallel, independent streams could be removed by micro-architectural design approaches, including the use of application-specific customized memories in the front-end of a streaming architecture. The focus of this thesis is to present architectural explorations for streaming accelerators using customized memory layouts. The thesis covers three main aspects of such accelerators: i) design of application-specific accelerators with customized memory layouts; ii) template-based design support for customized-memory accelerators; and iii) design-space explorations for throughput-oriented devices with standard and customized memories. The thesis concludes with a conceptual proposal for a Blacksmith Streaming Architecture (BSArc). Blacksmith Computing allows the hardware-level adoption of an application-specific front-end with a GPU-like streaming back-end, giving an opportunity to exploit the maximum possible data locality and data-level parallelism of an application while providing a powerful, throughput-natured back-end. We consider that the specialized memory layouts for the device's front-end are provided by application-domain experts in the form of templates, adjustable to the device and the problem size at the device's configuration time. The physical availability of such an architecture may still take time; however, a simulation framework supports the architectural explorations, gives insight into the proposal, and predicts potential performance benefits for such an architecture.
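The abstract's central difficulty, rearranging data into maximally independent parallel streams to feed the compute lanes, can be made concrete with a toy layout transform. The round-robin deinterleave below is one common arrangement; the lane count and layout are illustrative assumptions, not a scheme from the thesis.

```python
def to_streams(data, lanes):
    """Split a flat sequence into `lanes` independent round-robin
    streams, one per compute lane; stream i gets elements i, i+lanes, ..."""
    return [list(data[i::lanes]) for i in range(lanes)]

def from_streams(streams):
    """Inverse transform: interleave the lanes back into a flat list.
    Assumes all streams have equal length (lane count divides the data)."""
    out = []
    for group in zip(*streams):
        out.extend(group)
    return out
```

In a customized-memory front-end, a transform like this would be realized by the memory layout itself, so each lane reads its stream contiguously instead of the software performing the shuffle.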


    Generalised correlation higher order neural networks, neural network operation and Levenberg-Marquardt training on field programmable gate arrays

    Higher Order Neural Networks (HONNs) were introduced in the late 1980s as a solution to the increasing complexity within Neural Networks (NNs). Similar to NNs, HONNs excel at pattern recognition, classification, and optimisation, particularly for non-linear systems, in varied applications such as communication channel equalisation, real-time intelligent control, and intrusion detection. This research introduced new HONNs, the Generalised Correlation Higher Order Neural Networks. As an extension of ordinary first-order NNs and HONNs, they are based on interlinked arrays of correlators with known relationships, and they provide the NN with a more extensive view by introducing interactions between the data as inputs to the NN model. All studies included two data sets to generalise the applicability of the findings. The research investigated the performance of HONNs in estimating short-term returns of two financial data sets, the FTSE 100 and NASDAQ. The new models were compared against several financial models and ordinary NNs. Two new HONNs, the Correlation HONN (C-HONN) and the Horizontal HONN (Horiz-HONN), outperformed all other models tested in terms of the Akaike Information Criterion (AIC). The work also investigated HONNs for camera calibration and image mapping. HONNs were compared against NNs and standard analytical methods in terms of mapping performance for three cases: 3D-to-2D mapping, a hybrid model combining HONNs with an analytical model, and 2D-to-3D inverse mapping. The study considered two types of data, planar data and co-planar (cube) data. To our knowledge this is the first study comparing HONNs against NNs and analytical models for camera calibration. HONNs were able to transform the reference grid onto the correct camera coordinates and vice versa, an aspect that the standard analytical model fails to perform with the type of data used. HONN 3D-to-2D mapping had a calibration error lower than the parametric model's by up to 24% for plane data and 43% for cube data. The hybrid model also had a lower calibration error than the parametric model, by 12% for plane data and 34% for cube data; however, it did not outperform the fully non-parametric models. Using HONNs for inverse mapping from 2D to 3D outperformed NNs by up to 47% in the case of cube-data mapping. The thesis is also concerned with the operation and training of NNs in limited precision, specifically on Field Programmable Gate Arrays (FPGAs). Our findings demonstrate the feasibility of on-line, real-time, low-latency training on limited-precision electronic hardware such as Digital Signal Processors (DSPs) and FPGAs. The thesis also investigated the effects of limited precision on the Back Propagation (BP) and Levenberg-Marquardt (LM) optimisation algorithms. The two new HONNs were compared against NNs for estimating the discrete XOR function and an optical waveguide sidewall roughness dataset in order to find the Minimum Precision for Lowest Error (MPLE) at which training and operation are still possible. The new findings show that, compared to NNs, HONNs require more precision to reach a similar performance level, and that the second-order LM algorithm requires at least 24 bits of precision. The final investigation implemented and demonstrated the LM algorithm on FPGAs, to our knowledge for the first time; it was used to train a Neural Network and to estimate camera calibration parameters. The LM algorithm trained the NN to model the XOR function in only 13 iterations from zero initial conditions, with a speed-up in excess of 3 × 10^6 compared to an implementation in software. Camera calibration was also demonstrated on FPGAs; compared to the software implementation, the FPGA implementation increased the mean squared error and standard deviation by only 17.94% and 8.04% respectively, while increasing the calibration speed by a factor of 1.41 × 10^6.
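The defining idea of a HONN, feeding the network products (correlations) of the raw inputs as extra features so that higher-order interactions become linearly separable, can be sketched in a few lines. This builds a generic second-order input vector; the specific correlator structures of the C-HONN and Horiz-HONN variants are more constrained, so treat this as an illustration of the general technique, not the thesis's exact models.

```python
from itertools import combinations_with_replacement

def second_order_inputs(x):
    """Return the raw inputs plus all pairwise products x_i * x_j
    (i <= j), the extra correlation terms a second-order HONN sees."""
    pairs = [a * b for a, b in combinations_with_replacement(x, 2)]
    return list(x) + pairs
```

For the discrete XOR problem mentioned above, the product term x_0 * x_1 is exactly the feature that makes the function linearly separable, which is why a HONN can learn it with a far smaller network than a plain first-order NN.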