218 research outputs found

    Approaching Retargetable Static, Dynamic, and Hybrid Executable-Code Analysis

    Get PDF
    Program comprehension and reverse engineering are two large domains of computer science that have one common goal – analysis of existing programs and understanding their behaviour. In present, methods of source code analysis are well established and used in practice by software engineers. On the other hand, analysis of executable code is a more challenging task that is not fully covered by existing tools. Furthermore, methods of retargetable executable code analysis are rare because of their complexity. In this paper, we present a complex platform independent toolchain for executable-code analysis that supports both static and dynamic analysis. This toolchain, developed within the Lissom project, exploits several previously designed methods and it can be used for debugging user’s applications as well as malware analysis, etc. The main contribution of this paper is to interconnect the existing methods and illustrate their usage on the real world scenarios. Furthermore, we introduce a concept of a new retargetable method – the hybrid analysis. It can eliminate the shortcomings of the static and dynamic analysis in future

    Architectural explorations for streaming accelerators with customized memory layouts

    Get PDF
    El concepto básico de la arquitectura mono-nucleo en los procesadores de propósito general se ajusta bien a un modelo de programación secuencial. La integración de multiples núcleos en un solo chip ha permitido a los procesadores correr partes del programa en paralelo. Sin embargo, la explotación del enorme paralelismo disponible en muchas aplicaciones de alto rendimiento y de los datos correspondientes es difícil de conseguir usando unicamente multicores de propósito general. La aparición de aceleradores tipo streaming y de los correspondientes modelos de programación han mejorado esta situación proporcionando arquitecturas orientadas al proceso de flujos de datos. La idea básica detrás del diseño de estas arquitecturas responde a la necesidad de procesar conjuntos enormes de datos. Estos dispositivos de alto rendimiento orientados a flujos permiten el procesamiento rapido de datos mediante el uso eficiente de computación paralela y comunicación entre procesos. Los aceleradores streaming orientados a flujos, igual que en otros procesadores, consisten en diversos componentes micro-arquitectonicos como por ejemplo las estructuras de memoria, las unidades de computo, las unidades de control, los canales de Entrada/Salida y controles de Entrada/Salida, etc. Sin embargo, los requisitos del flujo de datos agregan algunas características especiales e imponen otras restricciones que afectan al rendimiento. Estos dispositivos, por lo general, ofrecen un gran número de recursos computacionales, pero obligan a reorganizar los conjuntos de datos en paralelo, maximizando la independiencia para alimentar los recursos de computación en forma de flujos. La disposición de datos en conjuntos independientes de flujos paralelos no es una tarea sencilla. Es posible que se tenga que cambiar la estructura de un algoritmo en su conjunto o, incluso, puede requerir la reescritura del algoritmo desde cero. Sin embargo, todos estos esfuerzos para la reordenación de los patrones de las aplicaciones de acceso a datos puede que no sean muy útiles para lograr un rendimiento óptimo. Esto es debido a las posibles limitaciones microarquitectonicas de la plataforma de destino para los mecanismos hardware de prefetch, el tamaño y la granularidad del almacenamiento local, y la flexibilidad para disponer de forma serial los datos en el interior del almacenamiento local. Las limitaciones de una plataforma de streaming de proposito general para el prefetching de datos, almacenamiento y demas procedimientos para organizar y mantener los datos en forma de flujos paralelos e independientes podría ser eliminado empleando técnicas a nivel micro-arquitectonico. Esto incluye el uso de memorias personalizadas especificamente para las aplicaciones en el front-end de una arquitectura streaming. El objetivo de esta tesis es presentar exploraciones arquitectónicas de los aceleradores streaming con diseños de memoria personalizados. En general, la tesis cubre tres aspectos principales de tales aceleradores. Estos aspectos se pueden clasificar como: i) Diseño de aceleradores de aplicaciones específicas con diseños de memoria personalizados, ii) diseño de aceleradores con memorias personalizadas basados en plantillas, y iii) exploraciones del espacio de diseño para dispositivos orientados a flujos con las memorias estándar y personalizadas. Esta tesis concluye con la propuesta conceptual de una Blacksmith Streaming Architecture (BSArc). El modelo de computación Blacksmith permite la adopción a nivel de hardware de un front-end de aplicación específico utilizando una GPU como back-end. Esto permite maximizar la explotación de la localidad de datos y el paralelismo a nivel de datos de una aplicación mientras que proporciona un flujo mayor de datos al back-end. Consideramos que el diseño de estos procesadores con memorias especializadas debe ser proporcionado por expertos del dominio de aplicación en la forma de plantillas.The basic concept behind the architecture of a general purpose CPU core conforms well to a serial programming model. The integration of more cores on a single chip helped CPUs in running parts of a program in parallel. However, the utilization of huge parallelism available from many high performance applications and the corresponding data is hard to achieve from these general purpose multi-cores. Streaming accelerators and the corresponding programing models improve upon this situation by providing throughput oriented architectures. The basic idea behind the design of these architectures matches the everyday increasing requirements of processing huge data sets. These high-performance throughput oriented devices help in high performance processing of data by using efficient parallel computations and streaming based communications. The throughput oriented streaming accelerators ¿ similar to the other processors ¿ consist of numerous types of micro-architectural components including the memory structures, compute units, control units, I/O channels and I/O controls etc. However, the throughput requirements add some special features and impose other restrictions for the performance purposes. These devices, normally, offer a large number of compute resources but restrict the applications to arrange parallel and maximally independent data sets to feed the compute resources in the form of streams. The arrangement of data into independent sets of parallel streams is not an easy and simple task. It may need to change the structure of an algorithm as a whole or even it can require to write a new algorithm from scratch for the target application. However, all these efforts for the re-arrangement of application data access patterns may still not be very helpful to achieve the optimal performance. This is because of the possible micro-architectural constraints of the target platform for the hardware pre-fetching mechanisms, the size and the granularity of the local storage and the flexibility in data marshaling inside the local storage. The constraints of a general purpose streaming platform on the data pre-fetching, storing and maneuvering to arrange and maintain it in the form of parallel and independent streams could be removed by employing micro-architectural level design approaches. This includes the usage of application specific customized memories in the front-end of a streaming architecture. The focus of this thesis is to present architectural explorations for the streaming accelerators using customized memory layouts. In general the thesis covers three main aspects of such streaming accelerators in this research. These aspects can be categorized as : i) Design of Application Specific Accelerators with Customized Memory Layout ii) Template Based Design Support for Customized Memory Accelerators and iii) Design Space Explorations for Throughput Oriented Devices with Standard and Customized Memories. This thesis concludes with a conceptual proposal on a Blacksmith Streaming Architecture (BSArc). The Blacksmith Computing allow the hardware-level adoption of an application specific front-end with a GPU like streaming back-end. This gives an opportunity to exploit maximum possible data locality and the data level parallelism from an application while providing a throughput natured powerful back-end. We consider that the design of these specialized memory layouts for the front-end of the device are provided by the application domain experts in the form of templates. These templates are adjustable according to a device and the problem size at the device's configuration time. The physical availability of such an architecture may still take time. However, simulation framework helps in architectural explorations to give insight into the proposal and predicts potential performance benefits for such an architecture.Postprint (published version

    Architectural explorations for streaming accelerators with customized memory layouts

    Get PDF
    El concepto básico de la arquitectura mono-nucleo en los procesadores de propósito general se ajusta bien a un modelo de programación secuencial. La integración de multiples núcleos en un solo chip ha permitido a los procesadores correr partes del programa en paralelo. Sin embargo, la explotación del enorme paralelismo disponible en muchas aplicaciones de alto rendimiento y de los datos correspondientes es difícil de conseguir usando unicamente multicores de propósito general. La aparición de aceleradores tipo streaming y de los correspondientes modelos de programación han mejorado esta situación proporcionando arquitecturas orientadas al proceso de flujos de datos. La idea básica detrás del diseño de estas arquitecturas responde a la necesidad de procesar conjuntos enormes de datos. Estos dispositivos de alto rendimiento orientados a flujos permiten el procesamiento rapido de datos mediante el uso eficiente de computación paralela y comunicación entre procesos. Los aceleradores streaming orientados a flujos, igual que en otros procesadores, consisten en diversos componentes micro-arquitectonicos como por ejemplo las estructuras de memoria, las unidades de computo, las unidades de control, los canales de Entrada/Salida y controles de Entrada/Salida, etc. Sin embargo, los requisitos del flujo de datos agregan algunas características especiales e imponen otras restricciones que afectan al rendimiento. Estos dispositivos, por lo general, ofrecen un gran número de recursos computacionales, pero obligan a reorganizar los conjuntos de datos en paralelo, maximizando la independiencia para alimentar los recursos de computación en forma de flujos. La disposición de datos en conjuntos independientes de flujos paralelos no es una tarea sencilla. Es posible que se tenga que cambiar la estructura de un algoritmo en su conjunto o, incluso, puede requerir la reescritura del algoritmo desde cero. Sin embargo, todos estos esfuerzos para la reordenación de los patrones de las aplicaciones de acceso a datos puede que no sean muy útiles para lograr un rendimiento óptimo. Esto es debido a las posibles limitaciones microarquitectonicas de la plataforma de destino para los mecanismos hardware de prefetch, el tamaño y la granularidad del almacenamiento local, y la flexibilidad para disponer de forma serial los datos en el interior del almacenamiento local. Las limitaciones de una plataforma de streaming de proposito general para el prefetching de datos, almacenamiento y demas procedimientos para organizar y mantener los datos en forma de flujos paralelos e independientes podría ser eliminado empleando técnicas a nivel micro-arquitectonico. Esto incluye el uso de memorias personalizadas especificamente para las aplicaciones en el front-end de una arquitectura streaming. El objetivo de esta tesis es presentar exploraciones arquitectónicas de los aceleradores streaming con diseños de memoria personalizados. En general, la tesis cubre tres aspectos principales de tales aceleradores. Estos aspectos se pueden clasificar como: i) Diseño de aceleradores de aplicaciones específicas con diseños de memoria personalizados, ii) diseño de aceleradores con memorias personalizadas basados en plantillas, y iii) exploraciones del espacio de diseño para dispositivos orientados a flujos con las memorias estándar y personalizadas. Esta tesis concluye con la propuesta conceptual de una Blacksmith Streaming Architecture (BSArc). El modelo de computación Blacksmith permite la adopción a nivel de hardware de un front-end de aplicación específico utilizando una GPU como back-end. Esto permite maximizar la explotación de la localidad de datos y el paralelismo a nivel de datos de una aplicación mientras que proporciona un flujo mayor de datos al back-end. Consideramos que el diseño de estos procesadores con memorias especializadas debe ser proporcionado por expertos del dominio de aplicación en la forma de plantillas.The basic concept behind the architecture of a general purpose CPU core conforms well to a serial programming model. The integration of more cores on a single chip helped CPUs in running parts of a program in parallel. However, the utilization of huge parallelism available from many high performance applications and the corresponding data is hard to achieve from these general purpose multi-cores. Streaming accelerators and the corresponding programing models improve upon this situation by providing throughput oriented architectures. The basic idea behind the design of these architectures matches the everyday increasing requirements of processing huge data sets. These high-performance throughput oriented devices help in high performance processing of data by using efficient parallel computations and streaming based communications. The throughput oriented streaming accelerators ¿ similar to the other processors ¿ consist of numerous types of micro-architectural components including the memory structures, compute units, control units, I/O channels and I/O controls etc. However, the throughput requirements add some special features and impose other restrictions for the performance purposes. These devices, normally, offer a large number of compute resources but restrict the applications to arrange parallel and maximally independent data sets to feed the compute resources in the form of streams. The arrangement of data into independent sets of parallel streams is not an easy and simple task. It may need to change the structure of an algorithm as a whole or even it can require to write a new algorithm from scratch for the target application. However, all these efforts for the re-arrangement of application data access patterns may still not be very helpful to achieve the optimal performance. This is because of the possible micro-architectural constraints of the target platform for the hardware pre-fetching mechanisms, the size and the granularity of the local storage and the flexibility in data marshaling inside the local storage. The constraints of a general purpose streaming platform on the data pre-fetching, storing and maneuvering to arrange and maintain it in the form of parallel and independent streams could be removed by employing micro-architectural level design approaches. This includes the usage of application specific customized memories in the front-end of a streaming architecture. The focus of this thesis is to present architectural explorations for the streaming accelerators using customized memory layouts. In general the thesis covers three main aspects of such streaming accelerators in this research. These aspects can be categorized as : i) Design of Application Specific Accelerators with Customized Memory Layout ii) Template Based Design Support for Customized Memory Accelerators and iii) Design Space Explorations for Throughput Oriented Devices with Standard and Customized Memories. This thesis concludes with a conceptual proposal on a Blacksmith Streaming Architecture (BSArc). The Blacksmith Computing allow the hardware-level adoption of an application specific front-end with a GPU like streaming back-end. This gives an opportunity to exploit maximum possible data locality and the data level parallelism from an application while providing a throughput natured powerful back-end. We consider that the design of these specialized memory layouts for the front-end of the device are provided by the application domain experts in the form of templates. These templates are adjustable according to a device and the problem size at the device's configuration time. The physical availability of such an architecture may still take time. However, simulation framework helps in architectural explorations to give insight into the proposal and predicts potential performance benefits for such an architecture

    Project CrayOn: Back to the future for a more General-Purpose GPU

    Get PDF
    General purpose of use graphics processing units (GPGPU) recapitulates many of the lessons of the early generations of supercomputers. To what extent have we learnt those lessons, rather than repeating the mistakes? To answer that question, I review why the Cray-1 design ushered in a succession of successful supercomputer designs, while more exotic modes of parallelism failed to gain traction. In the light of this review, I propose that new packaging breakthroughs create an opening for revisiting our approach to GPUs and hence GPGPU. If we do not do so now, the GPU endpoint–when no GPU enhancement will be perceptible to human senses–will in any case remove the economic incentive to build ever-faster GPUs and hence the economy of scale incentive for GPGPU. Anticipating this change now makes sense when new 3D packaging options are opening up; I propose one such option inspired by packaging of the Cray-1

    Systolic Array Implementations With Reduced Compute Time.

    Get PDF
    The goal of the research is the establishment of a formal methodology to develop computational structures more suitable for the changing nature of real-time signal processing and control applications. A major effort is devoted to the following question: Given a systolic array designed to execute a particular algorithm, what other algorithms can be executed on the same array? One approach for answering this question is based on a general model of array operations using graph-theoretic techniques. As a result, a systematic procedure is introduced that models array operations as a function of the compute cycle. As a consequence of the analysis, the dissertation develops the concept of fast algorithm realizations. This concept characterizes specific realizations that can be evaluated in a reduced number of cycles. It restricts the operations to remain in the same class but with reduced execution time. The concept takes advantage of the data dependencies of the algorithm at hand. This feature allows the modification of existing structures by reordering the input data. Applications of the principle allows optimum time band and triangular matrix product on arrays designed for dense matrices. A second approach for analyzing the families of algorithms implementable in an array, is based on the concept of array time constrained operation. The principle uses the number of compute cycle as an additional degree of freedom to expand the class of transformations generated by a single array. A mathematical approach, based on concepts from multilinear algebra, is introduced to model the recursive transformations implemented in linear arrays at each compute cycle. The proposed representation is general enough to encompass a large class of signal processing and control applications. A complete analytical model of the linear maps implementable by the array at each compute cycle is developed. The proposed methodology results in arrays that are more adaptable to the changing nature of operations. Lessons learned from analyzing existing arrays are used to design smart arrays for special algorithm realizations. Applications of the methodology include the design of flexible time structures and the ability to decompose a full size array into subarrays implementing smaller size problems

    Generic Reverse Compilation to Recognize Specific Behavior

    Get PDF
    Práce je zaměřena na rozpoznávání specifického chování pomocí generického zpětného překladu. Generický zpětný překlad je proces, který transformuje spustitelné soubory z různých architektur a formátů objektových souborů na stejný jazyk na vysoké úrovni. Tento proces se vztahuje k nástroji Lissom Decompiler. Pro účely rozpoznání chování práce zavádí Language for Decompilation -- LfD. LfD představuje jednoduchý imperativní jazyk, který je vhodný pro srovnávaní. Konkrétní chování je dáno známým spustitelným souborem (např. malware) a rozpoznání se provádí jako najítí poměru podobnosti s jiným neznámým spustitelným souborem. Tento poměr podobnosti je vypočítán nástrojem LfDComparator, který zpracovává dva vstupy v LfD a rozhoduje o jejich podobnosti.Thesis is aimed on recognition of specific behavior by generic reverse compilation. The generic reverse compilation is a process that transforms executables from different architectures and object file formats to same high level language. This process is covered by a tool Lissom Decompiler. For purpose of behavior recognition the thesis introduces Language for Decompilation -- LfD. LfD represents a simple imperative language, which is suitable for a comparison. The specific behavior is given by the known executable (e.g. malware) and the recognition is performed as finding the ratio of similarity with other unknown executable. This ratio of similarity is calculated by a tool LfDComparator, which processes two sources in LfD to decide their similarity.

    Heterogeneous Acceleration for 5G New Radio Channel Modelling Using FPGAs and GPUs

    Get PDF
    L'abstract è presente nell'allegato / the abstract is in the attachmen

    Application-specific instruction set processor for speech recognition.

    Get PDF
    Cheung Man Ting.Thesis (M.Phil.)--Chinese University of Hong Kong, 2005.Includes bibliographical references (leaves 69-71).Abstracts in English and Chinese.Chapter 1 --- Introduction --- p.1Chapter 1.1 --- The Emergence of ASIP --- p.1Chapter 1.1.1 --- Related Work --- p.3Chapter 1.2 --- Motivation --- p.6Chapter 1.3 --- ASIP Design Methodologies --- p.7Chapter 1.4 --- Fundamentals of Speech Recognition --- p.8Chapter 1.5 --- Thesis outline --- p.10Chapter 2 --- Automatic Speech Recognition --- p.11Chapter 2.1 --- Overview of ASR system --- p.11Chapter 2.2 --- Theory of Front-end Feature Extraction --- p.12Chapter 2.3 --- Theory of HMM-based Speech Recognition --- p.14Chapter 2.3.1 --- Hidden Markov Model (HMM) --- p.14Chapter 2.3.2 --- The Typical Structure of the HMM --- p.14Chapter 2.3.3 --- Discrete HMMs and Continuous HMMs --- p.15Chapter 2.3.4 --- The Three Basic Problems for HMMs --- p.17Chapter 2.3.5 --- Probability Evaluation --- p.18Chapter 2.4 --- The Viterbi Search Engine --- p.19Chapter 2.5 --- Isolated Word Recognition (IWR) --- p.22Chapter 3 --- Design of ASIP Platform --- p.24Chapter 3.1 --- Instruction Fetch --- p.25Chapter 3.2 --- Instruction Decode --- p.26Chapter 3.3 --- Datapath --- p.29Chapter 3.4 --- Register File Systems --- p.30Chapter 3.4.1 --- Memory Hierarchy --- p.30Chapter 3.4.2 --- Register File Organization --- p.31Chapter 3.4.3 --- Special Registers --- p.34Chapter 3.4.4 --- Address Generation --- p.34Chapter 3.4.5 --- Load and Store --- p.36Chapter 4 --- Implementation of Speech Recognition on ASIP --- p.37Chapter 4.1 --- Hardware Architecture Exploration --- p.37Chapter 4.1.1 --- Floating Point and Fixed Point --- p.37Chapter 4.1.2 --- Multiplication and Accumulation --- p.38Chapter 4.1.3 --- Pipelining --- p.41Chapter 4.1.4 --- Memory Architecture --- p.43Chapter 4.1.5 --- Saturation Logic --- p.44Chapter 4.1.6 --- Specialized Addressing Modes --- p.44Chapter 4.1.7 --- Repetitive Operation --- p.47Chapter 4.2 --- Software Algorithm Implementation --- p.49Chapter 4.2.1 --- Implementation Using Base Instruction Set --- p.49Chapter 4.2.2 --- Implementation Using Refined Instruction Set --- p.54Chapter 5 --- Simulation Results --- p.56Chapter 6 --- Conclusions and Future Work --- p.60Appendices --- p.62Chapter A --- Base Instruction Set --- p.62Chapter B --- Special Registers --- p.65Chapter C --- Chip Microphotograph of ASIP --- p.67Chapter D --- The Testing Board of ASIP --- p.68Bibliography --- p.6

    Digital System Design - Use of Microcontroller

    Get PDF
    Embedded systems are today, widely deployed in just about every piece of machinery from toasters to spacecraft. Embedded system designers face many challenges. They are asked to produce increasingly complex systems using the latest technologies, but these technologies are changing faster than ever. They are asked to produce better quality designs with a shorter time-to-market. They are asked to implement increasingly complex functionality but more importantly to satisfy numerous other constraints. To achieve the current goals of design, the designer must be aware with such design constraints and more importantly, the factors that have a direct effect on them.One of the challenges facing embedded system designers is the selection of the optimum processor for the application in hand; single-purpose, general-purpose or application specific. Microcontrollers are one member of the family of the application specific processors.The book concentrates on the use of microcontroller as the embedded system?s processor, and how to use it in many embedded system applications. The book covers both the hardware and software aspects needed to design using microcontroller.The book is ideal for undergraduate students and also the engineers that are working in the field of digital system design.Contents• Preface;• Process design metrics;• A systems approach to digital system design;• Introduction to microcontrollers and microprocessors;• Instructions and Instruction sets;• Machine language and assembly language;• System memory; Timers, counters and watchdog timer;• Interfacing to local devices / peripherals;• Analogue data and the analogue I/O subsystem;• Multiprocessor communications;• Serial Communications and Network-based interfaces
    corecore