
    Design, Implementation and Evaluation of Hardware Vision Systems Dedicated to Real-Time Face Recognition

    Human face recognition is an active area of research spanning several disciplines such as image processing, pattern recognition, and computer vision. Most research has concentrated on algorithms for segmentation, feature extraction, and recognition of human faces, which are generally realized in software on standard computers. However, many applications of human face recognition, such as human-computer interfaces, model-based video coding, and security control (Kobayashi, 2001; Yeh & Lee, 1999), must run at high speed and in real time, for example to let travellers pass through customs quickly while ensuring security. In recent years, our laboratory has focused on face processing and obtained interesting results on face tracking and recognition by implementing original dedicated hardware systems. Our aim is to implement on embedded systems efficient models of unconstrained face tracking and identity verification in arbitrary scenes. The main goal of these systems is to provide robust algorithms that require only moderate computation, in order 1) to obtain high success rates in face tracking and identity verification and 2) to cope with drastic real-time constraints. The goal of this chapter is to describe three different hardware platforms dedicated to face recognition, each of which has been designed, implemented, and evaluated in our laboratory.

    An Exploration of the Feasibility of FPGA Implementation of Face Recognition Using Eigenfaces

    Biometric identification has been a major force since the 1990s. There are different approaches to it; one of the most significant is face recognition. Over the past two decades, face recognition techniques have improved significantly, the main focus being the development of efficient algorithms. The state-of-the-art algorithms with good recognition rates are implemented in programming languages such as C++, Java, and MATLAB, which require fast and computationally powerful hardware such as workstations. If face recognition algorithms could be written in a hardware description language, they could be implemented on an FPGA. In this thesis we chose the eigenfaces algorithm, since it is simple and very efficient. The algorithm is first worked out analytically, and then an architecture is designed for FPGA implementation. We develop a Verilog module for each stage, test their functionality using a Verilog simulator, and finally discuss the feasibility of FPGA implementation. Implementing face recognition on an FPGA would require relatively little power and drastically less space compared to workstations, and would also be faster and more efficient, since the hardware is designed specifically for face recognition.
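
    To make the eigenfaces computation concrete, here is a minimal NumPy sketch using the classic Turk & Pentland small-matrix trick. The image size (64x64), training-set size (40), and number of retained eigenfaces (10) are illustrative assumptions, not values from the thesis.

```python
# Minimal eigenfaces sketch: PCA on face images plus nearest-neighbour
# recognition in the eigenface subspace. All sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
train = rng.random((40, 64 * 64))      # 40 flattened 64x64 face images
mean = train.mean(axis=0)
A = train - mean                       # mean-centred training images

# Eigenvectors of the small 40x40 matrix A A^T yield the eigenfaces
# without forming the huge 4096x4096 covariance matrix.
evals, evecs = np.linalg.eigh(A @ A.T)
eigenfaces = (A.T @ evecs[:, -10:]).T  # top 10 eigenfaces, one per row
eigenfaces /= np.linalg.norm(eigenfaces, axis=1, keepdims=True)

def project(img):
    """Weights of an image in the eigenface subspace."""
    return eigenfaces @ (img - mean)

# Recognition is nearest-neighbour search in weight space.
weights = np.array([project(img) for img in train])
probe = train[7] + 0.01 * rng.random(64 * 64)   # noisy copy of image 7
match = int(np.argmin(np.linalg.norm(weights - project(probe), axis=1)))
print(match)                                     # expected: 7
```

    Each stage of this pipeline (mean subtraction, projection, distance comparison) is a natural candidate for a separate hardware module, which is consistent with the per-module Verilog development the abstract describes.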

    Profile Guided Dataflow Transformation for FPGAs and CPUs

    This paper proposes a new high-level approach for optimising field programmable gate array (FPGA) designs. FPGA designs are commonly implemented in low-level hardware description languages (HDLs), which lack the abstractions necessary for identifying opportunities for significant performance improvements. Using a computer vision case study, we show that modelling computation with dataflow abstractions enables substantial restructuring of FPGA designs before lowering to the HDL level, and can also improve CPU performance. Using the CPU transformations, runtime is reduced by 43%. Using the FPGA transformations, clock frequency is increased from 67 MHz to 110 MHz. Our results outperform commercial low-level HDL optimisations, showcasing dataflow program abstraction as an amenable computation model for highly effective FPGA optimisation.
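
    As a toy illustration of the underlying idea (not the paper's actual case study or tool flow), the sketch below expresses a two-stage pixel pipeline as dataflow actors and applies one classic dataflow transformation, actor fusion, which on an FPGA would eliminate the FIFO buffer between the stages. The actors and constants are hypothetical.

```python
# Two dataflow actors over a pixel stream, then the fused equivalent.
def grayscale(stream):
    """Actor 1: RGB pixel stream -> integer luma stream."""
    for r, g, b in stream:
        yield (299 * r + 587 * g + 114 * b) // 1000

def threshold(stream, t=128):
    """Actor 2: luma stream -> binary pixel stream."""
    for px in stream:
        yield 255 if px > t else 0

def grayscale_threshold(stream, t=128):
    """Fused actor: same behaviour, no intermediate stream."""
    for r, g, b in stream:
        yield 255 if (299 * r + 587 * g + 114 * b) // 1000 > t else 0

pixels = [(200, 180, 90), (10, 20, 30)]
assert list(threshold(grayscale(pixels))) == list(grayscale_threshold(pixels))
```

    The point of the dataflow view is that such rewrites are mechanical graph transformations, which is difficult to see or verify once the design has already been lowered to HDL.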

    Parameterized Implementation of K-means Clustering on Reconfigurable Systems

    The processing power of pattern classification algorithms on conventional platforms has not been able to keep up with exponentially growing datasets. However, algorithms such as k-means clustering contain significant potential parallelism that could be exploited to enhance processing speed. An effective way to speed up the algorithm is a hardware assist, since parallel kernels can be partitioned and run concurrently in hardware, as opposed to the sequential software flow. A parameterized hardware implementation of k-means clustering is presented as a proof of concept on the Pilchard reconfigurable computing system. The hardware implementation is shown to achieve speedups of about 500x over conventional implementations on a general-purpose processor. A scalability analysis provides a direction for extending the current three-class implementation to N classes.
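
    The data-parallel kernel in question is the point-to-centre distance computation, which dominates each iteration and is what a hardware implementation can partition across concurrent units. Below is a minimal NumPy sketch with an illustrative dataset (k = 3, 2-D points); it is a reference for the algorithm's structure, not the Pilchard design itself.

```python
# Minimal k-means sketch; the vectorized distance computation is the
# kernel that a hardware version runs in parallel per (point, centre).
import numpy as np

rng = np.random.default_rng(1)

def kmeans(points, k=3, iters=20):
    centers = points[rng.choice(len(points), k, replace=False)].copy()
    for _ in range(iters):
        # Parallel kernel: distance of every point to every centre.
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)          # assign to nearest centre
        for j in range(k):                 # recompute centres
            if (labels == j).any():
                centers[j] = points[labels == j].mean(axis=0)
    return centers, labels

# Three illustrative 2-D clusters around (0,0), (3,3) and (0,3).
pts = np.vstack([rng.normal(c, 0.3, (100, 2))
                 for c in ((0, 0), (3, 3), (0, 3))])
centers, labels = kmeans(pts)
```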

    Advanced analytics through FPGA based query processing and deep reinforcement learning

    Today, vast streams of structured and unstructured data are being incorporated into databases, and analytical processes are applied to discover patterns, correlations, trends, and other useful relationships that support a broad range of decision-making processes. The amount of generated data has grown very large over the years, and conventional database processing methods from previous generations are no longer sufficient to deliver satisfactory analytics performance and prediction accuracy. Thus, new methods are needed across a wide array of fields, from computer architectures, storage systems, and network design to statistics and physics. This thesis proposes two methods to address the current challenges and meet the future demands of advanced analytics. First, we present AxleDB, a field programmable gate array (FPGA) based query processing system which constitutes the frontend of an advanced analytics system. AxleDB melds highly efficient accelerators with memory and storage, and provides a unified programmable environment. AxleDB is capable of offloading complex Structured Query Language (SQL) queries from the host CPU. Experiments have shown that, running a set of TPC-H queries, AxleDB can execute full queries between 1.8x and 34.2x faster, and 2.8x to 62.1x more energy efficiently, than MonetDB and PostgreSQL on a single workstation node. Second, we introduce TauRieL, a novel deep reinforcement learning (DRL) based method for combinatorial problems. The idea behind combining DRL and combinatorial problems is to apply the prediction capabilities of deep reinforcement learning and to use the universality of combinatorial optimization problems to explore general-purpose predictive methods. TauRieL utilizes an actor-critic inspired DRL architecture built from ordinary feedforward nets. Furthermore, TauRieL performs online training, which unifies training and state-space exploration. Experiments show that TauRieL generates solutions two orders of magnitude faster while staying within 3% of the accuracy of the state-of-the-art DRL approach on the Traveling Salesman Problem, searching for the shortest tour. We also show that TauRieL can be adapted to the Knapsack combinatorial problem: with very minimal problem-specific modification, TauRieL can outperform a Knapsack-specific greedy heuristic.
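
    As a rough, hypothetical sketch of the training-loop structure described above (online training of a feedforward policy on fresh TSP instances), the following PyTorch code uses REINFORCE with a moving-average baseline as a stand-in for the paper's actor-critic design. The network shape, instance size, and hyperparameters are illustrative assumptions, not values from the thesis.

```python
# Feedforward policy picks the next unvisited city; training is online:
# each step samples a new random instance, so exploration and training
# are unified. Baseline is a moving average rather than a critic net.
import torch
import torch.nn as nn

N = 10  # cities per TSP instance (illustrative)

class PolicyNet(nn.Module):
    """Feedforward net scoring the next city from coords + visited mask."""
    def __init__(self, n):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * n + n, 128), nn.ReLU(),
                                 nn.Linear(128, n))

    def forward(self, coords_flat, visited):
        logits = self.net(torch.cat([coords_flat, visited], dim=-1))
        return logits.masked_fill(visited.bool(), float("-inf"))

def sample_tour(policy, coords):
    """Sample a tour city by city; return tour length and log-probs."""
    visited = torch.zeros(N)
    visited[0] = 1.0                       # start at city 0
    order, logps = [0], []
    for _ in range(N - 1):
        dist = torch.distributions.Categorical(
            logits=policy(coords.flatten(), visited))
        nxt = dist.sample()
        logps.append(dist.log_prob(nxt))
        visited[nxt] = 1.0
        order.append(int(nxt))
    tour = coords[order + [0]]             # close the tour
    return (tour[1:] - tour[:-1]).norm(dim=1).sum(), torch.stack(logps)

policy = PolicyNet(N)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
baseline = None
for step in range(200):                    # online: fresh instance per step
    coords = torch.rand(N, 2)
    length, logps = sample_tour(policy, coords)
    length = length.detach()
    baseline = length if baseline is None else 0.9 * baseline + 0.1 * length
    loss = (length - baseline) * logps.sum()   # REINFORCE with baseline
    opt.zero_grad()
    loss.backward()
    opt.step()
```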

    Efficient Decoder for Optical Transport Networks Achieving Near Capacity Performance

    Today's optical transport networks (OTNs) support a plethora of services such as video streaming, cloud computing, social networking, and many more. To make such a wide assortment of services possible, a tremendous amount of data must be carried over the internet backbone supported by these optical transport networks. To cope with this increase in traffic, data rates on OTNs have increased significantly. Product codes (PCs) are a class of codes that provide good coding gain at reasonable decoding complexity and, hence, have been a popular choice for OTNs in recent times. The key goal of this thesis is to implement a decoder for a product code on a Virtex-7 field programmable gate array (FPGA). The product code of choice for this project is based on a (1023, 993) BCH code as the component code. The conventional decoder for BCH codes has a computationally expensive step for finding the roots of the error-locator polynomial. The BCH decoder implemented as part of this project is optimized to speed up the decoding process while also simplifying the hardware complexity of the design. The implementation is parallelized and pipelined to achieve high throughput. This provides a hardware platform for evaluating the performance of product codes at low bit error rates, which is infeasible in software simulation.
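
    The root-finding step referred to above is conventionally done with a Chien search, which evaluates the error-locator polynomial at every field element using one constant multiply per term per step, the property that makes it pipeline- and parallel-friendly in hardware. Below is a hypothetical Python sketch over GF(2^10), the field of a (1023, 993) code; the primitive polynomial and the demo error positions are illustrative, not taken from the thesis.

```python
# Chien search sketch over GF(2^10) built from exp/log tables.
import functools
import operator

M, PRIM = 10, 0b10000001001      # x^10 + x^3 + 1, primitive over GF(2)
N = (1 << M) - 1                 # 1023: field order minus 1 = code length

# exp/log tables; EXP is doubled so products need no modular reduction.
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]

def gf_mul(a, b):
    return 0 if 0 in (a, b) else EXP[LOG[a] + LOG[b]]

def chien_search(sigma):
    """Find error positions as roots of sigma(x) over alpha^0..alpha^(N-1).

    terms[k] carries sigma_k * alpha^(k*i) at step i, so each step needs
    only one constant multiply per term; this incremental update is what
    lets hardware evaluate all 1023 candidates in a pipelined fashion.
    """
    terms = list(sigma)
    positions = []
    for i in range(N):
        if functools.reduce(operator.xor, terms) == 0:  # sigma(alpha^i) == 0
            positions.append((N - i) % N)               # root alpha^-j -> j
        terms = [gf_mul(t, EXP[k]) for k, t in enumerate(terms)]
    return positions

# Self-check: sigma(x) = (1 + alpha^3 x)(1 + alpha^500 x) should recover
# the two (illustrative) error positions 3 and 500.
X1, X2 = EXP[3], EXP[500]
sigma = [1, X1 ^ X2, gf_mul(X1, X2)]
assert sorted(chien_search(sigma)) == [3, 500]
```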