398 research outputs found

    A low-power hardware accelerator for ORB feature extraction in self-driving cars

    Simultaneous Localization And Mapping (SLAM) is a key component for autonomous navigation. SLAM consists of building a map of an unknown environment while keeping track of the exploring agent's location within it. An effective implementation of SLAM presents important challenges due to inherent real-time constraints and energy consumption. ORB-SLAM is a state-of-the-art Visual SLAM system based on cameras that can be used for self-driving cars. In this paper, we propose a high-performance, energy-efficient and functionally accurate hardware accelerator for ORB-SLAM, focusing on its most time-consuming stage: Oriented FAST and Rotated BRIEF (ORB) feature extraction. We identify the BRIEF descriptor generation as the main bottleneck, as it exhibits highly irregular access patterns to local on-chip memories, causing a high performance penalty due to bank conflicts. We propose a genetic algorithm to generate an optimal memory access pattern offline, which greatly simplifies the hardware while minimizing bank conflicts in the computation of the BRIEF descriptor. Compared with a CPU system, the accelerator achieves an 8x speedup and a 1957x reduction in power dissipation. This work has been supported by the CoCoUnit ERC Advanced Grant of the EU's Horizon 2020 program (grant No 833057), the Spanish State Research Agency (MCIN/AEI) under grant PID2020-113172RB-I00, the ICREA Academia program and the FPU grant FPU18/04413. Peer reviewed. Postprint (author's final draft).
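
To make the bank-conflict bottleneck concrete, here is a minimal Python sketch that counts the cycles needed to serve a group of simultaneous accesses to a banked memory. The interleaved mapping (address modulo number of banks), the bank count, the example addresses and the function name are all assumptions for illustration; the abstract does not describe the paper's actual memory organization or its genetic algorithm.

```python
from collections import Counter

def cycles_for_access_group(addresses, num_banks):
    """Cycles to serve addresses issued together, one access per bank per cycle.

    Assumes low-order interleaving: bank = address % num_banks.
    """
    per_bank = Counter(addr % num_banks for addr in addresses)
    # The most heavily loaded bank serializes its accesses.
    return max(per_bank.values(), default=0)

# Hypothetical access groups for a 4-bank memory.
conflict_free = [0, 1, 2, 3]   # each address maps to a different bank
conflicting = [0, 4, 8, 3]     # 0, 4 and 8 all map to bank 0

print(cycles_for_access_group(conflict_free, 4))  # 1
print(cycles_for_access_group(conflicting, 4))    # 3
```

An offline search, such as the genetic algorithm the abstract mentions, could plausibly use a cost of this kind to rank candidate access orderings.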

    Big data analytics: Computational intelligence techniques and application areas

    Big Data has a significant impact on developing functional smart cities and supporting modern societies. In this paper, we investigate the importance of Big Data in modern life and the economy, and discuss challenges arising from Big Data utilization. Different computational intelligence techniques are considered as tools for Big Data analytics. We also explore the powerful combination of Big Data and Computational Intelligence (CI) and identify a number of areas where novel applications to real-world smart city problems can be developed using these tools and techniques. We present a case study on intelligent transportation in the context of a smart city, and a novel data modelling methodology based on a biologically inspired universal generative modelling approach called Hierarchical Spatial-Temporal State Machine (HSTSM). We further discuss various implications of policy, protection, valuation and commercialization related to Big Data, its applications and deployment.

    Parallel and Distributed Computing

    The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. In particular, the topics addressed are programmable and reconfigurable devices and systems, dependability of GPUs (graphics processing units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peer-to-peer networks, large-scale network simulation, and parallel routines and algorithms. The articles included in this book thus constitute an excellent reference for engineers and researchers with particular interests in any of these topics in parallel and distributed computing.

    SYSTEM-ON-A-CHIP (SOC)-BASED HARDWARE ACCELERATION FOR HUMAN ACTION RECOGNITION WITH CORE COMPONENTS

    Today, the implementation of machine vision algorithms on embedded platforms or in portable systems is growing rapidly due to the demand for machine vision in daily human life. Among the applications of machine vision, human action and activity recognition has become an active research area, and market demand for integrated smart security systems is growing rapidly. Among the available approaches, embedded vision is in the top tier; however, current embedded platforms may not be able to fully exploit the potential performance of machine vision algorithms, especially in terms of low power consumption. Complex algorithms can impose immense computation and communication demands, especially action recognition algorithms, which require various stages of preprocessing, processing and machine learning blocks that need to operate concurrently. The market demands embedded platforms that operate with a power consumption of only a few watts. Attempts have been made to improve the performance of traditional embedded approaches by adding more powerful processors; this solution may solve the computation problem but increases the power consumption. System-on-a-chip field-programmable gate arrays (SoC-FPGAs) have emerged as a major architectural approach for improving power efficiency while increasing computational performance. In a SoC-FPGA, an embedded processor and an FPGA serving as an accelerator are fabricated on the same die to simultaneously improve power consumption and performance. Still, current SoC-FPGA-based vision implementations either shy away from supporting complex and adaptive vision algorithms or operate at very limited resolutions due to the immense communication and computation demands. The aim of this research is to develop a SoC-based hardware acceleration workflow for the realization of advanced vision algorithms. Hardware acceleration can improve performance for highly complex mathematical calculations or repeated functions. The performance of a SoC system can thus be improved by using a hardware acceleration method to accelerate the element that incurs the highest performance overhead. The outcome of this research could be used for the implementation of various vision algorithms, such as face recognition, object detection or object tracking, on embedded platforms. The contributions of SoC-based hardware acceleration for hardware-software codesign platforms include the following: (1) development of frameworks for complex human action recognition in both 2D and 3D; (2) realization of a framework with four main implemented IPs, namely, foreground and background subtraction (foreground probability), human detection, 2D/3D point-of-interest detection and feature extraction, and OS-ELM as a machine learning algorithm for action identification; (3) use of an FPGA-based hardware acceleration method to resolve system bottlenecks and improve system performance; and (4) measurement and analysis of system specifications, such as the acceleration factor, power consumption, and resource utilization. Experimental results show that the proposed SoC-based hardware acceleration approach provides better performance in terms of the acceleration factor, resource utilization and power consumption compared with all recent works. In addition, a comparison of the accuracy of the framework that runs on the proposed embedded platform (SoC-FPGA) with the accuracy of other PC-based frameworks shows that the proposed approach outperforms most other approaches.

    Tecnologias de streaming em contextos de aprendizagem (Streaming technologies in learning contexts)

    Master's in Information Systems. Educational institutions, particularly in higher education, are adapting to the technological culture of their students, which has changed rapidly in recent years and introduced new requirements. This cultural change calls for close observation and may suggest adapting methodologies to better fit the new learning paradigms. The increasing use of streaming technologies, and the potential impact they seem to have on the flexibility of the learning process, can contribute to innovation in pedagogical activities and learning content, whose need for change stems from the students' new technological context. This work proposes the use and validation of synchronized media elements, delivered using streaming technologies, to support classroom and e-learning activities in the new learning contexts. The proposed methodology is intended to guide the development and safe delivery of effective learning content, in a distributed learning environment, to heterogeneous classes with new requirements, while avoiding cultural conflicts. In the case studies carried out, applying this methodology met with great enthusiasm from the students, who expressed interest in seeing the format extended to other courses.

    A Flexible and Scalable Architecture for Real-Time ANT+ Sensor Data Acquisition and NoSQL Storage

    Wireless Personal or Body Area Networks (WPANs or WBANs) are the main mechanisms for developing healthcare systems for an ageing society. Such systems offer monitoring, security, and caring services by measuring physiological body parameters using wearable devices. Wireless sensor networks allow inexpensive, continuous, and real-time updates of the sensor data to the data repositories via the Internet. A great deal of research is under way, focusing on technical, managerial, economic, and social health issues. The technical obstacles generally encountered concern the need for better methodologies, architectures, and context data storage. Sensor communication, data processing and interpretation, data interchange format, data transfer, and context data storage are sensitive phases in the whole process from body parameter acquisition to storage. ANT+ is a proprietary (but open access) low-energy protocol, which supports device interoperability through mutually agreed device profile standards. We have implemented a prototype based on ANT+ enabled sensors for a real-time scenario. This paper presents a system architecture, with its software organization, for real-time message interpretation, event-driven real-time bidirectional communication, and schema-flexible storage. A computer user uses it to acquire the data and transmit them to the context server using a Windows service.

    Design of software radio

    Software Defined Radio (SDR) has become a prevalent technology in wireless systems. In SDR, some or all of the signal-specific handling is implemented in software functions, while other functions such as decimation, interpolation, digital up-conversion and digital down-conversion are done on reprogrammable Digital Signal Processors or Field Programmable Gate Arrays. Twelve laboratory exercises have been designed to lead the student through the process of using the Universal Software Radio Peripheral (USRP) hardware and GNU Radio open source software.

    Design of a High-Speed Architecture for Stabilization of Video Captured Under Non-Uniform Lighting Conditions

    Video captured in shaky conditions may exhibit vibrations. A robust algorithm to stabilize the video by compensating for the vibrations arising from the physical setting of the camera is presented in this dissertation. A very high performance hardware architecture on Field Programmable Gate Array (FPGA) technology is also developed for the implementation of the stabilization system. Stabilization of video sequences captured under non-uniform lighting conditions begins with a nonlinear enhancement process. This improves the visibility of the scene captured from physical sensing devices which have limited dynamic range. This physical limitation causes the saturated region of the image to shadow out the rest of the scene. It is therefore desirable to bring back a more uniform scene which eliminates the shadows to a certain extent. Stabilization of video requires the estimation of global motion parameters. By obtaining reliable background motion, the video can be spatially transformed to the reference sequence, thereby eliminating the unintended motion of the camera. A reflectance-illuminance model for video enhancement is used in this research work to improve the visibility and quality of the scene. With fast color space conversion, the computational complexity is reduced to a minimum. The basic video stabilization model is formulated and configured for hardware implementation. Such a model involves evaluation of reliable features for tracking, motion estimation, and affine transformation to map the display coordinates of a stabilized sequence. The multiplications, divisions and exponentiations are replaced by simple arithmetic and logic operations using improved log-domain computations in the hardware modules. On Xilinx's Virtex II 2V8000-5 FPGA platform, the prototype system consumes 59% of logic slices, 30% of flip-flops, 34% of lookup tables, 35% of embedded RAMs and two ZBT frame buffers. The system is capable of rendering 180.9 million pixels per second (mpps) and consumes approximately 30.6 watts of power at 1.5 volts. With a 1024×1024 frame, the throughput is equivalent to 172 frames per second (fps). Future work will optimize the performance-resource trade-off to meet the specific needs of the applications. It will further extend the model to the extraction and tracking of moving objects, as the model inherently encapsulates the attributes of spatial distortion and motion prediction to reduce complexity. With these parameters to narrow down the processing range, it is possible to achieve a minimum of 20 fps on desktop computers with Intel Core 2 Duo or Quad Core CPUs and 2GB DDR2 memory without dedicated hardware.
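
The frame-rate figure in this abstract follows from the quoted pixel rate by simple division. A minimal check in Python, assuming every pixel of a 1024×1024 frame counts once toward the 180.9 mpps rate (the function name is illustrative and not taken from the dissertation):

```python
def frames_per_second(pixels_per_second: float, width: int, height: int) -> float:
    """Convert a pixel throughput into a frame rate for a given frame size."""
    return pixels_per_second / (width * height)

# Figures quoted in the abstract: 180.9 million pixels/s, 1024x1024 frames.
fps = frames_per_second(180.9e6, 1024, 1024)
print(f"{fps:.1f} fps")  # 172.5 fps
```

The result rounds down to the 172 fps stated in the abstract.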