Search CORE

9,988 research outputs found

PI-BA Bundle Adjustment Acceleration on Embedded FPGAs with Co-observation Optimization

Author: Liu Qiang
Liu Shaoshan
Qin Shuzhen
Yu Bo
Publication venue
Publication date: 07/05/2019
Field of study

Bundle adjustment (BA) is a fundamental optimization technique used in many crucial applications, including 3D scene reconstruction, robotic localization, camera calibration, autonomous driving, space exploration, street view map generation etc. Essentially, BA is a joint non-linear optimization problem, and one which can consume a significant amount of time and power, especially for large optimization problems. Previous approaches of optimizing BA performance heavily rely on parallel processing or distributed computing, which trade higher power consumption for higher performance. In this paper we propose {\pi}-BA, the first hardware-software co-designed BA engine on an embedded FPGA-SoC that exploits custom hardware for higher performance and power efficiency. Specifically, based on our key observation that not all points appear on all images in a BA problem, we designed and implemented a Co-Observation Optimization technique to accelerate BA operations with optimized usage of memory and computation resources. Experimental results confirm that {\pi}-BA outperforms the existing software implementations in terms of performance and power consumption.Comment: in Proceedings of IEEE FCCM 201

arXiv.org e-Print Archive

Crossref

Omphale: Streamlining the Communication for Jobs in a Multi Processor System on Chip

Author: Bekooij M.J.G.
Bijlsma T.
Jansen P.G.
Smit G.J.M.
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2007
Field of study

Our Multi Processor System on Chip (MPSoC) template provides processing tiles that are connected via a network on chip. A processing tile contains a processing unit and a Scratch Pad Memory (SPM). This paper presents the Omphale tool that performs the first step in mapping a job, represented by a task graph, to such an MPSoC, given the SPM sizes as constraints. Furthermore a memory tile is introduced. The result of Omphale is a Cyclo Static DataFlow (CSDF) model and a task graph where tasks communicate via sliding windows that are located in circular buffers. The CSDF model is used to determine the size of the buffers and the communication pattern of the data. A buffer must fit in the SPM of the processing unit that is reading from it, such that low latency access is realized with a minimized number of stall cycles. If a task and its buffer exceed the size of the SPM, the task is examined for additional parallelism or the circular buffer is partly located in a memory tile. This results in an extended task graph that satisfies the SPM size constraints

CiteSeerX

University of Twente Research Information

Limits on Fundamental Limits to Computation

Author: Markov Igor L.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014
Field of study

An indispensable part of our lives, computing has also become essential to industries and governments. Steady improvements in computer hardware have been supported by periodic doubling of transistor densities in integrated circuits over the last fifty years. Such Moore scaling now requires increasingly heroic efforts, stimulating research in alternative hardware and stirring controversy. To help evaluate emerging technologies and enrich our understanding of integrated-circuit scaling, we review fundamental limits to computation: in manufacturing, energy, physical space, design and verification effort, and algorithms. To outline what is achievable in principle and in practice, we recall how some limits were circumvented, compare loose and tight limits. We also point out that engineering difficulties encountered by emerging technologies may indicate yet-unknown limits.Comment: 15 pages, 4 figures, 1 tabl

arXiv.org e-Print Archive

CiteSeerX

Compilation and Scheduling Techniques for Embedded Systems

Author: Salamy Hassan
Publication venue: LSU Digital Commons
Publication date: 01/01/2009
Field of study

Embedded applications are constantly increasing in size, which has resulted in increasing demand on designers of digital signal processors (DSPs) to meet the tight memory, size and cost constraints. With this trend, memory requirement reduction through code compaction and variable coalescing techniques are gaining more ground. Also, as the current trend in complex embedded systems of using multiprocessor system-on-chip (MPSoC) grows, problems like mapping, memory management and scheduling are gaining more attention. The first part of the dissertation deals with problems related to digital signal processors. Most modern DSPs provide multiple address registers and a dedicated address generation unit (AGU) which performs address generation in parallel to instruction execution. A careful placement of variables in memory is important in decreasing the number of address arithmetic instructions leading to compact and efficient code. Chapters 2 and 3 present effective heuristics for the simple and the general offset assignment problems with variable coalescing. A solution based on simulated annealing is also presented. Chapter 4 presents an optimal integer linear programming (ILP) solution to the offset assignment problem with variable coalescing and operand permutation. A new approach to the general offset assignment problem is introduced. Chapter 5 presents an optimal ILP formulation and a genetic algorithm solution to the address register allocation problem (ARA) with code transformation techniques. The ARA problem is used to generate compact codes for array-intensive embedded applications. In the second part of the dissertation, we study problems related to MPSoCs. MPSoCs provide the flexibility to meet the performance requirements of multimedia applications while respecting the tight embedded system constraints. MPSoC-based embedded systems often employ software-managed memories called scratch-pad memories (SPM). Scheduling the tasks of an application on the processors and partitioning the available SPM budget among those processors are two critical issues in reducing the overall computation time. Traditionally, the step of task scheduling is applied separately from the memory partitioning step. Such a decoupled approach may miss better quality schedules. Chapters 6 and 7 present effective heuristics that integrate task allocation and SPM partitioning to further reduce the execution time of embedded applications for single and multi-application scenarios

Louisiana State University

A knowledge-based approach to VLSI-design in an open CAD-environment

Author: Alberts L.K.
Huijs C.
Mars N.J.I.
Spaanenburg L.
Publication venue: North-Holland
Publication date: 01/01/1989
Field of study

A knowledge-based approach is suggested to assist a designer in the increasingly complex task of generating VLSI-chips from abstract, high-level specifications of the system. The complexity of designing VLSI-circuits has reached a level where computer-based assistance has become indispensable. Not all of the design tasks allow for algorithmic solutions. AI technique can be used, in order to support the designer with computer-aided tools for tasks not suited for algorithmic approaches. The approach described in this paper is based upon the underlying characteristics of VLSI design processes in general, comprising all stages of the design. A universal model is presented, accompanied with a recording method for the acquisition of design knowledge - strategic and task-specific - in terms of the design actions involved and their effects on the design itself. This method is illustrated by a simple design example: the implementation of the logical EXOR-component. Finally suggestions are made for obtaining a universally usable architecture of a knowledge-based system for VLSI-design

University of Twente Research Information