302 research outputs found

    Enhancing Performance of Computer Vision Applications on Low-Power Embedded Systems Through Heterogeneous Parallel Programming

    Enabling computer vision applications on low-power embedded systems gives rise to new challenges for embedded SW developers. Such applications implement different functionalities, like image recognition based on deep learning and simultaneous localization and mapping tasks. They are characterized by stringent performance constraints to guarantee real-time behavior and, at the same time, energy constraints to save battery on the mobile platform. Even though heterogeneous embedded boards are becoming pervasive for their high computational power at low power costs, they need a time-consuming customization of the whole application (i.e., mapping of application blocks to CPU/GPU processing elements and their synchronization) to efficiently exploit their potential. Different languages and environments have been proposed for such embedded SW customization. Nevertheless, they often show limitations on complex real cases, as their application is mutually exclusive. This paper presents a comprehensive framework that relies on a heterogeneous parallel programming model, which combines OpenMP, PThreads, OpenVX, OpenCV, and CUDA to best exploit different levels of parallelism while guaranteeing a semi-automatic customization. The paper shows how such languages and API platforms have been interfaced, synchronized, and applied to customize an ORBSLAM application for an NVIDIA Jetson TX2 board.

    Exploiting Adaptive Techniques to Improve Processor Energy Efficiency

    Rapid device miniaturization keeps inducing challenges in building energy-efficient microprocessors. As transistor sizes continue to decrease, more uncertainties emerge in their operation. On the other hand, integrating more and more transistors on a single chip accentuates the need to lower its supply voltage. This dissertation investigates one of the primary device uncertainties, timing errors, as a microprocessor performance bottleneck in the near-threshold computing (NTC) era. It then proposes various innovative techniques to exploit these opportunities to maintain processor energy efficiency in the context of emerging challenges. Evaluated with a cross-layer methodology, the proposed approaches achieve substantial improvements in processor energy efficiency compared to other state-of-the-art techniques.

    Soft Sensor-based Servo Press Monitoring

    The force that a servo press exerts while forming a workpiece is one of the most important magnitudes in any metal forming operation. The process force, along with the characteristics of the die, is what shapes the workpiece. When the process force is greater than the maximum force for which the servo press was designed, the integrity of the press can be compromised. Therefore, knowledge of the process force is of great interest to both press manufacturers and users. As such, the metal forming sector is seeking systems that can monitor the process force and the operation of the servo press to analyse process performance and predict future deviations in the forming operation. Servo press users want to guarantee the quality of the formed parts and reduce facility downtime due to press malfunctions. This dissertation addresses the monitoring of the process force and the dynamic performance of a servo press using a model-based statistical signal processing algorithm known as the dual particle filter (dPF). Initially, both the developed servo press model and the proposed dPF were experimentally evaluated and validated on a reduced-scale test bench. The test bench was designed and manufactured following a design methodology that allows the kinematic and dynamic behaviour of different servo press facilities to be replicated on the same bench. The experimental validation was also carried out on an industrial servo press under three different metal forming processes. The estimation results demonstrate the ability of the dPF to track the process force throughout the evaluated processes, with a deviation lower than 5% with respect to the measured force signals at the maximum force position. 
The dPF algorithm has been accelerated by means of a field-programmable gate array (FPGA) to achieve real-time estimation.

    Histogram of oriented gradients front end processing: an FPGA based processor approach

    The Field Programmable Gate Array (FPGA) implementation of the commonly used Histogram of Oriented Gradients (HOG) algorithm is explored. The HOG algorithm is employed to extract features for object detection. A key focus has been to explore the use of a new FPGA-based processor, IPPro, which is targeted at image processing. The paper gives details of the mapping and scheduling factors that influence performance and of the stages undertaken to deploy the algorithm on FPGA hardware while taking into account the specific IPPro architecture features. We show that multi-core IPPro performance can exceed that of state-of-the-art FPGA designs by up to 3.2 times, with reduced design and implementation effort and increased flexibility, all on a low-cost Zynq programmable system.
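As a rough illustration of the kind of front-end computation HOG involves, the following minimal sketch builds an orientation histogram for a single grayscale cell. It is not the IPPro mapping described in the abstract: the function name `cell_histogram`, the 9-bin layout, the central-difference gradients, and the hard (non-interpolated) bin voting are all assumptions chosen for brevity.

```python
import math


def cell_histogram(cell, bins=9):
    """Unsigned-orientation gradient histogram for one grayscale cell.

    `cell` is a list of rows of pixel intensities. Border pixels are
    skipped so that central differences stay inside the cell.
    Assumption: hard binning, i.e. each pixel votes into exactly one bin.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal gradient
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical gradient
            magnitude = math.hypot(gx, gy)
            # Fold the direction into [0, 180) degrees (unsigned orientation).
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            index = int(angle // bin_width) % bins
            hist[index] += magnitude  # vote weighted by gradient magnitude
    return hist


# A vertical edge: all gradient energy lands in the 0-degree bin.
vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
print(cell_histogram(vertical_edge))
```

A full descriptor would additionally group cells into blocks and normalise them; those stages are omitted here.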

    Vector processor virtualization: distributed memory hierarchy and simultaneous multithreading

    Taking advantage of DLP (Data-Level Parallelism) is indispensable in most data streaming and multimedia applications. Several architectures have been proposed to improve both the performance and energy consumption for such applications. Superscalar and VLIW (Very Long Instruction Word) processors, along with SIMD (Single-Instruction Multiple-Data) and vector processor (VP) accelerators, are among the available options for designers to accomplish their desired requirements. On the other hand, these choices turn out to be large resource and energy consumers, while also not being always used efficiently due to data dependencies among instructions and limited portion of vectorizable code in single applications that deploy them. This dissertation proposes an innovative architecture for a multithreaded VP which separates the path for performing data shuffle and memory-indexed accesses from the data path for executing other vector instructions that access the memory. This separation speeds up the most common memory access operations by avoiding extra delays and unnecessary stalls. In this multilane-based VP design, each vector lane uses its own private memory to avoid any stalls during memory access instructions. More importantly, the proposed VP has an innovative multithreaded architecture which makes it highly suitable for concurrent sharing in multicore environments. To this end, the VP which is developed in VHDL and prototyped on an FPGA (Field-Programmable Gate Array), serves as a coprocessor for one or more scalar cores in various system architectures presented in the dissertation. In the first system architecture, the VP is allocated exclusively to a single scalar core. Benchmarking shows that the VP can achieve very high performance. The inclusion of distributed data shuffle engines across vector lanes has a spectacular impact on the execution time, primarily for applications like FFT (Fast-Fourier Transform) that require large amounts of data shuffling. 
In the second system architecture, a VP virtualization technique is presented which, when applied, enables the multithreaded VP to simultaneously execute many threads of various vector lengths. The threads compete simultaneously for the VP resources having as a goal an improved aggregate VP utilization. This approach yields high VP utilization even under low utilization for the individual threads. A vector register file (VRF) virtualization technique dynamically allocates physical vector registers to running threads. The technique is implemented for a multi-core processor embedded in an FPGA. Under the dynamic creation of threads, benchmarking demonstrates large VP speedups and drastic energy savings when compared to the first system architecture. In the last system architecture, further improvements focus on VP virtualization relying exclusively on hardware. Moreover, a pipelined data shuffle network replaces the non-pipelined shuffle engines. The VP can then take advantage of identical instruction flows that may be present in different vector applications by running in a fused instruction mode that increases its utilization. A power dissipation model is introduced as well as two optimization policies towards minimizing the consumed energy, or the product of the energy and runtime for a given application. Benchmarking shows the positive impact of these optimizations
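The two optimization targets named above, consumed energy and the energy-runtime product, can favour different operating points. The sketch below shows only that selection step under stated assumptions: the `Config` class, the `pick` function, and the power and runtime figures are hypothetical and are not taken from the dissertation's power dissipation model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """A candidate operating point (hypothetical values, for illustration)."""
    name: str
    power_w: float    # average power in watts
    runtime_s: float  # runtime in seconds

    @property
    def energy_j(self) -> float:
        return self.power_w * self.runtime_s

    @property
    def edp(self) -> float:
        # Energy-delay product: energy multiplied by runtime.
        return self.energy_j * self.runtime_s


def pick(configs, policy):
    """Return the configuration minimising the chosen metric."""
    metrics = {
        "energy": lambda c: c.energy_j,
        "edp": lambda c: c.edp,
    }
    return min(configs, key=metrics[policy])


candidates = [
    Config("few_lanes", power_w=2.0, runtime_s=10.0),  # 20 J, EDP 200
    Config("many_lanes", power_w=6.0, runtime_s=4.0),  # 24 J, EDP 96
]
print(pick(candidates, "energy").name)  # few_lanes
print(pick(candidates, "edp").name)     # many_lanes
```

With these made-up numbers, the energy policy prefers the slower, lower-power point, while the energy-runtime policy prefers the faster one.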

    An Overview of SBIR Phase 2 Communications Technology and Development

    Technological innovation is the overall focus of NASA's Small Business Innovation Research (SBIR) program. The program invests in the development of innovative concepts and technologies to help NASA's mission directorates address critical research and development needs for agency projects. This report highlights innovative SBIR Phase II projects from 2007-2012 that specifically address areas in Communications Technology and Development, which is one of six core competencies at NASA Glenn Research Center. Eighteen technologies are featured, with emphasis on a wide spectrum of applications such as security-enhanced autonomous network management, secure communications using on-demand single photons, cognitive software-defined radio, spacesuit audio systems, multiband photonic phased-array antennas, and much more. Each article in this booklet describes an innovation and its technical objective, and highlights NASA commercial and industrial applications. This report serves as an opportunity for NASA personnel, including engineers, researchers, and program managers, to learn of NASA SBIR's capabilities that might cut across into this technology area. As a result, it may foster collaborations and partnerships between the small companies and NASA Programs and Projects, benefiting both SBIR companies and NASA.