Search CORE

220 research outputs found

Ada (trademark) projects at NASA. Runtime environment issues and recommendations

Author: Roy Daniel M.
Wilke Randall W.
Publication venue
Publication date
Field of study

Ada practitioners should use this document to discuss and establish common short term requirements for Ada runtime environments. The major current Ada runtime environment issues are identified through the analysis of some of the Ada efforts at NASA and other research centers. The runtime environment characteristics of major compilers are compared while alternate runtime implementations are reviewed. Modifications and extensions to the Ada Language Reference Manual to address some of these runtime issues are proposed. Three classes of projects focusing on the most critical runtime features of Ada are recommended, including a range of immediately feasible full scale Ada development projects. Also, a list of runtime features and procurement issues is proposed for consideration by the vendors, contractors and the government

NASA Technical Reports Server

MEANTIME: Achieving Both Minimal Energy and Timeliness with Approximate Computing

Author: Anne Farrell
Henry Hoffmann
Publication venue
Publication date: 01/01/2016
Field of study

Energy efficiency and timeliness (i.e., predictable job latency) are two essential -yet opposing -concerns for embedded systems. Hard timing guarantees require conservative resource allocation while energy minimization requires aggressively releasing resources and occasionally violating timing constraints. Recent work on approximate computing, however, opens up a new dimension of optimization: application accuracy. In this paper, we use approximate computing to achieve both hard timing guarantees and energy efficiency. Specifically, we propose MEANTIME: a runtime system that delivers hard latency guarantees and energy-minimal resource usage through small accuracy reductions. We test MEAN-TIME on a real Linux/ARM system with six applications. Overall, we find that MEANTIME never violates real-time deadlines and sacrifices a small amount (typically less than 2%) of accuracy while reducing energy to 54% of a conservative, full accuracy approach

CiteSeerX

Federated Sensor Network architectural design for the Internet of Things (IoT)

Author: Ran Xu (316855)
Publication venue
Publication date: 01/01/2013
Field of study

An information technology that can combine the physical world and virtual world is desired. The Internet of Things (IoT) is a concept system that uses Radio Frequency Identification (RFID), WSN and barcode scanners to sense and to detect physical objects and events. This information is shared with people on the Internet. With the announcement of the Smarter Planet concept by IBM, the problem of how to share this data was raised. However, the original design of WSN aims to provide environment monitoring and control within a small scale local network. It cannot meet the demands of the IoT because there is a lack of multi-connection functionality with other WSNs and upper level applications. As various standards of WSNs provide information for different purposes, a hybrid system that gives a complete answer by combining all of them could be promising for future IoT applications. This thesis is on the subject of `Federated Sensor Network' design and architectural development for the Internet of Things. A Federated Sensor Network (FSN) is a system that integrates WSNs and the Internet. Currently, methods of integrating WSNs and the Internet can follow one of three main directions: a Front-End Proxy solution, a Gateway solution or a TCP/IP Overlay solution. Architectures based on the ideas from all three directions are presented in this thesis; this forms a comprehensive body of research on possible Federated Sensor Network architecture designs. In addition, a fully compatible technology for the sensor network application, namely the Sensor Model Language (SensorML), has been reviewed and embedded into our FSN systems. The IoT as a new concept is also comprehensively described and the major technical issues discussed. Finally, a case study of the IoT in logistic management for emergency response is given. Proposed FSN architectures based on the Gateway solution are demonstrated through hardware implementation and lab tests. A demonstration of the 6LoWPAN enabled federated sensor network based on the TCP/IP Overlay solution presents a good result for the iNET localization and tracking project. All the tests of the designs have verified feasibility and achieve the target of the IoT concept

Loughborough University Institutional Repository

Real-Time Scheduling for GPUs with Applications in Advanced Automotive Systems

Author: Elliott Glenn
Publication venue: University of North Carolina at Chapel Hill Graduate School
Publication date: 01/01/2015
Field of study

Self-driving cars, once constrained to closed test tracks, are beginning to drive alongside human drivers on public roads. Loss of life or property may result if the computing systems of automated vehicles fail to respond to events at the right moment. We call such systems that must satisfy precise timing constraints “real-time systems.” Since the 1960s, researchers have developed algorithms and analytical techniques used in the development of real-time systems; however, this body of knowledge primarily applies to traditional CPU-based platforms. Unfortunately, traditional platforms cannot meet the computational requirements of self-driving cars without exceeding the power and cost constraints of commercially viable vehicles. We argue that modern graphics processing units, or GPUs, represent a feasible alternative, but new algorithms and analytical techniques must be developed in order to integrate these uniquely constrained processors into a real-time system. The goal of the research presented in this dissertation is to discover and remedy the issues that prevent the use of GPUs in real-time systems. To overcome these issues, we design and implement a real-time multi-GPU scheduler, called GPUSync. GPUSync tightly controls access to a GPU’s computational and DMA processors, enabling simultaneous use despite potential limitations in GPU hardware. GPUSync enables tasks to migrate among GPUs, allowing new classes of real-time multi-GPU computing platforms. GPUSync employs heuristics to guide scheduling decisions to improve system efficiency without risking violations in real-time constraints. GPUSync may be paired with a wide variety of common real-time CPU schedulers. GPUSync supports closed-source GPU runtimes and drivers without loss in functionality. We evaluate GPUSync with both analytical and runtime experiments. In our analytical experiments, we model and evaluate over fifty configurations of GPUSync. We determine which configurations support the greatest computational capacity while maintaining real-time constraints. In our runtime experiments, we execute computer vision programs similar to those found in automated vehicles, with and without GPUSync. Our results demonstrate that GPUSync greatly reduces jitter in video processing. Research into real-time systems with GPUs is a new area of study. Although there is prior work on such systems, no other GPU scheduling framework is as comprehensive and flexible as GPUSync.Doctor of Philosoph

CiteSeerX

Carolina Digital Repository

Datasiirron optimointi FPGA mikropiiriin pohjautuvassa sulautetussa Linux-järjestelmässä

Author: Lipponen Jan
Publication venue
Publication date: 06/06/2018
Field of study

The main goal of this thesis was to optimize the efficiency of data transfer in an FPGA based embedded Linux system. The target system is a part of a radio transceiver application receiving high data rates to an FPGA chip, from where the data is made accessible to a user program using a DMA operation utilizing Linux kernel module. The initial solution, however, used excessive amounts of CPU time to make the kernel module buffered data accessible by the user program. Further optimization of the data transfer was required by upcoming phases of the project. Two data transfer optimization methods were considered. The first solution would use an architecture enabling the FPGA originating data to be accessed directly from the user program via a data buffer shared with the kernel. The second solution utilized a DMAC (DMA controller) hardware component capable of moving the data from the kernel buffer to the user program. The second choice was later rejected due to high platform dependency on such an implementation. A working solution, for the shared buffer optimization method, was found by going through Linux memory management related literature. The implemented solution uses the mmap system call function to remap a kernel module allocated data buffer for user program access. To compare the performance of the implemented solution to the initial one, a data transfer test system was implemented. This system enables pre-defined data to be generated in the FPGA with varying data rates. It was shown in the performed tests that the maximum throughput was increased by ~25% (from ~100 MB/s to ~125 MB/s) using the optimized solution. No exact maximum data rates were discovered because of a test data generation related constraint. The increase in throughput is considered as a significant result for the radio transceiver application. The implemented optimization solution is also expected to be easily portable to any Linux system

Trepo - Institutional Repository of Tampere University

Video Sensor Architecture for Surveillance Applications

Author: Benet Gilabert Ginés
Simó Ten José Enrique
Sánchez Peñarroja Jordi
Publication venue: 'MDPI AG'
Publication date: 01/01/2012
Field of study

This paper introduces a flexible hardware and software architecture for a smart video sensor. This sensor has been applied in a video surveillance application where some of these video sensors are deployed, constituting the sensory nodes of a distributed surveillance system. In this system, a video sensor node processes images locally in order to extract objects of interest, and classify them. The sensor node reports the processing results to other nodes in the cloud (a user or higher level software) in the form of an XML description. The hardware architecture of each sensor node has been developed using two DSP processors and an FPGA that controls, in a flexible way, the interconnection among processors and the image data flow. The developed node software is based on pluggable components and runs on a provided execution run-time. Some basic and application-specific software components have been developed, in particular: acquisition, segmentation, labeling, tracking, classification and feature extraction. Preliminary results demonstrate that the system can achieve up to 7.5 frames per second in the worst case, and the true positive rates in the classification of objects are better than 80%. © 2012 by the authors; licensee MDPI, Basel, Switzerland.This work has been partially supported by SENSE project (Specific Targeted Research Project within the thematic priority IST 2.5.3 of the 6th Framework Program of the European Commission: IST Project 033279), and has been also co-funded by the Spanish research projects SIDIRELI: DPI2008-06737-C02-01/02 and COBAMI: DPI2011-28507-C02-02, both partially supported with European FEDER funds.Sánchez Peñarroja, J.; Benet Gilabert, G.; Simó Ten, JE. (2012). Video Sensor Architecture for Surveillance Applications. Sensors. 12(2):1509-1528. https://doi.org/10.3390/s120201509S15091528122Batlle, J. (2002). A New FPGA/DSP-Based Parallel Architecture for Real-Time Image Processing. Real-Time Imaging, 8(5), 345-356. doi:10.1006/rtim.2001.0273Foresti, G. L., Micheloni, C., Piciarelli, C., & Snidaro, L. (2009). Visual Sensor Technology for Advanced Surveillance Systems: Historical View, Technological Aspects and Research Activities in Italy. Sensors, 9(4), 2252-2270. doi:10.3390/s90402252Bramberger, M., Doblander, A., Maier, A., Rinner, B., & Schwabach, H. (2006). Distributed Embedded Smart Cameras for Surveillance Applications. Computer, 39(2), 68-75. doi:10.1109/mc.2006.55Foresti, G. L., Micheloni, C., Snidaro, L., Remagnino, P., & Ellis, T. (2005). Active video-based surveillance system: the low-level image and video processing techniques needed for implementation. IEEE Signal Processing Magazine, 22(2), 25-37. doi:10.1109/msp.2005.1406473Fuentes, L. M., & Velastin, S. A. (2003). Tracking People for Automatic Surveillance Applications. Lecture Notes in Computer Science, 238-245. doi:10.1007/978-3-540-44871-6_28García, J., Pérez, O., Berlanga, A., & Molina, J. M. (2007). Video tracking system optimization using evolution strategies. International Journal of Imaging Systems and Technology, 17(2), 75-90. doi:10.1002/ima.20100Xu, H., Lv, J., Chen, X., Gong, X., & Yang, C. (2007). Design of video processing and testing system based on DSP and FPGA. 3rd International Symposium on Advanced Optical Manufacturing and Testing Technologies: Optical Test and Measurement Technology and Equipment. doi:10.1117/12.783790Sanfeliu, A., Andrade-Cetto, J., Barbosa, M., Bowden, R., Capitán, J., Corominas, A., … Spaan, M. T. J. (2010). Decentralized Sensor Fusion for Ubiquitous Networking Robotics in Urban Areas. Sensors, 10(3), 2274-2314. doi:10.3390/s100302274http://www.sense-ist.orgXu, H., Lv, J., Chen, X., Gong, X., & Yang, C. (2007). Design of video processing and testing system based on DSP and FPGA. 3rd International Symposium on Advanced Optical Manufacturing and Testing Technologies: Optical Test and Measurement Technology and Equipment. doi:10.1117/12.78379

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

PubMed Central

RiuNet

Towards the development of flexible, reliable, reconfigurable, and high-performance imaging systems

Author: Khalifat Jalal Mohamed
Publication venue: The University of Edinburgh
Publication date: 29/11/2016
Field of study

Current FPGAs can implement large systems because of the high density of reconfigurable logic resources in a single chip. FPGAs are comprehensive devices that combine flexibility and high performance in the same platform compared to other platform such as General-Purpose Processors (GPPs) and Application Specific Integrated Circuits (ASICs). The flexibility of modern FPGAs is further enhanced by introducing Dynamic Partial Reconfiguration (DPR) feature, which allows for changing the functionality of part of the system while other parts are functioning. FPGAs became an important platform for digital image processing applications because of the aforementioned features. They can fulfil the need of efficient and flexible platforms that execute imaging tasks efficiently as well as the reliably with low power, high performance and high flexibility. The use of FPGAs as accelerators for image processing outperforms most of the current solutions. Current FPGA solutions can to load part of the imaging application that needs high computational power on dedicated reconfigurable hardware accelerators while other parts are working on the traditional solution to increase the system performance. Moreover, the use of the DPR feature enhances the flexibility of image processing further by swapping accelerators in and out at run-time. The use of fault mitigation techniques in FPGAs enables imaging applications to operate in harsh environments following the fact that FPGAs are sensitive to radiation and extreme conditions. The aim of this thesis is to present a platform for efficient implementations of imaging tasks. The research uses FPGAs as the key component of this platform and uses the concept of DPR to increase the performance, flexibility, to reduce the power dissipation and to expand the cycle of possible imaging applications. In this context, it proposes the use of FPGAs to accelerate the Image Processing Pipeline (IPP) stages, the core part of most imaging devices. The thesis has a number of novel concepts. The first novel concept is the use of FPGA hardware environment and DPR feature to increase the parallelism and achieve high flexibility. The concept also increases the performance and reduces the power consumption and area utilisation. Based on this concept, the following implementations are presented in this thesis: An implementation of Adams Hamilton Demosaicing algorithm for camera colour interpolation, which exploits the FPGA parallelism to outperform other equivalents. In addition, an implementation of Automatic White Balance (AWB), another IPP stage that employs DPR feature to prove the mentioned novelty aspects. Another novel concept in this thesis is presented in chapter 6, which uses DPR feature to develop a novel flexible imaging system that requires less logic and can be implemented in small FPGAs. The system can be employed as a template for any imaging application with no limitation. Moreover, discussed in this thesis is a novel reliable version of the imaging system that adopts novel techniques including scrubbing, Built-In Self Test (BIST), and Triple Modular Redundancy (TMR) to detect and correct errors using the Internal Configuration Access Port (ICAP) primitive. These techniques exploit the datapath-based nature of the implemented imaging system to improve the system's overall reliability. The thesis presents a proposal for integrating the imaging system with the Robust Reliable Reconfigurable Real-Time Heterogeneous Operating System (R4THOS) to get the best out of the system. The proposal shows the suitability of the proposed DPR imaging system to be used as part of the core system of autonomous cars because of its unbounded flexibility. These novel works are presented in a number of publications as shown in section 1.3 later in this thesis

Edinburgh Research Archive

Temporal analysis and scheduling of hard real-time radios running on a multi-processor

Author: Pires dos reis Moreira O.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2012
Field of study

On a multi-radio baseband system, multiple independent transceivers must share the resources of a multi-processor, while meeting each its own hard real-time requirements. Not all possible combinations of transceivers are known at compile time, so a solution must be found that either allows for independent timing analysis or relies on runtime timing analysis. This thesis proposes a design flow and software architecture that meets these challenges, while enabling features such as independent transceiver compilation and dynamic loading, and taking into account other challenges such as ease of programming, efficiency, and ease of validation. We take data flow as the basic model of computation, as it fits the application domain, and several static variants (such as Single-Rate, Multi-Rate and Cyclo-Static) have been shown to possess strong analytical properties. Traditional temporal analysis of data flow can provide minimum throughput guarantees for a self-timed implementation of data flow. Since transceivers may need to guarantee strictly periodic execution and meet latency requirements, we extend the analysis techniques to show that we can enforce strict periodicity for an actor in the graph; we also provide maximum latency analysis techniques for periodic, sporadic and bursty sources. We propose a scheduling strategy and an automatic scheduling flow that enable the simultaneous execution of multiple transceivers with hard-realtime requirements, described as Single-Rate Data Flow (SRDF) graphs. Each transceiver has its own execution rate and starts and stops independently from other transceivers, at times unknown at compile time, on a multiprocessor. We show how to combine scheduling and mapping decisions with the input application data flow graph to generate a worst-case temporal analysis graph. We propose algorithms to find a mapping per transceiver in the form of clusters of statically-ordered actors, and a budget for either a Time Division Multiplex (TDM) or Non-Preemptive Non-Blocking Round Robin (NPNBRR) scheduler per cluster per transceiver. The budget is computed such that if the platform can provide it, then the desired minimum throughput and maximum latency of the transceiver are guaranteed, while minimizing the required processing resources. We illustrate the use of these techniques to map a combination of WLAN and TDS-CDMA receivers onto a prototype Software-Defined Radio platform. The functionality of transceivers for standards with very dynamic behavior – such as WLAN – cannot be conveniently modeled as an SRDF graph, since SRDF is not capable of expressing variations of actor firing rules depending on the values of input data. Because of this, we propose a restricted, customized data flow model of computation, Mode-Controlled Data Flow (MCDF), that can capture the data-value dependent behavior of a transceiver, while allowing rigorous temporal analysis, and tight resource budgeting. We develop a number of analysis techniques to characterize the temporal behavior of MCDF graphs, in terms of maximum latencies and throughput. We also provide an extension to MCDF of our scheduling strategy for SRDF. The capabilities of MCDF are then illustrated with a WLAN 802.11a receiver model. Having computed budgets for each transceiver, we propose a way to use these budgets for run-time resource mapping and admissibility analysis. During run-time, at transceiver start time, the budget for each cluster of statically-ordered actors is allocated by a resource manager to platform resources. The resource manager enforces strict admission control, to restrict transceivers from interfering with each other’s worst-case temporal behaviors. We propose algorithms adapted from Vector Bin-Packing to enable the mapping at start time of transceivers to the multi-processor architecture, considering also the case where the processors are connected by a network on chip with resource reservation guarantees, in which case we also find routing and resource allocation on the network-on-chip. In our experiments, our resource allocation algorithms can keep 95% of the system resources occupied, while suffering from an allocation failure rate of less than 5%. An implementation of the framework was carried out on a prototype board. We present performance and memory utilization figures for this implementation, as they provide insights into the costs of adopting our approach. It turns out that the scheduling and synchronization overhead for an unoptimized implementation with no hardware support for synchronization of the framework is 16.3% of the cycle budget for a WLAN receiver on an EVP processor at 320 MHz. However, this overhead is less than 1% for mobile standards such as TDS-CDMA or LTE, which have lower rates, and thus larger cycle budgets. Considering that clock speeds will increase and that the synchronization primitives can be optimized to exploit the addressing modes available in the EVP, these results are very promising

Repository TU/e

Pure OAI Repository

Efficient Algorithms for Large-Scale Image Analysis

Author: Wassenberg Jan
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2011
Field of study

This work develops highly efficient algorithms for analyzing large images. Applications include object-based change detection and screening. The algorithms are 10-100 times as fast as existing software, sometimes even outperforming FGPA/GPU hardware, because they are designed to suit the computer architecture. This thesis describes the implementation details and the underlying algorithm engineering methodology, so that both may also be applied to other applications

KITopen

A system for the simulation of hardware to software allocation and performance evaluation

Author: Wielgosz John
Wielgosz John
Publication venue: Department of Computer Science, Imperial College London
Publication date: 01/01/1974
Field of study

Imperial Users onl

Spiral - Imperial College Digital Repository