879 research outputs found

    A Dynamically Reconfigurable Parallel Processing Framework with Application to High-Performance Video Processing

    Get PDF
    Digital video processing demands have and will continue to grow at unprecedented rates. Growth comes from ever increasing volume of data, demand for higher resolution, higher frame rates, and the need for high capacity communications. Moreover, economic realities force continued reductions in size, weight and power requirements. The ever-changing needs and complexities associated with effective video processing systems leads to the consideration of dynamically reconfigurable systems. The goal of this dissertation research was to develop and demonstrate the viability of integrated parallel processing system that effectively and efficiently apply pre-optimized hardware cores for processing video streamed data. Digital video is decomposed into packets which are then distributed over a group of parallel video processing cores. Real time processing requires an effective task scheduler that distributes video packets efficiently to any of the reconfigurable distributed processing nodes across the framework, with the nodes running on FPGA reconfigurable logic in an inherently Virtual\u27 mode. The developed framework, coupled with the use of hardware techniques for dynamic processing optimization achieves an optimal cost/power/performance realization for video processing applications. The system is evaluated by testing processor utilization relative to I/O bandwidth and algorithm latency using a separable 2-D FIR filtering system, and a dynamic pixel processor. For these applications, the system can achieve performance of hundreds of 640x480 video frames per second across an eight lane Gen I PCIe bus. Overall, optimal performance is achieved in the sense that video data is processed at the maximum possible rate that can be streamed through the processing cores. This performance, coupled with inherent ability to dynamically add new algorithms to the described dynamically reconfigurable distributed processing framework, creates new opportunities for realizable and economic hardware virtualization.\u2

    Smartphone based applications for Road Traffic Telematics

    Get PDF
    L'abstract è presente nell'allegato / the abstract is in the attachmen

    Precipitation products from the hydrology SAF

    Get PDF
    Abstract. The EUMETSAT Satellite Application Facility on Support to Operational Hydrology and Water Management (H-SAF) was established by the EUMETSAT Council on 3 July 2005, starting activity on 1 September 2005. The Italian Meteorological Service serves as Leading Entity on behalf of twelve European member countries. H-SAF products include precipitation, soil moisture and snow parameters. Some products are based only on satellite observations, while other products are based on the assimilation of satellite measurements/products into numerical models. In addition to product development and generation, H-SAF includes a product validation program and a hydrological validation program that are coordinated, respectively, by the Italian Department of Civil Protection and by the Polish Institute of Meteorology and Water Management. The National Center of Aeronautical Meteorology and Climatology (CNMCA) of the Italian Air Force is responsible for operational product generation and dissemination. In this paper we describe the H-SAF precipitation algorithms and products, which have been developed by the Italian Institute of Atmospheric Sciences and Climate (in collaboration with the international community) and by CNMCA during the Development Phase (DP, 2005–2010) and the first Continuous Development and Operations Phase (CDOP-1, 2010–2012). The precipitation products are based on passive microwave measurements obtained from radiometers onboard different sun-synchronous low-Earth-orbiting satellites (especially, the SSM/I and SSMIS radiometers onboard DMSP satellites and the AMSU-A + AMSU-B/MHS radiometer suites onboard EPS-MetOp and NOAA-POES satellites), as well as on combined infrared/passive microwave measurements in which the passive microwave precipitation estimates are used in conjunction with SEVIRI images from the geostationary MSG satellite. Moreover, the H-SAF product generation and dissemination chain and independent product validation activities are described. Also, the H-SAF program and its associated activities that currently are being carried out or are planned to be performed within the second CDOP phase (CDOP-2, 2012–2017) are presented in some detail. Insofar as CDOP-2 is concerned, it is emphasized that all algorithms and processing schemes will be improved and enhanced so as to extend them to satellites that will be operational within this decade – particularly the geostationary Meteosat Third Generation satellites and the low-Earth-orbiting Core Observatory of the international Global Precipitation Measurement mission. Finally, the role of H-SAF within the international science and operations community is explained.</p

    Dynamically reconfigurable management of energy, performance, and accuracy applied to digital signal, image, and video Processing Applications

    Get PDF
    There is strong interest in the development of dynamically reconfigurable systems that can meet real-time constraints in energy/power-performance-accuracy (EPA/PPA). In this dissertation, I introduce a framework for implementing dynamically reconfigurable digital signal, image, and video processing systems. The basic idea is to first generate a collection of Pareto-optimal realizations in the EPA/PPA space. Dynamic EPA/PPA management is then achieved by selecting the Pareto-optimal implementations that can meet the real-time constraints. The systems are then demonstrated using Dynamic Partial Reconfiguration (DPR) and dynamic frequency control on FPGAs. The framework is demonstrated on: i) a dynamic pixel processor, ii) a dynamically reconfigurable 1-D digital filtering architecture, and iii) a dynamically reconfigurable 2-D separable digital filtering system. Efficient implementations of the pixel processor are based on the use of look-up tables and local-multiplexes to minimize FPGA resources. For the pixel-processor, different realizations are generated based on the number of input bits, the number of cores, the number of output bits, and the frequency of operation. For each parameters combination, there is a different pixel-processor realization. Pareto-optimal realizations are selected based on measurements of energy per frame, PSNR accuracy, and performance in terms of frames per second. Dynamic EPA/PPA management is demonstrated for a sequential list of real-time constraints by selecting optimal realizations and implementing using DPR and dynamic frequency control. Efficient FPGA implementations for the 1-D and 2-D FIR filters are based on the use a distributed arithmetic technique. Different realizations are generated by varying the number of coefficients, coefficient bitwidth, and output bitwidth. Pareto-optimal realizations are selected in the EPA space. Dynamic EPA management is demonstrated on the application of real-time EPA constraints on a digital video. The results suggest that the general framework can be applied to a variety of digital signal, image, and video processing systems. It is based on the use of offline-processing that is used to determine the Pareto-optimal realizations. Real-time constraints are met by selecting Pareto-optimal realizations pre-loaded in memory that are then implemented efficiently using DPR and/or dynamic frequency control

    Cross-layer modeling and optimization of next-generation internet networks

    Get PDF
    Scaling traditional telecommunication networks so that they are able to cope with the volume of future traffic demands and the stringent European Commission (EC) regulations on emissions would entail unaffordable investments. For this very reason, the design of an innovative ultra-high bandwidth power-efficient network architecture is nowadays a bold topic within the research community. So far, the independent evolution of network layers has resulted in isolated, and hence, far-from-optimal contributions, which have eventually led to the issues today's networks are facing such as inefficient energy strategy, limited network scalability and flexibility, reduced network manageability and increased overall network and customer services costs. Consequently, there is currently large consensus among network operators and the research community that cross-layer interaction and coordination is fundamental for the proper architectural design of next-generation Internet networks. This thesis actively contributes to the this goal by addressing the modeling, optimization and performance analysis of a set of potential technologies to be deployed in future cross-layer network architectures. By applying a transversal design approach (i.e., joint consideration of several network layers), we aim for achieving the maximization of the integration of the different network layers involved in each specific problem. To this end, Part I provides a comprehensive evaluation of optical transport networks (OTNs) based on layer 2 (L2) sub-wavelength switching (SWS) technologies, also taking into consideration the impact of physical layer impairments (PLIs) (L0 phenomena). Indeed, the recent and relevant advances in optical technologies have dramatically increased the impact that PLIs have on the optical signal quality, particularly in the context of SWS networks. Then, in Part II of the thesis, we present a set of case studies where it is shown that the application of operations research (OR) methodologies in the desing/planning stage of future cross-layer Internet network architectures leads to the successful joint optimization of key network performance indicators (KPIs) such as cost (i.e., CAPEX/OPEX), resources usage and energy consumption. OR can definitely play an important role by allowing network designers/architects to obtain good near-optimal solutions to real-sized problems within practical running times

    Towards the development of a reliable reconfigurable real-time operating system on FPGAs

    Get PDF
    In the last two decades, Field Programmable Gate Arrays (FPGAs) have been rapidly developed from simple “glue-logic” to a powerful platform capable of implementing a System on Chip (SoC). Modern FPGAs achieve not only the high performance compared with General Purpose Processors (GPPs), thanks to hardware parallelism and dedication, but also better programming flexibility, in comparison to Application Specific Integrated Circuits (ASICs). Moreover, the hardware programming flexibility of FPGAs is further harnessed for both performance and manipulability, which makes Dynamic Partial Reconfiguration (DPR) possible. DPR allows a part or parts of a circuit to be reconfigured at run-time, without interrupting the rest of the chip’s operation. As a result, hardware resources can be more efficiently exploited since the chip resources can be reused by swapping in or out hardware tasks to or from the chip in a time-multiplexed fashion. In addition, DPR improves fault tolerance against transient errors and permanent damage, such as Single Event Upsets (SEUs) can be mitigated by reconfiguring the FPGA to avoid error accumulation. Furthermore, power and heat can be reduced by removing finished or idle tasks from the chip. For all these reasons above, DPR has significantly promoted Reconfigurable Computing (RC) and has become a very hot topic. However, since hardware integration is increasing at an exponential rate, and applications are becoming more complex with the growth of user demands, highlevel application design and low-level hardware implementation are increasingly separated and layered. As a consequence, users can obtain little advantage from DPR without the support of system-level middleware. To bridge the gap between the high-level application and the low-level hardware implementation, this thesis presents the important contributions towards a Reliable, Reconfigurable and Real-Time Operating System (R3TOS), which facilitates the user exploitation of DPR from the application level, by managing the complex hardware in the background. In R3TOS, hardware tasks behave just like software tasks, which can be created, scheduled, and mapped to different computing resources on the fly. The novel contributions of this work are: 1) a novel implementation of an efficient task scheduler and allocator; 2) implementation of a novel real-time scheduling algorithm (FAEDF) and two efficacious allocating algorithms (EAC and EVC), which schedule tasks in real-time and circumvent emerging faults while maintaining more compact empty areas. 3) Design and implementation of a faulttolerant microprocessor by harnessing the existing FPGA resources, such as Error Correction Code (ECC) and configuration primitives. 4) A novel symmetric multiprocessing (SMP)-based architectures that supports shared memory programing interface. 5) Two demonstrations of the integrated system, including a) the K-Nearest Neighbour classifier, which is a non-parametric classification algorithm widely used in various fields of data mining; and b) pairwise sequence alignment, namely the Smith Waterman algorithm, used for identifying similarities between two biological sequences. R3TOS gives considerably higher flexibility to support scalable multi-user, multitasking applications, whereby resources can be dynamically managed in respect of user requirements and hardware availability. Benefiting from this, not only the hardware resources can be more efficiently used, but also the system performance can be significantly increased. Results show that the scheduling and allocating efficiencies have been improved up to 2x, and the overall system performance is further improved by ~2.5x. Future work includes the development of Network on Chip (NoC), which is expected to further increase the communication throughput; as well as the standardization and automation of our system design, which will be carried out in line with the enablement of other high-level synthesis tools, to allow application developers to benefit from the system in a more efficient manner

    Build framework and runtime abstraction for partial reconfiguration on FPGA SoCs

    Get PDF
    Growth in edge computing has increased the requirement for edge systems to process larger volumes of real-time data, such as with image processing and machine learning; which are increasingly demanding of computing resources. Offloading tasks to the cloud provides some relief but is network dependant, high latency and expensive. Alternative architectures such as GPUs provide higher performance acceleration for this type of data processing but trade processing performance for an increase in power consumption. Another option is the Field Programmable Gate Array; a flexible matrix of logic that can be configured by a designer to provide a highly optimised computation path for incoming data. There are drawbacks; the FPGA design process is complex, the domain is dissimilar to software and the tools require bespoke expertise. A designer must manage the hardware to software paradigm introduced when tightly-coupled with general purpose processor. Advanced features, such as the ability to partially reconfigure (PR) specific regions of the FPGA, further increase this complexity. This thesis presents theory and demonstration of custom frameworks and tools for increasing abstraction and simplifying control over PR applications. We present mechanisms for networked PR; a mechanism for bypassing the traditional software networking stack to trigger PR with reduced latency and increased determinism. We developed a build framework for automating the end-to-end PR design process for Linux based systems as well as an abstracted runtime for managing the resulting applications. Finally, we take expand on this work and present a high level abstraction for PR on cyber physical systems, with a demonstration using the Robot Operating System. This work is released as open source contributions, designed to enable future PR research

    Conceptual design and realization of a dynamic partial reconfiguration extension of an existing soft-core processor

    Get PDF
    Viele aktuelle Field Programmable Gate Arrays (FPGAs) unterstützen die Technik der partiellen Rekonfiguration (PR), durch die dynamisch zur Laufzeit ein Hardware-Design auch nur teilweise ausgetauscht werden kann. Die vorliegende Arbeit integriert PR-Funktionalität in die an der Technischen Universität Ilmenau für harte Echtzeitaufgaben mit hochpräzisen Fließkommaberechnungen entwickelte VHDL Integrated Softcore Architecture for Reconfigurable Devices (ViSARD). Zu diesem Zweck wird die arithmetisch-logische Einheit angepasst, um das Auswechseln von Fließkomma-Ausführungseinheiten zu ermöglichen. Ziele der Entwicklung des PR-Systems sind hohe Geschwindigkeit, niedrige Latenz, niedrige Ressourcenkosten und harte Echtzeitfähigkeit. Erreicht werden diese durch die Umsetzung einer eigenen Steuereinheit (partial reconfiguration controller), die partielle Bitströme aus externem RAM über einen standardmäßigen AXI-Bus lädt sowie die entsprechende Erweiterung der ViSARD. In einem Testdesign, das zwischen drei verschiedenen Konfigurationen mit je zwischen einer und drei Ausführungseinheiten wechselt, hat das entwickelte PR-System den maximal spezifierten Bitstromdurchsatz auf dem Ziel-FPGA erreicht und den Verbrauch an Lookup-Tabellen um etwa 40 % verringert.Many modern field-programmable gate arrays (FPGAs) support partial reconfiguration, which allows to dynamically replace only a part of a design at run time. In this thesis, partial reconfiguration capability is integrated with the VHDL Integrated Softcore Architecture for Reconfigurable Devices (ViSARD) developed at Technische Universität Ilmenau and conceived for hard real-time tasks requiring floating-point calculations with high precision. Specifically, its arithmetic logic unit is modified to allow exchanging floating-point arithmetic execution units. Design goals of the partial reconfiguration system are high speed, low latency, low resource overhead, and hard real-time capability. They are reached by implementing a custom partial reconfiguration controller loading partial bitstreams from external RAM over a standard AXI bus and extending the ViSARD appropriately. In a test design that switched between 3 different configurations each containing between 1 and 3 execution units, the proposed partial reconfiguration system achieved the maximum specified bitstream throughput on the target FPGA and allowed for roughly 40 % reduced look-up table usage
    corecore