72 research outputs found

    Energy efficient video encoding using the Tegra K1 mobile processor


    Harnessing single board computers for military data analytics

    Executive summary: This chapter covers the use of Single Board Computers (SBCs) to expedite onsite data analytics for a variety of military applications. Onsite data summarization and analytics are increasingly critical for command, control, and intelligence (C2I) operations, as excessive power consumption and communication latency can restrict the efficacy of down-range operations. SBCs offer power-efficient, inexpensive data-processing capabilities while maintaining a small form factor. We discuss the use of SBCs in a variety of domains, including wireless sensor networks, unmanned vehicles, and cluster computing. We conclude with a discussion of existing challenges and opportunities for future use.
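    The chapter's motivating idea, summarizing data onsite so that only compact results need to be uplinked, can be illustrated with a minimal sketch. All names and the window size below are illustrative assumptions; the chapter does not prescribe any particular API.

        # Minimal sketch of onsite data summarization on an SBC: instead of
        # uplinking every raw reading, transmit compact windowed statistics.
        # All names here are illustrative; nothing below is from the chapter.
        from dataclasses import dataclass
        from statistics import mean, pstdev

        @dataclass
        class WindowSummary:
            n: int
            mean: float
            std: float
            minimum: float
            maximum: float

        def summarize(readings: list[float]) -> WindowSummary:
            """Collapse a window of raw readings into a fixed-size summary."""
            return WindowSummary(
                n=len(readings),
                mean=mean(readings),
                std=pstdev(readings),
                minimum=min(readings),
                maximum=max(readings),
            )

        # A 1000-sample window (e.g. one second of 1 kHz data) becomes five
        # numbers, cutting uplink volume by orders of magnitude.
        window = [20.0 + 0.01 * i for i in range(1000)]
        print(summarize(window))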

    Software Defined Multi-Spectral Imaging for Arctic Sensor Networks

    Availability of off-the-shelf infrared sensors combined with high-definition visible cameras has made possible the construction of a Software Defined Multi-Spectral Imager (SDMSI) combining long-wave, near-infrared, and visible imaging. The SDMSI requires a real-time embedded processor to fuse images and to create real-time depth maps for opportunistic uplink in sensor networks. Researchers at Embry-Riddle Aeronautical University, working with the University of Alaska Anchorage at the Arctic Domain Awareness Center and the University of Colorado Boulder, have built several versions of a low-cost, drop-in-place SDMSI to test alternatives for power-efficient image fusion. The SDMSI is intended for field applications including marine security, search and rescue operations, and environmental surveys in the Arctic region. Based on Arctic marine sensor network mission goals, the team has designed the SDMSI to rank images by saliency and to provide on-camera fusion and depth mapping. A major challenge has been designing the camera computing system to operate within a 10 to 20 W power budget. This paper presents a power analysis of three options: 1) multi-core, 2) field-programmable gate array with multi-core, and 3) graphics processing unit with multi-core. For each test, the power consumed by common fusion workloads was measured at a range of frame rates and resolutions. Detailed analyses from our power-efficiency comparison for workloads specific to stereo depth mapping and sensor fusion are summarized. Preliminary mission feasibility results from testing with off-the-shelf long-wave infrared and visible cameras in Alaska and Arizona are also summarized to demonstrate the value of the SDMSI for applications such as ice tracking, ocean color, soil moisture, and animal and marine vessel detection and tracking. The goal is to select the most power-efficient solution for the SDMSI for use on UAVs (Unoccupied Aerial Vehicles) and other drop-in-place installations in the Arctic. The prototype selected will be field-tested in Alaska in the summer of 2016.
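    A natural metric for the comparison described above is energy per fused frame (watts divided by frames per second) rather than raw wattage. The sketch below shows that calculation under invented placeholder figures; the names and numbers are assumptions, not the paper's measurements.

        # Sketch of the power-efficiency comparison: convert measured average
        # power of each candidate configuration into energy per fused frame
        # (J/frame = watts / fps). Figures are made-up placeholders.
        candidates = {
            # name: (average power in watts, sustained frame rate in fps)
            "multi-core":        (10.5, 24.0),
            "FPGA + multi-core": (13.0, 60.0),
            "GPU + multi-core":  (18.0, 48.0),
        }

        for name, (watts, fps) in candidates.items():
            joules_per_frame = watts / fps
            print(f"{name:>18}: {joules_per_frame:.3f} J/frame at {watts:.1f} W")

        # The most power-efficient option is the one with the lowest J/frame
        # that still fits the 10-20 W budget, not simply the lowest wattage.
        best = min(candidates, key=lambda k: candidates[k][0] / candidates[k][1])
        print("lowest energy per frame:", best)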

    Energy reconstruction on the LHC ATLAS TileCal upgraded front end: feasibility study for a sROD co-processing unit

    Dissertation presented in fulfilment of the requirements for the degree of Master of Science in Physics, 2016. The Phase-II upgrade of the Large Hadron Collider at CERN in the early 2020s will enable an order of magnitude increase in the data produced, unlocking the potential for new physics discoveries. In the ATLAS detector, the upgraded Hadronic Tile Calorimeter (TileCal) Phase-II front-end read-out system is currently being prototyped to handle a total data throughput of 5.1 TB/s, up from the current 20.4 GB/s. The FPGA-based Super Read Out Driver (sROD) prototype must perform an energy reconstruction algorithm on 2.88 GB/s of raw data, or 275 million events per second. Due to the very high level of proficiency required and the time-consuming nature of FPGA firmware development, it may be more effective to implement certain complex energy reconstruction and monitoring algorithms on a general-purpose, CPU-based sROD co-processor. Hence, the feasibility of a general-purpose ARM System-on-Chip based co-processing unit (PU) for the sROD is determined in this work. A PCI-Express test platform was designed and constructed to link two ARM Cortex-A9 SoCs via their PCI-Express Gen-2 x1 interfaces. Test results indicate that the latency of the PCI-Express interface is sufficiently low, and its data throughput superior to that of alternative interfaces such as Ethernet, for use as an interconnect between the SoCs and the sROD. CPU performance benchmarks were performed on five ARM development platforms to determine CPU integer, floating-point, and memory system performance as well as energy efficiency. To complement the benchmarks, Fast Fourier Transform and Optimal Filtering (OF) applications were also tested. Based on the test results, for the PU to process 275 million events per second with OF within the 6 µs timing budget of the ATLAS triggering system, a cluster of three Tegra K1 Cortex-A15 SoCs connected to the sROD via a Gen-2 x8 PCI-Express interface would be suitable. A high-level design for the PU is proposed which surpasses the requirements of the sROD co-processor and can also be used in a general-purpose, high-data-throughput system, with 80 Gb/s Ethernet and 15 GB/s PCI-Express throughput, using four X-Gene SoCs.
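    Optimal Filtering reconstructs a pulse amplitude as a fixed weighted sum of the digitized samples, which makes the per-event work a short dot product. The NumPy sketch below illustrates that structure only; the sample count, weights, and data are invented placeholders, not TileCal calibration constants.

        # Illustrative sketch of the Optimal Filtering (OF) step at the core of
        # the energy reconstruction discussed above: the deposited energy
        # (amplitude) is recovered as a weighted sum of digitized pulse samples.
        import numpy as np

        def of_amplitude(samples: np.ndarray, weights: np.ndarray) -> np.ndarray:
            """Amplitude estimate A = sum_i a_i * s_i for each event (row)."""
            return samples @ weights

        rng = np.random.default_rng(0)
        n_events, n_samples = 1_000_000, 7          # 7 ADC samples per pulse
        samples = rng.normal(50.0, 5.0, (n_events, n_samples)).astype(np.float32)
        weights = np.array([-0.1, 0.0, 0.3, 0.6, 0.3, 0.0, -0.1], dtype=np.float32)

        amplitudes = of_amplitude(samples, weights)
        # Because OF is one short dot product per event, throughput is dominated
        # by memory bandwidth -- the property that makes a SIMD-capable ARM SoC
        # cluster a plausible co-processor at 275 million events per second.
        print(amplitudes[:5])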

    Sensor data collection and performance evaluation using a TK1 board

    Monitoring applications are abundant in today's world. Our goal is to monitor an individual and his neighborhood using wearable sensors. The system is smart in the sense that it can process the captured data in near real-time and communicate opportunistically with other such systems as well as smartphones and computers. We develop the hardware platform using existing components to support such functionality. The NVIDIA Jetson TK1 (Tegra Kepler) board is used as the processor, as it is one of the most powerful processors for embedded applications, with the flexibility to connect to a plethora of sensors. Data transfer for communication is facilitated via Bluetooth and Wireless Fidelity (Wi-Fi). Results on the performance of this setup are reported from experiments with different sensors, such as cameras, a microphone, a gas sensor, a temperature/pressure/humidity sensor, and a Garmin smart health watch providing heart rate, distance, speed, altitude, and position (latitude and longitude), using metrics such as read/write speed, heat generated by the Central Processing Unit (CPU) and TK board, and transmission delay.
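    Two of the metrics named above, read/write speed and CPU heat, can be gathered on a Linux board such as the TK1 with standard interfaces. The sketch below is an assumption-laden illustration: the thermal-zone path varies by board and is not specified in the abstract.

        # Rough sketch of gathering two of the reported metrics on a Linux
        # board: sequential write speed and CPU temperature via sysfs.
        import os
        import time

        def write_speed_mb_s(path: str = "/tmp/bench.bin", mb: int = 64) -> float:
            """Time a sequential write of `mb` megabytes and return MB/s."""
            block = b"\x00" * (1024 * 1024)
            start = time.perf_counter()
            with open(path, "wb") as f:
                for _ in range(mb):
                    f.write(block)
                f.flush()
                os.fsync(f.fileno())          # include time to reach storage
            elapsed = time.perf_counter() - start
            os.remove(path)
            return mb / elapsed

        def cpu_temp_c(zone: str = "/sys/class/thermal/thermal_zone0/temp") -> float:
            """Read a Linux thermal zone (reported in millidegrees Celsius).
            The zone path is an assumption and differs between boards."""
            with open(zone) as f:
                return int(f.read().strip()) / 1000.0

        print(f"write: {write_speed_mb_s():.1f} MB/s, CPU: {cpu_temp_c():.1f} C")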

    Evaluation of clusters based on systems on a chip for high-performance computing: a review

    High-performance computing systems are the pinnacle of large-scale data processing. However, their energy consumption, an aspect largely ignored in past decades, has become a matter of great importance. Software developers and hardware providers are therefore obliged to confront new challenges in addressing energy consumption and cost. Constructing a computational cluster from a large number of systems on a chip can yield a powerful, ecological platform capable of offering sufficient performance for a range of applications, provided that low cost and minimal energy consumption can be maintained. Energy-efficient hardware thus has an opportunity to make an impact in the area of high-performance computing. This article presents a systematic review of published evaluations of clusters of Systems on a Chip for high-performance computing in the research setting.

    On the use of heterogenous computing in high-energy particle physics at the ATLAS detector

    A dissertation submitted in fulfillment of the requirements for the degree of Master of Physics in the School of Physics, November 1, 2017. The ATLAS detector at the Large Hadron Collider (LHC) at CERN is undergoing upgrades to its instrumentation, as well as to the hardware and software that comprise its Trigger and Data Acquisition (TDAQ) system. The increased energy will yield larger cross sections for interesting physics processes, but will also lead to increased artifacts in on-line reconstruction in the trigger, as well as increased trigger rates, beyond the current system's capabilities. To meet these demands it is likely that the massive parallelism of General-Purpose computing on Graphics Processing Units (GPGPU) will be utilised. This dissertation addresses the problem of integrating GPGPU into the existing Trigger and TDAQ platforms, detailing and analysing GPGPU performance in the context of a high-throughput, on-line environment like ATLAS. Preliminary tests show low to moderate speed-up with GPU relative to CPU, indicating that to achieve a more significant performance increase it may be necessary to alter the current platform beyond pairing suitable GPUs to CPUs in an optimum ratio. Possible solutions are proposed and recommendations for future work are given.
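    One hedged way to see why "low to moderate speed-up" limits what GPU pairing alone can achieve is Amdahl's law: if only a fraction of the trigger workload is offloadable, the overall gain is bounded regardless of GPU speed. The framing and all numbers below are our illustration, not the dissertation's analysis.

        # Amdahl-style sketch: overall speed-up when a fraction f of the work
        # is offloaded to a GPU that runs it s times faster than the CPU.
        def amdahl_speedup(f: float, s: float) -> float:
            """Overall speed-up for offloadable fraction f accelerated by s."""
            return 1.0 / ((1.0 - f) + f / s)

        def cpus_per_gpu(cpu_rate: float, gpu_rate: float) -> float:
            """How many CPU producers one GPU can service before saturating
            (rates in events/s; an illustrative balance condition only)."""
            return gpu_rate / cpu_rate

        print(amdahl_speedup(f=0.5, s=10.0))   # ~1.82x: modest despite a fast GPU
        print(cpus_per_gpu(cpu_rate=2_000.0, gpu_rate=14_000.0))  # ~7 CPUs per GPU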

    Low-power System-on-Chip Processors for Energy Efficient High Performance Computing: The Texas Instruments Keystone II

    The High Performance Computing (HPC) community recognizes energy consumption as a major problem. Extensive research is underway to identify means to increase the energy efficiency of HPC systems, including consideration of alternative building blocks for future systems. This thesis considers one such system, the Texas Instruments Keystone II, a heterogeneous Low-Power System-on-Chip (LPSoC) processor, first released in 2012, that combines a quad-core ARM CPU with an octa-core Digital Signal Processor (DSP). Four issues are considered: i) maximizing the Keystone II ARM CPU performance; ii) implementation and extension of the OpenMP programming model for the Keystone II; iii) simultaneous use of ARM and DSP cores across multiple Keystone SoCs; and iv) an energy model for applications running on LPSoCs like the Keystone II and heterogeneous systems in general.

    Maximizing the performance of the ARM CPU on the Keystone II system is fundamental to adoption of this system by the HPC community, and of the ARM architecture more broadly. Key to achieving good performance is exploitation of the ARM vector instructions. This thesis presents the first detailed comparison of the use of ARM compiler intrinsic functions with automatic compiler vectorization across four generations of ARM processors. Comparisons are also made with x86-based platforms and the use of equivalent Intel vector instructions.

    Implementation of the OpenMP programming model on the Keystone II system presents both challenges and opportunities: challenges in that the OpenMP model was originally developed for a homogeneous programming environment with a common instruction set architecture, and in 2012 work had only just begun to consider how OpenMP might work with accelerators; opportunities in that shared memory is accessible to all processing elements on the LPSoC, offering performance advantages over what typically exists with attached accelerators. This thesis presents an analysis of a prototype version of OpenMP implemented as a bare-metal runtime on the DSP of a Keystone I system. An implementation for the Keystone II that maps OpenMP 4.0 accelerator directives to OpenCL runtime library operations is presented and evaluated. Exploitation of some of the underlying hardware features of the Keystone II is also discussed.

    Simultaneous use of the ARM and DSP cores across multiple Keystone II boards is fundamental to the creation of commercially viable HPC offerings based on Keystone technology; the nCore BrownDwarf and HPE Moonshot systems represent two such systems. This thesis presents a proof-of-concept implementation of matrix multiplication (GEMM) for the BrownDwarf system, which utilizes both Keystone II and Keystone I SoCs through a point-to-point interconnect called Hyperlink. Details are provided of how a novel message-passing communication framework across Hyperlink was implemented to support this complex environment.

    An energy model that can predict energy usage as a function of what fraction of a particular computation is performed on each of the available compute devices offers the opportunity to make runtime decisions on how best to minimize energy usage. This thesis presents a basic energy usage model that considers the rates of execution on each device and their active and idle power usages. Using this model, it is shown that only under certain conditions does there exist an energy-optimal work partition that uses multiple compute devices. To validate the model, a high-resolution energy measurement environment was developed and used to gather energy measurements for a matrix multiplication benchmark running on a variety of systems; the results presented support the model.

    Drawing on the four issues noted above and other developments that have occurred since the Keystone II system was first announced, the thesis concludes with comments regarding the future of LPSoCs as building blocks for HPC systems.
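    The abstract describes the energy model only in outline (per-device execution rates plus active and idle powers), so the following is a minimal sketch of a model of that kind under invented parameters, not the thesis's actual formulation or Keystone II numbers.

        # Two-device energy model sketch: run a fraction alpha of W work units
        # on device 1 and the rest on device 2; each device draws idle power
        # while waiting for the other to finish. All numbers are placeholders.
        def energy(alpha: float, W: float,
                   r1: float, r2: float,      # throughput of each device (work/s)
                   pa1: float, pi1: float,    # device 1 active / idle power (W)
                   pa2: float, pi2: float) -> float:
            t1 = alpha * W / r1               # device 1 busy time
            t2 = (1.0 - alpha) * W / r2       # device 2 busy time
            T = max(t1, t2)                   # both stay powered until the end
            return (pa1 * t1 + pi1 * (T - t1)) + (pa2 * t2 + pi2 * (T - t2))

        # Scan alpha to see whether splitting the work ever beats one device.
        params = dict(W=1e6, r1=4e4, r2=1e5, pa1=6.0, pi1=2.0, pa2=10.0, pi2=3.0)
        best = min((energy(a / 100, **params), a / 100) for a in range(101))
        print(f"minimum energy {best[0]:.0f} J at alpha = {best[1]:.2f}")

    With these placeholder parameters the minimum falls at an interior alpha, i.e. splitting the work across both devices is energy-optimal; with higher idle powers the optimum collapses to a single device, which mirrors the thesis's observation that a multi-device optimum exists only under certain conditions.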