212 research outputs found

    Exploration of cyber-physical systems for GPGPU computer vision-based detection of biological viruses

    This work presents a method for computer vision-based detection of biological viruses in PAMONO sensor images and, related to this, methods to explore cyber-physical systems such as those consisting of the PAMONO sensor, the detection software, and processing hardware. The focus is especially on an exploration of Graphics Processing Unit (GPU) hardware for “General-Purpose computing on Graphics Processing Units” (GPGPU) software; the targeted systems are high-performance servers, desktop systems, mobile systems, and hand-held systems. The first problem addressed and solved in this work is to automatically detect biological viruses in PAMONO sensor images. PAMONO is short for “Plasmon Assisted Microscopy Of Nano-sized Objects”. The images from the PAMONO sensor are very challenging to process: the signal magnitude and spatial extension of attaching viruses are small, and the signal is not visible to the human eye in raw sensor images. Compared to the signal, the noise magnitude in the images is large, resulting in a low Signal-to-Noise Ratio (SNR). The VirusDetectionCL method for computer vision-based detection of viruses, presented in this work, makes automatic detection and counting of individual viruses in PAMONO sensor images possible. A data set of 4000 images can be evaluated in less than three minutes, whereas a manual evaluation by an expert can take up to two days. As the most important result, sensor signals with a median SNR of two can be handled. This enables the detection of particles down to 100 nm. The VirusDetectionCL method has been realized as GPGPU software. The PAMONO sensor, the detection software, and the processing hardware form a so-called cyber-physical system. For different PAMONO scenarios, e.g., using the PAMONO sensor in laboratories, hospitals, and airports, as well as in mobile scenarios, one or more cyber-physical systems need to be explored. Depending on the particular use case, the demands on the cyber-physical system differ. This leads to the second problem for which a solution is presented in this work: how can existing software with several degrees of freedom be automatically mapped to a selection of hardware architectures with several hardware configurations to fulfill the demands on the system? Answering this question is difficult, especially when several possibly conflicting objectives, e.g., quality of the results, energy consumption, and execution time, have to be optimized. An extensive exploration of different software and hardware configurations is expensive and time-consuming. Sometimes it is not even possible, e.g., if the desired architecture is not yet available on the market or the design space is too big to be explored manually in reasonable time. A Pareto-optimal selection of software parameters, hardware architectures, and hardware configurations has to be found. To achieve this, three parameter and design space exploration methods have been developed, named SOG-PSE, SOG-DSE, and MOGEA-DSE. MOGEA-DSE, the most advanced of the three, enables a multi-objective, energy-aware, measurement-based or simulation-based exploration of cyber-physical systems in a hardware/software codesign manner. In addition, offloading of tasks to a server and approximate computing can be taken into account. With the simulation-based exploration, systems that do not yet exist can be explored. This is useful if a system should be equipped, e.g., with the next generation of GPUs.
Such an exploration can reveal bottlenecks in the existing software before new GPUs are bought. With MOGEA-DSE, the overall goal of developing a method to automatically explore suitable cyber-physical systems for different PAMONO scenarios could be achieved. As a result, rapid, reliable detection and counting of viruses in PAMONO sensor data has been made possible on systems ranging from high-performance servers through desktops and laptops down to hand-held devices. That this could be achieved even for a small, hand-held device is the most important result of MOGEA-DSE. With the automatic parameter and design space exploration, 84% of the energy could be saved on the hand-held device compared to a baseline measurement, while at the same time a speedup of four and an F1 quality score of 0.995 were obtained. The speedup enables live processing of the sensor data on the embedded system with very high detection quality. With this result, viruses can be detected and counted on a mobile, hand-held device in less than three minutes, with real-time visualization of the results. This opens up possibilities for biological virus detection that did not exist before.
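    As a rough illustration of the detection problem, the sketch below shows the background-subtraction idea on which low-SNR particle detection commonly builds: estimate the background and per-pixel noise from early frames, then flag pixels whose change rises a few noise standard deviations above it. This is a minimal stand-in, not the actual VirusDetectionCL pipeline (which runs as GPGPU software); the function name, threshold, and window size are hypothetical.

```python
import numpy as np

def detect_candidates(frames, window=16, k=3.0, min_area=4):
    # frames: array of shape (T, H, W); the first `window` frames are
    # assumed to contain background only.
    frames = frames.astype(np.float32)
    bg = frames[:window].mean(axis=0)           # background estimate
    noise = frames[:window].std(axis=0) + 1e-6  # per-pixel noise level
    detections = []
    for t in range(window, len(frames)):
        snr = (frames[t] - bg) / noise          # per-pixel SNR map
        mask = snr > k                          # pixels well above the noise
        if mask.sum() >= min_area:              # ignore isolated outliers
            detections.append((t, int(mask.sum())))
        bg = 0.99 * bg + 0.01 * frames[t]       # track slow background drift
    return detections
```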

    Resource efficient processing and communication in sensor/actuator environments

    The future of computer systems will be dominated not by personal-computer-like hardware platforms but by embedded and cyber-physical systems assisting humans in a hidden but omnipresent manner. These pervasive computing devices can, for example, be utilized in the home automation sector to create sensor/actuator networks supporting the inhabitants of a house in everyday life. The efficient usage of resources is an important topic at design time and operation time of mobile embedded and cyber-physical systems. Therefore, this thesis presents methods which allow an efficient use of energy and processing resources in sensor/actuator networks. These networks comprise different nodes cooperating for a “smart” joint control function. Sensor/actuator nodes are typical cyber-physical systems comprising sensors/actuators as well as processing and communication components. Processing components of today’s sensor nodes can comprise many-core chips. This thesis introduces new methods for optimizing the code and the application mapping of the aforementioned systems and presents novel results with regard to design space explorations for energy-efficient embedded many-core systems. The considered many-core systems are graphics processing units. The application code for these graphics processing units is optimized for a particular platform variant with the objectives of minimal energy consumption and/or minimal runtime. These two objectives are targeted using multi-objective optimization techniques, and the mapping optimizations are realized by means of multi-objective design space explorations. Furthermore, this thesis introduces new techniques and functions for a resource-efficient middleware design employing service-oriented architectures. To this end, a middleware framework based on a service-oriented architecture is presented which comprises lightweight service orchestration. In addition, a flexible resource management mechanism is introduced. This resource management adapts resource utilization and services to the environmental context and provides methods to reduce the energy consumption of sensor nodes.
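    The core step of such a multi-objective design space exploration is separating Pareto-optimal configurations from dominated ones. Below is a minimal sketch of that filtering step for two minimized objectives; the (energy, runtime) values are invented stand-ins for real measurements of GPU platform variants.

```python
def pareto_front(configs):
    # Each config is (name, energy_joules, runtime_seconds); a config is
    # dominated if another one is no worse in both objectives and
    # strictly better in at least one.
    front = []
    for c in configs:
        dominated = any(
            o[1] <= c[1] and o[2] <= c[2] and (o[1] < c[1] or o[2] < c[2])
            for o in configs
        )
        if not dominated:
            front.append(c)
    return front

candidates = [
    ("low-clock",  1.2, 9.0),   # hypothetical measurements
    ("balanced",   1.8, 4.0),
    ("high-clock", 3.5, 2.5),
    ("wasteful",   3.9, 4.1),   # dominated by "balanced"
]
print(pareto_front(candidates))  # "wasteful" is filtered out
```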

    Cross layer reliability estimation for digital systems

    Forthcoming manufacturing technologies hold the promise of increasing the performance and functionality of multifunctional computing systems thanks to a remarkable growth in device integration density. Despite the benefits introduced by these technology improvements, reliability is becoming a key challenge for the semiconductor industry. With transistor sizes reaching atomic dimensions, vulnerability to unavoidable fluctuations in the manufacturing process and to environmental stress rises dramatically. Failing to meet a reliability requirement may add excessive re-design cost and may have severe consequences for the success of a product. Worst-case design with large margins to guarantee reliable operation has been employed for a long time, but it is reaching a limit that makes it economically unsustainable due to its performance, area, and power cost. One of the open challenges for future technologies is building “dependable” systems on top of unreliable components, which will degrade and even fail during the normal lifetime of the chip. Conventional design techniques are highly inefficient: they expend a significant amount of energy to tolerate device unpredictability by adding safety margins to a circuit’s operating voltage, clock frequency, or charge stored per bit. Unfortunately, the additional costs introduced to compensate for unreliability are rapidly becoming unacceptable in today’s environment, where power consumption is often the limiting factor for integrated circuit performance and energy efficiency is a top concern. Attention should be paid to tailoring techniques that improve the reliability of a system on the basis of its requirements, ending up with cost-effective solutions favoring the success of the product on the market. Cross-layer reliability is one of the most promising approaches to achieve this goal. Cross-layer reliability techniques take into account the interactions between the layers composing a complex system (i.e., the technology, hardware, and software layers) to implement efficient cross-layer fault mitigation mechanisms. Fault tolerance mechanisms are implemented at different layers, from the technology layer up to the software layer, to optimize the system by exploiting the inherent capability of each layer to mask lower-level faults. For this purpose, cross-layer reliability design techniques need to be complemented with cross-layer reliability evaluation tools able to precisely assess the reliability level of a selected design early in the design cycle. Accurate and early reliability estimates would enable the exploration of the system design space and the optimization of multiple constraints such as performance, power consumption, cost, and reliability. This Ph.D. thesis is devoted to the development of new methodologies and tools to evaluate and optimize the reliability of complex digital systems during the early design stages. More specifically, techniques addressing hardware accelerators (i.e., FPGAs and GPUs), microprocessors, and full systems are discussed. All developed methodologies are presented in conjunction with their application to real-world use cases belonging to different computational domains.
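    To make the idea of cross-layer masking concrete, the toy estimate below treats each layer as independently masking some fraction of raw faults, so only faults that escape every layer become system failures. Real cross-layer evaluation tools model the interactions between layers far more precisely; the masking probabilities and fault rate here are invented for illustration.

```python
def system_failure_rate(raw_fault_rate, masking_per_layer):
    # A raw fault becomes a system failure only if every layer fails to
    # mask it; assumes masking is independent across layers.
    escape = 1.0
    for p_mask in masking_per_layer.values():
        escape *= 1.0 - p_mask  # fraction of faults this layer lets through
    return raw_fault_rate * escape

# Hypothetical masking probabilities per layer
layers = {"technology": 0.30, "hardware": 0.60, "software": 0.50}
print(system_failure_rate(1e-6, layers))  # 1e-6 * 0.7 * 0.4 * 0.5 = 1.4e-07
```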

    Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016)

    Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016), Timisoara, Romania, February 8-11, 2016. The PhD Symposium was a very good opportunity for young researchers to share information and knowledge, to present their current research, and to discuss topics with other students in order to look for synergies and common research topics. The idea was very successful, and the assessment made by the PhD students was very good. It also helped to achieve one of the major goals of the NESUS Action: to establish an open European research network targeting sustainable solutions for ultrascale computing, aiming at cross-fertilization among HPC, large-scale distributed systems, and big data management; providing training; helping to bring together disparate researchers working across different areas; and providing a meeting ground for researchers in these separate areas to exchange ideas, identify synergies, and pursue common activities in research topics such as sustainable software solutions (applications and the system software stack), data management, energy efficiency, and resilience. European Cooperation in Science and Technology (COST).

    IU PTI/UITS Research Technologies Annual Report: FY 2014

    This Fiscal Year 2014 (FY2014) report outlines IU community accomplishments using IU's cyberinfrastructure, as they relate to several IU Bicentennial Strategic Plan goals and ongoing principles of excellence. The report includes research and discovery highlights.

    Real-time GPU-accelerated Out-of-Core Rendering and Light-field Display Visualization for Improved Massive Volume Understanding

    Nowadays, huge digital models are becoming increasingly available for a number of different applications, ranging from CAD and industrial design to medicine and the natural sciences. Particularly in the field of medicine, data acquisition devices such as MRI or CT scanners routinely produce huge volumetric datasets. Currently, these datasets can easily reach dimensions of 1024^3 voxels, and datasets larger than that are not uncommon. This thesis focuses on efficient methods for the interactive exploration of such large volumes using direct volume visualization techniques on commodity platforms. To reach this goal, specialized multi-resolution structures and algorithms, which are able to directly render volumes of potentially unlimited size, are introduced. The developed techniques are output-sensitive: their rendering costs depend only on the complexity of the generated images and not on the complexity of the input datasets. The advanced characteristics of modern GPGPU architectures are exploited and combined with an out-of-core framework in order to provide a more flexible, scalable, and efficient implementation of these algorithms and data structures on single GPUs and GPU clusters. To improve visual perception and understanding, the use of novel 3D display technology based on a light-field approach is introduced. This kind of device allows multiple naked-eye users to perceive virtual objects floating inside the display workspace, exploiting stereo and horizontal parallax. A set of specialized and interactive illustrative techniques capable of providing different contextual information in different areas of the display is reported, as well as an out-of-core CUDA-based ray-casting engine with a number of improvements over current GPU volume ray-casters. The possibilities of the system are demonstrated by the multi-user interactive exploration of 64-GVoxel datasets on a 35-MPixel light-field display driven by a cluster of PCs.
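    A minimal sketch of why such ray-casters can be output-sensitive: with front-to-back compositing, a ray stops as soon as its accumulated opacity saturates, so the cost tracks the generated image rather than the full volume. This illustrates the general technique, not the thesis' CUDA engine; the sample and opacity values along a ray are assumed to come from a transfer function.

```python
def composite_ray(samples, opacities, early_exit=0.99):
    # Front-to-back compositing along one ray using the "over" operator.
    color, alpha = 0.0, 0.0
    for c, a in zip(samples, opacities):
        color += (1.0 - alpha) * a * c
        alpha += (1.0 - alpha) * a
        if alpha >= early_exit:  # early ray termination: the remaining
            break                # samples can no longer change the pixel
    return color, alpha

# Usage with made-up per-sample values along one ray:
print(composite_ray([0.9, 0.8, 0.1], [0.5, 0.7, 0.2]))
```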

    Advances in Grid Computing

    This book approaches grid computing with a perspective on the latest achievements in the field, providing an insight into current research trends and advances, and presenting a large range of innovative research papers. The topics covered in this book include resource and data management, grid architectures and development, and grid-enabled applications. New ideas employing heuristic methods from swarm intelligence or genetic algorithms, as well as quantum encryption, are considered in order to explain two main aspects of grid computing: resource management and data management. The book also addresses aspects of grid computing regarding architecture and development, and includes a diverse range of applications for grid computing, including a possible human grid computing system, simulation of the fusion reaction, ubiquitous healthcare service provisioning, and complex water systems.

    Doctor of Philosophy

    Stochastic methods, dense free-form mapping, atlas construction, and total variation are examples of advanced image processing techniques which are robust but computationally demanding. These algorithms often require a large amount of computational power as well as massive memory bandwidth. These requirements used to be fulfilled only by supercomputers. The development of heterogeneous parallel subsystems and computation-specialized devices such as Graphics Processing Units (GPUs) has brought the requisite power to commodity hardware, opening up opportunities for scientists to experiment and evaluate the influence of these techniques on their research and practical applications. However, harnessing the processing power of modern hardware is challenging. The differences between multicore parallel processing systems and conventional models are significant, often requiring algorithms and data structures to be redesigned significantly for efficiency. It also demands in-depth knowledge about modern hardware architectures to optimize these implementations, sometimes on a per-architecture basis. The goal of this dissertation is to introduce a solution for this problem based on a 3D image processing framework, using high-performance APIs at the core level to utilize the parallel processing power of GPUs. The design of the framework facilitates an efficient application development process, which does not require scientists to have extensive knowledge about GPU systems, and encourages them to harness this power to solve their computationally challenging problems. To present the development of this framework, four main problems are described, and the solutions are discussed and evaluated: (1) essential components of a general 3D image processing library, namely data structures and algorithms, as well as how to implement these building blocks on the GPU architecture for optimal performance; (2) an implementation of unbiased atlas construction algorithms, an illustration of how to solve a highly complex and computationally expensive algorithm using this framework; (3) an extension of the framework to account for geometry descriptors, to solve registration challenges with large-scale shape changes and high intensity-contrast differences; and (4) an out-of-core streaming model, which enables developers to implement multi-image processing techniques on commodity hardware.
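    As an illustration of the out-of-core streaming idea in point (4), the sketch below walks a large volume block by block with a halo for neighborhood filters, so only one block (plus its halo) needs to be resident at a time. This is a CPU-side sketch under the assumption that `process` preserves block shape; a GPU framework would upload each block before processing and download the result.

```python
import numpy as np

def stream_blocks(volume, block=64, halo=1, process=lambda b: b):
    # Process `volume` (Z, Y, X) in blocks of side `block`, reading a
    # `halo`-voxel border for neighborhood operations and writing back
    # only the interior of each processed block.
    out = np.empty_like(volume)
    for z in range(0, volume.shape[0], block):
        for y in range(0, volume.shape[1], block):
            for x in range(0, volume.shape[2], block):
                lo = np.array([z, y, x])
                hi = np.minimum(lo + block, volume.shape)
                src_lo = np.maximum(lo - halo, 0)        # clip halo at edges
                chunk = process(volume[src_lo[0]:hi[0] + halo,
                                       src_lo[1]:hi[1] + halo,
                                       src_lo[2]:hi[2] + halo])
                off = lo - src_lo                        # skip the halo on write-back
                out[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] = chunk[
                    off[0]:off[0] + hi[0] - lo[0],
                    off[1]:off[1] + hi[1] - lo[1],
                    off[2]:off[2] + hi[2] - lo[2]]
    return out
```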

    Custom optimization algorithms for efficient hardware implementation

    The focus is on real-time optimal decision making with application to advanced control systems. These computationally intensive schemes, which involve the repeated solution of (convex) optimization problems within a sampling interval, require more efficient computational methods than are currently available for extending their application to highly dynamical systems and to setups with resource-constrained embedded computing platforms. A range of techniques are proposed to exploit synergies between digital hardware, numerical analysis, and algorithm design. These techniques build on top of parameterisable hardware code generation tools that generate VHDL code describing custom computing architectures for interior-point methods and a range of first-order constrained optimization methods. Since memory limitations are often important in embedded implementations, we develop a custom storage scheme for the KKT matrices arising in interior-point methods for control, which reduces memory requirements significantly and prevents I/O bandwidth limitations from affecting the performance of our implementations. To take advantage of the trend towards parallel computing architectures and to exploit the special characteristics of our custom architectures, we propose several high-level parallel optimal control schemes that can reduce computation time. A novel optimization formulation was devised for reducing the computational effort in solving certain problems, independent of the computing platform used. In order to be able to solve optimization problems in fixed-point arithmetic, which is significantly more resource-efficient than floating-point, tailored linear algebra algorithms were developed for solving the linear systems that form the computational bottleneck in many optimization methods. These methods come with guarantees for reliable operation. We also provide finite-precision error analysis for fixed-point implementations of first-order methods, which can be used to minimize the use of resources while meeting accuracy specifications. The suggested techniques are demonstrated on several practical examples, including a hardware-in-the-loop setup for optimization-based control of a large airliner.
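    For a flavor of the first-order constrained optimization methods that map well onto custom hardware, the sketch below runs a fixed number of projected gradient iterations on a box-constrained QP: every iteration has identical cost and uses only multiply-adds, comparisons, and a precomputed step size, which is what makes pipelined, fixed-point implementations attractive. The problem data are hypothetical, and this is plain Python, not the generated VHDL.

```python
def box_qp_gradient(H, f, lb, ub, iters=200):
    # Minimize 0.5*x'Hx + f'x subject to lb <= x <= ub via projected
    # gradient descent with a fixed step (no line search, no divisions
    # inside the loop).
    n = len(f)
    # Gershgorin bound: max absolute row sum >= largest eigenvalue of H
    L = max(sum(abs(H[i][j]) for j in range(n)) for i in range(n))
    step = 1.0 / L
    x = [0.0] * n
    for _ in range(iters):
        grad = [sum(H[i][j] * x[j] for j in range(n)) + f[i]
                for i in range(n)]
        x = [min(max(x[i] - step * grad[i], lb[i]), ub[i])  # project onto box
             for i in range(n)]
    return x

# Hypothetical 2-variable problem; the unconstrained optimum (1, 4) is
# clipped by the box constraints to (1, 3).
print(box_qp_gradient([[2.0, 0.0], [0.0, 2.0]], [-2.0, -8.0],
                      [0.0, 0.0], [1.0, 3.0]))
```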

    Efficient and Scalable Computing for Resource-Constrained Cyber-Physical Systems: A Layered Approach

    With the evolution of computing and communication technology, cyber-physical systems such as self-driving cars, unmanned aerial vehicles, and mobile cognitive robots are achieving increasing levels of multifunctionality and miniaturization, enabling them to execute versatile tasks in a resource-constrained environment. Therefore, the computing systems that power these resource-constrained cyber-physical systems (RCCPSs) have to achieve high efficiency and scalability. First of all, given a fixed amount of onboard energy, these computing systems should not only be power-efficient but also exhibit sufficiently high performance to gracefully handle complex algorithms for learning-based perception and AI-driven decision-making. Meanwhile, scalability requires that the current computing system and its components can be extended both horizontally, with more resources, and vertically, with emerging advanced technology. To achieve efficient and scalable computing systems in RCCPSs, my research broadly investigates a set of techniques and solutions via a bottom-up layered approach. This layered approach leverages the characteristics of each system layer (e.g., the circuit, architecture, and operating system layers) and their interactions to discover and explore the optimal system tradeoffs among performance, efficiency, and scalability. At the circuit layer, we investigate the benefits of novel power delivery and management schemes enabled by integrated voltage regulators (IVRs). Then, between the circuit and microarchitecture/architecture layers, we present a voltage-stacked power delivery system that offers best-in-class power delivery efficiency for many-core systems. After this, using Graphics Processing Units (GPUs) as a case study, we develop a real-time resource scheduling framework at the architecture and operating system layers for heterogeneous computing platforms with guaranteed task deadlines. Finally, fast dynamic voltage and frequency scaling (DVFS)-based power management across the circuit, architecture, and operating system layers is studied through a learning-based hierarchical power management strategy for multi-/many-core systems.
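    As a toy stand-in for the admission side of such a real-time scheduling framework with guaranteed deadlines, the classic EDF utilization test below decides whether a set of implicit-deadline periodic tasks can meet all deadlines on one processor; the task parameters are invented for illustration.

```python
def edf_schedulable(tasks):
    # Classic EDF utilization bound for implicit-deadline periodic tasks:
    # schedulable on one processor iff total utilization <= 1.
    return sum(wcet / period for wcet, period in tasks) <= 1.0

# (worst-case execution time, period) in milliseconds
tasks = [(2.0, 10.0), (5.0, 20.0), (9.0, 30.0)]
print(edf_schedulable(tasks))  # 0.2 + 0.25 + 0.3 = 0.75 -> True
```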