
    Deep Learning and parallelization of Meta-heuristic Methods for IoT Cloud

    Healthcare 4.0, one of the outcomes of the Fourth Industrial Revolution, is transforming the medical field. It brings facilities and advantages that improve average life expectancy and reduce population mortality. This paradigm depends on intelligent medical devices (wearable devices, sensors), which generate massive amounts of data that must be analyzed and treated with appropriate data-driven algorithms powered by Artificial Intelligence, such as machine learning and deep learning (DL). However, one of the most significant limits of DL techniques is the long time required for training. Meanwhile, the real-time application of DL techniques, especially in sensitive domains such as healthcare, remains an open question. On the other hand, meta-heuristics have achieved good results in optimizing machine learning models. The Internet of Things (IoT) integrates billions of smart devices that can communicate with one another with minimal human intervention, and IoT technologies are crucial in enhancing several real-life smart applications that can improve quality of life. Cloud Computing has emerged as a key enabler for IoT applications because it provides scalable, on-demand, anytime-anywhere access to computing resources. In this thesis, we aim to improve the efficacy and performance of computer-aided diagnosis systems in the medical field by decreasing model complexity and increasing data quality. To accomplish this, three contributions are proposed. First, we propose a computer-aided diagnosis system for neonatal seizure detection that uses metaheuristics to optimize a convolutional neural network (CNN) model and thereby enhance the system's performance. Second, we focus on the COVID-19 pandemic and propose a computer-aided diagnosis system for its detection; in this contribution, we investigate the Marine Predator Algorithm to optimize the configuration of the CNN model and improve the system's performance. In the third contribution, we aim to further improve the performance of the computer-aided diagnosis system for COVID-19 by optimizing the data with AI methods such as Principal Component Analysis (PCA), Discrete Wavelet Transform (DWT), and the Teager-Kaiser Energy Operator (TKEO). The proposed methods and the obtained results were validated through comparative studies on benchmark and public medical data.
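
    The abstract names the ingredients (a CNN model, a metaheuristic optimizer) without code. As a rough, hypothetical sketch of the general pattern, the Python loop below evolves CNN hyperparameter configurations, with a toy fitness function standing in for actual model training; the search space, names, and scoring are illustrative assumptions, not the thesis's method.

```python
import random

# Hypothetical search space for a CNN configuration (not from the thesis).
SPACE = {
    "filters":       [16, 32, 64, 128],
    "kernel_size":   [3, 5, 7],
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "dropout":       [0.2, 0.3, 0.5],
}

def fitness(config):
    """Placeholder: in practice, train the CNN with `config` and return
    validation accuracy; here we fake a smooth score for demonstration."""
    return -abs(config["filters"] - 64) / 64 - abs(config["dropout"] - 0.3)

def mutate(config):
    """Randomly perturb one hyperparameter -- the basic move shared by
    many population-based metaheuristics."""
    key = random.choice(list(SPACE))
    return {**config, key: random.choice(SPACE[key])}

# Simple population-based search loop: keep the best half, refill by mutation.
population = [{k: random.choice(v) for k, v in SPACE.items()} for _ in range(10)]
for generation in range(20):
    population.sort(key=fitness, reverse=True)
    survivors = population[:5]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(5)]

print("best config:", max(population, key=fitness))
```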

    Marshall Space Flight Center Faculty Fellowship Program

    The research projects conducted by the 2016 Faculty Fellows at NASA Marshall Space Flight Center included propulsion studies on propellant issues and materials investigations involving plasma effects and friction stir welding. Spacecraft Systems research was conducted on wireless systems and 3D printing of avionics. Vehicle Systems studies were performed on controllers and spacecraft instruments. The Science and Technology group investigated additive construction applied to Martian and Lunar regolith, medical uses of 3D printing, and unique instrumentation, while the Test Laboratory measured pressure vessel leakage and crack growth rates.

    Ant Colony Optimization

    Ant Colony Optimization (ACO) is the best example of how studies aimed at understanding and modeling the behavior of ants and other social insects can inspire the development of computational algorithms for solving difficult mathematical problems. Introduced by Marco Dorigo in his PhD thesis (1992) and initially applied to the travelling salesman problem, the ACO field has experienced tremendous growth and stands today as an important nature-inspired stochastic metaheuristic for hard optimization problems. This book presents state-of-the-art ACO methods and is divided into two parts: (I) Techniques, which includes parallel implementations, and (II) Applications, where recent contributions of ACO to diverse fields, such as traffic congestion and control, structural optimization, manufacturing, and genomics, are presented.
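
    For readers new to the field, a minimal ant-colony sketch for the travelling salesman problem, the algorithm's original application, may help. This is a textbook-style toy in Python with an invented four-city instance, not code from the book.

```python
import random

# Toy symmetric TSP instance: a hypothetical 4-city distance matrix.
D = [[0, 2, 9, 10],
     [2, 0, 6, 4],
     [9, 6, 0, 3],
     [10, 4, 3, 0]]
n = len(D)

alpha, beta, rho, Q = 1.0, 2.0, 0.5, 1.0    # standard ACO parameters
tau = [[1.0] * n for _ in range(n)]          # pheromone trails

def tour_length(tour):
    return sum(D[tour[i]][tour[(i + 1) % n]] for i in range(n))

def build_tour():
    """One ant builds a tour, choosing the next city with probability
    proportional to pheromone^alpha * (1/distance)^beta."""
    tour = [random.randrange(n)]
    while len(tour) < n:
        i = tour[-1]
        choices = [j for j in range(n) if j not in tour]
        weights = [tau[i][j] ** alpha * (1.0 / D[i][j]) ** beta for j in choices]
        tour.append(random.choices(choices, weights)[0])
    return tour

best = None
for iteration in range(100):
    tours = [build_tour() for _ in range(10)]
    # Evaporate pheromone everywhere, then deposit along each tour,
    # inversely proportional to tour length (shorter tours reinforce more).
    for i in range(n):
        for j in range(n):
            tau[i][j] *= (1 - rho)
    for tour in tours:
        L = tour_length(tour)
        for k in range(n):
            i, j = tour[k], tour[(k + 1) % n]
            tau[i][j] += Q / L
            tau[j][i] += Q / L
    best = min(tours + ([best] if best else []), key=tour_length)

print(best, tour_length(best))
```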

    Exploring resource/performance trade-offs for streaming applications on embedded multiprocessors

    Embedded system design is challenged by the gap between ever-increasing customer demands and limited resource budgets. Tough competition demands ever-shortening time-to-market and product lifecycles. To solve, or at least alleviate, these issues, designers and manufacturers need model-based quantitative analysis techniques for early design-space exploration to study the trade-offs of different implementation candidates. Moreover, modern embedded applications, especially the streaming applications addressed in this thesis, face increasingly dynamic input content, and the platforms they run on are more flexible and allow runtime configuration. Quantitative analysis techniques for embedded system design have to be able to handle such dynamic, adaptable systems. This thesis makes the following contributions:
    - A resource-aware extension to the Synchronous Dataflow (SDF) model of computation.
    - Trade-off analysis techniques, both in the time domain and in the iteration domain (i.e., on an SDF iteration basis), with support for resource sharing.
    - Bottleneck-driven design-space exploration techniques for resource-aware SDF.
    - A game-theoretic approach to controller synthesis, guaranteeing performance under dynamic input.
    As a first contribution, we propose a new model, an extension of static synchronous dataflow graphs (SDF), that allows the explicit modeling of resources with consistency checking. The model is called resource-aware SDF (RASDF). The extension enables us to investigate resource sharing and to explore different scheduling options (ways to allocate the resources to the different tasks) using state-space exploration techniques. Consistent SDF and RASDF graphs have the property that an execution occurs in so-called iterations. An iteration typically corresponds to the processing of a meaningful piece of data, and it returns the graph to its initial state. On multiprocessor platforms, iterations may be executed in a pipelined fashion, which makes performance analysis challenging. As the second contribution, this thesis develops trade-off analysis techniques for RASDF, both in the time domain and in the iteration domain, to dimension resources on platforms. The time-domain analysis allows interleaving of different iterations, but the size of the explored state space grows quickly. The iteration-based technique trades the potential interleaving of iterations for a compact iteration state space. An efficient bottleneck-driven design-space exploration technique for streaming applications, the third main contribution of this thesis, is derived from analysis of the critical cycle of the state space, to reveal the bottleneck resources that limit throughput. All techniques are based on state-space exploration. They enable system designers to tailor their platform to the required applications, based on their own specific performance requirements. Pruning techniques for efficient exploration of the state space have been developed: Pareto dominance in terms of performance and resource usage is used for exact pruning, and approximation techniques are used for heuristic pruning. Finally, the thesis investigates dynamic scheduling techniques that respond to dynamic changes in input streams. The fourth contribution is a game-theoretic approach to controller synthesis that selects appropriate schedules in response to dynamic inputs from the environment. The approach transforms the explored iteration state space of a scenario- and resource-aware SDF (SARA SDF) graph into a bipartite game graph and maps the controller synthesis problem to the problem of finding a winning positional strategy in a classical mean payoff game. A winning strategy of the game can be used to synthesize a controller of schedules for the system that is guaranteed to satisfy the throughput requirement given by the designer.
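
    To make the notion of a consistent SDF graph concrete: consistency means the token balance equations on every edge admit a non-trivial repetition vector, so that one iteration returns the graph to its initial state. Below is a minimal Python sketch that computes such a repetition vector for an invented, connected three-actor graph; RASDF and the state-space techniques of the thesis go well beyond this.

```python
import math
from fractions import Fraction

# Hypothetical SDF graph (not from the thesis). Each edge:
# (producer, consumer, tokens produced per firing, tokens consumed per firing).
edges = [("src", "filt", 2, 3), ("filt", "sink", 1, 2)]

# Solve the balance equations rate[p] * prod == rate[c] * cons by propagation.
# This simple propagation assumes the graph is connected.
rates = {"src": Fraction(1)}
changed = True
while changed:
    changed = False
    for p, c, prod, cons in edges:
        if p in rates and c not in rates:
            rates[c] = rates[p] * prod / cons
            changed = True
        elif c in rates and p not in rates:
            rates[p] = rates[c] * cons / prod
            changed = True
        elif p in rates and c in rates and rates[p] * prod != rates[c] * cons:
            raise ValueError("inconsistent SDF graph: no repetition vector exists")

# Scale the rational rates to the smallest integer repetition vector.
lcm_den = 1
for r in rates.values():
    lcm_den = lcm_den * r.denominator // math.gcd(lcm_den, r.denominator)
reps = {a: int(r * lcm_den) for a, r in rates.items()}
print(reps)  # {'src': 3, 'filt': 2, 'sink': 1}: 3*2 == 2*3 and 2*1 == 1*2
```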

    Efficient implementation of resource-constrained cyber-physical systems using multi-core parallelism

    The quest for more application and system performance has become more challenging in recent years. Especially in the cyber-physical and mobile domains, performance requirements have increased significantly. Applications previously found in the high-performance domain are emerging in the resource-constrained domain. Modern heterogeneous high-performance MPSoCs provide a solid foundation to satisfy this high demand; such systems combine general processors with specialized accelerators ranging from GPUs to machine learning chips. On the other side of the performance spectrum, the demand for small, energy-efficient systems posed by modern IoT applications has grown vastly. Developing efficient software for such resource-constrained multi-core systems is an error-prone, time-consuming and challenging task. This thesis provides, with PA4RES, a holistic semi-automatic approach to parallelizing and implementing applications for such platforms efficiently. Our solution supports the developer in finding good trade-offs to meet the requirements of modern applications and systems. With PICO, we propose a comprehensive approach to expressing parallelism in sequential applications. PICO detects data dependencies and implements the required synchronization automatically. Using a genetic algorithm, PICO optimizes the data synchronization. The evolutionary algorithm considers channel capacity, memory mapping, channel merging and the flexibility offered by the channel implementation, with respect to execution time, energy consumption and memory footprint. PICO's communication optimization phase was able to generate a speedup of almost 2, or an energy improvement of 30%, for certain benchmarks. The PAMONO sensor approach enables fast detection of biological viruses using optical methods. With sophisticated virus detection software, real-time virus detection running on stationary computers was achieved. Within this thesis, we were able to derive a soft real-time capable virus detection running on a high-performance embedded system of the kind commonly found in today's smartphones. This was accomplished with a smart design-space exploration (DSE) algorithm which optimizes for execution time, energy consumption and detection quality. Compared to a baseline implementation, our solution achieved a speedup of 4.1 and 87% energy savings while satisfying the soft real-time requirements. Accepting a degradation of the detection quality, which is still usable in a medical context, led to a speedup of 11.1. This work provides the fundamentals for a truly mobile real-time virus detection solution. The growing demand for processing power can no longer be satisfied by well-known approaches such as higher clock frequencies. These so-called performance walls pose a serious challenge to the growing performance demand. Approximate computing is a promising approach to overcome, or at least shift, the performance walls by accepting a degradation in output quality to gain improvements in other objectives. Especially for safe integration of approximation into existing applications, or during the development of new approximation techniques, a method to assess the impact on output quality is essential. With QCAPES, we provide a multi-metric assessment framework to analyze the impact of approximation. Furthermore, QCAPES provides useful insights into the impact of approximation on execution time and energy consumption. With ApproxPICO, we propose an extension to PICO that considers approximate computing during the parallelization of sequential applications.
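
    To illustrate the kind of multi-objective reasoning PICO's evolutionary optimization performs over execution time, energy consumption and memory footprint, here is a minimal Pareto-filtering sketch in Python over randomly generated stand-in design points; it shows the dominance test only, not PICO's actual genetic algorithm.

```python
import random

# Hypothetical design points: (execution time, energy, memory footprint).
# In PICO these would come from evaluating channel capacities, memory
# mappings and channel mergings; here they are random stand-ins.
points = [(random.uniform(1, 10), random.uniform(1, 10), random.uniform(1, 10))
          for _ in range(50)]

def dominates(a, b):
    """a dominates b if it is no worse in every objective and strictly
    better in at least one (all objectives are minimized here)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# Keep only the non-dominated candidates: the Pareto front.
pareto = [p for p in points if not any(dominates(q, p) for q in points)]
print(f"{len(pareto)} Pareto-optimal candidates out of {len(points)}")
```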

    Automatically Parallelizing Embedded Legacy Software on Soft-Core SoCs

    Nowadays, embedded systems are utilized in many areas and have become omnipresent, making people's lives more comfortable. Embedded systems have to handle more and more functionality in many products. To maintain the often required low energy consumption, multi-core systems are used: they provide high performance at moderate energy consumption. The development started with dual-core processors and has today reached many-core designs with dozens or hundreds of processor cores. However, existing applications can barely leverage the potential of that many cores. Legacy applications are usually written sequentially and therefore typically use only one processor core, so they do not benefit from the advantages provided by modern many-core systems. Rewriting those applications to use multiple cores requires new skills from developers, and it is also time-consuming and highly error-prone. Dozens of languages, APIs and compilers have been presented in the past decades to aid the user with parallelizing applications. Fully automatic parallelizing compilers are seen as the holy grail, since the user effort is kept minimal; however, automatic parallelizers often cannot extract parallelism as well as user-aided approaches. Most of these parallelization tools are designed for desktop and high-performance systems and are thus neither tuned for nor applicable to low-performance embedded systems. To improve this situation, this work presents an automatic parallelizer for embedded systems which in most cases delivers better quality than user-aided approaches and, where it does not, allows easy manual fine-tuning. Parallelization tools extract concurrently executable tasks from an application; these tasks can then be executed on different processor cores. Parallelization tools, and automatic parallelizers in particular, often struggle to efficiently map the extracted parallelism to an existing multi-core processor. This work uses soft-core processors on FPGAs, which makes it possible to realize custom multi-core designs in hardware within a few minutes. This allows the multi-core processor to be adapted to the characteristics of the extracted parallelism. In particular, core interconnects for communication can be optimized to fit the communication pattern of the parallel application. Embedded applications are often structured as follows: receive input data, apply (multiple) data processing steps, output data. The processing steps are often realized as consecutive, loosely coupled transformations, which naturally model the structure of a processing pipeline. It is the goal of this work to extract this kind of pipeline parallelism from an application and map it to multiple cores to increase the overall throughput of the system. Multiple cores forming a chain with direct communication channels ideally fit this pattern. This so-called pipeline parallelism is barely addressed by most parallelization tools, and current multi-core designs often do not offer the hardware flexibility provided by the soft-cores targeted by this approach. The main contribution of this work is an automatic parallelizer which is able to map different processing steps from the source code of a sequential application to different cores in a multi-core pipeline. Users only specify the required processing speed after parallelization. The developed tool tries to extract a matching parallelized software design, along with a custom multi-core design, out of sequential embedded legacy applications. The automatically created multi-core system already contains the peripherals used, extracted from the source code, and is ready to use. The presented parallelizer implements multi-objective optimization to generate a minimal hardware design that just fulfils the user-defined requirement. To the best of my knowledge, the possibility to generate such a multi-core pipeline defined by the demands of the parallelized software has never been presented before. The approach is implemented for two soft-core processors, and the evaluation shows, for both targets, speedups of 12x and higher at reasonable hardware overhead. Compared to other automatic parallelizers, which mainly focus on speedups through latency reduction, significantly higher speedups can be achieved depending on the given application structure.
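
    The pipeline pattern described above (input, chained processing steps, output, each step on its own core with direct channels) can be sketched in a few lines. The Python threads and bounded queues below are stand-ins for soft cores and their interconnects; the two stages are invented examples, not the parallelizer's output.

```python
import queue
import threading

# Pipeline: input -> stage A -> stage B -> output. Each stage runs on its
# own "core" (here: thread), linked by direct channels (bounded queues).
STOP = object()  # sentinel propagated through the pipeline to shut it down

def stage(func, inbox, outbox):
    while (item := inbox.get()) is not STOP:
        outbox.put(func(item))
    outbox.put(STOP)

inbox = queue.Queue()
a_to_b = queue.Queue(maxsize=4)    # bounded, like a fixed-capacity channel
b_to_out = queue.Queue(maxsize=4)

threading.Thread(target=stage, args=(lambda x: x * 2, inbox, a_to_b)).start()
threading.Thread(target=stage, args=(lambda x: x + 1, a_to_b, b_to_out)).start()

for i in range(5):
    inbox.put(i)       # receive input data
inbox.put(STOP)

while (result := b_to_out.get()) is not STOP:
    print(result)      # data output: 1, 3, 5, 7, 9
```

    Because the stages run concurrently, the throughput of the chain approaches that of its slowest stage, which is exactly why the thesis's bottleneck-aware mapping of steps to cores pays off.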

    Analysis and design of massively parallel channel estimation algorithms on graphic cards

    The necessity of accurate channel estimation for coherent multiuser detectors is well known. Indeed, such detectors are based on the assumption that signals are perfectly estimated, and this is never completely achieved in practice. Furthermore, practical transmitters and receivers are affected by many non-idealities, such as strong phase noise, making the task of channel estimation all the more challenging. Another notorious issue is the high computational complexity of multiuser techniques. This project has devoted significant attention to massively parallel receiver architectures and the possibility of parallelizing channel estimation algorithms. Nvidia CUDA graphics cards are especially well suited to problems that can be expressed as data-parallel computations. This task is challenging and ambitious, since the usage of such cards for receiver design is still in its infancy. This thesis describes the work carried out at the German Aerospace Center (DLR), where a real-world multiuser detector is studied. The desired goals were the following: fine-tuning of the already existing channel estimation algorithm; exploration of the factor-graph approach in order to improve the estimation quality and to develop algorithms suitable for parallelization; and parallel implementation of the algorithms on a CUDA graphics card. All these points have been covered. Two different improvements to the already implemented phase estimator are proposed. Both are based on the same approximation of the Wiener-Levy phase model and assume the same knowledge at the receiver. By adopting the factor-graph approach, we present two existing algorithms for phase estimation in a new parallel fashion and show that they improve the estimation quality while being suitable for parallelization on the board. The performance improvements of all proposed estimators, in terms of Mean Square Error, are validated through several simulation campaigns carried out in different scenarios, most of them characterized by strong phase noise and low signal-to-noise ratios. Finally, we present several parallel phase estimation algorithms running on a CUDA graphics card and show that, in some cases, massive parallelization achieves a speedup of more than 200 times compared to the serial implementation. The results obtained represent a starting point for the implementation of a Parallel Iterative Receiver to be inserted into the existing multiuser detector and executed completely on a CUDA graphics card.
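
    As a loose illustration of why phase estimation parallelizes well, the NumPy sketch below smooths noisy phase observations for many users independently, one row per user, mirroring the one-thread-per-stream structure a GPU exploits. The random-walk phase model and the sliding-window smoother are simplifying assumptions for illustration; they are not the factor-graph estimators developed in the thesis.

```python
import numpy as np

# Hypothetical random-walk (Wiener-Levy-like) phase trajectories for many
# users, observed in noise. Each row is an independent estimation problem.
rng = np.random.default_rng(0)
users, symbols = 64, 1024
true_phase = np.cumsum(rng.normal(0, 0.01, (users, symbols)), axis=1)
observed = true_phase + rng.normal(0, 0.1, (users, symbols))

# Sliding-window smoother applied to all users at once; on a GPU each row
# (or each window) would map naturally to its own thread block.
window = 31
kernel = np.ones(window) / window
estimate = np.apply_along_axis(
    lambda p: np.convolve(p, kernel, mode="same"), 1, observed)

mse_raw = np.mean((observed - true_phase) ** 2)
mse_smooth = np.mean((estimate - true_phase) ** 2)
print(f"MSE: raw {mse_raw:.4f} -> smoothed {mse_smooth:.4f}")
```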

    A High-Yield Microfabrication Process for Sapphire Substrate Pressure Sensors with Low Parasitic Capacitances and 200 °C Tolerance

    Microelectromechanical systems (MEMS) can offer many benefits over conventional sensor assembly, especially as the demand for smaller and more effective instrumentation escalates. While many industries continually strive for improved sensing capabilities, those invested in natural gas and oil extraction have a particular interest in miniaturized pressure sensing systems. These sensors need to operate autonomously in harsh-environment (50 MPa, 125 °C) fissures (≤1 cm) with at least 10-bit pressure resolution (≤0.05 MPa). The primary focus of this report is the development of a surface micromachining process to fabricate high-performance capacitive pressure sensors, utilizing dielectric substrates to enable extremely low offset and parasitic capacitances and temperature coefficients. In contrast to conventional bulk silicon micromachining methods that use various kinds of etch stops, such as electrochemical or dopant-selective, dry additive processes are utilized to reduce manufacturing complexity, cost, and material consumption; these processes have gained favor in recent years as the tools have matured. The fabricated devices must meet both pressure sensing and dimensional scaling requirements, with a full-scale range of ≥50 MPa, a resolution of ≤50 kPa (>20 fF/MPa with a system resolution of 1 fF/code), and a size of ≤2×1×0.5 mm³. In order to meet these goals while maximizing yield, particular attention has been given to the interplay between equipment limitations and device design. Process and design features have been refined over four process generations that together lead to a capacitance response of >450 fF/MPa over 50 MPa, provide a yield of >80%, permit an extreme span (>1000×) of full-scale range designs, and allow automated system assembly. Devices have been tested at pressures and temperatures up to 50 MPa and 200 °C, representing downhole environments, demonstrating <7.0 kPa (<1 psi) resolution. Devices designed to operate over a much lower full-scale range of <50 kPa (≤350 Torr), representing biomedical applications, have been tested and demonstrate a resolution of <80 Pa (<0.6 Torr). The sensor response and design have been validated in the primary use case of autonomous microsystem integration. The system circuitry includes a microcontroller, capacitance-to-digital converter, temperature sensor, photodiode, and battery. The readout electronics and sensor are mounted onto a flexible PCB, packaged into stainless steel or ceramic shells, sealed with silicone epoxy to permit pressure transmission while providing environmental protection, and measure <9×9×7 mm³ in size. The systems have been successfully field tested in a brine well. While the capacitive pressure sensors have been developed primarily for active microsystems, there may be situations where a wired connection to the readout circuitry is not possible. A passive wireless pressure monitoring system utilizing short-range inductive coupling has been developed to evaluate the performance of the sapphire substrate sensors for this use case. The passive sensing element consists of the capacitive pressure sensor and an inductor, packaged in a 3D-printed biocompatible housing measuring ø12 × 24 mm. Pressure monitoring within the GI tract has been targeted; an in situ resolution of 1.6 kPa (12 Torr) at 6 cm has been achieved through conductive saline. A practical application of the sensor has been demonstrated in vivo, having been ingested and successfully interrogated in a canine model to monitor stomach pressure for over two days.
    PhD dissertation, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/149856/1/acbenken_1.pd
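
    The stated resolution targets follow directly from the sensitivity and readout figures given in the abstract; as a quick check of the arithmetic:

```latex
% Pressure resolution implied by capacitive sensitivity and a 1 fF/code readout:
\Delta P = \frac{1~\mathrm{fF/code}}{20~\mathrm{fF/MPa}}
         = 0.05~\mathrm{MPa/code} = 50~\mathrm{kPa/code}
% With the refined >450 fF/MPa devices, the same readout implies, noise permitting,
\Delta P = \frac{1~\mathrm{fF/code}}{450~\mathrm{fF/MPa}} \approx 2.2~\mathrm{kPa/code}
```

    The second figure is consistent with the <7.0 kPa resolution demonstrated at pressure and temperature.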

    Run-time management for future MPSoC platforms

    In recent years, we have been witnessing the dawn of the Multi-Processor System-on-Chip (MPSoC) era. In essence, this era is triggered by the need to handle more complex applications while reducing the overall cost of embedded (handheld) devices. This cost is mainly determined by the cost of the hardware platform and the cost of designing applications for that platform. The cost of a hardware platform partly depends on its production volume; in turn, this means that flexible, (easily) programmable multi-purpose platforms will exhibit a lower cost. A multi-purpose platform not only requires flexibility, but should also combine high performance with low power consumption. To this end, MPSoC devices integrate computer architectural properties of various computing domains. Just like large-scale parallel and distributed systems, they contain multiple heterogeneous processing elements interconnected by a scalable, network-like structure. This helps in achieving scalable high performance. As in most mobile or portable embedded systems, there is a need for low-power operation and real-time behavior.
    The cost of designing applications is equally important. Indeed, the actual value of future MPSoC devices is not contained within the embedded multiprocessor IC, but in their capability to provide the user of the device with services and experiences. So, from an application viewpoint, MPSoCs are designed to efficiently process multimedia content in applications like video players, video conferencing, 3D gaming, augmented reality, etc. Such applications typically require a lot of processing power and a significant amount of memory. To keep up with ever-evolving user needs and with new application standards appearing at a fast pace, MPSoC platforms need to be easily programmable. Application scalability, i.e. the ability to use just enough platform resources according to the user requirements and with respect to the device capabilities, is also an important factor. Hence scalability, flexibility, real-time behavior, high performance, low power consumption and, finally, programmability are key components in realizing the success of MPSoC platforms.
    The run-time manager is logically located between the application layer and the platform layer. It has a crucial role in realizing these MPSoC requirements. As it abstracts the platform hardware, it improves platform programmability. By deciding on resource assignment at run-time, based on the performance requirements of the user, the needs of the application and the capabilities of the platform, it contributes to flexibility, scalability and low-power operation. As it arbitrates between different applications, it enables real-time behavior. This thesis details the key components of such an MPSoC run-time manager and provides a proof-of-concept implementation. These key components include application quality management algorithms linked to MPSoC resource management mechanisms and policies, adapted to the provided MPSoC platform services.
    First, we describe the role, the responsibilities and the boundary conditions of an MPSoC run-time manager in a generic way. This includes a definition of the multiprocessor run-time management design space, a description of the run-time manager design trade-offs and a brief discussion of how these trade-offs affect the key MPSoC requirements. This design space definition and the trade-offs are illustrated based on ongoing research and on existing commercial and academic multiprocessor run-time management solutions. Consequently, we introduce a fast and efficient resource allocation heuristic that considers FPGA fabric properties such as fragmentation. In addition, this thesis introduces a novel task assignment algorithm for handling soft IP cores, denoted hierarchical configuration. Hierarchical configuration managed by the run-time manager enables easier application design and increases the run-time spatial mapping freedom. In turn, this improves the performance of the resource assignment algorithm.
    Furthermore, we introduce run-time task migration components. We detail a new run-time task migration policy closely coupled to the run-time resource assignment algorithm. In addition to detailing a design-environment-supported mechanism that enables moving tasks between an ISP and fine-grained reconfigurable hardware, we also propose two novel task migration mechanisms tailored to the Network-on-Chip environment. Finally, we propose a novel mechanism for task migration initiation, based on reusing debug registers in modern embedded microprocessors.
    We propose a reactive on-chip communication management mechanism. We show that, by exploiting an injection rate control mechanism, it is possible to provide a communication management system capable of providing soft (reactive) QoS in a NoC.
    We introduce a novel, platform-independent run-time algorithm to perform quality management, i.e. to select an application quality operating point at run-time based on the user requirements and the available platform resources, as reported by the resource manager. This contribution also proposes a novel way to manage the interaction between the quality manager and the resource manager. In order to have a realistic, reproducible and flexible run-time manager testbench with respect to applications with multiple quality levels and implementation trade-offs, we have created an input data generation tool denoted Pareto Surfaces For Free (PSFF). The PSFF tool is, to the best of our knowledge, the first tool that generates multiple realistic application operating points, either based on profiling information of a real-life application or based on a designer-controlled random generator.
    Finally, we provide a proof-of-concept demonstrator that combines these concepts and shows how these mechanisms and policies can operate in real-life situations. In addition, we show that the proposed solutions can be integrated into existing platform operating systems.
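
    The quality-manager/resource-manager interaction can be pictured as choosing one operating point per application under a shared resource budget. The greedy Python sketch below uses invented numbers and a simple gain-per-resource heuristic; the thesis's platform-independent algorithm is more general than this illustration.

```python
# Hypothetical per-application operating points: (resource units, quality score),
# ordered from lowest to highest quality.
apps = {
    "video":  [(1, 10), (2, 25), (4, 40)],
    "gaming": [(2, 20), (3, 35), (5, 50)],
}
budget = 6  # total platform resources reported by the resource manager

# Start every application at its lowest quality point.
choice = {a: 0 for a in apps}
used = sum(points[0][0] for points in apps.values())

# Repeatedly take the upgrade with the best quality gain per extra resource
# unit, as long as the budget allows it.
while True:
    upgrades = []
    for a, i in choice.items():
        if i + 1 < len(apps[a]):
            dr = apps[a][i + 1][0] - apps[a][i][0]
            dq = apps[a][i + 1][1] - apps[a][i][1]
            if used + dr <= budget:
                upgrades.append((dq / dr, a))
    if not upgrades:
        break
    _, a = max(upgrades)
    used += apps[a][choice[a] + 1][0] - apps[a][choice[a]][0]
    choice[a] += 1

print("operating points:", choice, "| resources used:", used, "of", budget)
```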