    Energy Driven Application Self-Adaptation at Run-time

    Objective: The objective is to incorporate Rosuvastatin Calcium, which has a low bioavailability (20%), into polymeric nanoparticles (PNs) to improve its biopharmaceutical properties. Methods: The PNs were prepared by the solvent evaporation method using a 2³ factorial design. The formulated PNs were investigated for particle size (PS) and shape, zeta potential (ZP), polydispersity index (PI), entrapment efficiency (EE), and in vivo pharmacokinetics. Results: Among the 8 formulations, PN7 showed the smallest PS, 159.9±16.1 nm, which enhances dissolution, surface area, and permeability; a ZP of −33.5±1.54 mV, which indicates good stability; a PI of 0.587±0.16, showing a monodisperse distribution pattern; a high EE of about 94.20±2.46%; a percentage yield of 96.80±2.08%; and a maximum in vitro drug release of about 96.54±2.02% at 24 h with a controlled and predetermined release pattern. PN7 drug release follows zero-order release kinetics and a non-Fickian diffusion mechanism (r² > 0.96), and the release exponent n falls between 0.5 and 0.8 for the Korsmeyer-Peppas kinetic model, i.e., drug diffusion is governed by polymer relaxation. In vivo pharmacokinetic studies show an increase in AUC₀–∞ (in mg/ml), demonstrating a significant enhancement of the bioavailability of Rosuvastatin Calcium by the PNs. Conclusion: This PN formulation significantly enhances the bioavailability of rosuvastatin calcium, which may help minimize its dose-dependent adverse side effects.
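
    The release exponent n quoted above comes from fitting the Korsmeyer-Peppas model, Mt/M∞ = k·tⁿ, as a straight line on log-log axes. The sketch below shows that fit in plain Python. The release data are hypothetical (generated from k = 0.1, n = 0.65), not the paper's measurements, and the classification thresholds assume thin-film (slab) geometry; spheres and cylinders use different cut-offs.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


def korsmeyer_peppas(times_h, fraction_released, max_fraction=0.6):
    """Fit Mt/Minf = k * t**n by regressing log10(Mt/Minf) on log10(t).

    Only points with 0 < Mt/Minf <= max_fraction are used, as is customary
    for this model. Returns (k, n, r_squared).
    """
    pts = [(t, f) for t, f in zip(times_h, fraction_released)
           if t > 0 and 0 < f <= max_fraction]
    log_k, n, r2 = linear_fit([math.log10(t) for t, _ in pts],
                              [math.log10(f) for _, f in pts])
    return 10 ** log_k, n, r2


def release_mechanism(n):
    """Classify n with thin-film (slab) thresholds (an assumption here)."""
    if n <= 0.5:
        return "Fickian diffusion"
    if n < 1.0:
        return "non-Fickian (anomalous) transport"
    return "case II transport"


# Hypothetical release data generated with k = 0.1 and n = 0.65.
times = [0.5, 1, 2, 3, 4]
frac = [0.1 * t ** 0.65 for t in times]
k, n, r2 = korsmeyer_peppas(times, frac)
```

    Because the synthetic data follow the model exactly, the fit returns n ≈ 0.65, which falls in the non-Fickian band; the same `linear_fit` helper applied to cumulative release versus time gives the r² for a zero-order model.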

    Finding optimal L1 cache configuration for embedded systems

    Modern embedded systems execute a single application or a class of applications repeatedly. An emerging methodology for designing embedded systems uses configurable processors, in which the designer can choose the cache size, associativity, and line size. In this paper, a method is given to rapidly find the L1 cache miss rate of an application. An energy model and an execution time model are developed to find the best cache configuration for the given embedded application. Using benchmarks from Mediabench, we find that our method explores the design space on average 45 times faster than Dinero IV while retaining 100% accuracy.
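
    For comparison, the sketch below shows the brute-force baseline that the paper's method speeds up: trace-driven simulation of every (size, line size, associativity) configuration, in the style of a simulator such as Dinero IV, followed by ranking with energy and time models. This is not the paper's rapid miss-rate method. The energy callbacks, cycle costs, and the address trace are hypothetical placeholders.

```python
from collections import OrderedDict
from itertools import product


def miss_count(trace, size_bytes, line_bytes, assoc):
    """Simulate one set-associative cache with LRU replacement; return misses."""
    num_sets = size_bytes // (line_bytes * assoc)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // line_bytes
        ways = sets[block % num_sets]
        tag = block // num_sets
        if tag in ways:
            ways.move_to_end(tag)  # hit: mark as most recently used
        else:
            misses += 1
            if len(ways) == assoc:
                ways.popitem(last=False)  # evict the least recently used tag
            ways[tag] = True
    return misses


def explore(trace, sizes, line_sizes, assocs, e_access_nj, e_miss_nj,
            hit_cycles=1, miss_cycles_per_byte=1):
    """Evaluate every configuration; return (energy_nj, cycles, config) sorted by energy.

    The energy and time models are hypothetical and supplied by the caller.
    """
    results = []
    for size, line, assoc in product(sizes, line_sizes, assocs):
        if size < line * assoc:
            continue  # cannot hold even one set
        misses = miss_count(trace, size, line, assoc)
        energy = len(trace) * e_access_nj(size, assoc) + misses * e_miss_nj(line)
        cycles = len(trace) * hit_cycles + misses * line * miss_cycles_per_byte
        results.append((energy, cycles, (size, line, assoc)))
    return sorted(results)


# Hypothetical trace: two passes over 256 bytes of 4-byte words.
trace = [a for _ in range(2) for a in range(0, 256, 4)]
ranked = explore(
    trace, sizes=[64, 1024], line_sizes=[16], assocs=[1],
    e_access_nj=lambda size, assoc: 0.001 * size / 64 * assoc,  # hypothetical
    e_miss_nj=lambda line: 1.0,                                  # hypothetical
)
best_energy, best_cycles, best_config = ranked[0]
```

    Each configuration needs a full pass over the trace, which is why exhaustive simulation becomes expensive as the design space grows.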

    ApproxTrain: Fast Simulation of Approximate Multipliers for DNN Training and Inference

    Edge training of Deep Neural Networks (DNNs) is a desirable goal for continuous learning; however, it is hindered by the enormous computational power that training requires. Hardware approximate multipliers have proven effective for gaining resource efficiency in DNN inference accelerators; however, training with approximate multipliers is largely unexplored. To build resource-efficient accelerators with approximate multipliers that support DNN training, a thorough evaluation of training convergence and accuracy across different DNN architectures and different approximate multipliers is needed. This paper presents ApproxTrain, an open-source framework that allows fast evaluation of DNN training and inference using simulated approximate multipliers. ApproxTrain is as user-friendly as TensorFlow (TF) and requires only a high-level description of a DNN architecture along with C/C++ functional models of the approximate multiplier. We improve the speed of the simulation at the multiplier level by using a novel LUT-based approximate floating-point (FP) multiplier simulator on GPU (AMSim). ApproxTrain leverages CUDA and efficiently integrates AMSim into the TensorFlow library to compensate for the absence of native hardware approximate multipliers in commercial GPUs. We use ApproxTrain to evaluate the convergence and accuracy of DNN training with approximate multipliers for small and large datasets (including ImageNet) using LeNet and ResNet architectures. The evaluations demonstrate similar convergence behavior and a negligible change in test accuracy compared to FP32 and bfloat16 multipliers. Compared to CPU-based approximate multiplier simulations in training and inference, the GPU-accelerated ApproxTrain is more than 2500x faster. The original TensorFlow, which relies on the highly optimized closed-source cuDNN/cuBLAS libraries and native hardware multipliers, is only 8x faster than ApproxTrain. Comment: 14 pages, 12 figures
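
    The LUT idea behind a simulator like AMSim can be sketched on the CPU. For every pair of truncated mantissas, precompute the approximate product once. At multiply time, look up the mantissa product and add the exponents exactly. The sketch below is not ApproxTrain's code. It assumes a 4-bit mantissa index (`K`, hypothetical) and uses Mitchell's logarithmic multiplier as a stand-in functional model; special values (inf, NaN) are not handled.

```python
import math

K = 4  # hypothetical: mantissa fraction bits used to index the LUT


def mitchell_product(fa, fb):
    """Mitchell's approximation of (1 + fa) * (1 + fb) for fa, fb in [0, 1).

    Uses log2(1 + f) ~= f and the matching piecewise-linear antilog.
    """
    s = fa + fb
    return 1.0 + s if s < 1.0 else 2.0 * s


def build_lut(k=K):
    """Precompute the approximate mantissa product for every pair of k-bit fractions."""
    scale = 1 << k
    return [[mitchell_product(i / scale, j / scale) for j in range(scale)]
            for i in range(scale)]


def approx_mul(a, b, lut, k=K):
    """Multiply two floats: LUT lookup for the mantissa product, exact exponent addition."""
    if a == 0.0 or b == 0.0:
        return 0.0
    sign = -1.0 if (a < 0) != (b < 0) else 1.0
    ma, ea = math.frexp(abs(a))  # abs(a) = ma * 2**ea with ma in [0.5, 1)
    mb, eb = math.frexp(abs(b))
    # Rewrite each operand as (1 + f) * 2**(e - 1) and truncate f to k bits.
    ia = int((2.0 * ma - 1.0) * (1 << k))
    ib = int((2.0 * mb - 1.0) * (1 << k))
    return sign * math.ldexp(lut[ia][ib], (ea - 1) + (eb - 1))


LUT = build_lut()
```

    For example, `approx_mul(1.5, 1.5, LUT)` returns 2.0 instead of 2.25, which shows the error Mitchell's approximation introduces. Operands with zero fractional mantissa, such as 2.0 × 3.0, are still multiplied exactly.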

    A Power to Pulse Width Modulation Sensor for Remote Power Analysis Attacks

    Field-programmable gate arrays (FPGAs) deployed on commercial cloud services are gaining popularity because of the cost and compute benefits they offer. Recent studies have discovered security threats that can be launched remotely on FPGAs whose logic fabric is shared between trusted and untrusted parties, posing a danger to designs deployed on cloud FPGAs. In remote power analysis (RPA) attacks, an attacker aims to deduce secret information on a remote FPGA by deploying an on-chip sensor in the FPGA logic fabric. Information captured by the on-chip sensor is transferred off the chip for analysis, and existing on-chip sensors demand a significant amount of bandwidth for this transfer because of their wide output bit width. However, attackers often have no option but to use a covert communication channel, and the bandwidth of such channels is generally limited. This paper proposes a novel area-efficient on-chip power sensor named PPWM that integrates a logic design outputting a pulse whose width is modulated by the power consumption of the FPGA. This pulse is used to clear a flip-flop selectively and asynchronously, and the single-bit output of the flip-flop is used to perform an RPA attack. This paper demonstrates that a 128-bit Advanced Encryption Standard (AES) key can be recovered within 16,000 power traces while consuming just 25% of the bandwidth required by the state of the art. Moreover, this paper assesses the threat that the proposed PPWM poses to remote FPGAs, including those deployed on cloud services.
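
    Why a single-bit sensor output can still leak a key can be shown with a toy correlation power analysis. This simulation does not model the PPWM circuit, AES, or the paper's attack. The 4-bit PRESENT S-box stands in for AES to keep the example short. Power is modelled as the Hamming weight of the S-box output plus Gaussian noise, and the pulse-width/flip-flop logic is abstracted as a fixed threshold; all of these are assumptions.

```python
import random

# PRESENT cipher 4-bit S-box: a hypothetical stand-in for the AES S-box.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_weight(x):
    return bin(x).count("1")


def one_bit_sample(plaintext, key, rng, noise_sd=1.0, threshold=2.0):
    """Toy 1-bit sensor: output 1 if the noisy modelled power exceeds a threshold."""
    power = hamming_weight(SBOX[plaintext ^ key]) + rng.gauss(0.0, noise_sd)
    return 1 if power > threshold else 0


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def recover_key(plaintexts, bits):
    """CPA: choose the key guess whose Hamming-weight prediction correlates
    most positively with the 1-bit samples."""
    def score(guess):
        hyp = [hamming_weight(SBOX[p ^ guess]) for p in plaintexts]
        return pearson(hyp, bits)
    return max(range(16), key=score)


rng = random.Random(1)
SECRET_KEY = 0x9  # hypothetical key nibble
plaintexts = [rng.randrange(16) for _ in range(2000)]
bits = [one_bit_sample(p, SECRET_KEY, rng) for p in plaintexts]
```

    The thresholding discards most of the amplitude information. The correct guess still has the strongest correlation with the bits, though more traces are needed than with a multi-bit sensor. This is the bandwidth-versus-trace-count trade-off the abstract describes.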

    HW-SW Co-Synthesis: The Present and The Future

    As we move towards several million transistors per chip, it is desirable to move to higher levels of abstraction for the automated design of systems. The increasing performance of microprocessors in the marketplace is shifting the balance between software and hardware. In this environment, we must adapt our tools to create systems that incorporate these fast microprocessors rather than compete with them. It is also important to bring other peripheral components, such as sensors and RF circuits, into our design methodology. I