Optimización en GPU de algoritmos para la mejora del realce y segmentación en imágenes hepáticas (GPU optimization of algorithms for improved enhancement and segmentation of liver images)
This doctoral thesis investigates GPU acceleration of liver image enhancement and segmentation, and is presented as a compendium of articles. The work is structured as three scientific contributions: the first addresses contrast enhancement and tumor segmentation, the second explores vessel segmentation, and the third covers liver segmentation. These works are implemented on GPU and achieve significant speedups, which is central to the scientific impact and relevance of the thesis. The first work proposes cross-modality based contrast enhancement for tumor segmentation on GPU. It takes a target image and a guidance image as input and enhances the low-quality target image by applying a two-dimensional histogram approach. The enhanced image is then observed to give more accurate tumor segmentation with GPU-based dynamic seeded region growing. The second contribution presents fast parallel gradient-based seeded region growing, in which a static approach is proposed and implemented on GPU for accurate vessel segmentation. The third contribution describes GPU acceleration of the Chan-Vese model and cross-modality based contrast enhancement for liver segmentation.
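To make the term "seeded region growing" concrete, here is a minimal serial sketch in Python. It is not the thesis's GPU implementation and does not reproduce its dynamic or gradient-based static variants; the fixed intensity tolerance `tol`, the 4-connectivity, and the function name `region_grow` are assumptions chosen for illustration only.

```python
from collections import deque

def region_grow(image, seed, tol):
    """Grow a 4-connected region from `seed`.

    A pixel joins the region if its intensity is within `tol` of the
    seed pixel's intensity (a simple fixed criterion, assumed here).
    `image` is a list of rows; `seed` is a (row, col) tuple.
    """
    rows, cols = len(image), len(image[0])
    ref = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        # Visit the four direct neighbours of the current pixel.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                if abs(image[nr][nc] - ref) <= tol:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region

if __name__ == "__main__":
    toy = [
        [10, 11, 50],
        [12, 10, 52],
        [51, 53, 11],
    ]
    print(sorted(region_grow(toy, (0, 0), 3)))
```

In the toy image the bottom-right pixel has a matching intensity but is not connected to the seed through accepted pixels, so it stays outside the region. A GPU version would typically process the frontier in parallel rather than through a queue.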
Polynomial Time Cryptanalytic Extraction of Neural Network Models
Billions of dollars and countless GPU hours are currently spent on training
Deep Neural Networks (DNNs) for a variety of tasks. Thus, it is essential to
determine the difficulty of extracting all the parameters of such neural
networks when given access to their black-box implementations. Many versions of
this problem have been studied over the last 30 years, and the best current
attack on ReLU-based deep neural networks was presented at Crypto 2020 by
Carlini, Jagielski, and Mironov. It resembles a differential chosen plaintext
attack on a cryptosystem, which has a secret key embedded in its black-box
implementation and requires a polynomial number of queries but an exponential
amount of time (as a function of the number of neurons). In this paper, we
improve this attack by developing several new techniques that enable us to
extract with arbitrarily high precision all the real-valued parameters of a
ReLU-based DNN using a polynomial number of queries and a polynomial amount of
time. We demonstrate its practical efficiency by applying it to a full-sized
neural network for classifying the CIFAR10 dataset, which has 3072 inputs, 8
hidden layers with 256 neurons each, and about 1.2 million neuronal parameters. An
attack following the approach by Carlini et al. requires an exhaustive search
over 2^256 possibilities. Our attack replaces this with our new
techniques, which require only 30 minutes on a 256-core computer.
Workload-Balanced Pruning for Sparse Spiking Neural Networks
Pruning for Spiking Neural Networks (SNNs) has emerged as a fundamental
methodology for deploying deep SNNs on resource-constrained edge devices.
Though the existing pruning methods can provide extremely high weight sparsity
for deep SNNs, the high weight sparsity brings a workload imbalance problem.
Specifically, the workload imbalance happens when a different number of
non-zero weights are assigned to hardware units running in parallel, which
results in low hardware utilization and thus imposes longer latency and higher
energy costs. In preliminary experiments, we show that sparse SNNs (98%
weight sparsity) can suffer from utilization as low as 59%. To alleviate the
workload imbalance problem, we propose u-Ticket, where we monitor and adjust
the weight connections of the SNN during Lottery Ticket Hypothesis (LTH) based
pruning, thus guaranteeing that the final ticket gets optimal utilization when
deployed onto the hardware. Experiments indicate that our u-Ticket can
guarantee up to 100% hardware utilization, thus reducing latency by up to 76.9%
and energy cost by up to 63.8% compared to the non-utilization-aware LTH method.
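The effect of uneven non-zero counts on utilization can be illustrated with a small Python sketch. It assumes a simplified cost model that is not taken from the paper: each hardware unit needs one cycle per non-zero weight, and all units wait for the busiest one. The helper `rebalance` only shows the ideal even split of the same workload; it is not the u-Ticket procedure, which adjusts connections during pruning.

```python
def utilization(nonzeros_per_unit):
    """Fraction of busy cycles when every unit waits for the busiest unit.

    Assumed model: one cycle per non-zero weight, units run in lockstep.
    """
    busiest = max(nonzeros_per_unit)
    if busiest == 0:
        return 1.0  # no work at all, so nothing is idle relative to the others
    return sum(nonzeros_per_unit) / (busiest * len(nonzeros_per_unit))

def rebalance(nonzeros_per_unit):
    """Spread the same total number of non-zeros as evenly as possible."""
    total, n = sum(nonzeros_per_unit), len(nonzeros_per_unit)
    base, extra = divmod(total, n)
    return [base + (1 if i < extra else 0) for i in range(n)]

if __name__ == "__main__":
    work = [10, 4, 6, 4]  # hypothetical non-zero counts for four units
    print(utilization(work))
    print(utilization(rebalance(work)))
```

With the hypothetical counts above, the busiest unit holds 10 of the 24 non-zeros, so 16 of the 40 available unit-cycles are idle; an even split of 6 per unit removes the idle cycles under this model.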
ACE-HoT: Accelerating an extreme amount of symmetric Cipher Evaluations for High-Order avalanche Tests
In this work, we tackle the problem of estimating the security of iterated symmetric ciphers in an efficient manner, with tests that do not require a deep analysis of the internal structure of the cipher. This is particularly useful during the design phase of these ciphers, especially for quickly testing several combinations of possible parameters defining several cipher design variants.
We consider a popular statistical test that allows us to determine the probability of flipping each cipher output bit, given a small variation in the input of the cipher. From these probabilities, one can compute three measurable metrics related to the well-known full diffusion, avalanche and strict avalanche criteria.
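As a rough CPU-only illustration of this kind of test, the Python sketch below estimates, for each input bit, the probability that each output bit flips, and derives three summary numbers from those probabilities. Everything in it is an assumption made for illustration: `toy_cipher` is a made-up 16-bit mixing function rather than a real cipher, and the formulas in `criteria` are one common way to score diffusion, avalanche and strict avalanche, not necessarily the definitions used by ACE-HoT.

```python
import random

WIDTH = 16                 # block size in bits (assumed for the toy example)
MASK = (1 << WIDTH) - 1

def toy_cipher(x, rounds):
    """Hypothetical toy mixing function standing in for an iterated cipher."""
    for r in range(rounds):
        x = (x * 0x9E37 + r) & MASK
        x ^= x >> 7
        x = ((x << 3) | (x >> (WIDTH - 3))) & MASK  # rotate left by 3
    return x

def flip_probabilities(cipher, samples, seed=0):
    """p[i][j]: estimated probability that output bit j flips when input bit i flips."""
    rng = random.Random(seed)
    counts = [[0] * WIDTH for _ in range(WIDTH)]
    for _ in range(samples):
        x = rng.getrandbits(WIDTH)
        y = cipher(x)
        for i in range(WIDTH):
            diff = y ^ cipher(x ^ (1 << i))
            for j in range(WIDTH):
                counts[i][j] += (diff >> j) & 1
    return [[c / samples for c in row] for row in counts]

def criteria(p):
    """Return (diffusion, avalanche, strict avalanche) scores in [0, 1]."""
    n, m = len(p), len(p[0])
    # Diffusion: share of (input bit, output bit) pairs with any observed flip.
    diffusion = sum(1 for row in p for v in row if v > 0) / (n * m)
    # Avalanche: how close the mean number of flipped output bits is to m / 2.
    avalanche = 1 - sum(abs(2 * sum(row) / m - 1) for row in p) / n
    # Strict avalanche: how close each individual probability is to 1/2.
    strict = 1 - sum(abs(2 * v - 1) for row in p for v in row) / (n * m)
    return diffusion, avalanche, strict

if __name__ == "__main__":
    for rounds in (1, 2, 4):
        p = flip_probabilities(lambda x: toy_cipher(x, rounds), 2000)
        print(rounds, criteria(p))
```

Each sample costs one cipher call plus one call per input bit, and samples are independent of each other, which is why this kind of loop parallelizes well.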
This highly parallelizable testing process scales linearly with the number of samples, i.e., cipher inputs, to be evaluated and the number of design variants to be tested. However, the number of design variants might grow exponentially with respect to some parameters. The high cost of CPUs makes them a poor candidate for this kind of parallelization. As our main contribution, we propose a framework, ACE-HoT, to parallelize the testing process across multiple GPUs. Our implementation does not perform any intermediate CPU-GPU data transfers.
The diffusion and avalanche criteria can be seen as an application of discrete first-order derivatives. As a secondary contribution, we generalize these criteria to their high-order versions. Our generalization requires an exponentially larger number of samples in order to compute sufficiently accurate probabilities.
As a case study, we apply ACE-HoT to most of the finalists of the NIST lightweight standardization process, with a special focus on the winner, ASCON.
Investigation of mechanical motion amplification for vibration energy harvesting
Vibration energy harvesting is being investigated for autonomous sensors and actuators, mainly using ambient and machine-induced vibrations. Recently, mechanical motion amplification has been incorporated to improve the power-to-weight ratio of vibration harvesters. The present study investigates the characteristics of mechanical motion amplification in different configurations. The parameters investigated are the motion amplification ratio, force transmissibility characteristics, weight of the electrical generator, effective damping coefficient achieved, and linearity of the damping. Numerical analysis has been performed to compare the important characteristics of a device operating without amplification against devices with amplification in different configurations. The study concludes with comments on choosing a suitable type of amplification mechanism depending on weight and space constraints and the desired effective damping coefficient.
Energy harvesting shock absorber with linear generator and mechanical motion amplification
Energy harvesting shock absorbers can generate about 15-20 W of electric power at normal suspension velocities. However, higher weight, fail-safe characteristics and space limitations have restricted the development of regenerative shock absorbers to research prototypes. The power-to-weight ratio of regenerative shock absorbers can be improved by incorporating motion amplification. This work presents an innovative design of an energy harvesting shock absorber that uses motion amplification to improve harvesting efficiency. Apart from increasing electric power, the proposed solution is fail-safe and can be easily incorporated into existing vehicles with only a marginal change in suspension layout. The study includes a detailed numerical analysis of vibration transmissibility to investigate comfort and safety. Further, a prototype has been fabricated and experiments have been performed to evaluate the electric power generated and the comfort. Simulations have been performed on a real-size model with utilization of the harvested electric power, indicating an overall harvesting efficiency of about 19%.
A Flexible Mechanism Based Vibration Isolator for Machine Tool Application
The paper presents a novel design of a vibration absorber whose innovative features include a flexible-link-based mechanism at the interface of the tool holder and the cutting tool. The mechanism modifies the dynamic force interaction at the damping element and results in lower force transmissibility. It also amplifies the relative velocity at the damping element, which significantly reduces the damping element mass needed for energy dissipation. The presented absorber offers passive and economical operation compared with active and semi-active solutions. Further, the proposed solution reduces force transmissibility by up to 53%. A real-size design is presented for a frequency range of 0-1100 Hz and a maximum force amplitude of 700 N. Numerical simulations have been performed taking into account flexible joint and structural element dynamics. Simulation results obtained with FEA and the PRBM approach are compared, with a detailed analysis of the important design parameters.
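The link between velocity amplification and a smaller damping element can be made concrete with a simple idealization. This is an assumption of this note, not the paper's model: a rigid, massless, lossless mechanism with amplification ratio $n$ drives a linear viscous damper of coefficient $c$. The symbols $n$, $c$, $v$, $v_d$, $F$ and $F_d$ are introduced here for illustration only.

```latex
\begin{align}
v_d &= n\,v
  && \text{(relative velocity at the damper is amplified)} \\
F_d &= c\,v_d = c\,n\,v
  && \text{(force developed by the damper)} \\
F\,v &= F_d\,v_d \;\Rightarrow\; F = n\,F_d = n^{2} c\,v
  && \text{(power balance across a lossless mechanism)} \\
c_{\mathrm{eff}} &= \frac{F}{v} = n^{2} c
  && \text{(effective damping seen at the input)}
\end{align}
```

Under these assumptions, a given effective damping can be reached with a damper coefficient smaller by a factor of $n^{2}$. Flexible joints and structural dynamics, which the paper does take into account, would change this ideal result.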