Motion learning using spatio-temporal neural network
Motion trajectory prediction is one of the key areas in behaviour and surveillance studies. Many related successful applications have been reported in the literature. However, most of the studies
are based on sigmoidal neural networks, in which some dynamic properties of the data are overlooked due to the absence of spatio-temporal encoding functionality. Even though some sequential (motion) learning studies using spatio-temporal neural networks have been proposed, the approach used is, as in the sigmoidal neural networks, mainly supervised learning. Such learning requires a target signal, which is not always available. In this study, motion learning using a spatio-temporal neural network is proposed. The learning is based on reward-modulated spike-timing-dependent plasticity (STDP), whereby the weight adjustment provided by standard STDP is modulated by a reinforcement signal. The implementation of a reinforcement approach for motion trajectory learning can be regarded as the major contribution of this study: learning proceeds on a reward basis without the need for learning targets. The algorithm has shown good potential in learning motion trajectories, particularly in noisy and dynamic settings. Furthermore, the learning uses a generic neural network architecture, which makes it adaptable to many applications.
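The reward-modulated rule described above can be sketched as follows; the constants, the eligibility-trace formulation and the function names are illustrative assumptions, not the study's exact parameters.

```python
import math

# Illustrative sketch of reward-modulated STDP (not the study's exact rule):
# each pre/post spike pairing updates an eligibility trace via the standard
# STDP window, and the actual weight change is the trace gated by a reward.

A_PLUS, A_MINUS = 0.01, 0.012   # STDP amplitudes (assumed values)
TAU = 20.0                      # STDP time constant in ms (assumed)
TRACE_DECAY = 0.9               # eligibility-trace decay per step (assumed)
LEARNING_RATE = 0.5

def stdp_window(dt):
    """Standard STDP kernel: potentiate when pre precedes post (dt > 0)."""
    if dt > 0:
        return A_PLUS * math.exp(-dt / TAU)
    return -A_MINUS * math.exp(dt / TAU)

def reward_modulated_step(weight, trace, pre_post_dt, reward):
    """One step: fold new spike pairings into the eligibility trace, then
    let the reward signal decide how much of it becomes a weight change."""
    trace = TRACE_DECAY * trace + sum(stdp_window(dt) for dt in pre_post_dt)
    weight = weight + LEARNING_RATE * reward * trace
    return weight, trace

w, e = 0.5, 0.0
# pre fires 5 ms before post -> positive trace; a reward of +1 potentiates
w, e = reward_modulated_step(w, e, [5.0], reward=1.0)
```

With `reward = 0` the same spike pairing leaves the weight untouched, which is what removes the need for an explicit learning target.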
Improving Functionalities in a Multi-agent Architecture for Ocean Monitoring
This paper presents an improved version of a multi-agent architecture aimed at providing solutions for monitoring the interaction between the atmosphere and the ocean. The ocean surface and the atmosphere exchange carbon dioxide, a process that can be modeled by a multi-agent system with advanced learning and adaptation capabilities. The proposed architecture incorporates CBR (case-based reasoning) agents, which integrate novel strategies that both monitor the parameters affecting the interaction and facilitate the creation of models. The system was tested, and this paper presents the results obtained.
Adaptive noise reduction and code matching for IRIS pattern recognition system
Among all biometric modalities, the iris is becoming more popular due to its high performance in recognizing and verifying individuals. Iris recognition has been used in numerous settings, such as authentication at prisons, airports, banks and healthcare facilities. Although iris recognition systems achieve high accuracy with a very low false acceptance rate, their performance can still be affected by noise: very low intensity values of eyelash pixels, or high intensity values of eyelid and light-reflection pixels, cause inappropriate threshold values and therefore degrade the accuracy of the system. To reduce the effects of noise and improve the accuracy of an iris recognition system, a robust algorithm consisting of two main components is proposed. First, an Adaptive Fuzzy Switching Noise Reduction (AFSNR) filter is proposed, which reduces the effects of noise of different densities by fuzzy switching between an adaptive median filter and a filling method. Next, an Adaptive Weighted Shifting Hamming Distance (AWSHD) is proposed, which improves the iris code matching stage and the decidability of the system. The proposed AFSNR filter, with its adaptive window size, successfully reduces the effects of different types of noise with different densities. By applying the proposed AWSHD, the distance corresponding to a genuine user is reduced while the distance for impostors is increased, so the genuine user is more likely to be authenticated and an impostor more likely to be rejected. Experimental results show that the proposed algorithm, with a genuine acceptance rate (GAR) of 99.98%, effectively enhances the performance of the iris recognition system.
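A plain shifting Hamming distance, the standard baseline that AWSHD extends with adaptive weights, can be sketched as follows; the weighting scheme itself is not reproduced, and the code length and shift range are illustrative assumptions.

```python
import numpy as np

# Baseline shifting Hamming distance for binary iris codes: the codes are
# compared at several circular shifts to compensate for eye rotation, masked
# (noisy) bits are excluded, and the minimum distance is the match score.
# AWSHD additionally weights bit disagreements adaptively (not shown here).

def shifting_hamming_distance(code_a, code_b, mask_a, mask_b, max_shift=8):
    best = 1.0
    for s in range(-max_shift, max_shift + 1):
        b = np.roll(code_b, s)
        m = mask_a & np.roll(mask_b, s)          # compare only noise-free bits
        valid = m.sum()
        if valid == 0:
            continue
        hd = np.count_nonzero((code_a ^ b) & m) / valid
        best = min(best, hd)
    return best

rng = np.random.default_rng(0)
code = rng.integers(0, 2, 512).astype(np.uint8)
mask = np.ones(512, dtype=np.uint8)
rotated = np.roll(code, 3)                       # same iris, slightly rotated
score = shifting_hamming_distance(code, rotated, mask, mask)
```

A genuine comparison (here the rotated copy) yields a distance near 0, while unrelated codes land near 0.5, which is what makes thresholding decidable.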
Computational models of object motion detectors accelerated using FPGA technology
The detection of moving objects is a trivial task for vertebrate retinas, yet a complex computer vision problem. This PhD research programme has made three key contributions, namely: 1) a multi-hierarchical spiking neural network (MHSNN) architecture for detecting horizontal and vertical movements; 2) a Hybrid Sensitive Motion Detector (HSMD) algorithm for detecting object motion; and 3) the Neuromorphic Hybrid Sensitive Motion Detector (NeuroHSMD), a real-time neuromorphic implementation of the HSMD algorithm.
The MHSNN is a customised four-layer Spiking Neural Network (SNN) architecture designed to reflect the basic connectivity and canonical behaviours found in the majority of vertebrate retinas (including human retinas). The architecture was trained using images from a custom dataset generated in laboratory settings. Simulation results revealed that each cell model is sensitive to vertical and horizontal movements, with a detection error of 6.75% against the teaching signals (expected output signals) used to train the MHSNN. The experimental evaluation showed that the MHSNN was not scalable because of the overall number of neurons and synapses, which led to the development of the HSMD.
The HSMD algorithm enhanced an existing Dynamic Background Subtraction (DBS) algorithm with a customised three-layer SNN, which stabilises the foreground information of moving objects in the scene and thereby improves object motion detection. The algorithm was compared against existing background subtraction approaches available in the Open Computer Vision (OpenCV) library, specifically on the 2012 Change Detection (CDnet2012) and 2014 Change Detection (CDnet2014) benchmark datasets. The accuracy results show that the HSMD ranked first overall and performed better than all the other benchmarked algorithms in four of the categories, across all eight test metrics. Furthermore, the HSMD is the first algorithm to use an SNN to enhance an existing dynamic background subtraction algorithm without substantial degradation of the frame rate, processing 720 × 480 images at 13.82 frames per second (fps) on CDnet2014 and 13.92 fps on CDnet2012 on a high-performance computer (96 cores and 756 GB of RAM). Although the HSMD achieves a good Percentage of Correct Classifications (PCC) on CDnet2012 and CDnet2014, the three-layer customised SNN was identified as the speed bottleneck, which could be removed using dedicated hardware.
The NeuroHSMD is thus an adaptation of the HSMD algorithm in which the SNN component is fully implemented on dedicated hardware [a Terasic DE10-Pro Field-Programmable Gate Array (FPGA) board]. Open Computing Language (OpenCL) was used to simplify the FPGA design flow and to allow code portability to other devices such as FPGAs and Graphics Processing Units (GPUs). The NeuroHSMD was also tested against the CDnet2012 and CDnet2014 datasets, achieving an acceleration of 82% over the HSMD algorithm and processing 720 × 480 images at 28.06 fps (CDnet2012) and 28.71 fps (CDnet2014).
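The dynamic background subtraction stage that the HSMD builds on can be illustrated with a minimal running-average model; this is a generic sketch under assumed parameters, not the thesis's DBS algorithm, and the SNN stage that stabilises the foreground is omitted.

```python
import numpy as np

# Minimal dynamic-background-subtraction sketch: keep a running average of
# the scene as the background model, flag pixels that differ from it by more
# than a threshold as foreground, and slowly absorb scene changes back into
# the model so the background stays "dynamic".

class RunningAverageBackground:
    def __init__(self, alpha=0.05, threshold=25.0):
        self.alpha = alpha          # adaptation rate of the background model
        self.threshold = threshold  # intensity difference marking foreground
        self.background = None

    def apply(self, frame):
        frame = frame.astype(np.float32)
        if self.background is None:
            self.background = frame.copy()
        mask = np.abs(frame - self.background) > self.threshold
        # update the model so slow scene changes become background
        self.background = (1 - self.alpha) * self.background + self.alpha * frame
        return mask

bg = RunningAverageBackground()
static = np.full((480, 720), 100, dtype=np.uint8)   # empty 720x480 scene
bg.apply(static)                                    # first frame initialises
moving = static.copy()
moving[200:240, 300:360] = 200                      # a bright object enters
mask = bg.apply(moving)
```

Only the pixels covered by the object are flagged; everything else stays background, which is the behaviour the SNN stage then stabilises over time.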
Advancing iris biometric technology
The iris biometric is a well-established technology which is already in use in
several nation-scale applications and it is still an active research area with several
unsolved problems. This work focuses on three key problems in iris biometrics,
namely segmentation, protection and cross-matching. A novel
method for each of these problems is proposed and analyzed thoroughly.
In terms of iris segmentation, a novel iris segmentation method is designed
based on a fusion of an expanding and a shrinking active contour by integrating
a new pressure force within the Gradient Vector Flow (GVF) active
contour model. In addition, a new method for closed eye detection is proposed.
The experimental results on the CASIA V4, MMU2, UBIRIS V1 and
UBIRIS V2 databases show that the proposed method achieves state-of-the-art
results in terms of segmentation accuracy and recognition performance
while being computationally more efficient. In this context, improvements
by 60.5%, 42% and 48.7% are achieved in segmentation accuracy for the
CASIA V4, MMU2 and UBIRIS V1 databases, respectively. For the UBIRIS
V2 database, a superior time reduction is reported (85.7%) while maintaining
a similar accuracy. Similarly, considerable time improvements by 63.8%,
56.6% and 29.3% are achieved for the CASIA V4, MMU2 and UBIRIS V1
databases, respectively.
With respect to iris biometric protection, a novel security architecture is designed
to protect the integrity of iris images and templates using watermarking
and Visual Cryptography (VC). Firstly, for protecting the iris image, text
which carries personal information is embedded in the middle band frequency
region of the iris image using a novel watermarking algorithm that randomly
interchanges multiple middle band pairs of the Discrete Cosine Transform
(DCT). Secondly, for iris template protection, VC is utilized to protect the
iris template. In addition, the integrity of the stored template in the biometric
smart card is guaranteed by using hash signatures. The proposed method
reduces iris recognition performance by only 3.6% and
4.9% for the CASIA V4 and UBIRIS V1 databases, respectively. In addition,
the VC scheme is designed to be readily applied to protect any biometric binary
template without any degradation to the recognition performance with a
complexity of only O(N).
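The middle-band DCT interchange idea described above can be sketched on a single coefficient pair; the pair positions, the margin and the single fixed pair are illustrative assumptions, whereas the thesis's algorithm randomly interchanges multiple mid-band pairs.

```python
import numpy as np

# Hedged sketch: embed one bit per 8x8 block by enforcing an order between
# two middle-band DCT coefficients, then invert the transform. A reader of
# the block recovers the bit by comparing the same two coefficients.

N = 8
k = np.arange(N)
D = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
D[0, :] = np.sqrt(1.0 / N)            # orthonormal DCT-II basis matrix

P1, P2 = (4, 1), (3, 2)               # one mid-band coefficient pair (assumed)
MARGIN = 10.0                         # separation so the order is robust

def embed_bit(block, bit):
    coeffs = D @ block.astype(np.float64) @ D.T
    a, b = coeffs[P1], coeffs[P2]
    # bit 1 -> coefficient at P1 larger; bit 0 -> coefficient at P2 larger
    hi, lo = (P1, P2) if bit else (P2, P1)
    if coeffs[hi] < coeffs[lo] + MARGIN:
        coeffs[hi], coeffs[lo] = max(a, b) + MARGIN, min(a, b)
    return D.T @ coeffs @ D           # back to the pixel domain

def extract_bit(block):
    coeffs = D @ block.astype(np.float64) @ D.T
    return int(coeffs[P1] > coeffs[P2])

block = np.random.default_rng(1).integers(0, 256, (8, 8))
watermarked = embed_bit(block, 1)
```

Swapping coefficient magnitudes rather than adding a signal is what keeps the perceptual impact on the iris image small.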
As for cross-spectral matching, a framework is designed which is capable of
matching iris images in different lighting conditions. The first method is designed
to work with registered iris images where the key idea is to synthesize
the corresponding Near Infra-Red (NIR) images from the Visible Light (VL)
images using an Artificial Neural Network (ANN) while the second method
is capable of working with unregistered iris images based on integrating the
Gabor filter with different photometric normalization models and descriptors
along with decision level fusion to achieve the cross-spectral matching. A
significant improvement of 79.3% in cross-spectral matching performance is
attained for the UTIRIS database. For the PolyU database, the proposed
verification method achieved an improvement of 83.9% in NIR versus
Red channel matching, which confirms the efficiency of the proposed method.
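The decision-level fusion step mentioned above can be sketched as a simple vote across matchers; the scores, thresholds and majority rule are illustrative stand-ins for the actual Gabor-based descriptors and fusion rule.

```python
# Minimal decision-level fusion sketch: several matchers (e.g. Gabor features
# under different photometric normalisations) each cast an accept/reject vote
# on a probe, and the majority decides the final verification outcome.

def fuse_decisions(scores, thresholds):
    # each matcher accepts when its distance score falls below its threshold
    votes = [s < t for s, t in zip(scores, thresholds)]
    return sum(votes) > len(votes) / 2

# three matchers with illustrative distance scores and per-matcher thresholds
accept = fuse_decisions([0.28, 0.41, 0.33], [0.35, 0.38, 0.35])
```

Two of the three matchers accept here, so the fused decision is an accept even though one descriptor disagrees; that tolerance is the point of fusing at the decision level.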
In summary, the most important open issues in exploiting the iris biometric
are presented and novel methods to address these problems are proposed.
Hence, this work will help to establish a more robust iris recognition system
due to the development of an accurate segmentation method working for iris
images taken under both VL and NIR illumination. In addition, the proposed protection
scheme paves the way for secure storage of iris images and templates.
Moreover, the proposed framework for cross-spectral matching will help to
employ the iris biometric in several security applications such as surveillance
at-a-distance and automated watch-list identification.
Ministry of Higher Education and Scientific Research in Iraq
Improved Human Face Recognition by Introducing a New CNN Arrangement and Hierarchical Method
Human face recognition has become one of the most attractive topics in the field of biometrics due to its wide range of applications. The face is the part of the body that carries the most identifying information in human interactions. Features such as the composition of facial components, skin tone, the face's central axis, the distance between the eyes, and many more are used unconsciously by the brain, alongside other biometrics, to distinguish a person. Indeed, analyzing facial features may be the first method humans use to identify a person.
As one of the main biometric measures, human face recognition has been utilized in various commercial applications over the past two decades, from banking to smart advertisement and from border security to mobile applications. These examples show how far the methods have come. We can confidently say that face recognition techniques have reached an acceptable level of accuracy for some real-life applications, while other applications could still benefit from improvement. The increasing demand for the topic, together with the fact that almost all the necessary infrastructure is now available, makes face recognition an appealing research area.
When we are evaluating the quality of a face recognition method, there are some benchmarks that we should consider: accuracy, speed, and complexity are the main parameters. Of course, we can measure other aspects of the algorithm, such as size, precision, cost, etc. But eventually, every one of those parameters will contribute to improving one or some of these three concepts of the method. Then again, although we can see a significant level of accuracy in existing algorithms, there is still much room for improvement in speed and complexity. In addition, the accuracy of the mentioned methods highly depends on the properties of the face images. In other words, uncontrolled situations and variables like head pose, occlusion, lighting, image noise, etc., can affect the results dramatically.
Human face recognition systems are used for either identification or verification. In verification, the system's main goal is to check whether an input belongs to a pre-determined tag or a person's ID.
Almost every face recognition system consists of four major steps. These steps are pre-processing, face detection, feature extraction, and classification. Improvement in each of these steps will lead to the overall enhancement of the system. In this work, the main objective is to propose new, improved and enhanced methods in each of those mentioned steps, evaluate the results by comparing them with other existing techniques and investigate the outcome of the proposed system.
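The four-stage pipeline described above can be sketched end to end; every component below (the centre-crop "detector", histogram features, nearest-neighbour classifier and the synthetic identities) is a toy placeholder, not any of the proposed methods.

```python
import numpy as np

# Toy end-to-end sketch of the four stages: pre-processing, face detection,
# feature extraction and classification. Improving any single stage improves
# the pipeline as a whole, which is the structure the work exploits.

def preprocess(img):
    img = img.astype(np.float64)
    return (img - img.mean()) / (img.std() + 1e-8)   # illumination normalisation

def detect_face(img, size=32):
    h, w = img.shape                                 # stub detector: assume the
    top, left = (h - size) // 2, (w - size) // 2     # face is centred in frame
    return img[top:top + size, left:left + size]

def extract_features(face, bins=16):
    hist, _ = np.histogram(face, bins=bins, range=(-3, 3))
    return hist / hist.sum()

def classify(feat, gallery):
    # nearest neighbour over stored (label, feature) pairs
    return min(gallery, key=lambda lf: np.linalg.norm(feat - lf[1]))[0]

rng = np.random.default_rng(2)
imgs = {"alice": rng.normal(0.0, 1.0, (64, 64)),     # two synthetic "faces"
        "bob": rng.uniform(-1.0, 1.0, (64, 64))}     # with distinct statistics
gallery = [(n, extract_features(detect_face(preprocess(im))))
           for n, im in imgs.items()]
probe = imgs["alice"] + rng.normal(0.0, 0.1, (64, 64))   # noisy re-capture
label = classify(extract_features(detect_face(preprocess(probe))), gallery)
```

Despite the added noise, the probe is assigned to the correct identity, illustrating how the stages compose into a recognition decision.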
Applications of the perceptual sparse spikegram representation to copyright protection of audio signals
Every year, global music piracy causes billions of dollars in economic, job and
workers' earnings losses, as well as millions of dollars in lost tax revenues. Most
music piracy is due to the rapid growth and ease of current technologies for copying,
sharing, manipulating and distributing musical data [Domingo, 2015], [Siwek, 2007].
Audio watermarking has been
proposed as one approach for copyright protection and tamper localization of audio signals
to prevent music piracy. In this thesis, we use the spikegram, a bio-inspired sparse
representation, to design an audio tamper localization method, an audio copyright
protection method, and a new perceptual attack against audio watermarking systems.
First, we propose a tamper localization method for audio signals, based on a Modified
Spread Spectrum (MSS) approach. Perceptual Matching Pursuit (PMP) is used to compute
the spikegram (which is a sparse and time-shift invariant representation of audio signals) as
well as 2-D masking thresholds. Then, an authentication code (which includes an Identity
Number, ID) is inserted inside the sparse coefficients. For high quality watermarking, the
watermark data are multiplied with masking thresholds. The time domain watermarked
signal is re-synthesized from the modified coefficients and the signal is sent to the decoder.
To localize a tampered segment of the audio signal, at the decoder, the IDs associated
with intact segments are detected correctly, while the ID associated with a tampered
segment is mis-detected or not detected at all. To achieve high capacity, we propose a modified version of
the improved spread spectrum watermarking called MSS (Modified Spread Spectrum). We
performed a mean opinion test to measure the quality of the proposed watermarking system.
Also, the bit error rates for the presented tamper localization method are computed under
several attacks. In comparison to conventional methods, the proposed tamper localization
method has the smallest number of mis-detected tampered frames, when only one frame
is tampered. In addition, the mean opinion test experiments confirm that the proposed
method preserves the high quality of the input audio signals.
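The underlying spread-spectrum embedding and correlation detection can be sketched as follows; the host vector, strength `ALPHA`, key and bit count are illustrative assumptions, and the spikegram domain and masking thresholds are omitted, so this is the textbook scheme the MSS variant modifies, not the thesis method itself.

```python
import numpy as np

# Bare-bones spread spectrum: each bit is spread over the whole host with a
# keyed pseudo-random +/-1 chip sequence; the decoder regenerates the chips
# from the same key and recovers each bit from the sign of the correlation.

ALPHA = 0.2                                 # embedding strength (assumed)

def embed(host, bits, key=42):
    rng = np.random.default_rng(key)
    chips = rng.choice([-1.0, 1.0], size=(len(bits), host.size))
    watermark = sum((2 * b - 1) * c for b, c in zip(bits, chips))
    return host + ALPHA * watermark

def detect(signal, n_bits, key=42):
    rng = np.random.default_rng(key)
    chips = rng.choice([-1.0, 1.0], size=(n_bits, signal.size))
    # correlate with each chip sequence; the sign recovers the bit
    return [int(signal @ c > 0) for c in chips]

host = np.random.default_rng(0).normal(0.0, 1.0, 8192)   # stand-in coefficients
bits = [1, 0, 1, 1, 0, 0, 1, 0]                          # authentication code
watermarked = embed(host, bits)
recovered = detect(watermarked, len(bits))
```

In the tamper-localization setting, a segment whose correlations no longer yield the expected authentication code is flagged as falsified.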
Moreover, we introduce a new audio watermarking technique based on a kernel-based
representation of audio signals. A perceptive sparse representation (spikegram) is combined
with a dictionary of gammatone kernels to construct a robust representation of sounds.
Compared to traditional phase-embedding methods, where the phase of the signal's Fourier
coefficients is modified, in this method the watermark bit stream is inserted by modifying
the phase of gammatone kernels. Moreover, the watermark is automatically embedded only
into kernels with high amplitudes where all masked (non-meaningful) gammatones have
been already removed. Two embedding methods are proposed, one based on the watermark
embedding into the sign of gammatones (one dictionary method) and another one based
on watermark embedding into both sign and phase of gammatone kernels (two-dictionary
method). The robustness of the proposed method is shown against 32 kbps MP3 with
an embedding rate of 56.5 bps while the state of the art payload for 32 kbps MP3 robust
watermarking is lower than 50.3 bps. We also show that the proposed method is robust
against the unified speech and audio codec (24 kbps USAC, linear-predictive and Fourier
domain modes) with an average payload of 5–15 bps. Moreover, it is shown that the
proposed method is robust against a variety of signal processing transforms while preserving
quality.
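The one-dictionary sign-embedding idea can be sketched on generic sparse coefficients; selecting the highest-amplitude coefficients stands in for keeping only unmasked gammatone kernels, and the two-dictionary phase variant and the real gammatone dictionary are not shown.

```python
import numpy as np

# Sketch of the one-dictionary method: hide bits in the signs of the
# highest-amplitude sparse coefficients (the perceptually meaningful,
# unmasked kernels), leaving all magnitudes untouched.

def embed_in_signs(coeffs, bits):
    out = coeffs.copy()
    idx = np.argsort(np.abs(coeffs))[::-1][:len(bits)]   # strongest kernels
    for i, b in zip(idx, bits):
        out[i] = abs(out[i]) if b else -abs(out[i])      # sign carries the bit
    return out

def extract_from_signs(coeffs, n_bits):
    idx = np.argsort(np.abs(coeffs))[::-1][:n_bits]
    return [int(coeffs[i] > 0) for i in idx]

coeffs = np.random.default_rng(3).normal(0.0, 1.0, 256)  # stand-in spikegram
bits = [1, 1, 0, 1, 0]
marked = embed_in_signs(coeffs, bits)
```

Because only signs change, the coefficient magnitudes, and hence the ranking used to locate the watermark, are preserved, which is what makes blind extraction possible.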
Finally, three perceptual attacks are proposed in the perceptual sparse domain using the
spikegram: the PMP, inaudible-noise-adding and sparse-replacement attacks. In the PMP
attack, the host signals are represented and re-synthesized with the spikegram. In the
inaudible noise attack, inaudible noise is generated and added to the spikegram
coefficients. In the sparse replacement attack, each frame of the spikegram
representation is, when possible, replaced with a combination of similar frames located
in other parts of the spikegram. It is shown that the PMP and inaudible noise attacks
have roughly the same efficiency as the 32 kbps MP3 attack, while the replacement attack
reduces the normalized correlation of the spread spectrum decoder by a greater factor
than attacking with 32 kbps MP3 or 24 kbps Unified Speech and Audio Coding (USAC).