Search CORE

1,467,904 research outputs found

Crowd Analysis using visual and non-visual sensors, a survey

Author: Irfan M
Marcenaro L
Tokarchuk LN
Publication venue: Symposium on Signal Processing for Understanding Crowd Dynamics, IEEE Global Conference on Signal and Information Processing
Publication date: 12/10/2016

Learning Frame Similarity using Siamese networks for Audio-to-Score Alignment

Author: Agrawal R
Dixon S
Publication venue: 2020 28th European Signal Processing Conference
Publication date: 29/05/2020

Audio-to-score alignment aims at generating an accurate mapping between the audio of a performance and the score of a given piece. Standard alignment methods are based on Dynamic Time Warping (DTW) and employ handcrafted features, which cannot be adapted to different acoustic conditions. We propose a method to overcome this limitation using learned frame similarity for audio-to-score alignment. We focus on offline audio-to-score alignment of piano music. Experiments on music data from different acoustic conditions demonstrate that our method achieves higher alignment accuracy than a standard DTW-based method that uses handcrafted features, and generates robust alignments whilst being adaptable to different domains at the same time.

Queen Mary Research Online
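
As a rough illustration of the DTW baseline that the audio-to-score alignment abstract above refers to, the following is a minimal sketch of classic dynamic time warping over two feature sequences. It is not the authors' implementation: the function name `dtw_cost`, the scalar per-frame features, and the absolute-difference frame distance are assumptions made for illustration. A learned frame similarity, as described in the abstract, would take the place of the `dist` argument.

```python
def dtw_cost(a, b, dist=lambda x, y: abs(x - y)):
    """Return the minimal cumulative alignment cost between sequences a and b.

    Assumption: frames are scalars and `dist` is a simple handcrafted
    distance; both are placeholders, not taken from the paper.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the best cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            # Allowed steps: advance in a, advance in b, or advance in both.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]
```

Calling `dtw_cost([1, 2, 3], [1, 2, 2, 3])` gives a cost of zero, because the repeated frame in the second sequence is absorbed by the warping path.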

A Large Imaging Database and Novel Deep Neural Architecture for Covid-19 Diagnosis

Author: Kollias D
Publication venue: IEEE 14th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP), Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 16/06/2022

Deep learning methodologies are now the main approach for medical image analysis and disease prediction. Large annotated databases are necessary for developing these methodologies; such databases are difficult to obtain and to make publicly available for use by researchers and medical experts. In this paper, we focus on diagnosis of COVID-19 based on chest 3-D CT scans and develop a dual knowledge framework, including a large imaging database and a novel deep neural architecture. We introduce COV19-CT-DB, a very large database annotated for COVID-19 that consists of 7,750 3-D CT scans, 1,650 of which refer to COVID-19 cases and 6,100 to non-COVID-19 cases. We use this database to train and develop the RACNet architecture. This architecture performs 3-D analysis based on a CNN-RNN network and handles input CT scans of different lengths, through the introduction of dynamic routing, feature alignment and a mask layer. We conduct a large experimental study that illustrates that the RACNet network has the best performance compared to other deep neural networks i) when trained and tested on COV19-CT-DB; ii) when tested, or when applied, through transfer learning, to other public databases.

Queen Mary Research Online

Improved quality of experience of reconstructed H.264/AVC encoded video sequences through robust pixel domain error detection

Author: Debono Carl James
Farrugia Reuben A.
Publication venue: IEEE 10th Workshop on Multimedia Signal Processing, Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2008

The transmission of H.264/AVC encoded sequences over noisy wireless channels generally adopts the error detection capabilities of the transport protocol to identify and discard corrupted slices. All the macroblocks (MBs) within each corrupted slice are then concealed. This paper presents an algorithm that does not discard the corrupted slices but tries to detect those MBs that produce major visual artefacts and then conceals only these MBs. Results show that the proposed solution, based on a set of image-level features and two Support Vector Machines (SVMs), manages to detect 94.6% of those artefacts. Gains in Peak Signal-to-Noise Ratio (PSNR) of up to 5.74 dB have been obtained when compared to the standard H.264/AVC decoder.

OAR@UM
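
The PSNR gain quoted in the abstract above is a difference between two PSNR values, each computed against the original frame. The sketch below shows the standard PSNR definition, assuming 8-bit samples (peak value 255) and frames given as flat lists of pixel values. The function name `psnr` and the input layout are assumptions for illustration, not details from the paper.

```python
import math

def psnr(reference, test, peak=255.0):
    """PSNR in dB between two equally sized flat lists of pixel values.

    Assumption: 8-bit samples, so the default peak value is 255.
    """
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the reference and the test frame.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)
```

A gain such as the one reported would then be `psnr(original, proposed) - psnr(original, baseline)`, where the three frames are hypothetical inputs.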

Improving motion vector prediction using linear regression

Author: Farrugia Reuben A.
Publication venue: 5th International Symposium on Communications Control and Signal Processing (ISCCSP), Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2012

The motion vectors take up a large portion of the H.264/AVC encoded bitstream. This video coding standard employs predictive coding to minimize the amount of motion vector information to be transmitted. However, the motion vectors still account for around 40% of the transmitted bitstream, which suggests that further research in this area is warranted. This paper presents an algorithm which employs a feature selection process to select the neighboring motion vectors which are most suitable to predict the motion vector mv being encoded. The selected motion vectors are then used to approximate mv using Linear Regression. Simulation results have indicated a reduction in Mean Squared Error (MSE) of around 22%, which results in reducing the residual error of the predictive coded motion vectors. This suggests that higher compression efficiencies can be achieved using the proposed Linear Regression based motion vector predictor.

OAR@UM
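
To make the regression step in the abstract above concrete, here is a minimal sketch of ordinary least squares that predicts one motion vector component from the same component of already selected neighboring motion vectors. It is an assumption-laden illustration, not the paper's method: the feature selection stage is omitted, there is no intercept term, and the names `fit_least_squares` and `predict` are hypothetical.

```python
def fit_least_squares(X, y):
    """Solve the normal equations (X^T X) w = X^T y by Gauss-Jordan elimination.

    X: list of rows, each row holding neighboring motion vector components.
    y: list of the corresponding true motion vector components.
    """
    k = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * t for row, t in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

def predict(w, x):
    """Predicted motion vector component for neighbor components x."""
    return sum(wi * xi for wi, xi in zip(w, x))
```

In a codec, only the residual between the true component and `predict(w, x)` would need to be coded, which is why a lower prediction MSE can translate into fewer bits.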

Hypernetworks for sound event detection: a proof-of-concept

Author: Benetos E
Phan QH
Singh S
Publication venue: 30th European Signal Processing Conference (EUSIPCO 2022), EURASIP
Publication date: 29/08/2022

Polyphonic sound event detection (SED) involves the prediction of sound events present in an audio recording along with their onset and offset times. Recently, Deep Neural Networks, specifically convolutional recurrent neural networks (CRNN), have achieved impressive results for this task. The convolutional part of the architecture is used to extract translation-invariant features from the input, and the recurrent part learns the underlying temporal relationship between audio frames. Recent studies showed that the weight sharing paradigm of recurrent networks might be a hindering factor in certain kinds of time series data, specifically where there is a temporal conditional shift, i.e. the conditional distribution of a label changes across the temporal scale. This raises a relevant question: is there a similar phenomenon in polyphonic sound events due to dynamic polyphony level across the temporal axis? In this work, we explore this question and inquire whether relaxed weight sharing improves the performance of a CRNN for polyphonic SED. We propose to use hypernetworks to relax weight sharing in the recurrent part and show that the CRNN’s performance is improved by ~3% across two datasets, thus paving the way for further exploration of the existence of temporal conditional shift for polyphonic SED.

Queen Mary Research Online

Eigen-patch iris super-resolution for iris recognition improvement

Author: Alonso-Fernandez Fernando
Bigun Josef
Farrugia Reuben A.
Publication venue: 23rd European Signal Processing Conference (EUSIPCO), Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2015

Low image resolution will be a predominant factor in iris recognition systems as they evolve towards more relaxed acquisition conditions. Here, we propose a super-resolution technique to enhance iris images based on Principal Component Analysis (PCA) Eigen-transformation of local image patches. Each patch is reconstructed separately, allowing better quality of enhanced images by preserving local information and reducing artifacts. We validate the system using a database of 1,872 near-infrared iris images. Results show the superiority of the presented approach over bilinear or bicubic interpolation, with the eigen-patch method being more resilient to image resolution reduction. We also perform recognition experiments with an iris matcher based on 1D Log-Gabor, demonstrating that verification rates degrade more rapidly with bilinear or bicubic interpolation.

OAR@UM

EdgeFool: An Adversarial Image Enhancement Filter

Author: Cavallaro A
Oh C
Shahin A
Publication venue: International Conference on Acoustics, Speech, and Signal Processing
Publication date: 01/01/2020

Adversarial examples are intentionally perturbed images that mislead classifiers. These images can, however, be easily detected using denoising algorithms when high-frequency spatial perturbations are used, or can be noticed by humans when perturbations are large. In this paper, we propose EdgeFool, an adversarial image enhancement filter that learns structure-aware adversarial perturbations. EdgeFool generates adversarial images with perturbations that enhance image details by training a fully convolutional neural network end-to-end with a multi-task loss function. This loss function accounts for both image detail enhancement and class misleading objectives. We evaluate EdgeFool on three classifiers (ResNet-50, ResNet-18 and AlexNet) using two datasets (ImageNet and Private-Places365) and compare it with six adversarial methods (DeepFool, SparseFool, Carlini-Wagner, SemanticAdv, Non-targeted and Private Fast Gradient Sign Methods).

Queen Mary Research Online

Television signal processing system Patent

Author: Wong R. Y.
Publication date: 17/11/1970

Video signal processing system for sampling video brightness level

NASA Technical Reports Server