
    Learning Frame Similarity using Siamese networks for Audio-to-Score Alignment

    Audio-to-score alignment aims at generating an accurate mapping between a performance audio recording and the score of a given piece. Standard alignment methods are based on Dynamic Time Warping (DTW) and employ handcrafted features, which cannot be adapted to different acoustic conditions. We propose a method to overcome this limitation using learned frame similarity for audio-to-score alignment. We focus on offline audio-to-score alignment of piano music. Experiments on music data from different acoustic conditions demonstrate that our method achieves higher alignment accuracy than a standard DTW-based method that uses handcrafted features, and generates robust alignments whilst being adaptable to different domains at the same time.
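    The DTW backbone the abstract compares against can be sketched in a few lines. This is generic textbook DTW over a precomputed frame-wise cost matrix (here absolute differences between toy 1-D features), not the paper's learned-similarity pipeline:

```python
import numpy as np

def dtw_path(cost):
    """Accumulate a DTW cost matrix and backtrack the optimal alignment path."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j],
                                                 acc[i, j - 1],
                                                 acc[i - 1, j - 1])
    # Backtrack from the end to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return acc[n, m], path[::-1]

# Toy example: identical frame sequences should align along the diagonal
# with zero total cost.
a = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([0.0, 1.0, 2.0, 3.0])
cost = np.abs(a[:, None] - b[None, :])
total, path = dtw_path(cost)
```

    In the learned-similarity setting, the `cost` matrix would come from a Siamese network scoring audio/score frame pairs instead of a handcrafted distance.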

    A Large Imaging Database and Novel Deep Neural Architecture for Covid-19 Diagnosis

    Deep learning methodologies constitute nowadays the main approach for medical image analysis and disease prediction. Large annotated databases are necessary for developing these methodologies; such databases are difficult to obtain and to make publicly available for use by researchers and medical experts. In this paper, we focus on diagnosis of Covid-19 based on chest 3-D CT scans and develop a dual knowledge framework, including a large imaging database and a novel deep neural architecture. We introduce COV19-CT-DB, a very large database annotated for COVID-19 that consists of 7,750 3-D CT scans, 1,650 of which refer to COVID-19 cases and 6,100 to non-COVID-19 cases. We use this database to train and develop the RACNet architecture. This architecture performs 3-D analysis based on a CNN-RNN network and handles input CT scans of different lengths, through the introduction of dynamic routing, feature alignment and a mask layer. We conduct a large experimental study that illustrates that the RACNet network has the best performance compared to other deep neural networks i) when trained and tested on COV19-CT-DB; ii) when tested on, or applied through transfer learning to, other public databases.
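    The general idea behind handling scans with different slice counts via a mask layer can be illustrated with a minimal sketch. RACNet's dynamic routing and feature alignment are paper-specific and not reproduced here; this only shows masked pooling over padded variable-length inputs, with toy random features standing in for CNN slice embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
lengths = [3, 5, 2]                 # slices per CT scan (variable)
max_len, dim = max(lengths), 4

# Pad every scan's per-slice feature sequence to a common length.
feats = np.zeros((len(lengths), max_len, dim))
for i, n in enumerate(lengths):
    feats[i, :n] = rng.normal(size=(n, dim))

# Mask layer: 1 for real slices, 0 for padding.
mask = np.zeros((len(lengths), max_len))
for i, n in enumerate(lengths):
    mask[i, :n] = 1.0

# Masked average pooling: padded positions contribute nothing to the
# per-scan feature vector.
pooled = (feats * mask[..., None]).sum(axis=1) / mask.sum(axis=1, keepdims=True)
```

    The same mask would typically also gate the recurrent part of a CNN-RNN so that padding steps do not update the hidden state.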

    Improved quality of experience of reconstructed H.264/AVC encoded video sequences through robust pixel domain error detection

    The transmission of H.264/AVC encoded sequences over noisy wireless channels generally relies on the error detection capabilities of the transport protocol to identify and discard corrupted slices. All the macroblocks (MBs) within each corrupted slice are then concealed. This paper presents an algorithm that does not discard the corrupted slices but instead detects those MBs which produce major visual artefacts and conceals only these MBs. Results show that the proposed solution, based on a set of image-level features and two Support Vector Machines (SVMs), manages to detect 94.6% of those artefacts. Gains in Peak Signal-to-Noise Ratio (PSNR) of up to 5.74 dB have been obtained when compared to the standard H.264/AVC decoder.
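    The SVM-based detection step can be sketched as follows. The two-dimensional toy features below (clean MBs clustering near the origin, corrupted ones away from it) are purely illustrative stand-ins for the paper's image-level feature set, and a simple linear SVM trained by sub-gradient descent on the hinge loss stands in for the two SVMs used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in features: clean macroblocks cluster near the origin,
# visually corrupted ones away from it.
clean = rng.normal(0.0, 0.3, size=(50, 2))
corrupt = rng.normal(2.0, 0.3, size=(50, 2))
X = np.vstack([clean, corrupt])
y = np.array([-1.0] * 50 + [1.0] * 50)

# Linear SVM trained by sub-gradient descent on the regularised hinge loss.
w, b, lam, lr = np.zeros(2), 0.0, 0.01, 0.1
for _ in range(500):
    margins = y * (X @ w + b)
    active = margins < 1.0                 # samples violating the margin
    grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / len(X)
    grad_b = -y[active].sum() / len(X)
    w -= lr * grad_w
    b -= lr * grad_b

def predict(features):
    """Classify an MB feature vector: +1 = visible artefact, -1 = clean."""
    return 1 if features @ w + b > 0 else -1
```

    Only MBs flagged as +1 would then be passed to the concealment stage, leaving visually intact MBs of a corrupted slice untouched.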

    Improving motion vector prediction using linear regression

    Motion vectors take up a large portion of the H.264/AVC encoded bitstream. This video coding standard employs predictive coding to minimize the amount of motion vector information to be transmitted. However, motion vectors still account for around 40% of the transmitted bitstream, which motivates further research in this area. This paper presents an algorithm which employs a feature selection process to select the neighboring motion vectors most suitable for predicting the motion vector mv being encoded. The selected motion vectors are then used to approximate mv using Linear Regression. Simulation results indicate a reduction in Mean Squared Error (MSE) of around 22%, which reduces the residual error of the predictively coded motion vectors. This suggests that higher compression efficiencies can be achieved using the proposed Linear Regression based motion vector predictor.
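    The regression step can be sketched with ordinary least squares. The data below is hypothetical: each row holds one component of three neighbouring blocks' motion vectors (the neighbours a feature selection stage might pick), and in this toy set the true predictor is simply their average, which the fit recovers:

```python
import numpy as np

# Hypothetical training rows: motion-vector components of three selected
# neighbouring blocks; the target is the current block's component.
X = np.array([[2.0, 4.0, 6.0],
              [1.0, 3.0, 8.0],
              [0.0, 2.0, 4.0],
              [3.0, 9.0, 7.0]])
y = X.mean(axis=1)                       # toy ground truth: the average

# Ordinary least-squares fit of the predictor weights.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predict the motion vector component for a new block from its neighbours.
predicted = np.array([4.0, 6.0, 8.0]) @ w
```

    In the codec, only the residual mv - predicted would be entropy-coded, so a better predictor directly shrinks the transmitted motion information.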

    Hypernetworks for sound event detection: a proof-of-concept

    Polyphonic sound event detection (SED) involves the prediction of sound events present in an audio recording along with their onset and offset times. Recently, Deep Neural Networks, specifically convolutional recurrent neural networks (CRNN), have achieved impressive results for this task. The convolutional part of the architecture is used to extract translation-invariant features from the input and the recurrent part learns the underlying temporal relationship between audio frames. Recent studies showed that the weight sharing paradigm of recurrent networks might be a hindering factor in certain kinds of time series data, specifically where there is a temporal conditional shift, i.e. the conditional distribution of a label changes across the temporal scale. This warrants a relevant question: is there a similar phenomenon in polyphonic sound events due to dynamic polyphony level across the temporal axis? In this work, we explore this question and inquire if relaxed weight sharing improves performance of a CRNN for polyphonic SED. We propose to use hypernetworks to relax weight sharing in the recurrent part and show that the CRNN's performance is improved by ~3% across two datasets, thus paving the way for further exploration of the existence of temporal conditional shift for polyphonic SED.
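    The relaxed-weight-sharing idea can be illustrated with a minimal sketch. This is not the paper's architecture: a plain linear hypernetwork maps a per-step embedding to that step's recurrent weights, so the recurrence no longer reuses one shared matrix at every frame:

```python
import numpy as np

rng = np.random.default_rng(1)
dim, steps = 4, 5

# Conventional RNN: a single recurrent weight matrix shared by every step.
W_shared = rng.normal(size=(dim, dim))

# Hypernetwork sketch: a linear map from a per-step embedding z_t to that
# step's recurrent weights, relaxing weight sharing across time.
H = rng.normal(size=(dim * dim, 3)) * 0.1   # hypernetwork projection
z = rng.normal(size=(steps, 3))             # one embedding per time step

def step_weights(z_t):
    """Generate the recurrent weight matrix for one time step."""
    return (H @ z_t).reshape(dim, dim)

# Run the recurrence with per-step generated weights.
x = rng.normal(size=(steps, dim))
h = np.zeros(dim)
for t in range(steps):
    h = np.tanh(step_weights(z[t]) @ h + x[t])
```

    Because the hypernetwork is small and shared, the parameter count stays close to a standard RNN while the effective weights can track temporal conditional shift.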

    Eigen-patch iris super-resolution for iris recognition improvement

    Low image resolution will be a predominant factor in iris recognition systems as they evolve towards more relaxed acquisition conditions. Here, we propose a super-resolution technique to enhance iris images based on Principal Component Analysis (PCA) Eigen-transformation of local image patches. Each patch is reconstructed separately, allowing better quality of enhanced images by preserving local information and reducing artifacts. We validate the system using a database of 1,872 near-infrared iris images. Results show the superiority of the presented approach over bilinear or bicubic interpolation, with the eigen-patch method being more resilient to image resolution reduction. We also perform recognition experiments with an iris matcher based on 1D Log-Gabor filters, demonstrating that verification rates degrade more rapidly with bilinear or bicubic interpolation.
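    The core eigen-patch operation can be sketched as follows. The synthetic patch data below merely mimics the low-dimensional structure of natural image patches; the sketch shows only the PCA projection/reconstruction step, not the paper's full super-resolution pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training set: 100 flattened 4x4 patches lying near a
# 3-dimensional subspace (as natural image patches tend to).
basis = rng.normal(size=(3, 16))
patches = rng.normal(size=(100, 3)) @ basis + 0.01 * rng.normal(size=(100, 16))

# PCA eigen-transformation: centre the patches, keep the top components.
mean = patches.mean(axis=0)
_, _, Vt = np.linalg.svd(patches - mean, full_matrices=False)
components = Vt[:3]                      # leading "eigen-patches"

def reconstruct(patch):
    """Project a patch onto the eigen-patch basis and map it back."""
    coeffs = components @ (patch - mean)
    return mean + coeffs @ components

test_patch = rng.normal(size=3) @ basis
recon = reconstruct(test_patch)
err = np.linalg.norm(recon - test_patch) / np.linalg.norm(test_patch)
```

    Reconstructing each patch separately, as the abstract notes, keeps the enhancement local and limits artifacts compared with whole-image transforms.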

    EdgeFool: An Adversarial Image Enhancement Filter

    Adversarial examples are intentionally perturbed images that mislead classifiers. These images can, however, be easily detected using denoising algorithms, when high-frequency spatial perturbations are used, or can be noticed by humans, when perturbations are large. In this paper, we propose EdgeFool, an adversarial image enhancement filter that learns structure-aware adversarial perturbations. EdgeFool generates adversarial images with perturbations that enhance image details by training a fully convolutional neural network end-to-end with a multi-task loss function. This loss function accounts for both the image detail enhancement and class misleading objectives. We evaluate EdgeFool on three classifiers (ResNet-50, ResNet-18 and AlexNet) using two datasets (ImageNet and Private-Places365) and compare it with six adversarial methods (DeepFool, SparseFool, Carlini-Wagner, SemanticAdv, Non-targeted and Private Fast Gradient Sign Methods).
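    For contrast with EdgeFool's structure-aware approach, the Fast Gradient Sign Method listed among the baselines can be sketched on a toy linear classifier, where the loss gradient is available analytically. Everything below (the 2-D classifier, the margin-style loss) is a hypothetical stand-in, not EdgeFool itself:

```python
import numpy as np

# Toy linear "classifier": scores = W @ x.
W = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
x = np.array([0.2, -0.1])          # initially classified as class 0
true_label, eps = 0, 0.5

# Gradient of (score_other - score_true) w.r.t. x points towards the
# wrong class; FGSM takes one fixed-size step along its sign.
grad = W[1] - W[true_label]
x_adv = x + eps * np.sign(grad)

orig_pred = int(np.argmax(W @ x))
adv_pred = int(np.argmax(W @ x_adv))
```

    Such sign-based perturbations are exactly the high-frequency, unstructured changes that denoisers and human viewers can catch, which motivates learning perturbations that instead masquerade as detail enhancement.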

    Television signal processing system Patent

    Video signal processing system for sampling video brightness level