
    Machine learning for flow field measurements: a perspective

    Advancements in machine-learning (ML) techniques are driving a paradigm shift in image processing, and flow diagnostics with optical techniques is no exception. Considering the existing and foreseeable disruptive developments in flow field measurement techniques, we elaborate this perspective with a particular focus on particle image velocimetry. The driving forces behind recent advancements in ML methods for flow field measurements are reviewed in terms of image preprocessing, data treatment and conditioning. Finally, possible routes for further developments are highlighted. Stefano Discetti acknowledges funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 949085). Yingzheng Liu acknowledges financial support from the National Natural Science Foundation of China (11725209).
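    For context on the measurement technique named above: the classical particle image velocimetry step that ML methods aim to augment or replace is a cross-correlation of interrogation windows between consecutive frames. Below is a minimal Python sketch of that step; the function name and window handling are illustrative assumptions, not taken from the paper.

    import numpy as np

    def piv_window_displacement(window_a, window_b):
        """Estimate the particle-pattern shift between two interrogation
        windows via FFT-based cross-correlation (classical PIV step)."""
        a = window_a - window_a.mean()
        b = window_b - window_b.mean()
        corr = np.real(np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(b))))
        corr = np.fft.fftshift(corr)
        peak = np.unravel_index(np.argmax(corr), corr.shape)
        center = np.array(corr.shape) // 2
        # Offset of the correlation peak from the window centre, in pixels;
        # the sign convention depends on the correlation definition used.
        return np.array(peak) - center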

    Raw Depth Image Enhancement Using a Neural Network

    The term image is often used to denote a data format that records information about a scene's color. This dissertation focuses on a similar format for recording distance information about a scene: depth images. Depth images have been used extensively in consumer-level applications, such as Apple's Face ID, which relies on depth images for face recognition. However, depth images suffer from low precision and high error rates, so post-processing techniques must be applied to improve their quality. Deep learning, or neural networks, is a framework that uses a series of hierarchically arranged nonlinear networks to process input data. Although each layer of the network is limited in its capabilities, the learning capacity accumulated by the multilayer network becomes very powerful. This dissertation assembles two different deep learning frameworks to solve two types of raw depth image preprocessing problems. The first is a super-resolution network, which performs a nonlinear interpolation of low-resolution depth images through a deep network to obtain high-resolution images. The second is an inpainting network, which is used to mitigate the loss of pixel data in the original depth image due to various causes. This dissertation presents depth images processed by these two frameworks; the quality of the processed images is significantly improved compared to the original images, demonstrating the great potential of deep learning techniques in the field of depth image processing.
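    Since the abstract does not spell out the networks, a minimal SRCNN-style depth super-resolution sketch in PyTorch may help fix ideas; the architecture, layer sizes and class name are assumptions for illustration, not the dissertation's actual design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DepthSRNet(nn.Module):
        """Sketch: upsample a low-resolution depth map with bicubic
        interpolation, then refine it with a small residual CNN."""
        def __init__(self, scale=2):
            super().__init__()
            self.scale = scale
            self.body = nn.Sequential(
                nn.Conv2d(1, 64, 9, padding=4), nn.ReLU(inplace=True),
                nn.Conv2d(64, 32, 5, padding=2), nn.ReLU(inplace=True),
                nn.Conv2d(32, 1, 5, padding=2),
            )

        def forward(self, depth_lr):
            x = F.interpolate(depth_lr, scale_factor=self.scale,
                              mode="bicubic", align_corners=False)
            return x + self.body(x)  # residual refinement of the upsampled map

    # usage: hr = DepthSRNet()(torch.randn(1, 1, 60, 80))  # -> (1, 1, 120, 160)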

    Video modeling via implicit motion representations

    Video modeling refers to the development of analytical representations for explaining the intensity distribution in video signals. Based on such a representation, we can develop algorithms for particular video-related tasks; video modeling therefore provides a foundation that bridges video data and related tasks. Although many video models have been proposed in past decades, the rise of new applications calls for more efficient and accurate video modeling approaches.

    Most existing video modeling approaches are based on explicit motion representations, where motion information is expressed by correspondence-based representations (i.e., motion velocity or displacement). Although conceptually simple, the limitations of those representations and the suboptimality of motion estimation techniques can degrade such approaches, especially for complex motion or non-ideal observed video data. In this thesis, we investigate video modeling without explicit motion representation: motion information is implicitly embedded in the spatio-temporal dependency among pixels or patches instead of being explicitly described by motion vectors.

    First, we propose a parametric model based on spatio-temporal adaptive localized learning (STALL). We formulate video modeling as a linear regression problem in which motion information is embedded within the regression coefficients. The coefficients are adaptively learned within a local space-time window based on the LMMSE criterion. By incorporating spatio-temporal resampling and a Bayesian fusion scheme, we can enhance the modeling capability of STALL on more general videos. Under the STALL framework, video processing algorithms for a variety of applications can be developed by adjusting the model parameters (i.e., the size and topology of the model support and training window). We apply STALL to three video processing problems. Simulation results show that motion information can be efficiently exploited by our implicit motion representation, and that the resampling and fusion help enhance the modeling capability of STALL.

    Second, we propose a nonparametric video modeling approach that does not depend on explicit motion estimation. Assuming the video sequence is composed of many overlapping space-time patches, we embed motion-related information in the relationships among video patches and develop a generic sparsity-based prior for typical video sequences. We first extend block matching to more general kNN-based patch clustering, which provides an implicit and distributed representation of motion information. We enforce a sparsity constraint on a higher-dimensional data array formed by packing the patches of a similar-patch set, and then solve the inference problem by iteratively updating the kNN array and the desired signal. Finally, we present a Bayesian fusion approach to fuse multiple-hypothesis inferences. Simulation results in video error concealment, denoising, and deartifacting demonstrate its modeling capability.

    Finally, we summarize the two proposed video modeling approaches and point out perspectives for implicit motion representations in applications ranging from low-level to high-level problems.
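    To make the STALL idea concrete, the following minimal Python sketch fits local linear-regression coefficients by least squares (an LMMSE-style estimate) over a small training window and uses them to predict one pixel from its spatio-temporal neighbours in the previous frame. The window sizes and function name are illustrative assumptions, not the thesis's exact formulation.

    import numpy as np

    def stall_predict(frames, t, y, x, support=1, train_rad=3):
        """Predict pixel (t, y, x) as a linear combination of its
        spatio-temporal neighbours in frame t-1; the coefficients are
        fit by least squares over a local training window.
        Assumes (y, x) lies at least train_rad + support from borders."""
        prev, cur = frames[t - 1], frames[t]
        A, b = [], []
        for dy in range(-train_rad, train_rad + 1):
            for dx in range(-train_rad, train_rad + 1):
                yy, xx = y + dy, x + dx
                A.append(prev[yy - support:yy + support + 1,
                              xx - support:xx + support + 1].ravel())
                b.append(cur[yy, xx])
        coef, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
        target = prev[y - support:y + support + 1,
                      x - support:x + support + 1].ravel()
        return coef @ target  # motion is implicit in the learned coefficients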

    Gait Recognition based on Inverse Fast Fourier Transform Gaussian and Enhancement Histogram Oriented of Gradient

    Gait recognition using the energy image representation, the average silhouette image over one complete cycle, has become a baseline in model-free approaches. Nevertheless, gait is sensitive to any changes. To date, in the area of feature extraction, image feature representations based on the spatial gradient still lack efficiency, especially for covariate cases such as carrying a bag or wearing a coat. Although the Histogram of Oriented Gradients (HOG) is highly effective for pedestrian detection, its accuracy remains low when tested on covariate datasets. This research therefore proposes a combination of frequency and spatial features based on the Inverse Fast Fourier Transform and Histogram of Oriented Gradients (IFFTG-HoG) for gait recognition. It consists of three phases: an image processing phase, a feature extraction phase producing a new image representation, and a classification phase. The first phase comprises image binarization and energy image generation using the gait average image over one cycle. In the second phase, the IFFTG-HoG method is used for gait feature extraction from the generated energy image; it is further improved by using the Chebyshev distance to calculate the gradient magnitude, which increases recognition accuracy. Lastly, a K-Nearest Neighbour (kNN) classifier with K=1 is employed for individual classification in the third phase. A total of 124 subjects from the CASIA B dataset were tested using the proposed IFFTG-HoG method. It performed better in individual gait classification, with average accuracies of 96.7%, 93.1% and 99.6% on the standard datasets, compared to 94.1%, 85.9% and 96.2% respectively for the HoG method. With similar motivation, we also tested on the Rempit dataset to recognize motorcycle-rider anomaly events, where the proposed method again outperforms the Dalal method.
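    As a concrete illustration of the Chebyshev-distance gradient magnitude described above, here is a minimal Python sketch of a single HOG cell histogram; the binning details are generic HOG assumptions rather than the paper's exact pipeline.

    import numpy as np

    def hog_cell_histogram(cell_gray, n_bins=9):
        """Orientation histogram for one cell, with the gradient magnitude
        taken as the Chebyshev distance max(|gx|, |gy|) instead of the
        usual Euclidean sqrt(gx^2 + gy^2)."""
        gx = np.gradient(cell_gray, axis=1)
        gy = np.gradient(cell_gray, axis=0)
        magnitude = np.maximum(np.abs(gx), np.abs(gy))   # Chebyshev magnitude
        angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientations
        bin_width = 180.0 / n_bins
        idx = np.minimum((angle / bin_width).astype(int), n_bins - 1)
        hist = np.zeros(n_bins)
        np.add.at(hist, idx.ravel(), magnitude.ravel())  # magnitude-weighted votes
        return hist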

    Diffusion-Based Audio Inpainting

    Audio inpainting aims to reconstruct missing segments in corrupted recordings. Previous methods produce plausible reconstructions when the gap length is shorter than about 100 ms, but quality decreases for longer gaps. This paper explores recent advancements in deep learning, and particularly diffusion models, for the task of audio inpainting. The proposed method uses an unconditionally trained generative model, which can be conditioned in a zero-shot fashion for audio inpainting, offering high flexibility to regenerate gaps of arbitrary length. An improved deep neural network architecture based on the constant-Q transform, which allows the model to exploit pitch-equivariant symmetries in audio, is also presented. The performance of the proposed algorithm is evaluated through objective and subjective metrics on the task of reconstructing short to mid-sized gaps. The results of a formal listening test show that the proposed method performs comparably to the state of the art for short gaps, while retaining good audio quality and outperforming the baselines for the longest gap lengths tested, 150 ms and 200 ms. This work helps improve the restoration of sound recordings with fairly long local disturbances or dropouts that must be reconstructed.
    Comment: Submitted for publication to the Journal of Audio Engineering Society on January 30th, 202
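    A minimal sketch of the zero-shot conditioning idea (in the spirit of RePaint-style guidance, not necessarily the paper's exact posterior-sampling scheme): at every reverse diffusion step, the samples known from the observation are re-imposed at the current noise level, so an unconditionally trained model fills only the gap. All names and the update rule below are illustrative assumptions.

    import torch

    def diffusion_inpaint(model, x_obs, mask, alphas_cumprod, n_steps):
        """Sketch of zero-shot inpainting with an unconditional diffusion
        model. `model(x, t)` is assumed to return a denoised estimate x0
        of x at noise level t; `mask` is 1 where x_obs is known, 0 in the
        gap; `alphas_cumprod` is a tensor of cumulative noise schedules."""
        x = torch.randn_like(x_obs)
        for t in reversed(range(n_steps)):
            x0_hat = model(x, t)                       # unconditional denoiser
            # keep the model's estimate only where samples are missing
            x0_hat = mask * x_obs + (1 - mask) * x0_hat
            if t > 0:
                # re-noise the clamped estimate to the previous noise level
                # (a crude but serviceable sampler for a sketch)
                a_prev = alphas_cumprod[t - 1]
                x = a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * torch.randn_like(x)
            else:
                x = x0_hat
        return x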

    Anti-Forensics: The Tampering of Media

    In the context of forensic investigations, the traditional understanding of evidence is changing: nowadays most prosecutors, lawyers and judges rely heavily on multimedia evidence. This modern shift has allowed law enforcement to better reconstruct crime scenes and reveal the truth of critical events. In this paper we shed light on the role of video, audio and photos as forensic evidence, presenting the possibility of their tampering with various easy-to-use, readily available anti-forensics software. We show that, alongside forensic analysis, digital processing, enhancement and authentication via forgery detection algorithms can testify to the integrity of the content and its respective source, making it feasible to differentiate between original and altered evidence. These operations help the court attain a higher degree of intelligibility of the multimedia data handled and assess the information retrieved from each item, supporting the success of the investigation process.

    Image Data Augmentation from Small Training Datasets Using Generative Adversarial Networks (GANs)

    The scarcity of labelled data is a serious problem, since deep models generally require a large amount of training data to achieve the desired performance. Data augmentation is widely adopted to enhance the diversity of original datasets and further improve the performance of deep learning models. Learning-based methods, compared to traditional techniques, are specialized in feature extraction, which enhances the effectiveness of data augmentation. Generative adversarial networks (GANs), one family of learning-based generative models, have made remarkable advances in data synthesis. However, GANs still face many challenges in generating high-quality augmented images from small datasets, because learning-based generative methods struggle to produce reliable outcomes without sufficient training data. This difficulty undermines data augmentation applications built on learning-based methods. In this thesis, to tackle the scarcity of labelled data and the difficulty of augmenting image data from small datasets, three novel GAN models suitable for training with a small number of samples are proposed, based on three different mapping relationships between the input and output images: one-to-many mapping, one-to-one mapping, and many-to-many mapping. The proposed GANs employ limited training data, such as a small number of images and limited conditional features, and the synthetic images they generate are expected to exhibit not only high generative quality but also desirable data diversity. To evaluate the effectiveness of the augmented images generated by the proposed models, inception distances and human perception methods are adopted. Additionally, different image classification tasks were carried out, and the accuracies obtained with the original and augmented datasets were compared. Experimental results illustrate that image classification performance based on convolutional neural networks, i.e., AlexNet, GoogLeNet, ResNet and VGGNet, is comprehensively enhanced, and the scale of the improvement is significant when only a small number of training samples is available.
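    For readers who want the bare mechanics behind such models, a minimal unconditional GAN training step in PyTorch is sketched below; the layer sizes, data shape and hyperparameters are generic assumptions and do not correspond to the three proposed models.

    import torch
    import torch.nn as nn

    # Toy generator and discriminator for flattened 28x28 images in [-1, 1].
    G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
    D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
    bce = nn.BCEWithLogitsLoss()

    def gan_step(real):                     # real: (batch, 784) tensor
        z = torch.randn(real.size(0), 64)
        fake = G(z)
        # discriminator update: push real toward 1, fake toward 0
        loss_d = bce(D(real), torch.ones(real.size(0), 1)) + \
                 bce(D(fake.detach()), torch.zeros(real.size(0), 1))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
        # generator update: make the discriminator label fakes as real
        loss_g = bce(D(fake), torch.ones(real.size(0), 1))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
        return loss_d.item(), loss_g.item()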