981 research outputs found

    Joint-SRVDNet: Joint Super Resolution and Vehicle Detection Network

    In many domestic and military applications, aerial vehicle detection and super-resolution algorithms are frequently developed and applied independently. However, aerial vehicle detection on super-resolved images remains a challenging task due to the lack of discriminative information in the super-resolved images. To address this problem, we propose a Joint Super-Resolution and Vehicle Detection Network (Joint-SRVDNet) that tries to generate discriminative, high-resolution images of vehicles from low-resolution aerial images. First, aerial images are up-scaled by a factor of 4x using a Multi-scale Generative Adversarial Network (MsGAN), which has multiple intermediate outputs with increasing resolutions. Second, a detector is trained on super-resolved images that are upscaled by a factor of 4x using the MsGAN architecture. Finally, the detection loss is minimized jointly with the super-resolution loss to encourage the target detector to be sensitive to the subsequent super-resolution training. The network jointly learns hierarchical and discriminative features of targets and produces optimal super-resolution results. We perform both quantitative and qualitative evaluation of our proposed network on the VEDAI, xView and DOTA datasets. The experimental results show that our proposed framework achieves better visual quality than the state-of-the-art methods for aerial super-resolution with a 4x up-scaling factor and improves the accuracy of aerial vehicle detection.
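The joint objective described in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual loss terms, so the L1 reconstruction loss, the flat pixel lists, and the `det_weight` balancing factor below are all assumptions made for illustration, not the authors' formulation.

```python
def l1_loss(pred, target):
    """Mean absolute error between two equally sized flat pixel lists."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def joint_loss(sr_outputs, sr_targets, detection_loss, det_weight=1.0):
    """Add a weighted detection loss to a multi-scale reconstruction loss.

    sr_outputs / sr_targets: one flat pixel list per intermediate scale
        (for example a 2x and a 4x output), mirroring the "multiple
        intermediate outputs" mentioned in the abstract.
    detection_loss: scalar loss from the detector, assumed to be given.
    det_weight: hypothetical balancing factor, not taken from the paper.
    """
    sr_loss = sum(l1_loss(o, t) for o, t in zip(sr_outputs, sr_targets))
    return sr_loss + det_weight * detection_loss


# Two scales with per-scale errors 0.5 and 0.25, plus a detection loss of 2.0
total = joint_loss(
    sr_outputs=[[0.0, 1.0], [1.0, 1.0, 0.0, 0.0]],
    sr_targets=[[0.0, 0.0], [1.0, 1.0, 1.0, 0.0]],
    detection_loss=2.0,
    det_weight=0.5,
)
```

Because both terms feed a single scalar, minimizing it pushes the super-resolution output toward images that are also easier to detect on, which is the coupling the abstract describes.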

    Full-Reference Image Quality Expression via Genetic Programming

    Bakurov, I., Buzzelli, M., Schettini, R., Castelli, M., & Vanneschi, L. (2023). Full-Reference Image Quality Expression via Genetic Programming. IEEE Transactions on Image Processing, 32, 1458-1473. https://doi.org/10.1109/TIP.2023.3244662 This work was supported by national funds through the FCT (Fundação para a Ciência e a Tecnologia) under the projects Algoritmos de Inteligência artificial no Consumo de crédito e conciliação de Endividamento (AICE) (DSAIPA/DS/0113/2019) and UIDB/04152/2020 - Centro de Investigação em Gestão de Informação (MagIC)/NOVA IMS. Mauro Castelli acknowledges the financial support from the Slovenian Research Agency (research core funding no. P5-0410). Full-reference image quality measures are a fundamental tool to approximate the human visual system in various applications for digital data management: from retrieval to compression to detection of unauthorized uses. Inspired by both the effectiveness and the simplicity of the hand-crafted Structural Similarity Index Measure (SSIM), in this work, we present a framework for the formulation of SSIM-like image quality measures through genetic programming. We explore different terminal sets, defined from the building blocks of structural similarity at different levels of abstraction, and we propose a two-stage genetic optimization that exploits hoist mutation to constrain the complexity of the solutions. Our optimized measures are selected through a cross-dataset validation procedure, which results in superior performance against different versions of structural similarity, measured as correlation with human mean opinion scores. We also demonstrate how, by tuning on specific datasets, it is possible to obtain solutions that are competitive with (or even outperform) more complex image quality measures.
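For readers unfamiliar with the hand-crafted measure this work starts from, the following is a minimal sketch of the standard SSIM formula. It is simplified to a single global window over flat pixel lists with population statistics, whereas SSIM is normally computed over local windows and averaged. It is not the evolved measure from the paper.

```python
def ssim_global(x, y, dynamic_range=255.0):
    """Single-window SSIM between two equally sized flat pixel lists.

    Uses the usual stabilizing constants C1 = (0.01 L)^2 and
    C2 = (0.03 L)^2, where L is the dynamic range of the pixel values.
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Population variances and covariance (a simplification of this sketch)
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


reference = [0.0, 50.0, 100.0, 150.0]
same_score = ssim_global(reference, reference)             # identical images
flipped_score = ssim_global(reference, reference[::-1])    # reversed pixel order
```

Identical inputs score 1, and the score drops as the means, variances, or covariance of the two images diverge.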

    DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models

    Current deep networks are very data-hungry and benefit from training on large-scale datasets, which are often time-consuming to collect and annotate. By contrast, synthetic data can be generated infinitely using generative models such as DALL-E and diffusion models, with minimal effort and cost. In this paper, we present DatasetDM, a generic dataset generation model that can produce diverse synthetic images and the corresponding high-quality perception annotations (e.g., segmentation masks and depth). Our method builds upon the pre-trained diffusion model and extends text-guided image synthesis to perception data generation. We show that the rich latent code of the diffusion model can be effectively decoded as accurate perception annotations using a decoder module. Training the decoder needs less than 1% (around 100 images) of manually labeled images, enabling the generation of an infinitely large annotated dataset. These synthetic data can then be used for training various perception models for downstream tasks. To showcase the power of the proposed approach, we generate datasets with rich dense pixel-wise labels for a wide range of downstream tasks, including semantic segmentation, instance segmentation, and depth estimation. Notably, it achieves 1) state-of-the-art results on semantic segmentation and instance segmentation; 2) significantly more robustness on domain generalization than using the real data alone, and state-of-the-art results in the zero-shot segmentation setting; and 3) flexibility for efficient application and novel task composition (e.g., image editing). The project website and code can be found at https://weijiawu.github.io/DatasetDM_page/ and https://github.com/showlab/DatasetDM, respectively.

    Learning from small and imbalanced dataset of images using generative adversarial neural networks.

    The performance of deep learning models is unmatched by any other approach in supervised computer vision tasks such as image classification. However, training these models requires a lot of labeled data, which are not always available. Labelling a massive dataset is largely a manual and very demanding process. Thus, this problem has led to the development of techniques that bypass the need for labelling at scale. Despite this, existing techniques such as transfer learning, data augmentation and semi-supervised learning have not lived up to expectations. Some of these techniques do not account for other classification challenges, such as a class-imbalance problem. Thus, these techniques mostly underperform when compared with fully supervised approaches. In this thesis, we propose new methods to train a deep model on image classification with a limited number of labeled examples. This was achieved by extending state-of-the-art generative adversarial networks with multiple fake classes and network switchers. These new features enabled us to train a classifier using large unlabeled data, while generating class specific samples. The proposed model is label agnostic and is suitable for different classification scenarios, ranging from weakly supervised to fully supervised settings. This was used to address classification challenges with limited labeled data and a class-imbalance problem. Extensive experiments were carried out on different benchmark datasets. Firstly, the proposed approach was used to train a classification model and our findings indicated that the proposed approach achieved better classification accuracies, especially when the number of labeled samples is small. Secondly, the proposed approach was able to generate high-quality samples from class-imbalance datasets. The samples' quality is evident in improved classification performances when generated samples were used in neutralising class-imbalance. 
    The results are thoroughly analyzed and, overall, our method showed superior performance over a popular resampling technique and the AC-GAN model. Finally, we successfully applied the proposed approach as a new augmentation technique to two challenging real-world problems: face with attributes and legacy engineering drawings. The results obtained demonstrate that the proposed approach is effective even in extreme cases.

    Pinching sweaters on your phone – iShoogle : multi-gesture touchscreen fabric simulator using natural on-fabric gestures to communicate textile qualities

    The inability to touch fabrics online frustrates consumers, who are used to evaluating physical textiles by engaging in complex, natural gestural interactions. When customers interact with physical fabrics, they combine cross-modal information about the fabric's look, sound and handle to build an impression of its physical qualities. But whenever an interaction with a fabric is limited (i.e. when watching clothes online) there is a perceptual gap between the fabric qualities perceived digitally and the actual fabric qualities that a person would perceive when interacting with the physical fabric. The goal of this thesis was to create a fabric simulator that minimized this perceptual gap, enabling accurate perception of the qualities of fabrics presented digitally. We designed iShoogle, a multi-gesture touch-screen sound-enabled fabric simulator that aimed to create an accurate representation of fabric qualities without the need for touching the physical fabric swatch. iShoogle uses on-screen gestures (inspired by natural on-fabric movements e.g. Crunching) to control pre-recorded videos and audio of fabrics being deformed (e.g. being Crunched). iShoogle creates an illusion of direct video manipulation and also direct manipulation of the displayed fabric. This thesis describes the results of nine studies leading towards the development and evaluation of iShoogle. In the first three studies, we combined expert and non-expert textile-descriptive words and grouped them into eight dimensions labelled with terms Crisp, Hard, Soft, Textured, Flexible, Furry, Rough and Smooth. These terms were used to rate fabric qualities throughout the thesis. We observed natural on-fabric gestures during a fabric handling study (Study 4) and used the results to design iShoogle's on-screen gestures. In Study 5 we examined iShoogle's performance and speed in a fabric handling task and in Study 6 we investigated users' preferences for sound playback interactivity. 
    iShoogle's accuracy was then evaluated in the last three studies by comparing participants’ ratings of textile qualities when using iShoogle with ratings produced when handling physical swatches. We also described the recording and processing techniques for the video and audio content that iShoogle used. Finally, we described the iShoogle iPhone app that was released to the general public. Our evaluation studies showed that iShoogle significantly improved the accuracy of fabric perception in at least some cases. Further research could investigate which fabric qualities and which fabrics are particularly suited to be represented with iShoogle.

    Data Hiding in Digital Video

    With the rapid development of digital multimedia technologies, an old method called steganography has been sought as a solution for data hiding applications such as digital watermarking and covert communication. Steganography is the art of secret communication using a cover signal, e.g., video, audio, image etc., whereas the counter-technique, detecting the existence of such a channel through a statistically trained classifier, is called steganalysis. The state-of-the-art data hiding algorithms utilize features of the cover signal, such as Discrete Cosine Transform (DCT) coefficients, pixel values, motion vectors etc., to convey the message to the receiver side. The goal of an embedding algorithm is to maximize the number of bits sent to the decoder side (embedding capacity) with maximum robustness against attacks while keeping the perceptual and statistical distortions (security) low. Data hiding schemes are characterized by these three conflicting requirements: security against steganalysis, robustness against channel-associated and/or intentional distortions, and the capacity in terms of the embedded payload. Depending upon the application, it is the designer's task to find an optimum solution amongst them. The goal of this thesis is to develop a novel data hiding scheme to establish a covert channel satisfying statistical and perceptual invisibility with moderate rate capacity and robustness to combat steganalysis-based detection. The idea behind the proposed method is the alteration of Video Object (VO) trajectory coordinates to convey the message to the receiver side by perturbing the centroid coordinates of the VO. Firstly, the VO is selected by the user and tracked through the frames by using a simple region-based search strategy and morphological operations.
    After the trajectory coordinates are obtained, the perturbation of the coordinates is implemented through a non-linear embedding function, such as a polar quantizer, where both the magnitude and phase of the motion are used. However, the perturbations made to the motion magnitude and phase were kept small to preserve the semantic meaning of the object motion trajectory. The proposed method is well suited to video sequences in which VOs have smooth motion trajectories. Examples of these types can be found in sports videos in which the ball is the focus of attention and exhibits various motion types, e.g., rolling on the ground, flying in the air, being possessed by a player, etc. Different sports video sequences have been tested by using the proposed method. Through the experimental results, it is shown that the proposed method achieved the goal of both statistical and perceptual invisibility with moderate rate embedding capacity under an AWGN channel with varying noise variances. This achievement is important as the first step for both active and passive steganalysis is the detection of the existence of a covert channel. This work has multiple contributions in the field of data hiding. Firstly, it is the first example of a data hiding method in which the trajectory of a VO is used. Secondly, this work has contributed towards improving steganographic security by providing new features: the coordinate location and semantic meaning of the object.
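The idea of hiding bits by perturbing motion in polar form can be sketched as follows. The abstract does not specify the quantizer, so this example swaps in a simple quantization index modulation (QIM) scheme on the phase only; the function names, the `step` size, and the choice to leave the magnitude untouched are assumptions for illustration rather than the thesis's method.

```python
import math


def embed_bit(dx, dy, bit, step=math.pi / 16):
    """Hide one bit in the phase of a non-zero motion vector (dx, dy).

    The phase is snapped to the nearest multiple of `step` whose index
    parity equals `bit`, so the phase moves by at most `step`.
    Zero-length vectors carry no phase and should be skipped by the caller.
    """
    magnitude = math.hypot(dx, dy)
    phase = math.atan2(dy, dx)
    k = round((phase / step - bit) / 2)      # nearest lattice index for this bit
    new_phase = step * (2 * k + bit)
    return magnitude * math.cos(new_phase), magnitude * math.sin(new_phase)


def extract_bit(dx, dy, step=math.pi / 16):
    """Recover the bit as the parity of the nearest phase lattice index."""
    phase = math.atan2(dy, dx)
    return round(phase / step) % 2


# Embed a short message in a hypothetical sequence of centroid displacements
displacements = [(3.0, 4.0), (-2.0, 0.5), (0.1, -7.0), (5.0, 5.0)]
message = [1, 0, 1, 1]
marked = [embed_bit(dx, dy, b) for (dx, dy), b in zip(displacements, message)]
recovered = [extract_bit(dx, dy) for dx, dy in marked]
```

A smaller `step` keeps the trajectory closer to the original motion but leaves less margin against channel noise, which reflects the capacity, robustness, and security trade-off the abstract describes.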