981 research outputs found
Joint-SRVDNet: Joint Super Resolution and Vehicle Detection Network
In many domestic and military applications, aerial vehicle detection and super-resolution algorithms are frequently developed and applied independently. However, aerial vehicle detection on super-resolved images remains a challenging task due to the lack of discriminative information in the super-resolved images. To address this problem, we propose a Joint Super-Resolution and Vehicle Detection Network (Joint-SRVDNet) that tries to generate discriminative, high-resolution images of vehicles from low-resolution aerial images. First, aerial images are up-scaled by a factor of 4x using a Multi-scale Generative Adversarial Network (MsGAN), which has multiple intermediate outputs with increasing resolutions. Second, a detector is trained on super-resolved images up-scaled by a factor of 4x using the MsGAN architecture, and finally the detection loss is minimized jointly with the super-resolution loss to encourage the target detector to be sensitive to the subsequent super-resolution training. The network jointly learns hierarchical and discriminative features of targets and produces optimal super-resolution results. We perform both quantitative and qualitative evaluation of our proposed network on the VEDAI, xView and DOTA datasets. The experimental results show that our proposed framework achieves better visual quality than the state-of-the-art methods for aerial super-resolution with a 4x up-scaling factor and improves the accuracy of aerial vehicle detection.
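As a rough illustration of the joint objective, the sketch below combines a super-resolution reconstruction term with a detection term in a single loss; the function name, the L1/cross-entropy choices and the weight lam are assumptions for illustration, not the paper's exact MsGAN and detector losses.

    import torch
    import torch.nn.functional as F

    def joint_srvd_loss(sr_output, hr_target, det_logits, det_labels, lam=0.1):
        """Hypothetical joint objective: SR reconstruction plus detection.

        sr_output:  super-resolved images from the generator (B, C, H, W)
        det_logits: detector class scores on the SR images   (B, num_classes)
        lam:        assumed trade-off weight between the two terms
        """
        sr_loss = F.l1_loss(sr_output, hr_target)            # pixel reconstruction term
        det_loss = F.cross_entropy(det_logits, det_labels)   # detection term
        # Minimizing both jointly pushes the generator toward images that are
        # not only sharp but also easy for the detector to find vehicles in.
        return sr_loss + lam * det_loss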
Full-Reference Image Quality Expression via Genetic Programming
Bakurov, I., Buzzelli, M., Schettini, R., Castelli, M., & Vanneschi, L. (2023). Full-Reference Image Quality Expression via Genetic Programming. IEEE Transactions on Image Processing, 32, 1458-1473. https://doi.org/10.1109/TIP.2023.3244662
This work was supported by national funds through the FCT (Fundação para a Ciência e a Tecnologia) under the projects Algoritmos de Inteligência artificial no Consumo de crédito e conciliação de Endividamento (AICE) (DSAIPA/DS/0113/2019) and UIDB/04152/2020 - Centro de Investigação em Gestão de Informação (MagIC)/NOVA IMS. Mauro Castelli acknowledges the financial support from the Slovenian Research Agency (research core funding no. P5-0410).

Full-reference image quality measures are a fundamental tool to approximate the human visual system in various applications for digital data management: from retrieval, to compression, to detection of unauthorized uses. Inspired by both the effectiveness and the simplicity of the hand-crafted Structural Similarity Index Measure (SSIM), in this work we present a framework for the formulation of SSIM-like image quality measures through genetic programming. We explore different terminal sets, defined from the building blocks of structural similarity at different levels of abstraction, and we propose a two-stage genetic optimization that exploits hoist mutation to constrain the complexity of the solutions. Our optimized measures are selected through a cross-dataset validation procedure, which results in superior performance against different versions of structural similarity, measured as correlation with human mean opinion scores. We also demonstrate how, by tuning on specific datasets, it is possible to obtain solutions that are competitive with (or even outperform) more complex image quality measures.
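For reference, the hand-crafted structural similarity that inspires this search is, in its standard single-scale form,

    \mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)\,(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)\,(\sigma_x^2 + \sigma_y^2 + C_2)}

where the \mu and \sigma terms are local means and (co)variances of the two image patches and C_1, C_2 are small stabilizing constants. The terminal sets explored in the paper are built from these luminance, contrast and structure components at different levels of abstraction.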
DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models
Current deep networks are very data-hungry and benefit from training on large-scale datasets, which are often time-consuming to collect and annotate. By contrast, synthetic data can be generated infinitely using generative models such as DALL-E and diffusion models, with minimal effort and cost. In this paper, we present DatasetDM, a generic dataset generation model that can produce diverse synthetic images and the corresponding high-quality perception annotations (e.g., segmentation masks and depth). Our method builds upon the pre-trained diffusion model and extends text-guided image synthesis to perception data generation. We show that the rich latent code of the diffusion model can be effectively decoded into accurate perception annotations by a decoder module. Training the decoder requires less than 1% of the usual amount of manually labeled images (around 100 images), enabling the generation of an infinitely large annotated dataset. These synthetic data can then be used to train various perception models for downstream tasks. To showcase the power of the proposed approach, we generate datasets with rich dense pixel-wise labels for a wide range of downstream tasks, including semantic segmentation, instance segmentation, and depth estimation. Notably, it achieves 1) state-of-the-art results on semantic segmentation and instance segmentation; 2) significantly more robust domain generalization than using real data alone, as well as state-of-the-art results in the zero-shot segmentation setting; and 3) flexibility for efficient application and novel task composition (e.g., image editing). The project website and code can be found at https://weijiawu.github.io/DatasetDM_page/ and https://github.com/showlab/DatasetDM, respectively.
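As a rough sketch of the decoding idea, assuming a frozen diffusion backbone whose latent features have already been extracted, a small trainable head can map those features to per-pixel labels; all names and layer sizes below are illustrative, not DatasetDM's actual modules.

    import torch
    import torch.nn as nn

    class PerceptionDecoder(nn.Module):
        """Hypothetical decoder head: diffusion latent features -> mask logits."""
        def __init__(self, feat_dim=512, num_classes=21):
            super().__init__()
            self.head = nn.Sequential(
                nn.Conv2d(feat_dim, 256, 3, padding=1), nn.ReLU(),
                nn.Conv2d(256, num_classes, 1),  # per-pixel class logits
            )

        def forward(self, latent_feats):
            # latent_feats: (B, feat_dim, H, W) from a frozen diffusion model
            return self.head(latent_feats)

    # The decoder is trained on a small labeled set (~100 images in the paper);
    # the frozen generator plus decoder can then label unlimited synthetic images.
    decoder = PerceptionDecoder()
    feats = torch.randn(2, 512, 64, 64)    # stand-in for diffusion latent features
    masks = decoder(feats).argmax(dim=1)   # (B, H, W) predicted segmentation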
Learning from small and imbalanced dataset of images using generative adversarial neural networks.
The performance of deep learning models is unmatched by any other approach in supervised computer vision tasks such as image classification. However, training these models requires a lot of labeled data, which is not always available. Labelling a massive dataset is a largely manual and very demanding process, and this problem has led to the development of techniques that bypass the need for labelling at scale. Despite this, existing techniques such as transfer learning, data augmentation and semi-supervised learning have not lived up to expectations. Some of these techniques do not account for other classification challenges, such as the class-imbalance problem, and they mostly underperform when compared with fully supervised approaches.

In this thesis, we propose new methods to train a deep image classification model with a limited number of labeled examples. This is achieved by extending state-of-the-art generative adversarial networks with multiple fake classes and network switchers. These new features enable us to train a classifier on large unlabeled data while generating class-specific samples. The proposed model is label-agnostic and is suitable for different classification scenarios, ranging from weakly supervised to fully supervised settings. It is used to address classification challenges with limited labeled data and a class-imbalance problem.

Extensive experiments were carried out on different benchmark datasets. First, the proposed approach was used to train a classification model, and our findings indicate that it achieves better classification accuracy, especially when the number of labeled samples is small. Second, the proposed approach was able to generate high-quality samples from class-imbalanced datasets; the quality of the samples is evident in the improved classification performance obtained when the generated samples were used to neutralise class imbalance. The results are thoroughly analyzed and, overall, our method shows superior performance over a popular resampling technique and the AC-GAN model.

Finally, we successfully applied the proposed approach as a new augmentation technique to two challenging real-world problems: faces with attributes and legacy engineering drawings. The results obtained demonstrate that the proposed approach is effective even in extreme cases.
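As a rough illustration of the discriminator design the abstract describes, the sketch below gives a classifier head with real classes plus multiple fake classes; the layer sizes and the one-fake-class-per-real-class pairing are assumptions for illustration, not the thesis's actual architecture.

    import torch
    import torch.nn as nn

    NUM_REAL, NUM_FAKE = 10, 10   # assumption: one fake class per real class

    class Discriminator(nn.Module):
        """Illustrative discriminator classifying inputs into NUM_REAL real
        classes plus NUM_FAKE extra classes reserved for generated samples."""
        def __init__(self, feat_dim=128):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                nn.Conv2d(64, feat_dim, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            # Real images are pushed toward the first NUM_REAL logits,
            # generated images toward the NUM_FAKE extra logits.
            self.classifier = nn.Linear(feat_dim, NUM_REAL + NUM_FAKE)

        def forward(self, x):
            return self.classifier(self.features(x))

    d = Discriminator()
    logits = d(torch.randn(4, 3, 32, 32))   # shape: (4, NUM_REAL + NUM_FAKE)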
Pinching sweaters on your phone – iShoogle: multi-gesture touchscreen fabric simulator using natural on-fabric gestures to communicate textile qualities
The inability to touch fabrics online frustrates consumers, who are used to evaluating
physical textiles by engaging in complex, natural gestural interactions. When
customers interact with physical fabrics, they combine cross-modal information about
the fabric's look, sound and handle to build an impression of its physical qualities. But
whenever an interaction with a fabric is limited (e.g. when viewing clothes online)
there is a perceptual gap between the fabric qualities perceived digitally and the actual
fabric qualities that a person would perceive when interacting with the physical fabric.
The goal of this thesis was to create a fabric simulator that minimized this perceptual
gap, enabling accurate perception of the qualities of fabrics presented digitally.
We designed iShoogle, a multi-gesture touch-screen sound-enabled fabric simulator
that aimed to create an accurate representation of fabric qualities without the need for
touching the physical fabric swatch. iShoogle uses on-screen gestures (inspired by
natural on-fabric movements e.g. Crunching) to control pre-recorded videos and
audio of fabrics being deformed (e.g. being Crunched). iShoogle creates an illusion
of direct manipulation of both the video and the displayed fabric.
This thesis describes the results of nine studies leading towards the development and
evaluation of iShoogle. In the first three studies, we combined expert and non-expert
textile-descriptive words and grouped them into eight dimensions labelled with terms
Crisp, Hard, Soft, Textured, Flexible, Furry, Rough and Smooth. These terms were
used to rate fabric qualities throughout the thesis. We observed natural on-fabric
gestures during a fabric handling study (Study 4) and used the results to design
iShoogle's on-screen gestures. In Study 5 we examined iShoogle's performance and
speed in a fabric handling task and in Study 6 we investigated users' preferences for
sound playback interactivity. iShoogle's accuracy was then evaluated in the last three
studies by comparing participants’ ratings of textile qualities when using iShoogle
with ratings produced when handling physical swatches. We also described the
recording and processing techniques for the video and audio content that iShoogle
used. Finally, we described the iShoogle iPhone app that was released to the general
public.

Our evaluation studies showed that iShoogle significantly improved the accuracy of fabric perception in at least some cases. Further research could investigate which fabric qualities and which fabrics are particularly well suited to being represented with iShoogle.
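As a schematic of the interaction model described above, the sketch below maps recognized on-screen gestures to pre-recorded audio-visual clips of the corresponding fabric deformation; the gesture names follow the thesis, while the file names and dispatch code are purely illustrative.

    # Illustrative gesture-to-clip dispatch for an iShoogle-like simulator.
    CLIPS = {
        "crunch": ("wool_crunch.mp4", "wool_crunch.wav"),
        "stroke": ("wool_stroke.mp4", "wool_stroke.wav"),
        "pinch":  ("wool_pinch.mp4",  "wool_pinch.wav"),
    }

    def on_gesture(gesture):
        """Return the (video, audio) pair to play for a recognized gesture."""
        if gesture not in CLIPS:
            raise ValueError("unrecognized gesture: " + gesture)
        return CLIPS[gesture]

    print(on_gesture("crunch"))   # ('wool_crunch.mp4', 'wool_crunch.wav')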
Towards solving computer vision problems: datasets, labels, algorithms, and applications
The solution to a supervised computer vision problem consists of an application, an algorithm, input data, and a set of human-generated labels. Solving these kinds of tasks involves collecting large quantities of data, collecting appropriate labels, and developing machine vision algorithms tailored to the application. Progress on these problems has often benefited from large-scale datasets with high-fidelity labels. Successful algorithms display a synergy between the application goals and the size and quality of the dataset. This thesis presents work highlighting the importance of each component of a supervised vision task.

First, the problem of automatically classifying groups of people into social categories is introduced. This problem is called Urban Tribe Classification. To tackle this problem, each individual and the entire group of individuals are modeled. Since this was a newly introduced computer vision problem, a dataset for this task was created. On this dataset, the combined representation of group and individuals outperforms using only the person representations. This model showed promising results for automatic subculture classification.

Second, the problem of creating perceptual embeddings based on human similarity judgements is tackled. This work focuses on triplet similarity comparisons of the form "Is object A more similar to object B or to object C?", which have been useful for computer vision and machine learning applications. Unfortunately, triplet similarity comparisons, like many human labeling efforts, can be prohibitively expensive. This work proposes two techniques for dealing with this obstacle. First, an alternative display for collecting triplets is designed. This display shows a probe image and a grid of query images, allowing the user to provide multiple triplets simultaneously. The display is shown to reduce the cost and time of triplet collection, and higher-quality embeddings are created with the improved triplet collection UI. A dataset of human taste similarity over 10,000 food items was created using this UI. Second, "SNaCK," a low-dimensional perceptual embedding algorithm that combines human expertise with automatic machine kernels, is introduced. The two parts are complementary: human insight can capture relationships that are not apparent from an object's visual similarity, and the machine can relieve the human from having to exhaustively specify many constraints.

Finally, the precise localization of key frames of an action is explored. This work focuses on detecting the exact starting frame of a behavior, an important task for neuroscience research. To address this problem, a loss is proposed that penalizes extra and missed action-start detections while tolerating small misalignments. Recurrent neural networks (RNNs) are trained to optimize this loss. The model is shown to reduce the number of false positives, an important criterion defined by the neuroscientists. The performance of the model is evaluated on a new dataset, the Mouse Reach Dataset, a large, annotated video dataset of mice performing a sequence of actions, created for neuroscience research. On this dataset, the proposed model outperforms related approaches and baseline methods that use an unstructured loss.
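As a minimal sketch of how an embedding can be learned from triplet judgements of this form, the code below optimizes a standard triplet margin loss over a toy embedding; this is a simplification under assumed names, not SNaCK itself, which additionally combines the crowd kernel with automatic machine kernels.

    import torch

    def triplet_loss(emb, triplets, margin=1.0):
        """emb: (N, d) learnable embedding; triplets: list of (a, b, c) index
        tuples meaning 'a is more similar to b than to c'."""
        a, b, c = (torch.tensor(idx) for idx in zip(*triplets))
        d_ab = (emb[a] - emb[b]).pow(2).sum(dim=1)   # distance to similar item
        d_ac = (emb[a] - emb[c]).pow(2).sum(dim=1)   # distance to dissimilar item
        # Hinge: violated triplets (d_ab not smaller by `margin`) contribute loss.
        return torch.relu(d_ab - d_ac + margin).mean()

    emb = torch.randn(100, 2, requires_grad=True)    # 2-D perceptual embedding
    opt = torch.optim.Adam([emb], lr=0.05)
    triplets = [(0, 1, 2), (3, 4, 5)]                # toy human judgements
    for _ in range(200):
        opt.zero_grad()
        loss = triplet_loss(emb, triplets)
        loss.backward()
        opt.step()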
Data Hiding in Digital Video
With the rapid development of digital multimedia technologies, an old technique called steganography has been revisited as a solution for data hiding applications such as digital watermarking and covert communication. Steganography is the art of secret communication using a cover signal, e.g., video, audio, or image, whereas the counter-technique, detecting the existence of such a channel through a statistically trained classifier, is called steganalysis.
State-of-the-art data hiding algorithms utilize features of the cover signal, such as Discrete Cosine Transform (DCT) coefficients, pixel values, and motion vectors, to convey the message to the receiver side. The goal of an embedding algorithm is to maximize the number of bits sent to the decoder side (embedding capacity) with maximum robustness against attacks, while keeping the perceptual and statistical distortions (security) low. Data hiding schemes are thus characterized by three conflicting requirements: security against steganalysis, robustness against channel-associated and/or intentional distortions, and capacity in terms of the embedded payload. Depending upon the application, it is the designer's task to find an optimum trade-off among them.
The goal of this thesis is to develop a novel data hiding scheme that establishes a covert channel satisfying statistical and perceptual invisibility with moderate-rate capacity and robustness against steganalysis-based detection. The idea behind the proposed method is to alter Video Object (VO) trajectory coordinates, conveying the message to the receiver side by perturbing the centroid coordinates of the VO. First, the VO is selected by the user and tracked through the frames using a simple region-based search strategy and morphological operations. After the trajectory coordinates are obtained, the perturbation of the coordinates is implemented through a non-linear embedding function, such as a polar quantizer in which both the magnitude and the phase of the motion are used. The perturbations made to the motion magnitude and phase are kept small to preserve the semantic meaning of the object's motion trajectory.
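As a rough illustration of embedding via a polar quantizer, the sketch below perturbs a motion vector so that the parity of its quantized magnitude index carries one message bit; the step sizes and the parity rule are assumptions for illustration, not the thesis's exact embedding function.

    import math

    def embed_bit(dx, dy, bit, r_step=0.5, theta_step=math.pi / 16):
        """Perturb motion vector (dx, dy) so the quantized magnitude index
        has the parity of `bit` (illustrative quantization-index modulation)."""
        r = math.hypot(dx, dy)
        theta = math.atan2(dy, dx)
        k = round(r / r_step)
        if k % 2 != bit:                 # snap to nearest index with right parity
            k += 1 if k == 0 else -1
        r_new = k * r_step
        theta_new = round(theta / theta_step) * theta_step   # small phase snap
        return r_new * math.cos(theta_new), r_new * math.sin(theta_new)

    def extract_bit(dx, dy, r_step=0.5):
        # Decoder recovers the bit from the magnitude index parity alone.
        return round(math.hypot(dx, dy) / r_step) % 2

    dx, dy = embed_bit(3.2, 1.1, bit=1)
    assert extract_bit(dx, dy) == 1      # perturbation stays small, bit survives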
The proposed method is well suited to video sequences in which VOs have smooth motion trajectories. Examples can be found in sports videos, in which the ball is the focus of attention and exhibits various motion types, e.g., rolling on the ground, flying in the air, or being possessed by a player. Several sports video sequences were tested with the proposed method. The experimental results show that the proposed method achieves both statistical and perceptual invisibility with moderate-rate embedding capacity under an AWGN channel with varying noise variances. This achievement is important because the first step of both active and passive steganalysis is detecting the existence of the covert channel.
This work makes multiple contributions to the field of data hiding. First, it is the first example of a data hiding method in which the trajectory of a VO is used. Second, it contributes towards improving steganographic security by providing new embedding features: the coordinate locations and the semantic meaning of the object's motion.