Reconstructive Sparse Code Transfer for Contour Detection and Semantic Labeling
We frame the task of predicting a semantic labeling as a sparse
reconstruction procedure that applies a target-specific learned transfer
function to a generic deep sparse code representation of an image. This
strategy partitions training into two distinct stages. First, in an
unsupervised manner, we learn a set of generic dictionaries optimized for
sparse coding of image patches. We train a multilayer representation via
recursive sparse dictionary learning on pooled codes output by earlier layers.
Second, we encode all training images with the generic dictionaries and learn a
transfer function that optimizes reconstruction of patches extracted from
annotated ground-truth given the sparse codes of their corresponding image
patches. At test time, we encode a novel image using the generic dictionaries
and then reconstruct using the transfer function. The output reconstruction is
a semantic labeling of the test image.
Applying this strategy to the task of contour detection, we demonstrate
performance competitive with state-of-the-art systems. Unlike almost all prior
work, our approach obviates the need for any form of hand-designed features or
filters. To illustrate general applicability, we also show initial results on
semantic part labeling of human faces.
The effectiveness of our approach opens new avenues for research on deep
sparse representations. Our classifiers utilize this representation in a novel
manner. Rather than acting on nodes in the deepest layer, they attach to nodes
along a slice through multiple layers of the network in order to make
predictions about local patches. Our flexible combination of a generatively
learned sparse representation with discriminatively trained transfer
classifiers extends the notion of sparse reconstruction to encompass arbitrary
semantic labeling tasks.
Comment: to appear in Asian Conference on Computer Vision (ACCV), 201
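The two-stage pipeline this abstract describes can be sketched in miniature. The following is a hypothetical toy sketch, not the authors' implementation: ISTA stands in for the paper's sparse coding, a single dictionary layer replaces the multilayer hierarchy, and a plain least-squares map plays the role of the learned transfer function.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_encode(X, D, lam=0.1, n_iter=50):
    """ISTA: approximately solve min ||X - D A||^2 + lam ||A||_1 for codes A."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        G = D.T @ (D @ A - X) / L                       # gradient step
        A = np.sign(A - G) * np.maximum(np.abs(A - G) - lam / L, 0.0)
    return A

def learn_dictionary(X, n_atoms=32, n_outer=10):
    """Stage 1 (unsupervised): alternate coding and dictionary updates."""
    D = rng.standard_normal((X.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_outer):
        A = sparse_encode(X, D)
        D = X @ np.linalg.pinv(A)                       # least-squares update
        D /= np.linalg.norm(D, axis=0) + 1e-12          # renormalize atoms
    return D

# toy data: 7x7 image patches flattened as columns, plus hypothetical
# ground-truth label patches (a stand-in for annotated contours)
X = rng.standard_normal((49, 500))
Y = (X[:9] > 0).astype(float)

D = learn_dictionary(X)            # stage 1: generic dictionary
A = sparse_encode(X, D)            # generic sparse codes of training patches
W = Y @ np.linalg.pinv(A)          # stage 2: linear transfer function
Y_hat = W @ sparse_encode(X, D)    # test time: encode, then reconstruct labels
```

The transfer map here is ordinary least squares purely for brevity; the point is the division of labor, i.e. the dictionary is learned without labels and only the final linear map sees the annotations.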
Face Centered Image Analysis Using Saliency and Deep Learning Based Techniques
Image analysis begins with the goal of building vision machines that can perceive like humans, intelligently inferring general principles and sensing their surroundings from imagery. This dissertation studies face-centered image analysis as a core problem in high-level computer vision research and addresses it by tackling three challenging subjects: Is there anything interesting in the image? If so, what is it? If a person is present, who is he/she? What expression is he/she performing? Can we estimate his/her age? Answering these questions leads to saliency-based object detection, deep-learning-structured object categorization and recognition, human facial landmark detection, and multi-task biometrics.
For object detection, a three-level saliency detection method based on the self-similarity technique (SMAP) is first proposed. The first level of SMAP uses statistical methods to generate proto-background patches; the second level computes local contrast based on the image's self-similarity characteristics. Finally, a spatial color-distribution constraint is applied to complete the saliency detection. The output of the algorithm is a full-resolution map with highlighted salient objects and well-defined edges.
For object recognition, an Adaptive Deconvolution Network (ADN) is implemented to categorize the objects extracted by saliency detection. To improve system performance, an L1/2-norm-regularized ADN is proposed and tested in different applications. The results demonstrate the efficiency and significance of the new structure.
To fully understand the facial-biometrics-related activity contained in an image, low-rank matrix decomposition is introduced to help locate landmark points in face images. A natural extension of this work benefits research on human facial expression recognition and facial feature parsing.
To facilitate understanding of the detected facial image, automatic facial image analysis becomes essential. We present a novel deeply learned tree-structured face representation that uniformly models the human face at different semantic levels. We show that the proposed feature yields a unified representation for multi-task facial biometrics and that the multi-task learning framework is applicable to many other computer vision tasks.
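The abstract names low-rank matrix decomposition for landmark localization without specifying the formulation. As a hedged illustration only, here is a generic robust PCA (principal component pursuit) sketch via inexact augmented Lagrangian alternation, splitting a measurement matrix into a low-rank part and a sparse part; the dissertation's actual landmark model may differ.

```python
import numpy as np

def rpca(M, n_iter=200):
    """Decompose M ~ L + S with L low-rank, S sparse (inexact ALM sketch)."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))           # standard sparsity weight
    mu = m * n / (4.0 * np.abs(M).sum())     # common penalty heuristic
    S = np.zeros_like(M)
    Y = np.zeros_like(M)
    for _ in range(n_iter):
        # singular value thresholding -> low-rank update
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # elementwise soft thresholding -> sparse update
        R = M - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
        # dual ascent enforcing M = L + S, with growing penalty
        Y += mu * (M - L - S)
        mu = min(mu * 1.05, 1e6)
    return L, S

# toy example: a rank-2 matrix corrupted by a few large spikes
rng = np.random.default_rng(0)
base = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 30))
spikes = np.zeros((30, 30))
spikes[rng.integers(0, 30, 15), rng.integers(0, 30, 15)] = 10.0
L, S = rpca(base + spikes)
```

In a landmark setting, the low-rank component would model correlated shape structure across images while the sparse component absorbs outliers; that interpretation is an assumption here.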
A Review on Skin Disease Classification and Detection Using Deep Learning Techniques
Skin cancer ranks among the most dangerous cancers, and its deadliest form is melanoma. Melanoma arises from genetic faults or mutations in skin cells caused by unrepaired deoxyribonucleic acid (DNA) damage. It is essential to detect skin cancer in its infancy, since it is far more curable in its initial phases; left untreated, it typically spreads to other regions of the body. Owing to the disease's increasing frequency, high mortality rate, and the prohibitively high cost of medical treatment, early diagnosis of skin cancer signs is crucial. Because these disorders are so hazardous, scholars have developed a number of early-detection techniques for melanoma. Lesion characteristics such as symmetry, colour, size, and shape are often utilised to detect skin cancer and to distinguish benign lesions from melanoma. This study provides an in-depth investigation of deep learning techniques for the early detection of melanoma, discusses traditional feature-extraction-based machine learning approaches for the segmentation and classification of skin lesions, and presents a comparative analysis demonstrating the significance of various deep-learning-based segmentation and classification approaches.
AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion
Large-scale pre-trained vision-language models allow for the zero-shot
text-based generation of 3D avatars. The previous state-of-the-art method
utilized CLIP to supervise neural implicit models that reconstructed a human
body mesh. However, this approach has two limitations. Firstly, the lack of
avatar-specific models can cause facial distortion and unrealistic clothing in
the generated avatars. Secondly, CLIP provides an optimization direction only
for the overall appearance, yielding less refined results. To address these
limitations, we propose AvatarFusion, the first framework to use a latent
diffusion model to provide pixel-level guidance for generating human-realistic
avatars while simultaneously segmenting clothing from the avatar's body.
AvatarFusion includes the first clothing-decoupled neural implicit avatar model
that employs a novel Dual Volume Rendering strategy to render the decoupled
skin and clothing sub-models in one space. We also introduce a novel
optimization method, called Pixel-Semantics Difference-Sampling (PS-DS), which
semantically separates the generation of body and clothes, and generates a
variety of clothing styles. Moreover, we establish the first benchmark for
zero-shot text-to-avatar generation. Our experimental results demonstrate that
our framework outperforms previous approaches, with significant improvements
observed in all metrics. Additionally, since our model is clothing-decoupled,
we can exchange the clothes of avatars. Code will be available on GitHub.
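The Dual Volume Rendering strategy is only named in this abstract. As an illustrative guess at the idea of rendering two decoupled fields in one shared space, here is a hypothetical NumPy sketch that composites body and clothing density fields along a single ray; the paper's actual formulation may well differ.

```python
import numpy as np

def render_ray(sigma_body, sigma_cloth, c_body, c_cloth, dt):
    """Composite two decoupled density fields along one ray.

    sigma_*: per-sample densities, shape (n,); c_*: per-sample RGB, shape (n, 3).
    """
    sigma = sigma_body + sigma_cloth                  # shared space: densities add
    alpha = 1.0 - np.exp(-sigma * dt)                 # per-sample opacity
    trans = np.concatenate(([1.0], np.cumprod(np.exp(-sigma * dt))[:-1]))
    weight = trans * alpha                            # standard quadrature weights
    # each sample's color is a density-weighted mix of the two sub-models
    mix = np.where(sigma > 0, sigma_body / np.maximum(sigma, 1e-9), 0.5)
    color = mix[:, None] * c_body + (1.0 - mix)[:, None] * c_cloth
    return weight @ color                             # rendered RGB, shape (3,)
```

With the clothing densities set to zero, this reduces to standard single-field volume rendering of the body alone, which is the decoupling property (swappable clothes) the abstract emphasizes.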
Digital Image Processing
This book presents several recent advances related to or falling under the umbrella of 'digital image processing', with the purpose of providing insight into the possibilities offered by digital image processing algorithms in various fields. The mathematical algorithms presented are accompanied by graphical representations and illustrative examples for enhanced readability. The chapters are written so that even a reader with basic experience and knowledge of digital image processing can properly understand the presented algorithms. At the same time, the information is structured so that fellow scientists can use it to push the development of the presented subjects even further.
mHealth hyperspectral learning for instantaneous spatiospectral imaging of hemodynamics
Hyperspectral imaging acquires data in both the spatial and frequency domains
to offer abundant physical or biological information. However, conventional
hyperspectral imaging has intrinsic limitations of bulky instruments, slow data
acquisition rate, and spatiospectral tradeoff. Here we introduce hyperspectral
learning for snapshot hyperspectral imaging in which sampled hyperspectral data
in a small subarea are incorporated into a learning algorithm to recover the
hypercube. Hyperspectral learning exploits the idea that a photograph is more
than merely a picture and contains detailed spectral information. A small
sampling of hyperspectral data enables spectrally informed learning to recover
a hypercube from an RGB image. Hyperspectral learning is capable of recovering
full spectroscopic resolution in the hypercube, comparable to high spectral
resolutions of scientific spectrometers. Hyperspectral learning also enables
ultrafast dynamic imaging, leveraging ultraslow video recording in an
off-the-shelf smartphone, given that a video comprises a time series of
multiple RGB images. To demonstrate its versatility, an experimental model of
vascular development is used to extract hemodynamic parameters via statistical
and deep-learning approaches. Subsequently, the hemodynamics of peripheral
microcirculation is assessed at an ultrafast temporal resolution up to a
millisecond, using a conventional smartphone camera. This spectrally informed
learning method is analogous to compressed sensing; however, it further allows
for reliable hypercube recovery and key feature extractions with a transparent
learning algorithm. This learning-powered snapshot hyperspectral imaging method
yields high spectral and temporal resolutions and eliminates the spatiospectral
tradeoff, offering simple hardware requirements and potential applications of
various machine-learning techniques.
Comment: This paper will appear in PNAS Nexu
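The RGB-to-hypercube recovery described above can be illustrated with a hedged toy sketch. Everything below is a hypothetical setup, not the authors' algorithm: spectra are assumed to lie in a low-dimensional subspace, hyperspectral ground truth is available only in a small sampled subarea, and a simple ridge regression maps RGB values to full spectra.

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical toy scene: per-pixel spectra (31 bands) lying in a
# 3-dimensional subspace, as natural reflectances approximately do
H, W, B = 40, 40, 31
basis = rng.random((3, B))
coeff = rng.random((H * W, 3))
cube = coeff @ basis                      # ground-truth hypercube, (H*W, B)

response = rng.random((B, 3))             # stand-in RGB camera response curves
rgb = cube @ response                     # the snapshot RGB image, (H*W, 3)

# "sampled subarea": hyperspectral ground truth known for ~10% of pixels
idx = rng.choice(H * W, H * W // 10, replace=False)
X, Y = rgb[idx], cube[idx]

# spectrally informed learning: ridge regression mapping RGB -> spectrum
lam = 1e-8
Wmap = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ Y)   # (3, B)

cube_hat = rgb @ Wmap                     # recovered hypercube at every pixel
err = np.linalg.norm(cube_hat - cube) / np.linalg.norm(cube)
```

Because the toy spectra are exactly 3-dimensional, a linear RGB-to-spectrum map recovers the hypercube almost perfectly here; real scenes need the richer (statistical or deep) learners the abstract mentions.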
Neural Rendering and Its Hardware Acceleration: A Review
Neural rendering is a new class of deep-learning-based image and video
generation methods. It combines deep learning models with physical knowledge
from computer graphics to obtain controllable, realistic scene models, enabling
control of scene attributes such as lighting, camera parameters, and pose. On
one hand, neural rendering can exploit the strengths of deep learning to
accelerate the traditional forward rendering process, and it also provides new
solutions for specific tasks such as inverse rendering and 3D reconstruction.
On the other hand, innovative hardware architectures tailored to the neural
rendering pipeline can break through the parallel-computing and
power-consumption bottlenecks of existing graphics processors, and are expected
to provide important support for future key areas such as virtual and augmented
reality, film and television creation, digital entertainment, artificial
intelligence, and the metaverse.
In this paper, we review the technical foundations, main challenges, and
research progress of neural rendering. On this basis, we analyze the common
hardware-acceleration requirements of the neural rendering pipeline and the
characteristics of current hardware-acceleration architectures, and then
discuss the design challenges of neural rendering processor architectures.
Finally, we consider future development trends for neural rendering processor
architecture.