Advances in 3D Neural Stylization: A Survey
Modern artificial intelligence provides a novel way of producing digital art
in a variety of styles. The expressive power of neural networks underpins visual
style transfer methods, which can be used to edit images, videos, and 3D data
to make them more artistic and diverse. This paper reports on recent advances
in neural stylization for 3D data. We provide a taxonomy for neural stylization
by considering several important design choices, including scene
representation, guidance data, optimization strategies, and output styles.
Building on this taxonomy, our survey first revisits the background of neural
stylization on 2D images, and then provides in-depth discussions on recent
neural stylization methods for 3D data, where we also provide a mini-benchmark
on artistic stylization methods. Based on the insights gained from the survey,
we then discuss open challenges, future research, and potential applications
and impacts of neural stylization.
Comment: 26 pages
Towards Artistic Image Aesthetics Assessment: a Large-scale Dataset and a New Method
Image aesthetics assessment (IAA) is a challenging task due to its highly
subjective nature. Most of the current studies rely on large-scale datasets
(e.g., AVA and AADB) to learn a general model for all kinds of photography
images. However, little light has been shed on measuring the aesthetic quality
of artistic images, and the existing datasets only contain relatively few
artworks. Such a defect is a great obstacle to the aesthetic assessment of
artistic images. To fill the gap in the field of artistic image aesthetics
assessment (AIAA), we first introduce a large-scale AIAA dataset: Boldbrush
Artistic Image Dataset (BAID), which consists of 60,337 artistic images
covering various art forms, with more than 360,000 votes from online users. We
then propose a new method, SAAN (Style-specific Art Assessment Network), which
can effectively extract and utilize style-specific and generic aesthetic
information to evaluate artistic images. Experiments demonstrate that our
proposed approach outperforms existing IAA methods on the proposed BAID dataset
according to quantitative comparisons. We believe the proposed dataset and
method can serve as a foundation for future AIAA works and inspire more
research in this field. Dataset and code are available at:
https://github.com/Dreemurr-T/BAID.git
Comment: Accepted by CVPR 202
Neural Radiance Fields: Past, Present, and Future
Modeling and interpreting 3D environments and surroundings has long driven
research in 3D Computer Vision, Computer Graphics, and Machine Learning. The
paper by Mildenhall et al. on NeRFs (Neural Radiance Fields) led to a boom in
Computer Graphics, Robotics, and Computer Vision, and the prospect of
high-resolution, low-storage Augmented Reality and Virtual Reality-based 3D
models has gained traction among researchers, with more than 1000 preprints
related to NeRFs published. This paper serves as a bridge for people starting
to study these fields by building from the basics of Mathematics, Geometry,
Computer Vision, and Computer Graphics up to the difficulties encountered in Implicit
Representations at the intersection of all these disciplines. This survey
provides the history of rendering, Implicit Learning, and NeRFs, the
progression of research on NeRFs, and the potential applications and
implications of NeRFs in today's world. In doing so, this survey categorizes
all the NeRF-related research in terms of the datasets used, objective
functions, applications solved, and evaluation criteria for these applications.
Comment: 413 pages, 9 figures, 277 citations
Evaluation in neural style transfer: a review
The field of neural style transfer (NST) has witnessed remarkable progress in the past few years, with approaches being able to synthesize artistic and photorealistic images and videos of exceptional quality. To evaluate such results, a diverse landscape of evaluation methods and metrics is used, including authors' opinions based on side-by-side comparisons, human evaluation studies that quantify the subjective judgements of participants, and a multitude of quantitative computational metrics which objectively assess the different aspects of an algorithm's performance. However, there is no consensus regarding the most suitable and effective evaluation procedure that can guarantee the reliability of the results. In this review, we provide an in-depth analysis of existing evaluation techniques, identify the inconsistencies and limitations of current evaluation methods, and give recommendations for standardized evaluation practices. We believe that the development of a robust evaluation framework will not only enable more meaningful and fairer comparisons among NST methods but will also enhance the comprehension and interpretation of research findings in the field.
Visual Representation Learning with Limited Supervision
The quality of a Computer Vision system is proportional to the rigor of the data representation it is built upon. Learning expressive representations of images is therefore the centerpiece of almost every computer vision application, including image search, object detection and classification, human re-identification, object tracking, pose understanding, image-to-image translation, and embodied agent navigation, to name a few. Deep Neural Networks are most often seen among the modern methods of representation learning. The limitation is, however, that deep representation learning methods require extremely large amounts of manually labeled data for training. Clearly, annotating vast amounts of images for various environments is infeasible due to cost and time constraints. This requirement of obtaining labeled data is a prime restriction on the pace of development of visual recognition systems.
In order to cope with the exponentially growing amounts of visual data generated daily, machine learning algorithms have to at least strive to scale at a similar rate.
The second challenge consists in the learned representations having to generalize to novel objects, classes, environments and tasks in order to accommodate the diversity of the visual world.
Despite the ever-growing number of recent publications tangentially addressing the topic of learning generalizable representations, efficient generalization is yet to be achieved. This dissertation attempts to tackle the problem of learning visual representations that can generalize to novel settings while requiring few labeled examples.
In this research, we study the limitations of the existing supervised representation learning approaches and propose a framework that improves the generalization of learned features by exploiting visual similarities between images which are not captured by provided manual annotations. Furthermore, to mitigate the common requirement of large scale manually annotated datasets, we propose several approaches that can learn expressive representations without human-attributed labels, in a self-supervised fashion, by grouping highly-similar samples into surrogate classes based on progressively learned representations.
The development of computer vision as a science is preconditioned upon the seamless ability of a machine to record and disentangle picture attributes that were expected to be conceived only by humans. As such, particular interest has been dedicated to the ability to analyze the means of artistic expression and style, which is a more complex task than merely breaking an image down into colors and pixels. The ultimate test for this ability is the task of style transfer, which involves altering the style of an image while keeping its content. An effective solution to style transfer requires learning an image representation that allows image style to be disentangled from content.
Moreover, particular artistic styles come with idiosyncrasies that affect which content details should be preserved and which discarded.
Another pitfall here is that it is impossible to get pixel-wise annotations of style and how the style should be altered.
We address this problem by proposing an unsupervised approach that encodes the image content in the way a particular style requires.
The proposed approach exchanges the style of an input image by first extracting the content representation in a style-aware way and then rendering it in a new style using a style-specific decoder network, achieving compelling results in image and video stylization.
Finally, we combine supervised and self-supervised representation learning techniques for the task of human and animal pose understanding. The proposed method enables transfer of the representation learned for recognition of human poses to proximal mammal species without using labeled animal images. This approach is not limited to dense pose estimation and could potentially enable autonomous agents, from robots to self-driving cars, to retrain themselves and adapt to novel environments based on learning from previous experiences.
- …