631 research outputs found
Relit-NeuLF: Efficient Relighting and Novel View Synthesis via Neural 4D Light Field
In this paper, we address the problem of simultaneous relighting and novel
view synthesis of a complex scene from multi-view images with a limited number
of light sources. We propose an analysis-synthesis approach called Relit-NeuLF.
Following the recent neural 4D light field network (NeuLF), Relit-NeuLF first
leverages a two-plane light field representation to parameterize each ray in a
4D coordinate system, enabling efficient learning and inference. Then, we
recover the spatially-varying bidirectional reflectance distribution function
(SVBRDF) of a 3D scene in a self-supervised manner. A DecomposeNet learns to
map each ray to its SVBRDF components: albedo, normal, and roughness. Based on
the decomposed BRDF components and conditioning light directions, a RenderNet
learns to synthesize the color of the ray. To self-supervise the SVBRDF
decomposition, we encourage the predicted ray color to be close to the
physically-based rendering result using the microfacet model. Comprehensive
experiments demonstrate that the proposed method is efficient and effective on
both synthetic data and real-world human face data, and outperforms the
state-of-the-art results. We publicly released our code on GitHub. You can find
it here: https://github.com/oppo-us-research/RelitNeuLFComment: 10 page
Neural Radiance Fields: Past, Present, and Future
The various aspects like modeling and interpreting 3D environments and
surroundings have enticed humans to progress their research in 3D Computer
Vision, Computer Graphics, and Machine Learning. An attempt made by Mildenhall
et al in their paper about NeRFs (Neural Radiance Fields) led to a boom in
Computer Graphics, Robotics, Computer Vision, and the possible scope of
High-Resolution Low Storage Augmented Reality and Virtual Reality-based 3D
models have gained traction from res with more than 1000 preprints related to
NeRFs published. This paper serves as a bridge for people starting to study
these fields by building on the basics of Mathematics, Geometry, Computer
Vision, and Computer Graphics to the difficulties encountered in Implicit
Representations at the intersection of all these disciplines. This survey
provides the history of rendering, Implicit Learning, and NeRFs, the
progression of research on NeRFs, and the potential applications and
implications of NeRFs in today's world. In doing so, this survey categorizes
all the NeRF-related research in terms of the datasets used, objective
functions, applications solved, and evaluation criteria for these applications.Comment: 413 pages, 9 figures, 277 citation
RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars
Synthesizing high-fidelity head avatars is a central problem for computer
vision and graphics. While head avatar synthesis algorithms have advanced
rapidly, the best ones still face great obstacles in real-world scenarios. One
of the vital causes is inadequate datasets -- 1) current public datasets can
only support researchers to explore high-fidelity head avatars in one or two
task directions; 2) these datasets usually contain digital head assets with
limited data volume, and narrow distribution over different attributes. In this
paper, we present RenderMe-360, a comprehensive 4D human head dataset to drive
advance in head avatar research. It contains massive data assets, with 243+
million complete head frames, and over 800k video sequences from 500 different
identities captured by synchronized multi-view cameras at 30 FPS. It is a
large-scale digital library for head avatars with three key attributes: 1) High
Fidelity: all subjects are captured by 60 synchronized, high-resolution 2K
cameras in 360 degrees. 2) High Diversity: The collected subjects vary from
different ages, eras, ethnicities, and cultures, providing abundant materials
with distinctive styles in appearance and geometry. Moreover, each subject is
asked to perform various motions, such as expressions and head rotations, which
further extend the richness of assets. 3) Rich Annotations: we provide
annotations with different granularities: cameras' parameters, matting, scan,
2D/3D facial landmarks, FLAME fitting, and text description.
Based on the dataset, we build a comprehensive benchmark for head avatar
research, with 16 state-of-the-art methods performed on five main tasks: novel
view synthesis, novel expression synthesis, hair rendering, hair editing, and
talking head generation. Our experiments uncover the strengths and weaknesses
of current methods. RenderMe-360 opens the door for future exploration in head
avatars.Comment: Technical Report; Project Page: 36; Github Link:
https://github.com/RenderMe-360/RenderMe-36
Multimedia Forensics
This book is open access. Media forensics has never been more relevant to societal life. Not only media content represents an ever-increasing share of the data traveling on the net and the preferred communications means for most users, it has also become integral part of most innovative applications in the digital information ecosystem that serves various sectors of society, from the entertainment, to journalism, to politics. Undoubtedly, the advances in deep learning and computational imaging contributed significantly to this outcome. The underlying technologies that drive this trend, however, also pose a profound challenge in establishing trust in what we see, hear, and read, and make media content the preferred target of malicious attacks. In this new threat landscape powered by innovative imaging technologies and sophisticated tools, based on autoencoders and generative adversarial networks, this book fills an important gap. It presents a comprehensive review of state-of-the-art forensics capabilities that relate to media attribution, integrity and authenticity verification, and counter forensics. Its content is developed to provide practitioners, researchers, photo and video enthusiasts, and students a holistic view of the field
Multimedia Forensics
This book is open access. Media forensics has never been more relevant to societal life. Not only media content represents an ever-increasing share of the data traveling on the net and the preferred communications means for most users, it has also become integral part of most innovative applications in the digital information ecosystem that serves various sectors of society, from the entertainment, to journalism, to politics. Undoubtedly, the advances in deep learning and computational imaging contributed significantly to this outcome. The underlying technologies that drive this trend, however, also pose a profound challenge in establishing trust in what we see, hear, and read, and make media content the preferred target of malicious attacks. In this new threat landscape powered by innovative imaging technologies and sophisticated tools, based on autoencoders and generative adversarial networks, this book fills an important gap. It presents a comprehensive review of state-of-the-art forensics capabilities that relate to media attribution, integrity and authenticity verification, and counter forensics. Its content is developed to provide practitioners, researchers, photo and video enthusiasts, and students a holistic view of the field
Vision for Social Robots: Human Perception and Pose Estimation
In order to extract the underlying meaning from a scene captured from the surrounding world in a single still image, social robots will need to learn the human ability to detect different objects, understand their arrangement and relationships relative both to their own parts and to each other, and infer the dynamics under which they are evolving. Furthermore, they will need to develop and hold a notion of context to allow assigning different meanings (semantics) to the same visual configuration (syntax) of a scene.
The underlying thread of this Thesis is the investigation of new ways for enabling interactions between social robots and humans, by advancing the visual perception capabilities of robots when they process images and videos in which humans are the main focus of attention.
First, we analyze the general problem of scene understanding, as social robots moving through the world need to be able to interpret scenes without having been assigned a specific preset goal. Throughout this line of research, i) we observe that human actions and interactions which can be visually discriminated from an image follow a very heavy-tailed distribution; ii) we develop an algorithm that can obtain a spatial understanding of a scene by only using cues arising from the effect of perspective on a picture of a person’s face; and iii) we define a novel taxonomy of errors for the task of estimating the 2D body pose of people in images to better explain the behavior of algorithms and highlight their underlying causes of error.
Second, we focus on the specific task of 3D human pose and motion estimation from monocular 2D images using weakly supervised training data, as accurately predicting human pose will open up the possibility of richer interactions between humans and social robots. We show that when 3D ground-truth data is only available in small quantities, or not at all, it is possible to leverage knowledge about the physical properties of the human body, along with additional constraints related to alternative types of supervisory signals, to learn models that can regress the full 3D pose of the human body and predict its motions from monocular 2D images.
Taken in its entirety, the intent of this Thesis is to highlight the importance of, and provide novel methodologies for, social robots' ability to interpret their surrounding environment, learn in a way that is robust to low data availability, and generalize previously observed behaviors to unknown situations in a similar way to humans.</p
Revealing the Invisible: On the Extraction of Latent Information from Generalized Image Data
The desire to reveal the invisible in order to explain the world around us has been a source of impetus for technological and scientific progress throughout human history. Many of the phenomena that directly affect us cannot be sufficiently explained based on the observations using our primary senses alone. Often this is because their originating cause is either too small, too far away, or in other ways obstructed. To put it in other words: it is invisible to us. Without careful observation and experimentation, our models of the world remain inaccurate and research has to be conducted in order to improve our understanding of even the most basic effects. In this thesis, we1 are going to present our solutions to three challenging problems in visual computing, where a surprising amount of information is hidden in generalized image data and cannot easily be extracted by human observation or existing methods. We are able to extract the latent information using non-linear and discrete optimization methods based on physically motivated models and computer graphics methodology, such as ray tracing, real-time transient rendering, and image-based rendering
Recommended from our members
Proceedings of EVA London 2024
The Electronic Visualisation and the Arts London 2024 Conference (EVA London 2024) is co-sponsored by the Computer Arts Society (CAS) and BCS, the Chartered Institute for IT, of which the CAS is a Specialist Group. As for 2022, the EVA London 2023 Conference is a physical and online “hybrid” conference. We continue with publishing the proceedings, both online, with open access via ScienceOpen, and also in our traditional printed form, in full colour. The main conference presentations run during 10–13 July 2023, with workshops and other activities, especially for students, on 14 July 2023
- …