
980 research outputs found

Video modeling via implicit motion representations

Author: Zheng Yunfei
Publication venue: The Research Repository @ WVU
Publication date: 01/12/2008

Video modeling refers to the development of analytical representations for explaining the intensity distribution in video signals. Based on the analytical representation, we can develop algorithms for accomplishing particular video-related tasks. Video modeling therefore provides a foundation for bridging video data and related tasks. Although many video models have been proposed in the past decades, the rise of new applications calls for more efficient and accurate video modeling approaches.

Most existing video modeling approaches are based on explicit motion representations, where motion information is explicitly expressed by correspondence-based representations (i.e., motion velocity or displacement). Although such representations are conceptually simple, their limitations and the suboptimality of motion estimation techniques can degrade these video modeling approaches, especially when handling complex motion or non-ideal observation video data. In this thesis, we propose to investigate video modeling without explicit motion representation. Motion information is implicitly embedded into the spatio-temporal dependency among pixels or patches instead of being explicitly described by motion vectors.

First, we propose a parametric model based on spatio-temporal adaptive localized learning (STALL). We formulate video modeling as a linear regression problem in which motion information is embedded within the regression coefficients. The coefficients are adaptively learned within a local space-time window based on the LMMSE criterion. By incorporating spatio-temporal resampling and a Bayesian fusion scheme, we can enhance the modeling capability of STALL on more general videos. Under the framework of STALL, we can develop video processing algorithms for a variety of applications by adjusting model parameters (i.e., the size and topology of the model support and training window). We apply STALL to three video processing problems. The simulation results show that motion information can be efficiently exploited by our implicit motion representation and that resampling and fusion do help to enhance the modeling capability of STALL.

Second, we propose a nonparametric video modeling approach that does not depend on explicit motion estimation. Assuming the video sequence is composed of many overlapping space-time patches, we propose to embed motion-related information into the relationships among video patches and develop a generic sparsity-based prior for typical video sequences. First, we extend block matching to more general kNN-based patch clustering, which provides an implicit and distributed representation of motion information. We propose to enforce the sparsity constraint on a higher-dimensional data array, which is generated by packing the patches in the similar-patch set. Then we solve the inference problem by iteratively updating the kNN array and the desired signal. Finally, we present a Bayesian fusion approach to fuse multiple-hypothesis inferences. Simulation results in video error concealment, denoising, and deartifacting are reported to demonstrate its modeling capability.

Finally, we summarize the two proposed video modeling approaches. We also point out the prospects of implicit motion representations in applications ranging from low-level to high-level problems.
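The idea of embedding motion in regression coefficients can be illustrated with a toy example. The sketch below is not the thesis's STALL algorithm: it assumes 1-D "frames", a three-pixel support in the previous frame, and a plain least-squares fit (with a small ridge term for stability) inside a local training window. The function names `solve` and `fit_local_coefficients` and all parameters are hypothetical.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination
    with partial pivoting (sufficient for the 3x3 systems used here)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_local_coefficients(prev, curr, center, half_window, ridge=1e-6):
    """Fit weights w so that curr[i] ~ w[0]*prev[i-1] + w[1]*prev[i] + w[2]*prev[i+1]
    using only the pixels inside a local window around `center`."""
    lo = max(1, center - half_window)
    hi = min(len(curr) - 1, center + half_window + 1)
    rows = [[prev[i - 1], prev[i], prev[i + 1]] for i in range(lo, hi)]
    targets = [curr[i] for i in range(lo, hi)]
    # Normal equations (A^T A + ridge * I) w = A^T b.
    ata = [[sum(r[p] * r[q] for r in rows) + (ridge if p == q else 0.0)
            for q in range(3)] for p in range(3)]
    atb = [sum(r[p] * t for r, t in zip(rows, targets)) for p in range(3)]
    return solve(ata, atb)


# Toy data: the current frame is the previous frame shifted right by one pixel.
prev_frame = [(7 * i * i + 3 * i) % 11 for i in range(20)]
curr_frame = [prev_frame[0]] + prev_frame[:-1]
weights = fit_local_coefficients(prev_frame, curr_frame, center=10, half_window=6)
print(weights)  # weight on the left neighbour is close to 1, the others close to 0
```

In this toy case no motion vector is ever estimated, yet the learned weights concentrate on the left neighbour, which is how a one-pixel rightward shift shows up implicitly in the coefficients.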

The Research Repository @ WVU (West Virginia University)

GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images

Author: Chen Wenzheng
Fidler Sanja
Gao Jun
Gojcic Zan
Li Daiqing
Litany Or
Shen Tianchang
Wang Zian
Yin Kangxue
Publication date: 22/09/2022

As several industries are moving towards modeling massive 3D virtual worlds, the need for content creation tools that can scale in terms of the quantity, quality, and diversity of 3D content is becoming evident. In our work, we aim to train performant 3D generative models that synthesize textured meshes which can be directly consumed by 3D rendering engines and are thus immediately usable in downstream applications. Prior works on 3D generative modeling either lack geometric details, are limited in the mesh topology they can produce, typically do not support textures, or utilize neural renderers in the synthesis process, which makes their use in common 3D software non-trivial. In this work, we introduce GET3D, a Generative model that directly generates Explicit Textured 3D meshes with complex topology, rich geometric details, and high-fidelity textures. We bridge recent successes in differentiable surface modeling, differentiable rendering, and 2D Generative Adversarial Networks to train our model from 2D image collections. GET3D is able to generate high-quality 3D textured meshes, ranging from cars, chairs, animals, motorbikes and human characters to buildings, achieving significant improvements over previous methods.

Comment: NeurIPS 2022, Project Page: https://nv-tlabs.github.io/GET3D

arXiv.org e-Print Archive

Local Motion Phases for Learning Multi-Contact Character Movements

Author: Clavet Simon
Felix
Kazi Zaman
Li Zimo
Mirza Mehdi
Sebastian Starke
Taku Komura
Yiwei Zhao
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 08/07/2020

Crossref

Edinburgh Research Explorer

Non-stationary demand forecasting by cross-sectional aggregation

Author: Babai M.Z.
Ducq Y.
Rostami-Tabar Bahman
Syntetos A.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015

In this paper the relative effectiveness of top-down (TD) versus bottom-up (BU) approaches is compared for cross-sectionally forecasting aggregate and sub-aggregate demand. We assume that the sub-aggregate demand follows a non-stationary Integrated Moving Average (IMA) process of order one and that a Single Exponential Smoothing (SES) procedure is used to extrapolate future requirements. Such demand processes are often encountered in practice, and SES is one of the standard estimators used in industry (in addition to being the optimal estimator for an IMA process). Theoretical variances of forecast error are derived for the BU and TD approaches in order to contrast the relevant forecasting performances. The theoretical analysis is supported by an extensive numerical investigation at both the aggregate and sub-aggregate level, in addition to empirically validating our findings on a real dataset from a European superstore. The results demonstrate that cross-sectional forecasting yields a greater benefit in a non-stationary environment than in a stationary one. Valuable insights are offered to demand planners and the paper closes with an agenda for further research in this area. © 2015 Elsevier B.V. All rights reserved.
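The two forecasting routes compared in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's experimental setup: SES is initialised with the first observation, the top-down split uses historical demand shares, and the function names and demand figures are hypothetical.

```python
def ses_forecast(series, alpha):
    """One-step-ahead Single Exponential Smoothing forecast,
    initialised with the first observation (an assumption of this sketch)."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def bottom_up(sub_series, alphas):
    """BU: forecast each sub-aggregate series, then sum the forecasts."""
    return sum(ses_forecast(s, a) for s, a in zip(sub_series, alphas))


def top_down(sub_series, alpha):
    """TD: forecast the aggregate series, then split the forecast
    using each item's historical share of total demand."""
    aggregate = [sum(values) for values in zip(*sub_series)]
    total_forecast = ses_forecast(aggregate, alpha)
    grand_total = sum(aggregate)
    shares = [sum(s) / grand_total for s in sub_series]
    return total_forecast, [share * total_forecast for share in shares]


# Hypothetical demand histories for two items.
demand = [[10, 12, 11, 15, 14], [5, 4, 6, 5, 7]]
bu_total = bottom_up(demand, alphas=[0.3, 0.3])
td_total, td_items = top_down(demand, alpha=0.3)
print(bu_total, td_total, td_items)
```

Because SES is linear in the data, the BU and TD aggregate forecasts coincide when every series uses the same smoothing constant and initialisation; the two routes differ at the aggregate level only once the constants differ.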

Online Research @ Cardiff

Coventry University Pure Portal

Oskar Bordeaux

Automated Complexity-Sensitive Image Fusion

Author: Jackson Brian Patrick
Publication venue: CORE Scholar
Publication date: 01/01/2014

To construct a complete representation of a scene with environmental obstacles such as fog, smoke, darkness, or textural homogeneity, multisensor video streams captured in different modalities are considered. A computational method for automatically fusing multimodal image streams into a highly informative and unified stream is proposed. The method consists of the following steps:

1. Image registration is performed to align video frames in the visible band over time, adapting to the nonplanarity of the scene by automatically subdividing the image domain into regions approximating planar patches.
2. Wavelet coefficients are computed for each of the input frames in each modality.
3. Corresponding regions and points are compared using spatial and temporal information across various scales.
4. Decision rules based on the results of multimodal image analysis are used to combine the wavelet coefficients from different modalities.
5. The combined wavelet coefficients are inverted to produce an output frame containing useful information gathered from the available modalities.

Experiments show that the proposed system is capable of producing fused output containing the characteristics of color visible-spectrum imagery while adding information exclusive to infrared imagery, with attractive visual and informational properties.
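Steps 2, 4, and 5 of the method above can be sketched on 1-D signals. The sketch assumes a single-level unnormalised Haar transform (pairwise averages and differences), even-length inputs that are already registered, and a simple "keep the larger-magnitude detail coefficient" decision rule. None of these choices is taken from the thesis, and the function names are hypothetical.

```python
def haar_forward(signal):
    """Single-level unnormalised Haar transform of an even-length signal:
    pairwise averages (approximation) and half-differences (detail)."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward exactly."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])
    return out


def fuse(signal_a, signal_b):
    """Fuse two registered signals in the wavelet domain:
    average the approximations, keep the stronger detail coefficient."""
    approx_a, detail_a = haar_forward(signal_a)
    approx_b, detail_b = haar_forward(signal_b)
    approx = [(x + y) / 2 for x, y in zip(approx_a, approx_b)]
    detail = [x if abs(x) >= abs(y) else y for x, y in zip(detail_a, detail_b)]
    return haar_inverse(approx, detail)


# Hypothetical "visible" and "infrared" scanlines: each has an edge the other lacks.
visible = [4, 2, 5, 5]
infrared = [3, 3, 9, 1]
print(fuse(visible, infrared))  # [4.0, 2.0, 9.0, 1.0]
```

The fused output keeps the edge from the first pair of the visible signal and the edge from the second pair of the infrared signal, which is the behaviour a max-magnitude rule is meant to produce.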

OhioLINK Electronic Thesis and Dissertation Center

CORE


Automated tracking and grasping of a moving object with a robotic hand-eye system

Author: Allen Peter K.
Michelman Paul
Timcenko Aleksandar
Yoshimi Billibon
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/1993

An attempt to achieve a high level of interaction between a real-time vision system capable of tracking moving objects in 3-D and a robot arm with gripper that can be used to pick up a moving object is described. The interplay of hand-eye coordination in dynamic grasping tasks such as grasping of parts on a moving conveyor system, assembly of articulated parts, or for grasping from a mobile robotic system is explored. The goal is to build an integrated sensing and actuation system that can operate in dynamic as opposed to static environments. The system built addresses three distinct problems in using robotic hand-eye coordination for grasping moving objects: fast computation of 3-D motion parameters from vision, predictive control of a moving robotic arm to track a moving object, and interception and grasping. The system operates at approximately human arm movement rates. Experimental results in which a moving model train is tracked, stably grasped, and picked up by the system are presented. The algorithms developed to relate sensing to actuation are quite general and applicable to a variety of complex robotic tasks

Columbia University Academic Commons

A review of differentiable digital signal processing for music and speech synthesis

Author: Fazekas G
Hayes B
McPherson A
Saitis C
Shier J
Publication venue: Frontiers Media
Publication date: 11/01/2024

The term “differentiable digital signal processing” describes a family of techniques in which loss function gradients are backpropagated through digital signal processors, facilitating their integration into neural networks. This article surveys the literature on differentiable audio signal processing, focusing on its use in music and speech synthesis. We catalogue applications to tasks including music performance rendering, sound matching, and voice transformation, discussing the motivations for and implications of the use of this methodology. This is accompanied by an overview of digital signal processing operations that have been implemented differentiably, which is further supported by a web book containing practical advice on differentiable synthesiser programming (https://intro2ddsp.github.io/). Finally, we highlight open challenges, including optimisation pathologies, robustness to real-world conditions, and design trade-offs, and discuss directions for future research
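As a minimal illustration of backpropagating a loss gradient through a signal processor, the sketch below fits the amplitude of a sinusoidal oscillator to a target by gradient descent. It is an assumption-laden toy, not code from the review or its web book: the gradient is derived by hand rather than by an automatic differentiation framework, only the amplitude is learned (the frequency is assumed known), and all names and parameter values are hypothetical.

```python
import math


def oscillator(amplitude, freq_hz, n_samples, sample_rate):
    """Render a sine wave; the output is a differentiable function of amplitude."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


def loss_and_grad(amplitude, target, freq_hz, sample_rate):
    """Mean squared error between oscillator output and target,
    plus its derivative with respect to amplitude (chain rule by hand)."""
    n = len(target)
    basis = oscillator(1.0, freq_hz, n, sample_rate)  # d(output)/d(amplitude)
    residual = [amplitude * b - t for b, t in zip(basis, target)]
    loss = sum(r * r for r in residual) / n
    grad = sum(2 * r * b for r, b in zip(residual, basis)) / n
    return loss, grad


def fit_amplitude(target, freq_hz, sample_rate, steps=200, lr=0.5, init=0.0):
    """Plain gradient descent on the amplitude parameter."""
    amplitude = init
    for _ in range(steps):
        _, grad = loss_and_grad(amplitude, target, freq_hz, sample_rate)
        amplitude -= lr * grad
    return amplitude


target = oscillator(0.8, 440.0, 800, 8000)
print(fit_amplitude(target, 440.0, 8000))  # converges to roughly 0.8
```

The loss is quadratic in the amplitude, so this case is well behaved; fitting the frequency by the same procedure is not convex, which is one example of the optimisation pathologies the abstract mentions.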

Queen Mary Research Online