
    Inner Space Preserving Generative Pose Machine

    Image-based generative methods, such as generative adversarial networks (GANs), can already generate realistic images with a high degree of contextual control, especially when they are conditioned. However, most successful frameworks share a common procedure: an image-to-image translation that leaves the pose of the figures in the image untouched. When the objective is to repose a figure in an image while preserving the rest of the image, state-of-the-art methods mainly assume a single rigid body with a simple background and limited pose shift, assumptions that hardly extend to images taken in ordinary settings. In this paper, we introduce an image "inner space" preserving model that assigns an interpretable low-dimensional pose descriptor (LDPD) to an articulated figure in the image. The reposed figure is then generated by passing the LDPD and the original image through multi-stage augmented hourglass networks in a conditional GAN structure, called the inner space preserving generative pose machine (ISP-GPM). We evaluated ISP-GPM on reposing human figures, which are highly articulated and vary widely in appearance. Testing a state-of-the-art pose estimator on our reposed dataset gave an accuracy above 80% on the PCK0.5 metric. The results also show that ISP-GPM preserves the background with high accuracy while reasonably recovering the area occluded by the figure to be reposed.
    Comment: http://www.northeastern.edu/ostadabbas/2018/07/23/inner-space-preserving-generative-pose-machine
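    PCK (percentage of correct keypoints) at a threshold of 0.5 counts a predicted joint as correct when its distance to the ground-truth joint is at most 0.5 times a per-image reference length. The abstract does not state which reference length the ISP-GPM evaluation used (head-segment size, as in PCKh, and torso size are common choices), so the minimal sketch below takes that length as an input. The function name `pck` and its argument layout are hypothetical, and joint-visibility flags are ignored for brevity.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pck(pred: Sequence[Sequence[Point]],
        gt: Sequence[Sequence[Point]],
        ref_lengths: Sequence[float],
        alpha: float = 0.5) -> float:
    """Percentage of correct keypoints, returned as a fraction in [0, 1].

    pred, gt    -- per-image lists of (x, y) joint positions, same joint order
    ref_lengths -- per-image normalization length (assumption: whether this is
                   head size or torso size is left to the caller)
    alpha       -- threshold as a fraction of the reference length (0.5 -> PCK0.5)
    """
    correct = 0
    total = 0
    for pred_joints, gt_joints, ref in zip(pred, gt, ref_lengths):
        threshold = alpha * ref
        for (px, py), (gx, gy) in zip(pred_joints, gt_joints):
            total += 1
            # A joint counts as correct if it lies within the threshold distance.
            if math.hypot(px - gx, py - gy) <= threshold:
                correct += 1
    return correct / total if total else 0.0


# One image, two joints, reference length 10 -> threshold 5.
# Joint 1 is 5 px off (correct), joint 2 is 10 px off (incorrect).
print(pck([[(0, 0), (0, 0)]], [[(3, 4), (6, 8)]], [10.0]))  # 0.5
```

    Returning a fraction rather than a percentage keeps the function composable; multiply by 100 to report figures such as "over 80%".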

    Active Tactile Sensing for Texture Perception in Robotic Systems

    This thesis presents a comprehensive study of tactile sensing, focusing on the problem of active texture perception. It includes a brief introduction to tactile sensing technology and the neural basis of tactile perception, followed by a literature review of texture perception with tactile sensing. I propose a decoding and perception pipeline to tackle fine-texture classification/identification problems via active touch. Experiments are conducted using a 7-DOF robotic arm with a finger-shaped tactile sensor mounted on the end-effector to perform sliding/rubbing movements on multiple fabrics. Low-dimensional frequency features are extracted from the raw signals to form a perceptual feature space, in which tactile signals are mapped and segregated into fabric classes. Fabric classes can be parameterized and simplified in the feature space using elliptical equations. Results from experiments with varied control parameters are compared and visualized, showing that different exploratory movements have a clear impact on the perceived tactile information. This suggests that the robotic movements could be optimised to improve texture classification/identification performance.
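    The thesis does not specify which frequency features it extracts or how its ellipses are fitted, so the following is only a minimal sketch under stated assumptions: the raw tactile signal is reduced to normalized spectral magnitudes summed over a few hypothetical frequency bands, and a fabric class is represented by a hypothetical axis-aligned ellipse in that feature space. A plain DFT is written out so the example needs only the standard library; a real pipeline would use an FFT.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def magnitude_spectrum(signal: Sequence[float]) -> List[float]:
    """One-sided DFT magnitudes (bins 0..n//2); O(n^2), written out for clarity."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC offset (sensor bias)
    return [
        abs(sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def band_features(signal: Sequence[float], sample_rate: float,
                  bands: Sequence[Tuple[float, float]]) -> List[float]:
    """Sum spectral magnitude inside each [lo, hi) Hz band, then normalize to sum 1.
    The bands themselves are a hypothetical choice, not taken from the thesis."""
    n = len(signal)
    spectrum = magnitude_spectrum(signal)
    feats = [
        sum(mag for k, mag in enumerate(spectrum) if lo <= k * sample_rate / n < hi)
        for lo, hi in bands
    ]
    total = sum(feats)
    return [f / total for f in feats] if total > 0 else feats


def in_ellipse(point: Sequence[float], center: Sequence[float],
               semi_axes: Sequence[float]) -> bool:
    """Hypothetical class region: axis-aligned ellipse, sum(((p - c) / a)^2) <= 1."""
    return sum(((p - c) / a) ** 2 for p, c, a in zip(point, center, semi_axes)) <= 1.0


# Hypothetical rubbing signal: a 50 Hz vibration sampled at 400 Hz.
fs = 400.0
sig = [math.sin(2 * math.pi * 50 * t / fs) for t in range(64)]
features = band_features(sig, fs, [(0, 40), (40, 80), (80, 200)])
print([round(f, 3) for f in features])
```

    Normalizing the band sums makes the features insensitive to overall contact pressure, which is one plausible reason to prefer relative spectral shape over raw amplitude; whether the thesis does this is not stated.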

    AI-generated Content for Various Data Modalities: A Survey

    AI-generated content (AIGC) methods aim to produce text, images, videos, 3D assets, and other media using AI algorithms. Owing to their wide range of applications and the demonstrated potential of recent work, AIGC developments have attracted considerable attention, and AIGC methods have been developed for various data modalities, such as image, video, text, 3D shape (as voxels, point clouds, meshes, and neural implicit fields), 3D scene, 3D human avatar (body and head), 3D motion, and audio, each presenting different characteristics and challenges. Furthermore, there have also been many significant developments in cross-modality AIGC methods, where generative methods can receive conditioning input in one modality and produce outputs in another. Examples include going from various modalities to image, video, 3D shape, 3D scene, 3D avatar (body and head), 3D motion (skeleton and avatar), and audio modalities. In this paper, we provide a comprehensive review of AIGC methods across different data modalities, including both single-modality and cross-modality methods, highlighting the various challenges, representative works, and recent technical directions in each setting. We also survey the representative datasets throughout the modalities, and present comparative results for various modalities. Moreover, we discuss the challenges and potential future research directions.

    Models for Motion Perception

    As observers move through the environment or shift their direction of gaze, the world moves past them. In addition, there may be objects that are moving differently from the static background, either rigid-body motions or nonrigid (e.g., turbulent) ones. This dissertation discusses several models for motion perception. The models rely on first measuring motion energy, a multi-resolution representation of motion information extracted from image sequences. The image flow model combines the outputs of a set of spatiotemporal motion-energy filters to estimate image velocity, consonant with current views regarding the neurophysiology and psychophysics of motion perception. A parallel implementation computes a distributed representation of image velocity that encodes both a velocity estimate and the uncertainty in that estimate. In addition, a numerical measure of image-flow uncertainty is derived. The egomotion model poses the detection of moving objects and the recovery of depth from motion as sensor fusion problems that necessitate combining information from different sensors in the presence of noise and uncertainty. Image sequences are segmented by finding image regions corresponding to entire objects that are moving differently from the stationary background. The turbulent flow model utilizes a fractal-based model of turbulence, and estimates the fractal scaling parameter of fractal image sequences from the outputs of motion-energy filters. Some preliminary results demonstrate the model's potential for discriminating image regions based on fractal scaling.

    Path Tracking by a Mobile Robot Equipped with Only a Downward Facing Camera

    This paper presents a practical path-tracking method for a mobile robot equipped with only a downward-facing camera aimed at the surface it travels on. A unique algorithm for tracking and searching ground images with natural texture localizes the robot without the feature-point extraction commonly used in other visual odometry methods. In our tracking algorithm, groups of reference pixels are used to detect the relative translation and rotation between frames. In addition, a reference pixel group of a different shape is registered both to record a path and to correct errors accumulated during localization. All image processing and robot control operations run on a laptop PC, with low memory consumption for image registration and fast search times. We also describe experiments in which a vehicle built with the proposed method repeatedly performed precise path tracking in indoor and outdoor environments.
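    The abstract says groups of reference pixels are matched between frames to detect relative translation and rotation, but it does not give the matching criterion. The following is a minimal sketch under assumptions that are not from the paper: a hypothetical `estimate_motion` function scores each candidate shift and rotation by the sum of absolute differences (SAD) between the reference pixels' intensities in the previous frame and the corresponding pixels in the current frame, using an exhaustive search over a small window and nearest-pixel sampling. No feature points are extracted; only raw intensities at the reference positions are compared.

```python
import math
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[int]]  # grayscale intensities, indexed image[y][x]


def estimate_motion(prev: Image, curr: Image,
                    ref_pixels: List[Tuple[int, int]],
                    max_shift: int = 3,
                    angles_deg: Sequence[float] = (0.0,)) -> Tuple[int, int, float]:
    """Return (dx, dy, angle_deg) that best maps the reference pixel group
    from `prev` onto `curr`, i.e. how far the ground texture moved in the image."""
    h, w = len(curr), len(curr[0])
    # Centroid of the group; rotations are applied about this point.
    cx = sum(x for x, _ in ref_pixels) / len(ref_pixels)
    cy = sum(y for _, y in ref_pixels) / len(ref_pixels)
    ref_values = [prev[y][x] for x, y in ref_pixels]

    best = None  # (sad, dx, dy, angle)
    for angle in angles_deg:
        c, s = math.cos(math.radians(angle)), math.sin(math.radians(angle))
        rotated = [(round(cx + c * (x - cx) - s * (y - cy)),
                    round(cy + s * (x - cx) + c * (y - cy))) for x, y in ref_pixels]
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                sad = 0
                for (rx, ry), v in zip(rotated, ref_values):
                    qx, qy = rx + dx, ry + dy
                    if not (0 <= qx < w and 0 <= qy < h):
                        break  # candidate pushes a reference pixel out of the frame
                    sad += abs(curr[qy][qx] - v)
                else:
                    # Only reached if every reference pixel stayed inside the frame.
                    if best is None or sad < best[0]:
                        best = (sad, dx, dy, angle)
    if best is None:
        raise ValueError("no candidate kept all reference pixels inside the frame")
    return best[1], best[2], best[3]


# Synthetic check: a ramp texture shifted 2 px right and 1 px up between frames.
def texture(x: int, y: int) -> int:
    return 3 * x + 50 * y


prev = [[texture(x, y) for x in range(20)] for y in range(20)]
curr = [[texture(x - 2, y + 1) for x in range(20)] for y in range(20)]
refs = [(8, 8), (10, 9), (12, 11), (9, 13), (14, 10), (11, 12)]
print(estimate_motion(prev, curr, refs))  # (2, -1, 0.0)
```

    Passing angles such as `(-2.0, 0.0, 2.0)` also searches small rotations; on weakly textured ground several candidates can tie, and this sketch simply keeps the first one found.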

    Acquisition, Modeling, and Augmentation of Reflectance for Synthetic Optical Flow Reference Data

    This thesis is concerned with the acquisition, modeling, and augmentation of material reflectance to simulate high-fidelity synthetic data for computer vision tasks. The topic is covered in three chapters. I begin by exploring the upper limits of reflectance acquisition: I analyze state-of-the-art BTF reflectance field renderings and show that they can be applied to optical flow performance analysis, with performance closely matching that on real-world images. Next, I present two methods for fitting efficient BRDF reflectance models to measured BTF data. Combined, the two methods retain all relevant reflectance information as well as surface normal details at the pixel level. I further show that the resulting synthesized images are suited to optical flow performance analysis, with virtually identical performance for all material types. Finally, I present a novel method for augmenting real-world datasets with physically plausible precipitation effects, including ground surface wetting, water droplets on the windshield, and water spray and mist. This is achieved by projecting the real-world image data onto a reconstructed virtual scene, manipulating the scene and the surface reflectance, and performing unbiased light transport simulation of the precipitation effects.

    Segmentation, Recognition, and Alignment of Collaborative Group Motion

    Modeling and recognition of human motion in videos has broad applications in behavioral biometrics, content-based visual data analysis, security and surveillance, as well as the design of interactive environments. Significant progress has been made over the past two decades through new models, methods, and implementations. In this dissertation, we focus on a relatively under-investigated sub-area called collaborative group motion analysis. Collaborative group motions typically involve multiple objects, where the motion patterns of individual objects may vary significantly in both space and time, but the collective motion pattern of the ensemble can be characterized in terms of geometry and statistics. The motions or activities of an individual object therefore constitute local information. A framework that synthesizes all local information into a holistic view and explicitly characterizes interactions among objects requires large-scale global reasoning and is of significant complexity. In this dissertation, we first review relevant previous contributions on human motion/activity modeling and recognition, and then propose several approaches to answer a sequence of traditional vision questions: 1) which of the motion elements are relevant to a group motion pattern of interest (Segmentation); 2) what the underlying motion pattern is (Recognition); and 3) how similar two motion ensembles are and how one can be 'optimally' transformed to match the other (Alignment). Our primary practical scenario is American football, where the corresponding problems are 1) who the offensive players are; 2) what offensive strategy they are using; and 3) whether two plays use the same strategy, and how to remove the spatio-temporal misalignment between them caused by internal or external factors.
The proposed approaches depart from the traditional modeling paradigm and instead explore concise descriptors, hierarchies, stochastic mechanisms, or compact generative models to achieve both effectiveness and efficiency. In particular, the intrinsic geometry of the spaces of the involved features/descriptors/quantities is exploited, and statistical tools are established on these nonlinear manifolds. These initial attempts have identified new challenging problems in complex motion analysis, as well as in more general tasks in video dynamics. The insights gained from nonlinear geometric modeling and analysis in this dissertation may prove useful for a broader class of computer vision applications.

    Depth Perception, Cueing, and Control

    Humans rely on a variety of visual cues to inform them of the depth or range of a particular object or feature. Some cues are provided by physiological mechanisms, others by pictorial cues that are interpreted psychologically, and still others by the relative motions of objects or features induced by observer (or vehicle) motion. These cues provide different levels of information (ordinal, relative, absolute) and saliency depending on depth, task, and interaction with other cues. Display technologies used for head-down and head-up displays, as well as out-the-window displays, have differing capabilities for providing depth cueing information to the observer/operator. Beyond the technology itself, display content and its source (camera sensor versus computer rendering) provide varying degrees of cue information. Additionally, most displays create some degree of cue conflict. In this paper, visual depth cues and their interactions will be discussed, as well as display technology, display content, and related artifacts. Lastly, the role of depth cueing in performing closed-loop control tasks will be discussed.