14 research outputs found

    Early stages of perceptual organization

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2010.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Cataloged from student-submitted PDF version of thesis.Includes bibliographical references.One of the great puzzles of vision science is how, over the course of development, the complex visual array comprising many regions of different colors and luminances is transformed into a sophisticated and meaningful constellation of objects. Gestaltists describe some of the rules that seem to govern a mature parsing of the visual scene, but where do these rules come from? Are they innate--endowed by evolution, or do they come somehow from visual experience? The answer to this question is usually confounded in infant studies as the timelines of maturation and experience are inextricably linked. Here, we describe studies with a special population of late--onset vision patients, which suggest a distinction between those capabilities available innately and those which are crafted via learning from the visual environment. We conclude with a hypothesis, based on these findings and other evidence, that early-available common fate motion cues provide a level of perceptual organization which forms the basis for the learning of subsequent cues.by Yuri Ostrovsky.Ph.D

    Multigranularity Representations for Human Inter-Actions: Pose, Motion and Intention

    Get PDF
    Tracking people and their body pose in videos is a central problem in computer vision. Standard tracking representations reason about temporal coherence of detected people and body parts. They have difficulty tracking targets under partial occlusions or rare body poses, where detectors often fail, since the number of training examples is often too small to deal with the exponential variability of such configurations. We propose tracking representations that track and segment people and their body pose in videos by exploiting information at multiple detection and segmentation granularities when available, whole body, parts or point trajectories. Detections and motion estimates provide contradictory information in case of false alarm detections or leaking motion affinities. We consolidate contradictory information via graph steering, an algorithm for simultaneous detection and co-clustering in a two-granularity graph of motion trajectories and detections, that corrects motion leakage between correctly detected objects, while being robust to false alarms or spatially inaccurate detections. We first present a motion segmentation framework that exploits long range motion of point trajectories and large spatial support of image regions. We show resulting video segments adapt to targets under partial occlusions and deformations. Second, we augment motion-based representations with object detection for dealing with motion leakage. We demonstrate how to combine dense optical flow trajectory affinities with repulsions from confident detections to reach a global consensus of detection and tracking in crowded scenes. Third, we study human motion and pose estimation. We segment hard to detect, fast moving body limbs from their surrounding clutter and match them against pose exemplars to detect body pose under fast motion. We employ on-the-fly human body kinematics to improve tracking of body joints under wide deformations. We use motion segmentability of body parts for re-ranking a set of body joint candidate trajectories and jointly infer multi-frame body pose and video segmentation. We show empirically that such multi-granularity tracking representation is worthwhile, obtaining significantly more accurate multi-object tracking and detailed body pose estimation in popular datasets

    Dynamic surface completion : the joint formation of color, texture, and shape

    Get PDF
    Dynamic surface completion is a phenomenon of visual filling-in where a colored pattern perceptually spreads onto an area confined by virtual contours in a multi-aperture motion display. The spreading effect is qualitatively similar to static texture spreading but widely surpasses it in strength, making it particularly suited for quantitative studies of visual interpolation processes. I carried out six experiments to establish with objective tasks that homogeneous color, as well as non-uniform texture spreading is a genuine representation of surface qualities and thus goes beyond mere contour interpolation. The experiments also serve to relate the phenomena to ongoing discussions about potentially responsible mechanisms for spatiotemporal integration: With a phenomenological method, I examined to what extent simple sensory persistence might be causally involved in the effect under consideration. The findings are partially consistent with the idea of sensory persistence, and indicate that information fragments are integrated over a time window of about 100 to 150 ms to form a complete surface representation

    Challenges for Monocular 6D Object Pose Estimation in Robotics

    Full text link
    Object pose estimation is a core perception task that enables, for example, object grasping and scene understanding. The widely available, inexpensive and high-resolution RGB sensors and CNNs that allow for fast inference based on this modality make monocular approaches especially well suited for robotics applications. We observe that previous surveys on object pose estimation establish the state of the art for varying modalities, single- and multi-view settings, and datasets and metrics that consider a multitude of applications. We argue, however, that those works' broad scope hinders the identification of open challenges that are specific to monocular approaches and the derivation of promising future challenges for their application in robotics. By providing a unified view on recent publications from both robotics and computer vision, we find that occlusion handling, novel pose representations, and formalizing and improving category-level pose estimation are still fundamental challenges that are highly relevant for robotics. Moreover, to further improve robotic performance, large object sets, novel objects, refractive materials, and uncertainty estimates are central, largely unsolved open challenges. In order to address them, ontological reasoning, deformability handling, scene-level reasoning, realistic datasets, and the ecological footprint of algorithms need to be improved.Comment: arXiv admin note: substantial text overlap with arXiv:2302.1182

    Virtual Reality Games for Motor Rehabilitation

    Get PDF
    This paper presents a fuzzy logic based method to track user satisfaction without the need for devices to monitor users physiological conditions. User satisfaction is the key to any product’s acceptance; computer applications and video games provide a unique opportunity to provide a tailored environment for each user to better suit their needs. We have implemented a non-adaptive fuzzy logic model of emotion, based on the emotional component of the Fuzzy Logic Adaptive Model of Emotion (FLAME) proposed by El-Nasr, to estimate player emotion in UnrealTournament 2004. In this paper we describe the implementation of this system and present the results of one of several play tests. Our research contradicts the current literature that suggests physiological measurements are needed. We show that it is possible to use a software only method to estimate user emotion

    Scalable visualization of spatial data in 3D terrain

    Get PDF
    Designing visualizations of spatial data in 3D terrain is challenging because various heterogeneous data aspects need to be considered, including the terrain itself, multiple data attributes, and data uncertainty. It is hardly possible to visualize these data at full detail in a single image. Therefore, this thesis devises a scalable visualization approach that focuses on relevant information to be emphasized, while less-relevant information can be attenuated. In this context, a noval concept of visualizing spatial data in 3D terrain and different soft- and hardware solutions are proposed.Die Erstellung von Visualisierungen für räumliche Daten im 3D-Gelände ist schwierig, da viele heterogene Datenaspekte wie das Gelände selbst, die verschiedenen Datenattribute sowie Unsicherheiten bei der Darstellung zu berücksichtigen sind. Im Allgemeinen ist es nicht möglich, diese Datenaspekte gleichzeitig in einer Visualisierung darzustellen. Daher werden in der Arbeit skalierbare Visualisierungsstrategien entwickelt, welche die wichtigen Informationen hervorheben und trotzdem gleichzeitig Kontextinformationen liefern. Hierfür werden neue Systematisierungen und Konzepte vorgestellt
    corecore