13 research outputs found

    Grounding the Lexical Semantics of Verbs in Visual Perception using Force Dynamics and Event Logic

    This paper presents an implemented system for recognizing the occurrence of events described by simple spatial-motion verbs in short image sequences. The semantics of these verbs is specified with event-logic expressions that describe changes in the state of force-dynamic relations between the participants of the event. An efficient finite representation is introduced for the infinite sets of intervals that occur when describing liquid and semi-liquid events. Additionally, an efficient procedure using this representation is presented for inferring occurrences of compound events, described with event-logic expressions, from occurrences of primitive events. Using force dynamics and event logic to specify the lexical semantics of events allows the system to be more robust than prior systems based on motion profile.
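    As an illustrative sketch of the inference step only (this is not the paper's implementation and omits its efficient representation for infinite interval sets), the following toy code derives occurrences of a compound event from per-frame primitive force-dynamic relations using an Allen-style MEETS relation between intervals; the primitive names and frame data are invented for the example.

    ```python
    from itertools import groupby

    def true_intervals(signal):
        """Maximal [start, end) frame intervals on which a boolean primitive holds."""
        intervals, frame = [], 0
        for value, run in groupby(signal):
            length = len(list(run))
            if value:
                intervals.append((frame, frame + length))
            frame += length
        return intervals

    def meets(a, b):
        """Allen's MEETS relation: interval a ends exactly where interval b starts."""
        return a[1] == b[0]

    def compound_occurrences(primitive_p, primitive_q):
        """Spanning intervals of the compound event 'P immediately followed by Q'."""
        return [(p[0], q[1])
                for p in true_intervals(primitive_p)
                for q in true_intervals(primitive_q)
                if meets(p, q)]

    # Invented per-frame primitives for a pick-up-like event: the object is first
    # supported by the table and then supported by the hand.
    supported_by_table = [1, 1, 1, 1, 0, 0, 0, 0]
    supported_by_hand  = [0, 0, 0, 0, 1, 1, 1, 1]
    print(compound_occurrences(supported_by_table, supported_by_hand))  # [(0, 8)]
    ```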

    Detecting Hand-Ball Events in Video

    We analyze videos in which a hand interacts with a basketball. In this work, we present a computational system which detects and classifies hand-ball events, given the trajectories of a hand and ball. Our approach is to determine non-gravitational parts of the ball's motion using only the motion of the hand as a reliable cue for hand-ball events. This thesis makes three contributions. First, we show that hand motion can be segmented using piecewise fifth-order polynomials inspired by work in motor control. We make the experimental observation that hand-ball events correspond remarkably closely to the segmentation breakpoints. Second, by fitting a context-dependent gravitational model to the ball over an adaptive window, we can isolate places where the hand is causing non-gravitational motion of the ball. Finally, given a precise segmentation, we use the measured velocity steps (force impulses) on the ball to detect and classify various event types.
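    A minimal sketch of the second idea, under assumed units, window size, and threshold (all hypothetical, not taken from the thesis, which uses an adaptive window and a context-dependent model): fit a free-flight, gravity-only model to the ball's vertical position over a sliding window and flag windows whose residual is too large to be explained by gravity alone.

    ```python
    import numpy as np

    G = 9.81  # assumed gravitational acceleration, m/s^2, acting downward

    def gravity_residual(t, y):
        """RMS error of the best free-flight fit y(t) = y0 + v0*t - 0.5*G*t^2
        over one window of vertical ball positions (t, y are NumPy arrays)."""
        target = y + 0.5 * G * t**2                    # remove the known gravity term
        A = np.stack([np.ones_like(t), t], axis=1)     # fit y0 and v0 by least squares
        coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)
        fit = A @ coeffs - 0.5 * G * t**2
        return float(np.sqrt(np.mean((fit - y) ** 2)))

    def non_gravitational_windows(t, y, window=10, threshold=0.02):
        """Start indices of sliding windows that gravity alone explains poorly,
        i.e. where the hand is plausibly exerting force on the ball."""
        return [i for i in range(len(t) - window)
                if gravity_residual(t[i:i + window], y[i:i + window]) > threshold]
    ```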

    Visual impressions of pushing and pulling: the object perceived as causal is not always the one that moves first

    Stimuli were presented in which a moving object (A) contacted a stationary object (B), whereupon both objects moved back in the direction from which object A had come. When object B rapidly decelerated to a standstill, so that the two objects did not remain in contact, object B was perceived as pushing object A. Thus, even though object B only moved when contacted by object A, it was perceived as the causal object, as making something happen to object A. This is contrary to the hypothesis that the object perceived as causal is always the object that moves first. It is, however, consistent with a theoretical account in which visual causal impressions arise when visually picked-up kinematic information is matched to stored representations, based on experiences of actions on objects, that specify forces and causality as part of the perceptual interpretation of the event.

    Contextual information based multimedia indexing

    Master's thesis (Master of Engineering).

    Towards gestural understanding for intelligent robots

    Fritsch JN. Towards gestural understanding for intelligent robots. Bielefeld: Universität Bielefeld; 2012.

    A strong driving force of scientific progress in the technical sciences is the quest for systems that assist humans in their daily life and make their life easier and more enjoyable. Nowadays smartphones are probably the most typical instances of such systems. Another class of systems that is receiving increasing attention is intelligent robots. Instead of offering a smartphone touch screen to select actions, these systems are intended to offer a more natural human-machine interface to their users. Out of the large range of actions performed by humans, gestures performed with the hands play a very important role, especially when humans interact with their immediate surroundings, e.g., pointing to an object or manipulating it. Consequently, a robot has to understand such gestures to offer an intuitive interface. Gestural understanding is, therefore, a key capability on the way to intelligent robots.

    This book deals with vision-based approaches for gestural understanding. Over the past two decades, this has been an intensive field of research which has resulted in a variety of algorithms to analyze human hand motions. Following a categorization of different gesture types and a review of other sensing techniques, the design of vision systems that achieve hand gesture understanding for intelligent robots is analyzed. For each of the individual algorithmic steps – hand detection, hand tracking, and trajectory-based gesture recognition – a separate chapter introduces common techniques and algorithms and provides example methods.

    The resulting recognition algorithms consider gestures in isolation and are often not sufficient for interacting with a robot, which can only understand such gestures when incorporating context such as what object was pointed at or manipulated. Going beyond purely trajectory-based gesture recognition by incorporating context is an important prerequisite for gesture understanding and is addressed explicitly in a separate chapter of this book. Two types of context, user-provided context and situational context, are distinguished, and existing approaches to incorporating context for gestural understanding are reviewed. Example approaches for both context types provide deeper algorithmic insight into this field of research. An overview of recent robots capable of gesture recognition and understanding summarizes the quality of human-robot interaction realized so far.

    The approaches for gesture understanding covered in this book are manually designed, while humans learn to recognize gestures automatically while growing up. Promising research aimed at analyzing developmental learning in children in order to mimic this capability in technical systems is highlighted in the final chapter, as this research direction may be highly influential for future gesture understanding systems.
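    The trajectory-based recognition step surveyed in the book admits many algorithms; as one hedged example, not taken from the book, a hand trajectory can be compared against stored gesture templates with dynamic time warping and nearest-template classification. The function and template names below are illustrative.

    ```python
    import numpy as np

    def dtw_distance(traj_a, traj_b):
        """Dynamic-time-warping distance between two 2-D hand trajectories,
        given as arrays of shape (n, 2) and (m, 2)."""
        n, m = len(traj_a), len(traj_b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = np.linalg.norm(traj_a[i - 1] - traj_b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    def classify_gesture(trajectory, templates):
        """Nearest-template classification; `templates` maps gesture labels to
        example trajectories recorded beforehand."""
        return min(templates, key=lambda label: dtw_distance(trajectory, templates[label]))
    ```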

    Statistical dependence estimation for object interaction and matching

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. By Kinh Tieu. Includes bibliographical references (p. 97-103).

    This dissertation shows how statistical dependence estimation underlies two key problems in visual surveillance and wide-area tracking. The first problem is to detect and describe interactions between moving objects. The goal is to measure the influence objects exert on one another. The second problem is to match objects between non-overlapping cameras. There, the goal is to pair the departures in one camera with the arrivals in a different camera so that the resulting distribution of relationships best models the data. Both problems have become important for scaling up surveillance systems to larger areas and expanding the monitoring to more interesting behaviors. We show how statistical dependence estimation generalizes previous work and may have applications in other areas.

    The two problems represent different applications of our thesis that statistical dependence estimation underlies the learning of the structure of probabilistic models. First, we analyze the relationship between Bayesian, information-theoretic, and classical statistical methods for statistical dependence estimation. Then, we apply these ideas to formulate object interaction in terms of dependency structure model selection. We describe experiments on simulated and real interaction data to validate our approach. Second, we formulate the matching problem in terms of maximizing statistical dependence. This allows us to generalize previous work on matching, and we show improved results on simulated and real data for non-overlapping cameras. We also prove an intractability result on exact maximally dependent matching.
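    To make the notion of dependence concrete, here is a small sketch, not the dissertation's formulation, of the kind of quantity involved in the first problem: a histogram estimate of the mutual information between two tracked objects' motion signals, which is larger when one object's motion carries information about the other's. The bin count and the toy signals are arbitrary choices for illustration.

    ```python
    import numpy as np

    def mutual_information(x, y, bins=8):
        """Histogram estimate (in nats) of the mutual information between two
        1-D motion signals, e.g. discretized velocities of two tracked objects."""
        joint, _, _ = np.histogram2d(x, y, bins=bins)
        pxy = joint / joint.sum()
        px = pxy.sum(axis=1, keepdims=True)
        py = pxy.sum(axis=0, keepdims=True)
        nz = pxy > 0
        return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

    # Coupled motions should score higher than independent ones.
    rng = np.random.default_rng(0)
    a = rng.normal(size=500)
    b = a + 0.3 * rng.normal(size=500)    # object b roughly follows object a
    c = rng.normal(size=500)              # object c moves independently
    print(mutual_information(a, b) > mutual_information(a, c))  # True
    ```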

    Categorical organization and machine perception of oscillatory motion patterns

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Architecture, 2000. By James W. Davis. Includes bibliographical references (p. 126-132).

    Many animal behaviors consist of using special patterns of motion for communication, with certain types of movements appearing widely across animal species. Oscillatory motions in particular are quite prevalent, where many of these repetitive movements can be characterized by a simple sinusoidal model with very specific and limited parameter values. We develop a computational model of categorical perception of these motion patterns based on their inherent structural regularity. The model proposes the initial construction of a hierarchical ordering of the model parameters to partition them into sub-categorical specializations. This organization is then used to specify the types and layout of localized computations required for the corresponding visual recognition system. The goal here is to do away with ad hoc motion recognition methods of computer vision, and instead exploit the underlying structural description for a motion category as a motivating mechanism for recognition. We implement this framework and present an analysis of the approach with synthetic and real oscillatory motions, and demonstrate its applicability within an interactive artificial life environment. With this categorical foundation for the description and recognition of related motions, we gain insight into the basis and development of a machine vision system designed to recognize these patterns.
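    A hedged sketch of the sinusoidal-model idea follows; the estimation method (an FFT peak) and the coarse frequency thresholds are illustrative assumptions, not the thesis's hierarchical parameter organization. It estimates the frequency and amplitude of a roughly periodic 1-D motion signal and maps the frequency to a toy sub-category.

    ```python
    import numpy as np

    def sinusoid_parameters(signal, fps):
        """Estimate (frequency in Hz, amplitude) of a roughly sinusoidal 1-D
        motion signal sampled at `fps` frames per second, via the FFT peak."""
        centered = signal - signal.mean()
        spectrum = np.fft.rfft(centered)
        freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
        peak = int(np.argmax(np.abs(spectrum[1:]))) + 1   # skip the DC bin
        amplitude = 2.0 * np.abs(spectrum[peak]) / len(signal)
        return freqs[peak], amplitude

    def coarse_category(frequency_hz, slow=0.5, fast=2.0):
        """Toy partition of the frequency parameter into sub-categories."""
        if frequency_hz < slow:
            return "slow oscillation"
        if frequency_hz < fast:
            return "moderate oscillation"
        return "rapid oscillation"

    # A 1.25 Hz waving motion sampled at 30 frames per second for 4 seconds.
    t = np.arange(120) / 30
    f, a = sinusoid_parameters(0.4 * np.sin(2 * np.pi * 1.25 * t), fps=30)
    print(round(f, 2), round(a, 2), coarse_category(f))  # 1.25 0.4 moderate oscillation
    ```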

    The computational perception of scene dynamics

    Ph.D. thesis, University of Toronto.

    Understanding observations of image sequences requires one to reason about 'qualitative scene dynamics'. For example, on observing a hand lifting a cup, we may infer that an 'active' hand is applying an upwards force (by grasping) on a 'passive' cup. In order to perform such reasoning, we require an 'ontology' that describes object properties and the generation and transfer of forces in the scene. Such an ontology should include, for example: the presence of gravity, the presence of a ground plane, whether objects are 'active' or 'passive', whether objects are contacting and/or attached to other objects, and so on.

    In this work we make these ideas precise by presenting an implemented computational system that derives symbolic force-dynamic descriptions from video sequences. Our approach to scene dynamics is based on an analysis of the Newtonian mechanics of a simplified scene model. The critical requirement is that, given image sequences, we can obtain estimates for the shape and motion of the objects in the scene. To do this, we assume that the objects can be approximated by a two-dimensional 'layered' scene model. The input to our system consists of a set of polygonal outlines along with estimates for their velocities and accelerations, obtained from a view-based tracker. Given such input, we present a system that extracts force-dynamic descriptions for the image sequence. We provide computational examples to demonstrate that our ontology is sufficiently rich to describe a wide variety of image sequences.

    This work makes three central contributions. First, we provide an ontology suitable for describing object properties and the generation and transfer of forces in the scene. Second, we provide a computational procedure to test the feasibility of such interpretations by reducing the problem to a feasibility test in linear programming. Finally, we provide a theory of preference ordering between multiple interpretations along with an efficient computational procedure to determine maximal elements in such orderings.
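    The second contribution reduces interpretation checking to a linear-programming feasibility test. The sketch below shows the flavor of such a test for a single 2-D object under assumed contact normals; it uses scipy.optimize.linprog and deliberately omits the thesis's layered scene model, attachments, and preference ordering, so the function names and simplifications are this example's assumptions rather than the thesis's formulation.

    ```python
    import numpy as np
    from scipy.optimize import linprog

    GRAVITY = np.array([0.0, -9.81])  # assumed 2-D gravity vector, m/s^2

    def interpretation_feasible(mass, acceleration, contact_normals):
        """Check whether non-negative contact forces along the given unit normals,
        together with gravity, can produce the observed 2-D acceleration."""
        required = mass * (np.asarray(acceleration) - GRAVITY)   # net contact force needed
        if not contact_normals:
            return bool(np.allclose(required, 0.0))
        A_eq = np.stack(contact_normals, axis=1)                 # 2 x k matrix of normals
        result = linprog(c=np.zeros(len(contact_normals)),       # pure feasibility test
                         A_eq=A_eq, b_eq=required,
                         bounds=[(0, None)] * len(contact_normals))
        return bool(result.success)

    # A cup resting on a table: zero acceleration is consistent with an upward normal force.
    print(interpretation_feasible(0.3, [0.0, 0.0], [np.array([0.0, 1.0])]))  # True
    # The same motionless cup with no supporting contact is not physically consistent.
    print(interpretation_feasible(0.3, [0.0, 0.0], []))                      # False
    ```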