Grounding the Lexical Semantics of Verbs in Visual Perception using Force Dynamics and Event Logic
This paper presents an implemented system for recognizing the occurrence of
events described by simple spatial-motion verbs in short image sequences. The
semantics of these verbs is specified with event-logic expressions that
describe changes in the state of force-dynamic relations between the
participants of the event. An efficient finite representation is introduced for
the infinite sets of intervals that occur when describing liquid and
semi-liquid events. Additionally, an efficient procedure using this
representation is presented for inferring occurrences of compound events,
described with event-logic expressions, from occurrences of primitive events.
Using force dynamics and event logic to specify the lexical semantics of events
allows the system to be more robust than prior systems based on motion profile.
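The inference step described above, recognizing compound events from occurrences of primitive events, can be illustrated with a toy sketch. Here primitive events hold over closed frame intervals and compound events are built with interval combinators; the predicate names and combinators are illustrative assumptions, not the paper's actual event-logic formalism or its finite representation of infinite interval sets.

```python
# Toy event-logic inference: a primitive event's occurrences are a set of
# closed frame intervals (i, j); compound events combine these sets.

def occurrences_and(a, b):
    """Intervals over which both events hold (pairwise intersection)."""
    out = set()
    for (i1, j1) in a:
        for (i2, j2) in b:
            lo, hi = max(i1, i2), min(j1, j2)
            if lo <= hi:
                out.add((lo, hi))
    return out

def occurrences_meets(a, b):
    """Intervals where event a is immediately followed by event b."""
    return {(i1, j2) for (i1, j1) in a for (i2, j2) in b if j1 + 1 == i2}

# Hypothetical force-dynamic primitives: SUPPORTED holds on frames 0-4,
# then MOVING holds on frames 5-9.
supported = {(0, 4)}
moving = {(5, 9)}

# A toy "pick up"-style compound: support ends, then motion begins.
print(occurrences_meets(supported, moving))  # {(0, 9)}
```

The naive pairwise loops here are quadratic in the number of intervals; the paper's contribution includes an efficient procedure that avoids enumerating the infinite interval sets arising for liquid and semi-liquid events.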
Specific-to-General Learning for Temporal Events with Application to Learning Event Definitions from Video
We develop, analyze, and evaluate a novel, supervised, specific-to-general
learner for a simple temporal logic and use the resulting algorithm to learn
visual event definitions from video sequences. First, we introduce a simple,
propositional, temporal, event-description language called AMA that is
sufficiently expressive to represent many events yet sufficiently restrictive
to support learning. We then give algorithms, along with lower and upper
complexity bounds, for the subsumption and generalization problems for AMA
formulas. We present a positive-examples-only specific-to-general learning
method based on these algorithms. We also present a polynomial-time-computable
"syntactic" subsumption test that implies semantic subsumption without being
equivalent to it. A generalization algorithm based on syntactic subsumption can
be used in place of semantic generalization to improve the asymptotic
complexity of the resulting learning algorithm. Finally, we apply this
algorithm to the task of learning relational event definitions from video and
show that it yields definitions that are competitive with hand-coded ones.
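The specific-to-general idea can be sketched in miniature. Treat a training example as a timeline: a sequence of states, each a set of true propositions. Intersecting states position-wise yields a formula that subsumes both inputs. This is a deliberately simplified illustration, assuming equal-length timelines and invented proposition names; the paper's AMA learner handles the general case and comes with complexity bounds for subsumption and generalization.

```python
# Toy specific-to-general generalization over propositional timelines.

def generalize(t1, t2):
    """Position-wise intersection of two equal-length timelines."""
    assert len(t1) == len(t2), "toy version: equal-length timelines only"
    return [s1 & s2 for s1, s2 in zip(t1, t2)]

def subsumes(general, specific):
    """A generalization covers an example iff each of its states is a
    subset of the corresponding state in the example."""
    return len(general) == len(specific) and all(
        g <= s for g, s in zip(general, specific))

# Two hypothetical positive examples of the same visual event:
ex1 = [{"CONTACT", "ATTACHED"}, {"ATTACHED"}, {"SUPPORTED"}]
ex2 = [{"CONTACT"}, {"ATTACHED", "MOVING"}, {"SUPPORTED", "MOVING"}]

g = generalize(ex1, ex2)
print(g)  # [{'CONTACT'}, {'ATTACHED'}, {'SUPPORTED'}]
assert subsumes(g, ex1) and subsumes(g, ex2)
```

This is the positive-examples-only pattern the abstract describes: each new example can only widen the hypothesis, never narrow it, which is what makes the restrictive AMA language important for keeping the learned definitions meaningful.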