such as making cereal and arranging objects in a room (see Fig. 9). For example, the making cereal activity consists of around 12 sub-activities on average, which include reaching the pitcher, moving the pitcher to the bowl, and then pouring the milk into the bowl. This proves to be a very challenging task given the variability across individuals in performing each sub-activity, and other environment-induced conditions such as cluttered background and viewpoint changes. (See Fig. 2 for some examples.)

In most previous works, object detection and activity recognition have been addressed as separate tasks. Only recently have some works shown that modeling mutual context is beneficial (Gupta et al., 2009; Yao and Fei-Fei, 2010). The key idea in our work is to note that, in activity detection, it is sometimes more informative to know how an object is being used (its associated affordances; Gibson, 1979) than to know what the object is (i.e., the object category). For example, both chair and sofa might be categorized as ‘sittable,’ and a cup might be categorized as both ‘drinkable’ and ‘pourable.’ Note that the affordances of an object change over time depending on its use, e.g., a pitcher may first be reachable, then movable and finally pourable. In addition to helping activity recognition, recognizing object affordances is important by itself because of their use in robotic applications (e.g., Kormushev et al., 2010; Jiang et al., 2012a; Jiang and Saxena, 2012). We propose a method to learn human activities by mod

arXiv:1210.1207v