The goal of this work is to detect hand and arm positions over continuous sign language video sequences of more than one hour in length. We cast the problem as inference in a generative model of the image. Under this model, limb detection is expensive due to the very large number of possible configurations each part can assume. We make the following contributions to reduce this cost: (i) using efficient sampling from a pictorial structure proposal distribution to obtain reasonable configurations; (ii) identifying a large set of frames where correct configurations can be inferred, and using temporal tracking elsewhere. Results are reported for signing footage with changing background, challenging image conditions, and different signers; and we show that the method is able to identify the true arm and hand locations. The results exceed the state-of-the-art for the length and stability of continuous limb tracking.
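To make contribution (i) concrete, the sketch below shows ancestral sampling from a tree-structured pictorial structure: each part's position is drawn from a distribution combining a unary appearance score with a pairwise prior on its offset from its already-sampled parent, so candidate configurations are obtained without enumerating the full configuration space. This is a minimal illustration of the general technique, not the paper's actual model; the part list, grid size, random unary scores, and Gaussian offset prior are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
GRID = 24  # parts live on a GRID x GRID lattice of image positions (assumption)

# Illustrative kinematic tree over upper-body parts (torso is the root).
PARENT = {
    "torso": None,
    "head": "torso",
    "r_upper_arm": "torso",
    "r_lower_arm": "r_upper_arm",
    "r_hand": "r_lower_arm",
}

# Stand-in unary (appearance) log-scores per part; in a real system these
# would come from image evidence such as colour and edge cues.
unary = {p: rng.normal(size=(GRID, GRID)) for p in PARENT}

# Preferred offset of each child from its parent, with a Gaussian prior
# (hypothetical values chosen for the sketch).
OFFSET = {"head": (-4, 0), "r_upper_arm": (0, 3),
          "r_lower_arm": (4, 2), "r_hand": (4, 1)}
SIGMA = 2.0

ii, jj = np.meshgrid(np.arange(GRID), np.arange(GRID), indexing="ij")

def pairwise_logp(parent_pos, child):
    """Gaussian log-prior over child positions given the sampled parent."""
    di, dj = OFFSET[child]
    pi, pj = parent_pos
    return -(((ii - (pi + di)) ** 2 + (jj - (pj + dj)) ** 2) / (2 * SIGMA ** 2))

def sample_pos(log_scores):
    """Draw one grid position from the softmax of the given log-scores."""
    p = np.exp(log_scores - log_scores.max())
    p /= p.sum()
    flat = rng.choice(GRID * GRID, p=p.ravel())
    return divmod(flat, GRID)  # (row, col)

def sample_configuration():
    """Ancestral sampling: root first, then each child given its parent."""
    cfg = {"torso": sample_pos(unary["torso"])}
    for part in ("head", "r_upper_arm", "r_lower_arm", "r_hand"):
        cfg[part] = sample_pos(unary[part] + pairwise_logp(cfg[PARENT[part]], part))
    return cfg

# Draw a pool of candidate configurations, e.g. to be rescored under the
# full generative image model.
samples = [sample_configuration() for _ in range(100)]
print(samples[0])
```

Because the tree factorises the prior, each sample costs only one categorical draw per part, rather than a search over the joint space of all part placements; the expensive generative model then needs to be evaluated only on the sampled candidates.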