Deformable Object Tracking with Gated Fusion
The tracking-by-detection framework has received growing attention through its
integration with Convolutional Neural Networks (CNNs). Existing
tracking-by-detection methods, however, fail to track objects with severe
appearance variations. This is because the traditional convolutional operation
is performed on fixed grids, and thus may not be able to find the correct
response while the object is changing pose or under varying environmental
conditions. In this paper, we propose a deformable convolution layer to enrich
the target appearance representations in the tracking-by-detection framework.
We aim to capture the target appearance variations via deformable convolution,
which adaptively enhances its original features. In addition, we also propose a
gated fusion scheme to control how the variations captured by the deformable
convolution affect the original appearance. The feature representation enriched
by deformable convolution helps the CNN classifier discriminate the target
object from the background. Extensive experiments on the
standard benchmarks show that the proposed tracker performs favorably against
state-of-the-art methods.
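
The abstract does not give the exact form of the gated fusion, so the following is only a minimal sketch of one common way such a gate could work: a sigmoid gate in (0, 1) scales the variation features element-wise before they are added to the original features. The function names `sigmoid` and `gated_fusion`, the additive form, and the use of flat lists in place of feature maps are all assumptions made for illustration, not the authors' implementation.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real-valued logit into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(original, variation, gate_logits):
    """Hypothetical gated fusion: original + sigmoid(gate) * variation.

    original    -- features from the standard convolution (flat list)
    variation   -- features from the deformable convolution (flat list)
    gate_logits -- unnormalised gate values, one per feature (assumed learned)
    """
    if not (len(original) == len(variation) == len(gate_logits)):
        raise ValueError("all inputs must have the same length")
    return [
        o + sigmoid(g) * v
        for o, v, g in zip(original, variation, gate_logits)
    ]
```

With a gate logit of 0 the gate equals 0.5, so half of the variation is added to each feature; a strongly negative logit leaves the original feature essentially unchanged, which is how a gate of this kind could suppress unreliable variations.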
A new framework for sign language recognition based on 3D handshape identification and linguistic modeling
Current approaches to sign recognition by computer generally have at least some of the following limitations: they rely on laboratory
conditions for sign production, are limited to a small vocabulary, rely on 2D modeling (and therefore cannot deal with occlusions
and off-plane rotations), and/or achieve limited success. Here we propose a new framework that (1) provides a new tracking method
less dependent than others on laboratory conditions and able to deal with variations in background and skin regions (such as the
face, forearms, or other hands); (2) allows for identification of 3D hand configurations that are linguistically important in American
Sign Language (ASL); and (3) incorporates statistical information reflecting linguistic constraints in sign production. For purposes of
large-scale computer-based sign language recognition from video, the ability to distinguish hand configurations accurately is critical.
Our current method estimates the 3D hand configuration to distinguish among 77 hand configurations linguistically relevant for
ASL. Constraining the problem in this way makes recognition of 3D hand configuration more tractable and provides the information
specifically needed for sign recognition. Further improvements are obtained by incorporation of statistical information about linguistic
dependencies among handshapes within a sign derived from an annotated corpus of almost 10,000 sign tokens.
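
The abstract does not say how the corpus statistics are combined with the visual estimates. As one simple possibility, the sketch below estimates a smoothed prior over (start, end) handshape pairs from annotated pairs and multiplies it with per-handshape classifier scores. The function names `pair_prior` and `best_pair`, the add-one smoothing, the restriction to start/end pairs, and the handshape labels in the usage note are all hypothetical.

```python
from collections import Counter


def pair_prior(corpus_pairs, smoothing=1.0, labels=None):
    """Estimate P(start, end) from annotated (start, end) handshape pairs.

    Additive smoothing keeps unseen pairs at a small non-zero probability.
    """
    counts = Counter(corpus_pairs)
    labels = sorted(labels or {h for pair in corpus_pairs for h in pair})
    total = len(corpus_pairs) + smoothing * len(labels) ** 2
    return {
        (s, e): (counts[(s, e)] + smoothing) / total
        for s in labels
        for e in labels
    }


def best_pair(start_scores, end_scores, prior):
    """Pick the pair maximising visual score (start) * visual score (end) * prior."""
    return max(
        prior,
        key=lambda p: start_scores.get(p[0], 0.0)
        * end_scores.get(p[1], 0.0)
        * prior[p],
    )
```

For example, if a toy corpus contains the pair `("B", "A")` eight times and `("B", "5")` once, visual scores that slightly favour `"5"` as the end handshape can be overridden by the prior, so `best_pair` returns `("B", "A")`.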