Linguistically-driven framework for computationally efficient and scalable sign recognition
We introduce a new general framework for sign recognition from monocular video using limited quantities of annotated data. The novelty of the hybrid framework described here is that we exploit state-of-the-art learning methods while also incorporating features based on what is known about the linguistic composition of lexical signs. In particular, we analyze hand shape, orientation, location, and motion trajectories, and then use Conditional Random Fields (CRFs) to combine this linguistically significant information for purposes of sign recognition. Our robust modeling and recognition of these sub-components of sign production allow an efficient parameterization of the sign recognition problem compared with purely data-driven methods. This parameterization enables a scalable and extendable time-series learning approach that advances the state of the art in sign recognition, as shown by the results reported here for recognition of isolated, citation-form, lexical signs from American Sign Language (ASL).
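As a rough illustration of the combination step described above, the following sketch fuses per-frame scores from several linguistic channels by summation (a simple log-linear combination) and decodes a label sequence with Viterbi inference for a tiny linear-chain CRF. All sizes, scores, and the transition matrix are illustrative placeholders, not the paper's trained model.

```python
# Hedged sketch: Viterbi decoding for a tiny linear-chain CRF that combines
# per-frame linguistic feature scores (hand shape, orientation, location,
# motion). Everything here is illustrative, not the paper's actual model.
import numpy as np

def viterbi(emissions, transitions):
    """emissions: (T, K) per-frame label scores; transitions: (K, K).
    Returns the highest-scoring label sequence."""
    T, K = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transitions          # (K, K): prev -> cur
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + emissions[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Combine sub-component scores by summation (a log-linear combination),
# one simple way to fuse the linguistically significant channels.
T, K = 4, 3
rng = np.random.default_rng(0)
hand_shape = rng.normal(size=(T, K))
orientation = rng.normal(size=(T, K))
location = rng.normal(size=(T, K))
motion = rng.normal(size=(T, K))
emissions = hand_shape + orientation + location + motion
transitions = np.full((K, K), -1.0) + 2.0 * np.eye(K)  # favor label persistence
labels = viterbi(emissions, transitions)
print(labels)  # one label index per frame
```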
Multispectral Deep Neural Networks for Pedestrian Detection
Multispectral pedestrian detection is essential for around-the-clock applications, e.g., surveillance and autonomous driving. We analyze Faster R-CNN in depth for the multispectral pedestrian detection task and then model it as a convolutional network (ConvNet) fusion problem. Further, we discover that ConvNet-based pedestrian detectors trained on color or thermal images separately provide complementary information for discriminating human instances. Thus there is large potential to improve pedestrian detection by using color and thermal images simultaneously in DNNs. We carefully design four ConvNet fusion architectures that integrate two-branch ConvNets at different DNN stages, all of which yield better performance than the baseline detector. Our experimental results on the KAIST pedestrian benchmark show that the Halfway Fusion model, which performs fusion on middle-level convolutional features, outperforms the baseline method by 11% and yields a miss rate 3.5% lower than the other proposed architectures.
Comment: 13 pages, 8 figures, BMVC 2016 oral
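The halfway-fusion idea above can be sketched minimally: concatenate middle-level feature maps from a color branch and a thermal branch along the channel axis, then reduce the doubled channel count with a 1x1 convolution. The shapes and random weights below are illustrative stand-ins, not the trained detector.

```python
# Hedged sketch of "halfway" feature fusion: channel-wise concatenation of
# mid-level color and thermal feature maps, followed by a 1x1 convolution
# (a per-pixel matmul) to reduce dimensionality. Illustrative shapes only.
import numpy as np

def conv1x1(x, w):
    """x: (C_in, H, W); w: (C_out, C_in). A 1x1 conv is a per-pixel matmul."""
    c, h, wd = x.shape
    return (w @ x.reshape(c, h * wd)).reshape(w.shape[0], h, wd)

rng = np.random.default_rng(1)
color_feat = rng.normal(size=(256, 20, 15))    # mid-level color features
thermal_feat = rng.normal(size=(256, 20, 15))  # mid-level thermal features

fused = np.concatenate([color_feat, thermal_feat], axis=0)  # (512, 20, 15)
w = rng.normal(size=(256, 512)) * 0.01
reduced = np.maximum(conv1x1(fused, w), 0.0)   # ReLU after channel reduction
print(reduced.shape)  # (256, 20, 15)
```

Fusing at this middle stage keeps the two modalities' low-level statistics separate while letting the later, shared layers see both; that is the design trade-off the four architectures in the paper explore.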
Semantic Graph Convolutional Networks for 3D Human Pose Regression
In this paper, we study the problem of learning Graph Convolutional Networks (GCNs) for regression. Current GCN architectures are limited by the small receptive field of their convolution filters and by a transformation matrix shared across all nodes. To address these limitations, we propose Semantic Graph Convolutional Networks (SemGCN), a novel neural network architecture that operates on regression tasks with graph-structured data. SemGCN learns to capture semantic information, such as local and global node relationships, that is not explicitly represented in the graph. These semantic relationships can be learned through end-to-end training from the ground truth without additional supervision or hand-crafted rules. We further investigate applying SemGCN to 3D human pose regression. Our formulation is intuitive and sufficient, since both 2D and 3D human poses can be represented as a structured graph encoding the relationships between joints in the skeleton of a human body. We carry out comprehensive studies to validate our method. The results prove that SemGCN outperforms the state of the art while using 90% fewer parameters.
Comment: In CVPR 2019 (13 pages including supplementary material). The code can be found at https://github.com/garyzhao/SemGC
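One way to read "learning semantic relationships on a fixed graph" is a graph convolution whose edge weights are learned: a score matrix is masked to the skeleton's edges and softmax-normalized over each node's neighbors. The sketch below is in that spirit; the toy chain graph, sizes, and random weights are illustrative, and the released SemGCN code should be consulted for the actual layer.

```python
# Hedged sketch of a graph convolution with learned edge weights, in the
# spirit of SemGCN: learnable scores m are masked to the fixed skeleton
# graph and softmax-normalized per node. Illustrative toy graph and sizes.
import numpy as np

def sem_graph_conv(x, adj, m, w):
    """x: (N, F_in) node features; adj: (N, N) 0/1 adjacency with self-loops;
    m: (N, N) learnable edge scores; w: (F_in, F_out) shared transform."""
    scores = np.where(adj > 0, m, -1e9)               # mask non-edges
    scores = scores - scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)     # row-wise softmax
    return attn @ (x @ w)                             # weighted aggregation

rng = np.random.default_rng(2)
N, F_in, F_out = 5, 8, 16                  # e.g., 5 joints of a toy skeleton
adj = np.eye(N)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:   # a simple chain of joints
    adj[i, j] = adj[j, i] = 1
x = rng.normal(size=(N, F_in))
out = sem_graph_conv(x, adj, rng.normal(size=(N, N)),
                     rng.normal(size=(F_in, F_out)))
print(out.shape)  # (5, 16)
```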
DiffSLVA: Harnessing Diffusion Models for Sign Language Video Anonymization
Since American Sign Language (ASL) has no standard written form, Deaf signers
frequently share videos in order to communicate in their native language.
However, since both hands and face convey critical linguistic information in
signed languages, sign language videos cannot preserve signer privacy. While
signers have expressed interest, for a variety of applications, in sign
language video anonymization that would effectively preserve linguistic
content, attempts to develop such technology have had limited success, given
the complexity of hand movements and facial expressions. Existing approaches
rely predominantly on precise pose estimations of the signer in video footage
and often require sign language video datasets for training. These requirements
prevent them from processing videos 'in the wild,' in part because of the
limited diversity present in current sign language video datasets. To address
these limitations, our research introduces DiffSLVA, a novel methodology that
utilizes pre-trained large-scale diffusion models for zero-shot text-guided
sign language video anonymization. We incorporate ControlNet, which leverages
low-level image features such as HED (Holistically-Nested Edge Detection)
edges, to circumvent the need for pose estimation. Additionally, we develop a
specialized module dedicated to capturing facial expressions, which are
critical for conveying essential linguistic information in signed languages. We
then combine the above methods to achieve anonymization that better preserves
the essential linguistic content of the original signer. This innovative
methodology makes possible, for the first time, sign language video
anonymization that could be used for real-world applications, which would offer
significant benefits to the Deaf and Hard-of-Hearing communities. We
demonstrate the effectiveness of our approach with a series of signer
anonymization experiments.
Comment: Project webpage: https://github.com/Jeffery9707/DiffSLV
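Structurally, the pipeline described above runs per frame: extract a low-level edge map, then feed it to a text-guided, edge-conditioned generator. The sketch below shows only that loop structure; a crude gradient-magnitude edge detector stands in for HED, and `generate_from_edges` is a hypothetical stub, since the actual system relies on a pre-trained diffusion model with ControlNet, which this sketch does not implement.

```python
# Hedged sketch of the per-frame anonymization loop. The edge detector is a
# crude stand-in for HED, and generate_from_edges is a hypothetical stub for
# the diffusion-model-plus-ControlNet stage described in the abstract.
import numpy as np

def edge_map(frame):
    """Gradient-magnitude edges as a crude stand-in for HED edges."""
    gy = np.abs(np.diff(frame, axis=0, prepend=frame[:1]))
    gx = np.abs(np.diff(frame, axis=1, prepend=frame[:, :1]))
    return np.clip(gx + gy, 0.0, 1.0)

def generate_from_edges(edges, prompt):
    """Hypothetical placeholder for text-guided, edge-conditioned synthesis."""
    return edges  # a real system would synthesize a new signer's appearance

video = np.random.default_rng(3).random((4, 32, 32))  # 4 toy grayscale frames
anonymized = [generate_from_edges(edge_map(f), "a different signer")
              for f in video]
print(len(anonymized), anonymized[0].shape)
```

Conditioning on edges rather than pose keypoints is what lets the method skip pose estimation, per the abstract; the edge map preserves hand and face contours frame by frame while the generator replaces appearance.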
Three-Dimensional Motion Reconstruction and Analysis of the Right Ventricle Using Tagged MRI
Right ventricular (RV) dysfunction can serve as an indicator of heart and lung disease and can adversely affect the left ventricle (LV). However, normal RV function must be characterized before abnormal states can be detected. We describe a method for reconstructing the 3D motion of the RV by fitting a deformable model to tag and contour data extracted from multiview tagged magnetic resonance images (MRI). The deformable model is a biventricular finite element mesh built directly from the contours. Our approach accommodates the geometrically complex RV by using the entire lengths of the tags, localized degrees of freedom (DOFs), and finite elements for geometric modeling. We convert the results of the reconstruction into potentially useful motion variables, such as strains and displacements. The fitting technique is applied to synthetic data, two normal hearts, and a heart with right ventricular hypertrophy (RVH). The results in this paper are limited to the RV free wall and septum. We find noticeable differences between the motion variables calculated for the normal volunteers and the RVH patient.
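The conversion from reconstructed motion to strain mentioned above typically uses the Green-Lagrange strain tensor, E = 1/2 (FᵀF − I) with deformation gradient F = I + ∂u/∂X. The sketch below evaluates this at one material point; the uniform 10% stretch is a toy deformation, not data from the tagged-MRI fitting.

```python
# Hedged sketch: converting a displacement gradient at a material point into
# Green-Lagrange strain, E = 1/2 (F^T F - I), with F = I + du/dX. The toy
# uniform stretch below is illustrative, not reconstructed cardiac data.
import numpy as np

def green_lagrange_strain(grad_u):
    """grad_u: (3, 3) displacement gradient du/dX at a material point."""
    F = np.eye(3) + grad_u                  # deformation gradient
    return 0.5 * (F.T @ F - np.eye(3))      # Green-Lagrange strain tensor

# Toy example: 10% uniform stretch along x, no shear.
grad_u = np.diag([0.1, 0.0, 0.0])
E = green_lagrange_strain(grad_u)
print(round(E[0, 0], 3))  # 0.5 * ((1.1)^2 - 1) = 0.105
```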