Search CORE

123 research outputs found

Linguistically-driven framework for computationally efficient and scalable sign recognition

Author: Dilsizian Mark
Metaxas Dimitris N.
Neidle Carol
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2018
Field of study

We introduce a new general framework for sign recognition from monocular video using limited quantities of annotated data. The novelty of the hybrid framework we describe here is that we exploit state-of-the art learning methods while also incorporating features based on what we know about the linguistic composition of lexical signs. In particular, we analyze hand shape, orientation, location, and motion trajectories, and then use CRFs to combine this linguistically significant information for purposes of sign recognition. Our robust modeling and recognition of these sub-components of sign production allow an efficient parameterization of the sign recognition problem as compared with purely data-driven methods. This parameterization enables a scalable and extendable time-series learning approach that advances the state of the art in sign recognition, as shown by the results reported here for recognition of isolated, citation-form, lexical signs from American Sign Language (ASL)

Boston University Institutional Repository (OpenBU)

Multispectral Deep Neural Networks for Pedestrian Detection

Author: Liu Jingjing
Metaxas Dimitris N.
Wang Shu
Zhang Shaoting
Publication venue
Publication date: 01/01/2016
Field of study

Multispectral pedestrian detection is essential for around-the-clock applications, e.g., surveillance and autonomous driving. We deeply analyze Faster R-CNN for multispectral pedestrian detection task and then model it into a convolutional network (ConvNet) fusion problem. Further, we discover that ConvNet-based pedestrian detectors trained by color or thermal images separately provide complementary information in discriminating human instances. Thus there is a large potential to improve pedestrian detection by using color and thermal images in DNNs simultaneously. We carefully design four ConvNet fusion architectures that integrate two-branch ConvNets on different DNNs stages, all of which yield better performance compared with the baseline detector. Our experimental results on KAIST pedestrian benchmark show that the Halfway Fusion model that performs fusion on the middle-level convolutional features outperforms the baseline method by 11% and yields a missing rate 3.5% lower than the other proposed architectures.Comment: 13 pages, 8 figures, BMVC 2016 ora

arXiv.org e-Print Archive

Crossref

Semantic Graph Convolutional Networks for 3D Human Pose Regression

Author: Kapadia Mubbasir
Metaxas Dimitris N.
Peng Xi
Tian Yu
Zhao Long
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 08/03/2020
Field of study

In this paper, we study the problem of learning Graph Convolutional Networks (GCNs) for regression. Current architectures of GCNs are limited to the small receptive field of convolution filters and shared transformation matrix for each node. To address these limitations, we propose Semantic Graph Convolutional Networks (SemGCN), a novel neural network architecture that operates on regression tasks with graph-structured data. SemGCN learns to capture semantic information such as local and global node relationships, which is not explicitly represented in the graph. These semantic relationships can be learned through end-to-end training from the ground truth without additional supervision or hand-crafted rules. We further investigate applying SemGCN to 3D human pose regression. Our formulation is intuitive and sufficient since both 2D and 3D human poses can be represented as a structured graph encoding the relationships between joints in the skeleton of a human body. We carry out comprehensive studies to validate our method. The results prove that SemGCN outperforms state of the art while using 90% fewer parameters.Comment: In CVPR 2019 (13 pages including supplementary material). The code can be found at https://github.com/garyzhao/SemGC

arXiv.org e-Print Archive

Crossref

DiffSLVA: Harnessing Diffusion Models for Sign Language Video Anonymization

Author: Metaxas Dimitris N.
Neidle Carol
Xia Zhaoyang
Publication venue
Publication date: 27/11/2023
Field of study

Since American Sign Language (ASL) has no standard written form, Deaf signers frequently share videos in order to communicate in their native language. However, since both hands and face convey critical linguistic information in signed languages, sign language videos cannot preserve signer privacy. While signers have expressed interest, for a variety of applications, in sign language video anonymization that would effectively preserve linguistic content, attempts to develop such technology have had limited success, given the complexity of hand movements and facial expressions. Existing approaches rely predominantly on precise pose estimations of the signer in video footage and often require sign language video datasets for training. These requirements prevent them from processing videos 'in the wild,' in part because of the limited diversity present in current sign language video datasets. To address these limitations, our research introduces DiffSLVA, a novel methodology that utilizes pre-trained large-scale diffusion models for zero-shot text-guided sign language video anonymization. We incorporate ControlNet, which leverages low-level image features such as HED (Holistically-Nested Edge Detection) edges, to circumvent the need for pose estimation. Additionally, we develop a specialized module dedicated to capturing facial expressions, which are critical for conveying essential linguistic information in signed languages. We then combine the above methods to achieve anonymization that better preserves the essential linguistic content of the original signer. This innovative methodology makes possible, for the first time, sign language video anonymization that could be used for real-world applications, which would offer significant benefits to the Deaf and Hard-of-Hearing communities. We demonstrate the effectiveness of our approach with a series of signer anonymization experiments.Comment: Project webpage: https://github.com/Jeffery9707/DiffSLV

arXiv.org e-Print Archive

Three-Dimensional Motion Reconstruction and Analysis of the Right Ventricle Using Tagged MRI

Author: Axel Leon
Haber Idith
Metaxas Dimitris N
Publication venue: ScholarlyCommons
Publication date: 01/01/1999
Field of study

Right ventricular (RV) dysfunction can serve as an indicator of heart and lung disease and can adversely affect the left ventricle (LV). However, normal RV function must be characterized before abnormal states can be detected. We can describe a method for reconstructing the 3D motion of the RV images by fitting of a deformable model to extracted tag and contour data from multiview tagged magnetic resonance images(MRI). The deformable model is a biventricular finite element mesh built directly from the contours. Our approach accommodates the geometrically complex RV by using the entire lengths of the tags, localized degrees of freedom (DOFs), and finite elements for geometric modeling. We convert the results of the reconstruction into potentially useful motion variables, such as strains and displacements. The fitting technique is applied to synthetic data, two normal hearts, and a heart with right ventricular hypertrophy (RVH). The results in this paper are limited to the RV free wall and septum. We find noticeable differences between the motion variables calculated for the normal volunteers and the RVH patient

CiteSeerX

ScholarlyCommons@Penn