GMM Mapping Of Visual Features of Cued Speech From Speech Spectral Features
In this paper, we present a statistical method based on GMM modeling to map acoustic speech spectral features to the visual features of Cued Speech, under the regression criterion of Minimum Mean-Square Error (MMSE) at a low signal level; this approach is innovative and differs from the classic text-to-visual approach. Two training methods for the GMM, namely the Expectation-Maximization (EM) approach and a supervised training method, are discussed. For comparison with the GMM-based mapping, we first present results obtained with a Multiple-Linear Regression (MLR) model, also at the low signal level, and study the limitations of that approach. The experimental results demonstrate that the GMM-based mapping method significantly improves mapping performance over the MLR model, especially when the linear correlation between the target and the predictor is weak, as is the case for the hand positions of Cued Speech and the acoustic speech spectral features.
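A common realization of such a mapping is a joint GMM fitted by EM over stacked acoustic and visual vectors, with the visual features predicted at test time by the conditional mean (the MMSE estimator). The sketch below follows that standard formulation; the component count, feature dimensions, and choice of scikit-learn/scipy are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of joint-GMM MMSE mapping (illustrative, not the paper's exact setup).
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_joint_gmm(X, Y, n_components=8, seed=0):
    """Fit a GMM by EM on concatenated [acoustic | visual] feature vectors."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full", random_state=seed)
    gmm.fit(np.hstack([X, Y]))
    return gmm

def mmse_map(gmm, X, dx):
    """MMSE estimate: y_hat = sum_k p(k|x) (mu_y^k + S_yx^k (S_xx^k)^-1 (x - mu_x^k))."""
    mu, S, w = gmm.means_, gmm.covariances_, gmm.weights_
    K, dy = len(w), mu.shape[1] - dx
    # responsibilities p(k|x) under the marginal GMM over the acoustic part
    lik = np.stack([w[k] * multivariate_normal.pdf(X, mu[k, :dx], S[k, :dx, :dx])
                    for k in range(K)], axis=1)              # (N, K)
    post = lik / lik.sum(axis=1, keepdims=True)
    Y_hat = np.zeros((X.shape[0], dy))
    for k in range(K):
        A = S[k, dx:, :dx] @ np.linalg.inv(S[k, :dx, :dx])   # S_yx (S_xx)^-1
        Y_hat += post[:, [k]] * (mu[k, dx:] + (X - mu[k, :dx]) @ A.T)
    return Y_hat

# usage (hypothetical shapes): X_ac (N, 24) spectral frames, Y_vis (N, 8) visual features
# gmm = fit_joint_gmm(X_ac, Y_vis); Y_pred = mmse_map(gmm, X_ac_test, dx=24)
```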
Multimodal Transformer Using Cross-Channel attention for Object Detection in Remote Sensing Images
Object detection in Remote Sensing Images (RSI) is a critical task for
numerous applications in Earth Observation (EO). Unlike general object
detection, object detection in RSI has specific challenges: 1) the scarcity of
labeled data in RSI compared to general object detection datasets, and 2) the
small objects, which appear in high-resolution images against vast backgrounds. To
address these challenges, we propose a multimodal transformer exploring
multi-source remote sensing data for object detection. Instead of directly
combining the multimodal input through a channel-wise concatenation, which
ignores the heterogeneity of different modalities, we propose a cross-channel
attention module. This module learns the relationship between different
channels, enabling the construction of a coherent multimodal input by aligning
the different modalities at the early stage. We also introduce a new
architecture based on the Swin transformer that incorporates convolution layers
in non-shifting blocks while maintaining fixed dimensions, allowing for the
generation of fine-to-coarse representations with a favorable
accuracy-computation trade-off. Extensive experiments demonstrate the
effectiveness of the proposed multimodal fusion module and architecture and
their applicability to multimodal aerial imagery.
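As a rough illustration of the fusion idea, the sketch below implements channel-to-channel attention over a channel-concatenated multimodal tensor, so that each output channel becomes a learned mixture of all input channels before any backbone processing. The 1x1-convolution projections, scaling, and residual connection are our assumptions, not the authors' exact module.

```python
# Illustrative PyTorch sketch of cross-channel attention for early multimodal fusion.
import torch
import torch.nn as nn

class CrossChannelAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # 1x1 convs produce per-channel query/key/value descriptors
        self.q = nn.Conv2d(channels, channels, 1)
        self.k = nn.Conv2d(channels, channels, 1)
        self.v = nn.Conv2d(channels, channels, 1)

    def forward(self, x):                 # x: (B, C, H, W), C = concatenated modalities
        B, C, H, W = x.shape
        q = self.q(x).flatten(2)          # (B, C, H*W)
        k = self.k(x).flatten(2)
        v = self.v(x).flatten(2)
        # channel-by-channel affinity matrix, softmax-normalized: (B, C, C)
        attn = torch.softmax(q @ k.transpose(1, 2) / (H * W) ** 0.5, dim=-1)
        out = attn @ v                    # each channel is a mixture of all channels
        return out.view(B, C, H, W) + x   # residual keeps the original signal

# usage (hypothetical modalities): RGB (3 ch) + elevation map (1 ch)
# fuse = CrossChannelAttention(channels=4)
# y = fuse(torch.cat([rgb, dem], dim=1))
```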
Facial Action Units Intensity Estimation by the Fusion of Features with Multi-kernel Support Vector Machine
Automatic facial expression recognition has developed over more than two decades. The recognition of posed facial expressions and the detection of Action Units (AUs) have already made great progress. More recently, automatic estimation of the variation of facial expression, either as AU intensities or as values of dimensional emotions, has emerged in the field of facial expression analysis. However, discriminating different AU intensities is a far more challenging task than AU detection, due to several intractable problems. Aiming to continue standardized evaluation procedures and push past the limits of current research, the second Facial Expression Recognition and Analysis challenge (FERA2015) was organized. In this context, we propose a method for the automatic estimation of AU intensities that fuses different appearance and geometry features through a multi-kernel Support Vector Machine (SVM). By taking advantage of the different features within a multi-kernel SVM, our approach is shown to outperform conventional methods based on a single feature type with a single-kernel SVM.
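One simple way to realize kernel-level fusion of heterogeneous features is a weighted sum of per-modality kernels fed to a precomputed-kernel SVM regressor. The sketch below uses fixed illustrative weights and RBF kernels; a full multi-kernel learning method, as described in the paper, would learn the combination weights, and all names and parameters here are hypothetical.

```python
# Illustrative sketch of kernel-level fusion for AU intensity estimation.
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics.pairwise import rbf_kernel

def fused_kernel(feats_a, feats_b, weights, gammas):
    """Weighted sum of RBF kernels, one kernel per feature modality."""
    return sum(w * rbf_kernel(Xa, Xb, gamma=g)
               for (Xa, Xb), w, g in zip(zip(feats_a, feats_b), weights, gammas))

# hypothetical inputs: appearance and geometry features plus AU intensity labels
# train = [X_app_tr, X_geo_tr]; test = [X_app_te, X_geo_te]
# K_tr = fused_kernel(train, train, weights=[0.6, 0.4], gammas=[1e-3, 1e-2])
# K_te = fused_kernel(test, train, weights=[0.6, 0.4], gammas=[1e-3, 1e-2])
# model = SVR(kernel="precomputed").fit(K_tr, y_tr)
# y_pred = model.predict(K_te)
```

Treating intensity estimation as regression on a fused precomputed kernel keeps each modality's similarity structure separate until the final combination, which is the main appeal of kernel-level over feature-level fusion.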
MIDV-2020: A Comprehensive Benchmark Dataset for Identity Document Analysis
Identity document recognition is an important sub-field of document
analysis, which deals with tasks of robust document detection, type
identification, text field recognition, as well as identity fraud prevention
and document authenticity validation, given photos, scans, or video frames of
an identity document. A significant amount of research has been published on
this topic in recent years; however, a chief difficulty for such research is
the scarcity of datasets, since the subject matter is protected by security
requirements. The few identity document datasets that are available lack
diversity of document types, capturing conditions, or variability of document
field values. In addition, the published datasets were typically designed only
for a subset of document recognition problems, not for complex identity
document analysis. In this paper, we present MIDV-2020, a dataset consisting
of 1000 video clips, 2000 scanned images, and 1000 photos of 1000 unique mock
identity documents, each with unique text field values and unique artificially
generated faces, with rich annotation. For the presented benchmark dataset,
baselines are provided for tasks such as document location and identification,
text field recognition, and face detection. With 72,409 annotated images in
total, the proposed dataset is, as of publication, the largest publicly
available identity document dataset with variable artificially generated data,
and we believe it will prove invaluable for the advancement of the field of
document analysis and recognition. The dataset is available for download at
ftp://smartengines.com/midv-2020 and http://l3i-share.univ-lr.fr