FEAFA: A Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation
Facial expression analysis based on machine learning requires a large number of
well-annotated samples to reflect different changes in facial motion. Publicly
available datasets help accelerate research in this area by providing a
benchmark resource, but all of these datasets, to the best of our knowledge,
are limited to rough annotations of action units, recording only their
absence, presence, or a five-level intensity according to the Facial Action
Coding System. To meet the need for videos labeled in greater detail, we present
a well-annotated dataset named FEAFA for Facial Expression Analysis and 3D
Facial Animation. One hundred and twenty-two participants, including children,
young adults, and elderly people, were recorded in real-world conditions. In
addition, 99,356 frames were manually labeled using an Expression Quantitative
Tool we developed to quantify 9 symmetrical FACS action units, 10
asymmetrical (unilateral) FACS action units, 2 symmetrical FACS action
descriptors, and 2 asymmetrical FACS action descriptors; each action unit or
action descriptor is annotated with a floating-point number between 0 and
1. To provide a baseline for future research, a benchmark for the
regression of action unit values based on Convolutional Neural Networks is
presented. We also demonstrate the potential of our FEAFA dataset for 3D facial
animation. Almost all state-of-the-art facial animation algorithms rely on 3D
face reconstruction; we therefore propose a novel method that drives virtual
characters based only on action unit values regressed from the 2D video frames
of source actors.
Comment: 9 pages, 7 figures
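The continuous annotation scheme above can be made concrete with a small sketch: a regression head that maps a CNN feature vector to the 23 annotated values (9 + 10 symmetrical/asymmetrical action units plus 2 + 2 action descriptors), squashed into [0, 1] with a sigmoid. This is an illustrative numpy sketch under assumed shapes, not FEAFA's actual baseline network; `au_regression_head`, the 128-dim feature size, and the random weights are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def au_regression_head(features, weights, bias):
    """Map a CNN feature vector to 23 AU/AD intensities, each in [0, 1]."""
    return sigmoid(features @ weights + bias)

rng = np.random.default_rng(0)
feat = rng.normal(size=128)              # hypothetical backbone feature vector
W = rng.normal(size=(128, 23)) * 0.01    # 23 = 9 + 10 AUs + 2 + 2 ADs
b = np.zeros(23)
aus = au_regression_head(feat, W, b)
```

Because the outputs are continuous rather than five-level, a head like this is trained with a regression loss (e.g. MSE against the annotated values) instead of per-class cross-entropy.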
K-Space-Aware Cross-Modality Score for Synthesized Neuroimage Quality Assessment
The problem of how to assess cross-modality medical image synthesis has been
largely unexplored. The most commonly used measures, such as PSNR and SSIM,
focus on analyzing structural features but neglect the crucial lesion location
and the fundamental k-space characteristics of medical images. To overcome this
problem, we propose a new metric, K-CROSS, to spur progress on this challenging
task. Specifically, K-CROSS uses a pre-trained multi-modality segmentation
network to predict the lesion location, together with a tumor encoder that
represents features such as texture details and brightness intensities. To
further reflect the frequency-specific information arising from magnetic
resonance imaging principles, both k-space features and vision features are
obtained and employed in our comprehensive encoders with a frequency
reconstruction penalty. Structure-shared encoders are designed and constrained
with a similarity loss to capture the intrinsic common structural information
of both modalities. As a consequence, the features learned from lesion regions,
k-space, and anatomical structures are all captured and serve as our quality
evaluators. We evaluate performance by constructing a large-scale
cross-modality neuroimaging perceptual similarity (NIRPS) dataset with 6,000
radiologist judgments. Extensive experiments demonstrate that the proposed
method outperforms other metrics, especially in comparison with the
radiologists on NIRPS.
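For reference, the structural measures K-CROSS is contrasted with are simple to compute; PSNR, for example, reduces to a log-scaled mean squared error. A minimal numpy sketch (not part of K-CROSS; `psnr` and the toy images are illustrative):

```python
import numpy as np

def psnr(ref, test, data_range=1.0):
    """Peak signal-to-noise ratio: purely structural, blind to lesion location."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(test, float)) ** 2)
    if mse == 0:
        return np.inf
    return 10.0 * np.log10(data_range ** 2 / mse)

ref = np.zeros((8, 8))
syn = np.full((8, 8), 0.1)   # uniform 0.1 error everywhere -> MSE = 0.01
val = psnr(ref, syn)         # 10 * log10(1 / 0.01) = 20 dB
```

A metric like this scores a synthesis that misplaces a small lesion almost identically to one that does not, which is exactly the gap K-CROSS targets with its lesion, k-space, and structure encoders.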
IM-IAD: Industrial Image Anomaly Detection Benchmark in Manufacturing
Image anomaly detection (IAD) is an emerging and vital computer vision task
in industrial manufacturing (IM). Recently, many advanced algorithms have been
published, but their reported performance varies greatly. We find that the lack
of realistic IM settings most likely hinders the development and use of these
methods in real-world applications. To the best of our knowledge, IAD methods
have not been evaluated systematically, which makes it difficult for
researchers to analyze them because they are designed for different or special
cases. To solve this problem, we first propose a uniform IM setting to assess
how well these algorithms perform, covering several aspects: various levels of
supervision (unsupervised vs. semi-supervised), few-shot learning, continual
learning, noisy labels, memory usage, and inference speed. Moreover, we build a
comprehensive image anomaly detection benchmark (IM-IAD) that includes 16
algorithms on 7 mainstream datasets with uniform settings. Our extensive
experiments (17,017 in total) provide in-depth insights for IAD algorithm
redesign or selection under the IM setting. The proposed IM-IAD benchmark also
reveals challenges as well as directions for future work. To foster
reproducibility and accessibility, the source code of IM-IAD is available at
https://github.com/M-3LAB/IM-IAD
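Benchmarks of this kind typically score image-level detection with AUROC over per-image anomaly scores. A self-contained sketch of that metric (illustrative only; IM-IAD's own evaluation code lives in the linked repository):

```python
def auroc(scores, labels):
    """Image-level AUROC: probability a random anomaly outscores a random normal.

    O(n^2) pairwise form of the Mann-Whitney statistic; ties count half.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# two anomalies (label 1) both outscore the two normals -> perfect 1.0
score_perfect = auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
```

Because AUROC is threshold-free, it lets the benchmark compare algorithms whose raw anomaly scores live on very different scales.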
Multimodal ultrasound imaging: a method to improve the accuracy of sentinel lymph node diagnosis in breast cancer
Aim: This study assessed the utility of multimodal ultrasound in enhancing the accuracy of breast cancer sentinel lymph node (SLN) assessment and compared it with single-modality ultrasound.
Methods: Preoperative examinations, including two-dimensional ultrasound (2D US), intradermal contrast-enhanced ultrasound (CEUS), intravenous CEUS, shear-wave elastography (SWE), and surface localization, were conducted on 86 SLNs from breast cancer patients. The diagnostic performance of single and multimodal approaches for detecting metastatic SLNs was compared to postoperative pathological results.
Results: Among the 86 SLNs, 29 were pathologically diagnosed as metastatic and 57 as non-metastatic. Single-modality ultrasounds had AUC values of 0.826 (intradermal CEUS), 0.705 (intravenous CEUS), 0.678 (2D US), and 0.677 (SWE). Intradermal CEUS significantly outperformed the other methods (p<0.05), while the remaining three methods showed no statistically significant differences (p>0.05). Multimodal ultrasound, combining intradermal CEUS, intravenous CEUS, 2D US, and SWE, achieved an AUC of 0.893, with 86.21% sensitivity and 84.21% specificity. The DeLong test confirmed that multimodal ultrasound was significantly better than the four single-modality methods (p<0.05). Decision curve analysis and clinical impact curves demonstrated the superior performance of multimodal ultrasound in identifying high-risk SLN patients.
Conclusion: Multimodal ultrasound improves breast cancer SLN identification and diagnostic accuracy.
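The reported operating point can be sanity-checked from the cohort sizes: 86.21% sensitivity on 29 metastatic SLNs corresponds to 25 true positives, and 84.21% specificity on 57 non-metastatic SLNs to 48 true negatives. A small sketch (the confusion counts are inferred from the reported percentages, not taken from the paper):

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# 29 metastatic SLNs (25 detected), 57 non-metastatic (48 correctly ruled out)
sens, spec = sensitivity_specificity(tp=25, fn=4, tn=48, fp=9)
```

The two rates recover the reported 86.21% and 84.21% exactly, which is a useful consistency check when an abstract gives percentages and cohort sizes but not the confusion matrix.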
Applications of artificial intelligence in children and elderly care and short video industries - Cases from Cubo Ai and Tiktok
Ever since the concept of Artificial Intelligence (AI) was first coined in 1955, the quest for sophistication and improvement of existing technologies has paved the way for the continuous development of AI. Nowadays, AI technologies are redefining and disrupting the way people work and live across many domains. This paper focuses on AI applications in two fields closely related to people's lives: children & elderly care and short video industries. It first introduces several prevailing AI technologies applied in these two fields, and then uses two case studies, from Cubo Ai and TikTok, to elaborate on the applications in the corresponding fields.
A Coarse-to-Fine Facial Landmark Detection Method Based on Self-attention Mechanism
Facial landmark detection in the wild remains a challenging problem in computer vision. Deep learning-based methods currently play a leading role in solving it. However, these approaches generally focus on local feature learning and ignore global relationships. Therefore, in this study, a self-attention mechanism is introduced into facial landmark detection. Specifically, a coarse-to-fine facial landmark detection method is proposed that uses two stacked hourglasses as the backbone, with a new landmark-guided self-attention (LGSA) block inserted between them. The LGSA block learns the global relationships between different positions on the feature map and allows feature learning to focus on the locations of landmarks with the help of a landmark-specific attention map, which is generated in the first-stage hourglass model. A novel attentional consistency loss is also proposed to ensure the generation of an accurate landmark-specific attention map. A new channel transformation block is used as the building block of the hourglass model to improve the model's capacity. A coarse-to-fine strategy is adopted within and between stages to reduce complexity. Extensive experimental results on public datasets demonstrate the superiority of our proposed method over state-of-the-art models.
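The global-relationship modeling that the LGSA block builds on is standard scaled dot-product self-attention: every position attends to every other, so distant landmarks can inform each other. A minimal numpy sketch of that core operation (not the LGSA block itself; the 68 x 32 shapes and shared random projections are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a set of feature vectors."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))      # N x N attention map
    return A @ V, A

rng = np.random.default_rng(0)
X = rng.normal(size=(68, 32))            # e.g. one feature per candidate landmark
Wq = Wk = Wv = rng.normal(size=(32, 32)) * 0.1
out, attn = self_attention(X, Wq, Wk, Wv)
```

In the paper's design, this generic map is additionally guided by a landmark-specific attention map from the first hourglass, so attention concentrates on landmark locations rather than spreading uniformly.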
TPE: Lightweight Transformer Photo Enhancement Based on Curve Adjustment
In recent years, learning-based methods have made great progress in the field of photo enhancement. However, these methods rely on complex network structures and consume excessive computing resources, which greatly increases the difficulty of deploying them on lightweight devices. They also have poor real-time performance when processing very high-resolution images. In contrast to previous work on designing structurally diverse CNN architectures, photo enhancement can be achieved through a lightweight self-attention mechanism for global-local tuning. In this paper, we design a lightweight Transformer-based photo enhancement tool, which we dub TPE. TPE captures long-range dependencies among image patches and can efficiently extract the structural relationships within an image. A multistage curve adjustment strategy overcomes the limited adjustment capability of a global adjustment function, allowing the method to combine global modifications with local fine-tuning. Experiments on various benchmarks demonstrate the qualitative and quantitative advantages of TPE over state-of-the-art methods in photo retouching and low-light image enhancement tasks.
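The multistage curve adjustment idea can be illustrated with a common quadratic pixel curve, x -> x + a*x*(1-x), applied once per stage: it leaves pure black and white fixed while a predicted coefficient lifts or lowers the midtones. This is a hedged sketch of the general technique, not TPE's actual curve formulation; `apply_curves` and the coefficients are hypothetical.

```python
import numpy as np

def apply_curves(img, alphas):
    """Iteratively apply the quadratic curve x -> x + a*x*(1-x), one a per stage."""
    out = np.clip(np.asarray(img, float), 0.0, 1.0)
    for a in alphas:
        out = out + a * out * (1.0 - out)   # a > 0 brightens midtones, a < 0 darkens
    return np.clip(out, 0.0, 1.0)

img = np.array([0.0, 0.25, 0.5, 1.0])
bright = apply_curves(img, alphas=[0.5])    # 0.5 -> 0.625; endpoints unchanged
```

Stacking several stages with small coefficients gives finer control than a single global curve, which is the limitation of one-shot global adjustment that the abstract points to.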
Tiny adversarial multi-objective one-shot neural architecture search
Xie G, Wang J, Yu G, Lyu J, Zheng F, Jin Y. Tiny adversarial multi-objective one-shot neural architecture search. Complex & Intelligent Systems. 2023.
The tiny neural networks (TNNs) widely deployed on mobile devices are vulnerable to adversarial attacks, yet more advanced research on the robustness of TNNs is in high demand. This work focuses on improving the robustness of TNNs without sacrificing model accuracy. To find networks with the optimal trade-off among adversarial accuracy, clean accuracy, and model size, we present TAM-NAS, a tiny adversarial multi-objective one-shot network architecture search method. First, we build a novel search space composed of new tiny blocks and channels to establish a balance between model size and adversarial performance. Then, we demonstrate how the supernet facilitates the acquisition of the optimal subnet under white-box adversarial attacks, given that the supernet significantly impacts the subnet's performance. Concretely, we investigate a new adversarial training paradigm by evaluating adversarial transferability, the width of the supernet, and the distinction between training subnets from scratch and fine-tuning. Finally, we undertake a statistical analysis of the layer-wise combination of specific blocks and channels on the first non-dominated front, which can serve as a guideline for the design of TNNs.
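White-box attacks of the kind used to evaluate subnets are typically gradient-sign perturbations in the FGSM family: nudge each input dimension by a small eps in the direction that increases the loss. A toy numpy sketch of one such step on a linear surrogate loss (illustrative only; TAM-NAS's actual attack configuration and budgets are not specified here):

```python
import numpy as np

def fgsm_perturb(x, grad, eps=0.03):
    """One FGSM step: move each pixel by eps in the sign of the loss gradient."""
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

# toy linear "model": loss = w . x, so the input-gradient is simply w
rng = np.random.default_rng(0)
x = rng.uniform(0.2, 0.8, size=16)   # keep pixels away from the clip bounds
w = rng.normal(size=16)
x_adv = fgsm_perturb(x, grad=w)      # loss w . x_adv is strictly larger
```

Adversarial accuracy in the search objective is then just clean accuracy measured on inputs perturbed this way, putting it in direct tension with model size and clean accuracy on the Pareto front.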