The Devil in the Details: Simple and Effective Optical Flow Synthetic Data Generation
Recent work on dense optical flow has shown significant progress, primarily
in a supervised learning setting that requires large amounts of labeled data.
Because obtaining large-scale real-world data is expensive, computer
graphics are typically leveraged for constructing datasets. However, there is a
common belief that synthetic-to-real domain gaps limit generalization to real
scenes. In this paper, we show that the required characteristics in an optical
flow dataset are rather simple and present a simpler synthetic data generation
method that achieves a certain level of realism with compositions of elementary
operations. With 2D motion-based datasets, we systematically analyze the
simplest yet critical factors for generating synthetic datasets. Furthermore,
we propose a novel way of utilizing occlusion masks in supervised training
and observe that suppressing gradients on occluded regions serves as a powerful
initial state in the curriculum learning sense. The RAFT network initially
trained on our dataset outperforms the original RAFT on the two most
challenging online benchmarks, MPI Sintel and KITTI 2015.
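
The occlusion-handling idea lends itself to a compact implementation. Below is a minimal PyTorch sketch of suppressing gradients on occluded regions in a supervised flow loss; the function name and tensor shapes are assumptions for illustration, not the paper's actual code.

    import torch

    def masked_epe_loss(flow_pred, flow_gt, occlusion_mask):
        """End-point-error loss that suppresses gradients on occluded pixels.
        flow_pred, flow_gt: (B, 2, H, W); occlusion_mask: (B, 1, H, W),
        1 where a pixel is occluded. Names and shapes are illustrative."""
        epe = torch.norm(flow_pred - flow_gt, dim=1, keepdim=True)  # (B, 1, H, W)
        visible = 1.0 - occlusion_mask
        # Average only over visible pixels; the clamp avoids division by zero.
        return (epe * visible).sum() / visible.sum().clamp(min=1.0)

Because occluded pixels contribute no gradient, the network is never pushed to hallucinate unobservable motion, which is consistent with the curriculum-style warm start described above.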
MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset
Recent studies in speech-driven 3D talking head generation have achieved
convincing results in verbal articulation. However, lip-sync accuracy
degrades when these models are applied to input speech in other languages, possibly
due to the lack of datasets covering a broad spectrum of facial movements
across languages. In this work, we introduce a novel task to generate 3D
talking heads from speech in diverse languages. We collect a new multilingual
2D video dataset comprising over 420 hours of talking videos in 20 languages.
With our proposed dataset, we present a multilingually enhanced model that
incorporates language-specific style embeddings, enabling it to capture the
unique mouth movements associated with each language. Additionally, we present
a metric for assessing lip-sync accuracy in multilingual settings. We
demonstrate that training a 3D talking head model with our proposed dataset
significantly enhances its multilingual performance. Codes and datasets are
available at https://multi-talk.github.io/.
Comment: Interspeech 202
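
As a rough illustration of what language-specific style embeddings might look like, here is a hedged PyTorch sketch; the class name, dimensions, and the additive fusion with audio features are assumptions, not the paper's actual architecture.

    import torch
    import torch.nn as nn

    class LanguageStyleEmbedding(nn.Module):
        """Hedged sketch: a learned per-language style vector added to the
        audio features before the talking-head decoder, letting the model
        capture language-specific mouth movements."""

        def __init__(self, num_languages=20, dim=256):
            super().__init__()
            self.table = nn.Embedding(num_languages, dim)

        def forward(self, audio_feats, lang_id):
            # audio_feats: (B, T, dim); lang_id: (B,) integer language index
            style = self.table(lang_id).unsqueeze(1)   # (B, 1, dim)
            return audio_feats + style                 # broadcast over time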
An anomalous dissociation of protonated cluster ions of DNA guanine-cytosine base-pair
Autonomous Cooperative Levels of Multiple-Heterogeneous Unmanned Vehicle Systems
As multiple and heterogenous unmanned vehicle systems continue to play an
increasingly important role in addressing complex missions in the real world,
the need for effective cooperation among unmanned vehicles becomes paramount.
The concept of autonomous cooperation, wherein unmanned vehicles cooperate
without human intervention or human control, offers promising avenues for
enhancing the efficiency, adaptability, and intelligence of
multiple-heterogeneous unmanned vehicle systems. Despite growing interest
in this domain, to the best of the authors' knowledge, there is a notable
lack of comprehensive literature defining an explicit concept and classifying
the levels of autonomous cooperation of multiple-heterogeneous unmanned vehicle
systems. In this regard, this article aims to define an explicit concept of
autonomous cooperation for multiple-heterogeneous unmanned vehicle systems.
Furthermore, we provide a novel criterion to assess the technical maturity of
the developed unmanned vehicle systems by classifying the autonomous
cooperative levels of multiple-heterogeneous unmanned vehicle systems.
SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models
Despite recent advances in artificial intelligence, building social
intelligence remains a challenge. Among social signals, laughter is one of the
distinctive expressions that occur during social interactions between humans.
In this work, we tackle a new challenge for machines to understand the
rationale behind laughter in video, Video Laugh Reasoning. We introduce this
new task to explain why people laugh in a particular video and a dataset for
this task. Our proposed dataset, SMILE, comprises video clips and language
descriptions of why people laugh. We propose a baseline by leveraging the
reasoning capacity of large language models (LLMs) with textual video
representation. Experiments show that our baseline can generate plausible
explanations for laughter. We further investigate the scalability of our
baseline by probing other video understanding tasks and in-the-wild videos. We
release our dataset, code, and model checkpoints on
https://github.com/postech-ami/SMILE-Dataset.
Comment: 19 pages, 14 figures
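
One plausible way to realize such a baseline is to serialize the video into text and hand it to an LLM. The sketch below is purely illustrative; the function name, the caption/transcript format, and the prompt wording are assumptions rather than the SMILE authors' actual pipeline.

    def build_laugh_reasoning_prompt(captions, transcript):
        """Serialize a clip into text so an off-the-shelf LLM can explain
        the laughter. Hypothetical format; timestamps are in seconds."""
        scene = "\n".join(f"[{t}s] {c}" for t, c in captions)
        return (
            "Video scene descriptions:\n" + scene + "\n\n"
            "Dialogue transcript:\n" + transcript + "\n\n"
            "Question: Why do the people in this video laugh?"
        )

    prompt = build_laugh_reasoning_prompt(
        [(0, "A man trips over a chair"), (2, "His friends watch, smiling")],
        "Friend: Are you okay? Man: I meant to do that!",
    )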
A Large-Scale 3D Face Mesh Video Dataset via Neural Re-parameterized Optimization
We propose NeuFace, a 3D face mesh pseudo annotation method on videos via
neural re-parameterized optimization. Despite the huge progress in 3D face
reconstruction methods, generating reliable 3D face labels for in-the-wild
dynamic videos remains challenging. Using NeuFace optimization, we annotate
accurate, per-view and per-frame consistent face meshes on large-scale face videos,
called the NeuFace-dataset. We investigate how neural re-parameterization helps
to reconstruct image-aligned facial details on 3D meshes via gradient analysis.
By exploiting the naturalness and diversity of 3D faces in our dataset, we
demonstrate the usefulness of our dataset for 3D face-related tasks: improving
the reconstruction accuracy of an existing 3D face reconstruction model and
learning 3D facial motion prior. Code and datasets will be available at
https://neuface-dataset.github.io.
Comment: 9 pages, 7 figures, and 3 tables for the main paper; 8 pages, 6 figures, and 3 tables for the appendix
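
The abstract does not spell out the optimization details, but neural re-parameterization is commonly understood as optimizing the weights of a network that emits the target parameters instead of optimizing the parameters directly. A speculative PyTorch sketch, with all names, sizes, and the placeholder loss invented for illustration:

    import torch
    import torch.nn as nn

    # Speculative sketch: a small MLP predicts per-frame face parameters from
    # per-frame features, so gradients flow through shared weights and the
    # predicted meshes stay consistent across frames.
    class ReparamFaceParams(nn.Module):
        def __init__(self, feat_dim=128, n_params=159):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, n_params)
            )

        def forward(self, frame_feat):
            return self.mlp(frame_feat)

    model = ReparamFaceParams()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    frame_feat = torch.randn(8, 128)      # stand-in per-frame features
    params = model(frame_feat)            # face parameters to be fitted
    loss = params.square().mean()         # placeholder for a landmark/photometric loss
    opt.zero_grad()
    loss.backward()
    opt.step()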
LaughTalk: Expressive 3D Talking Head Generation with Laughter
Laughter is a unique expression, essential to affirmative social interactions
among humans. Although current 3D talking head generation methods produce
convincing verbal articulations, they often fail to capture the vitality and
subtleties of laughter and smiles despite their importance in social context.
In this paper, we introduce a novel task to generate 3D talking heads capable
of both articulate speech and authentic laughter. Our newly curated dataset
comprises 2D laughing videos paired with pseudo-annotated and human-validated
3D FLAME parameters and vertices. Given our proposed dataset, we present a
strong baseline with a two-stage training scheme: the model first learns to
talk and then acquires the ability to express laughter. Extensive experiments
demonstrate that our method performs favorably compared to existing approaches
in both talking head generation and expressing laughter signals. We further
explore potential applications on top of our proposed method for rigging
realistic avatars.
Comment: Accepted to WACV202
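
The two-stage scheme can be pictured as two sequential training loops. This is a hypothetical sketch: the loss methods, data loaders, and overall structure are assumed interfaces, not the paper's actual training code.

    # Stage 1 teaches verbal articulation; stage 2 fine-tunes on laughter.
    def train_two_stage(model, optimizer, talk_loader, laugh_loader):
        for batch in talk_loader:          # stage 1: speech-driven talking
            loss = model.talking_loss(batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        for batch in laugh_loader:         # stage 2: acquire laughter expression
            loss = model.laughter_loss(batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()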
Prevalence and Characteristics of Apical Aneurysm on Cardiovascular Magnetic Resonance in Patients with Hypertrophic Cardiomyopathy
Determination of the theoretical personalized optimum chest compression point using anteroposterior chest radiography
Objective: There is a traditional assumption that, to maximize stroke volume, the point beneath which the left ventricle (LV) is at its maximum diameter (P_max.LV) should be compressed. Thus, we aimed to derive and validate rules to estimate P_max.LV using anteroposterior chest radiography (chest_AP), which is performed for critically ill patients urgently needing determination of their personalized P_max.LV.
Methods: A retrospective, cross-sectional study was performed with non-cardiac-arrest adults who underwent chest_AP within 1 hour of computed tomography (derivation:validation = 3:2). On chest_AP, we defined the cardiac diameter (CD), the distance from the right cardiac border to the midline (RB), and the cardiac height (CH) from the carina to the uppermost point of the left hemi-diaphragm. Setting point zero (0, 0) at the midpoint of the xiphisternal joint and designating the leftward and upward directions as positive on the x- and y-axes, we located P_max.LV (x_max.LV, y_max.LV). The coefficients of the following mathematically inferred rules were sought: x_max.LV = α0*CD - RB; y_max.LV = β0*CH + γ0 (α0: mean of [x_max.LV + RB]/CD; β0, γ0: representative coefficient and constant of the linear regression model, respectively).
Results: Among 360 cases (52.0±18.3 years, 102 females), we derived x_max.LV = 0.643*CD - RB and y_max.LV = 55 - 0.390*CH. The estimated P_max.LV (distance to the reference identified on computed tomography: 19±11 mm) was as close as the averaged P_max.LV (19±11 mm, P=0.13) and closer than the three equidistant points representing the current guidelines (67±13, 56±10, and 77±17 mm; all P<0.001). Thus, our findings were validated.
Conclusion: Personalized P_max.LV can be estimated using chest_AP. Further studies with actual cardiac arrest victims are needed to verify the safety and effectiveness of the rules.
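
The two derived rules are simple enough to evaluate directly. A small Python sketch applying them, with the function name and the example measurements invented for illustration (all values in millimetres):

    def estimate_p_max_lv(cd_mm, rb_mm, ch_mm):
        """Apply the derived rules from the abstract. Origin (0, 0) is the
        midpoint of the xiphisternal joint; leftward (x) and upward (y) are
        positive.
        cd_mm: cardiac diameter (CD)
        rb_mm: distance from the right cardiac border to the midline (RB)
        ch_mm: cardiac height, carina to left hemi-diaphragm apex (CH)"""
        x = 0.643 * cd_mm - rb_mm
        y = 55.0 - 0.390 * ch_mm
        return x, y

    # Example with made-up measurements (illustrative values only):
    print(estimate_p_max_lv(cd_mm=140.0, rb_mm=45.0, ch_mm=110.0))  # ≈ (45.0, 12.1)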
Effect of a multi-layer infection control barrier on the micro-hardness of a composite resin
OBJECTIVE: The aim of this study was to evaluate the effect of multiple layers of an infection control barrier on the micro-hardness of a composite resin.
MATERIAL AND METHODS: One, two, four, and eight layers of an infection control barrier were used to cover the light guides of a high-power light-emitting diode (LED) light curing unit (LCU) and a low-power halogen LCU. The composite specimens were photopolymerized with the LCUs and the barriers, and the micro-hardness of the upper and lower surfaces was measured (n=10). The hardness ratio was calculated by dividing the bottom-surface hardness of the experimental groups by the irradiated-surface hardness of the control groups. The data were analyzed by two-way ANOVA and Tukey's HSD test.
RESULTS: The micro-hardness of the composite specimens photopolymerized with the LED LCU decreased significantly in the four- and eight-layer groups on the upper surface and in the two-, four-, and eight-layer groups on the lower surface. The hardness ratio of the composite specimens wa
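
The hardness ratio described above is a single division; a tiny sketch with made-up Vickers hardness values makes the definition concrete:

    def hardness_ratio(bottom_experimental, irradiated_control):
        """Bottom-surface hardness of an experimental group divided by the
        irradiated-surface hardness of the control group, as defined above."""
        return bottom_experimental / irradiated_control

    # Illustrative, made-up Vickers hardness numbers:
    print(hardness_ratio(bottom_experimental=52.0, irradiated_control=65.0))  # 0.8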