4,534 research outputs found

    The Devil in the Details: Simple and Effective Optical Flow Synthetic Data Generation

    Recent work on dense optical flow has shown significant progress, primarily through supervised learning, which requires large amounts of labeled data. Because obtaining large-scale real-world data is expensive, computer graphics are typically leveraged to construct datasets. However, there is a common belief that synthetic-to-real domain gaps limit generalization to real scenes. In this paper, we show that the characteristics required of an optical flow dataset are rather simple, and we present a simpler synthetic data generation method that achieves a certain level of realism with compositions of elementary operations. With 2D motion-based datasets, we systematically analyze the simplest yet most critical factors for generating synthetic datasets. Furthermore, we propose a novel method of utilizing occlusion masks in a supervised setting and observe that suppressing gradients on occluded regions serves as a powerful initial state in the curriculum learning sense. A RAFT network initially trained on our dataset outperforms the original RAFT on the two most challenging online benchmarks, MPI Sintel and KITTI 2015.
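    The gradient-suppression idea lends itself to a short illustration. Below is a minimal sketch (assuming PyTorch; the tensor names and exact loss form are our own, not the paper's code) of a supervised flow loss that zeroes out gradients on occluded pixels:

```python
import torch

def masked_flow_loss(flow_pred, flow_gt, occ_mask):
    """L1 flow loss with gradients suppressed on occluded pixels.

    flow_pred, flow_gt: (B, 2, H, W) predicted / ground-truth flow.
    occ_mask: (B, 1, H, W), 1 where the pixel is occluded in the target frame.
    """
    valid = 1.0 - occ_mask                     # keep only visible pixels
    per_pixel = (flow_pred - flow_gt).abs()    # per-channel end-point error
    # Multiplying by the mask zeroes the loss (and hence the gradient)
    # on occluded regions; normalize by the number of supervised pixels.
    return (per_pixel * valid).sum() / valid.sum().clamp(min=1.0)
```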

    MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset

    Recent studies in speech-driven 3D talking head generation have achieved convincing results in verbal articulation. However, lip-sync accuracy degrades when the input speech is in other languages, possibly due to the lack of datasets covering a broad spectrum of facial movements across languages. In this work, we introduce a novel task: generating 3D talking heads from speech in diverse languages. We collect a new multilingual 2D video dataset comprising over 420 hours of talking videos in 20 languages. With our proposed dataset, we present a multilingually enhanced model that incorporates language-specific style embeddings, enabling it to capture the unique mouth movements associated with each language. Additionally, we present a metric for assessing lip-sync accuracy in multilingual settings. We demonstrate that training a 3D talking head model with our proposed dataset significantly enhances its multilingual performance. Code and datasets are available at https://multi-talk.github.io/.
    Comment: Interspeech 202
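    The language-specific style embedding can be pictured with a small sketch (the dimensions, names, and fusion scheme below are assumptions for illustration, not the released model): each language ID indexes a learned style vector that conditions the audio features before motion decoding.

```python
import torch
import torch.nn as nn

class LanguageStyleConditioner(nn.Module):
    """Looks up a learned per-language style vector and fuses it with
    audio features before motion decoding (illustrative only)."""

    def __init__(self, num_languages=20, style_dim=64, audio_dim=256):
        super().__init__()
        self.style_table = nn.Embedding(num_languages, style_dim)
        self.fuse = nn.Linear(audio_dim + style_dim, audio_dim)

    def forward(self, audio_feat, lang_id):
        # audio_feat: (B, T, audio_dim); lang_id: (B,) integer language index
        style = self.style_table(lang_id)                      # (B, style_dim)
        style = style.unsqueeze(1).expand(-1, audio_feat.size(1), -1)
        return self.fuse(torch.cat([audio_feat, style], dim=-1))
```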

    Autonomous Cooperative Levels of Multiple-Heterogeneous Unmanned Vehicle Systems

    As multiple, heterogeneous unmanned vehicle systems continue to play an increasingly important role in addressing complex real-world missions, the need for effective cooperation among unmanned vehicles becomes paramount. The concept of autonomous cooperation, wherein unmanned vehicles cooperate without human intervention or control, offers promising avenues for enhancing the efficiency, adaptability, and intelligence of multiple-heterogeneous unmanned vehicle systems. Despite growing interest in this domain, to the best of the authors' knowledge, there is a notable lack of comprehensive literature that explicitly defines the concept of, and classifies the levels of, autonomous cooperation in multiple-heterogeneous unmanned vehicle systems. In this regard, this article aims to define the explicit concept of autonomous cooperation of multiple-heterogeneous unmanned vehicle systems. Furthermore, we provide a novel criterion to assess the technical maturity of developed unmanned vehicle systems by classifying their autonomous cooperative levels.

    SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models

    Despite recent advances in artificial intelligence, building social intelligence remains a challenge. Among social signals, laughter is one of the distinctive expressions that occur during social interactions between humans. In this work, we tackle a new challenge for machines, Video Laugh Reasoning: understanding the rationale behind laughter in video. We introduce this new task of explaining why people laugh in a particular video, together with a dataset for it. Our proposed dataset, SMILE, comprises video clips and language descriptions of why people laugh. We propose a baseline that leverages the reasoning capacity of large language models (LLMs) over a textual video representation. Experiments show that our baseline can generate plausible explanations for laughter. We further investigate the scalability of our baseline by probing other video understanding tasks and in-the-wild videos. We release our dataset, code, and model checkpoints at https://github.com/postech-ami/SMILE-Dataset.
    Comment: 19 pages, 14 figures
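    The baseline pattern, prompting an LLM with a textual video representation, might look like the following sketch (the template, caption format, and function name are hypothetical; see the repository above for the actual code):

```python
def build_laugh_reasoning_prompt(captions, transcript):
    """Assemble a textual video representation for an LLM
    (hypothetical template; the released code may differ)."""
    frames = "\n".join(f"[{t}] {c}" for t, c in captions)
    return (
        "You are given a textual description of a video.\n"
        f"Visual captions:\n{frames}\n"
        f"Speech transcript: {transcript}\n"
        "Question: Why do the people in this video laugh?"
    )

prompt = build_laugh_reasoning_prompt(
    captions=[("00:01", "two friends sit at a table"),
              ("00:04", "one friend spills coffee on himself")],
    transcript="Oh no, not again!",
)
# The prompt would then be sent to an LLM of choice.
```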

    A Large-Scale 3D Face Mesh Video Dataset via Neural Re-parameterized Optimization

    We propose NeuFace, a 3D face mesh pseudo-annotation method for videos via neural re-parameterized optimization. Despite the huge progress in 3D face reconstruction methods, generating reliable 3D face labels for in-the-wild dynamic videos remains challenging. Using NeuFace optimization, we annotate per-view and per-frame accurate, consistent face meshes on large-scale face videos, producing the NeuFace-dataset. Through gradient analysis, we investigate how neural re-parameterization helps reconstruct image-aligned facial details on 3D meshes. By exploiting the naturalness and diversity of the 3D faces in our dataset, we demonstrate its usefulness for 3D face-related tasks: improving the reconstruction accuracy of an existing 3D face reconstruction model and learning a 3D facial motion prior. Code and datasets will be available at https://neuface-dataset.github.io.
    Comment: 9 pages, 7 figures, and 3 tables for the main paper; 8 pages, 6 figures, and 3 tables for the appendix
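    Neural re-parameterization replaces direct per-vertex optimization variables with the weights of a network that predicts the vertices. A toy PyTorch sketch of the contrast follows (`project` and `landmarks_2d` are hypothetical stand-ins for a camera projection and detected 2D landmarks; the actual NeuFace pipeline differs):

```python
import torch
import torch.nn as nn

def fit_direct(vertices_init, landmarks_2d, project, steps=100):
    # Baseline: optimize raw per-vertex coordinates independently.
    verts = vertices_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([verts], lr=1e-3)
    for _ in range(steps):
        loss = (project(verts) - landmarks_2d).square().mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return verts.detach()

def fit_reparameterized(vertices_init, landmarks_2d, project, steps=100):
    # Re-parameterized: a small network predicts vertex offsets; optimizing
    # its weights couples vertices through shared parameters, which
    # regularizes the fit toward coherent, image-aligned meshes.
    net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(steps):
        verts = vertices_init + net(vertices_init)
        loss = (project(verts) - landmarks_2d).square().mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return (vertices_init + net(vertices_init)).detach()
```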

    LaughTalk: Expressive 3D Talking Head Generation with Laughter

    Laughter is a unique expression, essential to affirmative social interactions between humans. Although current 3D talking head generation methods produce convincing verbal articulations, they often fail to capture the vitality and subtleties of laughter and smiles, despite their importance in social context. In this paper, we introduce a novel task: generating 3D talking heads capable of both articulate speech and authentic laughter. Our newly curated dataset comprises 2D laughing videos paired with pseudo-annotated and human-validated 3D FLAME parameters and vertices. Given our proposed dataset, we present a strong baseline with a two-stage training scheme: the model first learns to talk and then acquires the ability to express laughter. Extensive experiments demonstrate that our method performs favorably against existing approaches in both talking head generation and expressing laughter signals. We further explore potential applications of our method for rigging realistic avatars.
    Comment: Accepted to WACV202
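    The two-stage scheme can be summarized in a minimal sketch (hypothetical loaders and a generic vertex-regression loss; this is not the authors' training code):

```python
def train_two_stage(model, talk_loader, laugh_loader, optimizer):
    """Stage 1 learns verbal articulation; stage 2 continues from the
    stage-1 weights on laughing sequences, so the model acquires
    laughter without discarding what it learned about talking."""
    for audio, verts_gt in talk_loader:       # stage 1: speech only
        loss = (model(audio) - verts_gt).square().mean()
        optimizer.zero_grad(); loss.backward(); optimizer.step()
    for audio, verts_gt in laugh_loader:      # stage 2: laughter data
        loss = (model(audio) - verts_gt).square().mean()
        optimizer.zero_grad(); loss.backward(); optimizer.step()
```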

    Determination of the theoretical personalized optimum chest compression point using anteroposterior chest radiography

    Objective: There is a traditional assumption that, to maximize stroke volume, the point beneath which the left ventricle (LV) is at its maximum diameter (P_max.LV) should be compressed. We therefore aimed to derive and validate rules to estimate P_max.LV using anteroposterior chest radiography (chest_AP), which is performed for critically ill patients who urgently need their personalized P_max.LV determined.
    Methods: A retrospective, cross-sectional study was performed with non-cardiac-arrest adults who underwent chest_AP within 1 hour of computed tomography (derivation:validation = 3:2). On chest_AP, we defined the cardiac diameter (CD), the distance from the right cardiac border to the midline (RB), and the cardiac height (CH) from the carina to the uppermost point of the left hemi-diaphragm. Setting point zero (0, 0) at the midpoint of the xiphisternal joint and designating the leftward and upward directions as positive on the x- and y-axes, we located P_max.LV at (x_max.LV, y_max.LV). The coefficients of the following mathematically inferred rules were sought: x_max.LV = α0*CD - RB and y_max.LV = β0*CH + γ0, where α0 is the mean of (x_max.LV + RB)/CD, and β0 and γ0 are the representative coefficient and constant of a linear regression model, respectively.
    Results: Among 360 cases (52.0±18.3 years, 102 females), we derived x_max.LV = 0.643*CD - RB and y_max.LV = 55 - 0.390*CH. The estimated P_max.LV was as close to the reference point identified on computed tomography (19±11 mm) as the averaged P_max.LV (19±11 mm, P=0.13), and closer than the three equidistant points representing the current guidelines (67±13, 56±10, and 77±17 mm; all P<0.001). Thus, our rules were validated.
    Conclusion: A personalized P_max.LV can be estimated using chest_AP. Further studies with actual cardiac arrest victims are needed to verify the safety and effectiveness of the rules.
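    As a worked example, the derived rules reduce to two lines of arithmetic. The sketch below applies them to made-up measurements (for illustration only, not clinical guidance):

```python
def estimate_p_max_lv(cd_mm, rb_mm, ch_mm):
    """Estimate P_max.LV (in mm) from anteroposterior chest radiography
    using the derived rules x = 0.643*CD - RB and y = 55 - 0.390*CH.

    cd_mm: cardiac diameter; rb_mm: right cardiac border to midline;
    ch_mm: cardiac height (carina to apex of the left hemi-diaphragm).
    Origin (0, 0) is the midpoint of the xiphisternal joint; leftward
    and upward are positive on the x- and y-axes, respectively.
    """
    x = 0.643 * cd_mm - rb_mm
    y = 55.0 - 0.390 * ch_mm
    return x, y

# Example with made-up measurements:
print(estimate_p_max_lv(cd_mm=130.0, rb_mm=40.0, ch_mm=110.0))
# -> (43.59, 12.1), i.e. 43.59 mm left of and 12.1 mm above the origin
```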

    Effect of a multi-layer infection control barrier on the micro-hardness of a composite resin

    OBJECTIVE: The aim of this study was to evaluate the effect of multiple layers of an infection control barrier on the micro-hardness of a composite resin.
    MATERIAL AND METHODS: One, two, four, and eight layers of an infection control barrier were used to cover the light guides of a high-power light-emitting diode (LED) light curing unit (LCU) and a low-power halogen LCU. The composite specimens were photopolymerized with the LCUs and the barriers, and the micro-hardness of the upper and lower surfaces was measured (n=10). The hardness ratio was calculated by dividing the bottom-surface hardness of the experimental groups by the irradiated-surface hardness of the control groups. The data were analyzed by two-way ANOVA and Tukey's HSD test.
    RESULTS: The micro-hardness of the composite specimens photopolymerized with the LED LCU decreased significantly in the four- and eight-layer groups on the upper surface and in the two-, four-, and eight-layer groups on the lower surface. The hardness ratio of the composite specimens wa
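    For concreteness, the hardness ratio described above is a single division; a minimal sketch with illustrative (not measured) values:

```python
def hardness_ratio(bottom_experimental, irradiated_control):
    """Hardness ratio as defined in the study: bottom-surface hardness of
    an experimental group divided by the irradiated-surface hardness of
    the corresponding control group."""
    return bottom_experimental / irradiated_control

# Illustrative values only:
print(hardness_ratio(bottom_experimental=48.2, irradiated_control=60.5))
# -> ~0.797
```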