71 research outputs found

    Learn to Model Blurry Motion via Directional Similarity and Filtering

    Get PDF

    NPC: Neural Point Characters from Video

    Full text link
    High-fidelity human 3D models can now be learned directly from videos, typically by combining a template-based surface model with neural representations. However, obtaining a template surface requires expensive multi-view capture systems, laser scans, or strictly controlled conditions. Previous methods avoid using a template but rely on a costly or ill-posed mapping from observation to canonical space. We propose a hybrid point-based representation for reconstructing animatable characters that does not require an explicit surface model, while being generalizable to novel poses. For a given video, our method automatically produces an explicit set of 3D points representing approximate canonical geometry, and learns an articulated deformation model that produces pose-dependent point transformations. The points serve both as a scaffold for high-frequency neural features and an anchor for efficiently mapping between observation and canonical space. We demonstrate on established benchmarks that our representation overcomes limitations of prior work operating in either canonical or in observation space. Moreover, our automatic point extraction approach enables learning models of human and animal characters alike, matching the performance of the methods using rigged surface templates despite being more general. Project website: https://lemonatsu.github.io/npc/Comment: Project website: https://lemonatsu.github.io/npc

    XMem++: Production-level Video Segmentation From Few Annotated Frames

    Full text link
    Despite advancements in user-guided video segmentation, extracting complex objects consistently for highly complex scenes is still a labor-intensive task, especially for production. It is not uncommon that a majority of frames need to be annotated. We introduce a novel semi-supervised video object segmentation (SSVOS) model, XMem++, that improves existing memory-based models, with a permanent memory module. Most existing methods focus on single frame annotations, while our approach can effectively handle multiple user-selected frames with varying appearances of the same object or region. Our method can extract highly consistent results while keeping the required number of frame annotations low. We further introduce an iterative and attention-based frame suggestion mechanism, which computes the next best frame for annotation. Our method is real-time and does not require retraining after each user input. We also introduce a new dataset, PUMaVOS, which covers new challenging use cases not found in previous benchmarks. We demonstrate SOTA performance on challenging (partial and multi-class) segmentation scenarios as well as long videos, while ensuring significantly fewer frame annotations than any existing method. Project page: https://max810.github.io/xmem2-project-page/Comment: Accepted to ICCV 2023. 18 pages, 16 figure

    Fully convolutional architectures for multi-part body segmentation

    Get PDF
    Final project of the Master in Foundations of Data Science, Faculty of Mathematics, University of Barcelona, Year: 2018, Advisors: Meysam Madadi and Sergio Escalera Guerrero. Since the appearance of the baseline Fully Convolutional Network (FCN), convolutional architectures have spread widely among deep neural networks: from classification tasks to object tracking, they are found ubiquitously in the deep learning field. In this study, three convolutional architectures are examined with regard to their application to semantic segmentation of the human body: ICNet, a multi-resolution cascade network; SegNet, an encoder-decoder network; and Stacked Hourglass, a network designed specifically for the human body. For this purpose, the SURREAL (Synthetic hUmans foR REAL tasks) dataset, which consists of synthetically rendered but realistic images of people, is used. The results show that the best-performing network for this task is the Stacked Hourglass: thanks to its continuous refinement of the output and the use of the full network for inference, it achieves 55.3% mIoU on the 24-body-part dataset.
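    The mIoU figure reported above is the standard mean intersection-over-union: per-class IoU averaged across classes. A minimal sketch of the metric, with a toy label map (the class count of 24 matches the body-part dataset; the predictions are illustrative only):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:               # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

target = np.array([[0, 0, 1], [1, 2, 2]])
pred   = np.array([[0, 1, 1], [1, 2, 2]])
# class 0: 1/2, class 1: 2/3, class 2: 2/2 -> mean ≈ 0.7222
print(round(mean_iou(pred, target, num_classes=24), 4))  # 0.7222
```

    Averaging per class (rather than per pixel) keeps small body parts from being drowned out by large ones such as the torso.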

    Increasing Occupational Participation of Older Adults with Low Vision Through an Occupation-Based Exercise Video

    Get PDF
    With the increasingly large population of older adults with low vision, many older adults would benefit from having a guide dog as an assistive device. Walking with a guide dog requires adopting particular upper-extremity muscle activations and postures to handle the dog. However, older adults with low vision may not be in the physical condition to meet these strenuous demands due to the normal aging process and decreased mobility. To prevent pain and injury, stretching and strengthening the muscles used when handling a guide dog may benefit older adults before they enter the Guide Dogs for the Blind (GDB) training program. The objective of this project is to improve older adults' strength and endurance through an evidence-based, occupation-based exercise video. The exercises in the video are integrated into daily life activities to promote habituation and adherence.

    Continuous Camera-Based Premature-Infant Monitoring Algorithms for NICU

    Get PDF
    Non-contact visual monitoring of vital signs in neonatology has been demonstrated by several recent studies in ideal scenarios where the baby is calm and there is no medical or parental intervention. Like contact monitoring methods (e.g., ECG, pulse oximetry), camera-based solutions suffer from motion artifacts; during care and the infants' active periods, the calculated values therefore typically differ substantially from the real ones. Accordingly, our main contribution to existing remote camera-based techniques is to detect and classify such situations with a high level of confidence, so that our algorithms can evaluate not only quiet periods but also provide continuous monitoring. Altogether, the proposed algorithms measure pulse rate and breathing rate, and recognize situations such as medical intervention or very active subjects, using only a single camera, while the system does not exceed the computational capabilities of average CPU-GPU hardware. The performance of the algorithms was evaluated on our database collected at the 1st Dept. of Neonatology of Pediatrics, Dept. of Obstetrics and Gynecology, Semmelweis University, Budapest, Hungary.
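    The common camera-based pulse-rate pipeline such work builds on can be sketched as: average a skin region's green channel per frame, then take the dominant FFT frequency in the plausible heart-rate band. The signal below is synthetic, and the motion check is a crude frame-difference energy rule standing in for the paper's classifier; thresholds and names are assumptions, not the authors' algorithm.

```python
import numpy as np

FPS = 30.0
t = np.arange(0, 20, 1 / FPS)                  # 20 s of 30 fps video
# Synthetic mean-green-channel trace: a 2 Hz pulse (120 bpm) plus sensor noise.
signal = 0.02 * np.sin(2 * np.pi * 2.0 * t)
signal += 0.005 * np.random.default_rng(2).normal(size=t.size)

def pulse_rate_bpm(sig, fps, band=(0.7, 4.0)):
    """Dominant frequency (in bpm) within the plausible heart-rate band."""
    sig = sig - sig.mean()
    freqs = np.fft.rfftfreq(sig.size, d=1 / fps)
    power = np.abs(np.fft.rfft(sig)) ** 2
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return 60.0 * freqs[mask][np.argmax(power[mask])]

def too_much_motion(frame_diffs, threshold=5.0):
    """Flag active periods whose readings should be discarded (toy rule)."""
    return float(np.mean(frame_diffs)) > threshold

print(round(pulse_rate_bpm(signal, FPS)))  # 120
```

    Gating the spectral estimate with a motion/intervention detector, as the abstract describes, is what turns such a quiet-period estimator into a continuous monitor.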