
    Fast Face-swap Using Convolutional Neural Networks

    We consider the problem of face swapping in images, where an input identity is transformed into a target identity while preserving pose, facial expression, and lighting. To perform this mapping, we use convolutional neural networks trained to capture the appearance of the target identity from an unstructured collection of his/her photographs. This approach is enabled by framing the face swapping problem in terms of style transfer, where the goal is to render an image in the style of another one. Building on recent advances in this area, we devise a new loss function that enables the network to produce highly photorealistic results. By combining neural networks with simple pre- and post-processing steps, we aim to make face swapping work in real time with no input from the user.
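
    As a rough illustration of the style-transfer framing described above, the sketch below shows a loss of the general family the paper builds on: a content term that preserves pose and expression plus a Gram-matrix style term that pulls appearance toward the target identity. The layer names, weights, and feature-dictionary interface are illustrative assumptions, not the paper's actual loss.

    ```python
    import torch
    import torch.nn.functional as F

    def gram_matrix(feat):
        # feat: (B, C, H, W) activations from a fixed feature extractor (e.g., VGG)
        b, c, h, w = feat.shape
        f = feat.view(b, c, h * w)
        return f @ f.transpose(1, 2) / (c * h * w)

    def face_swap_loss(gen_feats, content_feats, style_feats, style_weight=1e4):
        # Content term keeps the input's pose/expression; style term pulls
        # appearance toward the target identity's photo collection.
        content_loss = F.mse_loss(gen_feats["conv4"], content_feats["conv4"])
        style_loss = sum(
            F.mse_loss(gram_matrix(gen_feats[k]), gram_matrix(style_feats[k]))
            for k in ("conv1", "conv2", "conv3", "conv4")
        )
        return content_loss + style_weight * style_loss
    ```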

    Multi-contrast imaging and digital refocusing on a mobile microscope with a domed LED array

    We demonstrate the design and application of an add-on device for improving the diagnostic and research capabilities of CellScope, a low-cost, smartphone-based point-of-care microscope. We replace the single LED illumination of the original CellScope with a programmable domed LED array. By leveraging recent advances in computational illumination, this new device enables simultaneous multi-contrast imaging with brightfield, darkfield, and phase imaging modes. Further, we scan through illumination angles to capture lightfield datasets, which can be used to recover 3D intensity and phase images without any hardware changes. This digital refocusing procedure can be used for either 3D imaging or software-only focus correction, reducing the need for precise mechanical focusing during field experiments. All acquisition and processing are performed on the mobile phone and controlled through a smartphone application, making the computational microscope compact and portable. Using multiple samples and different objective magnifications, we demonstrate that the performance of our device is comparable to that of a commercial microscope. This unique device platform extends the field imaging capabilities of CellScope, opening up new clinical and research possibilities.
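
    As a sketch of the computational-illumination idea, the snippet below composes brightfield, darkfield, and differential phase contrast (DPC) images from a stack of per-LED captures, one image per illumination angle. The NA-based split and the half-pupil DPC difference follow the standard recipe in this literature; the function names and thresholds are illustrative assumptions, not the device's exact processing.

    ```python
    import numpy as np

    def compose_contrasts(stack, led_na, led_x, na_obj=0.25):
        # stack: (N, H, W) per-LED images; led_na: illumination NA of each LED;
        # led_x: x-component of each LED's illumination angle (used for DPC)
        bf = led_na <= na_obj                  # LEDs inside the objective NA
        brightfield = stack[bf].sum(axis=0)
        darkfield = stack[~bf].sum(axis=0)     # LEDs outside the objective NA
        # DPC: normalized difference between two half-pupil sums
        left = stack[bf & (led_x < 0)].sum(axis=0)
        right = stack[bf & (led_x >= 0)].sum(axis=0)
        dpc = (left - right) / np.clip(left + right, 1e-6, None)
        return brightfield, darkfield, dpc
    ```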

    Owl and Lizard: Patterns of Head Pose and Eye Pose in Driver Gaze Classification

    Accurate, robust, inexpensive gaze tracking in the car can help keep a driver safe by facilitating the more effective study of how to improve (1) vehicle interfaces and (2) the design of future Advanced Driver Assistance Systems. In this paper, we estimate head pose and eye pose from monocular video using methods developed extensively in prior work and ask two new questions. First, how much better can we classify driver gaze using head and eye pose versus head pose alone? Second, are there individual-specific gaze strategies that strongly correlate with how much gaze classification improves with the addition of eye pose information? We answer these questions by evaluating data drawn from an on-road study of 40 drivers. The main insight of the paper is conveyed through the analogy of an "owl" and "lizard", which describes the degree to which the eyes and the head move when shifting gaze. When the head moves a lot ("owl"), not much classification improvement is attained by estimating eye pose on top of head pose. On the other hand, when the head stays still and only the eyes move ("lizard"), classification accuracy increases significantly from adding eye pose. We characterize how that accuracy varies between people, gaze strategies, and gaze regions. Comment: Accepted for publication in IET Computer Vision. arXiv admin note: text overlap with arXiv:1507.0476.
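
    The head-pose-only versus head-plus-eye-pose comparison at the heart of the paper can be sketched with a generic classifier, as below. The feature layout, classifier choice, and cross-validation setup are assumptions for illustration, not the authors' exact pipeline.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    def gaze_accuracies(head_pose, eye_pose, gaze_labels):
        # head_pose: (N, 3) yaw/pitch/roll; eye_pose: (N, 2) gaze angles;
        # gaze_labels: (N,) gaze-region class per frame
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        acc_head = cross_val_score(clf, head_pose, gaze_labels, cv=5).mean()
        both = np.hstack([head_pose, eye_pose])
        acc_both = cross_val_score(clf, both, gaze_labels, cv=5).mean()
        # "Owls" show a small gap between the two; "lizards" a large one.
        return acc_head, acc_both
    ```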

    SuperPoint: Self-Supervised Interest Point Detection and Description

    This paper presents a self-supervised framework for training interest point detectors and descriptors suitable for a large number of multiple-view geometry problems in computer vision. As opposed to patch-based neural networks, our fully-convolutional model operates on full-sized images and jointly computes pixel-level interest point locations and associated descriptors in one forward pass. We introduce Homographic Adaptation, a multi-scale, multi-homography approach for boosting interest point detection repeatability and performing cross-domain adaptation (e.g., synthetic-to-real). Our model, when trained on the MS-COCO generic image dataset using Homographic Adaptation, is able to repeatedly detect a much richer set of interest points than the initial pre-adapted deep model and any other traditional corner detector. The final system gives rise to state-of-the-art homography estimation results on HPatches when compared to LIFT, SIFT, and ORB. Comment: Camera-ready version for the CVPR 2018 Deep Learning for Visual SLAM Workshop (DL4VSLAM2018).
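
    Homographic Adaptation can be sketched as follows: run a base detector on many randomly warped copies of an image and aggregate the responses warped back into the original frame. The homography sampler and the heatmap interface below are simplified assumptions; the paper's procedure is more elaborate.

    ```python
    import cv2
    import numpy as np

    def homographic_adaptation(image, detect_heatmap, num_homographies=100):
        # detect_heatmap(img) -> (H, W) float interest-point response map
        h, w = image.shape[:2]
        acc = detect_heatmap(image).astype(np.float32)
        for _ in range(num_homographies - 1):
            # Sample a mild random homography by perturbing the image corners
            src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
            dst = (src + np.random.uniform(-0.1, 0.1, src.shape) * [w, h]).astype(np.float32)
            H = cv2.getPerspectiveTransform(src, dst)
            warped = cv2.warpPerspective(image, H, (w, h))
            heat = detect_heatmap(warped).astype(np.float32)
            # Warp the response back into the original frame and accumulate
            acc += cv2.warpPerspective(heat, np.linalg.inv(H), (w, h))
        return acc / num_homographies
    ```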

    Robust Intrinsic and Extrinsic Calibration of RGB-D Cameras

    Color-depth cameras (RGB-D cameras) have become the primary sensors in most robotics systems, from service robotics to industrial robotics applications. Typical consumer-grade RGB-D cameras are provided with a coarse intrinsic and extrinsic calibration that generally does not meet the accuracy requirements of many robotics applications (e.g., highly accurate 3D environment reconstruction and mapping, or high-precision object recognition and localization). In this paper, we propose a human-friendly, reliable, and accurate calibration framework that makes it easy to estimate both the intrinsic and extrinsic parameters of a general color-depth sensor pair. Our approach is based on a novel two-component error model. This model unifies the error sources of RGB-D pairs based on different technologies, such as structured-light 3D cameras and time-of-flight cameras. Our method provides some important advantages compared to other state-of-the-art systems: it is general (i.e., well suited for different types of sensors), based on an easy and stable calibration protocol, provides greater calibration accuracy, and has been implemented within the ROS robotics framework. We report detailed experimental validations and performance comparisons to support our claims.
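
    A minimal sketch of a two-component depth correction in the spirit of the abstract: a per-pixel (local) undistortion map composed with a global, depth-dependent bias correction. The parametrization below is an illustrative assumption, not the paper's exact model.

    ```python
    import numpy as np

    def correct_depth(depth, local_map, global_poly):
        # depth: (H, W) raw depth in meters
        # local_map: (H, W) per-pixel multiplicative undistortion factors
        # global_poly: polynomial coefficients for the systematic depth bias
        d_local = depth * local_map                # component 1: spatial undistortion
        return np.polyval(global_poly, d_local)    # component 2: global depth bias
    ```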

    Super-linear speedup for real-time condition monitoring using image processing and drones

    Real-time inspection of large-scale solar installations can take a long time to detect hazardous conditions caused by failures during the solar panels' normal operation, so early hazard detection is important. Reducing execution time and improving system performance are the ultimate goals of multiprocessing and multicore systems. The embedded system proposed here for solar panel inspection processes and analyzes real-time video from two camcorders, a thermal camera and a charge-coupled device (CCD) camera, mounted on a drone. The inspection method spends most of its time capturing and processing frames and detecting faulty panels, and the system can report the longitude and latitude of each defect in real time. In this work, we investigate parallel processing of the image processing operations, which reduces the processing time of the inspection system. Using the multiprocessing module in Python, we execute the fault detection algorithms on frames streamed from both video cameras. The experimental results show a super-linear speedup for thermal and CCD video processing: execution time is reduced by an average factor of 3.1 using 2 processes and 6.3 using 4 processes.
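
    A minimal sketch of the parallel scheme, using Python's multiprocessing module as the abstract states; detect_faults is a hypothetical stand-in for the per-frame fault detection step, and the toy frames below are not the paper's data.

    ```python
    from multiprocessing import Pool
    import time

    def detect_faults(frame):
        # Hypothetical per-frame analysis (e.g., thresholding a thermal image)
        return [px for px in frame if px > 200]

    def process_stream(frames, workers):
        t0 = time.perf_counter()
        with Pool(processes=workers) as pool:
            results = pool.map(detect_faults, frames)
        return results, time.perf_counter() - t0

    if __name__ == "__main__":
        frames = [list(range(256)) for _ in range(1000)]  # stand-in frames
        _, t1 = process_stream(frames, workers=1)
        _, t4 = process_stream(frames, workers=4)
        # speedup = T_serial / T_parallel; super-linear means speedup > workers
        print("speedup with 4 processes:", t1 / t4)
    ```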

    Face recognition in low resolution video sequences using super resolution

    Human activity analysis is a major concern in a wide variety of applications, such as video surveillance, human-computer interfaces, and face image database management. Detecting and recognizing faces is a crucial step in these applications. Furthermore, major advancements and initiatives in security applications in the past years have propelled face recognition technology into the spotlight. The performance of existing face recognition systems declines significantly if the resolution of the face image falls below a certain level. This is especially critical in surveillance imagery where often, for many reasons, only low-resolution video of faces is available. If these low-resolution images are passed to a face recognition system, the performance is usually unacceptable. Hence, resolution plays a key role in face recognition systems. In this thesis, we address this issue by using super-resolution techniques as a middle step, where multiple low-resolution face image frames are used to obtain a high-resolution face image for improved recognition rates. Two different techniques, one based on the frequency domain and one on the spatial domain, were utilized for super-resolution image enhancement. We apply super resolution to both images and video using these techniques, and we employ principal component analysis for face matching, which is both computationally efficient and accurate. The result is a system that can accurately recognize faces using multiple low-resolution images/frames.
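
    The principal component analysis (eigenfaces) matching step can be sketched as below: project aligned gallery faces into a PCA subspace and match a super-resolved probe by nearest neighbor. Component count and distance metric are illustrative choices, not the thesis's exact settings.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA

    def build_matcher(gallery, n_components=50):
        # gallery: (N, H*W) flattened, aligned face images
        pca = PCA(n_components=n_components).fit(gallery)
        gallery_proj = pca.transform(gallery)

        def match(probe):
            # probe: (H*W,) super-resolved face; returns index of best gallery match
            p = pca.transform(probe.reshape(1, -1))
            dists = np.linalg.norm(gallery_proj - p, axis=1)
            return int(np.argmin(dists))

        return match
    ```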

    Accuracy vs. Energy: An Assessment of Bee Object Inference in Videos From On-Hive Video Loggers With YOLOv3, YOLOv4-Tiny, and YOLOv7-Tiny

    A continuing trend in precision apiculture is to use computer vision methods to quantify characteristics of bee traffic in managed colonies at the hive's entrance. Since traffic at the hive's entrance is a contributing factor to the hive's productivity and health, we assessed the potential of three open-source convolutional network models, YOLOv3, YOLOv4-tiny, and YOLOv7-tiny, to quantify omnidirectional traffic in videos from on-hive video loggers on regular, unmodified one- and two-super Langstroth hives and compared their accuracies, energy efficacies, and operational energy footprints. We trained and tested the models with a 70/30 split on a dataset of 23,173 flying bees manually labeled in 5819 images from 10 randomly selected videos and manually evaluated the trained models on 3600 images from 120 randomly selected videos from different apiaries, years, and queen races. We designed a new energy efficacy metric as a ratio of performance units per energy unit required to make a model operational in a continuous hive monitoring data pipeline. In terms of accuracy, YOLOv3 was first, YOLOv7-tiny second, and YOLOv4-tiny third. All models underestimated the true amount of traffic due to false negatives. YOLOv3 was the only model with no false positives, but had the lowest energy efficacy and highest operational energy footprint in a deployed hive monitoring data pipeline. YOLOv7-tiny had the highest energy efficacy and the lowest operational energy footprint in the same pipeline. Consequently, YOLOv7-tiny is a model worth considering for training on larger bee datasets if a primary objective is the discovery of non-invasive computer vision models of traffic quantification with higher energy efficacies and lower operational energy footprints.
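
    The energy efficacy metric, defined in the abstract as performance units per energy unit needed to keep a model operational in the pipeline, reduces to a simple ratio. The specific performance measure and energy accounting below are illustrative assumptions.

    ```python
    def energy_efficacy(performance, energy):
        # performance: e.g., detection accuracy on the evaluation videos
        # energy: energy required to run the model in the monitoring pipeline
        return performance / energy

    # A smaller model with slightly lower accuracy but a much lower energy
    # footprint can still come out ahead on this ratio:
    print(energy_efficacy(0.80, 120.0) > energy_efficacy(0.85, 300.0))  # True
    ```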