Fast Face-swap Using Convolutional Neural Networks
We consider the problem of face swapping in images, where an input identity
is transformed into a target identity while preserving pose, facial expression,
and lighting. To perform this mapping, we use convolutional neural networks
trained to capture the appearance of the target identity from an unstructured
collection of his/her photographs. This approach is enabled by framing the face
swapping problem in terms of style transfer, where the goal is to render an
image in the style of another one. Building on recent advances in this area, we
devise a new loss function that enables the network to produce highly
photorealistic results. By combining neural networks with simple pre- and
post-processing steps, we aim at making face swap work in real-time with no
input from the user.
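The abstract does not spell out the new loss, so the following is only the classic Gram-matrix style-transfer objective (a content term plus a style term) that this line of work builds on. It is not the paper's loss. The sketch operates on precomputed CNN feature maps; the feature extractor, the layer choice, and the weights `alpha` and `beta` are assumptions.

```python
import numpy as np


def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Channel-by-channel correlations of a (C, H, W) feature map, normalized by H*W."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (h * w)


def content_loss(generated: np.ndarray, content: np.ndarray) -> float:
    """Mean squared feature difference: keeps the layout (pose, expression) of the input face."""
    return float(np.mean((generated - content) ** 2))


def style_loss(generated: np.ndarray, style: np.ndarray) -> float:
    """Mean squared Gram-matrix difference: matches appearance statistics of the target identity."""
    return float(np.mean((gram_matrix(generated) - gram_matrix(style)) ** 2))


def total_loss(generated, content, style, alpha=1.0, beta=1e3):
    """Weighted sum of both terms; alpha and beta are illustrative, untuned weights."""
    return alpha * content_loss(generated, content) + beta * style_loss(generated, style)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random arrays stand in for CNN activations of the generated, input, and target images
    generated, content, style = (rng.standard_normal((8, 16, 16)) for _ in range(3))
    print(total_loss(generated, content, style))
```

The content term compares activations position by position, while the Gram matrix discards spatial positions and keeps only which feature channels co-occur, which is why it captures "appearance" rather than layout.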
Multi-contrast imaging and digital refocusing on a mobile microscope with a domed LED array
We demonstrate the design and application of an add-on device for improving the diagnostic and research capabilities of CellScope, a low-cost, smartphone-based point-of-care microscope. We replace the single LED illumination of the original CellScope with a programmable domed LED array. By leveraging recent advances in computational illumination, this new device enables simultaneous multi-contrast imaging with brightfield, darkfield, and phase imaging modes. Further, we scan through illumination angles to capture lightfield datasets, which can be used to recover 3D intensity and phase images without any hardware changes. This digital refocusing procedure can be used for either 3D imaging or software-only focus correction, reducing the need for precise mechanical focusing during field experiments. All acquisition and processing are performed on the mobile phone and controlled through a smartphone application, making the computational microscope compact and portable. Using multiple samples and different objective magnifications, we demonstrate that the performance of our device is comparable to that of a commercial microscope. This unique device platform extends the field imaging capabilities of CellScope, opening up new clinical and research possibilities.
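Because images from mutually incoherent LEDs add linearly, the three contrast modes can be sketched by summing per-LED images over different LED subsets: brightfield from LEDs inside the objective's numerical aperture (NA), darkfield from LEDs outside it, and differential phase contrast (DPC) from the normalized difference between the left and right brightfield halves. This is the generic LED-array recipe, not the device's actual acquisition or processing code; the array geometry and all names below are assumptions.

```python
import numpy as np


def synthesize_contrasts(stack, led_na, led_x, objective_na):
    """Combine per-LED images into brightfield, darkfield, and DPC images.

    stack: (n_leds, H, W) array, one image per LED.
    led_na: (n_leds,) illumination NA of each LED.
    led_x: (n_leds,) signed horizontal LED position (negative = left half).
    objective_na: numerical aperture of the objective.
    """
    stack = np.asarray(stack, dtype=float)
    led_na = np.asarray(led_na, dtype=float)
    led_x = np.asarray(led_x, dtype=float)
    inside = led_na <= objective_na          # these LEDs send direct light into the objective
    brightfield = stack[inside].sum(axis=0)
    darkfield = stack[~inside].sum(axis=0)   # only scattered light reaches the sensor
    left = stack[inside & (led_x < 0)].sum(axis=0)
    right = stack[inside & (led_x > 0)].sum(axis=0)
    # Differential phase contrast: normalized left/right asymmetry of brightfield illumination
    dpc = (left - right) / np.maximum(left + right, 1e-12)
    return brightfield, darkfield, dpc
```

Summing per-LED frames is equivalent to lighting those LEDs together, so a real device can capture each mode in a single exposure instead of one frame per LED.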
Owl and Lizard: Patterns of Head Pose and Eye Pose in Driver Gaze Classification
Accurate, robust, inexpensive gaze tracking in the car can help keep a driver
safe by facilitating the more effective study of how to improve (1) vehicle
interfaces and (2) the design of future Advanced Driver Assistance Systems. In
this paper, we estimate head pose and eye pose from monocular video using
methods developed extensively in prior work and ask two new interesting
questions. First, how much better can we classify driver gaze using head and
eye pose versus just using head pose? Second, are there individual-specific
gaze strategies that strongly correlate with how much gaze classification
improves with the addition of eye pose information? We answer these questions
by evaluating data drawn from an on-road study of 40 drivers. The main insight
of the paper is conveyed through the analogy of an "owl" and "lizard" which
describes the degree to which the eyes and the head move when shifting gaze.
When the head moves a lot ("owl"), not much classification improvement is
attained by estimating eye pose on top of head pose. On the other hand, when
the head stays still and only the eyes move ("lizard"), classification accuracy
increases significantly from adding in eye pose. We characterize how that
accuracy varies between people, gaze strategies, and gaze regions.
Comment: Accepted for publication in IET Computer Vision. arXiv admin note:
text overlap with arXiv:1507.0476
SuperPoint: Self-Supervised Interest Point Detection and Description
This paper presents a self-supervised framework for training interest point
detectors and descriptors suitable for a large number of multiple-view geometry
problems in computer vision. As opposed to patch-based neural networks, our
fully-convolutional model operates on full-sized images and jointly computes
pixel-level interest point locations and associated descriptors in one forward
pass. We introduce Homographic Adaptation, a multi-scale, multi-homography
approach for boosting interest point detection repeatability and performing
cross-domain adaptation (e.g., synthetic-to-real). Our model, when trained on
the MS-COCO generic image dataset using Homographic Adaptation, is able to
repeatedly detect a much richer set of interest points than the initial
pre-adapted deep model and any other traditional corner detector. The final
system gives rise to state-of-the-art homography estimation results on HPatches
when compared to LIFT, SIFT and ORB.
Comment: Camera-ready version for CVPR 2018 Deep Learning for Visual SLAM
Workshop (DL4VSLAM2018)
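Homographic Adaptation can be illustrated as averaging detector responses over several random homographic warps of the same image, with each response mapped back to the original frame before averaging. The sketch below is a simplified NumPy version under stated assumptions: the homography sampler, nearest-neighbour warping, and `toy_detector` (a gradient-magnitude stand-in for the trained network) are all hypothetical and are not the paper's implementation.

```python
import numpy as np


def warp_image(img, H):
    """Warp a 2D array by homography H with nearest-neighbour inverse mapping.

    Returns the warped array and a mask of output pixels whose source lies inside img.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    out_pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # homogeneous (3, N)
    src = np.linalg.inv(H) @ out_pts                                 # source of each output pixel
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros(h * w)
    out[valid] = img[sy[valid], sx[valid]]
    return out.reshape(h, w), valid.reshape(h, w)


def random_homography(rng, max_angle=0.2, max_shift=3.0, max_persp=1e-3):
    """Small random rotation, scale, translation, and perspective around the identity."""
    a = rng.uniform(-max_angle, max_angle)
    s = rng.uniform(0.9, 1.1)
    tx, ty = rng.uniform(-max_shift, max_shift, size=2)
    px, py = rng.uniform(-max_persp, max_persp, size=2)
    return np.array([[s * np.cos(a), -s * np.sin(a), tx],
                     [s * np.sin(a), s * np.cos(a), ty],
                     [px, py, 1.0]])


def toy_detector(img):
    """Hypothetical stand-in for the learned detector: squared gradient magnitude."""
    gy, gx = np.gradient(img.astype(float))
    return gx ** 2 + gy ** 2


def homographic_adaptation(img, detector, n_homographies=16, seed=0):
    """Average detector heatmaps over random warps, mapped back to the original frame."""
    rng = np.random.default_rng(seed)
    total = detector(img).astype(float)   # the unwarped image counts as one view
    count = np.ones_like(total)
    for _ in range(n_homographies):
        H = random_homography(rng)
        warped, _ = warp_image(img, H)
        heat = detector(warped)
        back, seen = warp_image(heat, np.linalg.inv(H))  # resample heatmap in original frame
        total += np.where(seen, back, 0.0)
        count += seen                                    # only pixels visible in this view
    return total / count
```

Points that the detector fires on consistently across many warps keep a high averaged score, while responses that appear only under particular viewpoints are diluted, which is the repeatability effect the abstract describes.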
Robust Intrinsic and Extrinsic Calibration of RGB-D Cameras
Color-depth cameras (RGB-D cameras) have become the primary sensors in most
robotics systems, from service robotics to industrial robotics applications.
Typical consumer-grade RGB-D cameras are provided with a coarse intrinsic and
extrinsic calibration that generally does not meet the accuracy requirements
needed by many robotics applications (e.g., highly accurate 3D environment
reconstruction and mapping, or high-precision object recognition and localization).
In this paper, we propose a human-friendly, reliable, and accurate
calibration framework that makes it easy to estimate both the intrinsic and
extrinsic parameters of a general color-depth sensor pair. Our approach is
based on a novel two-component error model. This model unifies the error
sources of RGB-D pairs based on different technologies, such as
structured-light 3D cameras and time-of-flight cameras. Our method provides
several important advantages over other state-of-the-art systems: it is
general (i.e., well suited to different types of sensors), is based on an easy
and stable calibration protocol, provides greater calibration accuracy, and
has been implemented within the ROS robotics framework. We report detailed
experimental validations and performance comparisons to support our claims.
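The two-component error model is not described here in enough detail to reproduce, but any extrinsic RGB-D calibration ultimately estimates a rigid transform between the depth and color frames. As a generic illustration, and not the paper's method, the sketch below recovers that transform from corresponding 3D points with the Kabsch (SVD) least-squares solution; how the correspondences are obtained (for example, checkerboard corners seen by both sensors) is an assumption.

```python
import numpy as np


def rigid_transform(src, dst):
    """Least-squares rotation R and translation t such that dst ≈ src @ R.T + t.

    src, dst: (N, 3) corresponding 3D points expressed in the depth and color frames.
    Uses the Kabsch/SVD method.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    cov = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance of centered points
    U, _, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # flip the last axis if SVD yields a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

This closed form only handles the rigid extrinsics; intrinsic parameters and systematic depth errors, which the paper's error model targets, would have to be corrected before the points are paired.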
Super-linear speedup for real-time condition monitoring using image processing and drones
Real-time inspection of large-scale solar systems can take a long time to identify hazardous situations caused by failures during the solar panels' normal operation, so early hazard detection is important. Reducing execution time and improving system performance are the ultimate goals of multiprocessing and multicore systems. The proposed embedded system for solar panel inspection performs real-time processing and analysis of video from two camcorders, a thermal camera and a charge-coupled device (CCD) camera, mounted on a drone. The inspection method spends considerable time capturing and processing frames and detecting faulty panels. The system can determine the latitude and longitude of each defect in real time. In this work, we investigate parallelizing the image processing operations to reduce the processing time of the inspection system. Using the multiprocessing module in Python, we execute fault detection algorithms on streamed frames from both video cameras. The experimental results show a super-linear speedup for real-time condition monitoring of both thermal and CCD video: the execution time is reduced by an average of 3.1 times with 2 processes and 6.3 times with 4 processes.
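A minimal sketch of the parallelization pattern, assuming frames have already been decoded into NumPy arrays: a per-frame detector is mapped over the frames with `multiprocessing.Pool`, and speedup is the serial time divided by the parallel time. A speedup is super-linear when it exceeds the process count, as with 3.1 on 2 processes and 6.3 on 4. The `detect_hot_spots` threshold rule is a hypothetical placeholder, not the paper's fault detection algorithm.

```python
import time
from multiprocessing import Pool

import numpy as np


def detect_hot_spots(frame):
    """Hypothetical fault detector: fraction of pixels above a fixed hot-spot threshold."""
    return float((frame > 0.9).mean())


def process_frames(frames, processes):
    """Run the detector over all frames and return (results, elapsed seconds)."""
    start = time.perf_counter()
    if processes == 1:
        results = [detect_hot_spots(f) for f in frames]
    else:
        with Pool(processes) as pool:
            results = pool.map(detect_hot_spots, frames)  # order of results matches frames
    return results, time.perf_counter() - start


def speedup(serial_seconds, parallel_seconds):
    return serial_seconds / parallel_seconds


def is_super_linear(speedup_value, processes):
    """Super-linear: the speedup exceeds the number of processes used."""
    return speedup_value > processes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = [rng.random((240, 320)) for _ in range(16)]
    serial, t1 = process_frames(frames, 1)
    parallel, t4 = process_frames(frames, 4)
    assert serial == parallel
    s = speedup(t1, t4)
    print(f"4 processes: {s:.2f}x speedup, super-linear: {is_super_linear(s, 4)}")
```

On a toy workload like this, pool start-up and pickling overhead can outweigh the gains, so the measured speedup depends heavily on frame size and detector cost.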
Face recognition in low resolution video sequences using super resolution
Human activity is a major concern in a wide variety of applications, such as video surveillance, human-computer interfaces, and face image database management. Detecting and recognizing faces is a crucial step in these applications. Furthermore, major advancements and initiatives in security applications in recent years have propelled face recognition technology into the spotlight. The performance of existing face recognition systems declines significantly if the resolution of the face image falls below a certain level. This is especially critical in surveillance imagery, where, for many reasons, often only low-resolution video of faces is available. If these low-resolution images are passed to a face recognition system, the performance is usually unacceptable. Hence, resolution plays a key role in face recognition systems. In this thesis, we address this issue by using super-resolution techniques as an intermediate step, in which multiple low-resolution face image frames are combined to obtain a high-resolution face image for improved recognition rates. Two different super-resolution techniques, based on the frequency and spatial domains, were used for image enhancement. We apply super-resolution to both images and video using these techniques, and we employ principal component analysis (PCA) for face matching, which is both computationally efficient and accurate. The result is a system that can accurately recognize faces using multiple low-resolution images or frames.
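A minimal sketch of PCA-based face matching (the eigenface approach), assuming faces have already been super-resolved, aligned, and flattened to equal-length vectors. The function names and the nearest-neighbour rule in Euclidean PCA space are illustrative assumptions rather than the thesis's exact pipeline.

```python
import numpy as np


def fit_pca(gallery, n_components):
    """Fit a PCA ("eigenface") basis to flattened gallery faces of shape (n_faces, n_pixels)."""
    mean = gallery.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance
    _, _, vt = np.linalg.svd(gallery - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(faces, mean, basis):
    """Low-dimensional PCA coefficients for each flattened face."""
    return (faces - mean) @ basis.T


def match(probe, gallery_codes, mean, basis):
    """Index of the gallery face nearest to the probe in PCA space."""
    code = project(probe[None, :], mean, basis)
    return int(np.argmin(np.linalg.norm(gallery_codes - code, axis=1)))
```

Matching in a few PCA coefficients instead of raw pixels is what keeps the comparison cheap; the gallery codes are computed once with `project(gallery, mean, basis)` and reused for every probe.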
Accuracy vs. Energy: An Assessment of Bee Object Inference in Videos From On-Hive Video Loggers With YOLOv3, YOLOv4-Tiny, and YOLOv7-Tiny
A continuing trend in precision apiculture is to use computer vision methods to quantify characteristics of bee traffic in managed colonies at the hive's entrance. Since traffic at the hive's entrance is a contributing factor to the hive's productivity and health, we assessed the potential of three open-source convolutional network models, YOLOv3, YOLOv4-tiny, and YOLOv7-tiny, to quantify omnidirectional traffic in videos from on-hive video loggers on regular, unmodified one- and two-super Langstroth hives, and compared their accuracies, energy efficacies, and operational energy footprints. We trained and tested the models with a 70/30 split on a dataset of 23,173 flying bees manually labeled in 5819 images from 10 randomly selected videos, and manually evaluated the trained models on 3600 images from 120 randomly selected videos from different apiaries, years, and queen races. We designed a new energy efficacy metric as a ratio of performance units per energy unit required to make a model operational in a continuous hive monitoring data pipeline. In terms of accuracy, YOLOv3 ranked first, YOLOv7-tiny second, and YOLOv4-tiny third. All models underestimated the true amount of traffic due to false negatives. YOLOv3 was the only model with no false positives, but it had the lowest energy efficacy and the highest operational energy footprint in a deployed hive monitoring data pipeline. YOLOv7-tiny had the highest energy efficacy and the lowest operational energy footprint in the same pipeline. Consequently, YOLOv7-tiny is a model worth considering for training on larger bee datasets if a primary objective is the discovery of non-invasive computer vision models of traffic quantification with higher energy efficacies and lower operational energy footprints.
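The energy efficacy metric is described only as a ratio of performance units per energy unit, so the sketch below leaves both units abstract. The function names, and the use of true-positive, false-positive, and false-negative counts as performance inputs, are assumptions. It also shows why a model with no false positives but some false negatives undercounts traffic.

```python
def precision_recall(tp, fp, fn):
    """Detection precision and recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def count_bias(tp, fp, fn):
    """Predicted minus true bee count; negative means traffic is underestimated."""
    return (tp + fp) - (tp + fn)


def energy_efficacy(performance, energy):
    """Performance units per energy unit needed to run the model in the pipeline."""
    if energy <= 0:
        raise ValueError("energy must be positive")
    return performance / energy
```

For example, a detector with 90 true positives, 0 false positives, and 10 false negatives has perfect precision yet reports 10 fewer bees than are actually present, matching the pattern reported for YOLOv3.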