21,225 research outputs found
Context-aware person identification in personal photo collections
Identifying the people in photos is an important need for users of photo management systems. We present MediAssist, one such system which facilitates browsing, searching and semi-automatic annotation of personal photos, using analysis of both image content and the context in which the photo is captured. This semi-automatic annotation includes annotation of the identity of people in photos. In this paper, we focus on such person annotation, and propose person identification techniques based on a combination of context and content. We propose language modelling and nearest neighbor approaches to context-based person identification, in addition to novel face color and image color content-based features (used alongside face recognition and body patch features). We conduct a comprehensive empirical study of these techniques using the real private photo collections of a number of users, and show that combining context- and content-based analysis improves performance over content or context alone
Joint Material and Illumination Estimation from Photo Sets in the Wild
Faithful manipulation of shape, material, and illumination in 2D Internet
images would greatly benefit from a reliable factorization of appearance into
material (i.e., diffuse and specular) and illumination (i.e., environment
maps). On the one hand, current methods that produce very high fidelity
results, typically require controlled settings, expensive devices, or
significant manual effort. To the other hand, methods that are automatic and
work on 'in the wild' Internet images, often extract only low-frequency
lighting or diffuse materials. In this work, we propose to make use of a set of
photographs in order to jointly estimate the non-diffuse materials and sharp
lighting in an uncontrolled setting. Our key observation is that seeing
multiple instances of the same material under different illumination (i.e.,
environment), and different materials under the same illumination provide
valuable constraints that can be exploited to yield a high-quality solution
(i.e., specular materials and environment illumination) for all the observed
materials and environments. Similar constraints also arise when observing
multiple materials in a single environment, or a single material across
multiple environments. The core of this approach is an optimization procedure
that uses two neural networks that are trained on synthetic images to predict
good gradients in parametric space given observation of reflected light. We
evaluate our method on a range of synthetic and real examples to generate
high-quality estimates, qualitatively compare our results against
state-of-the-art alternatives via a user study, and demonstrate
photo-consistent image manipulation that is otherwise very challenging to
achieve
Stylizing Face Images via Multiple Exemplars
We address the problem of transferring the style of a headshot photo to face
images. Existing methods using a single exemplar lead to inaccurate results
when the exemplar does not contain sufficient stylized facial components for a
given photo. In this work, we propose an algorithm to stylize face images using
multiple exemplars containing different subjects in the same style. Patch
correspondences between an input photo and multiple exemplars are established
using a Markov Random Field (MRF), which enables accurate local energy transfer
via Laplacian stacks. As image patches from multiple exemplars are used, the
boundaries of facial components on the target image are inevitably
inconsistent. The artifacts are removed by a post-processing step using an
edge-preserving filter. Experimental results show that the proposed algorithm
consistently produces visually pleasing results.Comment: In CVIU 2017. Project Page:
http://www.cs.cityu.edu.hk/~yibisong/cviu17/index.htm
Visual Landmark Recognition from Internet Photo Collections: A Large-Scale Evaluation
The task of a visual landmark recognition system is to identify photographed
buildings or objects in query photos and to provide the user with relevant
information on them. With their increasing coverage of the world's landmark
buildings and objects, Internet photo collections are now being used as a
source for building such systems in a fully automatic fashion. This process
typically consists of three steps: clustering large amounts of images by the
objects they depict; determining object names from user-provided tags; and
building a robust, compact, and efficient recognition index. To this date,
however, there is little empirical information on how well current approaches
for those steps perform in a large-scale open-set mining and recognition task.
Furthermore, there is little empirical information on how recognition
performance varies for different types of landmark objects and where there is
still potential for improvement. With this paper, we intend to fill these gaps.
Using a dataset of 500k images from Paris, we analyze each component of the
landmark recognition pipeline in order to answer the following questions: How
many and what kinds of objects can be discovered automatically? How can we best
use the resulting image clusters to recognize the object in a query? How can
the object be efficiently represented in memory for recognition? How reliably
can semantic information be extracted? And finally: What are the limiting
factors in the resulting pipeline from query to semantics? We evaluate how
different choices of methods and parameters for the individual pipeline steps
affect overall system performance and examine their effects for different query
categories such as buildings, paintings or sculptures
PlaceRaider: Virtual Theft in Physical Spaces with Smartphones
As smartphones become more pervasive, they are increasingly targeted by
malware. At the same time, each new generation of smartphone features
increasingly powerful onboard sensor suites. A new strain of sensor malware has
been developing that leverages these sensors to steal information from the
physical environment (e.g., researchers have recently demonstrated how malware
can listen for spoken credit card numbers through the microphone, or feel
keystroke vibrations using the accelerometer). Yet the possibilities of what
malware can see through a camera have been understudied. This paper introduces
a novel visual malware called PlaceRaider, which allows remote attackers to
engage in remote reconnaissance and what we call virtual theft. Through
completely opportunistic use of the camera on the phone and other sensors,
PlaceRaider constructs rich, three dimensional models of indoor environments.
Remote burglars can thus download the physical space, study the environment
carefully, and steal virtual objects from the environment (such as financial
documents, information on computer monitors, and personally identifiable
information). Through two human subject studies we demonstrate the
effectiveness of using mobile devices as powerful surveillance and virtual
theft platforms, and we suggest several possible defenses against visual
malware
Video Registration in Egocentric Vision under Day and Night Illumination Changes
With the spread of wearable devices and head mounted cameras, a wide range of
application requiring precise user localization is now possible. In this paper
we propose to treat the problem of obtaining the user position with respect to
a known environment as a video registration problem. Video registration, i.e.
the task of aligning an input video sequence to a pre-built 3D model, relies on
a matching process of local keypoints extracted on the query sequence to a 3D
point cloud. The overall registration performance is strictly tied to the
actual quality of this 2D-3D matching, and can degrade if environmental
conditions such as steep changes in lighting like the ones between day and
night occur. To effectively register an egocentric video sequence under these
conditions, we propose to tackle the source of the problem: the matching
process. To overcome the shortcomings of standard matching techniques, we
introduce a novel embedding space that allows us to obtain robust matches by
jointly taking into account local descriptors, their spatial arrangement and
their temporal robustness. The proposal is evaluated using unconstrained
egocentric video sequences both in terms of matching quality and resulting
registration performance using different 3D models of historical landmarks. The
results show that the proposed method can outperform state of the art
registration algorithms, in particular when dealing with the challenges of
night and day sequences
- …