Semantic Visual Localization
Robust visual localization under a wide range of viewing conditions is a
fundamental problem in computer vision. Handling the difficult cases of this
problem is not only very challenging but also of high practical relevance,
e.g., in the context of life-long localization for augmented reality or
autonomous robots. In this paper, we propose a novel approach based on a joint
3D geometric and semantic understanding of the world, enabling it to succeed
under conditions where previous approaches failed. Our method leverages a novel
generative model for descriptor learning, trained on semantic scene completion
as an auxiliary task. The resulting 3D descriptors are robust to missing
observations by encoding high-level 3D geometric and semantic information.
Experiments on several challenging large-scale localization datasets
demonstrate reliable localization under extreme viewpoint, illumination, and
geometry changes.
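The abstract's learned 3D descriptors ultimately feed a retrieval step: find the map descriptor most similar to the query. A minimal sketch of that lookup, assuming cosine similarity as the matching metric (the paper's actual descriptor model and metric are not reproduced here):

```python
import math

def cosine(a, b):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def localize(query_desc, map_descs):
    """Return the index of the map descriptor most similar to the query."""
    scores = [cosine(query_desc, d) for d in map_descs]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy map of three 3-D descriptors; the query lies closest to the second.
map_descs = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.1), (0.0, 0.0, 1.0)]
print(localize((0.05, 0.9, 0.1), map_descs))  # -> 1
```

In practice the map would hold thousands of descriptors and use approximate nearest-neighbour search, but the matching principle is the same.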
Self-Configuring and Evolving Fuzzy Image Thresholding
Every segmentation algorithm has parameters that need to be adjusted in order
to achieve good results. Evolving fuzzy systems for adjustment of segmentation
parameters have been proposed recently (Evolving fuzzy image segmentation --
EFIS [1]). However, similar to any other algorithm, EFIS too suffers from a few
limitations when used in practice. As a major drawback, EFIS depends on
detection of the object of interest for feature calculation, a task that is
highly application-dependent. In this paper, a new version of EFIS is proposed
to overcome these limitations. The new EFIS, called self-configuring EFIS
(SC-EFIS), uses available training data to auto-configure the parameters that
are fixed in EFIS. In addition, the proposed SC-EFIS relies on a feature
selection process that does not require the detection of a region of interest
(ROI).
Comment: To appear in proceedings of The 14th International Conference on
Machine Learning and Applications (IEEE ICMLA 2015), Miami, Florida, USA,
201
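The core idea of SC-EFIS, choosing from training data the parameters that plain EFIS leaves fixed, can be illustrated with a generic grid search. This is a hedged sketch, not the paper's algorithm: `auto_configure`, `threshold_error`, and the toy 1-D "images" are all invented here for illustration.

```python
def auto_configure(candidates, training_pairs, error_fn):
    """Pick the parameter value with the lowest mean error on training data.

    candidates: values for a parameter that is fixed in plain EFIS.
    training_pairs: (image, ground_truth) tuples.
    error_fn(image, truth, param) -> error in [0, 1].
    """
    def mean_error(p):
        return sum(error_fn(img, gt, p) for img, gt in training_pairs) / len(training_pairs)
    return min(candidates, key=mean_error)

# Toy example: "segment" a 1-D image by thresholding; the error is the
# fraction of mislabelled pixels against the ground-truth mask.
def threshold_error(image, truth, t):
    pred = [1 if v > t else 0 for v in image]
    return sum(p != g for p, g in zip(pred, truth)) / len(truth)

train = [([10, 200, 30, 220], [0, 1, 0, 1]),
         ([5, 180, 250, 40], [0, 1, 1, 0])]
print(auto_configure([50, 100, 150, 230], train, threshold_error))  # -> 50
```

The same selection loop applies to any fixed parameter once an error measure over training data is available.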
Generic 3D Representation via Pose Estimation and Matching
Though a large body of computer vision research has investigated developing
generic semantic representations, efforts towards developing a similar
representation for 3D have been limited. In this paper, we learn a generic 3D
representation through solving a set of foundational proxy 3D tasks:
object-centric camera pose estimation and wide baseline feature matching. Our
method is based upon the premise that by providing supervision over a set of
carefully selected foundational tasks, generalization to novel tasks and
abstraction capabilities can be achieved. We empirically show that the internal
representation of a multi-task ConvNet trained to solve the above core problems
generalizes to novel 3D tasks (e.g., scene layout estimation, object pose
estimation, surface normal estimation) without the need for fine-tuning and
shows traits of abstraction abilities (e.g., cross-modality pose estimation).
In the context of the core supervised tasks, we demonstrate our representation
achieves state-of-the-art wide baseline feature matching results without
requiring a priori rectification (unlike SIFT and the majority of learned
features). We also show 6DOF camera pose estimation given a pair of local
image patches. On both supervised tasks, our accuracy is comparable to that of
humans.
Finally, we contribute a large-scale dataset composed of object-centric street
view scenes along with point correspondences and camera pose information, and
conclude with a discussion on the learned representation and open research
questions.
Comment: Published in ECCV16. See the project website
http://3drepresentation.stanford.edu/ and dataset website
https://github.com/amir32002/3D_Street_Vie
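The wide-baseline matching the abstract compares against is classically done with descriptor nearest neighbours plus Lowe's ratio test (the SIFT-style baseline mentioned above). A minimal sketch under that assumption, with tiny 2-D toy descriptors standing in for real feature vectors:

```python
import math

def ratio_match(desc_a, desc_b, ratio=0.8):
    """Match descriptors from image A to image B using Lowe's ratio test:
    accept a match only if the nearest neighbour is clearly closer than
    the second nearest, rejecting ambiguous correspondences."""
    matches = []
    for i, a in enumerate(desc_a):
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(desc_b))
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches

# Toy descriptors: both points in A have a clear counterpart in B.
a = [(0.0, 0.0), (5.0, 5.0)]
b = [(0.1, 0.0), (5.0, 5.1), (9.0, 9.0)]
print(ratio_match(a, b))  # -> [(0, 0), (1, 1)]
```

The paper's learned representation replaces the hand-crafted descriptors, but the matching interface is the same: vectors in, index pairs out.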
Counterfeit Detection with Multispectral Imaging
Multispectral imaging is becoming more practical for a variety of applications due to its ability to provide highly specific information through non-destructive analysis. Multispectral imaging cameras can detect light reflectance from different spectral bands of visible and non-visible wavelengths. Based on the differing amounts of band reflectance, information can be deduced about the subject. Counterfeit detection applications of multispectral imaging will be decomposed and analyzed in this thesis. Relations between light reflectance and objects' features will be addressed. The process of the analysis will be broken down to show how this information can be used to provide more insight into the object. This technology provides desired and viable information that can greatly improve multiple fields. In this thesis, the multispectral imaging research process for element solution concentrations and the counterfeit detection applications of multispectral imaging will be discussed. BaySpec's OCI-M Ultra Compact Multispectral Imager is used for data collection. This camera is capable of capturing light reflectance from wavelengths of 400-1000 nm. Further research opportunities of developing self-automated unmanned aerial vehicles for precision agriculture and extending counterfeit detection applications will also be explored.
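One standard way to compare band reflectance signatures, not necessarily the method used in this thesis, is the spectral angle mapper: treat each pixel's per-band reflectance as a vector and flag samples whose angle to a genuine reference exceeds a tolerance. A sketch with made-up four-band spectra:

```python
import math

def spectral_angle(ref, sample):
    """Spectral angle (radians) between two reflectance spectra.
    A smaller angle means a more similar material signature."""
    dot = sum(r * s for r, s in zip(ref, sample))
    norm = math.sqrt(sum(r * r for r in ref)) * math.sqrt(sum(s * s for s in sample))
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def is_counterfeit(ref, sample, tol=0.1):
    """Flag a sample whose spectrum deviates from the genuine reference."""
    return spectral_angle(ref, sample) > tol

genuine = [0.2, 0.5, 0.8, 0.6]   # hypothetical reference reflectance per band
fake    = [0.6, 0.2, 0.3, 0.7]   # a different pigment signature
print(is_counterfeit(genuine, [0.21, 0.5, 0.79, 0.6]))  # -> False
print(is_counterfeit(genuine, fake))                     # -> True
```

A real pipeline would average spectra over a region and calibrate `tol` from genuine samples; here the threshold is illustrative only.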
Keyframe detection in visual lifelogs
The SenseCam is a wearable camera that passively captures images. Therefore, it requires no conscious effort by a user to take a photo. A Visual Diary from such a source could prove to be a valuable tool in assisting the elderly or individuals with neurodegenerative diseases or other traumas. One issue with Visual Lifelogs is the large volume of image data generated. In previous work we split a day's worth of images into more manageable segments, i.e. into distinct events or activities. However, each event could still consist of 80-100 images. Thus, in this paper we propose a novel approach to selecting the key images within an event using a combination of MPEG-7 and Scale Invariant Feature Transform (SIFT) features.
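A simple baseline for picking a key image within an event, offered here as an assumed illustration rather than the paper's method, is to choose the frame whose feature vector lies closest to the event's centroid in feature space:

```python
import math

def select_keyframe(feature_vectors):
    """Pick the image whose feature vector is closest to the event centroid,
    i.e. the most representative frame of the event."""
    n = len(feature_vectors)
    dim = len(feature_vectors[0])
    centroid = [sum(v[d] for v in feature_vectors) / n for d in range(dim)]
    return min(range(n), key=lambda i: math.dist(feature_vectors[i], centroid))

# Toy event of four frames described by 2-D feature vectors; frames 1 and 2
# are near-duplicates, frame 3 is an outlier.
event = [(0.0, 0.0), (1.0, 1.0), (1.1, 0.9), (4.0, 4.0)]
print(select_keyframe(event))  # -> 1
```

With MPEG-7 or SIFT-derived vectors in place of these toy pairs, the same centroid rule gives a cheap representativeness criterion.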
Feature-Guided Black-Box Safety Testing of Deep Neural Networks
Despite the improved accuracy of deep neural networks, the discovery of
adversarial examples has raised serious safety concerns. Most existing
approaches for crafting adversarial examples necessitate some knowledge
(architecture, parameters, etc.) of the network at hand. In this paper, we
focus on image classifiers and propose a feature-guided black-box approach to
test the safety of deep neural networks that requires no such knowledge. Our
algorithm employs object detection techniques such as SIFT (Scale Invariant
Feature Transform) to extract features from an image. These features are
converted into a mutable saliency distribution, where high probability is
assigned to pixels that affect the composition of the image with respect to the
human visual system. We formulate the crafting of adversarial examples as a
two-player turn-based stochastic game, where the first player's objective is to
minimise the distance to an adversarial example by manipulating the features,
and the second player can be cooperative, adversarial, or random. We show that,
theoretically, the two-player game can converge to the optimal strategy, and
that the optimal strategy represents a globally minimal adversarial image. For
Lipschitz networks, we also identify conditions that provide safety guarantees
that no adversarial examples exist. Using Monte Carlo tree search we gradually
explore the game state space to search for adversarial examples. Our
experiments show that, despite the black-box setting, manipulations guided by a
perception-based saliency distribution are competitive with state-of-the-art
methods that rely on white-box saliency matrices or sophisticated optimization
procedures. Finally, we show how our method can be used to evaluate robustness
of neural networks in safety-critical applications such as traffic sign
recognition in self-driving cars.
Comment: 35 pages, 5 tables, 23 figure
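The saliency step described above, converting feature responses into a pixel distribution to guide manipulations, can be sketched as follows. The keypoint tuples and the softmax weighting are assumptions for illustration; the paper's full pipeline (SIFT extraction, the two-player game, Monte Carlo tree search) is not reproduced.

```python
import math, random

def saliency_distribution(keypoints):
    """Turn keypoint responses into a probability distribution over pixels
    via a softmax over response strength."""
    zmax = max(r for _, _, r in keypoints)          # stabilise the exponent
    exps = [math.exp(r - zmax) for _, _, r in keypoints]
    total = sum(exps)
    return [((x, y), e / total) for (x, y, _), e in zip(keypoints, exps)]

def sample_pixel(dist, rng):
    """Sample a pixel to manipulate, proportional to its saliency."""
    coords, probs = zip(*dist)
    return rng.choices(coords, weights=probs, k=1)[0]

# Hypothetical keypoints as (x, y, response); stronger responses are
# sampled more often, focusing manipulations on salient regions.
kps = [(3, 4, 0.9), (10, 2, 0.1), (7, 7, 0.5)]
dist = saliency_distribution(kps)
rng = random.Random(0)
print(sample_pixel(dist, rng))
```

In the game formulation, the first player would repeatedly draw pixels from this distribution and perturb them while searching for an adversarial example.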