48 research outputs found

End-to-End Localization and Ranking for Relative Attributes

Author: A Shrivastava
CL Zitnick
J. R. R. Uijlings
M Rastegari
MH Kiapour
N Kumar
S Branson
S Li
Publication date: 08/08/2016

We propose an end-to-end deep convolutional network to simultaneously localize and rank relative visual attributes, given only weakly-supervised pairwise image comparisons. Unlike previous methods, our network jointly learns the attribute's features, localization, and ranker. The localization module of our network discovers the most informative image region for the attribute, which is then used by the ranking module to learn a ranking model of the attribute. Our end-to-end framework also significantly speeds up processing and is much faster than previous methods. We show state-of-the-art ranking results on various relative attribute datasets, and our qualitative localization results clearly demonstrate our network's ability to learn meaningful image patches. Comment: Appears in European Conference on Computer Vision (ECCV), 201
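To make the idea of learning a ranker from pairwise comparisons concrete, the following is a minimal sketch of a logistic (RankNet-style) pairwise loss. The abstract does not state which loss the network uses, so the loss form and the name `pairwise_rank_loss` are assumptions chosen purely for illustration.

```python
import math


def pairwise_rank_loss(score_a: float, score_b: float, a_stronger: bool) -> float:
    """Logistic loss for one image pair (hypothetical helper, not the paper's code).

    score_a, score_b: attribute scores a ranker assigned to the two images.
    a_stronger: True if the annotation says image A shows more of the attribute.
    """
    # Signed margin: positive when the ranker agrees with the annotation.
    diff = score_a - score_b if a_stronger else score_b - score_a
    # log(1 + exp(-diff)), rearranged to avoid overflow for large |diff|.
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# The loss is small when the ordering is respected and large when it is violated.
agree = pairwise_rank_loss(2.0, 0.0, a_stronger=True)
disagree = pairwise_rank_loss(0.0, 2.0, a_stronger=True)
print(agree < disagree)
```

Only the relative order of the two scores matters here, which is why comparisons between image pairs are enough supervision for this kind of loss.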

arXiv.org e-Print Archive

Crossref

A Diagram Is Worth A Dozen Images

Author: B Alexe
CL Zitnick
F Pedregosa
J von Engelhardt
JRR Uijlings
M Twyman
R Horn
R Koncel-Kedziorski
RK Srihari
RW Ferguson
S Antol
S Hochreiter
SC Zhu
SK Card
Publication date: 23/03/2016

Diagrams are common tools for representing complex concepts, relationships and events, often when it would be difficult to portray the same information with natural images. Understanding natural images has been extensively studied in computer vision, while diagram understanding has received little attention. In this paper, we study the problem of diagram interpretation and reasoning, the challenging task of identifying the structure of a diagram and the semantics of its constituents and their relationships. We introduce Diagram Parse Graphs (DPG) as our representation to model the structure of diagrams. We define syntactic parsing of diagrams as learning to infer DPGs for diagrams and study semantic interpretation and reasoning of diagrams in the context of diagram question answering. We devise an LSTM-based method for syntactic parsing of diagrams and introduce a DPG-based attention model for diagram question answering. We compile a new dataset of diagrams with exhaustive annotations of constituents and relationships for over 5,000 diagrams and 15,000 questions and answers. Our results show the significance of our models for syntactic parsing and question answering in diagrams using DPGs.

arXiv.org e-Print Archive

Crossref

Multi-layer Architecture For Storing Visual Data Based on WCF and Microsoft SQL Server Database

Author: A. Biniaz
C.L. Zitnick
D.G. Lowe
H. Bay
J. Śmietański
J.L. Chu
K. Łapa
M. Bazarganigilani
M. Chen
M. Chromiak
M. Zalasiński
M.R. Ogiela
P. Drozda
R. Grycuk
R. Hirschheim
R.C. Veltkamp
S. Makinana
S. Mallik
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

In this paper we present a novel architecture for storing visual data. Effective storing, browsing and searching of image collections is one of the most important challenges of computer science. The design of an architecture for storing such data requires a set of tools and frameworks such as SQL database management systems and service-oriented frameworks. The proposed solution is based on a multi-layer architecture, which allows any component to be replaced without recompilation of the other components. The approach contains five components, i.e. Model, Base Engine, Concrete Engine, CBIR service and Presentation. They were based on two well-known design patterns: Dependency Injection and Inversion of Control. For experimental purposes we implemented the SURF local interest point detector as a feature extractor and K-means clustering as indexer. The presented architecture is intended for content-based retrieval systems simulation purposes as well as for real-world CBIR tasks. Comment: Accepted for the 14th International Conference on Artificial Intelligence and Soft Computing, ICAISC, June 14-18, 2015, Zakopane, Polan
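As a rough illustration of K-means used as an indexer over local descriptors, the sketch below clusters a few toy 2-D vectors and summarises an image as a histogram of nearest-centroid counts. The histogram representation, the naive initialisation, and all function names (`kmeans`, `index_image`, `nearest`) are assumptions made for this example. The abstract does not describe the indexing scheme in detail, and real SURF descriptors would be 64- or 128-dimensional.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Vector, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to the given descriptor."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[List[float]]:
    """Plain Lloyd's algorithm; the first k points seed the centroids (naive choice)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def index_image(descriptors: List[Vector], centroids: List[List[float]]) -> List[int]:
    """Count how many descriptors of one image fall into each cluster."""
    hist = [0] * len(centroids)
    for d in descriptors:
        hist[nearest(d, centroids)] += 1
    return hist


# Toy stand-ins for descriptors extracted from a training collection.
training = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
centroids = kmeans(training, k=2)
print(index_image([(0.2, 0.4), (9.0, 9.0), (10.0, 12.0)], centroids))
```

Images with similar histograms can then be retrieved by comparing these fixed-length vectors instead of raw descriptor sets.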

arXiv.org e-Print Archive

Crossref

Vehicle Detection Using Alex Net and Faster R-CNN Deep Learning Models: A Comparative Study

Author: A Ottlik
B Su
B Tian
C Hu
CH Lampert
CL Zitnick
J Bromley
J Hosang
JR Uijlings
K He
LW Tsai
MD Zeiler
R Cucchiara
RS Feris
S Gupte
S Messelodi
X Chen
X Luo
Y Gao
Y Lecun
Y Zhou
Z Zivkovic
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

This paper was presented at the 5th International Visual Informatics Conference (IVIC 2017). This paper presents a comparative study of two deep learning models used for vehicle detection. Alex Net and Faster R-CNN are compared through the analysis of an urban video sequence. Several tests were carried out to evaluate the quality of detections, failure rates and the time taken to complete the detection task. The results support important conclusions regarding the architectures and strategies used for implementing such networks for the task of video detection, encouraging future research on this topic. S.A. Velastin is grateful for funding received from the Universidad Carlos III de Madrid, the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 600371, the Ministerio de Economía y Competitividad (COFUND2013-51509) and Banco Santander. The authors wish to thank Dr. Fei Yin for the code for metrics employed for evaluations. Finally, we gratefully acknowledge the support of NVIDIA Corporation with the donation of the GPUs used for this research. The data and code used for this work are available upon request from the authors.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Intrinsic Textures for Relightable Free-Viewpoint Video

Author: A. Bousseau
C. Zitnick
E.H. Land
G. Li
H.P.A. Lensch
J. Starck
J.Y. Guillemaut
M.F. Tappen
N. Ahmed
P.F. Felzenszwalb
R. Ramamoorthi
S. Vedula
T. Kanade
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

This paper presents an approach to estimate the intrinsic texture properties (albedo, shading, normal) of scenes from multiple view acquisition under unknown illumination conditions. We introduce the concept of intrinsic textures, which are pixel-resolution surface textures representing the intrinsic appearance parameters of a scene. Unlike previous video relighting methods, the approach does not assume regions of uniform albedo, which makes it applicable to richly textured scenes. We show that intrinsic image methods can be used to refine an initial, low-frequency shading estimate based on a global lighting reconstruction from an original texture and coarse scene geometry in order to resolve the inherent global ambiguity in shading. The method is applied to relighting of free-viewpoint rendering from multiple view video capture. This demonstrates relighting with reproduction of fine surface detail. Quantitative evaluation on synthetic models with textured appearance shows accurate estimation of intrinsic surface reflectance properties. © 2014 Springer International Publishing
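The role of albedo, shading and normals in relighting can be sketched with the standard intrinsic-image model, in which an observed colour is the product of albedo and shading. The snippet below assumes simple Lambertian shading from a single directional light. This is an illustrative assumption, not the lighting model of the paper, and the names `lambert_shading` and `relight` are hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def lambert_shading(normal: Vec3, light_dir: Vec3) -> float:
    """Lambertian shading term max(0, n . l) for unit-length vectors."""
    dot = sum(n * l for n, l in zip(normal, light_dir))
    return max(0.0, dot)


def relight(albedo: Vec3, normal: Vec3, light_dir: Vec3) -> Vec3:
    """Re-render one texel under a new light: colour = albedo * shading."""
    shading = lambert_shading(normal, light_dir)
    r, g, b = albedo
    return (r * shading, g * shading, b * shading)


# A texel facing +z, lit head-on and then from behind.
albedo = (0.5, 0.2, 0.1)
normal = (0.0, 0.0, 1.0)
print(relight(albedo, normal, (0.0, 0.0, 1.0)))
print(relight(albedo, normal, (0.0, 0.0, -1.0)))
```

Because albedo is stored per texel rather than per region, the same product can be evaluated on richly textured surfaces once albedo and normals have been estimated.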

CiteSeerX

Crossref

University of Surrey

Surrey Research Insight