Automatic Structural Scene Digitalization
In this paper, we present an automatic system for the analysis and labeling
of structural scenes, namely floor plan drawings in Computer-aided Design (CAD)
format. The proposed system applies a fusion strategy to detect and recognize
various components of CAD floor plans, such as walls, doors, windows and other
ambiguous assets. Technically, a general rule-based filter parsing method is
first adopted to extract effective information from the original floor plan.
Then, an image-processing based recovery method is employed to correct
information extracted in the first step. Our proposed method is fully automatic
and runs in real time. The analysis system provides high accuracy and has also
been evaluated on a public website that, on average, records more than ten
thousand effective uses per day and reaches a relatively high satisfaction
rate. Comment: paper submitted to PloS On
Spatio-temporal Video Parsing for Abnormality Detection
Abnormality detection in video poses particular challenges due to the
infinite size of the class of all irregular objects and behaviors. Thus no (or
by far not enough) abnormal training samples are available and we need to find
abnormalities in test data without actually knowing what they are.
Nevertheless, the prevailing concept of the field is to directly search for
individual abnormal local patches or image regions independently of one another. To
address this problem, we propose a method for joint detection of abnormalities
in videos by spatio-temporal video parsing. The goal of video parsing is to
find a set of indispensable normal spatio-temporal object hypotheses that
jointly explain all the foreground of a video, while, at the same time, being
supported by normal training samples. Consequently, we avoid a direct detection
of abnormalities and discover them indirectly as those hypotheses which are
needed for covering the foreground without finding an explanation for
themselves by normal samples. Abnormalities are localized by MAP inference in a
graphical model and we solve it efficiently by formulating it as a convex
optimization problem. We experimentally evaluate our approach on several
challenging benchmark sets, improving over the state-of-the-art on all standard
benchmarks both in terms of abnormality classification and localization. Comment: 15 pages, 12 figures, 3 table
Learning Correspondence Structures for Person Re-identification
This paper addresses the problem of handling spatial misalignments due to
camera-view changes or human-pose variations in person re-identification. We
first introduce a boosting-based approach to learn a correspondence structure
which indicates the patch-wise matching probabilities between images from a
target camera pair. The learned correspondence structure can not only capture
the spatial correspondence pattern between cameras but also handle the
viewpoint or human-pose variation in individual images. We further introduce a
global constraint-based matching process. It integrates a global matching
constraint over the learned correspondence structure to exclude cross-view
misalignments during the image patch matching process, hence achieving a more
reliable matching score between images. Finally, we also extend our approach by
introducing a multi-structure scheme, which learns a set of local
correspondence structures to capture the spatial correspondence sub-patterns
between a camera pair, so as to handle the spatial misalignments between
individual images in a more precise way. Experimental results on various
datasets demonstrate the effectiveness of our approach. Comment: IEEE Trans. Image Processing, vol. 26, no. 5, pp. 2438-2453, 2017.
The project page for this paper is available at
http://min.sjtu.edu.cn/lwydemo/personReID.htm
arXiv admin note: text overlap with arXiv:1504.0624
Neural Motifs: Scene Graph Parsing with Global Context
We investigate the problem of producing structured graph representations of
visual scenes. Our work analyzes the role of motifs: regularly appearing
substructures in scene graphs. We present new quantitative insights on such
repeated structures in the Visual Genome dataset. Our analysis shows that
object labels are highly predictive of relation labels but not vice-versa. We
also find that there are recurring patterns even in larger subgraphs: more than
50% of graphs contain motifs involving at least two relations. Our analysis
motivates a new baseline: given object detections, predict the most frequent
relation between object pairs with the given labels, as seen in the training
set. This baseline improves on the previous state of the art by an average
relative gain of 3.6% across evaluation settings. We then introduce Stacked
Motif Networks, a new architecture designed to capture higher order motifs in
scene graphs that further improves over our strong baseline by an average 7.1%
relative gain. Our code is available at github.com/rowanz/neural-motifs. Comment: CVPR 2018 camera read
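The frequency baseline described in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function names `fit_frequency_baseline` and `predict_relation`, the `(subject, relation, object)` triple format, and the toy training data are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def fit_frequency_baseline(training_triples):
    """Count how often each relation occurs for every (subject, object) label pair."""
    counts = defaultdict(Counter)
    for subj_label, relation, obj_label in training_triples:
        counts[(subj_label, obj_label)][relation] += 1
    return counts


def predict_relation(counts, subj_label, obj_label, default=None):
    """Return the most frequent training relation for the label pair, or `default` if unseen."""
    pair_counts = counts.get((subj_label, obj_label))
    if not pair_counts:
        return default
    return pair_counts.most_common(1)[0][0]


# Toy, made-up training triples (hypothetical; not taken from Visual Genome).
train = [
    ("man", "riding", "horse"),
    ("man", "riding", "horse"),
    ("man", "near", "horse"),
    ("cup", "on", "table"),
]
counts = fit_frequency_baseline(train)
print(predict_relation(counts, "man", "horse"))
print(predict_relation(counts, "cup", "table"))
```

The sketch ignores image content entirely: the prediction depends only on the two object labels, which is what makes it a pure label-statistics baseline.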
ARCHITECTURE ESTIMATION FROM SPARSE IMAGES USING GRAMMATICAL SHAPE PRIORS FOR CULTURAL HERITAGE
The estimation and reconstruction of 3D architectural structures is of great interest in computer vision, as well as cultural heritage. This dissertation proposes a novel approach to solve the difficult problem of estimating architectural structures from sparse images and efficiently generating 3D models from estimation results for cultural heritage. This approach takes as input one plan drawing image and a few façade images, and provides as output the volumetric 3D models which represent the structures in the sparse images. Support of this research goal has motivated new investigations in underlying structure estimation problems including detecting structural feature points in 2D images, decomposing plan drawings into semantically meaningful shapes for medieval castles, estimating rectangular and Gothic façades using shape priors, and estimating complete 3D models for architectural structures using a novel volumetric shape grammar. Major outstanding challenges in each of these topic areas are addressed, resulting in contributions to the current state of the art as it applies to these difficult problems.
Parsing Occluded People by Flexible Compositions
This paper presents an approach to parsing humans when there is significant
occlusion. We model humans using a graphical model which has a tree structure
building on recent work [32, 6] and exploit the connectivity prior that, even
in the presence of occlusion, the visible nodes form a connected subtree of the
graphical model. We call each connected subtree a flexible composition of
object parts. This involves a novel method for learning occlusion cues. During
inference we need to search over a mixture of different flexible models. By
exploiting part sharing, we show that this inference can be done extremely
efficiently, requiring only twice as many computations as searching for the
entire object (i.e., not modeling occlusion). We evaluate our model on the
standard benchmarked "We Are Family" Stickmen dataset and obtain significant
performance improvements over the best alternative algorithms. Comment: CVPR 15 Camera Read
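The connectivity prior mentioned in the abstract above (visible parts form a connected subtree of the tree-structured model) can be illustrated with a small check. This is only a sketch under stated assumptions: the function name `is_connected_subtree`, the edge-list representation, and the toy body-part tree are hypothetical and do not come from the paper.

```python
from collections import deque


def is_connected_subtree(edges, visible):
    """Check whether the `visible` nodes induce a connected subgraph of the tree given by `edges`."""
    visible = set(visible)
    if not visible:
        return False
    # Build an undirected adjacency map from the edge list.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    # Breadth-first search restricted to visible nodes.
    start = next(iter(visible))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour in visible and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == visible


# Hypothetical upper-body part tree, for illustration only.
tree_edges = [
    ("head", "neck"),
    ("neck", "torso"),
    ("neck", "l_shoulder"),
    ("l_shoulder", "l_elbow"),
    ("neck", "r_shoulder"),
    ("r_shoulder", "r_elbow"),
]
print(is_connected_subtree(tree_edges, {"head", "neck", "torso"}))
print(is_connected_subtree(tree_edges, {"head", "torso"}))
```

In this toy tree, a visible set that skips the connecting `neck` node is rejected, which is the kind of configuration the prior rules out.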
A Deep Understanding of Structural and Functional Behavior of Tabular and Graphical Modules in Technical Documents
The rapid increase of published research papers in recent years has escalated the need for automated ways to process and understand them. The successful recognition of the information contained in technical documents depends on the understanding of the document’s individual modalities. These modalities include tables, graphics, diagrams, etc., as defined in Bourbakis’ pioneering work. However, the depth of understanding is correlated with the efficiency of detection and recognition. In this work, a novel methodology is proposed for the automatic processing and understanding of table and graphics images in technical documents. Previous attempts at table and graphics understanding retrieve only superficial knowledge such as table contents and axis values. Here, the focus is instead on capturing the internal associations and relations between the data extracted from each figure. The proposed methodology is divided into the following steps: 1) figure detection, 2) figure recognition, and 3) figure understanding, where by figures we mean tables, graphics and diagrams. More specifically, we evaluate different heuristic and learning methods for classifying table and graphics images as part of the detection module. Table recognition and deep understanding include the extraction of the knowledge that is illustrated in a table image along with the deeper associations between the table variables. The graphics recognition module follows a clustering-based approach in order to recognize middle points. Middle points are 2D points where the direction of the curves changes. They delimit the straight line segments that construct the graphics curves. We use these detected middle points in order to understand various features of each line segment and the associations between them.
Additionally, we convert the extracted internal tabular associations and the captured curves’ structural and functional behavior into a common and at the same time unique form of representation, namely Stochastic Petri-net (SPN) graphs. The use of SPN graphs allows for the merging of different document modalities through the functions that describe them, without any prior knowledge of what these functions are. Finally, we achieve a higher level of document understanding through the synergistic merging of the aforementioned SPN graphs that we extract from the table and graphics modalities. We provide results from every step of the document modality understanding methodologies and from the synergistic merging as proof of concept for this research.
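The notion of middle points in the abstract above (2D points where a curve changes direction, delimiting straight line segments) can be illustrated with a simple sketch. Note that this is not the clustering-based approach the abstract describes: it assumes the curve is already available as an ordered polyline and applies a plain turning-angle test. The function name `find_middle_points` and the tolerance parameter `angle_tol_deg` are hypothetical.

```python
import math


def find_middle_points(points, angle_tol_deg=1.0):
    """Return interior points of an ordered polyline where the direction
    changes by more than `angle_tol_deg` degrees."""
    middle = []
    for prev, cur, nxt in zip(points, points[1:], points[2:]):
        # Direction vectors of the incoming and outgoing segments.
        v1 = (cur[0] - prev[0], cur[1] - prev[1])
        v2 = (nxt[0] - cur[0], nxt[1] - cur[1])
        cross = v1[0] * v2[1] - v1[1] * v2[0]
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        # Unsigned turning angle between the two segments.
        turn = abs(math.degrees(math.atan2(cross, dot)))
        if turn > angle_tol_deg:
            middle.append(cur)
    return middle


# Made-up polyline: rises, falls, then flattens out.
curve = [(0, 0), (1, 1), (2, 2), (3, 1), (4, 0), (5, 0)]
print(find_middle_points(curve))
```

For the toy polyline, the points at the peak and at the transition to the flat segment are reported, while collinear interior points are skipped.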