Digital Image Access & Retrieval
The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval.
2D+3D Indoor Scene Understanding from a Single Monocular Image
Scene understanding, as a broad field encompassing many
subtopics, has gained great interest in recent years. Among these
subtopics, indoor scene understanding, having its own specific
attributes and challenges compared to outdoor scene under-
standing, has drawn a lot of attention. It has potential
applications in a wide variety of domains, such as robotic
navigation, object grasping for personal robotics, augmented
reality, etc. To our knowledge, existing research on indoor
scenes typically makes use of depth sensors, such as the Kinect,
which are, however, not always available.
In this thesis, we focused on addressing the indoor scene
understanding tasks in a general case, where only a monocular
color image of the scene is available. Specifically, we first
studied the problem of estimating a detailed depth map from a
monocular image. Then, benefiting from deep-learning-based depth
estimation, we tackled the higher-level tasks of 3D box proposal
generation, and scene parsing with instance segmentation,
semantic labeling and support relationship inference from a
monocular image. Our research on indoor scene understanding
provides a comprehensive scene interpretation at various
perspectives and scales.
For monocular image depth estimation, previous approaches are
limited in that they only reason about depth locally on a single
scale, and do not utilize the important information of geometric
scene structures. Here, we developed a novel graphical model,
which reasons about detailed depth while leveraging geometric
scene structures at multiple scales.
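The multi-scale idea can be illustrated with a minimal sketch. This is not the thesis model: `predict_depth` is a hypothetical single-scale predictor, and the fusion here is a plain average over an image pyramid, so that coarse scales contribute global scene structure and fine scales contribute local detail.

```python
import numpy as np

def multiscale_depth(image, predict_depth, num_scales=3):
    """Illustrative multi-scale fusion sketch (not the thesis model).

    Runs a hypothetical single-scale depth predictor on a naive
    image pyramid and averages the upsampled per-scale estimates.
    """
    H, W = image.shape[:2]
    fused = np.zeros((H, W))
    for s in range(num_scales):
        step = 2 ** s
        coarse = image[::step, ::step]          # naive downsampling
        d = predict_depth(coarse)               # per-scale depth estimate
        up = np.kron(d, np.ones((step, step)))  # nearest-neighbor upsampling
        fused += up[:H, :W]
    return fused / num_scales
```

The actual model in the thesis reasons jointly over scales in a graphical model rather than averaging; the sketch only conveys why multiple scales carry complementary information.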
For 3D box proposals, to the best of our knowledge, our approach
constitutes the first attempt to reason about class-independent
3D box proposals from a single monocular image. To this end, we
developed a novel integrated, differentiable framework that
estimates depth, extracts a volumetric scene representation and
generates 3D proposals. At the core of this framework lies a
novel residual, differentiable truncated signed distance function
module, which is able to handle the relatively low accuracy of
the predicted depth map.
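As a rough illustration of what a TSDF volume computed from a predicted depth map looks like, the sketch below projects each voxel center into the camera, compares its depth along the ray to the predicted depth, and truncates the signed difference. This is a plain non-differentiable version for intuition only; the camera intrinsics tuple and the voxel-list representation are assumptions of this sketch, not the thesis's module.

```python
import numpy as np

def depth_to_tsdf(depth, K, voxel_grid, truncation=0.1):
    """Compute a truncated signed distance value per voxel from a depth map.

    depth      -- (H, W) predicted depth map
    K          -- (fx, fy, cx, cy) pinhole intrinsics (assumed format)
    voxel_grid -- (N, 3) voxel centers (x, y, z) in camera coordinates
    """
    fx, fy, cx, cy = K
    H, W = depth.shape
    tsdf = np.ones(len(voxel_grid))  # default: free space in front of surface
    for i, (x, y, z) in enumerate(voxel_grid):
        if z <= 0:
            continue  # voxel behind the camera
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < W and 0 <= v < H:
            # signed distance along the ray: positive in free space,
            # negative behind the observed surface
            sdf = depth[v, u] - z
            tsdf[i] = np.clip(sdf / truncation, -1.0, 1.0)
    return tsdf
```

The residual, differentiable module in the thesis additionally lets gradients flow through this representation and compensates for noisy predicted depth, which the naive version above does not attempt.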
For scene parsing, we tackled its three subtasks of instance
segmentation, semantic labeling, and support relationship
inference on instances. Existing work typically reasons about
these individual subtasks independently. Here, we leveraged the
fact that they bear strong connections, which can facilitate
addressing these subtasks if modeled properly. To this end, we
developed an integrated graphical model that reasons about the
mutual relationships of the above subtasks.
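A toy version of such joint reasoning can be sketched as coordinate ascent over per-instance labels, where each instance's class score is augmented by a compatibility bonus with the label of the instance supporting it. Everything here, including the `compat` matrix and the `support` encoding, is a simplification invented for illustration, not the thesis's graphical model.

```python
import numpy as np

def joint_labels(unary, compat, support):
    """Toy joint labeling sketch (not the thesis model).

    unary   -- (N, L) per-instance class scores
    compat  -- (L, L) compat[a, b]: score for label a resting on label b
    support -- support[i] is the index of the instance supporting i, or -1
    """
    n, _ = unary.shape
    labels = unary.argmax(axis=1)           # independent initialization
    for _ in range(10):                     # simple coordinate ascent
        changed = False
        for i in range(n):
            scores = unary[i].copy()
            if support[i] >= 0:
                # bonus for being compatible with the supporter's label
                scores += compat[:, labels[support[i]]]
            best = int(scores.argmax())
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels
```

For example, an instance whose class scores weakly favor one label can be flipped once the compatibility term with its supporting surface is taken into account, which is the kind of mutual influence the integrated model exploits.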
In summary, in this thesis, we introduced novel and effective
methodologies for each of the three indoor scene understanding
tasks, i.e., depth estimation, 3D box proposal generation, and
scene parsing, and exploited the latter two tasks' dependence on
depth estimates. Evaluation on several benchmark datasets
demonstrated the effectiveness of our algorithms and the benefits
of utilizing depth estimates for higher-level tasks.
Three for one and one for three: Flow, Segmentation, and Surface Normals
Optical flow, semantic segmentation, and surface normals represent different
information modalities, yet together they bring better cues for scene
understanding problems. In this paper, we study the influence among the three
modalities: how each impacts the others and how effective they are in
combination. We employ a modular approach using a convolutional refinement
network that is trained with supervision but isolated from RGB images to
enforce joint modality
features. To assist the training process, we create a large-scale synthetic
outdoor dataset that supports dense annotation of semantic segmentation,
optical flow, and surface normals. The experimental results show positive
influence among the three modalities, especially for objects' boundaries,
region consistency, and scene structures.
Comment: BMVC 201