Redefining A in RGBA: Towards a Standard for Graphical 3D Printing
Advances in multimaterial 3D printing have the potential to reproduce various
visual appearance attributes of an object in addition to its shape. Since many
existing 3D file formats encode color and translucency by RGBA textures mapped
to 3D shapes, RGBA information is particularly important for practical
applications. In contrast to color (encoded by RGB), which is specified by the
object's reflectance, selected viewing conditions and a standard observer,
translucency (encoded by A) is linked to neither a measurable physical quantity
nor a perceptual one. Thus, reproducing translucency encoded by A is open to
interpretation.
In this paper, we propose a rigorous definition for A suitable for use in
graphical 3D printing, which is independent of the 3D printing hardware and
software, and which links both optical material properties and perceptual
uniformity for human observers. By deriving our definition from the absorption
and scattering coefficients of virtual homogeneous reference materials with an
isotropic phase function, we achieve two important properties. First, a simple
adjustment of A is possible, which preserves the translucency appearance if an
object is re-scaled for printing. Second, determining the value of A for a real
(potentially non-homogeneous) material can be achieved by minimizing a
distance function between light transport measurements of this material and
simulated measurements of the reference materials. Such measurements can be
conducted by commercial spectrophotometers used in graphic arts.
Finally, we conduct visual experiments employing the method of constant
stimuli, and derive from them an embedding of A into a nearly perceptually
uniform scale of translucency for the reference materials.
Comment: 20 pages (incl. appendices), 20 figures. Version with higher quality
images: https://cloud-ext.igd.fraunhofer.de/s/pAMH67XjstaNcrF (main article)
and https://cloud-ext.igd.fraunhofer.de/s/4rR5bH3FMfNsS5q (appendix).
Supplemental material including code:
https://cloud-ext.igd.fraunhofer.de/s/9BrZaj5Uh5d0cOU/downloa
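The fitting step described in the abstract above (choosing A by minimizing a distance between measurements of a real material and simulated measurements of the reference materials) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `estimate_alpha`, the use of Euclidean distance, the two-component measurement vectors, and the lookup-table values are hypothetical and are not taken from the paper or its supplemental code.

```python
import numpy as np

def estimate_alpha(measured, reference_alphas, reference_measurements):
    """Return the reference A whose simulated measurement is closest to `measured`.

    Euclidean distance is an assumed stand-in for the paper's distance function.
    """
    measured = np.asarray(measured, dtype=float)
    refs = np.asarray(reference_measurements, dtype=float)
    # One distance per reference material (rows of `refs`).
    distances = np.linalg.norm(refs - measured, axis=1)
    best = int(np.argmin(distances))
    return reference_alphas[best], float(distances[best])

# Hypothetical lookup table: made-up A values and made-up simulated measurements.
reference_alphas = [0.0, 0.5, 1.0]
reference_measurements = [[0.90, 0.80], [0.50, 0.40], [0.10, 0.05]]

alpha, distance = estimate_alpha([0.48, 0.42], reference_alphas, reference_measurements)
print(alpha)
```

A real pipeline would presumably search a much denser set of reference materials or interpolate between them; the nearest-neighbour lookup is used here only because it is the simplest form of distance minimization.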
Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation
Image annotation aims to annotate a given image with a variable number of
class labels corresponding to diverse visual concepts. In this paper, we
address two main issues in large-scale image annotation: 1) how to learn a rich
feature representation suitable for predicting a diverse set of visual concepts
ranging from objects and scenes to abstract concepts; 2) how to annotate an image
with the optimal number of class labels. To address the first issue, we propose
a novel multi-scale deep model for extracting rich and discriminative features
capable of representing a wide range of visual concepts. Specifically, a novel
two-branch deep neural network architecture is proposed which comprises a very
deep main network branch and a companion feature fusion network branch designed
for fusing the multi-scale features computed from the main branch. The deep
model is also made multi-modal by taking noisy user-provided tags as model
input to complement the image input. For tackling the second issue, we
introduce a label quantity prediction auxiliary task to the main label
prediction task to explicitly estimate the optimal label number for a given
image. Extensive experiments are carried out on two large-scale image
annotation benchmark datasets and the results show that our method
significantly outperforms the state of the art.
Comment: Submitted to IEEE TI
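One way a predicted label quantity could be combined with per-class scores is sketched below. This is an assumption for illustration, not the authors' implementation: the abstract only says that an auxiliary task estimates the optimal label number, so the top-k selection rule, the function name `select_labels`, and the example scores and class names are all hypothetical.

```python
import numpy as np

def select_labels(label_scores, predicted_count, class_names):
    """Keep the k highest-scoring classes, with k taken from the count prediction."""
    scores = np.asarray(label_scores, dtype=float)
    # Round the (possibly fractional) predicted count and keep it in a valid range.
    k = int(np.clip(round(predicted_count), 1, len(scores)))
    top = np.argsort(scores)[::-1][:k]  # indices of the k largest scores
    return [class_names[i] for i in top]

# Hypothetical outputs of the two prediction heads for one image.
class_names = ["dog", "beach", "car", "sunset"]
label_scores = [0.10, 0.70, 0.05, 0.60]
predicted_count = 2.2

print(select_labels(label_scores, predicted_count, class_names))
```

Compared with a fixed score threshold, this rule lets the number of returned labels vary per image, which is the behaviour the abstract motivates.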
Multiscale sampling model for motion integration
Biologically plausible strategies for visual scene integration across spatial and temporal domains continue to be a challenging topic. The fundamental question we address is whether classical problems in motion integration, such as the aperture problem, can be solved in a model that samples the visual scene at multiple spatial and temporal scales in parallel. We hypothesize that fast interareal connections that allow feedback of information between cortical layers are the key processes that disambiguate motion direction. We developed a neural model showing how the aperture problem can be solved using different spatial sampling scales between LGN, V1 layer 4, V1 layer 6, and area MT. Our results suggest that multiscale sampling, rather than feedback explicitly, is the key process that gives rise to end-stopped cells in V1 and enables area MT to solve the aperture problem without the need for calculating intersecting constraints or crafting intricate patterns of spatiotemporal receptive fields. Furthermore, the model explains why end-stopped cells no longer emerge in the absence of V1 layer 6 activity (Bolz & Gilbert, 1986), why V1 layer 4 cells are significantly more end-stopped than V1 layer 6 cells (Pack, Livingstone, Duffy, & Born, 2003), and how it is possible to have a solution to the aperture problem in area MT with no solution in V1 in the presence of driving feedback. In summary, while much research in the field focuses on how a laminar architecture can give rise to complicated spatiotemporal receptive fields to solve problems in the motion domain, we show that one can reframe motion integration as an emergent property of multiscale sampling achieved concurrently within lamina and across multiple visual areas.
This work was supported in part by CELEST, a National Science Foundation Science of Learning Center; NSF SBE-0354378 and OMA-0835976; ONR (N00014-11-1-0535); and AFOSR (FA9550-12-1-0436).
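For readers unfamiliar with the aperture problem named in the abstract above, the following minimal sketch shows the ambiguity itself. It is not the authors' neural model: it only demonstrates that a detector viewing a straight edge through a small aperture can recover the velocity component perpendicular to the edge and nothing else. The function name `normal_component` and the example velocities are illustrative assumptions.

```python
import numpy as np

def normal_component(velocity, edge_angle_rad):
    """Project a 2D velocity onto the direction perpendicular to a straight edge."""
    edge = np.array([np.cos(edge_angle_rad), np.sin(edge_angle_rad)])
    normal = np.array([-edge[1], edge[0]])  # unit vector perpendicular to the edge
    return np.dot(velocity, normal) * normal

# Two different true velocities that differ only along a vertical edge.
vertical_edge = np.pi / 2
v_right = np.array([1.0, 0.0])
v_diagonal = np.array([1.0, 1.0])

# Both yield the same locally measurable motion, so the true direction is ambiguous.
print(normal_component(v_right, vertical_edge))
print(normal_component(v_diagonal, vertical_edge))
```

Resolving this ambiguity requires combining information beyond a single local measurement, which is the role the abstract assigns to multiscale sampling.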