64 research outputs found
Phenomenological modeling of image irradiance for non-Lambertian surfaces under natural illumination.
Various vision tasks are confronted by appearance variations due to changes in illumination. For instance, in a recognition system, it has been shown that the variability in human face appearance due to changes in lighting conditions can exceed the variability due to a change in the person's identity. Theoretically, because the lighting function is arbitrary, the space of all possible images of a fixed-pose object under all possible illumination conditions is infinite dimensional. Nonetheless, it has been proven that the set of images of a convex Lambertian surface under distant illumination lies near a low-dimensional linear subspace. This result has also been extended to non-Lambertian objects with non-convex geometry. As such, vision applications concerned with recovering illumination, reflectance, or surface geometry from images would benefit from a low-dimensional generative model that captures appearance variations with respect to illumination conditions and surface reflectance properties, enabling such inverse problems to be formulated as parameter estimation.

Typically, subspace construction boils down to applying a dimensionality reduction scheme, e.g., Principal Component Analysis (PCA), to a large set of (real or synthesized) images of the object(s) of interest with fixed pose but varying illumination. This approach has two major problems. First, the acquired/rendered image ensemble must be statistically representative of the sources of variation of interest, in particular illumination and reflectance. Second, the curse of dimensionality hinders numerical methods such as Singular Value Decomposition (SVD), which become intractable for ensembles containing many large images. One way to bypass the need for a large image ensemble is to construct appearance subspaces using phenomenological models, which capture appearance variations through a mathematical abstraction of the reflection process. In particular, the harmonic expansion of the image irradiance equation can be used to derive an analytic subspace representing images under fixed pose but varying illumination, where the image irradiance equation is formulated in a convolution framework. Due to their low-frequency nature, irradiance signals can be represented using low-order basis functions, and Spherical Harmonics (SH) have been extensively adopted for this purpose.

Ideally, a solution to the image irradiance (appearance) modeling problem should incorporate complex illumination, cast shadows, and realistic surface reflectance properties, moving away from the simplifying assumptions of Lambertian reflectance and single-source distant illumination. By handling arbitrary complex illumination and non-Lambertian reflectance, the appearance model proposed in this dissertation moves the state of the art closer to that ideal solution. This work primarily addresses the geometric compliance of the hemispherical basis for representing surface reflectance while presenting a compact, yet accurate, representation for arbitrary materials. To maintain the plausibility of the resulting appearance, the proposed basis is constructed so that it satisfies the Helmholtz reciprocity property while avoiding high computational complexity.
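As a point of reference for the convolution framework mentioned above, the well-known Lambertian special case (a standard result, not this dissertation's contribution) expresses image irradiance as a spherical convolution of the distant lighting with a clamped-cosine kernel, which becomes a product in the SH domain:

E(\mathbf{n}) = \int_{\Omega(\mathbf{n})} L(\omega)\,\max(\mathbf{n}\cdot\omega,\,0)\,d\omega \;=\; \sum_{l=0}^{\infty}\sum_{m=-l}^{l} A_l\, L_{lm}\, Y_{lm}(\mathbf{n}),

where L_{lm} are the SH coefficients of the illumination, Y_{lm} are the spherical harmonics evaluated at the surface normal \mathbf{n}, and A_l are the coefficients of the clamped-cosine kernel. Because A_l decays rapidly with l, a low-order (e.g., second-order, nine-term) expansion already approximates Lambertian irradiance closely, which is the sense in which irradiance is a low-frequency signal.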
It is hypothesized that representing the illumination in the spherical domain and the surface reflectance in the hemispherical domain, while complying with the physical properties of surface reflectance, provides better approximation accuracy for image irradiance than representing both in the spherical domain. Discounting subsurface scattering and surface emittance, this work proposes a surface reflectance basis, built on hemispherical harmonics (HSH), defined on the Cartesian product of the incoming and outgoing local hemispheres (i.e., with respect to surface points). The basis obeys the physical properties of surface reflectance, namely reciprocity and energy conservation. The basis functions are validated using analytical reflectance models as well as scattered reflectance measurements, which may violate the Helmholtz reciprocity property; such violations are filtered out by projecting the measurements onto the subspace spanned by the proposed basis, where reciprocity is preserved in the least-squares sense. The image formation process for isotropic surfaces under arbitrary distant illumination is also formulated in frequency space, where the orthogonality relation between the illumination and reflectance bases is encoded in what are termed irradiance harmonics. These harmonics decouple the effects of illumination and reflectance from the underlying pose and geometry. Further, a bilinear approach to analytically construct the irradiance subspace is proposed to tackle the inherent small-sample-size problem and the curse of dimensionality. Finding the analytic subspace is posed as establishing a relation between its principal components and those of the irradiance harmonics basis functions. It is also shown how to incorporate prior information about natural illumination and real-world surface reflectance characteristics to capture the full behavior of complex illumination and non-Lambertian reflectance. The presented theoretical framework is then used to develop practical shape recovery algorithms in which the commonly made Lambertian assumption is relaxed. From a single image under unknown general illumination, the underlying geometric structure can be recovered while accounting explicitly for object reflectance characteristics (e.g., human skin types for facial images and teeth reflectance for human jaw reconstruction) as well as complex illumination conditions. Experiments on synthetic and real images illustrate the robustness of the proposed appearance model with respect to illumination variation.
Keywords: computer vision, computer graphics, shading, illumination modeling, reflectance representation, image irradiance, frequency space representations, (hemi)spherical harmonics, analytic bilinear PCA, model-based bilinear PCA, 3D shape reconstruction, statistical shape from shading
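As an illustration of the reflectance expansion described in this abstract (the notation below is generic, not the dissertation's exact formulation), an isotropic BRDF f_r defined on the product of the incoming and outgoing hemispheres can be expanded in hemispherical harmonics H_{lm}, with Helmholtz reciprocity translating into a symmetry constraint on the coefficients:

f_r(\omega_i, \omega_o) \;\approx\; \sum_{l,m}\sum_{l',m'} c_{lm}^{\,l'm'}\, H_{lm}(\omega_i)\, H_{l'm'}(\omega_o), \qquad f_r(\omega_i,\omega_o) = f_r(\omega_o,\omega_i) \;\Longleftrightarrow\; c_{lm}^{\,l'm'} = c_{l'm'}^{\,lm}.

One concrete way to realize the least-squares filtering of non-reciprocal measurements mentioned above is to keep only the symmetric part of the coefficient matrix, C \leftarrow \tfrac{1}{2}(C + C^{\top}): for an orthonormal basis this is the nearest reciprocity-obeying expansion in the L2 sense.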
Can point cloud networks learn statistical shape models of anatomies?
Statistical Shape Modeling (SSM) is a valuable tool for investigating and
quantifying anatomical variations within populations of anatomies. However,
traditional correspondence-based SSM generation methods have a prohibitive
inference process and require complete geometric proxies (e.g., high-resolution
binary volumes or surface meshes) as input shapes to construct the SSM.
Unordered 3D point cloud representations of shapes are more easily acquired
from various medical imaging practices (e.g., thresholded images and surface
scanning). Point cloud deep networks have recently achieved remarkable success
in learning permutation-invariant features for different point cloud tasks
(e.g., completion, semantic segmentation, classification). However, their
application to learning SSM from point clouds is, to date, unexplored. In this
work, we demonstrate that existing point cloud encoder-decoder-based completion
networks can provide an untapped potential for SSM, capturing population-level
statistical representations of shapes while reducing the inference burden and
relaxing the input requirement. We discuss the limitations of these techniques
to the SSM application and suggest future improvements. Our work paves the way
for further exploration of point cloud deep learning for SSM, a promising
avenue for advancing the shape analysis literature and broadening SSM to diverse
use cases.
Comment: Accepted to MICCAI 2023. 13 pages, 5 figures, appendix.
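A rough illustration of the permutation-invariant encoder-decoder design this abstract alludes to is sketched below. The architecture, layer sizes, and class names are assumptions made for the sketch, not the paper's actual network.

import torch
import torch.nn as nn

class PointCloudToSSM(nn.Module):
    """Sketch: a PointNet-style encoder (shared per-point MLP + max-pooling,
    which is permutation-invariant) followed by an MLP decoder that predicts a
    fixed, ordered set of correspondence points usable for shape statistics."""
    def __init__(self, num_correspondences=1024, latent_dim=256):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, latent_dim, 1),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, num_correspondences * 3),
        )

    def forward(self, points):                        # points: (B, N, 3), unordered
        x = self.point_mlp(points.transpose(1, 2))    # (B, latent_dim, N)
        z = x.max(dim=2).values                       # symmetric pooling -> order-invariant
        out = self.decoder(z)                         # (B, M*3)
        return out.view(points.size(0), -1, 3)        # ordered correspondence points

Correspondence points predicted across a cohort of shapes could then be analyzed with PCA to obtain population-level modes of variation, which is the SSM use case described above.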
Benchmarking Scalable Epistemic Uncertainty Quantification in Organ Segmentation
Deep learning based methods for automatic organ segmentation have shown
promise in aiding diagnosis and treatment planning. However, quantifying and
understanding the uncertainty associated with model predictions is crucial in
critical clinical applications. While many techniques have been proposed for
epistemic or model-based uncertainty estimation, it is unclear which method is
preferred in the medical image analysis setting. This paper presents a
comprehensive benchmarking study that evaluates epistemic uncertainty
quantification methods in organ segmentation in terms of accuracy, uncertainty
calibration, and scalability. We provide a comprehensive discussion of the
strengths, weaknesses, and out-of-distribution detection capabilities of each
method as well as recommendations for future improvements. These findings
contribute to the development of reliable and robust models that yield accurate
segmentations while effectively quantifying epistemic uncertainty.
Comment: Accepted to the UNSURE Workshop held in conjunction with MICCAI 202
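One widely used, scalable estimator of the kind such benchmarks typically include is Monte Carlo dropout. The snippet below is a generic sketch of that idea; the model and variable names are placeholders, not the paper's benchmarked implementations.

import torch

def mc_dropout_uncertainty(model, image, n_samples=20):
    """Epistemic uncertainty via Monte Carlo dropout: keep dropout active at
    test time, run several stochastic forward passes, and use the predictive
    entropy of the averaged softmax output as a per-voxel uncertainty map."""
    model.train()  # keeps dropout stochastic; assumes BatchNorm is frozen or absent
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(image), dim=1) for _ in range(n_samples)]
        )                                    # (n_samples, B, C, H, W)
    mean_prob = probs.mean(dim=0)            # averaged segmentation probabilities
    entropy = -(mean_prob * mean_prob.clamp_min(1e-8).log()).sum(dim=1)
    return mean_prob.argmax(dim=1), entropy  # segmentation and uncertainty map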
RENs: Relevance Encoding Networks
The manifold assumption for high-dimensional data assumes that the data is
generated by varying a set of parameters obtained from a low-dimensional latent
space. Deep generative models (DGMs) are widely used to learn data
representations in an unsupervised way. DGMs parameterize the underlying
low-dimensional manifold in the data space using bottleneck architectures such
as variational autoencoders (VAEs). The bottleneck dimension for VAEs is
treated as a hyperparameter that depends on the dataset and is fixed at design
time after extensive tuning. As the intrinsic dimensionality of most real-world
datasets is unknown, often, there is a mismatch between the intrinsic
dimensionality and the latent dimensionality chosen as a hyperparameter. This
mismatch can negatively contribute to the model performance for representation
learning and sample generation tasks. This paper proposes relevance encoding
networks (RENs): a novel probabilistic VAE-based framework that uses the
automatic relevance determination (ARD) prior in the latent space to learn the
data-specific bottleneck dimensionality. The relevance of each latent dimension
is directly learned from the data along with the other model parameters using
stochastic gradient descent and a reparameterization trick adapted to
non-Gaussian priors. We leverage the concept of DeepSets to capture permutation
invariant statistical properties in both data and latent spaces for relevance
determination. The proposed framework is general and flexible and can be used
for the state-of-the-art VAE models that leverage regularizers to impose
specific characteristics in the latent space (e.g., disentanglement). With
extensive experimentation on synthetic and public image datasets, we show that
the proposed model learns the relevant latent bottleneck dimensionality without
compromising the representation and generation quality of the samples.
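A simplified sketch of the relevance mechanism described above is given below. It keeps only the per-dimension prior-scale idea; the non-Gaussian ARD prior, the adapted reparameterization, and the DeepSets component mentioned in the abstract are omitted, and all names are illustrative rather than the paper's implementation.

import torch
import torch.nn as nn

class ARDLatentVAE(nn.Module):
    """Sketch: each latent dimension gets a learnable prior variance; dimensions
    the data does not need are driven toward negligible relevance by the KL term,
    exposing the data-specific bottleneck dimensionality."""
    def __init__(self, encoder, decoder, latent_dim=64):
        super().__init__()
        self.encoder, self.decoder = encoder, decoder
        self.prior_log_var = nn.Parameter(torch.zeros(latent_dim))  # per-dim relevance

    def forward(self, x):
        mu, log_var = self.encoder(x)                           # q(z|x) parameters
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()   # reparameterization
        recon = self.decoder(z)
        # KL( N(mu, var) || N(0, prior_var) ), summed over latent dimensions
        prior_var = self.prior_log_var.exp()
        kl = 0.5 * ((log_var.exp() + mu ** 2) / prior_var
                    - 1.0 + self.prior_log_var - log_var).sum(dim=1)
        return recon, kl

Latent dimensions whose learned prior variance shrinks toward zero are effectively pruned, which is one simple way to read the "data-specific bottleneck dimensionality" learned by RENs.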
To pretrain or not to pretrain? A case study of domain-specific pretraining for semantic segmentation in histopathology
Annotating medical imaging datasets is costly, so fine-tuning (or transfer
learning) is the most effective method for digital pathology vision
applications such as disease classification and semantic segmentation. However,
due to texture bias in models trained on natural (real-world) images, transfer
learning for histopathology applications might result in underperforming
models, which motivates the use of unlabeled histopathology data and
self-supervised methods to discover domain-specific characteristics. Here, we
tested the premise that histopathology-specific pretrained models provide
better initializations for pathology vision tasks, i.e., gland and cell
segmentation. In this study, we compare the performance of gland and cell
segmentation tasks with domain-specific and non-domain-specific pretrained
weights. Moreover, we investigate the data size at which domain-specific
pretraining produces a statistically significant difference in performance. In
addition, we investigate whether domain-specific initialization improves
performance under out-of-domain testing, i.e., on distinct datasets for the same task.
The results indicate that performance gain using domain-specific pretraining
depends on both the task and the size of the training dataset. In instances
with limited dataset sizes, a significant improvement in gland segmentation
performance was observed, whereas models trained on the cell segmentation
datasets exhibited no improvement.
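A minimal sketch of the comparison being run follows, using torchvision's ResNet-50 as the encoder; the checkpoint path for the self-supervised histopathology weights is hypothetical, and the paper's exact backbone and segmentation head may differ.

import torch
import torchvision

def build_backbone(domain_specific_ckpt=None):
    """Return an encoder initialized either with ImageNet weights
    (non-domain-specific baseline) or with weights obtained from
    self-supervised pretraining on unlabeled histopathology images."""
    if domain_specific_ckpt is None:
        weights = torchvision.models.ResNet50_Weights.IMAGENET1K_V2
        return torchvision.models.resnet50(weights=weights)
    backbone = torchvision.models.resnet50(weights=None)
    state = torch.load(domain_specific_ckpt, map_location="cpu")
    backbone.load_state_dict(state, strict=False)  # classifier head may differ
    return backbone

Both initializations would then be attached to the same segmentation decoder and fine-tuned on gland/cell segmentation subsets of increasing size, which is how the dataset-size threshold for a significant difference can be probed.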
SetGAN: Improving the stability and diversity of generative models through a permutation invariant architecture
Generative adversarial networks (GANs) have proven effective in modeling
distributions of high-dimensional data. However, their training instability is
a well-known hindrance to convergence, which results in practical challenges in
their applications to novel data. Furthermore, even when convergence is
reached, GANs can be affected by mode collapse, a phenomenon for which the
generator learns to model only a small part of the target distribution,
disregarding the vast majority of the data manifold or distribution. This paper
addresses these challenges by introducing SetGAN, an adversarial architecture
that processes sets of generated and real samples, and discriminates between
the origins of these sets (i.e., training versus generated data) in a flexible,
permutation-invariant manner. We also propose a new metric to quantitatively
evaluate GANs that does not require prior knowledge of the application,
apart from the data itself. Using the new metric, in conjunction with
state-of-the-art evaluation methods, we show that the proposed architecture,
when compared with GAN variants stemming from similar strategies, produces more
accurate models of the input data in a way that is also less sensitive to
hyperparameter settings.
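A sketch of the set-level, permutation-invariant discrimination idea described above is shown below (a DeepSets-style formulation); the layer sizes and names are illustrative, not SetGAN's exact architecture.

import torch
import torch.nn as nn

class SetDiscriminator(nn.Module):
    """Sketch: each sample in a set is embedded independently, the embeddings
    are pooled with a symmetric operation (mean), and a small head scores the
    whole set as real or generated, so the score is order-invariant."""
    def __init__(self, sample_dim, embed_dim=128):
        super().__init__()
        self.phi = nn.Sequential(                  # per-sample embedding
            nn.Linear(sample_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim), nn.ReLU(),
        )
        self.rho = nn.Sequential(                  # acts on the pooled set feature
            nn.Linear(embed_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, 1),
        )

    def forward(self, sample_set):                 # (B, set_size, sample_dim)
        pooled = self.phi(sample_set).mean(dim=1)  # symmetric pooling over the set
        return self.rho(pooled)                    # one real/fake score per set

During training, each set would be drawn entirely from real data or entirely from the generator; scoring sets rather than single samples penalizes low-diversity output, which is why this style of discriminator is argued to mitigate mode collapse.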