1,560 research outputs found
Texture analysis using the trace transform
EThOS - Electronic Theses Online Service, GB, United Kingdom
AutoFocusFormer: Image Segmentation off the Grid
Real world images often have highly imbalanced content density. Some areas
are very uniform, e.g., large patches of blue sky, while other areas are
scattered with many small objects. Yet, the commonly used successive grid
downsampling strategy in convolutional deep networks treats all areas equally.
Hence, small objects are represented in very few spatial locations, leading to
worse results in tasks such as segmentation. Intuitively, retaining more pixels
representing small objects during downsampling helps to preserve important
information. To achieve this, we propose AutoFocusFormer (AFF), a
local-attention transformer image recognition backbone, which performs adaptive
downsampling by learning to retain the most important pixels for the task.
Since adaptive downsampling generates a set of pixels irregularly distributed
on the image plane, we abandon the classic grid structure. Instead, we develop
a novel point-based local attention block, facilitated by a balanced clustering
module and a learnable neighborhood merging module, which yields
representations for our point-based versions of state-of-the-art segmentation
heads. Experiments show that our AutoFocusFormer (AFF) improves significantly
over baseline models of similar sizes.
Comment: CVPR 202
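The contrast the abstract draws between grid downsampling and adaptive downsampling can be illustrated with a small sketch. It assumes that each pixel already carries an importance score and that the highest-scoring fraction is kept. The abstract only says that AFF learns which pixels to retain, so the top-k rule, the hand-set scores, and the helper names `adaptive_downsample` and `grid_downsample` are hypothetical and are not the authors' method.

```python
def grid_downsample(points, stride=2):
    """Classic grid downsampling: keep every `stride`-th pixel on both axes."""
    return [(x, y) for (x, y) in points if x % stride == 0 and y % stride == 0]


def adaptive_downsample(points, scores, keep_ratio=0.25):
    """Keep the highest-scoring fraction of pixels (hypothetical top-k rule).

    `scores` stands in for a learned importance; here it is supplied by hand.
    The kept pixels are returned in their original order and are in general
    irregularly distributed, so they no longer form a grid.
    """
    if len(points) != len(scores):
        raise ValueError("points and scores must have the same length")
    n_keep = max(1, int(len(points) * keep_ratio))
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    kept = sorted(order[:n_keep])
    return [points[i] for i in kept]


# A 4x4 image whose lower-right 2x2 corner holds a small object (assumed scores).
points = [(x, y) for y in range(4) for x in range(4)]
scores = [1.0 if (x >= 2 and y >= 2) else 0.1 for (x, y) in points]

print(grid_downsample(points))              # only one kept pixel lies on the object
print(adaptive_downsample(points, scores))  # all four kept pixels lie on the object
```

Both calls keep 4 of the 16 pixels, but the grid version spends three of them on the uniform background.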
Steered mixture-of-experts for light field images and video : representation and coding
Research in light field (LF) processing has grown substantially over the last decade. This is largely driven by the desire to achieve the same level of immersion and navigational freedom for camera-captured scenes as is currently available for CGI content. Standardization organizations such as MPEG and JPEG continue to follow conventional coding paradigms in which viewpoints are discretely represented on 2-D regular grids. These grids are then further decorrelated through hybrid DPCM/transform techniques. However, these 2-D regular grids are less suited for high-dimensional data, such as LFs. We propose a novel coding framework for higher-dimensional image modalities, called Steered Mixture-of-Experts (SMoE). Coherent areas in the higher-dimensional space are represented by single higher-dimensional entities, called kernels. These kernels hold spatially localized information about light rays at any angle arriving at a certain region. The global model thus consists of a set of kernels which define a continuous approximation of the underlying plenoptic function. We introduce the theory of SMoE and illustrate its application for 2-D images, 4-D LF images, and 5-D LF video. We also propose an efficient coding strategy to convert the model parameters into a bitstream. Even without provisions for high-frequency information, the proposed method performs comparably to the state of the art for low-to-mid range bitrates with respect to subjective visual quality of 4-D LF images. In the case of 5-D LF video, we observe superior decorrelation and coding performance, with coding gains of a factor of 4 in bitrate for the same quality. At least equally important is the fact that our method inherently has desired functionality for LF rendering which is lacking in other state-of-the-art techniques: (1) full zero-delay random access, (2) light-weight pixel-parallel view reconstruction, and (3) intrinsic view interpolation and super-resolution.
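To make the idea of kernels defining a continuous approximation more concrete, here is a minimal 1-D sketch. It assumes Gaussian kernels with normalized (soft) gating and linear experts, a common mixture-of-experts regression form. The abstract does not spell out the model equations, so the parameter names (`center`, `bandwidth`, `value`, `slope`) and the function `smoe_reconstruct` are hypothetical.

```python
import math


def smoe_reconstruct(x, kernels):
    """Evaluate a 1-D mixture-of-experts model at position x (assumed form).

    Each kernel is a dict with a Gaussian gate (center, bandwidth) and a
    linear expert (value at the center, slope). The output is the
    gate-weighted average of the expert predictions.
    """
    gates = [
        math.exp(-0.5 * ((x - k["center"]) / k["bandwidth"]) ** 2)
        for k in kernels
    ]
    experts = [k["value"] + k["slope"] * (x - k["center"]) for k in kernels]
    total = sum(gates)
    if total == 0.0:
        # All gates underflowed: fall back to the expert of the nearest kernel.
        nearest = min(range(len(kernels)),
                      key=lambda i: abs(x - kernels[i]["center"]))
        return experts[nearest]
    return sum(g * e for g, e in zip(gates, experts)) / total


# Two kernels describing a dark region and a bright region (assumed values).
kernels = [
    {"center": 0.0, "bandwidth": 1.0, "value": 1.0, "slope": 0.0},
    {"center": 10.0, "bandwidth": 1.0, "value": 5.0, "slope": 0.0},
]

# Any position can be evaluated on its own, including off-grid ones.
for x in (0.0, 2.5, 5.0, 7.5, 10.0):
    print(x, round(smoe_reconstruct(x, kernels), 4))
```

Because each sample depends only on the kernel parameters and its own coordinate, samples can be evaluated independently and at arbitrary positions, which is consistent with the random access, pixel-parallel reconstruction, and interpolation properties listed in the abstract.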
Modeling groundwater levels in California's Central Valley by hierarchical Gaussian process and neural network regression
Modeling groundwater levels continuously across California's Central Valley
(CV) hydrological system is challenging due to low-quality well data which is
sparsely and noisily sampled across time and space. A novel machine learning
method is proposed for modeling groundwater levels by learning from a 3D
lithological texture model of the CV aquifer. The proposed formulation performs
multivariate regression by combining Gaussian processes (GP) and deep neural
networks (DNN). The proposed hierarchical modeling approach consists of training
the DNN to learn a lithologically informed latent space where non-parametric
regression with GP is performed. The methodology is applied for modeling
groundwater levels across the CV during 2015 - 2020. We demonstrate the
efficacy of GP-DNN regression for modeling non-stationary features in the well
data with fast and reliable uncertainty quantification. Our results indicate
that the 2017 and 2019 wet years in California were largely ineffective in
replenishing the groundwater loss caused during previous drought years.
Comment: Submitted to Water Resources Research
CARNet: Compression Artifact Reduction for Point Cloud Attribute
A learning-based adaptive loop filter is developed for the Geometry-based
Point Cloud Compression (G-PCC) standard to reduce attribute compression
artifacts. The proposed method first generates multiple Most-Probable Sample
Offsets (MPSOs) as potential compression distortion approximations, and then
linearly weights them for artifact mitigation. As such, we drive the filtered
reconstruction as close to the uncompressed point cloud attribute (PCA) as possible. To this end, we
devise a Compression Artifact Reduction Network (CARNet) which consists of two
consecutive processing phases: MPSOs derivation and MPSOs combination. The
MPSOs derivation uses a two-stream network to model local neighborhood
variations from direct spatial embedding and frequency-dependent embedding,
where sparse convolutions are utilized to best aggregate information from
sparsely and irregularly distributed points. The MPSOs combination is guided by
the least square error metric to derive weighting coefficients on the fly to
further capture content dynamics of input PCAs. The CARNet is implemented as an
in-loop filtering tool of G-PCC, where the linear weighting coefficients
are encapsulated into the bitstream with negligible bit rate overhead.
Experimental results demonstrate significant improvement over the latest G-PCC
both subjectively and objectively.
Comment: 13 pages, 8 figures
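The combination step described above, linearly weighting several candidate offsets under a least square error criterion, can be sketched as an ordinary least-squares fit. This is only an illustration under stated assumptions. The candidate offsets are given by hand rather than produced by a network, attributes are scalar, and the helper names `solve_linear`, `least_squares_weights`, and `apply_filter` are hypothetical. It is not the CARNet implementation.

```python
def solve_linear(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("offset candidates are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w


def least_squares_weights(original, reconstructed, offsets):
    """Weights minimizing ||original - (reconstructed + sum_k w_k * offsets[k])||^2."""
    residual = [x - r for x, r in zip(original, reconstructed)]
    k = len(offsets)
    # Normal equations: (O^T O) w = O^T residual.
    gram = [[sum(p * q for p, q in zip(offsets[i], offsets[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(p * e for p, e in zip(offsets[i], residual)) for i in range(k)]
    return solve_linear(gram, rhs)


def apply_filter(reconstructed, offsets, weights):
    """Add the weighted offsets to the reconstructed attribute values."""
    return [r + sum(w * o[i] for w, o in zip(weights, offsets))
            for i, r in enumerate(reconstructed)]


# Toy scalar attributes for four points (assumed values).
original = [10.0, 20.0, 30.0, 40.0]
reconstructed = [9.0, 21.0, 28.0, 41.0]
offsets = [[0.5, 0.0, 1.0, 0.0],   # candidate offset 1
           [0.0, 1.0, 0.0, 1.0]]   # candidate offset 2

weights = least_squares_weights(original, reconstructed, offsets)
print(weights)
print(apply_filter(reconstructed, offsets, weights))
```

The fit needs the uncompressed attribute, so in this sketch it would have to run where the original is available. That fits the abstract's statement that the weighting coefficients are carried in the bitstream, although the exact signaling is not described there.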
NeRF-Enhanced Outpainting for Faithful Field-of-View Extrapolation
In various applications, such as robotic navigation and remote visual
assistance, expanding the field of view (FOV) of the camera proves beneficial
for enhancing environmental perception. Unlike image outpainting techniques
aimed solely at generating aesthetically pleasing visuals, these applications
demand an extended view that faithfully represents the scene. To achieve this,
we formulate a new problem of faithful FOV extrapolation that utilizes a set of
pre-captured images as prior knowledge of the scene. To address this problem,
we present a simple yet effective solution called NeRF-Enhanced Outpainting
(NEO) that uses extended-FOV images generated through NeRF to train a
scene-specific image outpainting model. To assess the performance of NEO, we
conduct comprehensive evaluations on three photorealistic datasets and one
real-world dataset. Extensive experiments on the benchmark datasets showcase
the robustness and potential of our method in addressing this challenge. We
believe our work lays a strong foundation for future exploration within the
research community.