Sparsity-regularized coded ptychography for robust and efficient lensless microscopy on a chip
In ptychographic imaging, the trade-off between the number of acquisitions
and the resultant imaging quality presents a complex optimization problem.
Increasing the number of acquisitions typically yields reconstructions with
higher spatial resolution and finer details. Conversely, a reduction in
measurement frequency often compromises the quality of the reconstructed
images, manifesting as increased noise and coarser details. To address this
challenge, we employ sparsity priors to reformulate the ptychographic
reconstruction task as a total variation regularized optimization problem. We
introduce a new computational framework, termed the ptychographic proximal
total-variation (PPTV) solver, designed to integrate into existing ptychography
settings without necessitating hardware modifications. Through comprehensive
numerical simulations, we validate that PPTV-driven coded ptychography is
capable of producing highly accurate reconstructions with a minimal set of
eight intensity measurements. Convergence analysis further substantiates the
robustness, stability, and computational feasibility of the proposed PPTV
algorithm. Experimental results obtained from optical setups unequivocally
demonstrate that the PPTV algorithm facilitates high-throughput,
high-resolution imaging while significantly reducing the measurement burden.
These findings indicate that the PPTV algorithm has the potential to
substantially mitigate the resource-intensive requirements traditionally
associated with high-quality ptychographic imaging, thereby offering a pathway
toward the development of more compact and efficient ptychographic microscopy
systems.
Comment: 15 pages, 7 figures
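The total-variation regularized formulation described in this abstract can be sketched in a few lines. The snippet below is an illustrative toy, not the authors' PPTV solver: it runs (sub)gradient descent on a least-squares fidelity term plus a smoothed anisotropic TV penalty, with a random matrix standing in for the coded ptychographic forward model.

```python
import numpy as np

def tv_grad(x, eps=1e-8):
    """(Sub)gradient of a smoothed anisotropic total variation of a 2D image."""
    dx = np.diff(x, axis=1, append=x[:, -1:])  # forward differences (zero at border)
    dy = np.diff(x, axis=0, append=x[-1:, :])
    mag = np.sqrt(dx ** 2 + dy ** 2 + eps)
    gx, gy = dx / mag, dy / mag
    # negative divergence of the normalized gradient field
    return -((gx - np.roll(gx, 1, axis=1)) + (gy - np.roll(gy, 1, axis=0)))

def tv_regularized_recon(A, y, shape, lam=0.05, iters=500):
    """Minimize ||Ax - y||^2 + lam * TV(x) by (sub)gradient descent."""
    step = 0.9 / (2.0 * np.linalg.norm(A, 2) ** 2)  # safe step for the fidelity term
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = 2.0 * A.T @ (A @ x - y) + lam * tv_grad(x.reshape(shape)).ravel()
        x -= step * grad
    return x.reshape(shape)
```

With fewer measurements than unknowns, the TV prior favors piecewise-constant reconstructions, mirroring the abstract's claim that sparsity priors compensate for a reduced measurement count.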
A machine learning driven solution to the problem of perceptual video quality metrics
The advent of high-speed internet connections, advanced video coding algorithms, and consumer-grade computers with high computational capabilities has led video streaming over the internet to make up the majority of network traffic. This effect has led to a continuously expanding video streaming industry that seeks to offer enhanced quality-of-experience (QoE) to its users at the lowest cost possible. Video streaming services are now able to adapt to the hardware and network restrictions that each user faces and thus provide the best experience possible under those restrictions. The most common way to adapt to network bandwidth restrictions is to offer a video stream at the highest possible visual quality, for the maximum achievable bitrate under the network connection in use. This is achieved by storing various pre-encoded versions of the video content with different bitrate and visual quality settings. Visual quality is measured by means of objective quality metrics, such as the Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), Visual Information Fidelity (VIF), and others, which can be easily computed analytically. Nevertheless, it is widely accepted that although these metrics provide an accurate estimate of the statistical quality degradation, they do not reflect the viewer's perception of visual quality accurately. As a result, the acquisition of user ratings in the form of Mean Opinion Scores (MOS) remains the most accurate depiction of human-perceived video quality, albeit very costly and time-consuming, and thus cannot be practically employed by video streaming providers that have hundreds or thousands of videos in their catalogues. A recent very promising approach for addressing this limitation is the use of machine learning techniques in order to train models that represent human video quality perception more accurately.
To this end, regression techniques are used in order to map objective quality metrics to human video quality ratings, acquired for a large number of diverse video sequences. Results have been very promising, with approaches like the Video Multimethod Assessment Fusion (VMAF) metric achieving higher correlations to user-acquired MOS ratings compared to traditional widely used objective quality metrics.
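The regression step can be roughly illustrated as follows. This is a toy with synthetic data, using ordinary least squares as a stand-in for the support-vector regressor that VMAF actually employs:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Toy per-video objective metrics computed against a pristine reference.
psnr = rng.uniform(25, 45, n)
ssim = np.clip(0.5 + (psnr - 25) / 40 + rng.normal(0, 0.02, n), 0, 1)
vif = np.clip(0.4 + (psnr - 25) / 45 + rng.normal(0, 0.03, n), 0, 1)

# Synthetic MOS in [1, 5]; in practice these come from subjective tests.
mos = 1 + 4 / (1 + np.exp(-(psnr - 35) / 4)) + rng.normal(0, 0.1, n)

# Map the metric vector to MOS by least-squares regression.
X = np.column_stack([np.ones(n), psnr, ssim, vif])
w, *_ = np.linalg.lstsq(X, mos, rcond=None)
pred = X @ w
corr = np.corrcoef(pred, mos)[0, 1]
```

A real pipeline would train on subjectively rated video datasets and validate on held-out content; VMAF's fusion step does exactly this with a support vector regressor.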
LiveSketch: Query Perturbations for Guided Sketch-based Visual Search
LiveSketch is a novel algorithm for searching large image collections using
hand-sketched queries. LiveSketch tackles the inherent ambiguity of sketch
search by creating visual suggestions that augment the query as it is drawn,
making query specification an iterative rather than one-shot process that helps
disambiguate users' search intent. Our technical contributions are: a triplet
convnet architecture that incorporates an RNN based variational autoencoder to
search for images using vector (stroke-based) queries; real-time clustering to
identify likely search intents (and so, targets within the search embedding);
and the use of backpropagation from those targets to perturb the input stroke
sequence, thereby suggesting alterations to the query that guide the search.
We show improvements in accuracy and time-to-task over contemporary baselines
using a 67M image corpus.
Comment: Accepted to CVPR 2019
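The query-perturbation idea — backpropagating from an inferred target embedding into the stroke sequence — can be illustrated with a deliberately tiny stand-in: a fixed linear encoder in place of the paper's RNN variational autoencoder and triplet convnet, and plain gradient descent on the embedding distance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical differentiable sketch encoder: a fixed linear map from a
# flattened stroke sequence to a search embedding.
W = 0.1 * rng.standard_normal((16, 64))

def embed(strokes):
    return W @ strokes

# Embedding of a likely search intent (e.g. a cluster center over the
# top-ranked results identified by real-time clustering).
target = rng.standard_normal(16)

strokes = rng.standard_normal(64)  # the user's current query sketch
step = 0.5
for _ in range(200):
    # d/d(strokes) ||W s - target||^2 = 2 W^T (W s - target)
    grad = 2.0 * W.T @ (embed(strokes) - target)
    strokes -= step * grad  # perturb the query toward the inferred intent
```

In LiveSketch proper, the perturbed stroke sequence is rendered back to the user as a suggestion, which is what makes query specification iterative rather than one-shot.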
Communication Beyond Transmitting Bits: Semantics-Guided Source and Channel Coding
Classical communication paradigms focus on accurately transmitting bits over
a noisy channel, and Shannon theory provides a fundamental theoretical limit on
the rate of reliable communications. In this approach, bits are treated
equally, and the communication system is oblivious to what meaning these bits
convey or how they would be used. In future communications, intelligence and
conciseness will predictably play a dominant role, and the proliferation of
connected intelligent agents requires a radical rethinking of the coded
transmission paradigm to support the new communication morphology on the
horizon. The recent concept of "semantic communications" offers a promising
research direction. Injecting semantic guidance into the coded transmission
design to achieve semantics-aware communications shows great potential for
further breakthrough in effectiveness and reliability. This article sheds light
on semantics-guided source and channel coding as a transmission paradigm of
semantic communications, which exploits both data semantics diversity and
wireless channel diversity together to boost the whole system performance. We
present the general system architecture and key techniques, and indicate some
open issues on this topic.
Comment: IEEE Wireless Communications, text overlap with arXiv:2112.0309
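One concrete way semantic guidance can enter source coding — purely illustrative, not the architecture from this article — is to let per-feature semantic importance drive bit allocation in a scalar quantizer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical semantic features and per-feature importance scores
# (real systems would learn both from the downstream task).
features = rng.standard_normal(8)
importance = np.array([0.30, 0.20, 0.15, 0.10, 0.10, 0.05, 0.05, 0.05])

total_bits = 24
# Semantics-guided rate allocation: more bits where meaning matters more.
bits = np.maximum(1, np.round(total_bits * importance)).astype(int)

def quantize(x, b, x_max=4.0):
    """Midrise uniform scalar quantizer with 2**b levels on [-x_max, x_max]."""
    levels = 2 ** b
    step = 2.0 * x_max / levels
    q = np.clip(np.floor(x / step), -(levels // 2), levels // 2 - 1)
    return (q + 0.5) * step

recon = np.array([quantize(f, b) for f, b in zip(features, bits)])
# Semantics-weighted distortion: errors on important features cost more.
sem_distortion = np.sum(importance * (features - recon) ** 2)
```

Channel coding can be guided the same way, for instance by giving semantically important features stronger error protection, which is the channel-side counterpart of this rate allocation.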
S^2-Transformer for Mask-Aware Hyperspectral Image Reconstruction
The technology of hyperspectral imaging (HSI) records visual information
across a wide range of spectral wavelengths. A representative
hyperspectral image acquisition procedure conducts a 3D-to-2D encoding by the
coded aperture snapshot spectral imager (CASSI) and requires a software decoder
for the 3D signal reconstruction. By observing this physical encoding
procedure, two major challenges stand in the way of a high-fidelity
reconstruction. (i) To obtain 2D measurements, CASSI dislocates multiple
channels by disperser-tilting and squeezes them onto the same spatial region,
yielding an entangled data loss. (ii) The physical coded aperture leads to a
masked data loss by selectively blocking the pixel-wise light exposure. To
tackle these challenges, we propose a spatial-spectral (S^2-) Transformer
network with a mask-aware learning strategy. First, we simultaneously leverage
spatial and spectral attention modeling to disentangle the blended information
in the 2D measurement along both dimensions. A series of Transformer
structures are systematically designed to fully investigate the spatial and
spectral informative properties of the hyperspectral data. Second, the masked
pixels will induce higher prediction difficulty and should be treated
differently from unmasked ones. We therefore adaptively prioritize the loss
penalty according to the mask structure, inferring the pixel-wise
reconstruction difficulty from the mask-encoded prediction. We theoretically
discuss the distinct convergence tendencies between masked and unmasked
regions under the proposed learning strategy. Extensive experiments
demonstrate that the
proposed method achieves superior reconstruction performance. Additionally, we
empirically analyze the behaviour of the spatial and spectral attention under
the proposed architecture, and comprehensively examine the impact of the
mask-aware learning.
Comment: 11 pages, 16 figures, 6 tables, Code:
https://github.com/Jiamian-Wang/S2-transformer-HS
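The mask-aware weighting idea — penalizing pixels behind the coded aperture by their inferred difficulty — can be sketched with a toy example (illustrative names and shapes, not the paper's network):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a CASSI reconstruction: a binary coded-aperture mask,
# a ground-truth image, and a prediction whose error is larger where the
# aperture blocked light (masked pixels are harder to recover).
mask = (rng.uniform(size=(32, 32)) > 0.5).astype(float)
truth = rng.uniform(size=(32, 32))
pred = truth + rng.normal(0.0, 0.05, size=(32, 32))
pred[mask == 0] += rng.normal(0.0, 0.2, size=int((mask == 0).sum()))

err = np.abs(pred - truth)

# Mask-aware weighting: up-weight the penalty on masked pixels, using the
# per-pixel error itself as a simple proxy for reconstruction difficulty.
difficulty = err / (err.mean() + 1e-8)
weight = np.where(mask == 0, difficulty, 1.0)
mask_aware_loss = np.mean(weight * err)
plain_loss = err.mean()
```

Because masked pixels carry larger errors, the weighted loss exceeds the unweighted one, concentrating training pressure where the masked data loss is worst.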
A vector quantization approach to universal noiseless coding and quantization
A two-stage code is a block code in which each block of data is coded in two stages: the first stage codes the identity of a block code among a collection of codes, and the second stage codes the data using the identified code. The collection of codes may be noiseless codes, fixed-rate quantizers, or variable-rate quantizers. We take a vector quantization approach to two-stage coding, in which the first-stage code can be regarded as a vector quantizer that "quantizes" the input data of length n to one of a fixed collection of block codes. We apply the generalized Lloyd algorithm to the first-stage quantizer, using induced measures of rate and distortion, to design locally optimal two-stage codes. On a source of medical images, two-stage variable-rate vector quantizers designed in this way outperform standard (one-stage) fixed-rate vector quantizers by over 9 dB. The tail of the operational distortion-rate function of the first-stage quantizer determines the optimal rate of convergence of the redundancy of a universal sequence of two-stage codes. We show that there exist two-stage universal noiseless codes, fixed-rate quantizers, and variable-rate quantizers whose per-letter rate and distortion redundancies converge to zero as (k/2) n^{-1} log n when the universe of sources has finite dimension k. This extends the achievability part of Rissanen's theorem from universal noiseless codes to universal quantizers. Further, we show that the redundancies converge as O(n^{-1}) when the universe of sources is countable, and as O(n^{-1+ε}) when the universe of sources is infinite-dimensional, under appropriate conditions.
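A minimal sketch of the first-stage selection, using toy scalar codebooks rather than the generalized-Lloyd design from the abstract: each block is matched against a small collection of codes, and the code with least induced distortion is identified before the second stage encodes the data with it.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_two_stage(block, codebooks):
    """First stage: identify the codebook with least induced distortion.
    Second stage: code each sample of the block with that codebook."""
    best = (None, None, np.inf)
    for c, cb in enumerate(codebooks):
        idx = np.argmin((block[:, None] - cb[None, :]) ** 2, axis=1)
        d = np.mean((block - cb[idx]) ** 2)
        if d < best[2]:
            best = (c, idx, d)
    return best  # (codebook id, per-sample indices, induced distortion)

# Two hypothetical first-stage "codes": scalar codebooks suited to a
# low-variance and a high-variance source, respectively.
codebooks = [np.linspace(-1, 1, 8), np.linspace(-4, 4, 8)]

quiet = rng.normal(0.0, 0.3, 64)  # block from a low-variance source
loud = rng.normal(0.0, 2.0, 64)   # block from a high-variance source

c_q, _, d_q = encode_two_stage(quiet, codebooks)
c_l, _, d_l = encode_two_stage(loud, codebooks)
```

A generalized Lloyd pass would then re-partition blocks among codebooks and re-fit each codebook to its assigned blocks, iterating toward a locally optimal two-stage code.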