101 research outputs found
What's in a Name? Beyond Class Indices for Image Recognition
Existing machine learning models demonstrate excellent performance in image
object recognition after training on a large-scale dataset under full
supervision. However, these models only learn to map an image to a predefined
class index, without revealing the actual semantic meaning of the object in the
image. In contrast, vision-language models like CLIP are able to assign
semantic class names to unseen objects in a `zero-shot' manner, although they
still rely on a predefined set of candidate names at test time. In this paper,
we reconsider the recognition problem and task a vision-language model to
assign class names to images given only a large and essentially unconstrained
vocabulary of categories as prior information. We use non-parametric methods to
establish relationships between images which allow the model to automatically
narrow down the set of possible candidate names. Specifically, we propose
iteratively clustering the data and voting on class names within each cluster,
showing that this enables a roughly 50% improvement over the baseline on ImageNet.
Furthermore, we tackle this problem in both unsupervised and partially
supervised settings, as well as with both coarse-grained and fine-grained
search spaces as the unconstrained dictionary.
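The cluster-then-vote idea can be sketched in a few lines. The tiny k-means, the 2-D toy features (standing in for vision-language image embeddings), and the noisy per-image name guesses are all illustrative assumptions, not the paper's implementation:

```python
import numpy as np
from collections import Counter

def kmeans(X, k, iters=20):
    """Tiny k-means with a deterministic init from the first k points."""
    X = np.asarray(X, float)
    centers = X[:k].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centers[None], axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def cluster_and_vote(image_feats, per_image_names, n_clusters):
    """Refine noisy per-image name guesses by majority vote per cluster."""
    labels = kmeans(image_feats, n_clusters)
    consensus = {}
    for c in range(n_clusters):
        members = [per_image_names[i] for i in np.flatnonzero(labels == c)]
        consensus[c] = Counter(members).most_common(1)[0][0]
    return [consensus[c] for c in labels]
```

With two well-separated groups of embeddings and one wrong guess per group, the minority guesses snap to each cluster's consensus name.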
Image Deblurring According to Facially Recognized Locations Within the Image
This publication describes techniques for image deblurring according to facially recognized locations within the image. An algorithm may use facial detection and recognition to selectively sharpen aspects of faces within an image and the surrounding area associated with the facial detection. In one or more aspects, the selectivity of sharpening reduces the computational load and improves other aspects of image provision, improving overall computer function, power consumption, and user experience. Individual faces within the image may be cropped or thumbnailed, providing portions of the image that include the faces. Counterpart images associated with the individual faces may be found within a database having a repository of sharp features associated with the counterpart images. As such, the features may be integrated with the blurred faces of the original image to sharpen the image output.
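The described system integrates sharp features retrieved from a database of counterpart images; as a much simpler illustration of the face-localized selectivity alone, the sketch below applies an unsharp mask only inside given face boxes. The boxes are supplied directly here, where a real system would obtain them from a face detector, and the box blur is a stand-in for a proper Gaussian:

```python
import numpy as np

def box_blur(img, k=3):
    """Simple k x k mean filter via edge padding (Gaussian stand-in)."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def sharpen_faces(img, face_boxes, amount=1.0):
    """Unsharp-mask only the listed regions; leave the rest untouched.

    face_boxes: list of (y0, x0, y1, x1) -- assumed to come from a
    face detector in a real pipeline; supplied directly here.
    """
    out = img.astype(float)
    for y0, x0, y1, x1 in face_boxes:
        crop = out[y0:y1, x0:x1]
        out[y0:y1, x0:x1] = crop + amount * (crop - box_blur(crop))
    return np.clip(out, 0, 255)
```

Restricting the filter to the face boxes is what keeps the computational cost proportional to the face area rather than the full image.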
Towards Authentic Face Restoration with Iterative Diffusion Models and Beyond
An authentic face restoration system is in increasing demand in
many computer vision applications, e.g., image enhancement, video
communication, and portrait capture. Most advanced face restoration
models can recover high-quality faces from low-quality ones but usually fail to
faithfully generate realistic and high-frequency details that are favored by
users. To achieve authentic restoration, we propose IDM, an
iteratively learned face restoration system based on Denoising
Diffusion Models (DDMs). We define the criterion of an
authentic face restoration system, and argue that denoising diffusion models
are naturally endowed with this property from two aspects: intrinsic iterative
refinement and extrinsic iterative enhancement. Intrinsic learning can preserve
the content well and gradually refine the high-quality details, while extrinsic
enhancement helps clean the data and improve the restoration task one step
further. We demonstrate superior performance on blind face restoration tasks.
Beyond restoration, we find the authentically cleaned data by the proposed
restoration system is also helpful to image generation tasks in terms of
training stabilization and sample quality. Without modifying the models, we
achieve better quality than state-of-the-art on FFHQ and ImageNet generation
using either GANs or diffusion models.
Comment: ICCV 2023
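The "intrinsic iterative refinement" property can be illustrated in the abstract: a reverse-diffusion sampler applies a learned denoising step repeatedly, from the noisiest to the cleanest timestep. The `denoise_step` callable below is a hypothetical stand-in for one learned DDM step, not the paper's model:

```python
import numpy as np

def iterative_refine(x_noisy, denoise_step, n_steps):
    """Run a chain of denoising steps from most to least noisy.

    `denoise_step(x, t)` stands in for one learned reverse-diffusion
    step; real DDMs predict and remove a noise estimate at each t.
    """
    x = np.asarray(x_noisy, float)
    for t in reversed(range(n_steps)):
        x = denoise_step(x, t)
    return x
```

Even a toy step that halves the residual at every iteration shows the gradual-refinement behavior: three steps shrink an initial error of 8 down to 1.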
Boosting Image-based Mutual Gaze Detection using Pseudo 3D Gaze
Mutual gaze detection, i.e., predicting whether or not two people are looking
at each other, plays an important role in understanding human interactions. In
this work, we focus on the task of image-based mutual gaze detection, and
propose a simple and effective approach to boost the performance by using an
auxiliary 3D gaze estimation task during the training phase. We achieve the
performance boost without additional labeling cost by training the 3D gaze
estimation branch using pseudo 3D gaze labels deduced from mutual gaze labels.
By sharing the head image encoder between the 3D gaze estimation and the mutual
gaze detection branches, we obtain better head features than those learned by
training the mutual gaze detection branch alone. Experimental results on three
image datasets show that the proposed approach improves the detection
performance significantly without additional annotations. This work also
introduces a new image dataset that consists of 33.1K pairs of humans annotated
with mutual gaze labels in 29.2K images.
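One plausible way to deduce a pseudo 3D gaze label from a mutual gaze annotation: if two people are looking at each other, each person's gaze direction is the unit vector from their own head toward the other's. The 3D head positions are assumed inputs here (e.g. from a pose estimate); the paper's exact labeling recipe may differ:

```python
import numpy as np

def pseudo_gaze_labels(head_a, head_b):
    """Pseudo 3D gaze directions for a mutually gazing pair.

    Each person's pseudo label is the unit vector from their own
    head position to the other person's head position.
    """
    head_a = np.asarray(head_a, float)
    head_b = np.asarray(head_b, float)
    v = head_b - head_a
    g_a = v / np.linalg.norm(v)
    return g_a, -g_a  # B's pseudo gaze points back at A
```

These free labels are what let the auxiliary 3D gaze branch train without any extra annotation cost.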
Ranking Neural Checkpoints
This paper is concerned with ranking many pre-trained deep neural networks
(DNNs), called checkpoints, for the transfer learning to a downstream task.
Thanks to the broad use of DNNs, we may easily collect hundreds of checkpoints
from various sources. Which of them transfers the best to our downstream task
of interest? Striving to answer this question thoroughly, we establish a neural
checkpoint ranking benchmark (NeuCRaB) and study some intuitive ranking
measures. These measures are generic, applying to checkpoints of different
output types without knowing how, or on which dataset, the checkpoints were
pre-trained. They also incur low computation cost, making them practically
meaningful. Our results suggest that the linear separability of the features
extracted by the checkpoints is a strong indicator of transferability. We also
arrive at a new ranking measure, NLEEP, which gives rise to the best
performance in the experiments.
Comment: Accepted to CVPR 2021
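A minimal sketch of the linear-separability idea: score a checkpoint by how well a simple linear classifier fits the features it extracts. The least-squares probe below is a cheap illustrative stand-in, not the paper's measures or NLEEP:

```python
import numpy as np

def linear_probe_score(feats, labels):
    """Training accuracy of a least-squares linear classifier on the
    checkpoint's extracted features -- a crude linear-separability proxy."""
    feats = np.asarray(feats, float)
    labels = np.asarray(labels)
    Y = np.eye(labels.max() + 1)[labels]              # one-hot targets
    X = np.hstack([feats, np.ones((len(feats), 1))])  # append bias column
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    preds = (X @ W).argmax(axis=1)
    return float((preds == labels).mean())
```

A checkpoint whose features separate the classes scores near 1.0; one whose features collapse the classes scores near chance, giving a cheap ranking signal.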
Light effects on seedling growth in simulated forest canopy gaps vary across species from different successional stages
Tropical forests continue to suffer from various kinds of disturbances in the
Anthropocene. An immediate impact of disturbances on forest ecosystems is the
creation of numerous large and small canopy gaps, which dramatically affect forest structure and function. Yet, we know little about the effect of canopy gaps on forest successional trajectories. More specifically, the responses of seedlings from different successional stages to increased light intensity under large and small canopy gaps in the understory remain unclear. In this study, dominant tree seedlings from early-, mid-, and late-successional stages were selected from a tropical montane forest on Hainan Island, China, to study their growth rates, biomass, and traits. Our results showed that the light conditions under small canopy gaps (SG, 10–15% of full sunlight) and large canopy gaps (LG, 40–50% of full sunlight) induced greater increases in relative growth rate for seedlings from the early- and mid-successional stages than for those from the late-successional stage. Both SG and LG also significantly increased photosynthesis rate, leaf area (LA), light saturation point (LSP), root mass ratio (RMR), and root:shoot ratio, but decreased the specific leaf area (SLA) of seedlings across successional stages. Tree seedlings from the early-successional stage displayed the greatest decrease in leaf mass ratio and the greatest increases in LA, LSP, and RMR, in comparison with those from the mid- and late-successional stages. Light condition and SLA were the most important factors for seedlings' relative growth rate across successional stages. SLA mediated the interaction between light condition and successional stage on seedlings' growth, jointly explaining 93% of the variation in seedlings' growth together with the area-based light-saturated rate of CO2 assimilation. Our study highlights the distinct effects of disturbance-induced canopy gaps on seedling regeneration in the understory of tropical forests due to variation in light intensity.
We suspect that seedlings from the late-successional stage will recover relatively slowly after disturbances causing canopy losses, which can have detrimental impacts on structure feature an
Federated Learning of Shareable Bases for Personalization-Friendly Image Classification
Personalized federated learning (PFL) aims to harness the collective wisdom
of clients' data while building personalized models tailored to individual
clients' data distributions. Existing works offer personalization primarily to
clients who participate in the FL process, making it hard to accommodate new
clients who were absent or appear later. In this paper, we propose FedBasis, a
novel PFL framework to tackle this deficiency. FedBasis learns a small set of
shareable ``basis'' models, which can be linearly combined to form personalized
models for clients. Specifically for a new client, only a small set of
combination coefficients, not the model weights, needs to be learned. This
notion makes FedBasis more parameter-efficient, robust, and accurate than
competitive PFL baselines, especially in the low data regime, without
increasing the inference cost. To demonstrate the effectiveness and
applicability of FedBasis, we also present a more practical PFL testbed for
image classification, featuring larger data discrepancies across clients in
both the image and label spaces as well as more faithful training and test
splits.
Comment: Preprint
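The basis-combination idea can be sketched with toy linear models: clients share K basis weight vectors, and a new client fits only K mixing coefficients on its local data. The function names and the least-squares fit below are illustrative assumptions, not FedBasis itself:

```python
import numpy as np

def combine_bases(bases, coeffs):
    """Personalized model as a linear combination of shared bases
    (each 'model' here is just the weight vector of a linear regressor)."""
    return sum(c * w for c, w in zip(coeffs, bases))

def fit_coeffs(bases, X, y):
    """New client: freeze the bases and learn only the K mixing
    coefficients on local data. Predictions are linear in the
    coefficients, so least squares on per-basis outputs suffices here."""
    A = np.stack([np.asarray(X, float) @ w for w in bases], axis=1)  # (N, K)
    c, *_ = np.linalg.lstsq(A, y, rcond=None)
    return c
```

This is why a late-arriving client is cheap to personalize: only K numbers are learned, while the shared basis weights stay fixed.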