
    What's in a Name? Beyond Class Indices for Image Recognition

    Existing machine learning models demonstrate excellent performance in image object recognition after training on a large-scale dataset under full supervision. However, these models only learn to map an image to a predefined class index, without revealing the actual semantic meaning of the object in the image. In contrast, vision-language models like CLIP are able to assign semantic class names to unseen objects in a "zero-shot" manner, although they still rely on a predefined set of candidate names at test time. In this paper, we reconsider the recognition problem and task a vision-language model with assigning class names to images given only a large and essentially unconstrained vocabulary of categories as prior information. We use non-parametric methods to establish relationships between images, which allow the model to automatically narrow down the set of possible candidate names. Specifically, we propose iteratively clustering the data and voting on class names within the clusters, showing that this enables a roughly 50% improvement over the baseline on ImageNet. Furthermore, we tackle this problem in both unsupervised and partially supervised settings, as well as with both a coarse-grained and a fine-grained search space as the unconstrained dictionary.
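
    The cluster-then-vote procedure lends itself to a compact illustration. Below is a minimal sketch of the idea, not the authors' implementation: it assumes a CLIP-style text encoder is available through the hypothetical `embed_texts` callable and that image embeddings are already L2-normalized.

        # Hedged sketch of "cluster, then vote on names" over an unconstrained
        # vocabulary. `embed_texts` is an assumed stand-in for a CLIP-style
        # text encoder returning L2-normalized embeddings.
        import numpy as np
        from sklearn.cluster import KMeans

        def cluster_and_vote(image_embs, vocab, embed_texts, k=100, top_m=5):
            """Cluster images, then let each cluster vote on a class name."""
            text_embs = embed_texts(vocab)                    # (V, d)
            labels = KMeans(n_clusters=k, n_init="auto").fit_predict(image_embs)
            names = {}
            for c in range(k):
                sims = image_embs[labels == c] @ text_embs.T  # cosine similarities
                votes = np.zeros(len(vocab), dtype=int)
                for row in sims:                              # each image casts top-m votes
                    votes[np.argsort(row)[-top_m:]] += 1
                names[c] = vocab[int(votes.argmax())]         # plurality winner per cluster
            return labels, names

    In the paper's iterative variant, the winning names would presumably be fed back to restrict the candidate set before the next round; the sketch shows a single round.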

    Image Deblurring According to Facially Recognized Locations Within the Image

    This publication describes techniques for image deblurring according to facially recognized locations within the image. An algorithm may use facial detection and recognition to selectively sharpen faces within an image and the surrounding area associated with each detection. In one or more aspects, this selective sharpening reduces the computational load of image processing, improving overall computer function, power consumption, and user experience. Individual faces within the image may be cropped or thumbnailed, providing portions of the image that include the faces. Counterpart images associated with the individual faces may be found within a database holding a repository of sharp features associated with those counterpart images. The sharp features may then be integrated with the blurred faces of the original image to produce a sharpened output image.
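
    The selective-sharpening aspect is easy to demonstrate in isolation. The sketch below applies an unsharp mask only inside detected face regions using OpenCV; it illustrates just that step and omits the publication's database lookup of sharp counterpart features.

        # Minimal sketch: sharpen only facially detected regions of an image.
        import cv2

        def sharpen_faces(image_bgr, strength=1.5):
            cascade = cv2.CascadeClassifier(
                cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
            gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
            out = image_bgr.copy()
            for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
                roi = out[y:y+h, x:x+w]
                blurred = cv2.GaussianBlur(roi, (0, 0), sigmaX=3)
                # Unsharp mask: original + strength * (original - blurred)
                out[y:y+h, x:x+w] = cv2.addWeighted(
                    roi, 1 + strength, blurred, -strength, 0)
            return out

    Restricting the costly enhancement to face bounding boxes is what yields the computational savings the publication describes.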

    Towards Authentic Face Restoration with Iterative Diffusion Models and Beyond

    An authentic face restoration system is increasingly in demand in many computer vision applications, e.g., image enhancement, video communication, and portrait photography. Most advanced face restoration models can recover high-quality faces from low-quality ones but usually fail to faithfully generate the realistic, high-frequency details that users favor. To achieve authentic restoration, we propose IDM, an Iteratively learned face restoration system based on denoising Diffusion Models (DDMs). We define the criterion of an authentic face restoration system and argue that denoising diffusion models are naturally endowed with this property from two aspects: intrinsic iterative refinement and extrinsic iterative enhancement. Intrinsic learning preserves the content well and gradually refines the high-quality details, while extrinsic enhancement helps clean the data and pushes the restoration task one step further. We demonstrate superior performance on blind face restoration tasks. Beyond restoration, we find that the data authentically cleaned by the proposed restoration system also helps image generation tasks in terms of training stabilization and sample quality. Without modifying the models, we achieve better quality than the state of the art on FFHQ and ImageNet generation using either GANs or diffusion models. Comment: ICCV 202
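
    The two iteration levels can be sketched schematically. Everything below is hypothetical pseudocode in Python form, since the abstract specifies no interface: `ddm` stands for some denoising diffusion model with a conditional reverse process.

        # Intrinsic iteration: reverse diffusion conditioned on the LQ input.
        def restore(ddm, lq_image, num_steps=50):
            x = ddm.sample_prior(lq_image.shape)                # start from noise
            for t in reversed(range(num_steps)):
                x = ddm.denoise_step(x, t, condition=lq_image)  # gradual refinement
            return x

        # Extrinsic iteration: restore the training data, then retrain on it.
        def iterative_enhancement(ddm, dataset, rounds=3):
            for _ in range(rounds):
                dataset = [restore(ddm, img) for img in dataset]
                ddm = ddm.finetune(dataset)                     # hypothetical retraining step
            return ddm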

    Boosting Image-based Mutual Gaze Detection using Pseudo 3D Gaze

    Mutual gaze detection, i.e., predicting whether or not two people are looking at each other, plays an important role in understanding human interactions. In this work, we focus on the task of image-based mutual gaze detection and propose a simple and effective approach that boosts performance by using an auxiliary 3D gaze estimation task during the training phase. We achieve the performance boost without additional labeling cost by training the 3D gaze estimation branch on pseudo 3D gaze labels deduced from the mutual gaze labels. By sharing the head image encoder between the 3D gaze estimation and mutual gaze detection branches, we obtain better head features than those learned by training the mutual gaze detection branch alone. Experimental results on three image datasets show that the proposed approach improves detection performance significantly without additional annotations. This work also introduces a new image dataset consisting of 33.1K pairs of humans annotated with mutual gaze labels across 29.2K images.
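
    The shared-encoder design can be sketched as a small two-branch network. The layers and sizes below are illustrative assumptions, not the paper's architecture; the point is that both heads consume features from one encoder.

        import torch
        import torch.nn as nn

        class MutualGazeNet(nn.Module):
            """One head-image encoder feeding two branches (sketch only)."""
            def __init__(self, feat_dim=512):
                super().__init__()
                self.encoder = nn.Sequential(
                    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim))
                self.mutual_head = nn.Linear(2 * feat_dim, 1)  # pair -> mutual gaze logit
                self.gaze_head = nn.Linear(feat_dim, 3)        # per-head 3D gaze vector

            def forward(self, head_a, head_b):
                fa, fb = self.encoder(head_a), self.encoder(head_b)
                mutual = self.mutual_head(torch.cat([fa, fb], dim=1))
                return mutual, self.gaze_head(fa), self.gaze_head(fb)

    Training would combine a binary cross-entropy loss on the mutual gaze labels with a regression loss on the pseudo 3D gaze labels, so the auxiliary branch shapes the shared encoder at no extra labeling cost.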

    Ranking Neural Checkpoints

    This paper is concerned with ranking many pre-trained deep neural networks (DNNs), called checkpoints, for transfer learning to a downstream task. Thanks to the broad use of DNNs, we may easily collect hundreds of checkpoints from various sources. Which of them transfers best to our downstream task of interest? Striving to answer this question thoroughly, we establish a neural checkpoint ranking benchmark (NeuCRaB) and study some intuitive ranking measures. These measures are generic, applying to checkpoints of different output types without knowing how or on which dataset the checkpoints were pre-trained. They also incur low computation cost, making them practically meaningful. Our results suggest that the linear separability of the features extracted by a checkpoint is a strong indicator of its transferability. We also arrive at a new ranking measure, NLEEP, which gives rise to the best performance in our experiments. Comment: Accepted to CVPR 202
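
    A generic linear-separability probe in this spirit is straightforward; the sketch below is such a probe, not the paper's NLEEP measure.

        # Rank checkpoints by how linearly separable their frozen features are.
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        def linear_probe_score(features, labels):
            """Cross-validated accuracy of a linear classifier on frozen features."""
            clf = LogisticRegression(max_iter=1000)
            return cross_val_score(clf, features, labels, cv=3).mean()

        # Usage sketch: extract features from each checkpoint on the downstream
        # data (extract() is hypothetical), then sort by the probe score.
        # scores = {name: linear_probe_score(extract(ckpt, X), y)
        #           for name, ckpt in checkpoints.items()}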

    Light effects on seedling growth in simulated forest canopy gaps vary across species from different successional stages

    Tropical forests continue to suffer from various kinds of disturbance in the Anthropocene. An immediate impact of disturbance on forest ecosystems is the creation of numerous large and small canopy gaps, which dramatically affect forest structure and function. Yet we know little about the effect of canopy gaps on forest successional trajectories. More specifically, the responses of seedlings from different successional stages to the increased light intensity under large and small canopy gaps in the understory remain unclear. In this study, dominant tree seedlings from early-, mid-, and late-successional stages were selected from a tropical montane forest on Hainan Island, China, to study their growth rates, biomass, and traits. Our results showed that the light conditions under small canopy gaps (SG, 10–15% of full sunlight) and large canopy gaps (LG, 40–50% of full sunlight) induced greater increases in relative growth rate for seedlings from the early- and mid-successional stages than for those from the late-successional stage. Both SG and LG also significantly increased photosynthesis rate, leaf area (LA), light saturation point (LSP), root mass ratio (RMR), and root:shoot ratio, but decreased the specific leaf area (SLA) of seedlings across successional stages. Tree seedlings from the early-successional stage displayed the greatest decrease in leaf mass ratio and the greatest increases in LA, LSP, and RMR, in comparison to those from the mid- and late-successional stages. Light condition and SLA were the most important factors for seedlings' relative growth rate across successional stages. SLA mediated the interaction between light condition and successional stage on seedling growth; together with the area-based light-saturated rate of CO2 assimilation, it jointly explained 93% of the variation in seedling growth. Our study highlights the distinct effect of disturbance-induced canopy gaps on seedling regeneration in the tropical forest understory due to variation in light intensity. We suspect that seedlings from the late-successional stage will recover relatively slowly after disturbances causing canopy losses, which can have detrimental impacts on structure feature an
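
    For reference, the growth and leaf metrics named above follow standard ecological definitions; the formulas below are the conventional ones and are assumed here rather than given in the abstract. With seedling dry masses $W_1$ and $W_2$ measured at times $t_1$ and $t_2$:

        $$\mathrm{RGR} = \frac{\ln W_2 - \ln W_1}{t_2 - t_1}, \qquad \mathrm{SLA} = \frac{A_{\text{leaf}}}{M_{\text{leaf}}}$$

    so a lower SLA indicates thicker or denser leaves, consistent with the shade-to-sun shift that canopy gaps induce.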

    Federated Learning of Shareable Bases for Personalization-Friendly Image Classification

    Personalized federated learning (PFL) aims to harness the collective wisdom of clients' data while building personalized models tailored to individual clients' data distributions. Existing works offer personalization primarily to clients who participate in the FL process, making it hard to encompass new clients who were absent or newly show up. In this paper, we propose FedBasis, a novel PFL framework that tackles this deficiency. FedBasis learns a small set of shareable "basis" models, which can be linearly combined to form personalized models for clients. Specifically, for a new client, only a small set of combination coefficients, not the model weights, needs to be learned. This makes FedBasis more parameter-efficient, robust, and accurate than competitive PFL baselines, especially in the low-data regime, without increasing the inference cost. To demonstrate the effectiveness and applicability of FedBasis, we also present a more practical PFL testbed for image classification, featuring larger data discrepancies across clients in both the image and label spaces as well as more faithful training and test splits. Comment: Preprin
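
    The basis-combination step is simple enough to sketch. The code below blends K basis models' weights with learned softmax coefficients; names and structure are illustrative assumptions, not the authors' released code.

        import torch

        def personalize(basis_state_dicts, coeff_logits):
            """Blend K basis models into one personalized state dict."""
            coeffs = torch.softmax(coeff_logits, dim=0)  # K combination weights
            return {key: sum(c * sd[key] for c, sd in zip(coeffs, basis_state_dicts))
                    for key in basis_state_dicts[0]}

    A new client would freeze the shared bases and fit only `coeff_logits` (K scalars) on its local data, then load the blended state dict for inference, which is what makes onboarding absent or newly arriving clients cheap.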