117 research outputs found

    ClipCrop: Conditioned Cropping Driven by Vision-Language Model

    Image cropping has progressed tremendously under the data-driven paradigm. However, current approaches do not account for the user's intentions, which is an issue especially when the composition of the input image is complex. Moreover, labeling cropping data is costly, so the amount of available data is limited, leading to poor generalization of current algorithms in the wild. In this work, we take advantage of vision-language models as a foundation for building robust, intention-aware cropping algorithms. By adapting a transformer decoder to a pre-trained CLIP-based detection model, OWL-ViT, we develop a method that performs cropping guided by a text or image query reflecting the user's intention. In addition, our pipeline design allows the model to learn text-conditioned aesthetic cropping from a small cropping dataset while inheriting the open-vocabulary ability acquired from millions of text-image pairs. We validate our model through extensive experiments on existing datasets as well as a new cropping test set we compiled that is characterized by content ambiguity.
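    The query-guided selection idea can be illustrated with a toy scoring step. This is a hedged sketch, not ClipCrop's actual pipeline: the real method decodes crops with a transformer on OWL-ViT features, whereas here `Crop`, `select_crop`, and the weight `alpha` are hypothetical names, and the query-similarity and aesthetic scores are mocked.

```python
from dataclasses import dataclass

@dataclass
class Crop:
    box: tuple          # (x0, y0, x1, y1) in pixels
    query_sim: float    # mocked similarity between crop content and the text/image query
    aesthetic: float    # mocked aesthetic score for the crop

def select_crop(candidates, alpha=0.6):
    """Pick the crop that best balances query relevance and aesthetics.

    alpha weights query similarity against aesthetic quality; both
    scores are assumed normalized to [0, 1].
    """
    return max(candidates, key=lambda c: alpha * c.query_sim + (1 - alpha) * c.aesthetic)

candidates = [
    Crop((0, 0, 200, 200), query_sim=0.9, aesthetic=0.4),    # on-query but poorly framed
    Crop((50, 30, 260, 240), query_sim=0.8, aesthetic=0.8),  # on-query and well framed
    Crop((300, 0, 500, 200), query_sim=0.2, aesthetic=0.9),  # pretty but off-query
]
best = select_crop(candidates)  # the second candidate wins the combined score
```

    With a query-only weighting (`alpha=1.0`) the first candidate would win instead, which is exactly the user-intention vs. aesthetics trade-off the abstract describes.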

    Human Performance Modeling and Rendering via Neural Animated Mesh

    Recent years have seen tremendous progress in neural methods for photo-real human modeling and rendering. However, it remains challenging to integrate them into an existing mesh-based pipeline for downstream applications. In this paper, we present a comprehensive neural approach for high-quality reconstruction, compression, and rendering of human performances from dense multi-view videos. Our core intuition is to bridge the traditional animated-mesh workflow with a new class of highly efficient neural techniques. We first introduce a neural surface reconstructor for high-quality surface generation in minutes. It marries implicit volumetric rendering of a truncated signed distance field (TSDF) with multi-resolution hash encoding. We further propose a hybrid neural tracker to generate animated meshes, which combines explicit non-rigid tracking with implicit dynamic deformation in a self-supervised framework. The former provides coarse warping back into the canonical space, while the latter predicts residual displacements using the same 4D hash encoding as our reconstructor. We then discuss rendering schemes using the obtained animated meshes, ranging from dynamic texturing to lumigraph rendering under various bandwidth settings. To strike a balance between quality and bandwidth, we propose a hierarchical solution: we first render six virtual views covering the performer and then conduct occlusion-aware neural texture blending. We demonstrate the efficacy of our approach in a variety of mesh-based applications and photo-realistic free-view experiences on various platforms, e.g., inserting virtual human performances into real environments through mobile AR or immersively watching talent shows with VR headsets.
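    The TSDF at the heart of the reconstructor can be written down in a few lines. This is a generic sketch of the standard truncated-signed-distance definition, not the paper's hash-encoded implementation; the function name and truncation value are illustrative.

```python
def tsdf(signed_distance, trunc=4.0):
    """Truncated signed distance: scale by the truncation band, clamp to [-1, 1].

    Values near 0 lie on the surface; +1/-1 mark points far outside/inside
    the truncation band, where the exact distance no longer matters.
    """
    return max(-1.0, min(1.0, signed_distance / trunc))

print(tsdf(1.0))   # 0.25: one unit in front of the surface
print(tsdf(8.0))   # 1.0: clamped, far outside the band
print(tsdf(-8.0))  # -1.0: clamped, far behind the surface
```

    Clamping is what makes the field cheap to fuse and render volumetrically: only a thin band around the surface carries gradient information.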

    3D snapshot: Invertible embedding of 3D neural representations in a single image

    3D neural rendering enables photo-realistic reconstruction of a specific scene by encoding discontinuous inputs into a neural representation. Despite the remarkable rendering results, storing the network parameters is not transmission-friendly and does not extend to metaverse applications. In this paper, we propose an invertible neural rendering approach that enables generating an interactive 3D model from a single image (i.e., a 3D snapshot). Our idea is to distill a pre-trained neural rendering model (e.g., NeRF) into a visualizable image form that can then be easily inverted back into a neural network. To this end, we first present a neural image distillation method that optimizes three neural planes to represent the original neural rendering model. However, this representation is noisy and visually meaningless. We thus propose a dynamic invertible neural network that embeds this noisy representation into a plausible image of the scene. We demonstrate promising reconstruction quality, quantitatively and qualitatively, compared to the original neural rendering model as well as to video-based invertible methods. Moreover, our method can store dozens of NeRFs with a compact restoration network (5 MB), and embedding each 3D scene takes up only 160 KB of storage. More importantly, our approach is the first to embed a neural rendering model into an image representation, enabling applications like creating an interactive 3D model from a printed image in the metaverse.
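    The storage figures in the abstract (a shared 5 MB restoration network plus 160 KB per embedded scene) imply a strongly amortized cost, which a quick sketch makes concrete; the function name and the 50-scene count are illustrative, not from the paper.

```python
def snapshot_storage_kb(num_scenes, net_kb=5 * 1024, per_scene_kb=160):
    """Total storage: one shared restoration network plus one embedded image per scene."""
    return net_kb + num_scenes * per_scene_kb

total = snapshot_storage_kb(50)
print(total)       # 13120 KB for 50 scenes (about 12.8 MB)
print(total / 50)  # 262.4 KB amortized per scene
```

    The shared network dominates for small collections, but per-scene cost converges to 160 KB as the library grows, which is what makes the representation transmission-friendly.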

    Gradient differences of immunotherapy efficacy in metastatic melanoma related to sunlight exposure pattern: A population-based study

    Background: Immune checkpoint inhibitors (ICIs) have revolutionized metastatic melanoma (MM) treatment in just a few years. Ultraviolet (UV) radiation in sunlight is the most significant environmental cause of melanoma and is considered the main driver of the increased tumor mutation burden (TMB) in melanoma. High TMB usually predicts that PD-1 inhibitors will be effective, so the sunlight exposure pattern of MM might be a clinical feature that tracks TMB; however, the relationship between sunlight exposure patterns and immunotherapy response in MM is unclear. This study aims to investigate that correlation and to establish nomograms that predict 3- and 5-year overall survival (OS) rates.
    Methods: We searched the Surveillance, Epidemiology, and End Results (SEER) database and enrolled MM cases from 2005 to 2016. Based on the advent of ICIs in 2011, the period was divided into a non-ICIs era (2005-2010) and an ICIs era (2011-2016). Patients were divided into three cohorts according to the sunlight exposure pattern of the primary site: head and neck in the first cohort; trunk, arms, and legs in the second; and acral sites in the third. We compared survival differences for each cohort between the two eras, performed stratified analyses, established nomograms for predicting 3- and 5-year OS rates, and performed internal validation.
    Results: Comparing survival between the ICIs and non-ICIs eras, head and neck melanoma showed the greatest improvement, with 3- and 5-year OS rates increasing by 10.2% and 9.1%, respectively (P=0.00011). In trunk, arm, and leg melanoma, the 3- and 5-year OS rates increased by 4.6% and 3.9%, respectively (P<0.0001). There was no survival improvement in acral melanoma (AM) between the two eras (P=0.78). The receiver operating characteristic (ROC) curves, areas under the ROC curve (AUC), and calibration plots show good discrimination and accuracy of the nomograms, and decision curve analysis (DCA) suggests good clinical utility.
    Conclusions: Under this classification of sunlight exposure patterns, there is a gradient difference in immunotherapy efficacy for MM: the degree of sunlight exposure is positively correlated with immunotherapy response. The nomograms are sufficiently accurate to predict 3- and 5-year OS rates for MM, supporting individualized clinical decisions in future clinical work.