OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data
The inexorable growth of online shopping and e-commerce demands scalable and
robust machine learning-based solutions to accommodate customer requirements.
In the context of automatic tagging, classification, and multimodal retrieval,
prior works either defined supervised learning approaches with limited
generalization or more reusable CLIP-based techniques that were, however,
trained on closed-source data. In this work, we propose OpenFashionCLIP, a
vision-and-language contrastive learning method that adopts only open-source
fashion data stemming from diverse domains and characterized by varying
degrees of specificity. Our
approach is extensively validated across several tasks and benchmarks, and
experimental results highlight a significant out-of-domain generalization
capability and consistent improvements over state-of-the-art methods both in
terms of accuracy and recall. Source code and trained models are publicly
available at: https://github.com/aimagelab/open-fashion-clip
Comment: International Conference on Image Analysis and Processing (ICIAP) 202
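The CLIP-style objective underlying this line of work can be sketched in a few lines. The function below is a generic symmetric InfoNCE loss over paired image/text embeddings, not OpenFashionCLIP's exact implementation; the temperature value and embedding shapes are illustrative assumptions.

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    Generic CLIP-style sketch: matching pairs sit on the diagonal of the
    similarity matrix; cross-entropy is taken in both directions.
    """
    # L2-normalize both modalities so dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # (B, B) similarity matrix
    labels = np.arange(len(logits))         # matching pairs on the diagonal

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)                 # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()             # diagonal targets

    # Image-to-text and text-to-image directions, averaged.
    return 0.5 * (xent(logits) + xent(logits.T))
```

With perfectly matched pairs and a low temperature the loss approaches zero, which is a quick sanity check when wiring up training.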
Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing
Fashion illustration is used by designers to communicate their vision and to
bring the design idea from conceptualization to realization, showing how
clothes interact with the human body. In this context, computer vision can thus
be used to improve the fashion design process. Unlike previous works
that mainly focused on the virtual try-on of garments, we propose the task of
multimodal-conditioned fashion image editing, guiding the generation of
human-centric fashion images by following multimodal prompts, such as text,
human body poses, and garment sketches. We tackle this problem by proposing a
new architecture based on latent diffusion models, an approach that has not
been used before in the fashion domain. Given the lack of existing datasets
suitable for the task, we also extend two existing fashion datasets, namely
Dress Code and VITON-HD, with multimodal annotations collected in a
semi-automatic manner. Experimental results on these new datasets demonstrate
the effectiveness of our proposal, both in terms of realism and coherence with
the given multimodal inputs. Source code and collected multimodal annotations
will be publicly released at:
https://github.com/aimagelab/multimodal-garment-designer
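The idea of guiding a denoiser with spatial multimodal prompts can be illustrated schematically. The helper below only shows the common recipe of concatenating pose, sketch, and mask maps to the noisy latent along the channel axis; the actual inputs, channel layout, and architecture of Multimodal Garment Designer are assumptions here, not taken from the paper.

```python
import numpy as np

def build_unet_input(latent, pose_map, sketch, mask):
    """Stack spatial conditioning with the noisy latent along channels.

    Schematic of spatially-conditioned latent diffusion: all conditioning
    maps must share the latent's spatial resolution, then everything is
    concatenated channels-first into one denoiser input.
    """
    for name, arr in {"pose": pose_map, "sketch": sketch, "mask": mask}.items():
        if arr.shape[-2:] != latent.shape[-2:]:
            raise ValueError(f"{name} map must match the latent's spatial size")
    # Channels-first result: (C_latent + C_pose + C_sketch + C_mask, H, W)
    return np.concatenate([latent, pose_map, sketch, mask], axis=0)
```

In practice the first convolution of the denoiser is widened to accept the extra channels; this sketch only covers assembling the input tensor.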
Spectral signatures from super-Earths, warm and hot-Neptunes
ESA's and NASA's planet characterization missions will allow us to explore the diversity
of planets around stars of different spectral types, and will expand the existing field of
comparative planetology beyond our Solar System. In particular, terrestrial planets
greater than one Earth mass are not represented in our Solar System, but may occur in
others (Beaulieu et al., 2006; Rivera et al., 2005). The next generation of space telescopes,
such as the James Webb Space Telescope (2013), will have the capability of acquiring
transmission and emission spectra of these extrasolar worlds in the infrared. Further into
the future, the direct imaging of exoplanets, both in the optical and infrared, will extend
our understanding to extrasolar bodies orbiting a few astronomical units from their parent
star and expand our knowledge to smaller-size objects.
LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On
The rapidly evolving fields of e-commerce and the metaverse continue to seek
innovative approaches to enhance the consumer experience. At the same time,
recent advancements in the development of diffusion models have enabled
generative networks to create remarkably realistic images. In this context,
image-based virtual try-on, which consists of generating a novel image of a
target model wearing a given in-shop garment, has yet to capitalize on the
potential of these powerful generative solutions. This work introduces
LaDI-VTON, the first Latent Diffusion textual Inversion-enhanced model for the
Virtual Try-ON task. The proposed architecture relies on a latent diffusion
model extended with a novel additional autoencoder module that exploits
learnable skip connections to enhance the generation process while preserving the
model's characteristics. To effectively maintain the texture and details of the
in-shop garment, we propose a textual inversion component that can map the
visual features of the garment to the CLIP token embedding space and thus
generate a set of pseudo-word token embeddings capable of conditioning the
generation process. Experimental results on Dress Code and VITON-HD datasets
demonstrate that our approach outperforms the competitors by a consistent
margin, achieving a significant milestone for the task. Source code and trained
models will be publicly released at: https://github.com/miccunifi/ladi-vton
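The textual-inversion component described above can be sketched as a projection from garment image features into the token-embedding space, followed by splicing the resulting pseudo-word embeddings into the prompt sequence. The shapes, the number of pseudo-tokens, and the linear `proj` standing in for the trained mapping network are all illustrative assumptions, not LaDI-VTON's actual implementation.

```python
import numpy as np

def visual_to_pseudo_tokens(garment_feats, proj, n_tokens=4):
    """Map garment visual features to pseudo-word token embeddings.

    `proj` is a stand-in for the learned mapping network: it turns a
    (d_img,) feature vector into `n_tokens` vectors living in the text
    encoder's token-embedding space.
    """
    flat = garment_feats @ proj        # (d_img,) @ (d_img, n_tokens * d_token)
    return flat.reshape(n_tokens, -1)  # (n_tokens, d_token)

def splice_into_prompt(prompt_emb, pseudo_tokens, position):
    """Insert pseudo-word embeddings at `position` in the prompt sequence,
    so they condition generation like ordinary word embeddings."""
    return np.concatenate(
        [prompt_emb[:position], pseudo_tokens, prompt_emb[position:]], axis=0
    )
```

The spliced sequence is then fed to the text encoder's downstream layers (or directly to cross-attention, depending on the design), letting the garment's visual identity condition the diffusion process.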
Mössbauer spectroscopy of a monolayer of single-molecule magnets
The use of single-molecule magnets (SMMs) as cornerstone elements in spintronics and quantum computing applications demands that magnetic bistability is retained when molecules are interfaced with solid conducting surfaces. Here, we employ synchrotron Mössbauer spectroscopy to investigate a monolayer of a tetrairon(III) (Fe4) SMM chemically grafted on a gold substrate. At low temperature and zero magnetic field, we observe the magnetic pattern of the Fe4 molecule, indicating slow spin fluctuations compared to the Mössbauer timescale. Significant structural deformations of the magnetic core, induced by the interaction with the substrate, as predicted by ab initio molecular dynamics, are also observed. However, the effects of the modifications occurring at the individual iron sites partially compensate each other, so that slow magnetic relaxation is retained on the surface. Interestingly, these deformations escaped detection by conventional synchrotron-based techniques, like X-ray magnetic circular dichroism, thus highlighting the power of synchrotron Mössbauer spectroscopy for the investigation of hybrid interfaces.
High-contrast differential image processing for the detection of extrasolar planets
This thesis is dedicated to ground-based exoplanet detection and has been developed in the context of the SPHERE project of ESO; the installation of this instrument is planned for 2011 on the VLT. To directly detect exoplanets, SPHERE will combine an extreme adaptive optics system, a coronagraph, and imagery at two or more wavelengths. To distinguish between the planet signal and the speckles due to the aberrations, we need to elaborate a sophisticated data-processing method that also exploits the temporal information. I developed a method, called ANDROMEDA, which allows us to detect exoplanets and to estimate their flux. It uses image differences that aim at eliminating the effects of unknown quasi-static aberrations and of the variability of the quality of the adaptive optics correction. The algorithm couples images taken at different instants that present a field rotation between them, and possibly images taken at two different wavelengths. I showed that ANDROMEDA meets the requirements of SPHERE for the detection of planets at a contrast of 10^6 at a separation of 0.5" from the star, and it can even do better, reaching a separation of 0.2" for the same contrast. Then I carried out a parametric study of the method to analyze the robustness of the flux estimation with respect to different sources of error. Finally, I tested ANDROMEDA on experimental data provided by the NACO instrument; this analysis allowed us to understand how to deal with typical problems posed by data reduction and instrumental artifacts.
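The core differential-imaging idea — speckles are quasi-static while the planet rotates with the field — can be illustrated with a toy angular-differential step. This is a generic median-subtraction sketch, not the ANDROMEDA algorithm itself (which couples pairs of rotated images and estimates the planet flux); rotations are restricted to multiples of 90 degrees so the example needs only numpy.

```python
import numpy as np

def adi_residual(cube, n_rot90):
    """Toy angular-differential-imaging step.

    Subtract the temporal median of the frame cube (an estimate of the
    quasi-static speckle pattern), then derotate each residual frame so
    the planet signal stacks up while speckle residuals average out.
    `n_rot90[k]` is frame k's field rotation in quarter turns.
    """
    median = np.median(cube, axis=0)       # static speckle estimate
    residuals = cube - median              # planet remains, speckles cancel
    derotated = [np.rot90(frame, -k) for frame, k in zip(residuals, n_rot90)]
    return np.mean(derotated, axis=0)      # co-added detection map
```

In a realistic pipeline the frames are derotated by their measured parallactic angles with interpolation, and the detection step is statistical rather than a simple mean, but the geometry of the trick is the same.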