
    Learning to Substitute Ingredients in Recipes

    Recipe personalization through ingredient substitution has the potential to help people meet their dietary needs and preferences, avoid potential allergens, and ease culinary exploration in everyone's kitchen. To address ingredient substitution, we build a benchmark composed of a dataset of substitution pairs with standardized splits, evaluation metrics, and baselines. We further introduce the Graph-based Ingredient Substitution Module (GISMo), a novel model that leverages the context of a recipe as well as generic ingredient relational information encoded within a graph to rank plausible substitutions. We show through comprehensive experimental validation that GISMo surpasses the best-performing baseline by a large margin in terms of mean reciprocal rank. Finally, we highlight the benefits of GISMo by integrating it into an improved image-to-recipe generation pipeline, enabling recipe personalization through user intervention. Quantitative and qualitative results show the efficacy of our proposed system, paving the way towards truly personalized cooking and tasting experiences.
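    The mean reciprocal rank metric used to compare GISMo against the baselines can be illustrated with a short sketch. The function and the toy ingredient lists below are illustrative only and are not taken from the benchmark itself.

```python
def mean_reciprocal_rank(ranked_candidates, ground_truth):
    """Mean reciprocal rank over a set of queries.

    ranked_candidates: list of ranked substitute lists, one per query (best first).
    ground_truth: list with the correct substitute for each query.
    """
    total = 0.0
    for candidates, target in zip(ranked_candidates, ground_truth):
        if target in candidates:
            rank = candidates.index(target) + 1  # ranks are 1-based
            total += 1.0 / rank
        # queries whose target never appears in the ranking contribute 0
    return total / len(ground_truth)


# Toy usage: two queries whose correct substitutes are ranked 1st and 3rd.
predictions = [["margarine", "olive oil", "applesauce"],
               ["honey", "agave", "maple syrup"]]
targets = ["margarine", "maple syrup"]
print(mean_reciprocal_rank(predictions, targets))  # (1/1 + 1/3) / 2 ≈ 0.667
```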

    Sistema de reconeixement de cares basat en múltiples vistes (Face recognition system based on multiple views)

    In this project, we propose a skin detection algorithm that introduces a neighborhood system when classifying pixels. Starting from an invariant color space learned from multiple views, we add the influence of the neighborhood using Markov random fields. The experiments show that including a neighborhood system in the pixel classification process significantly improves the detection results.
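    The neighborhood term can be made concrete with a minimal sketch, assuming a Potts-style Markov random field optimized with iterated conditional modes (ICM) on top of the per-pixel skin probabilities. This is a generic illustration, not the project's exact formulation, and the function and parameter names are assumptions.

```python
import numpy as np

def smooth_skin_map(unary_skin_prob, beta=1.0, n_iters=5):
    """Refine per-pixel skin probabilities with a Potts-style MRF,
    optimized by iterated conditional modes (ICM).

    unary_skin_prob: HxW array of skin probabilities from the color-based
        classifier. beta controls the strength of the neighborhood term.
    """
    eps = 1e-6
    # Unary costs for the two labels (0 = non-skin, 1 = skin).
    cost = np.stack([-np.log(1 - unary_skin_prob + eps),
                     -np.log(unary_skin_prob + eps)], axis=-1)
    labels = (unary_skin_prob > 0.5).astype(int)

    for _ in range(n_iters):
        # Count how many of the 4 neighbors are currently labeled skin.
        padded = np.pad(labels, 1, mode="edge")
        neigh_skin = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                      padded[1:-1, :-2] + padded[1:-1, 2:])
        # Penalty for each label = number of neighbors that disagree with it.
        disagree = np.stack([neigh_skin, 4 - neigh_skin], axis=-1)
        labels = np.argmin(cost + beta * disagree, axis=-1)
    return labels

# Toy usage on a random probability map.
rng = np.random.default_rng(0)
print(smooth_skin_map(rng.random((4, 4)), beta=0.8))
```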

    DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity

    The unprecedented photorealistic results achieved by recent text-to-image generative systems and their increasing use as plug-and-play content creation solutions make it crucial to understand their potential biases. In this work, we introduce three indicators to evaluate the realism, diversity, and prompt-generation consistency of text-to-image generative systems when prompted to generate objects from across the world. Our indicators complement qualitative analysis of the broader impact of such systems by enabling automatic and efficient benchmarking of geographic disparities, an important step towards building responsible visual content creation systems. We use our proposed indicators to analyze potential geographic biases in state-of-the-art visual content creation systems and find that: (1) generations are less realistic and less diverse when models are prompted for Africa and West Asia than for Europe, (2) prompting with geographic information comes at a cost to the prompt consistency and diversity of generated images, and (3) models exhibit more region-level disparities for some objects than for others. Perhaps most interestingly, our indicators suggest that progress in image generation quality has come at the cost of real-world geographic representation. Our comprehensive evaluation constitutes a crucial step towards ensuring a positive experience of visual content creation for everyone.
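    As a rough illustration of this kind of region-level benchmarking, the sketch below computes a toy per-region diversity proxy (one minus the mean pairwise cosine similarity between image features, grouped by region). It is only a stand-in under that assumption, not one of the indicators proposed in the paper, and all names are hypothetical.

```python
import numpy as np

def per_region_diversity(features_by_region):
    """Toy diversity proxy per region: one minus the mean pairwise cosine
    similarity between feature vectors of images generated for that region.

    features_by_region: dict mapping region name -> (N, D) array of image
    features (e.g. from a pretrained encoder).
    """
    scores = {}
    for region, feats in features_by_region.items():
        f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        sim = f @ f.T                                  # pairwise cosine similarities
        off_diag = sim[~np.eye(len(f), dtype=bool)]    # ignore self-similarity
        scores[region] = float(1.0 - off_diag.mean())
    return scores

# Comparing the scores across regions surfaces disparities such as lower
# diversity of generations for one region than another.
rng = np.random.default_rng(0)
print(per_region_diversity({"Europe": rng.normal(size=(8, 16)),
                            "Africa": rng.normal(size=(8, 16))}))
```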

    Controllable Image Generation via Collage Representations

    Recent advances in conditional generative image models have enabled impressive results. On the one hand, text-based conditional models have achieved remarkable generation quality by leveraging large-scale datasets of image-text pairs. To enable fine-grained controllability, however, text-based models require long prompts, whose details may be ignored by the model. On the other hand, layout-based conditional models have also witnessed significant advances. These models rely on bounding boxes or segmentation maps for precise spatial conditioning, in combination with coarse semantic labels. The semantic labels, however, cannot be used to express detailed appearance characteristics. In this paper, we approach fine-grained scene controllability through image collages, which allow a rich visual description of the desired scene as well as the appearance and location of the objects therein, without the need for class or attribute labels. We introduce "mixing and matching scenes" (M&Ms), an approach that consists of an adversarially trained generative image model which is conditioned on appearance features and spatial positions of the different elements in a collage, and integrates these into a coherent image. We train our model on the OpenImages (OI) dataset and evaluate it on collages derived from the OI and MS-COCO datasets. Our experiments on the OI dataset show that M&Ms outperforms baselines in terms of fine-grained scene controllability while being very competitive in terms of image quality and sample diversity. On the MS-COCO dataset, we highlight the generalization ability of our model by outperforming DALL-E in terms of the zero-shot FID metric, despite using two orders of magnitude fewer parameters and data. Collage-based generative models have the potential to advance content creation in an efficient and effective way, as they are intuitive to use and yield high-quality generations.
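    A minimal sketch of what such a collage conditioning signal could look like, assuming each collage element is an image crop plus a normalized bounding box; the data structure and the `encode` feature extractor below are hypothetical, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple
import numpy as np

@dataclass
class CollageElement:
    crop: np.ndarray                          # RGB crop pasted into the collage
    box: Tuple[float, float, float, float]    # (x0, y0, x1, y1), normalized to [0, 1]

def collage_conditioning(elements: List[CollageElement],
                         encode: Callable[[np.ndarray], np.ndarray]):
    """Turn a collage into the (appearance, position) arrays a generator
    could be conditioned on. `encode` is any image-feature extractor."""
    appearances = np.stack([encode(e.crop) for e in elements])
    positions = np.array([e.box for e in elements])
    return appearances, positions

# Toy usage with a trivial "feature": the crop's mean color.
red_patch = np.zeros((32, 32, 3)); red_patch[..., 0] = 1.0
feats, boxes = collage_conditioning(
    [CollageElement(crop=red_patch, box=(0.1, 0.1, 0.4, 0.5))],
    encode=lambda c: c.mean(axis=(0, 1)))
print(feats.shape, boxes.shape)   # (1, 3) (1, 4)
```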

    Graph Inductive Biases in Transformers without Message Passing

    Transformers for graph data are increasingly widely studied and successful in numerous learning tasks. Graph inductive biases are crucial for Graph Transformers, and previous works incorporate them using message-passing modules and/or positional encodings. However, Graph Transformers that use message-passing inherit known issues of message-passing and differ significantly from Transformers used in other domains, thus making the transfer of research advances more difficult. On the other hand, Graph Transformers without message-passing often perform poorly on smaller datasets, where inductive biases are more crucial. To bridge this gap, we propose the Graph Inductive bias Transformer (GRIT) -- a new Graph Transformer that incorporates graph inductive biases without using message passing. GRIT is based on several architectural changes that are each theoretically and empirically justified, including: learned relative positional encodings initialized with random walk probabilities, a flexible attention mechanism that updates node and node-pair representations, and injection of degree information in each layer. We prove that GRIT is expressive -- it can express shortest path distances and various graph propagation matrices. GRIT achieves state-of-the-art empirical performance across a variety of graph datasets, thus showing the power that Graph Transformers without message-passing can deliver. Published as a conference paper at ICML 2023 (17 pages).
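    The random-walk probabilities mentioned above can be sketched as follows: for every node pair, stack the k-step transition probabilities and use them as initial relative positional features. The learned-encoding and attention components are omitted, and the function and variable names are illustrative rather than taken from the GRIT codebase.

```python
import numpy as np

def random_walk_probabilities(adj, k_max=4):
    """k-step random-walk probabilities for every node pair of a graph.

    adj: (N, N) adjacency matrix. Returns an (N, N, k_max) array whose
    [i, j, k] entry is the probability that a (k+1)-step random walk
    starting at node i ends at node j; such node-pair features can serve
    as the initialization of relative positional encodings.
    """
    deg = adj.sum(axis=1, keepdims=True)
    P = adj / np.maximum(deg, 1)          # one-step transition matrix
    powers, Pk = [], np.eye(len(adj))
    for _ in range(k_max):
        Pk = Pk @ P
        powers.append(Pk)
    return np.stack(powers, axis=-1)

# Toy graph: a path on three nodes.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
rw = random_walk_probabilities(adj, k_max=3)
print(rw[0, :, 0])   # one-step probabilities from node 0: [0. 1. 0.]
```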
