Multi-modal Embedding Fusion-based Recommender
Recommender systems have recently gained popularity worldwide, with primary
use cases in online interaction systems and a particular focus on e-commerce
platforms. We have developed a machine learning-based recommendation platform
that can be easily applied to almost any domain of items and/or actions. Contrary
to existing recommendation systems, our platform supports multiple types of
interaction data with multiple modalities of metadata natively. This is
achieved through multi-modal fusion of various data representations. We
deployed the platform into multiple e-commerce stores of different kinds, e.g.
food and beverages, shoes, fashion items, telecom operators. Here, we present
our system, its flexibility and performance. We also show benchmark results on
open datasets that significantly outperform state-of-the-art prior work. Comment: 7 pages, 8 figures
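The abstract describes multi-modal fusion of data representations only at a high level; the paper's actual architecture is not reproduced here. As a rough illustration of one common fusion scheme — weighted concatenation of per-modality embeddings, with every name, vector, and weight invented for the example — a sketch might look like:

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fuse_embeddings(modal_embeddings, weights):
    """Fuse per-modality embeddings by weighted concatenation.

    modal_embeddings: dict mapping modality name -> embedding vector
    weights: dict mapping modality name -> importance weight
    Each modality is normalized first so no single modality dominates
    purely because of its numeric scale.
    """
    fused = []
    for modality, emb in sorted(modal_embeddings.items()):
        w = weights.get(modality, 1.0)
        fused.extend(w * x for x in l2_normalize(emb))
    return l2_normalize(fused)

def cosine(u, v):
    # Both inputs are already unit-length, so the dot product is the cosine.
    return sum(a * b for a, b in zip(u, v))

# Hypothetical items with text, image, and interaction embeddings.
shared_weights = {"text": 1.0, "image": 0.8, "clicks": 1.2}
item_a = fuse_embeddings(
    {"text": [0.2, 0.9], "image": [1.0, 0.0], "clicks": [0.5, 0.5]},
    weights=shared_weights,
)
item_b = fuse_embeddings(
    {"text": [0.1, 1.0], "image": [0.9, 0.1], "clicks": [0.4, 0.6]},
    weights=shared_weights,
)
similarity = cosine(item_a, item_b)  # high similarity -> candidates to recommend together
```

In a real system the per-modality vectors would come from learned encoders; the point of the sketch is only that fusion reduces to a deterministic combination step over which item similarity can then be computed.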
SketchyGAN: Towards Diverse and Realistic Sketch to Image Synthesis
Synthesizing realistic images from human drawn sketches is a challenging
problem in computer graphics and vision. Existing approaches either need exact
edge maps, or rely on retrieval of existing photographs. In this work, we
propose a novel Generative Adversarial Network (GAN) approach that synthesizes
plausible images from 50 categories including motorcycles, horses and couches.
We demonstrate a data augmentation technique for sketches which is fully
automatic, and we show that the augmented data is helpful to our task. We
introduce a new network building block suitable for both the generator and
discriminator which improves the information flow by injecting the input image
at multiple scales. Compared to state-of-the-art image translation methods, our
approach generates more realistic images and achieves significantly higher
Inception Scores. Comment: Accepted to CVPR 2018
Image Retrieval with Mixed Initiative and Multimodal Feedback
How would you search for a unique, fashionable shoe that a friend wore and
you want to buy, but you didn't take a picture? Existing approaches propose
interactive image search as a promising avenue. However, they either entrust the
user with taking the initiative to provide informative feedback, or give all
control to the system which determines informative questions to ask. Instead,
we propose a mixed-initiative framework where both the user and system can be
active participants, depending on whose initiative will be more beneficial for
obtaining high-quality search results. We develop a reinforcement learning
approach which dynamically decides which of three interaction opportunities to
give to the user: drawing a sketch, providing free-form attribute feedback, or
answering attribute-based questions. By allowing these three options, our
system optimizes both informativeness and exploration, enabling faster image
retrieval. We outperform three baselines across three datasets and
extensive experimental settings. Comment: In submission to BMVC 2018
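The abstract describes a reinforcement-learning policy over three interaction types without giving its details. A minimal stand-in — assuming a simple epsilon-greedy bandit rather than the authors' actual method, with rewards and names entirely hypothetical — shows the decision being made:

```python
import random

ACTIONS = ["sketch", "free_form_feedback", "attribute_question"]

class InteractionPolicy:
    """Epsilon-greedy stand-in for a learned policy: pick the interaction
    (user sketch, free-form feedback, or attribute question) whose observed
    reward -- e.g., rank improvement of the target image -- is highest."""

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in ACTIONS}
        self.values = {a: 0.0 for a in ACTIONS}

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)                    # explore
        return max(ACTIONS, key=lambda a: self.values[a])      # exploit

    def update(self, action, reward):
        self.counts[action] += 1
        n = self.counts[action]
        # Incremental running mean of the reward for this action.
        self.values[action] += (reward - self.values[action]) / n

policy = InteractionPolicy()
# Simulated sessions: in this toy setup, attribute questions help most.
rewards = {"sketch": 0.3, "free_form_feedback": 0.5, "attribute_question": 0.8}
for _ in range(500):
    a = policy.choose()
    policy.update(a, rewards[a])
best = max(ACTIONS, key=lambda a: policy.values[a])
```

The actual system conditions the choice on the search state rather than using a fixed bandit, but the core loop — choose an interaction, observe how much it improves retrieval, update the policy — has this shape.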
Content-Based Search for Deep Generative Models
The growing proliferation of customized and pretrained generative models has
made it infeasible for a user to be fully cognizant of every model in
existence. To address this need, we introduce the task of content-based model
search: given a query and a large set of generative models, finding the models
that best match the query. As each generative model produces a distribution of
images, we formulate the search task as an optimization problem to select the
model with the highest probability of generating content similar to the query.
We introduce a formulation to approximate this probability given the query from
different modalities, e.g., image, sketch, and text. Furthermore, we propose a
contrastive learning framework for model retrieval, which learns to adapt
features for various query modalities. We demonstrate that our method
outperforms several baselines on Generative Model Zoo, a new benchmark we
create for the model retrieval task. Comment: Our project page is hosted at
https://generative-intelligence-lab.github.io/modelverse
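One way to read the probabilistic formulation above is as a Monte Carlo approximation: embed the query, embed cached samples from each model, and rank models by average similarity. The sketch below assumes that simplification — it is not the paper's exact objective, and the zoo, model names, and embeddings are toy values:

```python
def cosine(u, v):
    """Cosine similarity between two raw (non-normalized) vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def score_model(query_emb, sample_embs):
    """Monte Carlo proxy for p(query | model): average similarity between
    the query embedding and embeddings of images sampled from the model."""
    return sum(cosine(query_emb, s) for s in sample_embs) / len(sample_embs)

def search_models(query_emb, model_zoo):
    """Rank models by descending match score.

    model_zoo: dict mapping model name -> list of cached sample embeddings.
    """
    scores = {name: score_model(query_emb, embs)
              for name, embs in model_zoo.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Hypothetical zoo: 2-D vectors stand in for real feature embeddings.
zoo = {
    "faces_gan":  [[0.9, 0.1], [0.8, 0.2]],
    "sketch_gan": [[0.1, 0.9], [0.2, 0.8]],
}
ranking = search_models([0.95, 0.05], zoo)  # query resembles face images
```

Because the query embedding can come from an image, sketch, or text encoder, the same ranking loop serves all three modalities; the paper's contrastive framework learns to adapt those features, which this sketch omits.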
High-Quality Facial Photo-Sketch Synthesis Using Multi-Adversarial Networks
Synthesizing face sketches from real photos, and the inverse task, has many
applications. However, photo-sketch synthesis remains a challenging problem
because photos and sketches have different characteristics. In this work,
we consider this task as an image-to-image translation problem and explore the
recently popular generative models (GANs) to generate high-quality realistic
photos from sketches and sketches from photos. Recent GAN-based methods have
shown promising results on image-to-image translation problems, and on
photo-to-sketch synthesis in particular; however, they are known to have
limited abilities in generating high-resolution realistic images. To address
this, we propose a novel synthesis framework, Photo-Sketch Synthesis using
Multi-Adversarial Networks (PS2-MAN), which iteratively generates images from
low to high resolution in an adversarial way. The hidden layers of the
generator are supervised to first generate lower resolution images followed by
implicit refinement in the network to generate higher resolution images.
Furthermore, since photo-sketch synthesis is a coupled/paired translation
problem, we leverage the pair information using the CycleGAN framework. Both Image
Quality Assessment (IQA) and Photo-Sketch Matching experiments are conducted to
demonstrate the superior performance of our framework in comparison to existing
state-of-the-art solutions. Code available at:
https://github.com/lidan1/PhotoSketchMAN. Comment: Accepted by 2018 13th IEEE International Conference on Automatic Face
& Gesture Recognition (FG 2018) (Oral)
An affective computing and image retrieval approach to support diversified and emotion-aware reminiscence therapy sessions
Dementia is one of the major causes of dependency and disability among elderly subjects worldwide. Reminiscence therapy is an inexpensive non-pharmacological therapy commonly used within dementia care due to its therapeutic value for people with dementia. This therapy is useful to create engaging communication between people with dementia and the rest of the world by using the preserved abilities of long-term memory, rather than emphasizing the existing impairments, to alleviate the experience of failure and social isolation.
Current assistive technological solutions improve reminiscence therapy by providing a more lively and engaging experience to all participants (people with dementia, family members, and clinicians), but they are not free of drawbacks: a) the multimedia data used remains unchanged throughout sessions, and there is a lack of customization for each person with dementia; b) they do not take into account the emotions conveyed by the multimedia data used, nor the person with dementia's emotional reactions to the multimedia presented; c) the caregivers' perspective has not yet been fully taken into account. To overcome these challenges, we followed a user-centered design approach through worldwide surveys, follow-up interviews, and focus groups with formal and informal caregivers to inform the design of technological solutions within dementia care. To fulfil the requirements identified, we propose novel methods that facilitate the inclusion of emotions in the loop during reminiscence therapy to personalize and diversify the content of the sessions over time. Contributions from this thesis include: a) a set of validated functional requirements gathered from formal and informal caregivers, the expected outcomes with the fulfillment of each requirement, and an architecture template for the development of assistive technology solutions for dementia care; b) an end-to-end approach to automatically identify multiple types of emotional information conveyed by images; c) an approach to reduce the amount of images that need to be annotated by humans without compromising the recognition models' performance; d) an interpretable late-fusion technique that dynamically combines multiple content-based image retrieval systems to effectively search for similar images, diversifying and personalizing the pool of images available to be used in sessions.
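Contribution (d) — interpretable late fusion of multiple content-based image retrieval systems — can be illustrated as a weighted combination of per-system scores. This is a generic sketch under that assumption, not the thesis's actual technique; the system names, scores, and weights are invented:

```python
def min_max(scores):
    """Normalize one retrieval system's scores to [0, 1] so systems with
    different score scales can be combined fairly."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    return {img: (s - lo) / span for img, s in scores.items()}

def late_fusion(system_scores, weights):
    """Interpretable late fusion: each image's fused score is a weighted sum
    of per-system scores, so each system's contribution can be inspected.

    system_scores: dict system name -> {image id -> similarity score}
    weights: dict system name -> non-negative weight
    """
    fused = {}
    for system, scores in system_scores.items():
        for img, s in min_max(scores).items():
            fused[img] = fused.get(img, 0.0) + weights.get(system, 0.0) * s
    return sorted(fused.items(), key=lambda kv: -kv[1])

# Two hypothetical CBIR back-ends scoring three candidate images.
ranked = late_fusion(
    {"color_hist": {"img1": 0.2, "img2": 0.9, "img3": 0.5},
     "deep_feats": {"img1": 0.7, "img2": 0.8, "img3": 0.1}},
    weights={"color_hist": 0.4, "deep_feats": 0.6},
)
```

The interpretability claim rests on the additive form: for any retrieved image, the per-system terms in the sum explain which back-end drove the recommendation, which matters when clinicians need to understand why an image was chosen for a session.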
ReLoop2: Building Self-Adaptive Recommendation Models via Responsive Error Compensation Loop
Industrial recommender systems face the challenge of operating in
non-stationary environments, where data distribution shifts arise from evolving
user behaviors over time. To tackle this challenge, a common approach is to
periodically re-train or incrementally update deployed deep models with newly
observed data, resulting in a continual training process. However, the
conventional learning paradigm of neural networks relies on iterative
gradient-based updates with a small learning rate, making it slow for large
recommendation models to adapt. In this paper, we introduce ReLoop2, a
self-correcting learning loop that facilitates fast model adaptation in online
recommender systems through responsive error compensation. Inspired by the
slow-fast complementary learning system observed in human brains, we propose an
error memory module that directly stores error samples from incoming data
streams. These stored samples are subsequently leveraged to compensate for
model prediction errors during testing, particularly under distribution shifts.
The error memory module is designed with fast access capabilities and undergoes
continual refreshing with newly observed data samples during the model serving
phase to support fast model adaptation. We evaluate the effectiveness of
ReLoop2 on three open benchmark datasets as well as a real-world production
dataset. The results demonstrate the potential of ReLoop2 in enhancing the
responsiveness and adaptiveness of recommender systems operating in
non-stationary environments. Comment: Accepted by KDD 2023. See the project page at
https://xpai.github.io/ReLoo
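The error memory module described above can be approximated in a few lines: store recent prediction residuals per feature bucket and subtract their mean at serving time. This is a simplified stand-in for the ReLoop2 design — the single-feature bucketing scheme and all numbers are hypothetical:

```python
from collections import defaultdict, deque

class ErrorMemory:
    """Sketch of a responsive error-compensation loop: recent prediction
    errors are stored per feature bucket and their mean is subtracted from
    future predictions, letting a deployed model adapt to distribution
    shifts faster than slow gradient-based updates would allow."""

    def __init__(self, capacity=100):
        # Bounded deque per bucket: old errors age out automatically.
        self.memory = defaultdict(lambda: deque(maxlen=capacity))

    def observe(self, bucket, prediction, label):
        """Store the residual for a newly labeled sample from the stream."""
        self.memory[bucket].append(prediction - label)

    def compensate(self, bucket, prediction):
        """Correct a serving-time prediction by the bucket's mean error."""
        errors = self.memory[bucket]
        if not errors:
            return prediction
        return prediction - sum(errors) / len(errors)

def base_model(user_segment):
    """Stale deployed model: a constant CTR estimate of 0.2."""
    return 0.2

mem = ErrorMemory()
# Distribution shift: the "mobile" segment now clicks at ~0.5, so the
# stale model under-predicts and accumulates negative residuals.
for _ in range(20):
    p = base_model("mobile")
    mem.observe("mobile", p, label=0.5)
adapted = mem.compensate("mobile", base_model("mobile"))
```

The compensation path touches only a hash lookup and a mean, so it is cheap enough to run at serving time while the slower periodic retraining catches up — the slow-fast split the abstract attributes to complementary learning systems.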
- âŠ