Learning Fashion Compatibility with Bidirectional LSTMs
The ubiquity of online fashion shopping demands effective recommendation
services for customers. In this paper, we study two types of fashion
recommendation: (i) suggesting an item that matches existing components in a
set to form a stylish outfit (a collection of fashion items), and (ii)
generating an outfit with multimodal (images/text) specifications from a user.
To this end, we propose to jointly learn a visual-semantic embedding and the
compatibility relationships among fashion items in an end-to-end fashion. More
specifically, we consider a fashion outfit to be a sequence (usually from top
to bottom and then accessories) and each item in the outfit as a time step.
Given the fashion items in an outfit, we train a bidirectional LSTM (Bi-LSTM)
model to sequentially predict the next item conditioned on previous ones to
learn their compatibility relationships. Further, we learn a visual-semantic
space by regressing image features to their semantic representations aiming to
inject attribute and category information as a regularization for training the
LSTM. The trained network can not only perform the aforementioned
recommendations effectively but also predict the compatibility of a given
outfit. We conduct extensive experiments on our newly collected Polyvore
dataset, and the results provide strong qualitative and quantitative evidence
that our framework outperforms alternative methods. Comment: ACM MM 1
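The core sequence-modeling idea — treat an outfit as an ordered sequence of items and train a recurrent model to predict each next item from the previous ones — can be sketched with a single forward LSTM cell scoring candidate next items. This is a minimal illustration, not the paper's implementation: the paper uses a bidirectional LSTM over learned image features jointly with a visual-semantic embedding, and all weights, dimensions, and candidate sets below are hypothetical.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; the four gates are stacked in z as [i, f, o, g]."""
    z = W @ x + U @ h + b
    H = h.size
    i = 1.0 / (1.0 + np.exp(-z[:H]))        # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2*H]))     # forget gate
    o = 1.0 / (1.0 + np.exp(-z[2*H:3*H]))   # output gate
    g = np.tanh(z[3*H:])                    # candidate cell state
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def next_item_probs(items, W, U, b, V):
    """Run outfit items (feature vectors, top-to-bottom order) through a
    forward LSTM; at each step return a softmax over candidate next items
    (rows of V). High probability on the true next item = compatible outfit."""
    H = b.size // 4
    h, c = np.zeros(H), np.zeros(H)
    probs = []
    for x in items[:-1]:
        h, c = lstm_step(x, h, c, W, U, b)
        logits = V @ h                       # score each candidate item
        e = np.exp(logits - logits.max())
        probs.append(e / e.sum())
    return probs
```

An outfit's compatibility score would then be the average log-probability assigned to each true next item; the paper additionally runs the sequence backwards (the bidirectional part) and regularizes the image features with a visual-semantic embedding.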
Natural Notation for the Domestic Internet of Things
This study explores the use of natural language to give instructions that
might be interpreted by Internet of Things (IoT) devices in a domestic `smart
home' environment. We start from the proposition that reminders can be
considered as a type of end-user programming, in which the executed actions
might be performed either by an automated agent or by the author of the
reminder. We conducted an experiment in which people wrote sticky notes
specifying future actions in their home. In different conditions, these notes
were addressed to themselves, to others, or to a computer agent. We analyse the
linguistic features and strategies that are used to achieve these tasks,
including the use of graphical resources as an informal visual language. The
findings provide a basis for design guidance related to end-user development
for the Internet of Things. Comment: Proceedings of the 5th International Symposium on End-User
Development (IS-EUD), Madrid, Spain, May, 201
‘My favourite things to do’ and ‘my favourite people’: Exploring salient aspects of children’s self-concept
This study explores the potential of the ‘draw-and-write’ method for inviting children to communicate salient aspects of their self-concept. Irish primary school children aged 10–13 years drew and wrote about their favourite people and things to do (social and active self). Children drew and described many salient activities (39 in total) and people – including pets. Results suggest that widely used, adult-constructed self-esteem scales for children, while multidimensional, are limited, and that ‘draw-and-write’ is an effective multimodal method with which children can express their social and active self-concepts.
Everyday writing in Graeco-Roman and late antique Egypt : outline of a new research programme
In October 2017, the European Research Council awarded a Starting Grant to Klaas Bentein for his project EVWRIT: Everyday writing in Graeco-Roman and Late Antique Egypt: A socio-semiotic study of communicative variation. In what follows, the research goals, methodology, and corpus of this new project are briefly outlined.
Speech-driven Animation with Meaningful Behaviors
Conversational agents (CAs) play an important role in human computer
interaction. Creating believable movements for CAs is challenging, since the
movements have to be meaningful and natural, reflecting the coupling between
gestures and speech. Studies in the past have mainly relied on rule-based or
data-driven approaches. Rule-based methods focus on creating meaningful
behaviors conveying the underlying message, but the gestures cannot be easily
synchronized with speech. Data-driven approaches, especially speech-driven
models, can capture the relationship between speech and gestures. However, they
create behaviors disregarding the meaning of the message. This study proposes
to bridge the gap between these two approaches, overcoming their limitations.
The approach builds a dynamic Bayesian network (DBN), where a discrete variable
is added to condition the behaviors on an underlying constraint. The study
implements and evaluates the approach with two constraints: discourse functions
and prototypical behaviors. By constraining on the discourse functions (e.g.,
questions), the model learns the characteristic behaviors associated with a
given discourse class, learning the rules from the data. By constraining on
prototypical behaviors (e.g., head nods), the approach can be embedded in a
rule-based system as a behavior realizer, creating trajectories that are
synchronized in time with speech. The study proposes a DBN structure and a training
approach that (1) models the cause-effect relationship between the constraint
and the gestures, (2) initializes the state configuration models, increasing the
range of the generated behaviors, and (3) captures the differences in the
behaviors across constraints by enforcing sparse transitions between shared and
exclusive states per constraint. Objective and subjective evaluations
demonstrate the benefits of the proposed approach over an unconstrained model. Comment: 13 pages, 12 figures, 5 table
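The constrained-DBN idea — a discrete constraint variable (e.g., a discourse function or a prototypical behavior) selecting which gesture-state dynamics apply — can be sketched as constraint-specific transition matrices over hidden behavior states. This is a toy illustration of the conditioning mechanism only: the constraint labels, states, and probabilities below are made up, and the actual model also couples the states to speech features and enforces sparse transitions between shared and exclusive states.

```python
import numpy as np

# Hypothetical transition matrices over two hidden gesture states
# (0 = neutral, 1 = head nod), one per discourse-function constraint.
TRANSITIONS = {
    "question":  np.array([[0.6, 0.4],    # questions favour nodding
                           [0.3, 0.7]]),
    "statement": np.array([[0.9, 0.1],    # statements stay mostly neutral
                           [0.5, 0.5]]),
}

def sample_gesture_states(constraint, length, rng):
    """Sample a hidden gesture-state sequence conditioned on the discrete
    constraint variable, which selects the transition dynamics."""
    A = TRANSITIONS[constraint]
    state = 0                              # start in the neutral state
    states = [state]
    for _ in range(length - 1):
        state = rng.choice(len(A), p=A[state])
        states.append(state)
    return states
```

In the full model each hidden state would also emit observed motion (and be coupled to speech), so that constraining the discrete variable at synthesis time steers the generated trajectory toward the characteristic behaviors of that class.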
Using Sound to Enhance Users’ Experiences of Mobile Applications
The latest smartphones with GPS, electronic compass, directional audio, touch screens etc. hold potentials for location based services that are easier to use compared to traditional tools. Rather than interpreting maps, users may focus on their activities and the environment around them. Interfaces may be designed that let users search for information by simply pointing in a direction. Database queries can be created from GPS location and compass direction data. Users can get guidance to locations through pointing gestures, spatial sound and simple graphics. This article describes two studies testing prototypic applications with multimodal user interfaces built on spatial audio, graphics and text. Tests show that users appreciated the applications for their ease of use, for being fun and effective to use and for allowing users to interact directly with the environment rather than with abstractions of the same. The multimodal user interfaces contributed significantly to the overall user experience.
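The pointing-query idea — build a database query from the user's GPS position and compass heading — reduces to computing the bearing from the user to each point of interest and keeping those within some angular cone around the heading. A minimal sketch, assuming hypothetical POI data and a hypothetical cone width; the article's actual prototypes are not described at this level of detail:

```python
import math

def bearing(lat1, lon1, lat2, lon2):
    """Initial compass bearing (degrees, 0 = north) from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0

def pointed_at(user, heading, pois, cone=15.0):
    """Return names of POIs whose bearing from the user lies within
    +/- cone degrees of the compass heading (the 'pointing' query)."""
    lat0, lon0 = user
    hits = []
    for name, lat, lon in pois:
        # Signed angular difference folded into [-180, 180]
        diff = abs((bearing(lat0, lon0, lat, lon) - heading + 180.0) % 360.0 - 180.0)
        if diff <= cone:
            hits.append(name)
    return hits
```

A location-based service would run such a filter (or the equivalent spatial SQL) each time the user points the device, then render the hits via spatial audio or simple graphics as described above.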
Taking It to the Street: How Roadway Design Helped Shape a Neighborhood's Development
Summarizes lessons learned during the process of planning the redesign of one of the region's most historic corridors in Minneapolis -- Lake Street -- and illustrates how roadway design can help shape a neighborhood's development.