DIY Human Action Data Set Generation
The recent successes in applying deep learning techniques to solve standard
computer vision problems have inspired researchers to propose new computer vision
problems in different domains. As previously established in the field, training
data itself plays a significant role in the machine learning process,
especially for deep learning approaches, which are data hungry. In order to solve
each new problem and achieve decent performance, a large amount of data needs to
be captured, which may in many cases pose logistical difficulties. Therefore,
the ability to generate de novo data or expand an existing data set, however
small, in order to satisfy the data requirements of current networks may be
invaluable. Herein, we introduce a novel way to partition an action video clip
into action, subject and context. Each part is manipulated separately and
reassembled with our proposed video generation technique. Furthermore, our
novel human skeleton trajectory generation, along with our proposed video
generation technique, enables us to generate unlimited action recognition
training data. These techniques enable us to generate video action clips from
a small set without costly and time-consuming data acquisition. Lastly, we
show through an extensive set of experiments on two small human action
recognition data sets that this new data generation technique can improve the
performance of current action recognition neural nets.
Simple yet efficient real-time pose-based action recognition
Recognizing human actions is a core challenge for autonomous systems as they
directly share the same space with humans. Systems must be able to recognize
and assess human actions in real-time. In order to train corresponding
data-driven algorithms, a significant amount of annotated training data is
required. We demonstrate a pipeline to detect humans, estimate their pose,
track them over time and recognize their actions in real-time with standard
monocular camera sensors. For action recognition, we encode the human pose into
a new data format called Encoded Human Pose Image (EHPI) that can then be
classified using standard methods from the computer vision community. With this
simple procedure we achieve competitive state-of-the-art performance in
pose-based action detection and can ensure real-time performance. In addition,
we show a use case in the context of autonomous driving to demonstrate how such
a system can be trained to recognize human actions using simulation data.
Comment: Submitted to IEEE Intelligent Transportation Systems Conference
(ITSC) 2019. Code will be available soon at
https://github.com/noboevbo/ehpi_action_recognitio
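The idea of turning a pose sequence into an image can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the encoding, so the layout (rows are joints, columns are frames), the use of the first two colour channels for normalised x and y, and the name `encode_pose_sequence` are all assumptions made here.

```python
from typing import List, Tuple

Pose = List[Tuple[float, float]]      # one (x, y) pair per joint
Pixel = Tuple[int, int, int]          # (R, G, B) with values in 0..255


def encode_pose_sequence(frames: List[Pose]) -> List[List[Pixel]]:
    """Encode a pose sequence as an image: rows = joints, columns = frames.

    Assumed layout (hypothetical): R holds the normalised x coordinate,
    G holds the normalised y coordinate, B is unused and set to 0.
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_joints = len(frames[0])
    if any(len(frame) != num_joints for frame in frames):
        raise ValueError("every frame must have the same number of joints")

    # Normalise over the whole sequence so relative motion is preserved.
    xs = [x for frame in frames for x, _ in frame]
    ys = [y for frame in frames for _, y in frame]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    def scale(value: float, lo: float, hi: float) -> int:
        if hi == lo:                  # degenerate range: avoid division by zero
            return 0
        return round(255 * (value - lo) / (hi - lo))

    image: List[List[Pixel]] = []
    for joint in range(num_joints):
        row: List[Pixel] = []
        for frame in frames:
            x, y = frame[joint]
            row.append((scale(x, x_min, x_max), scale(y, y_min, y_max), 0))
        image.append(row)
    return image
```

An image built this way has a fixed size for a fixed number of joints and frames, so it could be passed to a standard image classifier, which is the property the abstract relies on.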
Attend to You: Personalized Image Captioning with Context Sequence Memory Networks
We address personalization issues of image captioning, which have not been
discussed yet in previous research. For a query image, we aim to generate a
descriptive sentence, accounting for prior knowledge such as the user's active
vocabularies in previous documents. As applications of personalized image
captioning, we tackle two post automation tasks: hashtag prediction and post
generation, on our newly collected Instagram dataset, consisting of 1.1M posts
from 6.3K users. We propose a novel captioning model named Context Sequence
Memory Network (CSMN). Its unique updates over previous memory network models
include (i) exploiting memory as a repository for multiple types of context
information, (ii) appending previously generated words into memory to capture
long-term information without suffering from the vanishing gradient problem,
and (iii) adopting CNN memory structure to jointly represent nearby ordered
memory slots for better context understanding. With quantitative evaluation and
user studies via Amazon Mechanical Turk, we show the effectiveness of the three
novel features of CSMN and its performance enhancement for personalized image
captioning over state-of-the-art captioning models.
Comment: Accepted paper at CVPR 201
Towards Practicality of Sketch-Based Visual Understanding
Sketches have been used to conceptualise and depict visual objects from
pre-historic times. Sketch research has flourished in the past decade,
particularly with the proliferation of touchscreen devices. Much of the
utilisation of sketch has been anchored around the fact that it can be used to
delineate visual concepts universally irrespective of age, race, language, or
demography. The fine-grained interactive nature of sketches facilitates the
application of sketches to various visual understanding tasks, like image
retrieval, image-generation or editing, segmentation, 3D-shape modelling etc.
However, sketches are highly abstract and subjective based on the perception of
individuals. Although most agree that sketches provide fine-grained control to
the user to depict a visual object, many consider sketching a tedious process
due to their limited sketching skills compared to other query/support
modalities like text/tags. Furthermore, collecting fine-grained sketch-photo
association is a significant bottleneck to commercialising sketch applications.
Therefore, this thesis aims to progress sketch-based visual understanding
towards more practicality.
Comment: PhD thesis successfully defended by Ayan Kumar Bhunia, Supervisor:
Prof. Yi-Zhe Song, Thesis Examiners: Prof Stella Yu and Prof Adrian Hilto
The affective extension of ‘Family’ in the context of changing elite business networks
Drawing on 49 oral-history interviews with Scottish family business owner-managers, six key-informant interviews, and secondary sources, this interdisciplinary study analyses the decline of kinship-based connections and the emergence of new kinds of elite networks around the 1980s. As the socioeconomic context changed rapidly during this time, cooperation built primarily around literal family ties could not survive unaltered. Instead of finding unity through bio-legal family connections, elite networks now came to redefine their ‘family businesses’ in terms of affectively loaded ‘family values’ such as loyalty, care, commitment, and even ‘love’. Consciously nurturing ‘as-if-family’ emotional and ethical connections arose as a psychologically effective way to bring together network members who did not necessarily share pre-existing connections of bio-legal kinship. The social-psychological processes involved in this extension of the ‘family’ can be understood using theories of the moral sentiments first developed in the Scottish Enlightenment. These theories suggest that, when the context is amenable, family-like emotional bonds can be extended via sympathy to those to whom one is not literally related. As a result of this ‘progress of sentiments’, one now earns his/her place in a Scottish family business, not by inheriting or marrying into it, but by performing family-like behaviours motivated by shared ethics and affects
TALK COMMONSENSE TO ME! ENRICHING LANGUAGE MODELS WITH COMMONSENSE KNOWLEDGE
Human cognition is exciting: it is a mesh of several neural phenomena that
drive our ability to constantly reason and infer about the surrounding world. In cognitive
computer science, Commonsense Reasoning is the term given to our ability to
infer uncertain events and reason about Cognitive Knowledge. The introduction of Commonsense
to intelligent systems has been desired for years, but the mechanism for this
introduction remains a scientific jigsaw. Some implicitly believe language understanding
is enough to achieve some level of Commonsense [90]. On less common ground, there
are others who think enriching language with Knowledge Graphs might be enough for
human-like reasoning [63], while others believe human-like reasoning can
only be truly captured with symbolic rules and logical deduction powered by Knowledge
Bases, such as taxonomies and ontologies [50]. We focus on integrating Commonsense
Knowledge into Language Models, because we believe that this integration is a step towards
a beneficial embedding of Commonsense Reasoning in interactive Intelligent Systems,
such as conversational assistants.
Conversational assistants, such as Alexa from Amazon, are user-driven systems. Thus,
a more human-like interaction is strongly desired to really capture the
user’s attention and empathy. We believe that such humanistic characteristics can be
achieved through the introduction of stronger Commonsense Knowledge and Reasoning
to fruitfully engage with users.
To this end, we intend to introduce a new family of models, the Relation-Aware
BART (RA-BART), leveraging the language generation abilities of BART [51] with explicit
Commonsense Knowledge extracted from Commonsense Knowledge Graphs to further
extend the human-like capabilities of these models.
We evaluate our model on three different tasks: Abstractive Question Answering, Text
Generation conditioned on certain concepts, and a Multi-Choice Question Answering task.
We find that, on generation tasks, RA-BART outperforms non-knowledge-enriched
models; however, it underperforms on the multi-choice question answering task.
Our Project can be consulted in our open source, public GitHub repository (Explicit
Commonsense).
InterTracker: Discovering and Tracking General Objects Interacting with Hands in the Wild
Understanding human interaction with objects is an important research topic
for embodied Artificial Intelligence and identifying the objects that humans
are interacting with is a primary problem for interaction understanding.
Existing methods rely on frame-based detectors to locate interacting objects.
However, this approach is susceptible to heavy occlusions, background clutter,
and distracting objects. To address these limitations, in this paper, we propose
to leverage spatio-temporal information of hand-object interaction to track
interactive objects under these challenging cases. Unlike standard object
tracking problems, we have no prior knowledge of the general objects to be
tracked, so we first utilize the spatial relation between hands and objects to
adaptively discover the interacting objects from the scene. Second, the consistency and continuity
of the appearance of objects between successive frames are exploited to track
the objects. With this tracking formulation, our method also benefits from
training on large-scale general object-tracking datasets. We further curate a
video-level hand-object interaction dataset for testing and evaluation from
100DOH. The quantitative results demonstrate that our proposed method
outperforms the state-of-the-art methods. Specifically, in scenes with
continuous interaction with different objects, we achieve an impressive
improvement of about 10% as evaluated using the Average Precision (AP) metric.
Our qualitative findings also illustrate that our method can produce more
continuous trajectories for interacting objects.
Comment: IROS 202
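The discovery step, which picks the interacting object from its spatial relation to the hand, can be sketched as follows. This is an illustrative simplification rather than the paper's method: the abstract does not say how the spatial relation is measured, so the use of bounding-box intersection-over-union, the threshold `min_iou`, and the function names are assumptions.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x1 < x2 and y1 < y2


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def discover_interacting_object(
    hand: Box, candidates: List[Box], min_iou: float = 0.05
) -> Optional[int]:
    """Return the index of the candidate box that overlaps the hand most.

    Returns None when no candidate reaches min_iou (a hypothetical threshold),
    i.e. when the hand does not appear to touch any object.
    """
    best_index: Optional[int] = None
    best_score = 0.0
    for index, box in enumerate(candidates):
        score = iou(hand, box)
        if score > best_score:
            best_index, best_score = index, score
    return best_index if best_score >= min_iou else None
```

In a full pipeline the selected box would then initialise a tracker that follows the object through later frames using appearance consistency, which this sketch does not cover.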
Developing Interventions for Scaling Up UK Upcycling
Open access article. Upcycling presents one of many opportunities for reducing consumption of materials and energy. Despite recent growth evidenced by increasing numbers of practitioners and businesses based on upcycling, it remains a niche activity and requires scaling up to realise its potential benefits. This paper investigates UK household upcycling in order to develop interventions for scaling up upcycling in the UK. Mixed methods were used in four stages: (a) Interviews to gain insights into UK upcycling; (b) a survey to discover key factors influencing UK upcycling; (c) intervention development based on the synthesis of interviews and survey; and (d) use of a semi-Delphi technique to evaluate and develop initial interventions. The results showed approaches to upcycling (e.g., wood, metal and fabric as frequently used materials, online platforms as frequently used source of materials), context for upcycling (e.g., predominant use of home for upcycling), factors influencing UK upcycling with key determinants (i.e., intention, attitude and subjective norm), important demographic characteristics considering a target audience for interventions (i.e., 30+ females) and prioritised interventions for scaling up (e.g., TV and inspirational media and community workshops as short-term high priority interventions). The paper further discusses implications of the study in terms of development of theory and practice of upcycling.
HANDLING CHANGE IN A PRODUCTION TASKBOT. EFFICIENTLY MANAGING THE GROWTH OF TWIZ, AN ALEXA ASSISTANT
A Conversational Agent aims to converse with users, with a focus on natural behaviour
and responses. Such agents can be extremely complex, as they comprise several parts,
several courses of action, and infinitely many possible inputs. Behaviour checking is
therefore essential, especially in a production context, where wrong behaviour can have
serious consequences. Nevertheless, developing a robust and correctly behaving Task Bot should
not hinder research and must allow for continuous improvement of cutting-edge solutions.
Manual testing of such a complex system is bound to run into limits,
either in the extent of the testing or in the developer time it consumes.
We therefore propose the development of a tool to automatically test these highly
sophisticated systems with a much broader test surface. We introduce a solution that leverages
past conversation replay and mimicking to generate synthetic conversations. This
saves time on quality assurance and improves change handling.
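Conversation replay as a regression test can be sketched as follows. This is a minimal sketch under stated assumptions, not the tool described in the thesis: the log format, the exact-match comparison, and the names `replay_conversation` and `make_toy_bot` are hypothetical, and the "mimicking" step that generates new synthetic conversations is not covered.

```python
from typing import Callable, Dict, List, Tuple

Turn = Tuple[str, str]            # (user utterance, bot response recorded in the log)
Bot = Callable[[str], str]        # a bot session maps an utterance to a response


def replay_conversation(make_bot: Callable[[], Bot], conversation: List[Turn]) -> List[Dict]:
    """Replay a logged conversation against a fresh bot session.

    Returns one record per turn whose response differs from the logged one.
    An empty list means the current build reproduces the recorded behaviour.
    """
    bot = make_bot()              # fresh session so state from other tests cannot leak in
    mismatches: List[Dict] = []
    for index, (utterance, expected) in enumerate(conversation):
        actual = bot(utterance)
        if actual != expected:
            mismatches.append(
                {"turn": index, "utterance": utterance, "expected": expected, "actual": actual}
            )
    return mismatches


def make_toy_bot() -> Bot:
    """Hypothetical stand-in for a task bot: it walks through numbered steps."""
    step = 0

    def respond(utterance: str) -> str:
        nonlocal step
        if "next" in utterance.lower():
            step += 1
            return f"Step {step}"
        return "Say 'next' to continue."

    return respond
```

Exact string matching is the simplest possible comparison. A production tool would more likely compare intents or states, since harmless wording changes should not fail a test.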
A key part of a Conversational Agent is the retrieval component, which is responsible
for retrieving information that is useful to the user. In task-guiding assistants,
the retrieval component should not narrow the user’s behaviour by omitting tasks that
could be relevant. However, matching information perfectly to a user’s query is
difficult, since there are many ways a user could phrase a request for the same
objective. To tackle this, we make use of a semantic retrieval algorithm,
adapting it to this domain by generating a synthetic dataset.
3D Printing in the Era of the Prosumer: The Role of Technology Readiness, Gender, and Age in User Acceptance of Desktop 3D Printing in American Households
Technology acceptance of Desktop 3D printing for fabrication at home is an emerging field of research in Asia and Europe. The proposal explains how Desktop 3D printing provides an innovative manufacturing alternative to traditional manufacturing processes and as such facilitates innovation among prosumers. It also explains how such innovations have the potential to sustain economic growth, thus substantiating the need to understand the technology acceptance of Desktop 3D printing for fabrication at home. The unified theory of acceptance and use of technology (UTAUT) model (Williams et al., 2015) was the most commonly used model in previous research to study the adoption of Desktop 3D printing for fabrication at home. The current research proposes an extension to the UTAUT model that accounts for the Technology Readiness of the individual. The extended UTAUT model is applied to study the acceptance of Desktop 3D printing for fabrication in American households, which will be a new contribution to the literature. Partial Least Squares Structural Equation Modeling (PLS-SEM) is proposed to analyze the extended UTAUT model to determine the key factors that influence the acceptance of Desktop 3D printing. A multi-group analysis based on Gender is also proposed to identify how significant the differences are in the key factors. This research contributes theoretically to the emerging stream of research that focuses on integrating technology acceptance theories with the Technology Readiness concept. Practically, this research contributes to the techno-marketing literature of 3D printer manufacturers that seek to increase the adoption rate of Desktop 3D printers by women in American households.