3,046 research outputs found

    DIY Human Action Data Set Generation

    Full text link
    The recent successes in applying deep learning techniques to solve standard computer vision problems have inspired researchers to propose new computer vision problems in different domains. As previously established in the field, training data itself plays a significant role in the machine learning process, especially for deep learning approaches, which are data hungry. In order to solve each new problem and get decent performance, a large amount of data needs to be captured, which may in many cases pose logistical difficulties. Therefore, the ability to generate de novo data or expand an existing data set, however small, in order to satisfy the data requirements of current networks may be invaluable. Herein, we introduce a novel way to partition an action video clip into action, subject and context. Each part is manipulated separately and reassembled with our proposed video generation technique. Furthermore, our novel human skeleton trajectory generation, along with our proposed video generation technique, enables us to generate unlimited action recognition training data. These techniques enable us to generate video action clips from a small set without costly and time-consuming data acquisition. Lastly, we show through an extensive set of experiments on two small human action recognition data sets that this new data generation technique can improve the performance of current action recognition neural nets.
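
    The partition-and-reassemble idea above can be illustrated with a small sketch: one clip supplies the masked subject, another supplies the background context, and the skeleton trajectory is perturbed to vary the action. The function names, array shapes, and jitter heuristic below are illustrative assumptions, not the authors' pipeline.

```python
# A minimal sketch of the partition-and-reassemble idea: a clip is split into subject
# (foreground mask), context (background), and action (skeleton trajectory), and new
# clips are composed by mixing these parts. All names here are illustrative only.
import numpy as np

def composite_clip(subject_frames, subject_masks, context_frames):
    """Paste the masked subject from one clip onto the background of another.

    subject_frames: (T, H, W, 3) uint8 frames containing the actor
    subject_masks:  (T, H, W)    bool foreground masks for the actor
    context_frames: (T, H, W, 3) uint8 frames providing the new background
    """
    masks = subject_masks[..., None]          # broadcast the mask over RGB channels
    return np.where(masks, subject_frames, context_frames)

def jitter_trajectory(skeleton, scale=0.05, rng=None):
    """Create a new skeleton trajectory by mildly perturbing joint positions.

    skeleton: (T, J, 2) array of 2D joint coordinates over T frames.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, scale, size=skeleton.shape)
    return skeleton + noise * skeleton.std(axis=(0, 1), keepdims=True)

if __name__ == "__main__":
    T, H, W, J = 16, 64, 64, 17
    subj = np.random.randint(0, 255, (T, H, W, 3), dtype=np.uint8)
    mask = np.zeros((T, H, W), dtype=bool); mask[:, 16:48, 16:48] = True
    ctx = np.random.randint(0, 255, (T, H, W, 3), dtype=np.uint8)
    new_clip = composite_clip(subj, mask, ctx)             # (16, 64, 64, 3)
    new_traj = jitter_trajectory(np.random.rand(T, J, 2))  # perturbed skeleton
    print(new_clip.shape, new_traj.shape)
```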

    Simple yet efficient real-time pose-based action recognition

    Full text link
    Recognizing human actions is a core challenge for autonomous systems, as they directly share the same space with humans. Systems must be able to recognize and assess human actions in real time. In order to train corresponding data-driven algorithms, a significant amount of annotated training data is required. We demonstrate a pipeline to detect humans, estimate their pose, track them over time, and recognize their actions in real time with standard monocular camera sensors. For action recognition, we encode the human pose into a new data format called Encoded Human Pose Image (EHPI) that can then be classified using standard methods from the computer vision community. With this simple procedure we achieve competitive state-of-the-art performance in pose-based action detection and can ensure real-time performance. In addition, we show a use case in the context of autonomous driving to demonstrate how such a system can be trained to recognize human actions using simulation data. Comment: Submitted to IEEE Intelligent Transportation Systems Conference (ITSC) 2019. Code will be available soon at https://github.com/noboevbo/ehpi_action_recognition
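
    As a rough illustration of the "pose sequence as image" encoding the abstract describes, the sketch below writes normalized 2D joint coordinates over a fixed temporal window into an image-shaped tensor that a standard CNN could classify. The channel layout and normalization are assumptions, not the published EHPI specification.

```python
# A minimal sketch of encoding a pose sequence as an image-like tensor so that a
# standard CNN classifier can consume it. The exact layout is an assumption.
import numpy as np

def encode_pose_sequence(joints, num_frames=32):
    """joints: (T, J, 2) array of 2D joint positions in pixel coordinates.
    Returns a (J, num_frames, 3) uint8 "pose image"."""
    T, J, _ = joints.shape
    # Resample the sequence to a fixed temporal length.
    idx = np.linspace(0, T - 1, num_frames).astype(int)
    seq = joints[idx]                                    # (num_frames, J, 2)
    # Normalize x and y independently to [0, 1] over the whole window.
    mins = seq.reshape(-1, 2).min(axis=0)
    maxs = seq.reshape(-1, 2).max(axis=0)
    norm = (seq - mins) / np.maximum(maxs - mins, 1e-6)
    img = np.zeros((J, num_frames, 3), dtype=np.uint8)
    img[..., 0] = (norm[..., 0] * 255).T                 # x coordinates as one channel
    img[..., 1] = (norm[..., 1] * 255).T                 # y coordinates as another
    return img                                            # third channel left unused

if __name__ == "__main__":
    fake_track = np.random.rand(50, 17, 2) * 640         # 50 frames, 17 COCO joints
    ehpi_like = encode_pose_sequence(fake_track)
    print(ehpi_like.shape)                                # (17, 32, 3)
```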

    Attend to You: Personalized Image Captioning with Context Sequence Memory Networks

    Get PDF
    We address personalization issues of image captioning, which have not yet been discussed in previous research. For a query image, we aim to generate a descriptive sentence, accounting for prior knowledge such as the user's active vocabularies in previous documents. As applications of personalized image captioning, we tackle two post automation tasks, hashtag prediction and post generation, on our newly collected Instagram dataset, consisting of 1.1M posts from 6.3K users. We propose a novel captioning model named Context Sequence Memory Network (CSMN). Its unique updates over previous memory network models include (i) exploiting memory as a repository for multiple types of context information, (ii) appending previously generated words into memory to capture long-term information without suffering from the vanishing gradient problem, and (iii) adopting a CNN memory structure to jointly represent nearby ordered memory slots for better context understanding. With quantitative evaluation and user studies via Amazon Mechanical Turk, we show the effectiveness of the three novel features of CSMN and its performance enhancement for personalized image captioning over state-of-the-art captioning models. Comment: Accepted paper at CVPR 2017
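
    A minimal sketch of the memory mechanics described above: context vectors and previously generated word embeddings share a single memory, and a 1-D convolution over neighbouring slots summarizes local context before attention. The dimensions and module names are illustrative assumptions, not the CSMN architecture as published.

```python
# Illustrative toy memory module: convolve over ordered memory slots, then attend
# with the current decoding state. Not the published CSMN implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyContextMemory(nn.Module):
    def __init__(self, dim=128, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2)

    def forward(self, memory, query):
        """memory: (B, M, dim) slots (image context + user context + past words)
        query:  (B, dim) current decoding state."""
        # Jointly represent nearby ordered slots with a convolution over the slot axis.
        smoothed = self.conv(memory.transpose(1, 2)).transpose(1, 2)        # (B, M, dim)
        attn = F.softmax((smoothed * query.unsqueeze(1)).sum(-1), dim=-1)   # (B, M)
        return (attn.unsqueeze(-1) * smoothed).sum(1)                       # (B, dim)

if __name__ == "__main__":
    B, M, D = 2, 10, 128
    mem = torch.randn(B, M, D)                  # existing context slots
    new_word = torch.randn(B, 1, D)             # embedding of the word just generated
    mem = torch.cat([mem, new_word], dim=1)     # append the generated word to memory
    out = TinyContextMemory(D)(mem, torch.randn(B, D))
    print(out.shape)                            # torch.Size([2, 128])
```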

    Towards Practicality of Sketch-Based Visual Understanding

    Full text link
    Sketches have been used to conceptualise and depict visual objects from prehistoric times. Sketch research has flourished in the past decade, particularly with the proliferation of touchscreen devices. Much of the utilisation of sketch has been anchored around the fact that it can be used to delineate visual concepts universally, irrespective of age, race, language, or demography. The fine-grained interactive nature of sketches facilitates their application to various visual understanding tasks, like image retrieval, image generation or editing, segmentation, 3D shape modelling, etc. However, sketches are highly abstract and subjective, depending on the perception of the individual. Although most agree that sketches provide fine-grained control to the user to depict a visual object, many consider sketching a tedious process due to their limited sketching skills compared to other query/support modalities like text/tags. Furthermore, collecting fine-grained sketch-photo associations is a significant bottleneck to commercialising sketch applications. Therefore, this thesis aims to progress sketch-based visual understanding towards more practicality. Comment: PhD thesis successfully defended by Ayan Kumar Bhunia. Supervisor: Prof. Yi-Zhe Song. Thesis Examiners: Prof. Stella Yu and Prof. Adrian Hilton

    The affective extension of ‘Family’ in the context of changing elite business networks

    Get PDF
    Drawing on 49 oral-history interviews with Scottish family business owner-managers, six key-informant interviews, and secondary sources, this interdisciplinary study analyses the decline of kinship-based connections and the emergence of new kinds of elite networks around the 1980s. As the socioeconomic context changed rapidly during this time, cooperation built primarily around literal family ties could not survive unaltered. Instead of finding unity through bio-legal family connections, elite networks now came to redefine their ‘family businesses’ in terms of affectively loaded ‘family values’ such as loyalty, care, commitment, and even ‘love’. Consciously nurturing ‘as-if-family’ emotional and ethical connections arose as a psychologically effective way to bring together network members who did not necessarily share pre-existing connections of bio-legal kinship. The social-psychological processes involved in this extension of the ‘family’ can be understood using theories of the moral sentiments first developed in the Scottish Enlightenment. These theories suggest that, when the context is amenable, family-like emotional bonds can be extended via sympathy to those to whom one is not literally related. As a result of this ‘progress of sentiments’, one now earns his/her place in a Scottish family business, not by inheriting or marrying into it, but by performing family-like behaviours motivated by shared ethics and affects

    TALK COMMONSENSE TO ME! ENRICHING LANGUAGE MODELS WITH COMMONSENSE KNOWLEDGE

    Get PDF
    Human cognition is exciting: it is a mesh of several neural phenomena that drive our ability to constantly reason and infer about the surrounding world. In cognitive computer science, Commonsense Reasoning is the term given to our ability to infer uncertain events and reason about Cognitive Knowledge. The introduction of Commonsense to intelligent systems has long been desired, but the mechanism for this introduction remains a scientific jigsaw. Some implicitly believe that language understanding is enough to achieve some level of Commonsense [90]. On less common ground, there are others who think that enriching language with Knowledge Graphs might be enough for human-like reasoning [63], while others believe human-like reasoning can only be truly captured with symbolic rules and logical deduction powered by Knowledge Bases, such as taxonomies and ontologies [50]. We focus on integrating Commonsense Knowledge into Language Models, because we believe that this integration is a step towards a beneficial embedding of Commonsense Reasoning in interactive Intelligent Systems, such as conversational assistants. Conversational assistants, such as Amazon's Alexa, are user-driven systems. Thus, a more human-like interaction is strongly desired in order to truly capture the user's attention and empathy. We believe that such humanistic characteristics can be leveraged through the introduction of stronger Commonsense Knowledge and Reasoning to fruitfully engage with users. To this end, we introduce a new family of models, the Relation-Aware BART (RA-BART), leveraging the language generation abilities of BART [51] with explicit Commonsense Knowledge extracted from Commonsense Knowledge Graphs to further extend the capabilities of these models. We evaluate our model on three different tasks: Abstractive Question Answering, Text Generation conditioned on certain concepts, and a Multi-Choice Question Answering task. We find that, on generation tasks, RA-BART outperforms non-knowledge-enriched models; however, it underperforms on the multi-choice question answering task. Our project can be consulted in our open-source, public GitHub repository (Explicit Commonsense).
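
    One common way to realize the kind of knowledge enrichment described above is to serialize commonsense triples retrieved for concepts in the input and prepend them to the source text of a seq2seq model. The sketch below illustrates that general idea with a vanilla BART checkpoint; it is not the RA-BART architecture, and the KNOWLEDGE dictionary is a hypothetical stand-in for a real Commonsense Knowledge Graph.

```python
# Illustrative knowledge-enriched input for a seq2seq model: retrieved triples are
# serialized and prepended to the question. Not the RA-BART architecture.
from transformers import BartTokenizer, BartForConditionalGeneration

# Hypothetical commonsense triples keyed by concept (e.g., as extracted from a KG).
KNOWLEDGE = {
    "umbrella": [("umbrella", "UsedFor", "staying dry"),
                 ("umbrella", "AtLocation", "closet")],
}

def enrich(question: str) -> str:
    """Prepend serialized triples for any concept mentioned in the question."""
    triples = [t for c, ts in KNOWLEDGE.items() if c in question.lower() for t in ts]
    serialized = " ".join(f"<{h} {r} {t}>" for h, r, t in triples)
    return f"{serialized} {question}" if serialized else question

if __name__ == "__main__":
    tok = BartTokenizer.from_pretrained("facebook/bart-base")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
    src = enrich("Why would someone carry an umbrella?")
    ids = tok(src, return_tensors="pt").input_ids
    out = model.generate(ids, max_length=32)
    print(tok.decode(out[0], skip_special_tokens=True))
```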

    InterTracker: Discovering and Tracking General Objects Interacting with Hands in the Wild

    Full text link
    Understanding human interaction with objects is an important research topic for embodied Artificial Intelligence, and identifying the objects that humans are interacting with is a primary problem for interaction understanding. Existing methods rely on frame-based detectors to locate interacting objects. However, this approach is susceptible to heavy occlusions, background clutter, and distracting objects. To address these limitations, in this paper we propose to leverage the spatio-temporal information of hand-object interaction to track interactive objects under these challenging cases. Unlike standard object-tracking problems, where prior knowledge of the object to be tracked is given, we first utilize the spatial relation between hands and objects to adaptively discover the interacting objects from the scene. Second, the consistency and continuity of the appearance of objects between successive frames are exploited to track the objects. With this tracking formulation, our method also benefits from training on large-scale general object-tracking datasets. We further curate a video-level hand-object interaction dataset from 100DOH for testing and evaluation. The quantitative results demonstrate that our proposed method outperforms the state-of-the-art methods. Specifically, in scenes with continuous interaction with different objects, we achieve an improvement of about 10% as evaluated using the Average Precision (AP) metric. Our qualitative findings also illustrate that our method can produce more continuous trajectories for interacting objects. Comment: IROS 2023
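
    The spatial-relation cue mentioned above can be illustrated with a simple heuristic: score each candidate object box by its overlap with the detected hand box and the proximity of the two box centres, and pick the best-scoring candidate as the interacting object. The scoring below is an assumption for illustration, not the method used by InterTracker.

```python
# Illustrative heuristic for picking the object a hand is interacting with, based on
# box overlap and centre distance. Not the scoring used by InterTracker.
import numpy as np

def box_iou(a, b):
    """a, b: boxes as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def pick_interacting_object(hand_box, object_boxes):
    """Score each candidate by overlap with the hand and proximity of box centres."""
    hand_c = np.array([(hand_box[0] + hand_box[2]) / 2, (hand_box[1] + hand_box[3]) / 2])
    best, best_score = None, -np.inf
    for i, obj in enumerate(object_boxes):
        obj_c = np.array([(obj[0] + obj[2]) / 2, (obj[1] + obj[3]) / 2])
        score = box_iou(hand_box, obj) - 0.001 * np.linalg.norm(hand_c - obj_c)
        if score > best_score:
            best, best_score = i, score
    return best

if __name__ == "__main__":
    hand = (100, 100, 160, 160)
    candidates = [(300, 300, 360, 360), (140, 110, 220, 180)]
    print(pick_interacting_object(hand, candidates))   # -> 1 (the overlapping box)
```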

    HANDLING CHANGE IN A PRODUCTION TASKBOT. EFFICIENTLY MANAGING THE GROWTH OF TWIZ, AN ALEXA ASSISTANT

    Get PDF
    A Conversational Agent aims to converse with users, with a focus on natural behaviour and responses. Such agents can be extremely complex, as they are made up of several components, many possible courses of action, and an effectively infinite space of user inputs. Behaviour checking is therefore essential, especially in a production context, where incorrect behaviour can have significant consequences. Nevertheless, developing a robust and correctly behaving Task Bot should not hinder research and must allow for continuous improvement of state-of-the-art solutions. Manual testing of such a complex system is bound to run into limits, either in the extent of the testing or in the time it consumes from developers. We therefore propose a tool to automatically test these highly sophisticated systems with a much broader test surface. We introduce a solution that leverages the replay and mimicking of past conversations to generate synthetic conversations. This allows for time savings on quality assurance and better handling of change. A key part of a Conversational Agent is the retrieval component, which is responsible for retrieving information that is useful to the user. In task-guiding assistants, the retrieval element should not narrow the user's behaviour by omitting tasks that could be relevant. However, achieving perfect matching of information to a user's query is arduous, since there is a plethora of ways a user could phrase a request for the same objective. To tackle this, we make use of a semantic retrieval algorithm, adapting it to this domain by generating a synthetic dataset.
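
    A minimal sketch of the replay-and-compare idea described in the abstract: logged conversations are re-sent to the current build of the assistant and the new responses are diffed against the recorded ones to flag behaviour changes. The agent_respond stub is a hypothetical stand-in for the actual TWIZ dialogue pipeline.

```python
# Illustrative regression test for a conversational agent via conversation replay.
# agent_respond is a trivial placeholder, not the production system.
from dataclasses import dataclass

@dataclass
class Turn:
    user: str
    expected: str

def agent_respond(state: dict, user_utterance: str) -> str:
    """Placeholder for the production agent; replace with a call into the real system."""
    if "start" in user_utterance.lower():
        state["task"] = "banana bread"
        return "Sure, let's make banana bread. Step 1: preheat the oven."
    return "Sorry, I didn't catch that."

def replay_conversation(turns):
    """Replay a logged conversation and report turns whose responses changed."""
    state, regressions = {}, []
    for i, turn in enumerate(turns):
        actual = agent_respond(state, turn.user)
        if actual != turn.expected:
            regressions.append((i, turn.expected, actual))
    return regressions

if __name__ == "__main__":
    logged = [Turn("Start the banana bread recipe",
                   "Sure, let's make banana bread. Step 1: preheat the oven."),
              Turn("What's next?", "Step 2: mash the bananas.")]
    for idx, want, got in replay_conversation(logged):
        print(f"turn {idx}: expected {want!r}, got {got!r}")
```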

    3D Printing in the Era of the Prosumer: The Role of Technology Readiness, Gender, and Age in User Acceptance of Desktop 3D Printing in American Households

    Get PDF
    Technology acceptance of Desktop 3D printing for fabrication at home is an emerging field of research in Asia and Europe. The proposal explains how Desktop 3D printing provides an innovative manufacturing alternative to traditional manufacturing processes and as such facilitates innovation among prosumers. The link between such innovations and their potential to sustain economic growth is also explained, thus substantiating the need to understand the technology acceptance of Desktop 3D printing for fabrication at home. The unified theory of acceptance and use of technology (UTAUT) model (Williams et al., 2015) was the most commonly used model in previous research to study the adoption of Desktop 3D printing for fabrication at home. The current research proposes an extension to the UTAUT model that accounts for the Technology Readiness of the individual. The extended UTAUT model is applied to study the acceptance of Desktop 3D printing for fabrication in American households, which will be a new contribution to the literature. Partial Least Squares Structural Equation Modeling (PLS-SEM) is proposed to analyze the extended UTAUT model and determine the key factors that influence the acceptance of Desktop 3D printing. A multi-group analysis based on gender is also proposed to identify how significant the differences in the key factors are. This research contributes theoretically to the emerging stream of research that focuses on integrating technology acceptance theories with the Technology Readiness concept. Practically, this research contributes to the techno-marketing literature for 3D printer manufacturers that seek to increase the adoption rate of Desktop 3D printers by women in American households.