    Predicting Activities from Smartphones

    Built-in hardware sensors in many modern smartphones, such as accelerometers and gyroscopes, open up a wide range of opportunities for novel applications based on the context perceived from the data they provide. Human activity recognition (HAR) is a direct application of this technology; despite being a very active field of study in recent years, it still leaves many strategies to explore and key aspects to address. A commonly ignored challenge of HAR is the difference in the input signals produced by different people performing the same activities. As a result, the activity classification method should be able to generate results adapted to each individual user. This document proposes and explores a solution to this problem by means of Online Semi-supervised Learning, an underexplored incremental approach capable of adapting the classification model to the user of the application by continuously updating it as data from the user's own input signals arrives. The ideal scenario of this project would be a smartphone application that can classify a new user's activities from the outset with a certain error; then, as time passes and the user uses the application, without any manual input, the system's classification error would decrease autonomously until it is virtually insignificant for that specific user. Several classification models will be generated from different online semi-supervised approaches, and then evaluated and compared in order to decide on the best fit.
The success of this approach would enable a wide range of applications and could considerably enhance the current interaction between people and their mobile devices, taking the concept of the "smartphone" to a whole new level.
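
As a rough illustration of the online semi-supervised idea described in this abstract, the Python sketch below adapts a nearest-centroid classifier to a new user by self-training: each unlabeled sample is classified, and it is folded into the predicted class only when the prediction looks confident. The classifier choice, the `margin` confidence rule, and all names are assumptions made for illustration; they are not the models evaluated in the document.

```python
import math


class OnlineSelfTrainingCentroids:
    """Nearest-centroid classifier that updates itself from unlabeled data."""

    def __init__(self, margin=0.5):
        self.margin = margin   # hypothetical confidence threshold
        self.centroids = {}    # label -> running mean feature vector
        self.counts = {}       # label -> number of samples absorbed

    def _update(self, label, x):
        # Incremental mean: c <- c + (x - c) / n
        n = self.counts.get(label, 0) + 1
        c = self.centroids.get(label, [0.0] * len(x))
        self.centroids[label] = [ci + (xi - ci) / n for ci, xi in zip(c, x)]
        self.counts[label] = n

    def fit_labeled(self, samples):
        """Build the initial, user-independent model from (x, y) pairs."""
        for x, y in samples:
            self._update(y, x)

    def _ranked(self, x):
        # (distance, label) pairs, closest centroid first.
        return sorted((math.dist(x, c), label)
                      for label, c in self.centroids.items())

    def observe(self, x):
        """Classify one unlabeled sample; self-train only if confident."""
        ranked = self._ranked(x)
        best_dist, best_label = ranked[0]
        # Confident when the runner-up is clearly farther than the winner.
        confident = len(ranked) == 1 or ranked[1][0] - best_dist >= self.margin
        if confident:
            self._update(best_label, x)  # pseudo-label update
        return best_label, confident


model = OnlineSelfTrainingCentroids(margin=0.5)
model.fit_labeled([([0.0, 0.0], "still"), ([4.0, 4.0], "walking")])

# Unlabeled samples from a new user whose "walking" signal is shifted.
for x in ([3.0, 3.0], [2.8, 3.2], [3.1, 2.9]):
    model.observe(x)
```

After the loop, the "walking" centroid has drifted from (4, 4) toward the new user's samples. A sample lying midway between the two centroids would still receive a label but would not be used for updating, which is one simple way to limit error propagation from wrong pseudo-labels.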

    Active Collaboration of Classifiers for Visual Tracking

    Discriminative visual trackers have recently achieved state-of-the-art performance, yet they still struggle with real-world challenges such as target motion and appearance changes. In a discriminative tracker, one or more classifiers are employed to obtain the target/nontarget label for the samples, which in turn determine the target’s location. To cope with variations in the target’s shape and appearance, the classifier(s) are updated online with different samples of the target and the background. Sample selection, labeling, and classifier updating are all prone to various sources of error that cause the tracker to drift. In this study, we motivate, conceptualize, realize, and formalize a novel active co-tracking framework step by step, to demonstrate the challenges and generic solutions to them. In this framework, the classifiers not only cooperate in labeling the samples but also exchange information to make the labeling more robust, improve the sampling, and realize efficient yet effective updating. The proposed framework is evaluated against state-of-the-art trackers on a public dataset and shows promising results.
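
One possible reading of the cooperative labeling step is sketched below in Python: a primary classifier labels the samples it is sure about and asks a second classifier only for the uncertain ones. The function name `co_label`, the signed-score convention, and the `uncertainty` threshold are all hypothetical; the abstract does not specify how the classifiers exchange information.

```python
def co_label(samples, primary_score, secondary_score, uncertainty=0.3):
    """Label samples as target (+1) or non-target (-1).

    primary_score and secondary_score are callables returning a signed
    score, positive for target. `uncertainty` is a hypothetical threshold.
    Returns (labels, queried), where `queried` holds the indices for which
    the primary classifier was unsure and consulted the secondary one.
    """
    labels, queried = [], []
    for i, sample in enumerate(samples):
        score = primary_score(sample)
        if abs(score) < uncertainty:
            # Primary is unsure: ask the collaborator for this sample only.
            score = secondary_score(sample)
            queried.append(i)
        labels.append(1 if score >= 0 else -1)
    return labels, queried


# Toy samples with two features; each classifier looks at one of them.
samples = [(0.9, 0.5), (0.1, -0.7), (-0.8, 0.2), (-0.2, 0.6)]
labels, queried = co_label(samples,
                           primary_score=lambda s: s[0],
                           secondary_score=lambda s: s[1])
```

In this toy run the primary classifier is unsure about the second and fourth samples, so only those are sent to the collaborator. The queried samples are natural candidates for updating the primary classifier, since they are the ones it could not label on its own.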

    Optimization-Free Test-Time Adaptation for Cross-Person Activity Recognition

    Human Activity Recognition (HAR) models often suffer from performance degradation in real-world applications due to distribution shifts in activity patterns across individuals. Test-Time Adaptation (TTA) is an emerging learning paradigm that aims to utilize the test stream to adjust predictions in real-time inference, which has not been explored in HAR before. However, the high computational cost of optimization-based TTA algorithms makes them intractable to run on resource-constrained edge devices. In this paper, we propose an Optimization-Free Test-Time Adaptation (OFTTA) framework for sensor-based HAR. OFTTA adjusts the feature extractor and linear classifier simultaneously in an optimization-free manner. For the feature extractor, we propose Exponential Decay Test-time Normalization (EDTN) to replace the conventional batch normalization (CBN) layers. EDTN combines CBN and Test-time batch Normalization (TBN) to extract reliable features against domain shifts, with TBN's influence decreasing exponentially in deeper layers. For the classifier, we adjust the prediction by computing the distance between the feature and the prototype, which is calculated from a maintained support set. In addition, the update of the support set is based on the pseudo label, which can benefit from the reliable features extracted by EDTN. Extensive experiments on three public cross-person HAR datasets and two different TTA settings demonstrate that OFTTA outperforms the state-of-the-art TTA approaches in both classification performance and computational efficiency. Finally, we verify the superiority of our proposed OFTTA on edge devices, indicating possible deployment in real applications. Our code is available at https://github.com/Claydon-Wang/OFTTA. Comment: To be presented at UbiComp 2024; accepted by Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT).
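
The two optimization-free components described in this abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the OFTTA implementation: the decay schedule `decay ** layer_index`, the linear mixing of statistics, and every function and class name here are hypothetical, since the abstract only says that TBN's influence decreases exponentially with depth and that prototypes come from a pseudo-labeled support set.

```python
import math
from statistics import fmean, pvariance


def tbn_weight(layer_index, decay=0.5):
    """Weight given to test-batch statistics; shrinks with layer depth.

    The schedule decay ** layer_index is an assumption for illustration.
    """
    return decay ** layer_index


def edtn_normalize(batch, running_mean, running_var, layer_index,
                   decay=0.5, eps=1e-5):
    """Normalize one feature channel with mixed CBN/TBN statistics."""
    w = tbn_weight(layer_index, decay)
    # Blend test-batch statistics (TBN) with stored training ones (CBN).
    mean = w * fmean(batch) + (1 - w) * running_mean
    var = w * pvariance(batch) + (1 - w) * running_var
    return [(x - mean) / math.sqrt(var + eps) for x in batch]


class PrototypeClassifier:
    """Nearest-prototype prediction over a pseudo-labeled support set."""

    def __init__(self, initial_support):
        # initial_support: label -> list of feature vectors
        self.support = {label: [list(v) for v in vectors]
                        for label, vectors in initial_support.items()}

    def prototype(self, label):
        # Prototype = per-dimension mean of the class's support features.
        return [fmean(dim) for dim in zip(*self.support[label])]

    def predict_and_update(self, feature):
        label = min(self.support,
                    key=lambda k: math.dist(feature, self.prototype(k)))
        self.support[label].append(list(feature))  # pseudo-label update
        return label


clf = PrototypeClassifier({"sit": [[0.0, 0.0]], "walk": [[4.0, 4.0]]})
label = clf.predict_and_update([3.0, 3.5])
```

With `layer_index=0` the normalization uses only the test batch statistics, while for deep layers it falls back almost entirely to the stored training statistics. No gradient step appears anywhere, which is the sense in which such a scheme is optimization-free.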

    Gesture-Based Robot Path Shaping

    For many individuals, aging is frequently associated with diminished mobility and dexterity. Such decreases may be accompanied by a loss of independence, increased burden to caregivers, or institutionalization. It is foreseen that the ability to retain independence and quality of life as one ages will increasingly depend on environmental sensing and robotics which facilitate aging in place. The development of ubiquitous sensing strategies in the home underpins the promise of adaptive services, assistive robotics, and architectural design which would support a person's ability to live independently as they age. Instrumentation (sensors and processing) which is capable of recognizing the actions and behavioral patterns of an individual is key to the effective component design in these areas. Recognition of user activity and the inference of user intention may be used to inform the action plans of support systems and service robotics within the environment. Automated activity recognition involves detection of events in a sensor data stream, conversion to a compact format, and classification as one of a known set of actions. Once classified, an action may be used to elicit a specific response from those systems designed to provide support to the user. It is this response that is the ultimate use of recognized activity. Hence, the activity may be considered as a command to the system. Extending this concept, a set of distinct activities in the form of hand and arm gestures may form the basis of a command interface for human-robot interaction. A gesture-based interface of this type promises an intuitive method for accessing computing and other assistive resources so as to promote rapid adoption by elderly, impaired, or otherwise unskilled users. This thesis includes a thorough survey of relevant work in the area of machine learning for activity and gesture recognition. Previous approaches are compared for their relative benefits and limitations.
A novel approach is presented which utilizes user-generated feedback to rate the desirability of a robotic response to gesture. Poorly rated responses are altered so as to elicit improved ratings on subsequent observations. In this way, responses are honed toward increasing effectiveness. A clustering method based on the Growing Neural Gas (GNG) algorithm is used to create a topological map of reference nodes representing input gesture types. It is shown that learning of desired responses to gesture may be accelerated by exploiting well-rewarded actions associated with reference nodes in a local neighborhood of the growing neural gas topology. Significant variation in the user's performance of gestures is interpreted as a new gesture for which the system must learn a desired response. A method for allowing the system to learn new gestures while retaining past training is also proposed and shown to be effective.

    Human-robot interaction and computer-vision-based services for autonomous robots

    Imitation Learning (IL), or robot Programming by Demonstration (PbD), covers methods by which a robot learns new skills through human guidance and imitation. PbD takes its inspiration from the way humans learn new skills by imitation in order to develop methods by which new tasks can be transferred to robots.
This thesis is motivated by the generic question of “what to imitate?”, which concerns the problem of how to extract the essential features of a task. To this end, we adopt the perspective of Action Recognition (AR) in order to allow the robot to decide what has to be imitated or inferred when interacting with a human. The proposed approach is based on a well-known method from natural language processing, namely Bag of Words (BoW). This method is applied to large databases in order to obtain a trained model. Although BoW is a machine learning technique used in various fields of research, in action classification for robot learning it is far from accurate. Moreover, it focuses on the classification of objects and gestures rather than actions. Thus, in this thesis we show that the method is suitable, in action classification scenarios, for merging information from different sources or different trials. This thesis makes three contributions: (1) it proposes a general method for dealing with action recognition and thus contributing to imitation learning; (2) the methodology can be applied to large databases that include different modes of action capture; and (3) the method is applied specifically in a real international innovation project called Vinbot.
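
To make the Bag of Words representation concrete, the Python sketch below quantizes local descriptors against a codebook and builds a normalized histogram, then pools descriptors from two sources before quantization as one simple way to merge information. The codebook values, the source names `camera` and `wearable`, and the pooling strategy are assumptions for illustration; the abstract does not say how the thesis performs the fusion.

```python
import math


def bow_histogram(descriptors, codebook):
    """Normalized histogram of nearest-codeword assignments."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda i: math.dist(d, codebook[i]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Hypothetical 2-D codebook; in practice a codebook is usually learned by
# clustering (for example k-means) over descriptors from the training data.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# Descriptors of the same action captured by two hypothetical sources.
camera = [[0.1, 0.0], [0.9, 0.1], [1.1, -0.1]]
wearable = [[0.0, 0.9], [0.1, 1.2]]

# Pooling descriptors before quantization is one simple way to fuse sources.
fused = bow_histogram(camera + wearable, codebook)
```

The resulting fixed-length histogram can be fed to any standard classifier regardless of how many descriptors each recording produced, which is what makes the representation convenient for combining trials or capture modes of different lengths.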