10 research outputs found

    Recognizing Hospital Care Activities with a Coat Pocket Worn Smartphone

    Get PDF
    In this work, we show how a smartphone worn unobtrusively in a nurse's coat pocket can be used to document the patient care activities performed during a regular morning routine. The main contribution is to show how, taking into account certain domain-specific boundary conditions, a single sensor node worn in such an unfavorable location (from the sensing point of view) can still recognize complex, sometimes subtle activities. We evaluate our approach on a large real-life dataset from day-to-day hospital operation. In total, 4 runs of patient care per day were collected for 14 days at a geriatric ward and annotated in high detail by following the performing nurses for the entire duration. This amounts to over 800 hours of sensor data, including acceleration, gyroscope, compass, WiFi and sound, annotated with ground truth at less than 1 min resolution.
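    A common first step for such pocket-worn inertial data is sliding-window feature extraction before classification. Below is a generic sketch, not the authors' exact pipeline, under assumed parameters (50 Hz sampling, 2 s windows, 50% overlap):

```python
import numpy as np

def window_features(acc: np.ndarray, rate_hz: int = 50,
                    win_s: float = 2.0, overlap: float = 0.5):
    """Slide a fixed-length window over a (n_samples, 3) accelerometer
    stream and yield one feature vector per window: mean, standard
    deviation, and mean energy for each axis."""
    win = int(win_s * rate_hz)
    step = max(1, int(win * (1.0 - overlap)))
    for start in range(0, len(acc) - win + 1, step):
        w = acc[start:start + win]
        yield np.concatenate([w.mean(axis=0),
                              w.std(axis=0),
                              (w ** 2).mean(axis=0)])

# Example: 10 s of synthetic 50 Hz data -> nine-dimensional features.
feats = np.array(list(window_features(np.random.randn(500, 3))))
print(feats.shape)  # (9, 9): 9 windows, 9 features each
```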

    A database for fine grained activity detection of cooking activities

    Full text link

    Daily Routine Recognition for Hearing Aid Personalization

    Get PDF
    This work focuses on daily routine recognition to personalize the hearing aid (HA) configuration for each user. So far, there is only one public dataset containing data from two acceleration sensors recorded under unconstrained real-life conditions of a single person. We therefore create a realistic and extensive dataset with seven subjects and a total length of 63,449 min. For the recordings, the HA streams the acceleration and audio data to a mobile phone, where the user simultaneously annotates it. This forms the basis for our comprehensive simulations, in which we train a set of classifiers in an offline and an online manner to analyze how well models generalize across subjects for high-level activities. To achieve this, we build a feature representation that describes recurring daily situations and environments well. For the offline classification, the deep neural network, multi-layer perceptron (MLP), and random forest (RF) trained in a person-dependent manner achieve the best F-measures of 86.6%, 87.1%, and 87.3%, respectively. We confirm that for high-level activities the person-dependent model outperforms the person-independent one. In our online experiments, we personalize a model that was pretrained in a person-independent manner through daily updates, testing multiple incremental learners and an online RF. We demonstrate that the MLP and RF improve the F-measure compared to the offline baselines.
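    The pretrain-then-personalize scheme described above can be approximated with any incrementally trainable classifier. A minimal sketch using scikit-learn's MLPClassifier with partial_fit on synthetic stand-in data; the paper's own incremental learners and online RF may differ:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_feat, n_classes = 20, 8  # hypothetical feature and activity counts

# Person-independent pretraining on pooled data from other subjects.
X_pool = rng.normal(size=(2000, n_feat))
y_pool = rng.integers(0, n_classes, size=2000)
model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=200)
model.fit(X_pool, y_pool)

# Daily personalization: one incremental update per day of the
# target user's freshly annotated data.
for day in range(7):
    X_day = rng.normal(size=(200, n_feat))
    y_day = rng.integers(0, n_classes, size=200)
    model.partial_fit(X_day, y_day)
```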

    The Opportunity challenge: A benchmark database for on-body sensor-based activity recognition

    Get PDF
    There is growing interest in using ambient and wearable sensors for human activity recognition, fostered by several application domains and the wider availability of sensing technologies. This has triggered increasing attention to the development of robust machine learning techniques that exploit multimodal sensor setups. However, unlike in other fields, there are no established benchmarking problems for activity recognition: methods are usually tested on custom datasets acquired in very specific experimental setups, and data are seldom shared between groups. Our goal is to address this issue by introducing a versatile human activity dataset recorded in a sensor-rich environment. This database was the basis of an open challenge on activity recognition. We report here the outcome of this challenge, as well as baseline performance using different classification techniques. We expect this benchmarking database will motivate other researchers to replicate and outperform the presented results, thus contributing to further advances in the state of the art of activity recognition methods.
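    As an illustration of how baseline performance on such a benchmark is typically computed, here is a minimal sketch with a k-nearest-neighbors classifier and a class-weighted F1 score on placeholder data; the challenge's official tasks, splits, and metrics are not reproduced here:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)
# Placeholder feature matrices standing in for multimodal sensor features.
X_train = rng.normal(size=(5000, 30))
y_train = rng.integers(0, 5, size=5000)
X_test = rng.normal(size=(1000, 30))
y_test = rng.integers(0, 5, size=1000)

baseline = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
# Weighted F1 is a common choice for imbalanced activity labels.
print(f1_score(y_test, baseline.predict(X_test), average="weighted"))
```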

    Early Abstraction of Inertial Sensor Data for Long-Term Deployments

    Get PDF
    Advances in microelectronics over the last decades have led to the miniaturization of computing devices and sensors. A driving force for using these in various application scenarios is the desire to capture physical phenomena from the environment, objects and living entities. We investigate sensing in two particularly challenging applications: one where small sensor modules are worn by people to detect their activities, and one where wirelessly networked sensors observe events over an area. This thesis takes a data-driven approach, focusing on human motion and on vibrations caused by trains, both captured by accelerometer sensors as time series to be analyzed for characteristic patterns. In both cases, the acceleration sensor must be sampled at relatively high rates in order to capture the essence of the phenomena, and must remain active for long stretches of time. The large amounts of gathered sensor data demand novel approaches that can swiftly process the data while guaranteeing accurate classification results. The following contributions are made in particular:

    * A data logger suited to the requirements of long-term deployments is designed and evaluated. In a power profiling study, both hardware components and firmware parameters are thoroughly tested, revealing that the sensor can log acceleration data at a sampling rate of 100 Hertz for up to 14 full days on a single battery charge.

    * A technique is proposed that swiftly and accurately abstracts an original signal with a set of linear segments, preserving its shape while being twice as fast as a similar method (see the sketch after this abstract). This allows more efficient pattern matching, since for each pattern only a fraction of data points must be considered. A second study shows that this algorithm can perform data abstraction directly on a data logger with limited resources.

    * The railway monitoring scenario requires streaming vibration data to be analyzed for sparse and complex events directly on the sensor node, extracting relevant information such as train type or length from the shape of the vibration footprint. In a study conducted on real-world data, a set of efficient shape features is identified that facilitates train type prediction and length estimation with very high accuracy.

    * To achieve fast and accurate activity recognition for long-term monitoring of bipolar patients, we present an approach that relies on the salience of motion patterns (motifs) characteristic of the target activity. These motifs are accumulated using a symbolic abstraction that encodes the shape of the original signal. A large-scale study shows that a simple bag-of-words classifier trained with extracted motifs is on par with traditional approaches in accuracy while being much faster.

    * Some activities are hard to predict from acceleration data alone with the aforementioned approach. We argue that human-object interactions, captured as human motion plus grasped objects detected through RFID, are an ideal supplement. A custom bracelet-like antenna that detects objects at up to 14 cm is proposed, along with a novel benchmark to evaluate such wearable setups.

    By aiming at wearable and wirelessly networked sensor systems, these contributions apply to particularly challenging applications that require long-term deployments of miniature sensors in general. They form the basis of a framework for efficient event detection that relies heavily on early data abstraction and shape-based features for time series, while focusing less on the classification techniques.
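    The linear-segment abstraction named in the second contribution can be illustrated with a greedy sliding-window piecewise linear approximation. This is a minimal sketch of the general technique under an assumed residual threshold max_err, not the thesis's faster algorithm:

```python
import numpy as np

def segment_signal(x: np.ndarray, max_err: float):
    """Greedy sliding-window piecewise linear approximation.

    Grow each segment while the maximum residual of a least-squares
    line stays within max_err; on violation, close the segment and
    start the next one at the breakpoint. Returns (start, end) index
    pairs covering the whole signal."""
    segments, start, end = [], 0, 2
    while end <= len(x):
        t = np.arange(start, end)
        slope, icpt = np.polyfit(t, x[start:end], 1)
        if np.abs(slope * t + icpt - x[start:end]).max() > max_err:
            segments.append((start, end - 2))  # last window that fit
            start, end = end - 2, end          # segments share endpoints
        else:
            end += 1
    segments.append((start, len(x) - 1))
    return segments

# Example: a noisy sine collapses to a handful of line segments.
sig = np.sin(np.linspace(0, 6, 300))
print(segment_signal(sig, max_err=0.05))
```

    Matching a pattern against the segment endpoints instead of every raw sample is what makes the subsequent pattern matching cheaper.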

    Desarrollo y versatilidad del algoritmo de discretización Ameva (Development and versatility of the Ameva discretization algorithm)

    Get PDF
    This thesis, presented as a compendium of research papers, studies the problem of activity recognition and fall detection on mobile devices, where battery drain and system accuracy are the main research concerns. These problems are tackled by establishing a new selection, discretization and classification algorithm built around the core of the Ameva algorithm. Thanks to the discretization process, the system is efficient in terms of both energy and accuracy. The new activity recognition algorithm has been designed to run on mobile devices and smartphones, where energy consumption is the most important characteristic to take into account; it is also accurate and delivers its output in real time. These features were tested both on a wide range of mobile devices, using different state-of-the-art activity recognition datasets, and in real scenarios such as the EvAAL competition, where people unrelated to the research team carried a smartphone running the developed system. Overall, the algorithm achieves a trade-off between accuracy and energy consumption. It was presented in the Activity Recognition track of the EvAAL competition (Evaluation of Ambient Assisted Living Systems through Competitive Benchmarking), whose main objective is the measurement of hardware and software performance, and was able to detect the activities defined by the established set of benchmarks and evaluation metrics. It was developed for multi-class datasets and obtains good accuracy when the training set is approximately balanced, i.e., contains roughly the same number of examples per class. The solution won first prize in the 2012 edition and third prize in the 2013 edition.
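    For context, the Ameva criterion at the core of the algorithm selects the discretization that maximizes Ameva(k) = chi^2(k) / (k(l - 1)), the contingency-table chi-square normalized by the number of intervals k and classes l. A minimal sketch of evaluating this criterion for one candidate scheme (our own illustration, not the thesis code):

```python
import numpy as np

def ameva_value(contingency: np.ndarray) -> float:
    """Ameva criterion for one candidate discretization.

    contingency[i, j] = number of training samples of class i that
    fall into interval j of the candidate scheme. Assumes no class
    row and no interval column is entirely empty."""
    n = contingency.sum()
    expected = (contingency.sum(axis=1, keepdims=True)
                @ contingency.sum(axis=0, keepdims=True)) / n
    chi2 = ((contingency - expected) ** 2 / expected).sum()
    n_classes, n_intervals = contingency.shape
    # Chi-square normalized by k * (l - 1); larger is better.
    return chi2 / (n_intervals * (n_classes - 1))

# Example: 2 classes distributed over 3 candidate intervals.
print(ameva_value(np.array([[30.0, 5.0, 2.0],
                            [3.0, 10.0, 25.0]])))
```

    As described in the Ameva literature, cut points are typically added greedily while this value keeps increasing, so no predefined number of intervals is needed.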

    Combining visual recognition and computational linguistics: linguistic knowledge for visual recognition and natural language descriptions of visual content

    Get PDF
    Extensive efforts are being made to improve visual recognition and the semantic understanding of language. However, surprisingly little has been done to exploit the mutual benefits of combining the two fields. In this thesis we show how these different fields of research can profit from each other. First, we scale recognition to 200 unseen object classes and show how to extract robust semantic relatedness from linguistic resources. Our novel approach extends zero-shot to few-shot recognition and exploits unlabeled data by adopting label propagation for transfer learning. Second, we capture the high variability but low availability of composite activity videos by extracting the essential information from text descriptions. For this we recorded and annotated a corpus for fine-grained activity recognition. We show improvements in a supervised setting, but we are also able to recognize unseen composite activities. Third, we present a corpus of videos and aligned descriptions. We use it for grounding activity descriptions and for learning how to automatically generate natural language descriptions for a video. We show that our proposed approach is also applicable to image description and that it outperforms baselines and related work. In summary, this thesis presents a novel approach to automatic video description and shows the benefits of extracting linguistic knowledge for object and activity recognition, as well as the advantage of visual recognition for understanding activity descriptions.
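    The knowledge-transfer step for unseen classes can be sketched as follows (our own minimal illustration of similarity-based zero-shot transfer, not the thesis implementation): scores for unseen classes are formed as similarity-weighted combinations of trained known-class classifier outputs, with similarities drawn from a linguistic resource such as word embeddings:

```python
import numpy as np

def zero_shot_scores(known_scores: np.ndarray,
                     relatedness: np.ndarray) -> np.ndarray:
    """Score unseen classes by transferring known-class outputs.

    known_scores: (n_samples, n_known) classifier scores.
    relatedness:  (n_unseen, n_known) semantic similarity of each
                  unseen class to each known class; rows must have
                  a positive sum.
    Returns (n_samples, n_unseen) scores for the unseen classes."""
    # Normalize so each unseen class combines known classes convexly.
    w = relatedness / relatedness.sum(axis=1, keepdims=True)
    return known_scores @ w.T

scores = np.array([[0.8, 0.1, 0.1]])  # one sample, 3 known classes
rel = np.array([[0.6, 0.3, 0.1],      # 2 unseen classes
                [0.1, 0.2, 0.7]])
print(zero_shot_scores(scores, rel))
```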

    From head to toe: body movement for human-computer interaction

    Get PDF
    Our bodies are the medium through which we experience the world around us, so human-computer interaction can benefit greatly from the richness of body movements and postures as an input modality. In recent years, the widespread availability of inertial measurement units and depth sensors has led to a plethora of applications for the body in human-computer interaction. However, the main focus of these works has been on using the upper body for explicit input. This thesis investigates the research space of full-body human-computer interaction through three propositions. The first proposition is that there is more to be inferred from users' natural movements and postures, such as the quality of activities and psychological states. We develop this proposition in two domains. First, we explore how to support users in performing weight-lifting activities. We propose a system that classifies different ways of performing the same activity; an object-oriented, model-based framework for formally specifying activities; and a system that automatically extracts an activity model by demonstration. Second, we explore how to automatically capture nonverbal cues for affective computing. We developed a system that annotates motion and gaze data according to the Body Action and Posture coding system. We show that quality analysis can add another layer of information to activity recognition, and that systems that communicate quality information should strive to support the way we implicitly communicate movement through nonverbal channels. Further, we argue that by working at a higher level of abstraction, affect recognition systems can more directly translate findings from other areas into their algorithms, and also contribute new knowledge to those fields. The second proposition is that the lower limbs can provide an effective means of interacting with computers beyond assistive technology. To address the problem of the dispersed literature on the topic, we conducted a comprehensive survey of the lower body in HCI through the lenses of users, systems and interactions. To address the lack of a fundamental understanding of foot-based interactions, we conducted a series of studies that quantitatively characterise several aspects of foot-based interaction, including Fitts's Law performance models, the effects of movement direction, foot dominance and visual feedback, and the overhead incurred by using the feet together with the hands. To enable all these studies, we developed a foot tracker based on a Kinect mounted under the desk. We show that the lower body can be used as a valuable complementary modality for computer input. Our third proposition is that by treating body movements as multiple modalities, rather than a single one, we can enable novel user experiences. We develop this proposition in the domain of 3D user interfaces, as it requires input with multiple degrees of freedom and offers a rich set of complex tasks. We propose an approach for tracking the whole body up close by splitting the sensing of different body parts across multiple sensors. Our setup allows tracking gaze, head, mid-air gestures, multi-touch gestures, and foot movements. We investigate specific applications of multimodal combinations in the domain of 3DUI: how gaze and mid-air gestures can be combined to improve selection and manipulation tasks; how the feet can support the canonical 3DUI tasks; and how a multimodal sensing platform can inspire new 3D game mechanics. We show that the combination of multiple modalities can lead to enhanced task performance; that offloading certain tasks to alternative modalities not only frees the hands but also allows simultaneous control of multiple degrees of freedom; and that by sensing different modalities separately, we achieve more detailed and precise full-body tracking.
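    The Fitts's Law performance models mentioned above follow the standard Shannon formulation MT = a + b log2(D/W + 1), where D is target distance and W target width; fitting the constants a and b to measured movement times is a one-line linear regression. A sketch with made-up trial values:

```python
import numpy as np

# Hypothetical foot-pointing trials: distance D and width W in the
# same units, with the measured movement times mt in seconds.
D = np.array([128, 256, 512, 512])
W = np.array([32, 32, 64, 16])
mt = np.array([0.61, 0.78, 0.74, 0.98])

ID = np.log2(D / W + 1)       # index of difficulty, in bits
b, a = np.polyfit(ID, mt, 1)  # least-squares fit of MT = a + b * ID
print(f"MT = {a:.3f} + {b:.3f} * ID; throughput = {1 / b:.2f} bits/s")
```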

    Articulated people detection and pose estimation in challenging real world environments

    Get PDF
    In this thesis we are interested in the problem of articulated people detection and pose estimation, key ingredients towards understanding visual scenes containing people. First, we investigate how statistical 3D human shape models from computer graphics can be leveraged to ease training data generation. We propose a set of automatic data generation techniques that directly represent the relevant variations in the training data. By sampling from the underlying distribution of human shapes and from a large dataset of human poses, we generate a new, task-relevant training set with controllable variations of shape and pose. In addition, we improve the state-of-the-art 3D human shape model itself by rebuilding it from a large commercially available dataset of 3D body scans. Second, we develop expressive spatial and appearance models for 2D single- and multi-person pose estimation. We propose an expressive single-person model that incorporates higher-order part dependencies while remaining efficient, and we strengthen it with several types of strong appearance representations that considerably improve the body part hypotheses. We then propose an expressive model for joint pose estimation of multiple people, building on strong deep-learning-based body part detectors and a fully connected spatial model. This approach treats multi-person pose estimation as a problem of jointly partitioning and labeling a set of body part hypotheses: it infers the number of people in a scene, identifies occluded body parts, and unambiguously distinguishes the body parts of people close to one another. Third, we introduce a novel human pose estimation benchmark that makes a significant advance in terms of diversity and difficulty, containing over 40,000 annotated body poses and more than 1.5 million frames, together with a rich set of annotations used for a detailed analysis of competing approaches and of their successes and failures. Thorough experimental evaluation on standard benchmarks demonstrates significant improvements due to the proposed data augmentation techniques and novel body models, while the detailed performance analysis of competing approaches on our novel benchmark allows us to identify the most promising directions for improvement.
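    The data-generation idea of sampling task-relevant bodies with controllable variation can be caricatured in a few lines. This is a toy sketch with random stand-in coefficients; the thesis works with a statistical 3D human shape model, not raw Gaussians:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical shape coefficients of real training bodies
# (rows: subjects, columns: PCA-style shape dimensions).
real_shapes = rng.normal(size=(500, 10))

# Fit a Gaussian to the observed coefficients, then sample new
# bodies; the scale factor controls how much variation is added.
mean = real_shapes.mean(axis=0)
cov = np.cov(real_shapes, rowvar=False)
scale = 1.5
new_shapes = rng.multivariate_normal(mean, scale * cov, size=1000)
print(new_shapes.shape)  # (1000, 10) synthetic shape coefficients
```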

    An Analysis of Sensor‐oriented vs. Model‐based Activity Recognition

    No full text