
    AdaBook and MultiBook: adaptive boosting with chance correction

    There has been considerable interest in boosting and bagging, including the combination of the adaptive techniques of AdaBoost with the random selection with replacement techniques of Bagging. At the same time there has been a revisiting of the way we evaluate, with chance-corrected measures like Kappa, Informedness, Correlation or ROC AUC being advocated. This leads to the question of whether learning algorithms can do better by optimizing an appropriate chance-corrected measure. Indeed, it is possible for a weak learner to optimize Accuracy to the detriment of the more realistic chance-corrected measures, and when this happens the booster can give up too early. This phenomenon is known to occur with conventional Accuracy-based AdaBoost, and the MultiBoost algorithm has been developed to overcome such problems using restart techniques based on bagging. This paper thus complements the theoretical work showing the necessity of using chance-corrected measures for evaluation with empirical work showing how use of a chance-corrected measure can improve boosting. We show that the early-surrender problem occurs in MultiBoost too, in multiclass situations, so that chance-corrected AdaBook and MultiBook can beat standard MultiBoost or AdaBoost, and we further identify which chance-corrected measures to use when.
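    To make the evaluation point concrete, here is a minimal sketch (not code from the paper) of two of the chance-corrected measures the abstract names, computed from a binary confusion matrix; it shows how a majority-class predictor can reach 90% Accuracy while both measures sit at chance level.

```python
# Minimal sketch: two chance-corrected measures for a binary
# confusion matrix with counts tn, fp, fn, tp.

def informedness(tn, fp, fn, tp):
    # Bookmaker Informedness (Youden's J): recall + inverse recall - 1.
    # 0 for chance-level prediction, 1 for perfect prediction.
    return tp / (tp + fn) + tn / (tn + fp) - 1

def kappa(tn, fp, fn, tp):
    # Cohen's Kappa: observed accuracy corrected by the accuracy
    # expected from the marginal label/prediction frequencies.
    n = tn + fp + fn + tp
    po = (tp + tn) / n
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    return (po - pe) / (1 - pe)

# A degenerate learner that always predicts the majority class scores
# 0.9 Accuracy on this skewed data while both measures stay at 0:
print(informedness(tn=90, fp=0, fn=10, tp=0))  # 0.0
print(kappa(tn=90, fp=0, fn=10, tp=0))         # 0.0
```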

    Hand gesture recognition through capacitive sensing : a thesis presented in partial fulfilment of the requirements for the degree of Master of Engineering in Electronics & Computer Engineering at Massey University, School of Food and Advanced Technology (SF&AT), Auckland, New Zealand

    Figures 1.1, 1.2, 1.3, 2.1, 2.3 & 2.4 are re-used with permission. Figure 2.2 (=Smith, 1996 Fig 1) ©1996 by International Business Machines Corporation was removed. This thesis investigated capacitive sensing-based hand gesture recognition, developed and validated through custom-built hardware. We attempted to discover if massed arrays of capacitance sensors can produce a robust system capable of simple hand gesture detection and recognition. The first stage of this research was to build the hardware that performed capacitance sensing. This hardware needed to be sensitive enough to capture minor variations in capacitance values, while also reducing stray capacitance to a minimum. The hardware designed in this stage formed the basis of all the data captured and utilised for subsequent training and testing of machine learning based classifiers. The second stage of this system used massed arrays of capacitance sensor pads to capture frames of hand gestures in the form of low-resolution 2D images. The raw data was then processed to account for random variations and noise present naturally in the surrounding environment. Five different gestures were captured from several test participants and used to train, validate and test the classifiers. Different methods were explored in the recognition and classification stage: initially, simple probabilistic classifiers were used; afterwards, neural networks were used. Two types of neural networks were explored, namely the Multilayer Perceptron (MLP) and the Convolutional Neural Network (CNN), which are capable of achieving upwards of 92.34% classification accuracy.
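    As an illustration of the kind of classifier described, the sketch below builds a small CNN over low-resolution capacitance frames; the 8x8 pad-grid size, the layer sizes, and the five gesture classes are assumptions for the example, not the architecture reported in the thesis.

```python
# Illustrative sketch only: a small CNN for classifying low-resolution
# capacitance "images" (assumed 8x8 pad grid, 5 gesture classes).
import torch
import torch.nn as nn

class GestureCNN(nn.Module):
    def __init__(self, n_classes: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 1 channel: raw capacitance frame
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 8x8 -> 4x4
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, x):                                # x: (batch, 1, 8, 8)
        return self.classifier(self.features(x).flatten(1))

logits = GestureCNN()(torch.randn(2, 1, 8, 8))           # -> shape (2, 5)
```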

    Sensing via signal analysis, analytics, and cyberbiometric patterns

    Internet-connected, or Internet of Things (IoT), sensor technologies have been increasingly incorporated into everyday technology and processes. Their functions are situationally dependent and have been used for vital recordings such as electrocardiograms, gait analysis and step counting, fall detection, and environmental analysis. For instance, environmental sensors, which exist through various technologies, are used to monitor numerous domains, including but not limited to pollution, water quality, and the presence of biota, among others. Past research into IoT sensors has varied depending on the technology. For instance, previous environmental gas sensor IoT research has focused on (i) the development of these sensors for increased sensitivity and increased lifetimes, (ii) integration of these sensors into sensor arrays to combat cross-sensitivity and background interferences, and (iii) sensor network development, including communication between widely dispersed sensors in a large-scale environment. IoT inertial measurement units (IMUs), such as accelerometers and gyroscopes, have been previously researched for gait analysis, movement detection, and gesture recognition, which are often related to human-computer interfaces (HCI). Methods of IoT device feature-based pattern recognition for machine learning (ML) and artificial intelligence (AI) are frequently investigated as well, including primitive classification methods and deep learning techniques. The result of this research gives insight into each of these topics individually, i.e., using a specific sensor technology to detect carbon monoxide in an indoor environment, or using accelerometer readings for gesture recognition. Less research has been performed on analyzing the systems aspects of the IoT sensors themselves. However, an important part of attaining overall situational awareness is authenticating the surroundings, which in the case of IoT means the individual sensors, humans interacting with the sensors, and other elements of the surroundings. There is a clear opportunity for the systematic evaluation of the identity and performance of an IoT sensor/sensor array within a system that is to be utilized for "full situational awareness". This awareness may include (i) non-invasive diagnostics (i.e., what is occurring inside the body), (ii) exposure analysis (i.e., what has gone into the body through both respiratory and eating/drinking pathways), and (iii) potential risk of exposure (i.e., what the body is exposed to environmentally). Simultaneously, the system has the capability to harbor security measures through the same situational assessment in the form of multiple levels of biometrics. Through the interconnective abilities of the IoT sensors, it is possible to integrate these capabilities into one portable, hand-held system. The system will exist within a "magic wand", which will be used to collect the various data needed to assess the environment of the user, both inside and outside of their bodies. The device can also be used to authenticate the user, as well as the system components, to discover potential deception within the system. This research introduces levels of biometrics for various scenarios through the investigation of challenge-based biometrics; that is, biometrics based upon how the sensor, user, or subject of study responds to a challenge.
    These will be applied to multiple facets surrounding "situational awareness" for living beings, non-human beings, and non-living items or objects (which we have termed "abiometrics"). Gesture recognition for intent of sensing was first investigated as a means of deliberate activation of sensors/sensor arrays for situational awareness while providing a level of user authentication through biometrics. Equine gait analysis was examined next, and the level of injury in the lame limbs of the horse was quantitatively measured and classified using data from IoT sensors. Finally, a method of evaluating the identity and health of a sensor/sensor array was examined through different challenges to its environment.
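    The "challenge-based" idea can be reduced to a skeleton: issue a known stimulus to a sensor and authenticate it by how closely its response matches the profile enrolled while the sensor was trusted. The sketch below is a hypothetical illustration of that pattern only; the function names, tolerance, and response model are invented for the example.

```python
# Hedged illustration (names and thresholds are hypothetical): a
# challenge-response identity/health check for a sensor.
import numpy as np

def challenge_sensor(read_response, challenge, expected_profile, tol=0.1):
    """Apply a challenge, compare the response to the enrolled profile."""
    response = np.asarray(read_response(challenge))
    # Normalised distance between observed and enrolled response curves.
    dist = np.linalg.norm(response - expected_profile) / np.linalg.norm(expected_profile)
    return dist <= tol  # True -> sensor behaves as its claimed identity

# Usage with a simulated healthy sensor whose gain drifted by 2%:
profile = np.array([0.0, 0.5, 1.0, 0.5, 0.0])
ok = challenge_sensor(lambda c: 1.02 * profile, challenge="step", expected_profile=profile)
print(ok)  # True: within tolerance, so the identity/health check passes
```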

    Classification of Alzheimer's Disease with Deep Learning on Eye-tracking Data

    Existing research has shown the potential of classifying Alzheimer's Disease (AD) from eye-tracking (ET) data with classifiers that rely on task-specific engineered features. In this paper, we investigate whether we can improve on existing results by using a deep-learning classifier trained end-to-end on raw ET data. This classifier (VTNet) uses a GRU and a CNN in parallel to leverage both visual (V) and temporal (T) representations of ET data and was previously used to detect user confusion while processing visual displays. A main challenge in applying VTNet to our target AD classification task is that the available ET data sequences are much longer than those used in the previous confusion detection task, pushing the limits of what is manageable by LSTM-based models. We discuss how we address this challenge and show that VTNet outperforms the state-of-the-art approaches in AD classification, providing encouraging evidence on the generality of this model to make predictions from ET data.
    Comment: ICMI 2023 long paper
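    The parallel visual/temporal design can be sketched roughly as follows; the layer sizes, sequence length, scanpath-image resolution, and fusion head are assumptions for illustration, not the authors' published VTNet configuration.

```python
# Rough sketch of a parallel GRU + CNN classifier of the kind the
# abstract describes; all dimensions are assumed for illustration.
import torch
import torch.nn as nn

class VTNetSketch(nn.Module):
    def __init__(self, n_features=6, hidden=64, n_classes=2):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)   # temporal (T) branch
        self.cnn = nn.Sequential(                                  # visual (V) branch
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),                 # -> 8*4*4 = 128
        )
        self.head = nn.Linear(hidden + 128, n_classes)

    def forward(self, seq, img):
        # seq: (batch, time, n_features) raw ET samples;
        # img: (batch, 1, H, W) scanpath image rendered from the same data.
        _, h = self.gru(seq)                                       # h: (1, batch, hidden)
        return self.head(torch.cat([h[-1], self.cnn(img)], dim=1))

out = VTNetSketch()(torch.randn(2, 500, 6), torch.randn(2, 1, 64, 64))
```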

    Audio-coupled video content understanding of unconstrained video sequences

    Unconstrained video understanding is a difficult task. The main aim of this thesis is to recognise the nature of objects, activities and environment in a given video clip using both audio and video information. Traditionally, audio and video information has not been applied together for solving such a complex task, and for the first time we propose, develop, implement and test a new framework of multi-modal (audio and video) data analysis for context understanding and labelling of unconstrained videos. The framework relies on feature selection techniques and introduces a novel algorithm (PCFS) that is faster than the well-established SFFS algorithm. We use the framework for studying the benefits of combining audio and video information in a number of different problems. We begin by developing two independent content recognition modules. The first one is based on image sequence analysis alone, and uses a range of colour, shape, texture and statistical features from image regions with a trained classifier to recognise the identity of objects, activities and environment present. The second module uses audio information only, and recognises activities and environment. Both of these approaches are preceded by detailed pre-processing to ensure that correct video segments containing both audio and video content are present, and that the developed system can be made robust to changes in camera movement, illumination, random object behaviour, etc. For both audio and video analysis, we use a hierarchical approach of multi-stage classification such that difficult classification tasks can be decomposed into simpler and smaller tasks. When combining both modalities, we compare fusion techniques at different levels of integration and propose a novel algorithm that combines advantages of both feature- and decision-level fusion. The analysis is evaluated on a large amount of test data comprising unconstrained videos collected for this work. Finally, we propose a decision correction algorithm which shows that further steps towards combining multi-modal classification information effectively with semantic knowledge generate the best possible results.
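    For contrast with the thesis's hybrid algorithm, which is not reproduced here, a plain decision-level fusion baseline makes the idea concrete: each modality's classifier emits class posteriors, and the fused decision is a weighted, renormalised combination (the weights below are arbitrary).

```python
# Plain decision-level fusion baseline (not the thesis's novel method):
# combine per-modality class posteriors with fixed, assumed weights.
import numpy as np

def decision_level_fusion(p_audio, p_video, w_audio=0.4, w_video=0.6):
    """Weighted sum of per-modality class posteriors, renormalised."""
    fused = w_audio * np.asarray(p_audio) + w_video * np.asarray(p_video)
    return fused / fused.sum()

# Audio is unsure, video leans to class 2; fusion follows the video cue:
print(decision_level_fusion([0.5, 0.5], [0.2, 0.8]))  # ~[0.32 0.68]
```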

    Managing heterogeneous cues in social contexts. A holistic approach for social interactions analysis

    Social interaction refers to any interaction between two or more individuals in which information is shared without any mediating technology. This interaction is a significant part of an individual's socialization and of the experience gained throughout their lifetime, and it is of interest to different disciplines (sociology, psychology, medicine, etc.). In the context of testing and observational studies, multiple mechanisms are used to study these interactions, such as questionnaires, direct observation and analysis of events by human operators, or a posteriori observation and analysis of recorded events by specialists (psychologists, sociologists, doctors, etc.). However, such mechanisms are expensive in terms of processing time, require a high level of attention to analyze several cues simultaneously, are dependent on the operator (subjectivity of the analysis), and can target only one facet of the interaction. To address these issues, the social interaction analysis process needs to be automated, bridging the gap between human-based and machine-based social interaction analysis. We therefore propose a holistic approach that integrates multimodal heterogeneous cues and contextual information (complementary "exogenous" data) dynamically and optionally, according to their availability. Such an approach allows the analysis of multiple "signals" in parallel (where humans can focus on only one). This analysis can be further enriched with data related to the context of the scene (location, date, type of music, event description, etc.) or to the individuals (name, age, gender, data extracted from their social networks, etc.). The contextual information enriches the modeling of the extracted metadata and gives it a more "semantic" dimension. Managing this heterogeneity is an essential step towards implementing a holistic approach. The automation of "in vivo" capture and observation using non-intrusive devices without predefined scenarios raises issues related to data (i) privacy and security, (ii) heterogeneity, and (iii) volume. Hence, within the holistic approach we propose (1) a privacy-preserving comprehensive data model that decouples metadata extraction from social interaction analysis methods; (2) a geometric, non-intrusive eye contact detection method; and (3) a deep model for French food classification to extract information from video content. The proposed approach manages heterogeneous cues coming from different modalities as multi-layer sources (visual signals, voice signals, contextual information) at different time scales, with different combinations between layers (the cues being represented as time series). The approach has been designed to operate without intrusive devices, in order to capture real behaviors and achieve naturalistic observation. We have deployed the proposed approach on the OVALIE platform, which aims to study eating behaviors in different real-life contexts and is located at the University Toulouse-Jean Jaurès, France.
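    In the spirit of the geometric eye contact detection the abstract mentions, and only as a loose sketch, the test below declares mutual gaze when each person's gaze vector points at the other's head position within an angular threshold; the inputs and the 10-degree threshold are assumptions, not the thesis's method.

```python
# Loose geometric sketch of mutual-gaze (eye contact) detection:
# each person's unit gaze vector must point at the other's 3D head
# position within an assumed angular tolerance.
import numpy as np

def eye_contact(pos_a, gaze_a, pos_b, gaze_b, max_deg=10.0):
    """Mutual gaze: each gaze vector points at the other head position."""
    def aimed_at(src, gaze, dst):
        to_other = (dst - src) / np.linalg.norm(dst - src)
        angle = np.degrees(np.arccos(np.clip(np.dot(gaze, to_other), -1, 1)))
        return angle <= max_deg
    return aimed_at(pos_a, gaze_a, pos_b) and aimed_at(pos_b, gaze_b, pos_a)

a, b = np.array([0.0, 0, 0]), np.array([1.0, 0, 0])
print(eye_contact(a, np.array([1.0, 0, 0]), b, np.array([-1.0, 0, 0])))  # True
```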

    Recognizing Speech in a Novel Accent: The Motor Theory of Speech Perception Reframed

    The motor theory of speech perception holds that we perceive the speech of another in terms of a motor representation of that speech. However, when we have learned to recognize a foreign accent, it seems plausible that recognition of a word rarely involves reconstructing the speech gestures of the speaker rather than those of the listener. To better assess the motor theory and this observation, we proceed in three stages. Part 1 places the motor theory of speech perception in a larger framework based on our earlier models of the adaptive formation of mirror neurons for grasping, and on viewing extensions of that mirror system as part of a larger system for neuro-linguistic processing, augmented by the present consideration of recognizing speech in a novel accent. Part 2 then offers a novel computational model of how a listener comes to understand the speech of someone speaking the listener's native language with a foreign accent. The core tenet of the model is that the listener uses hypotheses about the word the speaker is currently uttering to update probabilities linking the sounds produced by the speaker to phonemes in the native-language repertoire of the listener. This, on average, improves the recognition of later words. The model is neutral regarding the nature of the representations it uses (motor vs. auditory). It serves as a reference point for the discussion in Part 3, which proposes a dual-stream neuro-linguistic architecture to revisit claims for and against the motor theory of speech perception and the relevance of mirror neurons, and extracts some implications for the reframing of the motor theory.
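    One toy reading of that core tenet (my rendering, not the authors' model) is a simple reinforcement of sound-to-phoneme links: once the listener settles on a word hypothesis, each accented sound is aligned with the phoneme the word requires, and the mapping probabilities are nudged accordingly, which is what improves recognition of later words.

```python
# Toy interpretive sketch, not the authors' code: word hypotheses
# drive updates of sound -> native-phoneme mapping probabilities.
from collections import defaultdict

class AccentAdapter:
    def __init__(self, lr=0.2):
        self.lr = lr
        # p[sound][phoneme]: current belief that an accented sound
        # realises a given phoneme of the listener's native repertoire.
        self.p = defaultdict(lambda: defaultdict(float))

    def update(self, heard_sounds, hypothesised_phonemes):
        # Align the sounds of the utterance with the phonemes of the
        # word the listener believes was said, and reinforce the links.
        for sound, phoneme in zip(heard_sounds, hypothesised_phonemes):
            for q in self.p[sound]:
                self.p[sound][q] *= (1 - self.lr)      # decay competitors
            self.p[sound][phoneme] += self.lr          # reinforce the match

adapter = AccentAdapter()
adapter.update(["z"], ["th"])   # accented /z/ heard where "the" needs /th/
adapter.update(["z"], ["th"])
print(adapter.p["z"]["th"])     # ~0.36: belief in the z->th link strengthens
```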