8,069 research outputs found

    Virtual Reality Games for Motor Rehabilitation

    Get PDF
    This paper presents a fuzzy-logic-based method to track user satisfaction without the need for devices that monitor users' physiological conditions. User satisfaction is the key to any product's acceptance; computer applications and video games offer a unique opportunity to tailor the environment to each user to better suit their needs. We have implemented a non-adaptive fuzzy logic model of emotion, based on the emotional component of the Fuzzy Logic Adaptive Model of Emotion (FLAME) proposed by El-Nasr, to estimate player emotion in Unreal Tournament 2004. In this paper we describe the implementation of this system and present the results of one of several play tests. Our research contradicts the current literature, which suggests that physiological measurements are needed. We show that it is possible to use a software-only method to estimate user emotion.
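    To make the general idea concrete, the sketch below shows how a player's emotional state might be estimated from in-game events with simple fuzzy membership functions. It is a minimal, assumed Python example, not the authors' FLAME implementation; the event names and thresholds are invented for illustration.

```python
# A minimal sketch (not the authors' FLAME implementation) of estimating a
# player's frustration from in-game events with fuzzy membership functions.
# Event names and thresholds are illustrative assumptions.

def triangular(x, a, b, c):
    """Triangular membership function peaking at b over the interval [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def estimate_frustration(deaths_per_min, kills_per_min):
    # Fuzzify the inputs.
    dying_often = triangular(deaths_per_min, 1.0, 3.0, 5.0)
    scoring_rarely = 1.0 - min(kills_per_min / 3.0, 1.0)
    # One rule: frustration rises when the player dies often AND scores rarely.
    rule_strength = min(dying_often, scoring_rarely)
    return rule_strength  # in [0, 1]; higher means more frustrated

if __name__ == "__main__":
    print(estimate_frustration(deaths_per_min=3.0, kills_per_min=0.5))
```

    A full fuzzy emotion model would track several such rules over time and defuzzify them into an overall satisfaction estimate; the point here is only that the inputs are software-observable game events rather than physiological signals.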

    Social media and audio visual learning in the study of human anatomy. Study of a group of students on Facebook and YouTube

    Get PDF
    The development of social network sites (SNS) has been one of the most influential phenomena of digital technology in recent years. According to a survey by the Pew Research Center on the use of SNS in the United States (Smith, 2013), two-thirds of adults use tools such as Facebook, Twitter, Myspace and LinkedIn, and 60% of mobile applications used by smartphone owners are linked to social networks. The growing interest of Internet users in SNS is also confirmed by one of the latest Nielsen studies (2011): Internet users in Europe spend more and more time on social networks and blogs; Italian users, for example, spend 31% of their total time on the Internet visiting these categories of services. Among the various SNS, Facebook is now the most popular, with over 900 million users, of whom over 500 million access it via mobile products (Facebook, 2012). "Social media" refers to a wide range of applications that allow users to create, share, comment on and discuss a multitude of digital content. Social media has been described as "dynamic", "interactive", "democratic", "people-centred", "volatile", "social" and "adaptive" (Manca & Ranieri, 2016b). Another aspect of social media that is often overlooked is its ability to transform teaching and learning into a more social, open and collaborative activity. Researchers have used many theories and models to determine the feasibility of using social media for educational purposes. With the exponential growth of social media and the ease of information flow, new horizons are opening up: technological progress allows the teacher to create a student group on a social media platform and motivate learners to ask questions at any time to clarify their doubts. In addition, teachers can regularly provide parents with immediate feedback about the students' progress. The use of social media, especially Facebook and YouTube, has boomed in the field of education. Thanks to YouTube videos, students can learn through sight and hearing, fixing and memorising the key concepts of what they have learned. Multisensory learning, as the name suggests, is the process of learning a new topic through the use of two or more senses. Sensory integration occurs in the central nervous system, where complex interactions such as coordination, attention, emotion and memory are processed to give a meaningful response; this can include the combination of vision and hearing. Studies that activate brain regions associated with hearing and vision indicate a direct relationship between knowledge and sensory mechanisms in the brain. In this study I have examined with particular attention the relationship between technologies, digital media and the learning process. In the first part of the work the theoretical frame of reference is outlined, reconstructing the transition from a monosensory society to a multisensory society increasingly dominated by digital artifacts. The second part focuses on the results obtained in answering the question: "How do social media affect the visual and auditory learning of anatomy?" A multisensory integration model has been developed, and there is clear evidence that this model, based on the social media platforms Facebook and YouTube, has improved students' performance in anatomy. Perceptual learning is a good testing ground for multisensory approaches, as it is typically very slow, requires many days of training, and has been shown to be mediated by early visual areas of the brain traditionally considered to be highly unimodal.
An auditory-visual study was chosen because visual motion stimuli are typically accompanied by sounds, and because anatomical evidence in animals and human neurophysiology studies indicates that hearing-sight interactions affect visual processing in the primary visual cortex as well. This study aims at the following: investigating the role of the visual element in the study of anatomy through the use of YouTube; exploring how YouTube and Facebook have played a role in enhancing students' learning skills; and shedding new light on the importance of YouTube and Facebook as fundamental teaching tools and as a resource for both teachers and students. This empirical research shows the positive results achieved using SNS and multisensory models in both higher and university education.

    Deep audio-visual speech recognition

    Get PDF
    Decades of research in acoustic speech recognition have led to systems that we use in our everyday life. However, even the most advanced speech recognition systems fail in the presence of noise. The degraded performance can be compensated for by introducing visual speech information. However, Visual Speech Recognition (VSR) in naturalistic conditions is very challenging, in part due to the lack of architectures and annotations. This thesis contributes to the problem of Audio-Visual Speech Recognition (AVSR) from several angles. Firstly, we develop AVSR models for isolated words. In contrast to previous state-of-the-art methods, which consist of a two-step approach of feature extraction followed by recognition, we present an End-to-End (E2E) approach inside a deep neural network, and this has led to significant improvements in audio-only, visual-only and audio-visual experiments. We further replace Bi-directional Gated Recurrent Units (BGRUs) with Temporal Convolutional Networks (TCNs) to greatly simplify the training procedure. Secondly, we extend our AVSR model to continuous speech by presenting a hybrid Connectionist Temporal Classification (CTC)/Attention model that can be trained in an end-to-end manner. We then propose the addition of prediction-based auxiliary tasks to a VSR model and highlight the importance of hyper-parameter optimisation and appropriate data augmentation. Next, we present a self-supervised framework, Learning visual speech Representations from Audio via self-supervision (LiRA). Specifically, we train a ResNet+Conformer model to predict acoustic features from unlabelled visual speech, and find that this pre-trained model can be leveraged for word-level and sentence-level lip-reading. We also investigate the influence of the Lombard effect on an end-to-end AVSR system; this is the first work to use end-to-end deep architectures for this problem and to present results on unseen speakers. We show that even if a relatively small amount of Lombard speech is added to the training set, performance in a real scenario, where noisy Lombard speech is present, can be significantly improved. Lastly, we propose a detection method against adversarial examples in an AVSR system, in which the strong correlation between the audio and visual streams is leveraged. The synchronisation confidence score is used as a proxy for audio-visual correlation, and based on it we can detect adversarial attacks. We apply recent adversarial attacks to two AVSR models, and the experimental results demonstrate that the proposed approach is an effective way of detecting such attacks.
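    To illustrate the hybrid objective, here is a minimal PyTorch sketch of combining a CTC loss on encoder outputs with a cross-entropy loss on attention-decoder outputs. The tensor shapes, blank index, padding handling, and the 0.3 weighting are illustrative assumptions, not the thesis's actual configuration; in practice the decoder targets would also carry start/end tokens.

```python
# A minimal sketch (assumed shapes and weighting, not the thesis code) of the
# hybrid CTC/attention training objective: the total loss is a weighted sum
# of a CTC loss on encoder outputs and a cross-entropy loss on the attention
# decoder's outputs for the same target sequence.
import torch.nn.functional as F

def hybrid_ctc_attention_loss(encoder_logits, decoder_logits, targets,
                              input_lengths, target_lengths, ctc_weight=0.3):
    # CTC branch: encoder_logits has shape (T, batch, vocab); blank id is 0.
    log_probs = F.log_softmax(encoder_logits, dim=-1)
    ctc_loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                          blank=0, zero_infinity=True)
    # Attention branch: decoder_logits has shape (batch, U, vocab) and is
    # scored against the padded (batch, U) target indices with cross-entropy.
    att_loss = F.cross_entropy(decoder_logits.transpose(1, 2), targets)
    # Interpolate the two objectives; both branches share the encoder.
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss
```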

    A Novel Driver Distraction Behavior Detection Based on Self-Supervised Learning Framework with Masked Image Modeling

    Full text link
    Driver distraction causes a significant number of traffic accidents every year, resulting in economic losses and casualties. Currently, the level of automation in commercial vehicles is far from fully unmanned, and drivers still play an important role in operating and controlling the vehicle. Therefore, detecting driver distraction behavior is crucial for road safety. At present, driver distraction detection relies primarily on traditional Convolutional Neural Networks (CNNs) and supervised learning methods. However, challenges remain, such as the high cost of labeled datasets, a limited ability to capture high-level semantic information, and weak generalization performance. To address these problems, this paper proposes a new self-supervised learning method based on masked image modeling for driver distraction behavior detection. Firstly, a self-supervised masked image modeling (MIM) framework is introduced to avoid the considerable human and material cost of dataset labeling. Secondly, the Swin Transformer is employed as an encoder. Performance is enhanced by reconfiguring the Swin Transformer block and adjusting the distribution of window multi-head self-attention (W-MSA) and shifted window multi-head self-attention (SW-MSA) heads across all stages, which also makes the model more lightweight. Finally, various data augmentation strategies are combined with the best random masking strategy to strengthen the model's recognition and generalization ability. Test results on a large-scale driver distraction behavior dataset show that the proposed self-supervised learning method achieves an accuracy of 99.60%, approaching the performance of advanced supervised learning methods.
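    The sketch below shows the generic masked image modeling pretext task that this kind of pre-training relies on: random patches are hidden and the network is trained to reconstruct them, so no behavior labels are needed. It uses a toy Transformer encoder rather than the paper's Swin Transformer, and the patch size, mask ratio, and dimensions are assumptions for illustration.

```python
# A minimal sketch (illustrative assumptions, not the paper's Swin-based
# model) of the masked image modeling pretext task: random image patches are
# masked and the network reconstructs the missing pixels.
import torch
import torch.nn as nn

class TinyMIM(nn.Module):
    def __init__(self, patch=16, dim=128, mask_ratio=0.6):
        super().__init__()
        self.patch, self.mask_ratio = patch, mask_ratio
        self.embed = nn.Linear(patch * patch * 3, dim)    # patch embedding
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True),
            num_layers=2)
        self.decoder = nn.Linear(dim, patch * patch * 3)  # pixel reconstruction head

    def forward(self, images):                            # images: (B, 3, H, W)
        B, C, H, W = images.shape
        p = self.patch
        # Split the image into non-overlapping patches and flatten each one.
        patches = images.unfold(2, p, p).unfold(3, p, p)
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * p * p)
        # Randomly mask a fraction of the patches by zeroing their content.
        mask = torch.rand(B, patches.shape[1], device=images.device) < self.mask_ratio
        tokens = self.embed(patches.masked_fill(mask.unsqueeze(-1), 0.0))
        recon = self.decoder(self.encoder(tokens))
        # Reconstruction loss is computed only on the masked patches.
        return ((recon - patches) ** 2)[mask].mean()

# Example: one pre-training step on a random batch standing in for dashcam frames.
model = TinyMIM()
loss = model(torch.randn(2, 3, 224, 224))
loss.backward()
```

    After pre-training, the reconstruction head is discarded and the encoder is fine-tuned with a small classification head on the labeled distraction classes.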

    Understanding Human Actions in Video

    Full text link
    Understanding human behavior is crucial for any autonomous system that interacts with humans. For example, assistive robots need to know when a person is signaling for help, and autonomous vehicles need to know when a person is waiting to cross the street. However, identifying human actions in video is a challenging and unsolved problem. In this work, we address several of the key challenges in human action recognition. To enable better representations of video sequences, we develop novel deep learning architectures that improve representations both at the level of instantaneous motion and at the level of long-term context. In addition, to reduce reliance on fixed action vocabularies, we develop a compositional representation of actions that allows novel action descriptions to be represented as a sequence of sub-actions. Finally, we address the issue of data collection for human action understanding by creating a large-scale video dataset consisting of 70 million videos collected from internet video sharing sites, together with their matched descriptions. We demonstrate that these contributions improve the generalization performance of human action recognition systems on several benchmark datasets. (PhD dissertation, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies; http://deepblue.lib.umich.edu/bitstream/2027.42/162887/1/stroud_1.pd)
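    As a simple illustration of the compositional idea, the sketch below expresses an action as an ordered sequence of sub-action labels drawn from a small reusable vocabulary, so that a previously unseen action can be encoded without adding a new class. The vocabulary and the dataclass are hypothetical, not the dissertation's actual representation.

```python
# A minimal sketch (illustrative, not the dissertation's model) of a
# compositional action representation: an action is an ordered sequence of
# sub-actions from a shared vocabulary, encodable for a sequence model.
from dataclasses import dataclass

SUB_ACTIONS = ["reach", "grasp", "lift", "pour", "place", "release"]  # assumed vocabulary

@dataclass
class ComposedAction:
    name: str
    steps: list  # ordered sub-action labels

    def to_indices(self):
        """Encode the action as sub-action indices for a downstream sequence model."""
        return [SUB_ACTIONS.index(s) for s in self.steps]

# A previously unseen action is described by recombining known sub-actions.
pour_water = ComposedAction("pour water into cup", ["reach", "grasp", "lift", "pour", "place"])
print(pour_water.to_indices())  # [0, 1, 2, 3, 4]
```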

    Transfer learning: bridging the gap between deep learning and domain-specific text mining

    Get PDF
    Inspired by the success of deep learning techniques in Natural Language Processing (NLP), this dissertation tackles domain-specific text mining problems for which generic deep learning approaches would fail. More specifically, the domain-specific problems are: (1) success prediction in crowdfunding, (2) variant identification in biomedical literature, and (3) text data augmentation for low-resource domains. In the first part, transfer learning from a multimodal perspective is used to facilitate project success prediction in the crowdfunding application. Even though the information in a project profile can be of different modalities, such as text, images and metadata, most existing prediction approaches leverage only the text modality. It is promising to utilize the visual images in project profiles to find out how images could contribute to success prediction. An advanced neural network scheme that combines information learned from different modalities is designed and evaluated for project success prediction. In the second part, transfer learning is combined with deep learning techniques to solve genomic variant Named Entity Recognition (NER) problems in biomedical literature. Most advanced generic NER algorithms fail here due to the restricted training corpus, yet these generic deep learning algorithms are capable of learning from a canonical corpus without any effort on feature engineering. This work aims to build an end-to-end deep learning approach that transfers domain-specific knowledge to those advanced generic NER algorithms, addressing the challenges of low-resource training and requiring neither hand-crafted features nor post-processing rules. For the last part, transfer learning with knowledge distillation and active learning is used to solve text augmentation for low-resource domains. Most recent text augmentation methods rely heavily on large external resources. This work is dedicated to solving the text augmentation problem adaptively and consistently with minimal resources for token-level tasks like NER. The solution can also ensure the reliability of machine labels for noisy data and can enhance training consistency with noisy labels. Each of these works is evaluated on its respective domain-specific benchmarks, and the experimental results demonstrate the effectiveness of the proposed methods. The advantages also indicate promising potential for transfer learning in domain-specific applications.
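    To illustrate the multimodal part, here is a minimal PyTorch sketch of late fusion for success prediction: pre-trained text and image encoders are assumed to provide fixed-size feature vectors, which are concatenated with metadata features and passed to a small classifier head. The feature dimensions and head architecture are assumptions, not the dissertation's actual scheme.

```python
# A minimal sketch (assumed architecture, not the dissertation's model) of
# multimodal late fusion for crowdfunding success prediction.
import torch
import torch.nn as nn

class MultimodalSuccessPredictor(nn.Module):
    def __init__(self, text_dim=768, image_dim=2048, meta_dim=16, hidden=256):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(text_dim + image_dim + meta_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden, 1),                      # success-probability logit
        )

    def forward(self, text_feat, image_feat, meta_feat):
        # Concatenate per-modality features and classify the fused vector.
        fused = torch.cat([text_feat, image_feat, meta_feat], dim=-1)
        return self.head(fused).squeeze(-1)

# Example with random features standing in for frozen encoder outputs.
model = MultimodalSuccessPredictor()
logits = model(torch.randn(4, 768), torch.randn(4, 2048), torch.randn(4, 16))
print(torch.sigmoid(logits))  # per-project success probabilities
```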