768 research outputs found

    Bag of Deep Features for Instructor Activity Recognition in Lecture Room

    This paper was presented at the 25th International Conference on MultiMedia Modeling (MMM2019).

    This research explores contextual visual information in the lecture room to help an instructor assess the effectiveness of a delivered lecture. The objective is to enable a self-evaluation mechanism through which the instructor can improve lecture productivity by understanding their own activities. A teacher's effectiveness has a remarkable impact on students' performance and on their academic and professional success. The process of lecture evaluation can therefore contribute significantly to improving academic quality and governance. In this paper, we propose a vision-based framework that recognizes the activities of the instructor for self-evaluation of delivered lectures. The proposed approach builds motion templates of instructor activities and describes them through a Bag-of-Deep-Features (BoDF) representation. Deep spatio-temporal features extracted from the motion templates are used to compile a visual vocabulary, which is quantized to optimize the learning model. A Support Vector Machine classifier is then trained to generate the model and predict the instructor activities. We evaluated the proposed scheme on a self-captured lecture room dataset, IAVID-1. Eight instructor activities (pointing towards a student, pointing towards the board or screen, idle, interacting, sitting, walking, using a mobile phone and using a laptop) are recognized with 85.41% accuracy. As a result, the proposed framework enables instructor activity recognition without human intervention.

    Sergio A. Velastin has received funding from the Universidad Carlos III de Madrid, the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 600371, el Ministerio de Economía, Industria y Competitividad (COFUND2014-51509), el Ministerio de Educación, Cultura y Deporte (CEI-15-17) and Banco Santander.
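    As a rough illustration of the pipeline described above, the sketch below shows a minimal Bag-of-Deep-Features flow: deep descriptors extracted from motion templates are clustered into a visual vocabulary, each template is quantized into a word histogram, and an SVM is trained on the histograms. The helper names, vocabulary size and kernel are assumptions for illustration, not details taken from the paper.

    ```python
    # Hedged sketch of a Bag-of-Deep-Features (BoDF) pipeline:
    # deep descriptors per motion template -> k-means vocabulary -> histogram -> SVM.
    # Deep feature extraction is abstracted away; parameter values are assumed.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC

    def build_vocabulary(feature_sets, vocab_size=64, seed=0):
        """Cluster all local deep descriptors into a visual vocabulary."""
        stacked = np.vstack(feature_sets)            # (total_descriptors, dim)
        return KMeans(n_clusters=vocab_size, random_state=seed, n_init=10).fit(stacked)

    def quantize(features, vocabulary):
        """Encode one template's descriptors as a normalized visual-word histogram."""
        words = vocabulary.predict(features)
        hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
        return hist / max(hist.sum(), 1.0)

    def train_bodf_classifier(feature_sets, labels, vocab_size=64):
        """feature_sets: list of (n_i, dim) descriptor arrays, one per motion template.
        labels: activity label per template (e.g. "walking", "pointing_towards_board")."""
        vocab = build_vocabulary(feature_sets, vocab_size)
        X = np.array([quantize(f, vocab) for f in feature_sets])
        clf = SVC(kernel="rbf").fit(X, labels)
        return vocab, clf
    ```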

    Deep temporal motion descriptor (DTMD) for human action recognition

    Spatiotemporal features have significant importance in human action recognition, as they capture the actor's shape and motion characteristics specific to each action class. This paper presents a new deep spatiotemporal human action representation, the "Deep Temporal Motion Descriptor (DTMD)", which shares the attributes of holistic and deep learned features. To generate the DTMD descriptor, the actor's silhouettes are aggregated into single motion templates by applying motion history images. These motion templates capture the spatiotemporal movements of the actor and compactly represent a human action as a single 2D template. Deep convolutional neural networks are then used to compute discriminative deep features from the motion history templates to produce DTMD. Finally, DTMD is used to learn a model that recognises human actions with a softmax classifier. The advantages of DTMD are that (i) it is automatically learned from videos and contains a higher-dimensional discriminative spatiotemporal representation compared to handcrafted features; (ii) it reduces the computational complexity of human activity recognition, since all the video frames are compactly represented as a single motion template; and (iii) it works effectively for single- and multi-view action recognition. We conducted experiments on three challenging datasets: MuHAVI-Uncut, IXMAS, and IAVID-1. The experimental findings reveal that DTMD outperforms previous methods and achieves the highest action prediction rate on the MuHAVI-Uncut dataset.
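    To illustrate the motion-template step, the sketch below builds a motion history image from binary silhouette masks; recent motion is bright and older motion fades. The decay duration is an assumed parameter, and feeding the resulting 2D template to a pretrained CNN is left abstract rather than reproducing the paper's network.

    ```python
    # Hedged sketch: build a motion history image (MHI) from binary silhouette masks.
    # Pixels that moved recently are bright; older motion fades and is eventually forgotten.
    # 'duration' (frames of memory) is an assumed value, not taken from the paper.
    import numpy as np

    def motion_history_image(silhouettes, duration=30):
        """silhouettes: iterable of (H, W) boolean/0-1 arrays, one per frame."""
        silhouettes = list(silhouettes)
        h, w = silhouettes[0].shape
        mhi = np.zeros((h, w), dtype=np.float32)
        for t, mask in enumerate(silhouettes, start=1):
            moving = mask.astype(bool)
            mhi[moving] = t                              # stamp current timestamp where motion occurs
            mhi[~moving & (mhi < t - duration)] = 0      # forget motion older than 'duration' frames
        # Normalize to [0, 1]; this single 2D template can then be fed to a CNN feature extractor.
        t_max = len(silhouettes)
        return np.clip((mhi - (t_max - duration)) / duration, 0.0, 1.0)
    ```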

    Instructor activity recognition through deep spatiotemporal features and feedforward Extreme Learning Machines

    Human action recognition has the potential to predict the activities of an instructor within the lecture room. Evaluation of lecture delivery can help teachers analyze shortcomings and plan lectures more effectively. However, manual or peer evaluation is time-consuming and tedious, and it is sometimes difficult to remember all the details of a lecture. Automating the evaluation of lecture delivery can therefore significantly help improve teaching style. In this paper, we propose a feedforward learning model for instructor activity recognition in the lecture room. The proposed scheme represents a video sequence as a single frame that captures the motion profile of the instructor by observing the spatiotemporal relation between video frames. First, we segment the instructor silhouettes from the input videos using graph-cut segmentation and generate a motion profile. These motion profiles are centered by retaining the largest connected component and then normalized. Next, the motion profiles are represented as feature maps by a deep convolutional neural network. Finally, an extreme learning machine (ELM) classifier is trained on the obtained feature representations to recognize eight different activities of the instructor within the classroom. For the evaluation of the proposed method, we created an instructor activity video (IAVID-1) dataset and compared our method against different state-of-the-art activity recognition methods. Furthermore, two standard datasets, MuHAVI and IXMAS, were also considered for the evaluation of the proposed scheme.

    We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for the research carried out in the Centre for Computer Vision Research (C2VR) at the University of Engineering and Technology Taxila, Pakistan. Sergio A. Velastin acknowledges funding by the Universidad Carlos III de Madrid, the European Union's Seventh Framework Programme for Research, Technological Development and Demonstration under grant agreement no. 600371, el Ministerio de Economía y Competitividad (COFUND2013-51509), and Banco Santander. We are also very thankful to the participants, faculty, and postgraduate students of the Computer Engineering Department who took part in the data acquisition phase; without their consent, this work would not have been possible.
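    As a sketch of the classification stage, the snippet below implements a minimal single-hidden-layer extreme learning machine: a random, fixed hidden projection followed by a closed-form least-squares solution for the output weights. The hidden-layer size, activation and pseudo-inverse solution are illustrative assumptions, not the paper's exact configuration.

    ```python
    # Hedged sketch of an extreme learning machine (ELM) classifier:
    # random fixed hidden layer + least-squares output weights via the Moore-Penrose pseudo-inverse.
    import numpy as np

    class SimpleELM:
        def __init__(self, n_hidden=512, seed=0):
            self.n_hidden = n_hidden
            self.rng = np.random.default_rng(seed)

        def _hidden(self, X):
            return np.tanh(X @ self.W + self.b)          # random nonlinear projection

        def fit(self, X, y):
            """X: (n_samples, n_features) flattened CNN feature maps; y: integer class labels."""
            X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=int)
            self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
            self.b = self.rng.normal(size=self.n_hidden)
            T = np.eye(y.max() + 1)[y]                   # one-hot targets
            H = self._hidden(X)
            self.beta = np.linalg.pinv(H) @ T            # closed-form output weights
            return self

        def predict(self, X):
            return np.argmax(self._hidden(np.asarray(X, dtype=float)) @ self.beta, axis=1)
    ```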

    Computational analysis of knowledge sharing in collaborative distance learning

    The rapid advance of distance learning and networking technology has enabled universities and corporations to reach out and educate students across time and space barriers. This technology supports structured, on-line learning activities, and provides facilities for assessment and collaboration. Structured collaboration, in the classroom, has proven itself a successful and uniquely powerful learning method. Online collaborative learners, however, do not enjoy the same benefits as face-to-face learners because the technology provides no guidance or direction during online discussion sessions. Integrating intelligent facilitation agents into collaborative distance learning environments may help bring the benefits of the supportive classroom closer to distance learners.

    In this dissertation, I describe a new approach to analyzing and supporting online peer interaction. The approach applies Hidden Markov Models, and Multidimensional Scaling with a threshold-based clustering method, to analyze and assess sequences of coded on-line student interaction. These analysis techniques were used to train a system to dynamically recognize when and why students may be experiencing breakdowns while sharing knowledge and learning from each other. I focus on knowledge sharing interaction because students bring a great deal of specialized knowledge and experiences to the group, and how they share and assimilate this knowledge shapes the collaboration and learning processes. The results of this research could be used to dynamically inform and assist an intelligent instructional agent in facilitating knowledge sharing interaction and helping to improve the quality of online learning interaction.
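    For a flavour of the HMM-based sequence analysis mentioned above, the sketch below scores a sequence of coded interaction events under a discrete hidden Markov model using the scaled forward algorithm. The two hidden states ("productive sharing" vs. "breakdown"), the event codes and all probabilities are illustrative assumptions, not parameters from the dissertation.

    ```python
    # Hedged sketch: scaled forward algorithm for a discrete HMM over coded interaction events.
    # States, observation codes and probabilities below are illustrative only.
    import numpy as np

    def forward_log_likelihood(obs, start_p, trans_p, emit_p):
        """obs: sequence of observation indices; start_p, trans_p, emit_p are row-stochastic."""
        alpha = start_p * emit_p[:, obs[0]]
        log_lik = np.log(alpha.sum())
        alpha /= alpha.sum()
        for o in obs[1:]:
            alpha = (alpha @ trans_p) * emit_p[:, o]
            scale = alpha.sum()                  # rescale to avoid numerical underflow
            log_lik += np.log(scale)
            alpha /= scale
        return log_lik

    # Toy model: 2 hidden states (0 = productive sharing, 1 = breakdown),
    # 3 coded events (0 = request, 1 = explanation, 2 = acknowledgement).
    start_p = np.array([0.7, 0.3])
    trans_p = np.array([[0.8, 0.2],
                        [0.4, 0.6]])
    emit_p  = np.array([[0.3, 0.5, 0.2],
                        [0.6, 0.1, 0.3]])
    print(forward_log_likelihood([0, 1, 2, 1], start_p, trans_p, emit_p))
    ```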

    Human robot interaction in a crowded environment

    Human Robot Interaction (HRI) is the primary means of establishing natural and affective communication between humans and robots. HRI enables robots to act in a way similar to humans in order to assist in activities that are considered to be laborious, unsafe, or repetitive. Vision-based human robot interaction is a major component of HRI, in which visual information is used to interpret how human interaction takes place. Common tasks of HRI include finding pre-trained static or dynamic gestures in an image, which involves localising different key parts of the human body such as the face and hands. This information is subsequently used to extract different gestures. After the initial detection process, the robot is required to comprehend the underlying meaning of these gestures [3]. Thus far, most gesture recognition systems can only detect gestures and identify a person in relatively static environments. This is not realistic for practical applications, as difficulties may arise from people's movements and changing illumination conditions. Another issue to consider is that of identifying the commanding person in a crowded scene, which is important for interpreting navigation commands. To this end, it is necessary to associate a gesture with the correct person, and automatic reasoning is required to extract the most probable location of the person who initiated the gesture.

    In this thesis, we propose a practical framework for addressing the above issues. It attempts to achieve a coarse-level understanding of a given environment before engaging in active communication. This includes recognizing human robot interaction, that is, whether a person has the intention to communicate with the robot. In this regard, it is necessary to differentiate whether the people present are engaged with each other or with their surrounding environment. The basic task is to detect and reason about the environmental context and the different interactions so as to respond accordingly. For example, if individuals are engaged in conversation, the robot should realize it is best not to disturb them; if an individual is receptive to the robot's interaction, it may approach that person; and if the user is moving in the environment, the robot can analyse further to understand whether any help can be offered to this user. The method proposed in this thesis combines multiple visual cues in a Bayesian framework to identify people in a scene and determine their potential intentions. To improve system performance, contextual feedback is used, which allows the Bayesian network to evolve and adjust itself according to the surrounding environment. The results achieved demonstrate the effectiveness of the technique in dealing with human-robot interaction in a relatively crowded environment [7].
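    To illustrate the idea of combining multiple visual cues in a Bayesian framework, the sketch below fuses independent binary cues (facing the robot, waving, proximity) into a posterior probability that a person intends to interact, using a naive-Bayes combination. The cue set, likelihood tables and prior are illustrative assumptions, not the thesis's actual model.

    ```python
    # Hedged sketch: naive-Bayes fusion of binary visual cues into P(intends to interact | cues).
    # Cue names, likelihoods and the prior below are illustrative assumptions only.

    CUE_LIKELIHOODS = {
        # cue_name: (P(cue | intends to interact), P(cue | does not intend))
        "facing_robot":   (0.85, 0.30),
        "hand_wave":      (0.60, 0.05),
        "close_to_robot": (0.70, 0.25),
    }
    PRIOR_INTERACT = 0.2  # assumed prior that a given person in the scene wants to interact

    def interaction_posterior(observed_cues):
        """observed_cues: dict cue_name -> bool (whether the cue fired for this person)."""
        p_yes, p_no = PRIOR_INTERACT, 1.0 - PRIOR_INTERACT
        for cue, present in observed_cues.items():
            like_yes, like_no = CUE_LIKELIHOODS[cue]
            if present:
                p_yes *= like_yes
                p_no *= like_no
            else:
                p_yes *= 1.0 - like_yes
                p_no *= 1.0 - like_no
        return p_yes / (p_yes + p_no)

    # Example: person is facing the robot and waving but stands far away.
    print(interaction_posterior({"facing_robot": True, "hand_wave": True, "close_to_robot": False}))
    ```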