    An Immersive Telepresence System using RGB-D Sensors and Head Mounted Display

    We present a tele-immersive system that enables people to interact with each other in a virtual world using body gestures in addition to verbal communication. Beyond the obvious applications, including general online conversations and gaming, we hypothesize that our proposed system would be particularly beneficial to education by offering rich visual content and interactivity. One distinct feature is the integration of egocentric pose recognition, which allows participants to use their gestures to demonstrate and manipulate virtual objects simultaneously. This functionality enables the instructor to effectively and efficiently explain and illustrate complex concepts or sophisticated problems in an intuitive manner. The highly interactive and flexible environment can capture and sustain more student attention than the traditional classroom setting and thus delivers a compelling experience to the students. Our main focus here is to investigate possible solutions for the system design and implementation and to devise strategies for fast, efficient computation suitable for visual data processing and network transmission. We describe the technique and experiments in detail and provide quantitative performance results, demonstrating that our system can be run comfortably and reliably in different application scenarios. Our preliminary results are promising and demonstrate the potential for more compelling directions in cyberlearning. Comment: IEEE International Symposium on Multimedia 201

    Generalized Video Anomaly Event Detection: Systematic Taxonomy and Comparison of Deep Models

    Video Anomaly Detection (VAD) serves as a pivotal technology in intelligent surveillance systems, enabling the temporal or spatial identification of anomalous events within videos. While existing reviews predominantly concentrate on conventional unsupervised methods, they often overlook the emergence of weakly-supervised and fully-unsupervised approaches. To address this gap, this survey extends the conventional scope of VAD beyond unsupervised methods, encompassing a broader spectrum termed Generalized Video Anomaly Event Detection (GVAED). By incorporating recent advancements rooted in diverse assumptions and learning frameworks, this survey introduces an intuitive taxonomy that covers unsupervised, weakly-supervised, supervised and fully-unsupervised VAD methodologies, elucidating the distinctions and interconnections among these research trajectories. In addition, this survey assists prospective researchers by compiling research resources, including public datasets, available codebases, programming tools, and pertinent literature. Furthermore, this survey quantitatively assesses model performance, examines research challenges, and outlines potential avenues for future exploration. Comment: Accepted by ACM Computing Surveys. For more information, please see our project page: https://github.com/fudanyliu/GVAE

    Real-time analysis of video signals

    Many practical and experimental systems employing image processing techniques have been built by other workers for various applications. Most of these systems are computer-based and very few operate in a real-time environment. The objective of this work is to build a microprocessor-based system for video image processing. The system is used in conjunction with an on-line TV camera and processing is carried out in real time. The enormous storage requirement of digitized TV signals and the real-time constraint suggest that some simplification of the data must take place prior to any viable processing. Data reduction is attained through the representation of objects by their edges, an approach often adopted for feature extraction in pattern recognition systems. A new technique for edge detection, which applies comparison criteria to differentials at adjacent pixels of the video image, is developed and implemented as a preprocessing hardware unit. A circuit for the generation of the co-ordinates of edge points is constructed to free the processing computer of this task, allowing it more time for on-line analysis of video signals. Besides the edge detector and co-ordinate generator, the hardware built consists of a microprocessor system based on a Texas Instruments TMS 9900 device, a first-in-first-out buffer store and interface circuitry to a TV camera and display devices. All hardware modules and their power supplies are assembled in one unit to provide a standalone instrument. The problem chosen for investigation is the analysis of motion in a visual scene. The aspects of motion studied concern the tracking of moving objects with simple geometric shapes and the description of their motion. More emphasis is placed on the analysis of human eye movements and the measurement of the eye's point-of-regard, which has many practical applications in the fields of physiology and psychology. This study provides a basis for the design of a processing unit attached to an oculometer to replace bulky minicomputer-based eye motion analysis systems. Programs are written for storage, analysis and display of results in real time.

    Digital tools in media studies: analysis and research. An overview

    Digital tools are increasingly used in media studies, opening up new perspectives for research and analysis, while creating new problems at the same time. In this volume, international media scholars and computer scientists present their projects, varying from powerful film-historical databases to automatic video analysis software, discussing their application of digital tools and reporting on their results. This book is the first publication of its kind and a helpful guide to both media scholars and computer scientists who intend to use digital tools in their research, providing information on applications, standards, and problems

    Digital Tools in Media Studies

    Digital tools are increasingly used in media studies, opening up new perspectives for research and analysis, while creating new problems at the same time. In this volume, international media scholars and computer scientists present their projects, varying from powerful film-historical databases to automatic video analysis software, discussing their application of digital tools and reporting on their results. This book is the first publication of its kind and a helpful guide to both media scholars and computer scientists who intend to use digital tools in their research, providing information on applications, standards, and problems

    Object Recognition: Physiological and Computational Insights

    Visual object recognition is the identification of a thing in the outside world based on the sense of vision. Our eyes are bombarded by a wide variety of visual forms, from simple shapes like cups an

    Deep Architectures for Visual Recognition and Description

    In recent times, digital media content has become inherently multimedia, consisting of text, audio, image and video. Several outstanding Computer Vision (CV) problems are being successfully solved with the help of modern Machine Learning (ML) techniques. Plenty of research work has already been carried out in the fields of Automatic Image Annotation (AIA), Image Captioning and Video Tagging. Video Captioning, i.e., automatic description generation from digital video, however, is a different and complex problem altogether. This study compares various existing video captioning approaches and attempts to classify and analyze them based on different parameters, viz., the type of captioning method (generation/retrieval), the type of learning model employed, the desired output description length, etc. This dissertation also attempts to critically analyze the existing benchmark datasets used in various video captioning models and the evaluation metrics for assessing the final quality of the generated video descriptions. A detailed study of important existing models, highlighting their comparative advantages as well as disadvantages, is also included. In this study, a novel approach for video captioning on the Microsoft Video Description (MSVD) dataset and the Microsoft Video-to-Text (MSR-VTT) dataset is proposed, using supervised learning techniques to train a deep combinational framework that achieves better quality video captioning by predicting semantic tags. We develop simple shallow CNNs (2D and 3D) as feature extractors, Deep Neural Networks (DNNs) and Bidirectional LSTMs (BiLSTMs) as tag prediction models, and a Recurrent Neural Network (RNN) (LSTM) model as the language model. The aim of the work was to provide an alternative approach to generating captions from videos via semantic tag predictions and to deploy simpler, shallower deep model architectures with lower memory requirements, so that the solution is not very memory intensive and the developed models prove to be stable and viable options when the scale of the data is increased. This study also successfully employed deep architectures such as the Convolutional Neural Network (CNN) to speed up the automation of hand gesture recognition and classification of the sign languages of the Indian classical dance form, ‘Bharatnatyam’. This hand gesture classification is primarily aimed at 1) building a novel dataset of 2D single hand gestures belonging to 27 classes that were collected from (i) the Google search engine (Google images), (ii) YouTube videos (dynamic and with background considered) and (iii) professional artists under staged environment constraints (plain backgrounds); 2) exploring the effectiveness of CNNs for identifying and classifying the single hand gestures by optimizing the hyperparameters; and 3) evaluating the impacts of transfer learning and double transfer learning, which is a novel concept explored for achieving higher classification accuracy.

    Learning as a Nonlinear Line of Attraction for Pattern Association, Classification and Recognition

    Development of a mathematical model for learning a nonlinear line of attraction is presented in this dissertation, in contrast to the conventional recurrent neural network model in which the memory is stored in an attractive fixed point at a discrete location in state space. A nonlinear line of attraction is the encapsulation of attractive fixed points scattered in state space as an attractive nonlinear line, describing patterns with similar characteristics as a family of patterns. It is usually of prime importance to guarantee the convergence of the dynamics of the recurrent network for associative learning and recall. We propose to alter this picture. That is, if the brain remembers by converging to the state representing familiar patterns, it should also diverge from such states when presented with an unknown encoded representation of a visual image. The conception of the dynamics of the nonlinear line attractor network to operate between stable and unstable states is the second contribution in this dissertation research. These criteria can be used to circumvent the plasticity-stability dilemma by using the unstable state as an indicator to create a new line for an unfamiliar pattern. This novel learning strategy utilizes stability (convergence) and instability (divergence) criteria of the designed dynamics to induce self-organizing behavior. The self-organizing behavior of the nonlinear line attractor model can manifest complex dynamics in an unsupervised manner. The third contribution of this dissertation is the introduction of the concept of a manifold of color perception. The fourth contribution of this dissertation is the development of a nonlinear dimensionality reduction technique that embeds a set of related observations into a low-dimensional space using the learned memory matrices of the nonlinear line attractor network. Development of a system for affective states computation is also presented in this dissertation. This system is capable of extracting the user's mental state in real time using a low-cost computer. It is successfully interfaced with an advanced learning environment for human-computer interaction.

    A qualitative analysis of figural memory performance in persons with epilepsy

    This study examined nonverbal memory in patients with intractable temporal lobe epilepsy (TLE) on a figural reproduction task, the Rey-Osterrieth Complex Figure (ROCF). The Boston Qualitative Scoring System (BQSS) was used to examine whether qualitative features of ROCF performance could discriminate between those with right and left TLE. As predicted, seizure groups did not differ on a standard quantitative scoring system for the ROCF. Contrary to prediction, the right TLE group did not perform more poorly on BQSS measures of quality or organization, and they did not have greater difficulty recalling the figure after a delay. There was a trend towards poorer performance by the right TLE group on 2 BQSS scales, those quantifying the presence or absence of elements of the figure. ROCF performance was more strongly correlated with measures of visuoperception than with additional measures of nonverbal memory. Thus, the BQSS does not appear to be assessing nonverbal memory, and the implications of the ROCF as a visuoperceptual task are discussed

    Conflicts, integration, hybridization of subcultures: An ecological approach to the case of queercore

    This paper investigates the case study of queercore, providing a socio-historical analysis of its subcultural production, in terms of what Michel Foucault has called archaeology of knowledge (1969). In particular, we will focus on: the self-definition of the movement; the conflicts between the two merged worlds of punk and queer culture; the “internal-subcultural” conflicts between both queercore and punk, and between queercore and gay/lesbian music culture; the political aspects of differentiation. In the conclusion, we will offer an innovative theoretical proposal about the interpretation of subcultures in ecological and semiotic terms, combining the contribution of the American sociologist Andrew Abbot and of the Russian semiologist Jurij Michajlovič Lotma