    Fast Fight Detection

    Action recognition has become a hot topic within computer vision. However, the action recognition community has focused mainly on relatively simple actions like clapping, walking, jogging, etc. The detection of specific events with direct practical use, such as fights or aggressive behavior in general, has been comparatively less studied. Such a capability may be extremely useful in some video surveillance scenarios like prisons and psychiatric centers, or even embedded in camera phones. As a consequence, there is growing interest in developing violence detection algorithms. Recent work considered the well-known Bag-of-Words framework for the specific problem of fight detection. Under this framework, spatio-temporal features are extracted from the video sequences and used for classification. Despite encouraging results in which high accuracy rates were achieved, the computational cost of extracting such features is prohibitive for practical applications. This work proposes a novel method to detect violent sequences. Features extracted from motion blobs are used to discriminate fight and non-fight sequences. Although the method is outperformed in accuracy by the state of the art, its significantly faster computation time makes it amenable to real-time applications.
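    The abstract gives no implementation details, but the motion-blob idea can be sketched cheaply: difference consecutive frames, threshold, take connected components as blobs, and summarize them into a short feature vector for a standard classifier. A minimal sketch assuming OpenCV and NumPy, with thresholds and feature choices that are illustrative rather than the paper's:

```python
import cv2
import numpy as np

def motion_blob_features(video_path, thresh=25, max_frames=200):
    """Per-video statistics over motion blobs obtained by differencing
    consecutive grayscale frames (illustrative feature set)."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    counts, areas = [], []
    for _ in range(max_frames):
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        _, mask = cv2.threshold(cv2.absdiff(gray, prev), thresh, 255,
                                cv2.THRESH_BINARY)
        prev = gray
        n, _, stats, _ = cv2.connectedComponentsWithStats(mask)
        blob_areas = stats[1:, cv2.CC_STAT_AREA]   # label 0 is background
        counts.append(n - 1)
        areas.append(blob_areas.max() if len(blob_areas) else 0)
    cap.release()
    return np.array([np.mean(counts), np.std(counts),
                     np.mean(areas), np.std(areas)])

# Fight vs. non-fight classification from these cheap features, e.g.:
#   X = np.stack([motion_blob_features(p) for p in clip_paths])
#   clf = sklearn.svm.SVC().fit(X, labels)
```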

    Spatio-temporal action localization with Deep Learning

    Master's dissertation in Informatics Engineering. A system that detects and identifies human activities is called a human action recognition system. In video-based approaches, human activity is classified into four categories depending on the complexity of the steps and the number of body parts involved in the action: gestures, actions, interactions, and activities. Capturing valuable and discriminative features is challenging for video human action recognition because of the variations of the human body. Deep learning techniques have therefore found practical application in multiple fields of signal processing, usually surpassing traditional signal processing at scale. Recently, violence detection and recognition has been studied for several applications, namely surveillance, human-computer interaction, and content-based video retrieval. In recent years there has been rapid growth in the production and consumption of a wide variety of video data due to the popularization of high-quality and relatively low-priced video devices; smartphones and digital cameras have contributed greatly to this. At the same time, about 300 hours of video are uploaded to YouTube every minute. Along with the growing production of video data, new technologies such as video captioning, video question answering, and video-based activity/event detection are emerging every day. Given the video input data, human activity detection indicates which activity is contained in the video and locates the regions in the video where the activity occurs. This dissertation conducts an experiment to identify and detect violence with spatial action localization, adapting a public dataset for this purpose. The idea was to take an annotated dataset for general action recognition and adapt it for violence detection only.
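    The summary above does not spell out the localization pipeline. One common approximation of spatial action localization, sketched below, detects persons in each frame with a pretrained torchvision detector; the action classifier for the detected regions (trained on the adapted, violence-relabelled dataset) and the temporal linking of boxes are assumed, not shown:

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Pretrained COCO person detector; the action classifier for the detected
# regions would be trained separately on the adapted violence dataset.
detector = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

@torch.no_grad()
def person_boxes(frame, score_thresh=0.8):
    """Person bounding boxes for one float CHW frame scaled to [0, 1]."""
    out = detector([frame])[0]
    keep = (out["labels"] == 1) & (out["scores"] > score_thresh)  # COCO: 1 = person
    return out["boxes"][keep]

# Boxes from consecutive frames can then be linked by IoU into
# spatio-temporal tubes, and each tube classified as violent or not.
```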

    The Bullying Game: Sexism Based Toxic Language Analysis on Online Games Chat Logs by Text Mining

    As a unique type of social network, the online gaming industry is a fast-growing, changing, and male-dominated field which attracts diverse backgrounds. Because its users, developers, players, and investors are predominantly male, non-inclusiveness and gender inequality persist as salient problems in the community. In online gaming communities, most women players report toxic and offensive language or experiences of verbal abuse. Symbolic interactionists and feminists assume that words matter, since the use of particular language and terms can dehumanize and harm particular groups such as women. Identifying and reporting the toxic behavior, sexism, and harassment that occur in online games is a critical need in preventing cyberbullying, and it will help gender diversity and equality grow in the online gaming industry. However, research on this topic is still rare, except for some milestone studies. This paper aims to contribute to the theory and practice of sexist toxic language detection in the online gaming community through the automatic detection and analysis of toxic comments in online game chat logs. We adopted the MaXQDA tool as a data visualization technique to reveal the toxic words most frequently used against women in online gaming communities. We also applied a Naïve Bayes classifier for text mining to classify whether a chat log's content is sexist and toxic. We then refined the text mining model with a Laplace estimator and re-tested the model's accuracy. The study revealed that the accuracy of the Naïve Bayes classifier was not changed by the Laplace estimator. The findings of the study are expected to raise awareness about the use of gender-based toxic language in the online gaming community. Moreover, the proposed mining model can inspire similar research on practical tools to help moderate the use of sexist toxic language and rid these communities of gender-based discrimination and sexist bullying.
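    The Naïve Bayes plus Laplace-estimator comparison is straightforward to mirror with scikit-learn, where the Laplace (add-one) estimator corresponds to the smoothing parameter alpha of MultinomialNB. The chat lines below are invented placeholders, not data from the study:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

# Invented stand-ins for labelled chat-log lines (1 = sexist/toxic, 0 = not).
texts = ["go back to the kitchen", "nice shot, well played",
         "girls can't aim", "good game everyone"]
labels = [1, 0, 1, 0]

X = CountVectorizer().fit_transform(texts)

# Compare (near-)zero smoothing against the Laplace (add-one) estimator;
# the study reports that accuracy did not change between the two.
for alpha in (1e-10, 1.0):
    scores = cross_val_score(MultinomialNB(alpha=alpha), X, labels, cv=2)
    print(f"alpha={alpha}: mean accuracy {scores.mean():.2f}")
```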

    A review on Video Classification with Methods, Findings, Performance, Challenges, Limitations and Future Work

    In recent years, there has been rapid growth in the number of web users and in available bandwidth. Low-cost Internet connectivity makes the sharing of information (text, audio, and video) more common and faster. This video content needs to be analyzed to predict its class for various user purposes. Many machine learning approaches have been developed for video classification to save people time and energy. There are many existing review papers on video classification, but they have limitations: narrow analysis, poor structure, and failure to mention research gaps, findings, advantages, disadvantages, or future work. This review attempts to overcome those limitations. It reviews existing video classification procedures, examines the existing methods comparatively and critically, and recommends the most effective and productive process. First, our analysis examines video classification with taxonomical details, the latest applications, processes, and dataset information. Second, it covers the overall difficulties, shortcomings, and potential future work, along with data and performance measurements from recent related work in deep learning and machine learning. Comparing video classification systems by their tools, benefits, drawbacks, and other features is also a key task of this review. Lastly, we present a quick summary table based on selected features. In terms of accuracy and independent feature extraction, RNN (Recurrent Neural Network), CNN (Convolutional Neural Network), and combined approaches perform better than CNN-dependent methods.
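    The closing claim about RNN/CNN combinations refers to the common pattern of extracting per-frame CNN features and aggregating them over time with a recurrent layer. A minimal PyTorch sketch of that pattern, with the backbone and all sizes chosen purely for illustration:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class CnnRnnClassifier(nn.Module):
    """Per-frame CNN features -> LSTM over time -> class scores."""
    def __init__(self, num_classes, hidden=256):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()          # keep the 512-d pooled features
        self.cnn = backbone
        self.rnn = nn.LSTM(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips):                # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1))        # (B*T, 512)
        _, (h, _) = self.rnn(feats.view(b, t, -1))   # h: (1, B, hidden)
        return self.head(h[-1])                      # (B, num_classes)

# scores = CnnRnnClassifier(num_classes=10)(torch.randn(2, 8, 3, 112, 112))
```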

    Spatio-Temporal Information for Action Recognition in Thermal Video Using Deep Learning Model

    Researchers can evaluate a wealth of information to enable automated monitoring, owing to the widespread use of surveillance cameras in smart cities. For the monitoring of violence or abnormal behavior in smart cities, schools, hospitals, residences, and other observational domains, an enhanced safety and security system is required to prevent injuries that might result in ecological, economic, and social losses. Automatic detection enabling prompt action is vital and may effectively help the respective departments. Several researchers have concentrated on object detection, tracking, and action identification based on thermal imaging, but few studies have simultaneously extracted spatio-temporal information from thermal images and used it to recognize human actions. This research provides a novel model based on frame-level spatial and temporal features, combining richer temporal context to address the poor efficiency and low accuracy of detecting abnormal/violent behavior with thermal monitoring devices. The model can locate (with bounding boxes) video frame areas involving different human activities and recognize (classify) the actions. The human behavior dataset includes videos captured with infrared cameras in both indoor and outdoor environments. Experimental results on publicly available benchmark datasets reveal the proposed model's efficiency: it achieves 98.5% and 94.85% accuracy on the IITR Infrared Action Recognition (IITR-IAR) and Thermal Simulated Fall (TSF) datasets, respectively. In addition, the proposed method may be evaluated under more realistic conditions, such as zooming in and out.
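    The paper's combination of frame-level spatial features with richer temporal context can be approximated, very roughly, by a two-stream network in which one stream sees the raw thermal frame and the other a frame difference as a crude motion cue. The PyTorch sketch below is an illustrative stand-in under that assumption, not the authors' architecture:

```python
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    """Spatial stream (raw thermal frame) fused with a temporal stream
    (frame difference); an illustrative stand-in, not the paper's model."""
    def __init__(self, num_actions):
        super().__init__()
        def stream():
            return nn.Sequential(
                nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())   # -> 32-d per stream
        self.spatial, self.temporal = stream(), stream()
        self.head = nn.Linear(64, num_actions)

    def forward(self, frame_t, frame_prev):      # (B, 1, H, W) thermal frames
        s = self.spatial(frame_t)
        t = self.temporal(frame_t - frame_prev)  # crude motion cue
        return self.head(torch.cat([s, t], dim=1))
```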

    Multi-perspective cost-sensitive context-aware multi-instance sparse coding and its application to sensitive video recognition

    With the development of video-sharing websites, P2P, micro-blogs, mobile WAP websites, and so on, sensitive videos can be accessed more easily. Effective sensitive video recognition is necessary for web content security. Among web sensitive videos, this paper focuses on violent and horror videos. Based on color emotion and color harmony theories, we extract visual emotional features from videos. A video is viewed as a bag, and each shot in the video is represented by a key frame which is treated as an instance in the bag. Then, we combine multi-instance learning (MIL) with sparse coding to recognize violent and horror videos. The resulting MIL-based model can be updated online to adapt to changing web environments. We propose a cost-sensitive context-aware multi-instance sparse coding (MI-SC) method, in which the contextual structure of the key frames is modeled using a graph, and fusion between audio and visual features is carried out by extending classic sparse coding into cost-sensitive sparse coding. We then propose a multi-perspective multi-instance joint sparse coding (MI-J-SC) method that handles each bag of instances from an independent perspective, a contextual perspective, and a holistic perspective. The experiments demonstrate that features with an emotional meaning are effective for violent and horror video recognition, and that our cost-sensitive context-aware MI-SC and multi-perspective MI-J-SC methods outperform traditional MIL methods and traditional SVM- and KNN-based methods.
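    As a rough, simplified stand-in for the paper's cost-sensitive, context-aware formulation (which this sketch does not reproduce): each key frame (instance) can be sparse-coded against a learned dictionary, and the codes max-pooled into one bag-level descriptor for a standard classifier. All data and parameters below are invented for illustration:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning, SparseCoder
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy stand-ins: each video (bag) is a set of key-frame feature vectors
# (instances), e.g. 32-d visual emotional features.
bags = [rng.normal(size=(rng.integers(3, 8), 32)) for _ in range(20)]
labels = rng.integers(0, 2, size=20)        # violent/horror vs. normal

# Learn a dictionary from all instances pooled together.
D = DictionaryLearning(n_components=48, transform_algorithm="lasso_lars",
                       random_state=0).fit(np.vstack(bags)).components_
coder = SparseCoder(dictionary=D, transform_algorithm="lasso_lars",
                    transform_alpha=0.1)

# Max-pool the instance codes into one bag-level vector per video.
X = np.stack([np.abs(coder.transform(b)).max(axis=0) for b in bags])
clf = SVC().fit(X, labels)
```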

    Visual Concept Detection in Images and Videos

    The rapidly increasing proliferation of digital images and videos leads to a situation where content-based search in multimedia databases becomes more and more important. A prerequisite for effective image and video search is to analyze and index media content automatically. Current approaches in the field of image and video retrieval focus on semantic concepts serving as an intermediate description to bridge the “semantic gap” between the data representation and the human interpretation. Due to the large complexity and variability in the appearance of visual concepts, the detection of arbitrary concepts represents a very challenging task. In this thesis, the following aspects of visual concept detection systems are addressed.

    First, enhanced local descriptors for mid-level feature coding are presented. Based on the observation that scale-invariant feature transform (SIFT) descriptors with different spatial extents yield large performance differences, a novel concept detection system is proposed that combines feature representations for different spatial extents using multiple kernel learning (MKL). A multi-modal video concept detection system is presented that relies on Bag-of-Words representations for visual and in particular for audio features. Furthermore, a method for the SIFT-based integration of color information, called color moment SIFT, is introduced. Comparative experimental results demonstrate the superior performance of the proposed systems on the Mediamill and VOC Challenges.

    Second, an approach is presented that systematically utilizes the results of object detectors. Novel object-based features are generated from object detection results using different pooling strategies. For videos, detection results are assembled into object sequences, and a shot-based confidence score as well as further features, such as position, frame coverage, or movement, are computed for each object class. These features are used as additional input for the support vector machine (SVM)-based concept classifiers, so that other related concepts can also profit from object-based features. Extensive experiments on the Mediamill, VOC, and TRECVid Challenges show significant improvements in retrieval performance not only for the object classes, but in particular for a large number of indirectly related concepts. Moreover, it is demonstrated that a few object-based features are beneficial for a large number of concept classes. On the VOC Challenge, the additional use of object-based features led to a superior performance of 63.8% mean average precision (AP) on the image classification task. Furthermore, the generalization capabilities of concept models are investigated. It is shown that different source and target domains lead to a severe loss in concept detection performance; in these cross-domain settings, object-based features achieve a significant performance improvement. Since it is inefficient to run a large number of single-class object detectors, it is additionally demonstrated how a concurrent multi-class object detection system can be constructed to speed up the detection of many object classes in images.

    Third, a novel, purely web-supervised learning approach for modeling heterogeneous concept classes in images is proposed. Tags and annotations of multimedia data in the WWW are rich sources of information that can be employed for learning visual concepts. The presented approach is aimed at continuous long-term learning of appearance models and at improving these models periodically. For this purpose, several components have been developed: a crawling component, a multi-modal clustering component for spam detection and subclass identification, a novel learning component called “random savanna”, a validation component, an updating component, and a scalability manager. Only a single word describing the visual concept is required to initiate the learning process. Experimental results demonstrate the capabilities of the individual components.

    Finally, a generic concept detection system is applied to support interdisciplinary research efforts in the fields of psychology and media science. The psychological research question addressed in the behavioral sciences is whether and how playing violent content in computer games may induce aggression. Therefore, novel semantic concepts, most notably “violence”, are detected in computer game videos to gain insights into the interrelationship between violent game events and the brain activity of the player. Experimental results demonstrate the excellent performance of the proposed automatic concept detection approach for such interdisciplinary research.
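    The Bag-of-Words pipeline underlying the first part of the thesis can be sketched compactly: extract SIFT descriptors, quantize them against a k-means codebook, and classify the resulting histograms with an SVM. The sketch below, assuming OpenCV and scikit-learn, is a plain single-kernel baseline, not the MKL combination over multiple spatial extents or the color moment SIFT described above:

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

sift = cv2.SIFT_create()

def sift_descriptors(image_path):
    """128-d SIFT descriptors for one grayscale image."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(img, None)
    return desc if desc is not None else np.empty((0, 128))

def bow_histogram(desc, codebook):
    """Quantize descriptors to visual words; L1-normalized histogram."""
    words = codebook.predict(desc.astype(np.float32))
    hist = np.bincount(words, minlength=codebook.n_clusters)
    return hist / max(hist.sum(), 1)

# codebook = KMeans(n_clusters=1000).fit(np.vstack(all_train_descriptors))
# X = np.stack([bow_histogram(sift_descriptors(p), codebook) for p in paths])
# clf = SVC(kernel="rbf").fit(X, labels)   # one classifier per concept
```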