16,716 research outputs found

    Multi-perspective cost-sensitive context-aware multi-instance sparse coding and its application to sensitive video recognition

    With the development of video-sharing websites, P2P, micro-blogs, mobile WAP sites, and so on, sensitive videos have become easier to access. Effective sensitive video recognition is necessary for web content security. Among web sensitive videos, this paper focuses on violent and horror videos. Based on color emotion and color harmony theories, we extract visual emotional features from videos. A video is viewed as a bag, and each shot in the video is represented by a key frame, which is treated as an instance in the bag. We then combine multi-instance learning (MIL) with sparse coding to recognize violent and horror videos. The resulting MIL-based model can be updated online to adapt to changing web environments. We propose a cost-sensitive context-aware multi-instance sparse coding (MI-SC) method, in which the contextual structure of the key frames is modeled using a graph, and audio and visual features are fused by extending classic sparse coding into cost-sensitive sparse coding. We then propose a multi-perspective multi-instance joint sparse coding (MI-J-SC) method that handles each bag of instances from an independent perspective, a contextual perspective, and a holistic perspective. The experiments demonstrate that features with an emotional meaning are effective for violent and horror video recognition, and that our cost-sensitive context-aware MI-SC and multi-perspective MI-J-SC methods outperform traditional MIL methods as well as traditional SVM- and KNN-based methods.
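
    A minimal sketch of the bag-of-instances encoding idea behind MI-SC, assuming a pre-learned dictionary and simple max-pooling over instance codes; the contextual graph, audio-visual fusion, and cost-sensitive weighting of the actual method are omitted, and the names and dimensions are illustrative:

        import numpy as np
        from sklearn.decomposition import sparse_encode

        def encode_bag(instances, dictionary, alpha=0.1):
            """Sparse-code each instance (key-frame feature vector) against a
            dictionary, then max-pool the codes into one bag-level vector."""
            codes = sparse_encode(instances, dictionary,
                                  algorithm='lasso_lars', alpha=alpha)
            return np.abs(codes).max(axis=0)

        # Toy usage: a "video" bag of 5 key-frame features (dim 64),
        # encoded against a hypothetical 32-atom dictionary.
        rng = np.random.default_rng(0)
        D = rng.standard_normal((32, 64))
        bag = rng.standard_normal((5, 64))
        bag_code = encode_bag(bag, D)  # feed to any bag-level classifier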

    AUTOMATIC PARENTAL GUIDE SCENE CLASSIFICATION USING DEEP CONVOLUTIONAL NEURAL NETWORK AND LSTM METHODS

    Watching films is one of the most popular hobbies across many demographics. As the number of films on the market grows, so does the amount of inappropriate content they contain. A method is therefore needed to classify films so that the content watched is appropriate for the viewer's age. The film content unsuitable for underage viewers that this study classifies includes: violence, pornography, profanity, alcohol, illegal drug use, smoking, and frightening (horror) and intense scenes. The classification method used is a modification of a convolutional neural network combined with an LSTM. Combining these two methods accommodates small amounts of training data and supports multi-label classification based on a film's video, audio, and subtitles. Multi-label classification is used because a film always carries more than one classification. For training and testing, this study used 1,000 samples for video classification, 600 for audio classification, and 400 for subtitle classification, all collected from the internet. The experiments yielded F1-scores of 0.922 for video classification, 0.741 for audio classification, and 0.844 for subtitle classification, with an average of 0.835. Future work will explore other deep convolutional neural network architectures and increase the number and variety of test data.
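
    A hedged sketch of the CNN-plus-LSTM idea for multi-label scene classification in PyTorch; the layer sizes are illustrative and do not reproduce the paper's architecture. The eight output labels correspond to the content categories listed above, and sigmoid (rather than softmax) outputs let a scene carry several labels at once:

        import torch
        import torch.nn as nn

        class ClipClassifier(nn.Module):
            """Per-frame CNN features -> LSTM over time -> multi-label head."""
            def __init__(self, n_labels=8, feat_dim=128, hidden=64):
                super().__init__()
                self.cnn = nn.Sequential(  # small illustrative frame encoder
                    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                    nn.Linear(32, feat_dim),
                )
                self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
                self.head = nn.Linear(hidden, n_labels)  # one logit per label

            def forward(self, clips):  # clips: (batch, time, 3, H, W)
                b, t = clips.shape[:2]
                feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)
                _, (h, _) = self.lstm(feats)
                return self.head(h[-1])  # multi-label logits

        # Toy usage: 2 clips of 16 frames each; independent per-label scores.
        logits = ClipClassifier()(torch.randn(2, 16, 3, 64, 64))
        probs = torch.sigmoid(logits)  # not softmax: labels are not exclusive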

    Multi-view multi-instance learning based on joint sparse representation and multi-view dictionary learning

    In multi-instance learning (MIL), the relations among instances in a bag convey important contextual information in many applications. Previous studies on MIL either ignore such relations or model them with a fixed graph structure, so overall performance inevitably degrades in complex environments. To address this problem, this paper proposes a novel multi-view multi-instance learning algorithm (M2IL) that combines multiple context structures in a bag into a unified framework. The novel aspects are: (i) we propose a sparse ε-graph model that can generate different graphs with different parameters to represent various context relations in a bag; (ii) we propose a multi-view joint sparse representation that integrates these graphs into a unified framework for bag classification; and (iii) we propose a multi-view dictionary learning algorithm to obtain a multi-view graph dictionary that considers cues from all views simultaneously to improve the discriminative power of M2IL. Experiments and analyses in many practical applications demonstrate the effectiveness of M2IL.
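
    One common reading of a sparse ε-graph, sketched under the assumption that each instance is reconstructed as a sparse combination of the other instances in its bag and that coefficients above ε are kept as edge weights; the joint sparse representation and dictionary learning stages are beyond this sketch, and the parameter values are illustrative:

        import numpy as np
        from sklearn.linear_model import Lasso

        def sparse_graph(X, lam=0.05, eps=1e-3):
            """Adjacency for one bag: reconstruct each instance from the
            others with an l1 penalty; coefficients above eps become edges."""
            n = len(X)
            A = np.zeros((n, n))
            for i in range(n):
                rest = np.delete(np.arange(n), i)
                coef = Lasso(alpha=lam, max_iter=5000).fit(X[rest].T, X[i]).coef_
                A[i, rest] = np.where(np.abs(coef) > eps, np.abs(coef), 0.0)
            return A

        # Different (lam, eps) settings yield different context graphs,
        # i.e. the multiple "views" that M2IL fuses for bag classification.
        views = [sparse_graph(np.random.randn(6, 32), lam=l)
                 for l in (0.01, 0.05, 0.1)]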

    Horror image recognition based on context-aware multi-instance learning

    Horror content sharing on the Web is a growing phenomenon that can interfere with daily life and affect the mental health of those involved. As an important form of expression, horror images have their own characteristics that can evoke extreme emotions. In this paper, we present a novel context-aware multi-instance learning (CMIL) algorithm for horror image recognition. The CMIL algorithm identifies horror images and picks out the regions that cause the sensation of horror in these images. It obtains contextual cues among adjacent regions in an image via a random walk on a contextual graph. Borrowing the strength of the fuzzy support vector machine (FSVM), we define a heuristic optimization procedure based on the FSVM to search for the optimal classifier for the CMIL. To improve the initialization of the CMIL, we propose a novel visual saliency model based on tensor analysis. The average saliency value of each segmented region is set as its initial fuzzy membership in the CMIL. The advantage of the tensor-based visual saliency model is that it not only adaptively selects features but also dynamically determines the fusion weights for combining saliency values from different feature subspaces. The effectiveness of the proposed CMIL model is demonstrated by its use in horror image recognition on two large-scale image sets collected from the Internet.
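
    A minimal sketch of the random-walk step, assuming a region affinity matrix W over adjacent regions and tensor-based saliency values as the initial distribution; the restart constant and the toy numbers are illustrative, not the paper's settings:

        import numpy as np

        def propagate_context(W, saliency, alpha=0.85, iters=100):
            """Random walk with restart on the region graph: W[i, j] is the
            affinity of adjacent regions, saliency seeds the walk."""
            P = W / W.sum(axis=1, keepdims=True)  # row-stochastic transitions
            seed = saliency / saliency.sum()
            p = seed.copy()
            for _ in range(iters):
                p = alpha * (P.T @ p) + (1 - alpha) * seed
            return p  # context-aware per-region scores

        # Toy usage: 3 segmented regions, seeded by average region saliency.
        W = np.array([[0.0, 1.0, 0.2],
                      [1.0, 0.0, 0.5],
                      [0.2, 0.5, 0.0]])
        scores = propagate_context(W, np.array([0.1, 0.7, 0.2]))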

    Video Classification With CNNs: Using The Codec As A Spatio-Temporal Activity Sensor

    We investigate video classification via a two-stream convolutional neural network (CNN) design that directly ingests information extracted from compressed video bitstreams. Our approach begins with the observation that all modern video codecs divide the input frames into macroblocks (MBs). We demonstrate that selective access to MB motion vector (MV) information within compressed video bitstreams can also provide for selective, motion-adaptive MB pixel decoding (a.k.a. MB texture decoding). This in turn allows spatio-temporal video activity regions to be derived at extremely high speed compared with conventional full-frame decoding followed by optical flow estimation. To evaluate the accuracy of a video classification framework based on such activity data, we independently train two CNN architectures on MB texture and MV correspondences and then fuse their scores to derive the final classification of each test video. Evaluation on two standard datasets shows that the proposed approach is competitive with the best two-stream video classification approaches in the literature. At the same time: (i) a CPU-based realization of our MV extraction is over 977 times faster than GPU-based optical flow methods; (ii) selective decoding is up to 12 times faster than full-frame decoding; and (iii) our proposed spatial and temporal CNNs perform inference at 5 to 49 times lower cloud computing cost than the fastest methods in the literature. (Accepted in IEEE Transactions on Circuits and Systems for Video Technology; extension of an ICIP 2017 conference paper.)
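
    The MV extraction itself needs codec-level tooling, so this sketch only shows the final stage described above: late fusion of the per-class scores from the independently trained texture (spatial) and motion-vector (temporal) CNNs. Equal stream weights are an assumption here, not the paper's tuned values:

        import torch

        def fuse_scores(texture_logits, mv_logits, w=0.5):
            """Late fusion: weighted average of the per-class softmax scores
            of the texture CNN and the motion-vector CNN."""
            s = (w * torch.softmax(texture_logits, dim=1)
                 + (1 - w) * torch.softmax(mv_logits, dim=1))
            return s.argmax(dim=1)  # final class per test video

        # Toy usage: 4 test videos, 10 classes, one logit row per video.
        pred = fuse_scores(torch.randn(4, 10), torch.randn(4, 10))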

    Multiple Instance Learning: A Survey of Problem Characteristics and Applications

    Multiple instance learning (MIL) is a form of weakly supervised learning where training instances are arranged in sets, called bags, and a label is provided for the entire bag. This formulation is gaining interest because it naturally fits various problems and makes it possible to leverage weakly labeled data. Consequently, it has been used in diverse application fields such as computer vision and document classification. However, learning from bags raises important challenges that are unique to MIL. This paper provides a comprehensive survey of the characteristics that define and differentiate the types of MIL problems. Until now, these problem characteristics have not been formally identified and described; as a result, the variations in performance of MIL algorithms from one data set to another are difficult to explain. In this paper, MIL problem characteristics are grouped into four broad categories: the composition of the bags, the types of data distribution, the ambiguity of instance labels, and the task to be performed. Methods specialized to address each category are reviewed. Then, the extent to which these characteristics manifest themselves in key MIL application areas is described. Finally, experiments are conducted to compare the performance of 16 state-of-the-art MIL methods on selected problem characteristics. This paper provides insight into how problem characteristics affect MIL algorithms, recommendations for future benchmarking, and promising avenues for research.
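
    For readers new to the formulation, the standard MIL assumption (a bag is positive if and only if at least one of its instances is positive) fits in a few lines; the survey's taxonomy covers many relaxations of it:

        import numpy as np

        def bag_label(instance_scores, threshold=0.5):
            """Standard MIL assumption: a bag is positive iff any instance
            crosses the decision threshold (max-pooling over the bag)."""
            return int(np.max(instance_scores) >= threshold)

        # Toy bags: per-instance scores from any instance-level classifier.
        print(bag_label([0.1, 0.9, 0.3]))  # 1 -> one positive instance suffices
        print(bag_label([0.2, 0.4, 0.1]))  # 0 -> all instances negative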

    Chinmoku MQP

    This report details the development process of Chinmoku (“silence”), an educational game developed to fulfill the Major Qualifying Project requirement for Worcester Polytechnic Institute’s Interactive Media and Game Development (IMGD) and Computer Science majors. The project was developed over a three-month period at Ritsumeikan University’s Biwako-Kusatsu Campus in Shiga Prefecture, Japan. The game seeks to teach Hiragana, one of the Japanese writing systems, to a target audience of young adults familiar with gaming. This report covers all aspects of the team’s development process, research, and playtesting, as well as possibilities for future work on this project.
