277 research outputs found

    Evaluating Multimedia Features and Fusion for Example-Based Event Detection

    Multimedia event detection (MED) is a challenging problem because of the heterogeneous content and variable quality found in large collections of Internet videos. To study the value of multimedia features and fusion for representing and learning events from a set of example video clips, we created SESAME, a system for video SEarch with Speed and Accuracy for Multimedia Events. SESAME includes multiple bag-of-words event classifiers based on single data types: low-level visual, motion, and audio features; high-level semantic visual concepts; and automatic speech recognition. Event detection performance was evaluated for each event classifier. The performance of low-level visual and motion features was improved by the use of difference coding. The accuracy of the visual concepts was nearly as strong as that of the low-level visual features. Experiments with a number of fusion methods for combining the event detection scores from these classifiers revealed that simple fusion methods, such as the arithmetic mean, perform as well as or better than other, more complex fusion methods. SESAME's performance in the 2012 TRECVID MED evaluation was one of the best reported.
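The fusion finding above is easy to make concrete. As a minimal sketch (the scores and modality names are hypothetical, not taken from the SESAME system), arithmetic-mean late fusion simply averages the per-modality detection scores for each test video:

```python
import numpy as np

def arithmetic_mean_fusion(score_lists):
    """Late fusion: average the per-modality detection scores per video.

    score_lists: list of 1-D arrays, one per classifier (visual, motion,
    audio, ...), each holding one detection score per test video.
    """
    scores = np.vstack(score_lists)   # shape: (n_classifiers, n_videos)
    return scores.mean(axis=0)        # one fused score per video

# Toy example: three modality classifiers scoring four videos.
visual = np.array([0.9, 0.2, 0.6, 0.1])
motion = np.array([0.8, 0.3, 0.5, 0.2])
audio = np.array([0.7, 0.1, 0.9, 0.3])
fused = arithmetic_mean_fusion([visual, motion, audio])
```

Despite its simplicity, this rule is a strong baseline: it needs no extra training and is robust when the individual classifiers produce comparable score ranges.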

    Learning to detect video events from zero or very few video examples

    In this work we deal with the problem of high-level event detection in video. Specifically, we study the challenging problems of i) learning to detect video events solely from a textual description of the event, without using any positive video examples, and ii) additionally exploiting very few positive training samples together with a small number of "related" videos. For learning only from an event's textual description, we first identify a general learning framework and then study the impact of different design choices for various stages of this framework. For additionally learning from example videos, when true positive training samples are scarce, we employ an extension of the Support Vector Machine that allows us to exploit "related" event videos by automatically introducing different weights for subsets of the videos in the overall training set. Experimental evaluations performed on the large-scale TRECVID MED 2014 video dataset provide insight on the effectiveness of the proposed methods. Comment: accepted for publication in Image and Vision Computing Journal, Elsevier, 2015.
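The weighted-SVM idea above can be sketched with an off-the-shelf learner. The snippet below is a toy illustration: the features, weights, and data are invented, and scikit-learn's `sample_weight` mechanism stands in for the paper's SVM extension. True positive examples receive full weight, while "related" videos are treated as down-weighted positives:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy 2-D features: hypothetical, just to make the idea concrete.
X = np.array([
    [1.0, 0.9], [0.9, 1.0],   # true positive event videos
    [0.7, 0.6], [0.6, 0.7],   # "related" videos, used as weak positives
    [0.1, 0.0], [0.0, 0.2],   # negative videos
])
y = np.array([1, 1, 1, 1, 0, 0])

# Down-weight the "related" subset so it guides, but does not dominate,
# the decision boundary.
weights = np.array([1.0, 1.0, 0.3, 0.3, 1.0, 1.0])

clf = LinearSVC(C=1.0)
clf.fit(X, y, sample_weight=weights)
```

Automatically choosing those subset weights, rather than fixing them by hand as done here, is the core of the method the abstract describes.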

    TRECVID 2014 -- An Overview of the Goals, Tasks, Data, Evaluation Mechanisms and Metrics

    The TREC Video Retrieval Evaluation (TRECVID) 2014 was a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in content-based exploitation of digital video via open, metrics-based evaluation. Over the last dozen years this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. TRECVID is funded by NIST with support from other US government agencies. Many organizations and individuals worldwide contribute significant time and effort.

    The AXES submissions at TrecVid 2013

    The AXES project participated in the interactive instance search task (INS), the semantic indexing task (SIN), the multimedia event recounting task (MER), and the multimedia event detection task (MED) for TRECVid 2013. Our interactive INS work this year focused on using classifiers trained at query time with positive examples collected from external search engines. Our INS experiments were carried out by students and researchers at Dublin City University. Our best INS runs performed on par with the top-ranked INS runs in terms of P@10 and P@30, and around the median in terms of mAP. For SIN, MED, and MER, we use systems based on state-of-the-art local low-level descriptors for motion, image, and sound, as well as high-level features capturing speech and text from the audio and visual streams, respectively. The low-level descriptors were aggregated by means of Fisher vectors into high-dimensional video-level signatures; the high-level features were aggregated into bag-of-words histograms. Using these features we train linear classifiers, and use early and late fusion to combine the different features. Our MED system achieved the best score of all submitted runs in the main track, as well as in the ad-hoc track. This paper describes in detail our INS, MER, and MED systems and the results and findings of our experiments.
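The bag-of-words aggregation step mentioned above can be sketched as follows. This is a simplified stand-in: the codebook and descriptors are invented, and a real system would use a k-means-trained codebook over dense local descriptors:

```python
import numpy as np

def bow_histogram(local_descriptors, codebook):
    """Aggregate local descriptors into an L1-normalised BoW histogram.

    local_descriptors: (n, d) array of per-frame or per-region features
    codebook: (k, d) array of visual-word centroids (e.g. from k-means)
    """
    # Squared Euclidean distance from every descriptor to every word.
    d2 = ((local_descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)  # hard-assign each descriptor to nearest word
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
desc = np.array([[0.1, 0.0], [0.9, 1.1], [1.0, 0.9], [0.0, 0.9]])
h = bow_histogram(desc, codebook)  # [0.25, 0.5, 0.25]
```

Fisher vectors replace the hard counts above with soft-assignment gradient statistics under a Gaussian mixture, which is what yields the much higher-dimensional video-level signatures the abstract mentions.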

    TNO at TRECVID 2013 : multimedia event detection and instance search

    We describe the TNO system and the evaluation results for the TRECVID 2013 Multimedia Event Detection (MED) and instance search (INS) tasks. The MED system consists of a bag-of-words (BOW) approach with spatial tiling that uses low-level static and dynamic visual features, an audio feature, and high-level concepts. Automatic speech recognition (ASR) and optical character recognition (OCR) are not used in the system. In the MED case with 100 example training videos, support vector machines (SVM) are trained and fused to detect an event in the test set. In the case with 0 example videos, positive and negative concepts are extracted as keywords from the textual event description, and events are detected with the high-level concepts. The MED results show that the SIFT keypoint descriptor contributes most to the results, that fusion of multiple low-level features helps to improve the performance, and that the textual event-description chain currently performs poorly. The TNO INS system presents a baseline open-source approach using standard SIFT keypoint detection and exhaustive matching. In order to speed up search times for queries, a basic map-reduce scheme is presented to be used on a multi-node cluster. Our INS results show above-median results with acceptable search times. This research for the MED submission was performed in the GOOSE project, which is jointly funded by the enabling technology program Adaptive Multi Sensor Networks (AMSN) and the MIST research program of the Dutch Ministry of Defense. The INS submission was partly supported by the MIME project of the creative industries knowledge and innovation network CLICKNL.
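The zero-example chain above can be sketched as a simple keyword-matching score. Everything here is hypothetical: the concept names, detector confidences, and the plain additive scoring rule stand in for the actual TNO pipeline:

```python
def zero_example_score(video_concepts, positive, negative):
    """Score a video for an event using only concept keywords taken from
    the textual event description; no training videos are needed.

    video_concepts: dict mapping concept name -> detector confidence
    positive / negative: keywords extracted from the event description
    """
    pos = sum(video_concepts.get(c, 0.0) for c in positive)
    neg = sum(video_concepts.get(c, 0.0) for c in negative)
    return pos - neg

# Hypothetical event description: "people winning a race on foot".
positive = ["running", "crowd", "finish_line"]
negative = ["car", "bicycle"]
video = {"running": 0.8, "crowd": 0.6, "car": 0.1}
score = zero_example_score(video, positive, negative)  # 0.8 + 0.6 - 0.1 = 1.3
```

The abstract reports that this textual chain currently performs poorly, which is consistent with how much the approach depends on the quality of both the keyword extraction and the underlying concept detectors.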

    Review of existing and operable observing systems and sensors

    Deliverable 1.4 identifies existing and operable observing systems and sensors relevant to COMMON SENSE objectives. The report aggregates information on existing observing initiatives, programmes, systems, platforms, and sensors. The report includes:
    • an inventory of previous and current EU-funded projects. Some of them, even if started before 2007, were aimed at activities relevant to or in line with those stemming from the MSFD in 2008. The 'granulation' of the contents and objectives of the projects varies from sensor development through observation methodologies to monitoring strategies;
    • an inventory of research infrastructure in Europe. It starts from an attempt to define Marine Research Infrastructure, as there is no single definition of Research Infrastructure (RI) or of Marine Research Infrastructure (MRI), and there are different ways to categorise them. The chapter gives a categorization of the MRI, together with detailed descriptions and examples of MRI (research platforms, marine data systems, research sites, and laboratories) with respect to the four MSFD descriptors relevant to the COMMON SENSE project;
    • two chapters on Research Programs and Infrastructure Networks, analysing the pan-European initiatives aimed at cooperation and efficient use of infrastructural resources for marine observation, monitoring, and data exchange. Detailed descriptions of observing sensors and systems are presented, as well as frameworks for cooperation;
    • information on platforms (research vessels) available to the project for testing developed sensors and systems. Platforms are available and operating in all three regions of interest to the project (Mediterranean, North Sea, Baltic);
    • an annexed detailed description of two world-wide observation networks and systems. These systems are excellent examples of the added value offered by integrated systems of ocean observation (from data to knowledge) and of how they work in practice.
    The report concludes that there is a shortage of new classes of sensors to fulfil emerging monitoring needs. The sensors proposed to be developed by the COMMON SENSE project shall answer the needs stemming from the introduction of the MSFD and GES descriptors.

    MDRED: Multi-Modal Multi-Task Distributed Recognition for Event Detection

    Title from PDF of title page (viewed September 28, 2018). Thesis advisor: Yugyung Lee. Vita. Includes bibliographical references (pages 63-67). Thesis (M.S.)--School of Computing and Engineering, University of Missouri--Kansas City, 2018.
    Understanding users' context is essential in emerging mobile sensing applications, such as Metal Detector, Glint Finder, and Facefirst. Over the last decade, Machine Learning (ML) techniques have evolved dramatically for real-world applications. Specifically, Deep Learning (DL) has attracted tremendous attention for diverse applications, including speech recognition and computer vision. However, ML requires extensive computing resources, so ML applications are not suitable for devices with limited computing capabilities. Furthermore, customizing ML applications for users' context is not easy. Such a situation presents real challenges for mobile-based ML applications. We are motivated to solve this problem by designing a distributed and collaborative computing framework for ML edge computing and applications. In this thesis, we propose the Multi-Modal Multi-Task Distributed Recognition for Event Detection (MDRED) framework for complex event recognition with images. The MDRED framework is based on a hybrid ML model composed of Deep Learning (DL) and Shallow Learning (SL). The lower level of the MDRED framework is based on DL models for (1) object detection, (2) color recognition, (3) emotion recognition, (4) face detection, and (5) text detection on event images. The higher level is based on SL-based fusion techniques for event detection using the outcomes from the lower-level DL models. The fusion model is designed as a weighted feature vector generated by a modified Term Frequency and Inverse Document Frequency (TF-IDF) algorithm, considering common and unique multi-modal features that are recognized for event detection.
    A prototype of the MDRED framework has been implemented: a master-slave architecture was designed to coordinate the distributed computing among multiple mobile devices at the edge while connecting the edge devices to the cloud ML servers. The MDRED model has been evaluated with benchmark event datasets and compared with state-of-the-art event detection models. The MDRED accuracies of 90.5%, 98.8%, and 78% on the SocEID, UIUC Sports, and RED Events datasets, respectively, outperformed the baseline models AlexNet-fc7, WEBLY-fc7, WIDER-fc7, and Event concepts. We also demonstrate the MDRED application running on Android devices for real-time event detection. Introduction -- Background and related work -- Proposed work -- Results and evaluation -- Conclusion and future work.
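The TF-IDF-weighted fusion idea can be sketched as follows. This is plain TF-IDF weighting over the labels emitted by the lower-level models; the label sets and the smoothing are illustrative assumptions, not the thesis's exact modified algorithm. Labels that occur across many event classes are down-weighted, while labels unique to a few classes carry more weight:

```python
import math
from collections import Counter

def tfidf_vector(labels, docs, vocab):
    """Build a weighted feature vector from multi-modal labels.

    labels: labels detected for one image (objects, colors, emotions, ...)
    docs:   per-event-class label collections, used to compute IDF
    vocab:  fixed label vocabulary defining the vector's dimensions
    """
    tf = Counter(labels)  # term frequency within this image's labels
    n = len(docs)
    vec = []
    for term in vocab:
        df = sum(1 for d in docs if term in d)      # document frequency
        idf = math.log((1 + n) / (1 + df)) + 1.0    # smoothed IDF
        vec.append(tf[term] * idf)
    return vec

docs = [["person", "ball"], ["person", "cake"], ["person"]]
vocab = ["person", "ball", "cake"]
vec = tfidf_vector(["person", "ball", "ball"], docs, vocab)
```

The resulting vector then feeds the shallow fusion classifier: "person", common to every class, contributes little, while the rarer "ball" label dominates.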

    Per-exemplar analysis with MFoM fusion learning for multimedia retrieval and recounting

    As a large volume of digital video data becomes available, along with revolutionary advances in multimedia technologies, demand for efficiently retrieving and recounting multimedia data has grown. However, the inherent complexity of representing and recognizing multimedia data, especially for large-scale and unconstrained consumer videos, poses significant challenges. In particular, the following challenges are major concerns in the proposed research. One challenge is that consumer-video data (e.g., videos on YouTube) are mostly unstructured; therefore, evidence for a targeted semantic category is often sparsely located across time. To address this issue, a segmental multi-way local feature pooling method using scene concept analysis is proposed. In particular, the proposed method utilizes scene concepts that are pre-constructed by clustering video segments into categories in an unsupervised manner. A video is then represented with multiple feature descriptors with respect to scene concepts. Finally, multiple kernels are constructed from the feature descriptors and combined into a final kernel that improves the discriminative power for multimedia event detection. Another challenge is that most semantic categories used for multimedia retrieval have inherent within-class diversity that can be dramatic and can raise the question as to whether conventional approaches are still successful and scalable. To handle such huge variability and further improve recounting capabilities, a per-exemplar learning scheme is proposed, with a focus on fusing multiple types of heterogeneous features for video retrieval. While the conventional approach for multimedia retrieval involves learning a single classifier per category, the proposed scheme learns multiple detection models, one for each training exemplar. In particular, a local distance function is defined as a linear combination of element distances measured by each feature. A weight vector of the local distance function is then learned discriminatively, taking only neighboring samples around an exemplar as training samples. In this way, the retrieval problem is redefined as an association problem, i.e., test samples are retrieved by association-based rules. In addition, the quality of a multimedia-retrieval system is often evaluated by domain-specific performance metrics that serve sophisticated user needs. To address such criteria, novel MFoM learning algorithms are proposed to explicitly optimize two challenging metrics: average precision (AP) and a weighted sum of the probabilities of false alarms and missed detections at a target error ratio. Most conventional learning schemes attempt to optimize their own learning criteria, as opposed to domain-specific performance measures. To address this discrepancy, the proposed learning scheme approximates the given performance measure, which is discrete and therefore difficult to optimize with conventional schemes, with a continuous and differentiable loss function that can be directly optimized. A generalized probabilistic descent (GPD) algorithm is then applied to optimize this loss function. Ph.D.
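The per-exemplar local distance function described above can be sketched in a few lines. The weights and distances here are invented for illustration; in the thesis the weight vector is learned discriminatively from the samples neighboring each exemplar:

```python
import numpy as np

def per_exemplar_distance(elem_dists, w):
    """Distance from a test sample to one exemplar: a linear combination
    of per-feature element distances (e.g. visual, motion, audio).

    elem_dists: distances between the test sample and the exemplar,
                one entry per feature type
    w:          this exemplar's learned weight vector
    """
    return float(np.dot(w, elem_dists))

# Hypothetical exemplar whose learned weights emphasise the motion channel.
w = np.array([0.2, 0.7, 0.1])
elem = np.array([0.5, 0.1, 0.9])    # per-feature distances to the exemplar
d = per_exemplar_distance(elem, w)  # 0.2*0.5 + 0.7*0.1 + 0.1*0.9 = 0.26
```

Retrieval then reduces to association: a test sample is ranked against each exemplar by its learned local distance, rather than by a single per-category classifier.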

    Intelligent Control of Home Appliances via Network

