
    Semantic Indexing of Sport Program Sequences by Audio-Visual Analysis

    Semantic indexing of sports videos is a subject of great interest to researchers working on multimedia content characterization. Sports programs appeal to large audiences, and their efficient distribution over various networks should contribute to widespread usage of multimedia services. In this paper, we propose a semantic indexing algorithm for soccer programs which uses both audio and visual information for content characterization. The video signal is processed first by extracting low-level visual descriptors from the MPEG compressed bit-stream. The temporal evolution of these descriptors during a semantic event is assumed to be governed by a controlled Markov chain. This makes it possible to determine, based on the maximum likelihood criterion, a list of those video segments where a semantic event of interest is likely to be found. The audio information is then used to refine the results of the video classification procedure by ranking the candidate video segments so that the segments associated with the event of interest appear in the very first positions of the ordered list. The proposed method is applied to goal detection. Experimental results show the effectiveness of the proposed cross-modal approach.
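    As an editor-added illustration of the segment-scoring and re-ranking idea sketched in this abstract: the code below scores each candidate segment by the log-likelihood ratio of its quantized descriptor sequence under an event Markov chain versus a background chain, then re-ranks surviving candidates by an audio score. A plain first-order Markov chain stands in for the paper's controlled Markov chain, and the transition matrices, symbol quantization, and audio score are illustrative assumptions, not the authors' actual models.

```python
import numpy as np

# Hypothetical inputs: each video segment is a sequence of quantized low-level
# descriptor symbols (e.g. motion/colour codes extracted from the MPEG stream).
# A_e / A_b are row-stochastic transition matrices of an "event" model and a
# background model; pi_e / pi_b are the corresponding initial distributions.

def sequence_log_likelihood(symbols, pi, A):
    """Log-likelihood of a symbol sequence under a first-order Markov chain."""
    ll = np.log(pi[symbols[0]])
    for prev, cur in zip(symbols[:-1], symbols[1:]):
        ll += np.log(A[prev, cur])
    return ll

def candidate_segments(segments, pi_e, A_e, pi_b, A_b, margin=0.0):
    """Keep segments whose event likelihood exceeds the background likelihood."""
    scored = []
    for seg_id, symbols in segments:
        score = (sequence_log_likelihood(symbols, pi_e, A_e)
                 - sequence_log_likelihood(symbols, pi_b, A_b))
        if score > margin:
            scored.append((seg_id, score))
    return scored

def rank_by_audio(candidates, audio_score):
    """Re-rank candidates so segments with strong crowd response come first.
    audio_score maps a segment id to e.g. crowd-cheer loudness or energy."""
    return sorted(candidates, key=lambda c: audio_score[c[0]], reverse=True)
```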

    Using high resolution optical imagery to detect earthquake-induced liquefaction: the 2011 Christchurch earthquake

    Using automated supervised methods with satellite and aerial imagery for liquefaction mapping is a promising step toward providing detailed, region-scale maps of liquefaction extent immediately after an earthquake. The accuracy of these methods depends on the quantity and quality of training samples and on the number of available spectral bands. Digitizing a large number of high-quality training samples from an event may not be feasible in the timeframe required for rapid response, as the training pixels for each class should be typical and accurately represent the spectral diversity of that specific class. To perform automated classification for liquefaction detection, we need to understand how to build an optimal and accurate training dataset. Using multispectral optical imagery from the 22 February 2011 Christchurch earthquake, we investigate the effects of the number of high-quality training pixels and the number of spectral bands on the performance of a pixel-based parametric supervised maximum likelihood classifier for liquefaction detection. We find that liquefaction surface effects are bimodal in terms of spectral signature and should therefore be classified as either wet liquefaction or dry liquefaction, owing to the difference in water content between the two modes. Using 5-fold cross-validation, we evaluate the performance of the classifier on training datasets of 50, 100, 500, 2000, and 4000 pixels. We also investigate the effect of adding spectral information, first by adding only the near-infrared (NIR) band to the visible red, green, and blue (RGB) bands, and then by using all eight available spectral bands of the WorldView-2 satellite imagery. We find that the classifier achieves high accuracies (75%–95%) with the 2000-pixel dataset and the RGB+NIR bands, so increasing to the 4000-pixel dataset and/or eight spectral bands may not be worth the required time and cost. We also evaluate the classifier on aerial imagery with the same number of training pixels and either RGB or RGB+NIR bands, and find that its accuracies are higher when using satellite imagery with the same number of training pixels and the same spectral information. The classifier identifies dry liquefaction with higher user accuracy than wet liquefaction across all evaluated scenarios. To improve classification performance for wet liquefaction, we also investigate adding geospatial information in the form of building footprints. We find that using a building footprint mask to remove buildings from the classification process increases wet liquefaction user accuracy by roughly 10%.
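    A minimal editor-added sketch of the kind of pipeline this abstract describes, assuming NumPy and scikit-learn are available and using a Gaussian quadratic discriminant as a stand-in for the parametric maximum likelihood classifier; the arrays, band layout, labels, and building mask are hypothetical placeholders rather than the study's data.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Hypothetical training data: X holds per-pixel reflectance in the R, G, B and
# NIR bands; y labels each pixel as dry liquefaction, wet liquefaction, or a
# non-liquefaction class.
rng = np.random.default_rng(0)
n_pixels = 2000                        # one of the training-set sizes examined
X = rng.random((n_pixels, 4))          # columns: R, G, B, NIR
y = rng.integers(0, 3, size=n_pixels)  # 0 = dry liq., 1 = wet liq., 2 = other

# Gaussian quadratic discriminant ~ per-class Gaussian maximum likelihood rule.
clf = QuadraticDiscriminantAnalysis()
scores = cross_val_score(clf, X, y, cv=5)   # 5-fold cross-validation
print("mean accuracy: %.3f" % scores.mean())

# Removing building footprints with a mask before classification keeps roof
# pixels from being confused with wet liquefaction; 'building_mask' is a
# hypothetical boolean array aligned with the scene pixels.
scene_pixels = rng.random((10_000, 4))
building_mask = rng.random(10_000) < 0.1
clf.fit(X, y)
labels = np.full(len(scene_pixels), -1)
labels[~building_mask] = clf.predict(scene_pixels[~building_mask])
```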

    An audio-based sports video segmentation and event detection algorithm

    In this paper, we present an audio-based event detection algorithm shown to be effective when applied to soccer video. The main benefit of this approach is the ability to recognise patterns of high crowd response that correlate with key events. The soundtrack of a soccer sequence is first parameterised using Mel-frequency cepstral coefficients. It is then segmented into homogeneous components using a windowing algorithm with a decision process based on Bayesian model selection. This decision process eliminates the need to define a heuristic set of rules for segmentation. Each audio segment is then labelled using a series of hidden Markov model (HMM) classifiers, each representing one of six predefined semantic content classes found in soccer video. Exciting events are identified as those segments belonging to a crowd-cheering class. Experimentation indicated that the algorithm was more effective at classifying crowd response than traditional model-based segmentation and classification techniques.
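    An editor-added sketch of the three stages this abstract describes (MFCC parameterisation, a BIC-style segmentation decision, and HMM labelling), assuming the librosa and hmmlearn packages are available; the window handling, penalty weight, and class models are illustrative choices, not the paper's exact implementation.

```python
import numpy as np
import librosa                      # assumed available for MFCC extraction
from hmmlearn import hmm            # assumed available for the HMM classifiers

def mfcc_frames(path, n_mfcc=13):
    """Parameterise a soundtrack as a (frames x coefficients) MFCC matrix."""
    audio, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc).T

def delta_bic(window):
    """Delta-BIC test: is the window better modelled by two Gaussians split at
    its midpoint than by one Gaussian? Positive values favour a boundary."""
    n, d = window.shape
    mid = n // 2
    def logdet(x):
        return np.linalg.slogdet(np.cov(x, rowvar=False) + 1e-6 * np.eye(d))[1]
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)   # lambda = 1
    return 0.5 * (n * logdet(window)
                  - mid * logdet(window[:mid])
                  - (n - mid) * logdet(window[mid:])) - penalty

def classify_segment(segment, class_models):
    """Label a segment with the class of the best-scoring HMM (e.g. 'cheer')."""
    return max(class_models, key=lambda name: class_models[name].score(segment))

# Training one GaussianHMM per semantic class (crowd cheering, speech, ...):
# model = hmm.GaussianHMM(n_components=4, covariance_type="diag").fit(features)
```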

    Integrating dynamic stopping, transfer learning and language models in an adaptive zero-training ERP speller

    Objective. Most BCIs have to undergo a calibration session in which data are recorded to train decoders with machine learning. Only recently have zero-training methods become a subject of study. This work proposes a probabilistic framework for BCI applications which exploit event-related potentials (ERPs). For the example of a visual P300 speller, we show how the framework exploits the structure of the decoding task through (a) transfer learning, (b) unsupervised adaptation, (c) a language model and (d) dynamic stopping. Approach. A simulation study compares the proposed probabilistic zero-training framework (using transfer learning and task structure) to a state-of-the-art supervised model on n = 22 subjects. The individual influences of the components (a)–(d) are investigated. Main results. Without any need for a calibration session, the probabilistic zero-training framework with inter-subject transfer learning shows excellent performance, competitive with a state-of-the-art supervised method using calibration. Its decoding quality is carried mainly by the effect of transfer learning in combination with continuous unsupervised adaptation. Significance. A high-performing zero-training BCI is within reach for one of the most popular BCI paradigms: ERP spelling. Recording calibration data for a supervised BCI would require valuable time which is lost for spelling; the time spent on calibration would allow a novel user to spell 29 symbols with our unsupervised approach. It could be of use for various clinical and non-clinical ERP applications of BCI.
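    An editor-added sketch of how components (c) and (d) from this abstract can interact in a per-symbol decoding loop: a language-model prior is updated with per-repetition classifier evidence, and spelling stops early once one candidate symbol is sufficiently certain. The evidence stream, threshold, and symbol set are hypothetical; the actual framework integrates these steps with transfer learning and unsupervised adaptation.

```python
import numpy as np

def spell_symbol(lm_prior, evidence_stream, threshold=0.95, max_rounds=10):
    """Return (chosen symbol index, its posterior probability, rounds used).
    lm_prior: language-model prior over candidate symbols.
    evidence_stream: yields per-round likelihoods P(EEG | symbol), here a
    hypothetical stand-in for the adapted ERP decoder's output."""
    log_post = np.log(np.asarray(lm_prior, dtype=float))
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    rounds = 0
    for likelihood in evidence_stream:
        rounds += 1
        log_post += np.log(np.asarray(likelihood, dtype=float) + 1e-12)
        post = np.exp(log_post - log_post.max())   # renormalised posterior
        post /= post.sum()
        if post.max() >= threshold or rounds >= max_rounds:
            break                                  # dynamic stopping
    return int(post.argmax()), float(post.max()), rounds

# Example: a 36-symbol matrix speller with a uniform prior and three
# simulated stimulation rounds of made-up classifier likelihoods.
rng = np.random.default_rng(1)
rounds_of_evidence = (rng.random(36) for _ in range(3))
symbol, confidence, used = spell_symbol(np.full(36, 1 / 36), rounds_of_evidence)
```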