
    Soccer event detection via collaborative multimodal feature analysis and candidate ranking

    This paper presents a framework for soccer event detection through collaborative analysis of the textual, visual, and aural modalities. The basic notion is to decompose a match video into smaller segments until the desired eventful segment is identified. Simple features are considered, namely minute-by-minute reports from sports websites (text), the semantic shot classes of far and close-up views (visual), and the low-level features of pitch and log-energy (audio). The framework demonstrates that, despite relying on these simple features and avoiding the use of labeled training examples, event detection can be achieved with very high accuracy. Experiments conducted on approximately 30 hours of soccer video show very promising results for the detection of goals, penalties, yellow cards, and red cards.
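    As a rough illustration of the coarse-to-fine notion described above, the sketch below (Python) first restricts the search space to the minute named by a textual report and then ranks one-second segments inside it by log-energy and close-up shot evidence. All function names, weights, and the synthetic two-hour signal are assumptions for illustration, not the paper's implementation, and the paper's iterative decomposition is collapsed here to two levels.

        # Hypothetical coarse-to-fine localization; only the general idea,
        # not the concrete features or weights, is taken from the paper.
        import numpy as np

        def localize_event(log_energy, is_closeup, fps=25, report_minute=0):
            """Return the most eventful second inside the minute named by a
            textual match report.

            log_energy : per-frame log-energy of the audio track (1-D array)
            is_closeup : per-frame shot class, True for close-up views,
                         False for far views
            """
            # Coarse step: restrict the search space to the reported minute.
            start = report_minute * 60 * fps
            energy = log_energy[start:start + 60 * fps]
            closeup = is_closeup[start:start + 60 * fps].astype(float)

            # Fine step: score each one-second segment; loud crowd audio
            # together with close-up shots is a typical event signature.
            scores = np.empty(len(energy) // fps)
            for s in range(len(scores)):
                seg = slice(s * fps, (s + 1) * fps)
                scores[s] = energy[seg].mean() + 0.5 * closeup[seg].mean()

            return report_minute * 60 + int(scores.argmax())  # seconds

        # Toy usage on a synthetic two-hour match at 25 fps:
        rng = np.random.default_rng(0)
        n = 120 * 60 * 25
        print(localize_event(rng.normal(size=n), rng.random(n) < 0.3,
                             report_minute=63))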

    Automatic Summarization of Soccer Highlights Using Audio-visual Descriptors

    Automatic summary generation for sports video content has been an object of great interest for many years. Although semantic description techniques have been proposed, many approaches still rely on low-level video descriptors that yield quite limited results, owing to the complexity of the problem and the limited capability of such descriptors to represent semantic content. In this paper, a new approach for automatic highlight summary generation for soccer videos using audio-visual descriptors is presented. The approach is based on segmenting the video sequence into shots that are then analyzed to determine their relevance and interest. Of special note is the use of audio information, which adds robustness to the overall performance of the summarization system. For every video shot, a set of low- and mid-level audio-visual descriptors is computed and subsequently combined to obtain different relevance measures based on empirical knowledge rules. The final summary is generated by selecting the shots with the highest interest according to the user's specifications and the results of the relevance measures. A variety of results obtained on real soccer video sequences are presented that prove the validity of the approach.
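    A hedged sketch of the selection step just described: each shot receives a relevance score from a rule-based combination of audio-visual descriptors, and the summary keeps the highest-scoring shots until a user-specified duration is filled. The descriptor fields and weights below are illustrative assumptions, not the paper's actual empirical rules.

        from dataclasses import dataclass

        @dataclass
        class Shot:
            start: float             # seconds
            end: float               # seconds
            audio_excitement: float  # mid-level audio descriptor in [0, 1]
            closeup_ratio: float     # fraction of close-up frames
            motion_activity: float   # low-level motion descriptor in [0, 1]

        def relevance(shot: Shot) -> float:
            # Illustrative empirical rule: exciting audio matters most,
            # close-ups and strong motion add supporting evidence.
            return (0.6 * shot.audio_excitement
                    + 0.25 * shot.closeup_ratio
                    + 0.15 * shot.motion_activity)

        def summarize(shots: list[Shot], target_seconds: float) -> list[Shot]:
            picked, used = [], 0.0
            for shot in sorted(shots, key=relevance, reverse=True):
                length = shot.end - shot.start
                if used + length <= target_seconds:
                    picked.append(shot)
                    used += length
            return sorted(picked, key=lambda s: s.start)  # timeline order

        shots = [Shot(0, 8, 0.9, 0.7, 0.5), Shot(8, 30, 0.2, 0.1, 0.3),
                 Shot(30, 41, 0.8, 0.6, 0.9)]
        print([(s.start, s.end) for s in summarize(shots, 20)])  # [(0, 8), (30, 41)]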

    Video analysis for replay detection in sport events

    The post-production of a sport event video requires a large investment of resources and time to locate the highlight moments that will be used, for instance, to create a summary of the event. This process can be optimized and made more efficient. During the event, the most important moments are replayed to show the audience the outstanding scene several times and from different points of view. The objective of this project is to automatically find the replays in live or pre-recorded broadcasts, thereby accelerating the post-production process. The results will form part of the project CENIT-E BUSCAMEDIA CEN20091026, developed in the studios of Televisió de Catalunya (TVC), which focuses on automated generation through content analysis. A software system has been developed to detect replays in different kinds of sport events, principally soccer. It implements several operation modes, detailed throughout this report, ranging from a mostly manual mode to a fully automatic one, and the success rates obtained after testing them on videos from the TVC database are presented.
    The work is divided into five major sections. The first chapter introduces the context in which the project is situated, proposes the objectives to be achieved, and discusses the data and tools used in its development. Subsequently, the state of the art is presented, with a collection of methods used for replay detection that form the foundations of our methodology. The third chapter is the longest and most complex: it contains the entire process of experimentation and improvements, from inception to the implemented system. The following section covers the technical aspects and presents the implemented algorithm as a block diagram, detailing all the operation modes. Finally, the last chapter contains the results and conclusions obtained after applying the algorithm to a set of videos taken from the TVC database, as well as its application to other areas such as Formula 1 videos.
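    One strategy commonly found in the replay-detection state of the art that such a system can build on is logo-transition detection: broadcasters bracket replays with a logo animation, so frames close to a known logo template mark the replay boundaries. The sketch below illustrates that idea with plain color-histogram matching; the helper names, threshold, and grouping of hits into segments are assumptions, not TVC's actual pipeline.

        import numpy as np

        def color_histogram(frame, bins=16):
            """Normalized joint histogram over the three color channels of
            an H x W x 3 frame."""
            hist, _ = np.histogramdd(frame.reshape(-1, 3),
                                     bins=(bins,) * 3,
                                     range=((0, 256),) * 3)
            return hist.ravel() / hist.sum()

        def find_logo_frames(frames, logo_template, threshold=0.15):
            """Indices of frames whose histogram is close to the template."""
            ref = color_histogram(logo_template)
            return [i for i, frame in enumerate(frames)
                    if 0.5 * np.abs(color_histogram(frame) - ref).sum() < threshold]

        def replay_segments(logo_hits, max_gap=5):
            """Pair logo transitions into (start, end) replay spans."""
            # Group near-adjacent hits into single logo transitions...
            transitions = []
            for i in logo_hits:
                if transitions and i - transitions[-1][-1] <= max_gap:
                    transitions[-1].append(i)
                else:
                    transitions.append([i])
            # ...then a replay runs from the end of one logo animation to
            # the start of the next.
            return [(a[-1], b[0])
                    for a, b in zip(transitions[::2], transitions[1::2])]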

    Goal event detection in soccer videos via collaborative multimodal analysis

    Detecting semantic events in sports video is crucial for video indexing and retrieval. Most existing works have relied exclusively on video content features, namely data directly available and extractable from the visual and/or aural channels. Sole reliance on such data, however, can be problematic due to the high-level semantic nature of video and the difficulty of properly aligning detected events with their exact times of occurrence. This paper proposes a framework for soccer goal event detection through collaborative analysis of multimodal features. Unlike previous approaches, the visual and aural contents are not scrutinized directly. Instead, an external textual source (i.e., minute-by-minute reports from sports websites) is used to initially localize the event search space. This step is vital, as the event search space can be significantly reduced. It also makes further visual and aural analysis more efficient, since excessive and unnecessary non-eventful segments are discarded, culminating in the accurate identification of the actual goal event segment. Experiments conducted on thirteen soccer matches are very promising, with high accuracy rates reported.
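    The text-driven localization step might look like the following sketch, in which minute-by-minute web report lines are scanned for goal mentions and each reported minute becomes a narrow search window for the subsequent audio-visual analysis. The report format, keyword list, and padding below are hypothetical.

        import re

        GOAL_WORDS = re.compile(r"\b(goal|scores|converts)\b", re.IGNORECASE)
        MINUTE = re.compile(r"^(\d{1,3})(?:\+(\d))?'")  # e.g. "45+2'"

        def goal_windows(report_lines, pad_seconds=30):
            """Map each goal-bearing report line to a (start, end) window in
            seconds, padded to absorb reporting imprecision."""
            windows = []
            for line in report_lines:
                m = MINUTE.match(line.strip())
                if m and GOAL_WORDS.search(line):
                    minute = int(m.group(1)) + int(m.group(2) or 0)
                    center = minute * 60
                    windows.append((max(0, center - pad_seconds),
                                    center + pad_seconds))
            return windows

        report = [
            "23' GOAL! Rivera converts from the penalty spot.",
            "51' Yellow card for the late challenge.",
            "90+2' Goal! A header from the corner seals it.",
        ]
        print(goal_windows(report))  # [(1350, 1410), (5490, 5550)]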

    Object Recognition in Videos Utilizing Hierarchical and Temporal Objectness with Deep Neural Networks

    This dissertation develops a novel system for object recognition in videos. The input of the system is a set of unconstrained videos containing a known set of objects; the output is the location and category of each object in each frame across all videos. Initially, a shot boundary detection algorithm is applied to divide the videos into multiple sequences separated by the identified shot boundaries. Since each of these sequences still contains moderate content variation, we further use a cost-optimization-based key frame extraction method to select key frames in each sequence and use them to divide the videos into shorter sub-sequences with little content variation. Next, we learn object proposals on the first frame of each sub-sequence. Building upon state-of-the-art object detection algorithms, we develop a tree-based hierarchical model to improve object detection. Using the learned object proposals as the initial object positions in the first frame of each sub-sequence, we apply the SPOT tracker to track the proposals and re-rank them using the proposed temporal objectness, obtaining object proposal tubes by removing unlikely objects. Finally, we employ a deep Convolutional Neural Network (CNN) to classify these tubes. Experiments show that the proposed system significantly improves the object detection rate of the learned proposals compared with state-of-the-art object detectors. Owing to this improvement in object detection, the proposed system also achieves higher mean average precision at the proposal classification stage than the state-of-the-art methods.
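    A hedged sketch of the tube re-ranking idea: tracked proposals form tubes, and a temporal objectness score, here the mean detector confidence damped by how consistently the box moves between frames, filters unlikely objects before CNN classification. The scoring rule is illustrative; the dissertation's exact formulation may differ.

        import numpy as np

        def iou(a, b):
            """Intersection over union of two (x1, y1, x2, y2) boxes."""
            x1, y1 = max(a[0], b[0]), max(a[1], b[1])
            x2, y2 = min(a[2], b[2]), min(a[3], b[3])
            inter = max(0, x2 - x1) * max(0, y2 - y1)

            def area(r):
                return (r[2] - r[0]) * (r[3] - r[1])

            return inter / (area(a) + area(b) - inter + 1e-9)

        def temporal_objectness(boxes, scores):
            """boxes: per-frame boxes of one tube (at least two frames);
            scores: per-frame detector confidences for the same tube."""
            consistency = np.mean([iou(a, b) for a, b in zip(boxes, boxes[1:])])
            return float(np.mean(scores) * consistency)

        def rerank_tubes(tubes, keep=0.5):
            """tubes: list of (boxes, scores) pairs; return the tubes whose
            temporal objectness exceeds `keep`, best first."""
            scored = sorted(((temporal_objectness(b, s), (b, s))
                             for b, s in tubes),
                            key=lambda t: t[0], reverse=True)
            return [tube for score, tube in scored if score > keep]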

    Audiovisual framework for automatic soccer highlights generation

    Extracting low-level and mid-level descriptors from a soccer match to generate a summary of soccer highlights. Automatic generation of sports highlights from recorded audiovisual content has been an object of great interest in recent years. The problem is especially important in the production of highlights videos for the second and third divisions, where the quantity of raw material is significant and carries no manual annotations. In this thesis, a new approach for the automatic generation of soccer highlights is proposed. The approach is based on segmenting the video sequence into shots that are then analyzed to determine their relevance and interest. For every video shot, a set of low- and mid-level audio-visual descriptors is computed and combined to obtain different relevance measures based on empirical knowledge rules. The final summary is generated by selecting the shots with the highest interest according to the user's specifications and the results of the relevance measures. The main novelties of this work are the temporal combination of two shot boundary detectors; the selection of keyframes using motion and color features; the generation of new mid-level soccer audio descriptors; the robust detection of soccer players; the use of a novel object detection technique to spot goal posts; and the creation of a flexible and user-friendly highlight generation framework. The thesis is mainly devoted to the description of the global visual segmentation module, the selection of audio-visual descriptors, and the general scheme for evaluating the relevance measures. Several results produced on real soccer video sequences prove the validity of the proposed framework.
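    The "temporal combination of two shot boundary detectors" could be fused along these lines: each detector proposes boundary frames independently, and a candidate is confirmed when the other detector agrees within a small frame tolerance. Detector internals are stubbed out, and the fusion rule shown is an assumption rather than the thesis' exact method.

        def fuse_boundaries(det_a, det_b, tolerance=3):
            """det_a, det_b: sorted frame indices proposed by each detector.
            Return boundaries confirmed by both within `tolerance` frames."""
            confirmed, j = [], 0
            for a in det_a:
                # Advance the second detector's pointer to the nearest candidate.
                while j < len(det_b) and det_b[j] < a - tolerance:
                    j += 1
                if j < len(det_b) and abs(det_b[j] - a) <= tolerance:
                    confirmed.append((a + det_b[j]) // 2)  # average the estimates
            return confirmed

        print(fuse_boundaries([120, 340, 901], [118, 515, 903]))  # [119, 902]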