16 research outputs found

    Inférence de la grammaire structurelle d’une émission TV récurrente à partir du contenu

    TV program structuring has emerged as a major theme over the last decade for high-quality indexing. In this thesis, we address the problem of unsupervised TV program structuring from the point of view of grammatical inference, i.e., discovering a common structural model shared by a collection of episodes of a recurrent program. Using grammatical inference makes it possible to rely on only minimal domain knowledge. In particular, apart from the recurrence itself, we assume no prior knowledge of the structural elements that might be present in a recurrent program and only very limited knowledge of the program type, e.g., to name structural elements. Under this assumption, we propose an unsupervised framework operating in two stages. The first stage determines the structural elements that are relevant to the structure of a program. We address this issue by exploiting the repetitiveness of elements in recurrent programs, using temporal density analysis to filter out irrelevant events and determine valid elements. Once the structural elements have been discovered, the second stage infers a grammar of the program. We explore two inference techniques, based either on multiple sequence alignment or on uniform resampling. A model of the structure is derived from the grammars and used to predict the structure of new episodes. Evaluations are performed on a selection of four different types of recurrent programs. For structural element determination, we analyze how the threshold applied to the density function and the size of the episode collection affect the number of structural elements determined. For structural grammar inference, we discuss the quality of the grammars obtained and show that they accurately reflect the structure of the program. We also demonstrate that the models obtained by grammatical inference can accurately predict the structure of unseen episodes, conducting a quantitative and comparative evaluation of the two methods by segmenting the new episodes into their structural components. Finally, considering the limitations of our work, we discuss a number of open issues in structure discovery and propose three new research directions for future work.
    Summary (translated from French): In this thesis, we address the problem of unsupervised TV program structuring from the point of view of grammatical inference, focusing on discovering the structure of recurrent programs from a homogeneous collection. We aim to discover the structural elements that are relevant to the structure of the program, and to infer a grammar of that structure. Experiments show that grammatical inference makes it possible to discover program structure using minimal prior domain knowledge.
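
The first stage described above (keeping only events that recur at similar times across episodes) can be illustrated with a rough sketch. This is not the thesis's actual density analysis: the function name `dense_time_bins`, the fixed-width binning, and the parameters `bin_seconds` and `threshold` are all assumptions made for illustration, and the timestamps are made-up toy data.

```python
from collections import Counter


def dense_time_bins(episodes, bin_seconds=10.0, threshold=0.6):
    """Return time bins in which an event occurs in at least `threshold` of the episodes.

    `episodes` is a list of lists of event start times, in seconds from the
    start of each episode. Binning and thresholding are illustrative choices.
    """
    counts = Counter()
    for times in episodes:
        # Count each bin at most once per episode, so the density is a
        # fraction of episodes rather than a raw event count.
        counts.update({int(t // bin_seconds) for t in times})
    n = len(episodes)
    return sorted(b for b, c in counts.items() if c / n >= threshold)


# Toy data (hypothetical): three episodes with events near 0-10 s and 30-40 s.
episodes = [
    [2.0, 31.5, 118.0],
    [3.5, 33.0, 75.0],
    [1.0, 38.9, 240.0],
]
dense = dense_time_bins(episodes)
print(dense)
```

Raising `threshold` keeps fewer bins, which mirrors the trade-off the abstract mentions between the density threshold and the number of structural elements found.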

    Using grammar induction to discover the structure of recurrent TV programs

    Video structuring, in particular when applied to TV programs with strong editing structures, mostly relies on supervised approaches, either to retrieve a known structure for which a model has been obtained or to detect key elements from which a known structure is inferred. In this paper, we propose an unsupervised approach to recurrent TV program structuring, exploiting the repetitiveness of key structural elements across episodes of the same show. We cast the problem of structure discovery as a grammatical inference problem and show that a suitable symbolic representation can be obtained by filtering generic events based on their recurrence. The method follows three steps: i) generic event detection, ii) selection of events relevant to the structure, and iii) grammatical inference from a symbolic representation. Experimental evaluation is performed on three types of shows, viz., game shows, news and magazines, demonstrating that grammatical inference can be used to discover the structure of recurrent programs with very limited supervision.
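
Steps ii) and iii) above can be sketched in a few lines. The paper's own inference technique is not given in this abstract, so the sketch swaps in a much simpler stand-in: it learns the set of adjacent symbol pairs seen in training sequences (a 2-testable language) and accepts a new sequence only if all its transitions were seen. The event labels and the names `select_recurrent`, `learn_transitions` and `accepts` are hypothetical.

```python
from collections import Counter

START, END = "^", "$"  # sentinel symbols marking episode boundaries


def select_recurrent(episodes, min_fraction=0.8):
    """Step ii: keep event labels present in at least `min_fraction` of episodes."""
    presence = Counter()
    for events in episodes:
        presence.update(set(events))
    return {e for e, c in presence.items() if c / len(episodes) >= min_fraction}


def learn_transitions(sequences):
    """Step iii (stand-in): collect every adjacent pair of symbols observed."""
    allowed = set()
    for seq in sequences:
        path = [START, *seq, END]
        allowed.update(zip(path, path[1:]))
    return allowed


def accepts(allowed, seq):
    """A sequence is accepted if all of its transitions were observed in training."""
    path = [START, *seq, END]
    return all(pair in allowed for pair in zip(path, path[1:]))


# Toy detected events per episode (hypothetical labels).
episodes = [
    ["jingle", "applause", "anchor", "report", "anchor", "jingle"],
    ["jingle", "anchor", "report", "silence", "anchor", "jingle"],
    ["jingle", "anchor", "report", "anchor", "report", "anchor", "jingle"],
]
keep = select_recurrent(episodes)
symbolic = [[e for e in ep if e in keep] for ep in episodes]
grammar = learn_transitions(symbolic)
print(accepts(grammar, ["jingle", "anchor", "jingle"]))
```

The non-recurrent events (`applause`, `silence`) are dropped before inference, so they do not pollute the learned transitions.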

    Highly efficient low-level feature extraction for video representation and retrieval.

    PhD thesis. Witnessing the omnipresence of digital video media, the research community has raised the question of its meaningful use and management. Stored in immense multimedia databases, digital videos need to be retrieved and structured in an intelligent way, relying on the content and the rich semantics involved. Current Content Based Video Indexing and Retrieval systems face the problem of the semantic gap between the simplicity of the available visual features and the richness of user semantics. This work focuses on the issues of efficiency and scalability in video indexing and retrieval to facilitate a video representation model capable of semantic annotation. A highly efficient algorithm for temporal analysis and key-frame extraction is developed. It is based on the prediction information extracted directly from the compressed domain features and the robust scalable analysis in the temporal domain. Furthermore, a hierarchical quantisation of the colour features in the descriptor space is presented. Derived from the extracted set of low-level features, a video representation model that enables semantic annotation and contextual genre classification is designed. Results demonstrate the efficiency and robustness of the temporal analysis algorithm, which runs in real time while maintaining high precision and recall in the detection task. Adaptive key-frame extraction and summarisation achieve a good overview of the visual content, while the colour quantisation algorithm efficiently creates a hierarchical set of descriptors. Finally, the video representation model, supported by the genre classification algorithm, achieves excellent results in an automatic annotation system by linking the video clips with a limited lexicon of related keywords.

    Video Categorization Using Data Mining

    Video categorization using data mining is a research area that aims to propose a method, based on an Artificial Neural Network (ANN), that can classify video files into different categories according to their content. To test this method, the classification of video files is discussed. The applied system categorizes videos into two classes: educational and non-educational. The classification is based on motion, computed using optical flow. Several experiments were conducted using the ANN model. The research facilitates access to the required educational videos for learners, especially novice students. The objective of this research is to investigate how useful the motion feature can be in such classification. We believe that other features, such as audio and text features, can enhance accuracy, but this requires wider studies and more time. The accuracy of classifying videos as educational or non-educational, using 3-fold cross-validation and the ANN model, is 54%. This result may be improved by introducing the other factors mentioned above.
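
The 3-fold cross-validation mentioned above can be sketched as follows. This shows only the generic splitting scheme, not the authors' experimental code: the helper name `k_fold_indices` is hypothetical, and no shuffling, optical-flow features or ANN training are included.

```python
def k_fold_indices(n_samples, k=3):
    """Yield (train, test) index lists for k-fold cross-validation.

    Samples are split into k contiguous folds whose sizes differ by at most one.
    Each fold serves as the test set once; the remaining folds form the training set.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


# Ten hypothetical video clips, split into three folds.
folds = list(k_fold_indices(10, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

The reported accuracy would then be the mean of the per-fold test accuracies. In practice the samples are usually shuffled (or stratified by class) before splitting.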

    Content-based discovery of multiple structures from episodes of recurrent TV programs based on grammatical inference

    TV program structuring is essential for program indexing and retrieval. In practice, different types of programs lead to a diversity of program structures. In addition, several episodes of a recurrent program might exhibit different structures. Previous work mostly relies on supervised approaches that adopt prior knowledge about program structures. In this paper, we address the problem of unsupervised program structuring with minimal prior knowledge about the programs. We propose an approach to identify multiple structures and infer structural grammars for recurrent TV programs of different types. It involves three sub-problems: i) we determine the structural elements contained in programs with minimal knowledge about which types of elements may be present; ii) we identify multiple structures for the programs, if any, and model the structures of programs; iii) we generate the structural grammar for each corresponding structure. Finally, we conduct use cases on real recurrent programs of three different types to demonstrate the effectiveness of the proposed approach.

    A scalable approach to video summarization and adaptation

    Unpublished doctoral thesis. Universidad Autónoma de Madrid, Escuela Politécnica Superior, October 201

    Deliverable D1.6 Intelligent hypervideo analysis evaluation, final results

    This deliverable describes the evaluation activities conducted to assess the performance of a number of developed methods for intelligent hypervideo analysis and the usability of the implemented Editor Tool for supporting video annotation and enrichment. Building on the performance evaluations reported in D1.4 for a set of LinkedTV analysis components, we extended our experiments to assess the effectiveness of newer versions of these methods as well as of entirely new techniques, in terms of the accuracy and time efficiency of the analysis. For this purpose, we conducted in-house experiments and participated in international benchmarking activities, and the outcomes are reported in this deliverable. Moreover, we present the results of user trials of the developed Editor Tool, in which groups of experts assessed its usability and supported functionalities, and evaluated the usefulness and accuracy of the implemented video segmentation approaches based on the analysis requirements of the LinkedTV scenarios. With this deliverable we complete the reporting of WP1 evaluations, which aimed to assess the efficiency of the multimedia analysis methods developed throughout the project, according to the analysis requirements of the LinkedTV scenarios.

    Event Based Retrieval From Digital Libraries Containing Data Streams

    The objective of this research is to study the issues involved in building a digital library that contains data streams and allows event-based retrieval. “Digital Libraries are storehouses of information available through the Internet that provide ways to collect, store, and organize data and make it accessible for search, retrieval, and processing” [29]. Data streams are sources of information for applications such as news-on-demand, weather services, and scientific research, to name a few. A data stream is a sequence of data units produced over a period of time. Examples of data streams are video streams, audio streams, and sensor readings. Saving data streams in digital libraries is advantageous because of the services provided by digital libraries, such as archiving, preservation, administration, and access control. Events are noteworthy occurrences that happen during data streams. Events are easier to remember than the specific time instants at which they occur; hence using them for retrieval is better aligned with human behavior and can be more efficient, since it allows direct access instead of scanning. The focus of this research is not only on storing data streams in a digital library and using event-based retrieval, but also on relating streams and playing them back at the same time, possibly in a synchronized manner, to facilitate better understanding in research or other working situations. Our approach starts by considering digital libraries for the stock market, news streams, census bureau statistics, weather, sports games, and the educational environment. For each of these applications, we form categories of possible users and the basic requirements of each. As a result, we identify a list of design goals that we take into consideration in developing the architecture of the library. To illustrate and validate our approach, we implement a medical digital library containing actual Computed Tomography (CT) scan streams. It also contains sample medical text and audio streams to show the heterogeneity of the library. Streams are displayed in a concise, yet complete, way that makes it easy for users to decide whether or not to play back a stream and to set playback options. The playback interface itself is organized in a way that accommodates synchronous and asynchronous streams and enables users to control the playback of these streams. We study the performance of the specialized search and retrieval processes in comparison to traditional search and retrieval processes. We conclude with a discussion of how to adapt the library to additional stream types and suggest other future efforts in this area.
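
The claim that event-based retrieval allows direct access instead of scanning can be illustrated with a small index. This is only a sketch under assumptions: the class `EventIndex`, its methods, and the event labels are hypothetical and do not describe the library's actual architecture.

```python
import bisect


class EventIndex:
    """Maps event labels to sorted playback offsets (seconds) within one stream."""

    def __init__(self):
        self._events = {}  # label -> sorted list of offsets

    def add(self, label, offset):
        # insort keeps each per-label list sorted as events are registered.
        bisect.insort(self._events.setdefault(label, []), offset)

    def offsets(self, label):
        """All offsets at which `label` occurs, in increasing order."""
        return list(self._events.get(label, []))

    def first_at_or_after(self, label, offset):
        """Earliest occurrence of `label` at or after `offset`, or None."""
        times = self._events.get(label, [])
        i = bisect.bisect_left(times, offset)
        return times[i] if i < len(times) else None


# Hypothetical events annotated on a scan stream.
index = EventIndex()
index.add("slice_marked", 95.0)
index.add("slice_marked", 12.5)
index.add("annotation_added", 40.0)

# Jump straight to the next marked slice after 30 s, without scanning the stream.
print(index.first_at_or_after("slice_marked", 30.0))
```

Looking up a label is a dictionary access followed by a binary search, so the player can seek to an event without reading the stream from the beginning.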

    Deliverable D1.1 State of the art and requirements analysis for hypervideo

    This deliverable presents a state-of-the-art and requirements analysis report for hypervideo, authored as part of WP1 of the LinkedTV project. Initially, we present some use-case (viewer) scenarios in the LinkedTV project, and through the analysis of the distinctive needs and demands of each scenario we point out the technical requirements from a user-side perspective. Subsequently, we study methods for the automatic and semi-automatic decomposition of the audiovisual content in order to effectively support the annotation process. Considering that the multimedia content comprises different types of information, i.e., visual, textual and audio, we report various methods for the analysis of these three different streams. Finally, we present various annotation tools which could integrate the developed analysis results so as to effectively support users (video producers) in the semi-automatic linking of hypervideo content, and based on them we report on the initial progress in building the LinkedTV annotation tool. For each of the different classes of techniques discussed in the deliverable, we present the evaluation results from the application of one such method from the literature to a dataset well suited to the needs of the LinkedTV project, and we indicate the future technical requirements that should be addressed in order to achieve higher levels of performance (e.g., in terms of accuracy and time efficiency), as necessary.

    Indexation sémantique des images et des vidéos par apprentissage actif

    The general framework of this thesis is semantic indexing and information retrieval, applied to multimedia documents. More specifically, we are interested in the semantic indexing of concepts in images and videos using active learning approaches, which we use to build annotated corpora. Throughout this thesis, we have shown that the main difficulties of this task are often related, in general, to the semantic gap. Furthermore, they are related to the class-imbalance problem in large-scale datasets, where concepts are mostly sparse. For corpus annotation, the main objective of using active learning is to increase system performance while using as few labeled samples as possible, thereby minimizing the cost of labeling data (e.g., money and time). In this thesis, we have contributed at several levels of multimedia indexing and proposed three approaches that outperform state-of-the-art systems: i) the multi-learner approach (ML), which overcomes the class-imbalance problem in large-scale datasets; ii) a re-ranking method that improves video indexing; iii) an evaluation of power-law normalization and PCA, showing their effectiveness in multimedia indexing. Furthermore, we have proposed the ALML approach, which combines the multi-learner with active learning, and an incremental method that speeds up the ALML approach. Moreover, we have proposed the active cleaning approach, which tackles the quality of annotations. The proposed methods were validated through several experiments, which were conducted and evaluated on large-scale collections of the well-known international benchmark TRECVID. Finally, we have presented our real-world annotation system based on active learning, which was used to carry out the annotation of the development set of the TRECVID 2011 campaign, and we have presented our participation in the semantic indexing task of that campaign, in which we ranked 3rd out of 19 participants.
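
The idea of reaching good performance with as few labeled samples as possible can be illustrated with uncertainty sampling, a common active learning selection strategy. The abstract does not state which strategy the thesis uses, so this is an assumed stand-in; the function name `select_for_annotation`, the shot identifiers and the scores are made up for illustration.

```python
def select_for_annotation(scores, budget):
    """Pick the `budget` unlabeled samples the classifier is least sure about.

    `scores` maps a sample id to the classifier's predicted probability that
    the concept is present. Samples closest to 0.5 are considered most uncertain.
    """
    ranked = sorted(scores, key=lambda sample_id: abs(scores[sample_id] - 0.5))
    return ranked[:budget]


# Hypothetical classifier scores for four unlabeled video shots.
scores = {"shot_a": 0.97, "shot_b": 0.52, "shot_c": 0.10, "shot_d": 0.45}
picked = select_for_annotation(scores, budget=2)
print(picked)
```

In an active learning loop, the picked samples would be sent to annotators, added to the training set, and the classifier retrained before the next selection round.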