66 research outputs found

    VITALAS at TRECVID-2008

    In this paper, we present our experiments for the High-Level Feature Extraction task of TRECVID 2008. This is our first year of participation in TRECVID, and our system adopts several popular approaches proposed earlier by other groups. We propose two advanced low-level features: the NEW Gabor texture descriptor and the Compact-SIFT codeword histogram. Our system uses the well-known LIBSVM library to train SVMs as the basic classifiers. In the fusion step, several methods were employed, including voting, SVM-based fusion, HCRF and Bootstrap Average AdaBoost (BAAB).

    Novel perspectives and approaches to video summarization

    The increasing volume of videos requires efficient and effective techniques to index and structure them. Video summarization is one such technique: it extracts the essential information from a video so that tasks such as comprehension by users and video content analysis can be conducted more effectively and efficiently. The research presented in this thesis investigates three novel perspectives on the video summarization problem and provides approaches for each. Our first perspective is to employ local keypoints to perform keyframe selection. Two criteria, namely Coverage and Redundancy, are introduced to guide the keyframe selection process in order to identify the keyframes that represent maximum video content and share minimum redundancy. To deal efficiently with long videos, a top-down strategy is proposed, which splits the summarization problem into two sub-problems: scene identification and scene summarization. Our second perspective is to formulate the task of video summarization as a problem of sparse dictionary reconstruction. Our method utilizes the true sparse constraint, the L0 norm, instead of the relaxed constraint, the L2,1 norm, such that keyframes are directly selected as a sparse dictionary that can reconstruct the video frames. In addition, a Percentage Of Reconstruction (POR) criterion is proposed to intuitively guide users in selecting an appropriate length for the summary, and an L2,0 constrained sparse dictionary selection model is proposed to further verify the effectiveness of sparse dictionary reconstruction for video summarization. Lastly, we investigate the multi-modal perspective of multimedia content summarization and enrichment. There are abundant images and videos on the Web, so it is highly desirable to organize such resources effectively for textual content enrichment. With the support of web-scale images, our proposed system, StoryImaging, is capable of enriching arbitrary textual stories with visual content.

    Video-4-Video: using video for searching, classifying and summarising video

    YouTube has meant that we are now becoming accustomed to searching for video clips, and finding them, for both work and leisure pursuits. But YouTube, like the Internet Archive, OpenVideo and almost every other video library, doesn't use video to find video; it uses metadata, usually based on user generated content (UGC). But what if we don't know what we're looking for and the metadata doesn't help, or we have poor metadata or no UGC? Can we use the video to find video? Can we automatically derive semantic concepts directly from video which we can use for retrieval or summarisation? Many dozens of research groups throughout the world work on the problems associated with content-based video search, content-based detection of semantic concepts, shot boundary detection, content-based summarisation and content-based event detection. In this presentation we give a summary of the achievements of almost a decade of research by the TRECVid community, including a report on the performance of groups in different TRECVid tasks. We present the modus operandi of the annual TRECVid benchmarking, the problems associated with running an annual evaluation for nearly 100 research groups every year, and an overview of the most successful approaches to each task.

    VITALAS at TRECVID-2009

    This paper describes the participation of VITALAS in the TRECVID-2009 evaluation where we submitted runs for the High-Level Feature Extraction (HLFE) and Interactive Search tasks. For the HLFE task, we focus on the evaluation of low-level feature sets and fusion methods. The runs employ multiple low-level features based on all available modalities (visual, audio and text) and the results show that use of such features improves the retrieval effectiveness significantly. We also use a concept score fusion approach that achieves good results with reduced low-level feature vector dimensionality. Furthermore, a weighting scheme is introduced for cluster assignment in the "bag-of-words" approach. Our runs achieved good performance compared to a baseline run and the submissions of other TRECVID-2009 participants. For the Interactive Search task, we focus on the evaluation of the integrated VITALAS system in order to gain insights into the use and effectiveness of the system's search functionalities on (the combination of) multiple modalities and study the behavior of two user groups: professional archivists and non-professional users. Our analysis indicates that both user groups submit about the same total number of queries and use the search functionalities in a similar way, but professional users save twice as many shots and examine shots deeper in the ranked retrieved list. The agreement between the TRECVID assessors and our users was quite low. In terms of the effectiveness of the different search modalities, similarity searches retrieve on average twice as many relevant shots as keyword searches, fused searches three times as many, while concept searches retrieve even up to five times as many relevant shots, indicating the benefits of the use of robust concept detectors in multimodal video retrieval.

    High-Level Feature Extraction Runs
    1. A VITALAS.CERTH-ITI 1: Early fusion of all available low-level features.
    2. A VITALAS.CERTH-ITI 2: Concept score fusion for five low-level features and 100 concepts, text features and bag-of-words with color SIFT descriptor based on dense sampling.
    3. A VITALAS.CERTH-ITI 3: Concept score fusion for five low-level features and 100 concepts combined with text features.
    4. A VITALAS.CERTH-ITI 4: Weighting scheme for bag-of-words based on dense sampling of the color SIFT descriptor.
    5. A VITALAS.CERTH-ITI 5: Baseline run, bag-of-words based on dense sampling of the color SIFT descriptor.

    Interactive Search Runs
    1. vitalas 1: Interactive run by professional archivists
    2. vitalas 2: Interactive run by professional archivists
    3. vitalas 3: Interactive run by non-professional users
    4. vitalas 4: Interactive run by non-professional users

    Adapting content based video retrieval systems to accommodate the novice user on mobile devices.

    With the usage of mobile devices such as smartphones and tablets increasing at an exponential rate, these devices have become part of everyday life. This high yield of information access comes at a cost. With still limited input metrics, it is prudent to develop content-based techniques to filter the amount of content that is returned, for example, from search requests to video search engines. In addition, such handheld devices are used by a highly heterogeneous user community, including people with little or no experience. In this work, we focus on the latter, i.e. such casual users ('novices'), and target video search and retrieval. We begin by examining new methods of developing related Content-Based Multimedia Information Retrieval systems for novices on handheld tablet devices. We analyze the shortcomings of traditional desktop systems, which favor the expert user formulating complex queries, and focus on simplicity of design and interaction on tablet devices. We create and test three prototype demonstrators over three years of the TRECVid known item search task in order to determine the best features and appropriate usage to attain high quality, usability, and precision from our novice users. In the first experiment, we determine that novice users perform similarly to an expert user group, one major premise of this research. In our second experiment, we analyze methods which can be applied automatically to aid novice users, thus enhancing their search performance. Our final experiment deals with different visualization approaches which can further aid the users. Overall, our results show that each year our systems made an incremental improvement. The 2011 TRECVid system performed best of all submissions in that year, despite the reduced complexity, enabling novice users to perform as well as experts and experienced searchers.

    BilVideo-7 : video parsing, indexing and retrieval

    Ankara: The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2010. Thesis (Ph.D.), Bilkent University, 2010. Includes bibliographical references, leaves 91-103. Author: Baştan, Muhammet.

    Video indexing and retrieval aims to provide fast, natural and intuitive access to large video collections. This is getting more and more important as the amount of video data increases at a stunning rate. This thesis introduces the BilVideo-7 system to address the issues related to video parsing, indexing and retrieval. BilVideo-7 is a distributed and MPEG-7 compatible video indexing and retrieval system that supports complex multimodal queries in a unified framework. The video data model is based on an MPEG-7 profile which is designed to represent the videos by decomposing them into Shots, Keyframes, Still Regions and Moving Regions. The MPEG-7 compatible XML representations of videos according to this profile are obtained by the MPEG-7 compatible video feature extraction and annotation tool of BilVideo-7, and stored in a native XML database. Users can formulate text, color, texture, shape, location, motion and spatio-temporal queries on an intuitive, easy-to-use visual query interface, whose composite query interface can be used to formulate very complex queries containing any type and number of video segments with their descriptors and specifying the spatio-temporal relations between them. The multithreaded query processing server parses incoming queries into subqueries and executes each subquery in a separate thread. Then, it fuses subquery results in a bottom-up manner to obtain the final query result and sends the result to the originating client. The whole system is unique in that it provides very powerful querying capabilities with a wide range of descriptors and multimodal query processing in an MPEG-7 compatible interoperable environment.

    Visual object category discovery in images and videos

    The current trend in visual recognition research is to place a strict division between the supervised and unsupervised learning paradigms, which is problematic for two main reasons. On the one hand, supervised methods require training data for each and every category that the system learns; training data may not always be available and is expensive to obtain. On the other hand, unsupervised methods must determine the optimal visual cues and distance metrics that distinguish one category from another to group images into semantically meaningful categories; however, for unlabeled data, these are unknown a priori. I propose a visual category discovery framework that transcends the two paradigms and learns accurate models with few labeled exemplars. The main insight is to automatically focus on the prevalent objects in images and videos, and learn models from them for category grouping, segmentation, and summarization. To implement this idea, I first present a context-aware category discovery framework that discovers novel categories by leveraging context from previously learned categories. I devise a novel object-graph descriptor to model the interaction between a set of known categories and the unknown to-be-discovered categories, and group regions that have similar appearance and similar object-graphs. I then present a collective segmentation framework that simultaneously discovers the segmentations and groupings of objects by leveraging the shared patterns in the unlabeled image collection. It discovers an ensemble of representative instances for each unknown category, and builds top-down models from them to refine the segmentation of the remaining instances. Finally, building on these techniques, I show how to produce compact visual summaries for first-person egocentric videos that focus on the important people and objects. 
The system leverages novel egocentric and high-level saliency features to predict important regions in the video, and produces a concise visual summary that is driven by those regions. I compare against existing state-of-the-art methods for category discovery and segmentation on several challenging benchmark datasets. I demonstrate that we can discover visual concepts more accurately by focusing on the prevalent objects in images and videos, and show clear advantages of departing from the status quo division between the supervised and unsupervised learning paradigms. The main impact of my thesis is that it lays the groundwork for building large-scale visual discovery systems that can automatically discover visual concepts with minimal human supervision. Department: Electrical and Computer Engineering.

    Deep Visual Instruments: Realtime Continuous, Meaningful Human Control over Deep Neural Networks for Creative Expression

    In this thesis, we investigate Deep Learning models as an artistic medium for new modes of performative, creative expression. We call these Deep Visual Instruments: realtime interactive generative systems that exploit and leverage the capabilities of state-of-the-art Deep Neural Networks (DNN), while allowing Meaningful Human Control, in a Realtime Continuous manner. We characterise Meaningful Human Control in terms of intent, predictability, and accountability; and Realtime Continuous Control with regard to its capacity for performative interaction with immediate feedback, enhancing goal-less exploration. The capability of DNNs that we are looking to exploit and leverage in this manner is their ability to learn hierarchical representations modelling highly complex, real-world data such as images. Thinking of DNNs as tools that extract useful information from massive amounts of Big Data, we investigate ways in which we can navigate and explore what useful information a DNN has learnt, and how we can meaningfully use such a model in the production of artistic and creative works, in a performative, expressive manner. We present five studies that approach this from different but complementary angles. These include: a collaborative, generative sketching application using MCTS and discriminative CNNs; a system to gesturally conduct the realtime generation of text in different styles using an ensemble of LSTM RNNs; a performative tool that allows for the manipulation of hyperparameters in realtime while a Convolutional VAE trains on a live camera feed; a live video feed processing software that allows for digital puppetry and augmented drawing; and a method that allows for long-form storytelling within a generative model's latent space with meaningful control over the narrative. 
We frame our research with the realtime, performative expression provided by musical instruments as a metaphor, in which we think of these systems as not used by a user, but played by a performer

    An investigation into weighted data fusion for content-based multimedia information retrieval

    Content Based Multimedia Information Retrieval (CBMIR) is characterised by the combination of noisy sources of information which, in unison, are able to achieve strong performance. In this thesis we focus on the combination of ranked results from the independent retrieval experts which comprise a CBMIR system through linearly weighted data fusion. The independent retrieval experts are low-level multimedia features, each of which contains an indexing function and ranking algorithm. This thesis comprises two halves. In the first half, we perform a rigorous empirical investigation into the factors which impact upon performance in linearly weighted data fusion. In the second half, we leverage these findings to create a new class of weight generation algorithms for data fusion which are capable of determining weights at query-time, such that the weights are topic dependent.