35,892 research outputs found

    A Real-Time Feature Indexing System on Live Video Streams

    Most existing video storage systems rely on offline processing to support feature-based indexing of video streams. Feature-based indexing provides an effective way for users to search video content through visual features, such as object categories (e.g., cars and persons). However, because of this reliance on offline processing, video streams and their captured features are not searchable immediately after the streams are recorded. According to our investigation, buffering and storing live video streams is more time-consuming than running the YOLO v3 object detector. This observation motivates us to propose a real-time feature indexing (RTFI) system that makes live video streams searchable by feature as soon as they are captured and processed by the object detector. RTFI achieves its real-time goal by combining a novel metadata structure and data placement design, the capability of a modern object detector (i.e., YOLO v3), and deduplication techniques that avoid storing repetitive video content. Notably, RTFI is the first system design to realize real-time feature-based indexing on live video streams. RTFI is implemented on a Linux server and improves system throughput by up to 10.60x compared with a baseline system without the proposed design. In addition, RTFI makes video content searchable within 20 milliseconds of its arrival for 10 concurrent live video streams, excluding network transfer latency.
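    A minimal sketch (not the authors' implementation) of the kind of pipeline this abstract describes: frames from live streams are deduplicated by content hash and their detected object categories are written to an inverted index so they become searchable immediately. The hash-based deduplication, the index layout, and all names below are assumptions for illustration; detections would come from an object detector such as YOLO v3.

```python
import hashlib
from collections import defaultdict

class FeatureIndex:
    """Maps an object category (e.g., 'car') to the frames that contain it."""
    def __init__(self):
        self.postings = defaultdict(list)   # category -> [(stream_id, frame_no), ...]
        self.seen_hashes = set()            # content hashes of frames already stored

    def ingest(self, stream_id, frame_no, frame_bytes, detections):
        # Deduplication: skip storing frames whose content was already indexed.
        digest = hashlib.sha1(frame_bytes).hexdigest()
        is_duplicate = digest in self.seen_hashes
        if not is_duplicate:
            self.seen_hashes.add(digest)
        # Index the detected categories either way, so the frame stays searchable.
        for category in detections:
            self.postings[category].append((stream_id, frame_no))
        return not is_duplicate   # True if the frame content should be written to storage

    def search(self, category):
        return self.postings.get(category, [])

# Usage: detections would come from an object detector (assumed).
index = FeatureIndex()
index.ingest("cam-01", 120, b"<raw frame bytes>", ["car", "person"])
print(index.search("car"))   # [('cam-01', 120)]
```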

    Associating low-level features with semantic concepts using video objects and relevance feedback

    The holy grail of multimedia indexing and retrieval is developing algorithms capable of imitating human abilities in distinguishing and recognising semantic concepts within content, so that retrieval can be based on "real world" concepts that come naturally to users. In this paper, we discuss an approach that uses segmented video objects as the mid-level connection between low-level features and semantic concept descriptions. We consider a video object to be a particular instance of a semantic concept, and we model the semantic concept as an average representation of its instances. A system supporting object-based search over a test corpus is presented that matches pre-segmented objects based on automatically extracted low-level features. In the system, relevance feedback is employed to drive the learning of the semantic model during a regular search process.
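    A minimal sketch of the modelling idea described above, under the assumption that a semantic concept is represented by the mean of its instances' low-level feature vectors and that relevance feedback nudges that prototype; the Rocchio-style update and all names are illustrative, not the paper's exact formulation.

```python
import numpy as np

class ConceptModel:
    """A semantic concept modelled as the mean feature vector of its instances."""
    def __init__(self, instance_features):
        self.prototype = np.mean(np.asarray(instance_features, dtype=float), axis=0)

    def score(self, object_features):
        # Smaller distance to the prototype = better match for the concept.
        return float(np.linalg.norm(np.asarray(object_features, dtype=float) - self.prototype))

    def feedback(self, relevant, non_relevant, alpha=0.7, beta=0.3):
        # Rocchio-style update (assumed): move the prototype toward features the
        # user marked relevant and away from those marked non-relevant.
        if relevant:
            self.prototype += alpha * (np.mean(relevant, axis=0) - self.prototype)
        if non_relevant:
            self.prototype -= beta * (np.mean(non_relevant, axis=0) - self.prototype)

# Usage with toy 3-D colour features standing in for extracted low-level features.
car = ConceptModel([[0.9, 0.1, 0.2], [0.8, 0.2, 0.1]])
print(car.score([0.85, 0.15, 0.15]))
car.feedback(relevant=[[0.95, 0.05, 0.1]], non_relevant=[[0.1, 0.9, 0.9]])
```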

    Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation

    Existing video frame interpolation (VFI) methods blindly predict where each object is at a specific timestep t ("time indexing"), which struggles to predict precise object movements. Given two images of a baseball, there are infinitely many possible trajectories: accelerating or decelerating, straight or curved. This often results in blurry frames as the method averages out these possibilities. Instead of forcing the network to learn this complicated time-to-location mapping implicitly together with predicting the frames, we provide the network with an explicit hint on how far the object has traveled between start and end frames, a novel approach termed "distance indexing". This method offers a clearer learning goal for models, reducing the uncertainty tied to object speeds. We further observed that, even with this extra guidance, objects can still be blurry, especially when they are equally far from both input frames (i.e., halfway in-between), due to the directional ambiguity in long-range motion. To solve this, we propose an iterative reference-based estimation strategy that breaks down a long-range prediction into several short-range steps. When integrating our plug-and-play strategies into state-of-the-art learning-based models, they exhibit markedly sharper outputs and superior perceptual quality in arbitrary time interpolations, using a uniform distance indexing map in the same format as time indexing. Additionally, distance indexing can be specified pixel-wise, which enables temporal manipulation of each object independently, offering a novel tool for video editing tasks like re-timing. Comment: Project page: https://zzh-tech.github.io/InterpAny-Clearer/ ; Code: https://github.com/zzh-tech/InterpAny-Cleare
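    A minimal sketch of the iterative reference-based idea, assuming a scalar (uniform) distance index and a placeholder two-frame interpolator; it is not the authors' code, only an illustration of how one long-range prediction can be broken into short-range steps that re-anchor on the latest output.

```python
def interpolate_iteratively(frame0, frame1, target_distance, vfi_model, steps=4):
    """Reach `target_distance` (fraction of the frame0 -> frame1 path, in [0, 1])
    via several short-range predictions instead of one long, ambiguous jump.

    vfi_model(ref, other, d) is a placeholder two-frame interpolator that returns
    the frame lying a fraction d of the way from `ref` toward `other`.
    """
    reference = frame0
    travelled = 0.0
    step_size = target_distance / steps
    for _ in range(steps):
        travelled += step_size
        # Re-express the next global step as a fraction of the remaining
        # path from the current reference frame to frame1.
        local_distance = step_size / (1.0 - (travelled - step_size))
        reference = vfi_model(reference, frame1, local_distance)
    return reference

# Toy usage with a linear blend standing in for a learned VFI network.
blend = lambda a, b, d: a + d * (b - a)
print(interpolate_iteratively(0.0, 10.0, target_distance=0.5, vfi_model=blend))  # ~5.0
```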

    Unified Concept-based Multimedia Information Retrieval Technique

    The explosion of digital data in the last two decades has been accompanied by the development of various types of data, including text, images, audio and video, collectively known as multimedia data. Multimedia Information Retrieval is required to search these various types of media. There is a comprehensive information need that cannot be handled by monolithic search engines such as Google, Google Images, YouTube, or FindSounds. The shortcoming of today's search engines with respect to format is the dominance of text, while the expected information could be an image, audio or video; hence it is necessary to handle multimedia formats at the same time. This paper designs a Unified Concept-based Multimedia Information Retrieval (UCpBMIR) technique to tackle these difficulties using unified multimedia indexing. The indexing technique transforms the various media and their features into a text representation with a concept-based algorithm and passes it to the concept detector. A learning model configures the concept detector to classify multimedia objects. The result of the concept detection process is placed in a unified multimedia index database, where it awaits concept-based queries that are matched through ontology-based semantic similarities. The ontology provides the relationships between the object representations of the multimedia data: text, image, audio, and video are naturally heterogeneous, but conceptually they may be related to one another. Preliminary results show that multimedia documents of every format can be retrieved through a single query in any format. The unified multimedia indexing technique with an ontology thus unifies each format of multimedia.
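    A minimal sketch of the unified indexing idea described above: every media item, regardless of format, is reduced to text concept labels and stored in a single inverted index, with a tiny hand-written ontology standing in for the semantic-similarity component. The class, the ontology entries, and the concept labels are assumptions for illustration.

```python
from collections import defaultdict

# Toy ontology: child concept -> broader concept (stands in for the
# ontology-based semantic similarity component).
ONTOLOGY = {"car": "vehicle", "truck": "vehicle", "violin": "instrument"}

class UnifiedConceptIndex:
    """One inverted index over concepts, regardless of the original media type."""
    def __init__(self):
        self.postings = defaultdict(set)   # concept -> {(media_type, doc_id), ...}

    def add(self, doc_id, media_type, concepts):
        # `concepts` would come from a per-media concept detector (assumed).
        for concept in concepts:
            self.postings[concept].add((media_type, doc_id))

    def query(self, concept):
        # Expand the query with ontology neighbours so that, e.g., "vehicle"
        # also retrieves items annotated with "car" or "truck".
        expanded = {concept} | {c for c, parent in ONTOLOGY.items() if parent == concept}
        results = set()
        for c in expanded:
            results |= self.postings.get(c, set())
        return results

index = UnifiedConceptIndex()
index.add("img-7",  "image", ["car", "road"])
index.add("vid-3",  "video", ["truck"])
index.add("txt-12", "text",  ["vehicle", "traffic"])
print(index.query("vehicle"))   # all three items, across formats
```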

    Advanced content-based semantic scene analysis and information retrieval: the SCHEMA project

    The aim of the SCHEMA Network of Excellence is to bring together a critical mass of universities, research centers, industrial partners and end users in order to design a reference system for content-based semantic scene analysis, interpretation and understanding. Relevant research areas include: content-based multimedia analysis and automatic annotation of semantic multimedia content, combined textual and multimedia information retrieval, the semantic web, the MPEG-7 and MPEG-21 standards, user interfaces and human factors. In this paper, recent advances in content-based analysis, indexing and retrieval of digital media within the SCHEMA Network are presented. These advances will be integrated into the SCHEMA module-based, expandable reference system.

    SAVASA project @ TRECVid 2013: semantic indexing and interactive surveillance event detection

    In this paper we describe our participation in the semantic indexing (SIN) and interactive surveillance event detection (SED) tasks at TRECVid 2013 [11]. Our work was motivated by the goals of the EU SAVASA project (Standards-based Approach to Video Archive Search and Analysis), which supports search over multiple video archives. Our aims were to assess a standard object detection methodology (SIN), to evaluate contrasting runs in automatic event detection (SED), and to deploy a distributed, cloud-based search interface for the interactive component of the SED task. We present results from the SIN task, the underlying retrospective classifiers for surveillance event detection, and a discussion of the contrasting aims of the SAVASA user interface compared with the TRECVid task requirements.

    Extensible Detection and Indexing of Highlight Events in Broadcasted Sports Video

    Content-based indexing is fundamental to support and sustain the ongoing growth of broadcast sports video. The main challenge is to design extensible frameworks to detect and index highlight events. This paper presents: 1) a statistics-driven event detection approach that utilizes a minimum amount of manual knowledge and is based on a universal scope-of-detection and audio-visual features; 2) a semi-schema-based indexing that combines the benefits of schema-based modeling, ensuring that the video indexes are valid at all times without manual checking, and schema-less modeling, allowing several passes of instantiation in which additional elements can be declared. To demonstrate the performance of the event detection, a large dataset of sports videos totalling around 15 hours, including soccer, basketball and Australian football, is used.
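    A minimal sketch of what a semi-schema-based index entry might look like: a fixed set of required fields is validated so entries stay well-formed without manual checking, while extra elements can be declared in later passes. Field names and types are assumptions, not the paper's schema.

```python
# Required fields every highlight-event index entry must carry (the "schema" part).
REQUIRED_FIELDS = {"sport": str, "event_type": str, "start_sec": float, "end_sec": float}

def make_event_entry(**fields):
    """Validate the fixed part of the schema, but keep any extra elements
    (the "schema-less" part), so later passes can enrich entries."""
    for name, expected in REQUIRED_FIELDS.items():
        if name not in fields:
            raise ValueError(f"missing required field: {name}")
        if not isinstance(fields[name], expected):
            raise TypeError(f"{name} must be {expected.__name__}")
    return dict(fields)

# First pass: detection produces only the required fields.
goal = make_event_entry(sport="soccer", event_type="goal", start_sec=1312.0, end_sec=1330.5)
# Later pass: an additional element is declared without changing the schema.
goal["crowd_cheer_score"] = 0.92
```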

    A Database Approach for Modeling and Querying Video Data

    Indexing video data is essential for providing content-based access. In this paper, we consider how database technology can offer an integrated framework for modeling and querying video data. As many concerns in video (e.g., modeling and querying) are also found in databases, databases provide an interesting angle from which to attack many of the problems. From a video applications perspective, database systems provide a solid basis for future video systems. More generally, database research will provide solutions to many video issues, even if these are partial or fragmented. From a database perspective, video applications provide beautiful challenges. Next-generation database systems will need to provide support for multimedia data (e.g., image, video, audio). These data types require new techniques for their management (i.e., storing, modeling, querying, etc.), so new solutions are significant. This paper develops a data model and a rule-based query language for video content-based indexing and retrieval. The data model is designed around the object and constraint paradigms. A video sequence is split into a set of fragments, and each fragment can be analyzed to extract the information (symbolic descriptions) of interest, which can be put into a database. This database can then be searched to find information of interest. Two types of information are considered: (1) the entities (objects) of interest in the domain of a video sequence, and (2) the video frames which contain these entities. To represent this information, our data model allows facts as well as objects and constraints. We present a declarative, rule-based, constraint query language that can be used to infer relationships about the information represented in the model. The language has a clear declarative and operational semantics. This work is a major revision and consolidation of [12, 13]. This is an extended version of the article in: 15th International Conference on Data Engineering, Sydney, Australia, 1999.
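    A minimal sketch of the two kinds of information the data model stores (entities of interest, and the frames of each fragment in which they appear), using plain Python structures in place of the paper's object-and-constraint model; the rule-based query is reduced here to a simple lookup, and all names are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Appearance:
    entity: str          # entity (object) of interest, e.g. "person"
    frames: range        # frame interval of the fragment in which it appears

@dataclass
class Fragment:
    fragment_id: int
    appearances: list = field(default_factory=list)

    def frames_with(self, entity):
        """Which frame intervals of this fragment contain the given entity?"""
        return [a.frames for a in self.appearances if a.entity == entity]

# A video sequence split into fragments, each annotated with symbolic descriptions.
video = [
    Fragment(1, [Appearance("anchor", range(0, 120))]),
    Fragment(2, [Appearance("car", range(120, 300)), Appearance("person", range(150, 280))]),
]

# Query: in which fragments, and over which frames, does a "person" appear?
hits = [(f.fragment_id, iv) for f in video for iv in f.frames_with("person")]
print(hits)   # [(2, range(150, 280))]
```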