2,626 research outputs found
A quick search method for audio signals based on a piecewise linear representation of feature trajectories
This paper presents a new method for a quick similarity-based search through
long unlabeled audio streams to detect and locate audio clips provided by
users. The method involves feature-dimension reduction based on a piecewise
linear representation of a sequential feature trajectory extracted from a long
audio stream. Two techniques enable us to obtain a piecewise linear
representation: the dynamic segmentation of feature trajectories and the
segment-based Karhunen-Loève (KL) transform. The proposed search method
guarantees, in principle, the same search results as a search without the
proposed feature-dimension reduction. Experimental results indicate
significant improvements in search speed. For example, the proposed method
reduced the total search time to approximately 1/12 that of previous methods
and detected queries in approximately 0.3 seconds from a 200-hour audio
database.
Comment: 20 pages, to appear in IEEE Transactions on Audio, Speech and Language Processing
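The segment-based KL transform described above can be sketched with ordinary PCA machinery. The sketch below is an illustration of the idea (per-segment dimension reduction of a feature trajectory), not the paper's exact algorithm; the segment boundaries here are fixed rather than found by dynamic segmentation.

```python
import numpy as np

def segment_kl_reduce(trajectory, boundaries, k=2):
    """Reduce feature dimension per segment via the Karhunen-Loeve
    (PCA) transform; an illustrative sketch, not the paper's method."""
    reduced = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        seg = trajectory[start:end]              # frames x dims
        centered = seg - seg.mean(axis=0)
        # Right singular vectors of the centered segment = KL basis
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        reduced.append(centered @ vt[:k].T)      # project onto top-k axes
    return reduced

# Toy 100-frame, 12-dim trajectory split into two fixed segments
traj = np.random.default_rng(0).normal(size=(100, 12))
parts = segment_kl_reduce(traj, boundaries=[0, 50, 100], k=2)
print([p.shape for p in parts])  # [(50, 2), (50, 2)]
```

Because each segment keeps its own low-dimensional basis, short near-linear runs of the trajectory compress well even when the full trajectory does not.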
A semantic feature for human motion retrieval
With the explosive growth of motion capture data, an efficient search engine for retrieving motions from large motion repositories has become imperative in animation production. However, because of the high dimensionality of the data space and the complexity of matching methods, most existing approaches cannot return results in real time. This paper proposes a high-level semantic feature in a low-dimensional space that represents the essential characteristics of different motion classes. Based on statistical training of a Gaussian Mixture Model, this feature can effectively achieve motion matching at both the global clip level and the local frame level. Experimental results show that our approach can retrieve similar motions with rankings from a large motion database in real time and can also annotate motions automatically on the fly. Copyright © 2013 John Wiley & Sons, Ltd.
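A minimal numpy-only sketch of the GMM idea above: represent each clip by its soft-assignment profile over a small set of Gaussian components, yielding a low-dimensional descriptor. Component means here are just sampled frames with isotropic covariances, a stand-in for the paper's actual statistical training.

```python
import numpy as np

rng = np.random.default_rng(1)
frames = rng.normal(size=(500, 30))            # 500 frames, 30-dim features
means = frames[rng.choice(len(frames), 4, replace=False)]  # 4 components

def clip_descriptor(clip):
    # Isotropic-Gaussian responsibilities, averaged over the clip's frames
    d2 = ((clip[:, None, :] - means[None]) ** 2).sum(-1)   # frames x 4
    d2 -= d2.min(axis=1, keepdims=True)        # stabilize the exponentials
    resp = np.exp(-0.5 * d2)
    resp /= resp.sum(axis=1, keepdims=True)
    return resp.mean(axis=0)                   # 4-dim clip-level feature

print(clip_descriptor(frames[:100]).shape)     # (4,)
```

Clip-level matching then reduces to comparing these short descriptors, which is what makes real-time retrieval over a large repository plausible.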
SeeSaw: Interactive Ad-hoc Search Over Image Databases
As image datasets become ubiquitous, the problem of ad-hoc searches over
image data is increasingly important. Many high-level data tasks in machine
learning, such as constructing datasets for training and testing object
detectors, imply finding ad-hoc objects or scenes within large image datasets
as a key sub-problem. New foundational visual-semantic embeddings trained on
massive web datasets such as Contrastive Language-Image Pre-Training (CLIP) can
help users start searches on their own data, but we find there is a long tail
of queries where these models fall short in practice. SeeSaw is a system for
interactive ad-hoc searches on image datasets that integrates state-of-the-art
embeddings like CLIP with user feedback in the form of box annotations to help
users quickly locate images of interest in their data even in the long tail of
harder queries. One key challenge for SeeSaw is that, in practice, many
sensible approaches to incorporating feedback into future results, including
state-of-the-art active-learning algorithms, can worsen results compared to
introducing no feedback, partly due to CLIP's high average performance.
Therefore, SeeSaw includes several algorithms that empirically result in larger
and also more consistent improvements. We compare SeeSaw's accuracy to both
using CLIP alone and to a state-of-the-art active-learning baseline and find
SeeSaw consistently helps improve results for users across four datasets and
more than a thousand queries. SeeSaw increases Average Precision (AP) on search
tasks by an average of 0.08 on a broad benchmark (from a base of 0.72), and by
0.27 on a subset of more difficult queries where CLIP alone performs poorly.
Comment: SIGMOD 2024 camera ready
BilVideo: A video database management system
The BilVideo video database management system provides integrated support for spatiotemporal and semantic queries over video. BilVideo can support any application with video-data search needs. Its query language provides a simple way to extend the system's query capabilities. Users can add application-dependent rules and facts to the knowledge base.
CBCD Based on Color Features and Landmark MDS-Assisted Distance Estimation
Content-Based Copy Detection (CBCD) of digital videos is an important research field that aims at the identification of modified copies of an original clip, e.g., on the Internet. In this application, the video content is uniquely identified by the content itself, by extracting compact features that are robust to a certain set of video transformations. Given the huge amount of data present in online video databases, the computational complexity of feature extraction and comparison is a very important issue. In this paper, a landmark-based multi-dimensional scaling technique is proposed to speed up a detection procedure based on exhaustive search over the MPEG-7 Dominant Color Descriptor. The method is evaluated under the MPEG Video Signature Core Experiment conditions, and simulation results show impressive time savings at the cost of a slightly reduced detection performance.
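The core of the landmark trick above can be sketched as follows: instead of comparing full high-dimensional descriptors exhaustively, precompute each item's distances to a few landmark items and compare those short vectors. The paper builds a proper landmark MDS embedding; this sketch keeps only the distance-to-landmarks idea, with random vectors standing in for color descriptors.

```python
import numpy as np

rng = np.random.default_rng(3)
features = rng.normal(size=(2000, 128))        # one descriptor per video
landmarks = features[rng.choice(2000, 8, replace=False)]

def landmark_vec(x):
    # 8-dim proxy: distances from x to each landmark
    return np.linalg.norm(landmarks - x, axis=1)

proxy = np.array([landmark_vec(f) for f in features])

def nearest_copy(query_feature, k=3):
    # Cheap comparison in the 8-dim proxy space instead of 128-dim
    d = np.linalg.norm(proxy - landmark_vec(query_feature), axis=1)
    return np.argsort(d)[:k]                   # candidate copies to verify

print(nearest_copy(features[42])[0])  # 42 (the item itself ranks first)
```

The speed/accuracy trade-off in the abstract shows up here directly: comparisons drop from 128-dim to 8-dim, but the proxy distance only approximates the true one, so top candidates would still be verified with the full descriptor.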
STRG-QL: Spatio-Temporal Region Graph Query Language for Video Databases
Copyright 2008 Society of Photo-Optical Instrumentation Engineers. In this paper, we present a new graph-based query language and its query processing for a Graph-based Video Database Management System (GVDBMS). Although extensive research has proposed various query languages for video databases, most of them are limited in handling general-purpose video queries; each handles a specific data model, query type, or application. To develop a general-purpose video query language, we first produce a Spatio-Temporal Region Graph (STRG) for each video, which represents the spatial and temporal information of video objects. An STRG data model is generated from the STRG by exploiting an object-oriented model. Based on the STRG data model, we propose a new graph-based query language named STRG-QL, which supports various types of video queries. To process STRG-QL queries, we introduce a rule-based query optimization that considers the characteristics of video data, i.e., the hierarchical correlations among video segments. The results of our extensive experimental study show that the proposed STRG-QL is promising in terms of accuracy and cost.
http://dx.doi.org/10.1117/12.76553
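To make the region-graph idea above concrete, here is a toy stand-in: object regions per frame with spatial-relation edges, and a "query" that scans for a relational pattern across time. The object names and relations are illustrative only, and this is a crude stand-in for STRG-QL's graph pattern matching, not its actual syntax or processing.

```python
# One tiny relation graph per frame: edge (subject, object) -> relation
frames = [
    {("car", "person"): "left_of"},
    {("car", "person"): "overlaps"},
    {("car", "person"): "left_of"},
]

def match(pattern, relation):
    # Return frame indices where the relational pattern holds
    return [t for t, g in enumerate(frames) if g.get(pattern) == relation]

print(match(("car", "person"), "left_of"))  # [0, 2]
```

A real STRG query would range over many objects and exploit the hierarchical correlations among video segments that the paper's rule-based optimizer targets; the sketch only shows the shape of the data model.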