Listen, Look, and Gotcha: Instant Video Search with Mobile Phones by Layered Audio-Video Indexing
ABSTRACT Mobile video is quickly becoming a mass consumer phenomenon. More and more people are using their smartphones to search and browse video content while on the move. In this paper, we develop an innovative instant mobile video search system through which users can discover videos by simply pointing their phones at a screen to capture a few seconds of what they are watching. The system indexes large-scale video data in the cloud using a new layered audio-video indexing approach, while extracting light-weight joint audio-video signatures in real time and performing progressive search on the mobile device. Unlike most existing mobile video search applications, which simply send the original video query to the cloud, the proposed system is one of the first attempts at instant and progressive video search that leverages the light-weight computing capacity of mobile devices. The system is characterized by four unique properties: 1) a joint audio-video signature to deal with the large aural and visual variances of query videos captured by mobile phones, 2) layered audio-video indexing to holistically exploit the complementary nature of audio and video signals, 3) light-weight fingerprinting to comply with mobile processing capacity, and 4) a progressive query process to significantly reduce computational costs and improve the user experience: the search can stop anytime once a confident result is achieved. We have collected 1,400 query videos captured by 25 mobile users against a dataset of 600 hours of video. The experiments show that our system outperforms state-of-the-art methods, achieving 90.79% precision when the query video is less than 10 seconds and 70.07% even when the query video is less than 5 seconds.
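The progressive query process described above can be pictured as a loop that grows the captured clip one second at a time and stops as soon as one candidate is confidently ahead of the rest. The following is a minimal sketch of that control flow only; the names (`extract_signature`, `search_index`) and the confidence margin are illustrative assumptions, not the paper's actual API.

```python
CONFIDENCE_MARGIN = 0.25  # assumed score gap between top-2 results needed to stop early

def progressive_search(capture_chunks, extract_signature, search_index,
                       max_seconds=10):
    """Query with a growing clip; return (video_id, seconds_used).

    capture_chunks    -- iterable of per-second captured audio/video chunks
    extract_signature -- computes a light-weight per-second fingerprint
    search_index      -- maps a signature list to [(video_id, score), ...], best first
    """
    signature, ranked, second = [], [], 0
    for second, chunk in enumerate(capture_chunks, start=1):
        signature.append(extract_signature(chunk))  # extend the joint signature
        ranked = search_index(signature)            # re-query with the longer clip
        # Confident result: top score clearly separated from the runner-up.
        if len(ranked) >= 2 and ranked[0][1] - ranked[1][1] >= CONFIDENCE_MARGIN:
            return ranked[0][0], second
        if second >= max_seconds:
            break
    return (ranked[0][0] if ranked else None), second
```

In this sketch the loop terminates early exactly when the user would be shown a result, so a longer capture is only needed for harder queries.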
A general state-based temporal pattern recognition
Time-series and state-sequences are ubiquitous patterns in temporal logic and are widely used to represent temporal data in data mining. Generally speaking, there are three known choices for the time primitive: points; intervals; or both points and intervals. In this thesis, a formal characterization of time-series and state-sequences is presented for both complete and incomplete situations, where a state-sequence is defined as a list of sequential data validated on the corresponding time-series. In addition, subsequence matching is addressed to associate state-sequences, taking into account both non-temporal aspects and rich temporal aspects, including temporal order, temporal duration, and temporal gap.
Firstly, based on typed point-based time-elements and time-series, a formal characterization of time-series and state-sequences is introduced for both complete and incomplete situations, where a state-sequence is defined as a list of sequential data validated on the corresponding time-series. A time-series is formalized as a tetrad (T, R, Tdur, Tgap), whose components denote, respectively: the temporal order of time-elements; the temporal relationships between time-elements; the temporal duration of each time-element; and the temporal gap between each adjacent pair of time-elements.
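The tetrad (T, R, Tdur, Tgap) can be made concrete with a small data structure: given the time-elements and their pairwise relations, durations and gaps are derivable. This is a minimal sketch; the interval representation and field names are illustrative assumptions, not the thesis's formal notation.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TimeSeries:
    # T: time-elements as (start, end) pairs; a point has start == end
    elements: List[Tuple[float, float]]
    # R: temporal relation between adjacent elements, e.g. "before", "meets"
    relations: List[str]

    def durations(self) -> List[float]:
        """Tdur: duration of each time-element."""
        return [end - start for start, end in self.elements]

    def gaps(self) -> List[float]:
        """Tgap: gap between each adjacent pair of time-elements."""
        return [nxt[0] - cur[1]
                for cur, nxt in zip(self.elements, self.elements[1:])]
```

Note that a point time-element (start == end) simply has duration zero, so the same structure covers the point, interval, and mixed choices of time primitive.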
Secondly, benefiting from the formal characterization of time-series and state-sequences, a general similarity measurement (GSM) is introduced for subsequence matching that takes into account both non-temporal and rich temporal information, including temporal order as well as temporal duration and temporal gap. This measurement is general enough to subsume most popular existing measurements as special cases. In particular, a new concept of temporal common subsequence is proposed. Furthermore, a new LCS-based algorithm named Optimal Temporal Common Subsequence (OTCS), which takes rich temporal information into account, is designed. The experimental results on 6 benchmark datasets demonstrate the effectiveness and robustness of GSM and its new special case, OTCS. Compared with binary-value distance measurements, GSM can distinguish between the distances caused by different states in the same operation; compared with real-penalty distance measurements, it can filter out noise that may push the similarity to abnormal levels.
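The core idea of an LCS-based measure enriched with temporal information can be illustrated by extending the classic dynamic program so that two states only match when both their non-temporal value agrees and their durations are within a tolerance. This is a deliberately simplified sketch in the spirit of OTCS, not the thesis's exact definition; the duration tolerance and the (state, duration) representation are assumptions.

```python
def temporal_lcs(seq_a, seq_b, dur_tol=0.5):
    """Length of the longest common subsequence of two state-sequences,
    where seq_* is a list of (state, duration) pairs and a match requires
    equal state values AND durations within dur_tol of each other."""
    n, m = len(seq_a), len(seq_b)
    # dp[i][j] = best score using the first i states of seq_a, j of seq_b
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (sa, da), (sb, db) = seq_a[i - 1], seq_b[j - 1]
            if sa == sb and abs(da - db) <= dur_tol:
                dp[i][j] = dp[i - 1][j - 1] + 1   # temporally compatible match
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]
```

A plain binary LCS would count any equal-valued pair as a match; the duration check is what lets a temporal measure reject states that agree in value but differ wildly in how long they persist.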
Finally, two case studies are investigated for temporal pattern recognition: basketball zone-defence detection and video copy detection.
In the case of basketball zone-defence detection, a computational technique and algorithm for detecting zone-defence patterns from basketball videos is introduced, where the Laplacian Matrix-based algorithm is extended to take into account the effects of zoom and of a single defender's translation in zone-defence graph matching, and a set of character-angle based features is proposed to describe the zone-defence graph. The experimental results show that the approach is useful in helping the coach of the defensive side check whether the players are keeping to the correct zone-defence strategy, as well as in detecting the strategy of the opposing side. It can describe the structural relationship between defender-lines in basketball zone-defence, and performs robustly in both simulation and real-life applications, especially when disturbances exist.
In the case of video copy detection, a framework for subsequence matching is introduced. A hybrid similarity framework addressing both non-temporal and temporal relationships between state-sequences, represented by bipartite graphs, is proposed. The experimental results on real-life video databases demonstrate that the proposed similarity framework is robust to state alignments of different lengths and values, and to various reorderings including inversion and crossover.
Effective and efficient query processing for video subsequence identification
With the growing demand for visual information of rich content, effective and efficient manipulation of large video databases is increasingly desired. Many investigations have been made into content-based video retrieval. However, despite its importance, video subsequence identification, which aims to find content similar to a short query clip within a long video sequence, has not been well addressed. This paper presents a graph transformation and matching approach to this problem, extended to identify occurrences whose ordering or length may differ due to content editing. With a novel batch query algorithm to retrieve similar frames, the mapping relationship between the query and the database video is first represented by a bipartite graph. The densely matched parts along the long sequence are then extracted, followed by a filter-and-refine search strategy to prune irrelevant subsequences. During the filtering stage, Maximum Size Matching is deployed for each subgraph constructed from the query and a candidate subsequence to obtain a smaller set of candidates. During the refinement stage, Sub-Maximum Similarity Matching is devised to identify the subsequence with the highest aggregate score among all candidates, according to a robust video similarity model that incorporates visual content, temporal order, and frame alignment information. Performance studies conducted on a 50-hour video recording validate that our approach is promising in terms of both search accuracy and speed.
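The Maximum Size Matching used in the filtering stage is the classic maximum bipartite matching problem, typically solved with augmenting paths (Kuhn's algorithm). The sketch below shows that standard algorithm on an adjacency-list bipartite graph; how edges are built from frame similarities (e.g. a similarity threshold) is an assumption left outside the code.

```python
def max_bipartite_matching(edges, n_left, n_right):
    """Size of a maximum matching in a bipartite graph.

    edges[u] -- iterable of right-side vertices adjacent to left vertex u
    (e.g. query frames on the left, candidate-subsequence frames on the right,
    with an edge when the two frames are similar enough).
    """
    match_right = [-1] * n_right  # match_right[v] = left vertex matched to v, or -1

    def try_augment(u, seen):
        """Try to find an augmenting path starting from free left vertex u."""
        for v in edges[u]:
            if v in seen:
                continue
            seen.add(v)
            # v is free, or its current partner can be re-matched elsewhere
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    return sum(try_augment(u, set()) for u in range(n_left))
```

A candidate subsequence whose matching size falls below a chosen fraction of the query length can then be pruned cheaply before the more expensive refinement stage.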
A study of the behaviour of iDistance in content-based video retrieval
This project presents iDistance as an indexing method for high-dimensional data based on dimensionality reduction, and studies its behaviour in a Content-Based Video Retrieval (CBVR) system. To build the iDistance index, reference points must be obtained from the dim-dimensional dataset; a clustering technique, k-means, is used for this purpose. Once built, the iDistance index can be plugged into a CBVR system to test its behaviour in video subsequence identification, with iDistance retrieving the similar frames for subsequent processing aimed at identifying the query subsequence. For comparison, another technique for tackling the curse of dimensionality, based on approximation vectors, is used: the VA-File. On the video-search side, improvements are made to video subsequence identification. Regarding content, this project presents the most important characteristics of high-dimensional data, as well as the distance metrics used to classify them. The B+-tree is introduced as the core structure on which iDistance is built, along with all the operations associated with that data structure. Bipartite graph and matching theory is also covered, since it is essential for video subsequence identification. Subsequently, iDistance is studied and implemented as an indexing engine for high-dimensional databases, paying special attention to the indexing methodology and the search procedure for K-nearest-neighbour queries. Following this study, a series of experiments on real video data is proposed in order to study performance as key parameters of the iDistance configuration are varied.
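The essence of iDistance is the mapping it uses to turn a high-dimensional point into a one-dimensional B+-tree key: each point is assigned to its nearest reference point O_i (here, a k-means centroid) and keyed as i * c + dist(p, O_i), where c is a constant larger than any distance within a partition. The sketch below shows only that mapping; the value of c and the use of Euclidean distance are illustrative assumptions.

```python
import math

C = 1000.0  # partition stretch constant; assumed larger than any in-partition distance

def euclidean(p, q):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def idistance_key(point, reference_points):
    """Return (partition index i, 1-D key i*C + dist(point, O_i)) for a point,
    where O_i is the nearest reference point (e.g. a k-means centroid)."""
    i, ref = min(enumerate(reference_points),
                 key=lambda ir: euclidean(point, ir[1]))
    return i, i * C + euclidean(point, ref)
```

Because keys from different partitions occupy disjoint ranges, a kNN query can be answered by probing each partition's key range around i * C + dist(query, O_i) in an ordinary B+-tree, enlarging the search radius until K neighbours are confirmed.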
Once iDistance has been studied, this indexing engine is integrated into a content-based video retrieval system for video subsequence identification. In addition, this project proposes retrieving the K best-ranked subsequences, studying their hit rate in a subsequent battery of experiments.