7 research outputs found

    Video copy detection by fast sequence matching

    ABSTRACT Sequence matching techniques are effective for comparing two videos. However, existing approaches suffer from high computational costs and thus do not scale to large applications. In this paper we view video copy detection as a local alignment problem between two frame sequences and propose a two-level filtration approach that significantly accelerates the matching process. First, we propose an adaptive vocabulary tree to index all frame descriptors extracted from the video database; in this step, each video is treated as a "bag of frames." Such an indexing structure not only provides a rich vocabulary for representing videos, but also enables efficient computation of a pyramid matching kernel between videos. The vocabulary tree filters out videos that are dissimilar to the query based on their histogram pyramid representations. Second, we propose a fast edit-distance-based sequence matching method that avoids unnecessary comparisons between dissimilar frame pairs. This step reduces the quadratic runtime to linear time with respect to the lengths of the sequences under comparison. Experiments on the MUSCLE VCD benchmark demonstrate that our approach is effective and efficient: it is 18x faster than the original sequence matching algorithms. The technique also applies to other visual retrieval tasks, including shape retrieval; we demonstrate that the proposed method achieves a significant speedup on the MPEG-7 shape dataset as well.
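    The linear-runtime idea above can be illustrated with a banded edit-distance computation: restricting the dynamic program to a diagonal band of half-width k makes the cost O(k·n) instead of O(n·m). This is only a minimal sketch under that assumption, not the paper's actual pruning method, and it operates on plain sequences rather than frame descriptors.

```python
def banded_edit_distance(a, b, band=2):
    """Edit distance restricted to the diagonal band |i - j| <= band.
    Cells outside the band are treated as unreachable (infinite cost),
    so runtime is O(band * len(a)) instead of O(len(a) * len(b))."""
    INF = float("inf")
    n, m = len(a), len(b)
    if abs(n - m) > band:
        return INF  # sequences cannot match within the band
    # Row i = 0: converting the empty prefix of a into prefixes of b.
    prev = {j: j for j in range(0, min(m, band) + 1)}
    for i in range(1, n + 1):
        lo, hi = max(0, i - band), min(m, i + band)
        cur = {}
        for j in range(lo, hi + 1):
            if j == 0:
                cur[0] = i  # delete the first i items of a
                continue
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev.get(j, INF) + 1,         # deletion
                         cur.get(j - 1, INF) + 1,      # insertion
                         prev.get(j - 1, INF) + cost)  # match / substitution
        prev = cur
    return prev.get(m, INF)
```

    As long as the optimal alignment path stays inside the band, the banded value equals the true edit distance; otherwise it overestimates, which is acceptable for filtering dissimilar sequence pairs.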

    The benefits of buddying

    In June 2009, a meeting was held with Marishona Ortega (Academic Subject Librarian), Philippa Dyson and Lys Ann Reiners (Deputy Librarians) to discuss the principles of developing a mentoring system within the library at the University of Lincoln. During discussions we felt it important to develop something new that would create an ethos of mutual support within the department, and this is where we felt buddying would be a step forward. So what is buddying? The National Council for Voluntary Organisations defines it as ‘a system for enabling peers to support each other by sharing experiences, offering advice and providing a sounding board for ideas and problems. Buddying is different from mentoring, which is a more formal and structured relationship where the mentor is typically in a more senior role than the mentee.’1 The strength of buddying is that it takes the view that both partners can offer each other support and opportunities to learn, whatever role they fulfil. The buddying relationship need not, and possibly should not, be a permanent arrangement. Ideally, staff should change buddies regularly to ensure a broad range of perspectives is achieved, as per the recommendations of Urquhart et al. and Cunningham.

    Large-scale predictive modeling and analytics through regression queries in data management systems

    Regression analytics has been the standard approach to modeling the relationship between input and output variables, and recent trends aim to incorporate advanced regression analytics capabilities within data management systems (DMS). Linear regression queries are fundamental to exploratory analytics and predictive modeling; however, computing their exact answers leaves much to be desired in terms of efficiency and scalability. We contribute a novel predictive analytics model and an associated statistical learning methodology, which are efficient, scalable and accurate in discovering piecewise linear dependencies among variables by observing only the regression queries issued to a DMS and their answers. We focus on in-DMS piecewise linear regression, and specifically on predicting the answers to mean-value aggregate queries, identifying and delivering the piecewise linear dependencies between variables to regression queries, and predicting the dependent variables within specific data subspaces defined by analysts and data scientists. Our goal is to discover, only through query–answer pairs, a piecewise linear approximation of the underlying data function that is competitive with the best piecewise linear approximation to the ground truth. Our methodology is analyzed, evaluated and compared with exact solutions and near-perfect approximations of the underlying relationships among variables, achieving orders-of-magnitude improvement in analytics processing.
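    The piecewise-linear idea can be made concrete with a toy sketch: fit an independent least-squares line per segment of the input domain, then answer point queries from the fitted pieces. This is only an illustration under strong assumptions (1-D inputs, fixed equal-width segments), not the paper's statistical learning methodology.

```python
import numpy as np

def fit_piecewise_linear(x, y, n_segments=3):
    """Fit one least-squares line per equal-width segment of x.
    x, y: 1-D arrays playing the role of observed query-answer pairs."""
    edges = np.linspace(x.min(), x.max(), n_segments + 1)
    models = []
    for k in range(n_segments):
        mask = (x >= edges[k]) & (x <= edges[k + 1])
        A = np.column_stack([x[mask], np.ones(mask.sum())])  # [x, 1] design matrix
        coef, *_ = np.linalg.lstsq(A, y[mask], rcond=None)   # slope, intercept
        models.append((edges[k], edges[k + 1], coef))
    return models

def predict(models, xq):
    """Answer a point query from the piece whose segment contains xq."""
    for lo, hi, (slope, intercept) in models:
        if lo <= xq <= hi:
            return slope * xq + intercept
    raise ValueError("query outside trained range")
```

    A real in-DMS deployment would learn the segment boundaries as well; fixing them in advance keeps the sketch short.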

    Transform Based And Search Aware Text Compression Schemes And Compressed Domain Text Retrieval

    In recent times, we have witnessed an unprecedented growth of textual information via the Internet, digital libraries and archival text in many applications. While a good fraction of this information is of transient interest, useful information of archival value will continue to accumulate. We need ways to manage, organize and transport this data from one point to another over data communication links with limited bandwidth. We must also have means to speedily find the information we need from this huge mass of data. Sometimes a single site may contain large collections of data, such as a library database, thereby requiring an efficient search mechanism even within the local data. To facilitate information retrieval, an emerging ad hoc standard for uncompressed text is XML, which preprocesses the text by adding user-defined metadata such as a DTD or hyperlinks to enable searching with better efficiency and effectiveness. This increases the file size considerably, underscoring the importance of applying text compression. For efficiency (in terms of both space and time), there is a need to keep the data in compressed form for as long as possible. Text compression is concerned with techniques for representing digital text data in alternate representations that take less space. Not only does it help conserve storage space for archival and online data, it also improves system performance by requiring fewer secondary-storage (disk or CD-ROM) accesses and improves network bandwidth utilization by reducing transmission time. Unlike static images or video, there is no international standard for text compression, although compressed formats like .zip, .gz and .Z files are increasingly being used. In general, data compression methods are classified as lossless or lossy. Lossless compression allows the original data to be recovered exactly. Although used primarily for text data, lossless compression algorithms are useful in special classes of images, such as medical imaging, fingerprint data and astronomical images, and in databases containing mostly vital numerical data, tables and text information. Many lossy algorithms use lossless methods at the final stage of encoding, underscoring the importance of lossless methods for both lossy and lossless compression applications. To effectively utilize the full potential of compression techniques in future retrieval systems, we need efficient information retrieval in the compressed domain. This means that techniques must be developed to search the compressed text without decompression, or with only partial decompression, independent of whether the search is done on the text itself or on an inversion table corresponding to a set of keywords for the text.

    In this dissertation, we make the following contributions:

    (1) Star family compression algorithms: We propose an approach to develop a reversible transformation that can be applied to a source text to improve existing algorithms' ability to compress it. We use a static dictionary to convert English words into predefined symbol sequences. These transformed sequences create additional context information that is superior to the original text, so we achieve some compression already at the preprocessing stage. We have a series of transforms which improve the performance. The star transform requires a static dictionary of a certain size. To avoid the considerable complexity of conversion, we employ a ternary tree data structure that efficiently converts the words in the text to the words in the star dictionary in linear time.

    (2) Exact and approximate pattern matching in Burrows-Wheeler transformed (BWT) files: We propose a method to extract useful context information in linear time from BWT-transformed text. The auxiliary arrays obtained from the BWT inverse transform enable logarithmic search time. Approximate pattern matching can then build on the results of exact pattern matching to extract candidates, and a fast verification algorithm can be applied to those candidates, which may be just small parts of the original text. We present algorithms for both k-mismatch and k-approximate pattern matching in BWT-compressed text. A typical BWT-based compression system has Move-to-Front and Huffman coding stages after the transformation. We propose a novel approach that replaces the Move-to-Front stage in order to extend compressed-domain search capability all the way to the entropy-coding stage. A modification to Move-to-Front makes it possible to randomly access any part of the compressed text without referring to the part before the access point.

    (3) A modified LZW algorithm that allows random access and partial decoding for compressed text retrieval: Although many compression algorithms provide good compression ratios and/or time complexity, LZW was the first studied for compressed pattern matching because of its simplicity and efficiency. Our modifications to the LZW algorithm provide the extra advantage of fast random access and partial decoding, which is especially useful for text retrieval systems. Based on this algorithm, we can provide a dynamic hierarchical semantic structure for the text, so that search can be performed at the expected level of granularity; for example, a user can choose to retrieve a single line, a paragraph or a file that contains the keywords. More importantly, we show that parallel encoding and decoding are trivial with the modified LZW: both can be performed easily with multiple processors, and the encoding and decoding processes are independent with respect to the number of processors.
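    For readers unfamiliar with the transform at the heart of contribution (2), here is a naive Burrows-Wheeler transform and its inverse. This is illustration only: real BWT compressors build the transform from a suffix array in linear time, and this sketch omits the Move-to-Front and entropy-coding stages discussed above.

```python
def bwt(s):
    """Naive Burrows-Wheeler transform: append a sentinel, sort all
    rotations lexicographically, and return the last column."""
    s += "\0"  # unique end-of-string sentinel, sorts before all characters
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def ibwt(last):
    """Inverse BWT by repeatedly prepending the last column and sorting.
    O(n^2 log n); shown only to make losslessness of the transform concrete."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    row = next(r for r in table if r.endswith("\0"))
    return row.rstrip("\0")
```

    The transformed string groups similar contexts together (e.g. "banana" becomes "annb\0aa"), which is exactly the context information that the Move-to-Front and Huffman stages, and the search methods above, exploit.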

    ADVISE: advanced digital video information segmentation engine.

    by Chung-Wing Ng. Thesis (M.Phil.)--Chinese University of Hong Kong, 2002. Includes bibliographical references (leaves 100-107). Abstracts in English and Chinese.

    Contents:
    Abstract --- p.ii
    Acknowledgment --- p.vi
    Table of Contents --- p.vii
    List of Tables --- p.x
    List of Figures --- p.xi
    Chapter 1 --- Introduction --- p.1
      1.1 --- Image-based Video Description --- p.2
      1.2 --- Video Summary --- p.5
      1.3 --- Video Matching --- p.6
      1.4 --- Contributions --- p.7
      1.5 --- Outline of Thesis --- p.8
    Chapter 2 --- Literature Review --- p.10
      2.1 --- Video Retrieval in Digital Video Libraries --- p.11
        2.1.1 --- The VISION Project --- p.11
        2.1.2 --- The INFORMEDIA Project --- p.12
        2.1.3 --- Discussion --- p.13
      2.2 --- Video Structuring --- p.14
        2.2.1 --- Video Segmentation --- p.16
        2.2.2 --- Color Histogram Extraction --- p.17
        2.2.3 --- Further Structuring --- p.18
      2.3 --- XML Technologies --- p.19
        2.3.1 --- XML Syntax --- p.20
        2.3.2 --- Document Type Definition, DTD --- p.21
        2.3.3 --- Extensible Stylesheet Language, XSL --- p.21
      2.4 --- SMIL Technology --- p.22
        2.4.1 --- SMIL Syntax --- p.23
        2.4.2 --- Model of SMIL Applications --- p.23
    Chapter 3 --- Overview of ADVISE --- p.25
      3.1 --- Objectives --- p.26
      3.2 --- System Architecture --- p.26
        3.2.1 --- Video Preprocessing Module --- p.26
        3.2.2 --- Web-based Video Retrieval Module --- p.30
        3.2.3 --- Video Streaming Server --- p.34
      3.3 --- Summary --- p.35
    Chapter 4 --- Construction of Video Table-of-Contents (V-ToC) --- p.36
      4.1 --- Video Structuring --- p.37
        4.1.1 --- Terms and Definitions --- p.37
        4.1.2 --- Regional Color Histograms --- p.39
        4.1.3 --- Video Shot Boundaries Detection --- p.43
        4.1.4 --- Video Groups Formation --- p.47
        4.1.5 --- Video Scenes Formation --- p.50
      4.2 --- Storage and Presentation --- p.53
        4.2.1 --- Definition of XML Video Structure --- p.54
        4.2.2 --- V-ToC Presentation Using XSL --- p.55
      4.3 --- Evaluation of Video Structure --- p.58
    Chapter 5 --- Video Summarization --- p.62
      5.1 --- Terms and Definitions --- p.64
      5.2 --- Video Features Used for Summarization --- p.65
      5.3 --- Video Summarization Algorithm --- p.67
        5.3.1 --- Combining Extracted Video Segments --- p.68
        5.3.2 --- Scoring the Extracted Video Segments --- p.69
        5.3.3 --- Selecting Extracted Video Segments --- p.70
        5.3.4 --- Refining the Selection Result --- p.71
      5.4 --- Video Summary in SMIL --- p.74
      5.5 --- Evaluations --- p.76
        5.5.1 --- Experiment 1: Percentages of Features Extracted --- p.76
        5.5.2 --- Experiment 2: Evaluation of the Refinement Process --- p.78
    Chapter 6 --- Video Matching Using V-ToC --- p.80
      6.1 --- Terms and Definitions --- p.81
      6.2 --- Video Features Used for Matching --- p.82
      6.3 --- Non-ordered Tree Matching Algorithm --- p.83
      6.4 --- Ordered Tree Matching Algorithms --- p.87
      6.5 --- Evaluation of Video Matching --- p.91
        6.5.1 --- Applying Non-ordered Tree Matching --- p.92
        6.5.2 --- Applying Ordered Tree Matching --- p.94
    Chapter 7 --- Conclusion --- p.96
    Bibliography --- p.10

    Comparaison des documents audiovisuels par Matrice de Similarité (Comparison of audiovisual documents using a similarity matrix)

    The work of this thesis concerns the comparison of video documents. The field of digital video is in full expansion, and videos are now available in large quantities even for personal use. Video comparison is a basic analysis operation that complements the classification, extraction and structuring of videos. Traditional approaches to comparison are based primarily on the low-level features of the videos to be compared, considered as multidimensional vectors. Other approaches are based on the similarity of frames, taking into account neither the temporal composition of the video nor the audio track. The main disadvantage of these methods is that they reduce comparison to a simple binary operator robust to noise; such operators are generally used to identify the various copies of the same document. The originality of our approach lies in the introduction of a notion of style similarity, inspired by the criteria humans apply when comparing documents. These criteria are more flexible and do not impose a strict similarity of all the studied features at the same time. Drawing on dynamic programming and time-series comparison methods, we define an algorithm that extracts similarities between the series of values produced by the analysis of low-level audiovisual features. A second, generic processing step then approximates the result of the Longest Common Subsequence (LCS) length algorithm faster than that algorithm itself. We propose a representation of the data resulting from these processing steps in the form of a matrix pattern suited to the visual and immediate comparison of two videos. This matrix is then used to define a generic similarity measure, applicable to videos of comparable or heterogeneous content. We developed several applications to demonstrate the behaviour of the comparison method and the similarity measure, as well as their relevance. The experiments concern primarily: - the identification of a collection/sub-collection structure in a document base, - the description of stylistic elements in a movie, and - the analysis of the programme grid of a TV stream.
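    The similarity-extraction step builds on the Longest Common Subsequence (LCS). A minimal sketch of the classic quadratic dynamic program that the thesis's faster method approximates (the approximation itself is not reproduced here):

```python
def lcs_length(a, b):
    """Length of the Longest Common Subsequence of sequences a and b,
    via the classic O(n*m) dynamic program."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of a[:i] and b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1  # extend a common subsequence
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])  # drop one element
    return dp[n][m]
```

    Applied to the quantized series of low-level feature values of two videos, a long common subsequence indicates stylistic similarity even when the documents are not copies of each other.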