
    COMPRESSED DOMAIN IMAGE INDEXING AND RETRIEVAL BASED ON THE MINIMAL SPANNING TREE

    In this paper, a method for content-based retrieval of JPEG images is presented, utilizing features extracted directly in the discrete cosine transform (DCT) domain. Image indexing is achieved by extracting color and texture feature vectors, using an efficient technique applied to the DCT coefficients. Similarity between the query and database images is assessed with a statistical graph-matching approach. The proposed measure makes use of the Wald-Wolfowitz test, a nonparametric test that assesses the commonality between two different sets of multivariate observations. Experimental results demonstrate the enhanced performance of our approach compared to previously reported methods.
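    As an illustration of the statistical matching step, the sketch below implements a multivariate Wald-Wolfowitz (Friedman-Rafsky) runs test over two sets of feature vectors. It is a minimal reconstruction from the abstract's description rather than the authors' code, and for brevity it uses the classical univariate runs-test variance instead of the degree-corrected MST variance.

import numpy as np
from scipy.spatial import distance_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

def ww_statistic(X, Y):
    """X: (m, d), Y: (n, d) feature vectors from two images."""
    pooled = np.vstack([X, Y])
    labels = np.array([0] * len(X) + [1] * len(Y))
    # MST over the pooled sample, with Euclidean edge weights.
    mst = minimum_spanning_tree(distance_matrix(pooled, pooled))
    rows, cols = mst.nonzero()
    # Runs R: 1 + number of MST edges joining points from different samples.
    R = 1 + int(np.sum(labels[rows] != labels[cols]))
    m, n = len(X), len(Y)
    N = m + n
    mean = 2.0 * m * n / N + 1.0
    var = 2.0 * m * n * (2.0 * m * n - N) / (N * N * (N - 1.0))
    # Very negative W => few cross-sample edges => dissimilar images.
    return (R - mean) / np.sqrt(var)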

    A Survey on Array Storage, Query Languages, and Systems

    Since scientific investigation is one of the most important providers of massive amounts of ordered data, there is a renewed interest in array data processing in the context of Big Data. To the best of our knowledge, a unified resource that summarizes and analyzes array processing research over its long existence is currently missing. In this survey, we provide a guide for past, present, and future research in array processing. The survey is organized along three main topics. Array storage discusses all the aspects related to array partitioning into chunks. The identification of a reduced set of array operators to form the foundation for an array query language is analyzed across multiple such proposals. Lastly, we survey real systems for array processing. The result is a thorough survey on array data storage and processing that should be consulted by anyone interested in this research topic, independent of experience level. The survey is not complete, though; we greatly appreciate pointers towards any work we might have forgotten to mention.
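    To make the storage discussion concrete, the toy sketch below (hypothetical, not taken from any surveyed system) shows regular chunking, the basic array-partitioning scheme the survey covers: a cell's coordinates map to a chunk id and an in-chunk offset by integer division against the chunk shape.

import numpy as np

def chunk_of(cell, chunk_shape):
    """Map an N-D cell index to (chunk coordinates, in-chunk offset)."""
    chunk = tuple(c // s for c, s in zip(cell, chunk_shape))
    offset = tuple(c % s for c, s in zip(cell, chunk_shape))
    return chunk, offset

def split_into_chunks(array, chunk_shape):
    """Yield (chunk coordinates, chunk contents) for a 2-D array."""
    (rows, cols), (cr, cc) = array.shape, chunk_shape
    for i in range(0, rows, cr):
        for j in range(0, cols, cc):
            yield (i // cr, j // cc), array[i:i + cr, j:j + cc]

A = np.arange(36).reshape(6, 6)
print(chunk_of((4, 5), (2, 3)))   # ((2, 1), (0, 2))
for coords, chunk in split_into_chunks(A, (2, 3)):
    print(coords, chunk.shape)    # six 2x3 chunks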

    Query engine of novelty in video streams

    Prior research on novelty detection has primarily focused on algorithms to detect novelty for a given application domain. Effective storage, indexing, and retrieval of novel events (beyond detection) are largely ignored as a problem in itself. In light of recent advances in counter-terrorism efforts and link-discovery initiatives, effective data management of novel events assumes apparent importance. Automatically detecting novel events in video data streams is an extremely challenging task. The aim of this thesis is to provide evidence that the notion of novelty in video as perceived by a human is extremely subjective and therefore algorithmically ill-defined. Though it comes as no surprise that current machine-based parametric learning systems are far from perfectly mimicking human novelty perception, such systems have recently been very successful at exhaustively capturing novelty in video once the novelty function is well defined by a human expert. So, how effective are these machine-based novelty detection systems compared to human novelty detection? In this work we outline an experimental evaluation of human versus machine-based novelty systems in terms of qualitative performance. We then quantify this evaluation using a variety of metrics based on the location of novel events, the number of novel events found in the video, and so on. We begin by describing a machine-based system for detecting novel events in video data streams. We then discuss the design of an indexing strategy, or manga (a comic-book representation is termed manga in Japanese), to effectively determine the most representative novel frames for a video sequence. We then evaluate the performance of the machine-based novelty detection system against human novelty detection and present the results. The distance metrics we suggest for novelty comparison may eventually aid a variety of end users in effectively driving the indexing, retrieval, and analysis of large video databases. It should also be noted that the techniques we describe are based on low-level features extracted from video, such as color, intensity, and focus of attention; the video processing component does not include any semantic processing such as object detection. We conjecture that such advances, though beyond the scope of this work, would benefit machine-based novelty detection systems, and we validate this experimentally. We believe that developing a novelty detection system that works in conjunction with a human expert will lead to a more user-centered data mining approach for such domains.
    JPEG 2000 is a newer image compression standard that compresses images better than formats such as JPEG, GIF, and PNG. The main reason this format warrants investigation is that it allows metadata to be embedded within the image itself; this data can be essentially anything, such as text, audio, video, or other images. Currently, image annotations are stored and collected alongside the images they describe. Even though this practice is very common, it carries considerable risk. Imagine medical images annotated by doctors to describe a brain tumor: if some of those annotations were lost, the images themselves would be useless. Embedding the annotations within the image guarantees that the description and the image are never separated, and the embedded metadata has no effect on the image itself.
    In this thesis we initially develop a metric to index novelty, comparing it to traditional indexing techniques and to human perception. In the second phase, we investigate the emerging JPEG 2000 technology and show that novelty stored in this format outperforms traditional image structures. One contribution of this thesis is a set of metrics to measure the performance and quality of query results between JPEG 2000 and traditional image formats; since JPEG 2000 is a new technology, no existing metrics measure this type of performance against traditional images.
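    As a rough illustration of the low-level, feature-based approach the thesis describes, here is a minimal sketch assuming OpenCV; the threshold and adaptation rate are invented for illustration and are not the thesis's values. It flags frames as novel when their color histogram drifts far from a slowly adapting model of the footage seen so far.

import cv2
import numpy as np

def novel_frames(video_path, threshold=0.4, alpha=0.05):
    """Return indices of frames whose color content looks novel."""
    cap = cv2.VideoCapture(video_path)
    model, novel, idx = None, [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8], [0, 256] * 3)
        hist = cv2.normalize(hist, hist).flatten()
        if model is None:
            model = hist
        else:
            # Bhattacharyya distance between this frame and the running model.
            d = cv2.compareHist(model.astype(np.float32),
                                hist.astype(np.float32),
                                cv2.HISTCMP_BHATTACHARYYA)
            if d > threshold:
                novel.append(idx)
            model = (1 - alpha) * model + alpha * hist  # adapt slowly
        idx += 1
    cap.release()
    return novel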

    Bolt: Accelerated Data Mining with Fast Vector Compression

    Vectors of data are at the heart of machine learning and data mining. Recently, vector quantization methods have shown great promise in reducing both the time and space costs of operating on vectors. We introduce a vector quantization algorithm that can compress vectors over 12x faster than existing techniques while also accelerating approximate vector operations such as distance and dot product computations by up to 10x. Because it can encode over 2GB of vectors per second, it makes vector quantization cheap enough to employ in many more circumstances. For example, using our technique to compute approximate dot products in a nested loop can multiply matrices faster than a state-of-the-art BLAS implementation, even when our algorithm must first compress the matrices. In addition to showing the above speedups, we demonstrate that our approach can accelerate nearest neighbor search and maximum inner product search by over 100x compared to floating point operations and up to 10x compared to other vector quantization methods. Our approximate Euclidean distance and dot product computations are not only faster than those of related algorithms with slower encodings, but also faster than Hamming distance computations, which have direct hardware support on the tested platforms. We also assess the errors of our algorithm's approximate distances and dot products, and find that it is competitive with existing, slower vector quantization algorithms.
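    The core trick can be sketched in a few lines. The following is a simplified, product-quantization-style illustration rather than Bolt's optimized implementation: vectors are encoded as per-subspace centroid ids, and the query's dot products against every encoded vector reduce to one table lookup per subspace. Bolt's reported speedups come from engineering this skeleton aggressively (small codes, hardware-friendly lookup tables), which the sketch does not attempt.

import numpy as np

def train_codebooks(X, n_subspaces=4, n_centroids=16, iters=10):
    """Tiny per-subspace k-means; X is (n, d), d divisible by n_subspaces."""
    books = []
    for S in np.split(X, n_subspaces, axis=1):
        C = S[np.random.choice(len(S), n_centroids, replace=False)].copy()
        for _ in range(iters):
            ids = np.argmin(((S[:, None] - C[None]) ** 2).sum(-1), axis=1)
            for k in range(n_centroids):
                if np.any(ids == k):
                    C[k] = S[ids == k].mean(axis=0)
        books.append(C)
    return books

def encode(X, books):
    """Replace each subvector by the id of its nearest centroid."""
    subs = np.split(X, len(books), axis=1)
    return np.stack([np.argmin(((S[:, None] - C[None]) ** 2).sum(-1), axis=1)
                     for S, C in zip(subs, books)], axis=1)

def approx_dots(query, codes, books):
    """Approximate query . x for every encoded x via table lookups."""
    tables = [C @ q for q, C in zip(np.split(query, len(books)), books)]
    return sum(t[codes[:, j]] for j, t in enumerate(tables))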

    Indexing Techniques for Image and Video Databases: an approach based on Animate Vision Paradigm

    In this dissertation, novel indexing techniques for video and image databases based on the "Animate Vision" paradigm are presented and discussed. On the one hand, it is shown how, by embedding active mechanisms of biological vision such as saccadic eye movements and fixations within image inspection algorithms, more effective and efficient query processing in image databases can be achieved. In particular, we discuss how to generate two fixation sequences from a query image I_q and a test image I_t of the data set, respectively, and how to compare the two sequences in order to compute a possible similarity (consistency) measure between the two images. Meanwhile, it is shown how the approach can be combined with classical clustering techniques to discover and represent the hidden semantic associations among images, in terms of categories, which, in turn, allow an automatic pre-classification (indexing) and can be used to drive and improve the query processing. Preliminary results are presented, and the proposed approach is compared with the most recent image retrieval techniques described in the literature. On the other hand, it is discussed how, by taking advantage of this foveated representation of an image, it is possible to partition a video into shots.
    More precisely, the shot-change detection method is based on the computation, at each time instant, of the consistency measure between the fixation sequences generated by an ideal observer looking at the video. The proposed scheme detects both abrupt and gradual transitions between shots using a single technique, rather than a set of dedicated methods. Results on videos of various content types are reported and validate the proposed approach.
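    As a hypothetical illustration (the dissertation's exact formulation is not reproduced here), a fixation-sequence consistency measure in this spirit can be scored as a symmetric mean nearest-neighbor distance between the two sequences, mapped into (0, 1]:

import numpy as np
from scipy.spatial.distance import cdist

def consistency(F_q, F_t):
    """F_q, F_t: (k, d) arrays; each row is one fixation (x, y, features...)."""
    D = cdist(F_q, F_t)                       # pairwise distances
    # Average each sequence's distance to its nearest match in the other.
    d = 0.5 * (D.min(axis=1).mean() + D.min(axis=0).mean())
    return 1.0 / (1.0 + d)                    # 1.0 = identical fixation patterns

    Under such a measure, shot-change detection would amount to thresholding the consistency score computed between fixation sequences at consecutive time instants: the score drops sharply at abrupt transitions and decays gradually across dissolves.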

    AUTOMATED FEATURE EXTRACTION AND CONTENT-BASED RETRIEVAL OF PATHOLOGY MICROSCOPIC IMAGES USING K-MEANS CLUSTERING AND CODE RUN-LENGTH PROBABILITY DISTRIBUTION

    The dissertation starts with an extensive literature survey on the current issues in content-based image retrieval (CBIR) research, the state-of-the-art theories, methodologies, and implementations, covering topics such as general information retrieval theories, imaging, image feature identification and extraction, feature indexing and multimedia database search, user-system interaction, relevance feedback, and performance evaluation. A general CBIR framework has been proposed with three layers: image document space, feature space, and concept space. The framework emphasizes that while the projection from the image document space to the feature space is algorithmic and unrestricted, the connection between the feature space and the concept space is based on statistics instead of semantics. The scheme favors image features that do not rely on excessive assumptions about image content. As an attempt to design a new CBIR methodology following the above framework, k-means clustering color quantization is applied to pathology microscopic images, followed by code run-length probability distribution feature extraction. Kullback-Leibler divergence is used as the distance measure for feature comparison. For content-based retrieval, the distance between two images is defined as a function of all individual features. The process is highly automated, and the system is capable of working effectively across different tissues without human intervention. Possible improvements and future directions have been discussed.
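    The pipeline described above can be condensed into the following sketch; the parameter choices (8 color codes, run lengths capped at 32, row-wise runs, additive smoothing) are illustrative stand-ins, not the dissertation's settings.

import numpy as np
from scipy.cluster.vq import kmeans2

def run_length_feature(image, k=8, max_run=32):
    """Color-quantize with k-means, then histogram horizontal code runs."""
    pixels = image.reshape(-1, 3).astype(float)
    _, labels = kmeans2(pixels, k, minit='++', seed=0)
    codes = labels.reshape(image.shape[:2])
    hist = np.zeros(max_run)
    for row in codes:
        run = 1
        for prev, cur in zip(row[:-1], row[1:]):
            if cur == prev:
                run += 1
            else:
                hist[min(run, max_run) - 1] += 1
                run = 1
        hist[min(run, max_run) - 1] += 1      # close the last run
    p = hist + 1e-9                           # smooth so KL stays finite
    return p / p.sum()

def kl_divergence(p, q):
    """Kullback-Leibler divergence between two run-length distributions."""
    return float(np.sum(p * np.log(p / q)))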

    Exploratory search through large video corpora

    Activity retrieval is a growing field in electrical engineering that specializes in the search and retrieval of relevant activities and events in video corpora. With the affordability and popularity of cameras for government, personal, and retail use, the quantity of available video data is rapidly outscaling our ability to reason over it. To empower users to navigate and interact with the contents of these video corpora, we propose a framework for exploratory search that emphasizes activity structure and search space reduction over complex feature representations. Exploratory search is a user-driven process wherein a person provides a system with a query describing the activity, event, or object they are interested in finding. Typically, this description takes the implicit form of one or more exemplar videos, but it can also involve an explicit description. The system returns candidate matches, followed by query refinement and iteration. System performance is judged by the run time of the system and the precision/recall curve of the query matches returned. Scaling is one of the primary challenges in video search. From vast web-video archives like YouTube (1 billion videos and counting) to the 30 million active surveillance cameras shooting an estimated 4 billion hours of footage every week in the United States, trying to find a set of matches can be like looking for a needle in a haystack. Our goal is to create an efficient archival representation of video corpora that can be calculated in real time as video streams in, and that then enables a user to quickly get a set of matching results.
    First, we design a system for rapidly identifying simple queries in large-scale video corpora. Instead of focusing on feature design, our system focuses on the spatiotemporal relationships between those features as a means of disambiguating an activity of interest from background. We define a semantic feature vocabulary of concepts that are both readily extracted from video and easily understood by an operator. As data streams in, features are hashed to an inverted index and retrieved in constant time after the system is presented with a user's query (see the sketch below). We take a zero-shot approach to exploratory search: the user manually assembles vocabulary elements like color, speed, size, and type into a graph. Given that information, we perform an initial downsampling of the archived data and design a novel dynamic programming approach, based on genome sequencing, to search for similar patterns. Experimental results indicate that this approach outperforms other methods for detecting activities in surveillance video datasets.
    Second, we address the problem of representing complex activities that take place over long spans of space and time. Subgraph and graph matching methods have seen limited use in exploratory search because both problems are provably NP-hard. In this work, we render these problems computationally tractable by identifying the maximally discriminative spanning tree (MDST) and using dynamic programming to optimally reduce the archive data based on a custom algorithm for tree matching in attributed relational graphs. We demonstrate the efficacy of this approach on popular surveillance video datasets in several modalities.
    Finally, we design an approach for successive search space reduction in subgraph matching problems. Given a query graph and archival data, our algorithm iteratively selects spanning trees from the query graph that optimize the expected search space reduction at each step until the archive converges. We use this approach to efficiently reason over video surveillance datasets, simulated data, and large graphs of protein data.
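    The inverted-index component of the first contribution can be sketched as follows; the attribute vocabulary and keys are hypothetical. Each detection is posted under its discrete attribute values as video streams in, so a zero-shot query becomes an intersection of posting lists rather than a scan of the archive.

from collections import defaultdict

index = defaultdict(set)   # attribute -> posting set of (video, frame, track)

def ingest(video_id, frame, track, attributes):
    """Post a detected object under each of its vocabulary attributes."""
    for attr in attributes:                  # e.g. "color:red", "speed:fast"
        index[attr].add((video_id, frame, track))

def query(attributes):
    """Candidates matching ALL query attributes, one lookup per attribute."""
    postings = [index[a] for a in attributes]
    return set.intersection(*postings) if postings else set()

ingest("cam7", 120, 3, {"color:red", "speed:fast", "type:vehicle"})
ingest("cam7", 121, 4, {"color:blue", "speed:slow", "type:person"})
print(query({"color:red", "type:vehicle"}))  # {('cam7', 120, 3)}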