
    Fast search algorithms for ECVQ using projection pyramids and variance of codewords

    Division of Information Systems, Graduate School of Natural Science and Technology, Kanazawa University; Faculty of Engineering, Kanazawa University
    Vector quantization for image compression is computationally expensive because the closest codeword must be found by searching the codebook. Codebook design based on empirical data for entropy-constrained vector quantization (ECVQ) involves a time-consuming training phase in which a Lagrangian cost measure has to be minimized over the set of codebook vectors. In this paper, we propose two fast codebook generation methods for ECVQ. In the first, we use an appropriate topological structure of input vectors and codewords to reject many codewords that cannot be candidates for the best codeword. In the second, we add a variance test to strengthen the first algorithm's ability to reject codewords. These algorithms significantly accelerate the codebook design process. Experimental results on image block data show that the new algorithms outperform previously known methods.
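
    The rejection idea can be illustrated with a standard mean/variance lower bound on the squared Euclidean distance. The sketch below is an assumption-laden stand-in, not the paper's method: the projection-pyramid structure is not reproduced, and names such as `rates` (codeword code lengths) and `lam` (the Lagrangian multiplier) are illustrative.

    ```python
    import numpy as np

    def ecvq_nearest(x, codebook, rates, lam):
        """Find the codeword minimizing the ECVQ Lagrangian cost
        J(i) = ||x - c_i||^2 + lam * rates[i], skipping full distance
        computations via a mean/variance lower bound (a sketch, not
        the paper's projection-pyramid algorithm)."""
        k = x.size
        mx = x.mean()
        sx = np.linalg.norm(x - mx)                 # unnormalized "std" term
        mc = codebook.mean(axis=1)
        sc = np.linalg.norm(codebook - mc[:, None], axis=1)

        best_i, best_cost = -1, np.inf
        for i, c in enumerate(codebook):
            # Lower bound on ||x - c_i||^2: decompose along the all-ones
            # direction (mean part) and apply the reverse triangle
            # inequality to the orthogonal remainder.
            lb = k * (mx - mc[i]) ** 2 + (sx - sc[i]) ** 2
            if lb + lam * rates[i] >= best_cost:
                continue                             # reject without a full distance
            cost = np.sum((x - c) ** 2) + lam * rates[i]
            if cost < best_cost:
                best_i, best_cost = i, cost
        return best_i, best_cost
    ```

    Because the bound never exceeds the true distance, every rejected codeword is one the exhaustive search would also have discarded, so the result is identical to full search.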

    Feature Encoding Strategies for Multi-View Image Classification

    Machine vision systems can vary greatly in size and complexity depending on the task at hand; however, their purpose of inspection, quality, and reliability remains the same. This work sets out to bridge the gap between traditional machine vision and computer vision: by applying powerful computer vision techniques, more robust solutions can be achieved in manufacturing settings. This thesis presents a framework for applying powerful new image classification techniques, originally developed for image retrieval, within the Bag of Words (BoW) framework. In addition, an exhaustive evaluation of commonly used feature pooling approaches is conducted, with results showing that spatial augmentation can outperform mean and max descriptor pooling on an in-house dataset and on the Caltech 3D dataset. The experiments detail a framework that performs classification using multiple viewpoints. The results show that the feature encoding method known as Triangulation Embedding outperforms the Vector of Locally Aggregated Descriptors (VLAD) and the standard BoW framework, with an accuracy of 99.28%. This improvement is also seen on the public Caltech 3D dataset, where the improvements over VLAD and BoW were 5.64% and 12.23% respectively. The proposed multiple-view classification system is also robust enough to handle the real-world problem of camera failure and still classify with high reliability: a missing camera input was simulated, and with the Triangulation Embedding method the system could still perform classification with only a minor reduction in accuracy, to 98.89%, compared with the BoW baseline at 96.60% using the same techniques. The presented solution tackles the traditional machine vision problem of object identification and also allows a machine vision system to be trained without any expert-level knowledge.
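
    As a concrete reference point for the encodings compared above, here is a minimal NumPy sketch of VLAD, one of the baselines Triangulation Embedding is measured against. This is the standard formulation, not the thesis's exact pipeline; local descriptor extraction and the k-means vocabulary (`centroids`) are assumed to exist already.

    ```python
    import numpy as np

    def vlad_encode(descriptors, centroids):
        """VLAD: for each visual word, accumulate the residuals of the
        descriptors assigned to it, then power- and L2-normalize the
        concatenated result."""
        K, d = centroids.shape
        # Hard-assign each local descriptor to its nearest centroid.
        dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        v = np.zeros((K, d))
        for k in range(K):
            members = descriptors[assign == k]
            if len(members):
                v[k] = (members - centroids[k]).sum(axis=0)
        v = v.ravel()
        v = np.sign(v) * np.sqrt(np.abs(v))      # power normalization
        n = np.linalg.norm(v)
        return v / n if n > 0 else v
    ```

    A multi-view system of the kind described can encode each camera view this way and concatenate (or pool) the per-view vectors before classification, which is also what makes the encoding degrade gracefully when one camera input is missing.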

    Detecting Nature in Pictures

    With the advent of large-scale geo-tagged image sharing on the internet, on websites such as Flickr and Panoramio, there are now large sources of data ready to be mined for useful information. Using this data to automatically create a map of the man-made and natural areas of our planet can provide additional knowledge to decision-makers responsible for world conservation. The problem of determining the degree of naturalness of an image, a prerequisite for creating such a map, can be generalized as a scene classification task. Experiments were performed to better understand the applicability of each of the identified scene classification techniques to distinguishing between man-made and natural images. Their advantages and limitations, such as their computational costs, are detailed. With careful selection of techniques and their parameters, it was possible to build a classifier capable of distinguishing between natural and man-made scenery with high accuracy, one that can also process a large number of pictures within a reasonable time frame.
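
    To make the shape of such a pipeline concrete, here is a deliberately simple baseline: global color histograms fed to a linear SVM via scikit-learn. The feature and model choices are placeholders for illustration, not the techniques the thesis actually selected, and the training-data helper is hypothetical.

    ```python
    import numpy as np
    from sklearn.svm import LinearSVC
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    def color_histogram(img, bins=8):
        """Concatenated per-channel histogram of an RGB image in [0, 255].
        A cheap global descriptor; the thesis evaluates stronger
        scene-classification features."""
        hist = [np.histogram(img[..., c], bins=bins, range=(0, 255),
                             density=True)[0] for c in range(3)]
        return np.concatenate(hist)

    def train_naturalness_classifier(images, labels):
        """Hypothetical helper: images are uint8 RGB arrays, labels are
        0 = man-made, 1 = natural."""
        X = np.stack([color_histogram(im) for im in images])
        clf = make_pipeline(StandardScaler(), LinearSVC())
        clf.fit(X, labels)
        return clf
    ```

    Cheap global features like these are what make it feasible to push a very large collection of geo-tagged photos through the classifier within a reasonable time frame, at some cost in accuracy versus heavier descriptors.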

    Sparse Modeling for Image and Vision Processing

    In recent years, a large amount of multi-disciplinary research has been conducted on sparse models and their applications. In statistics and machine learning, the sparsity principle is used to perform model selection---that is, automatically selecting a simple model among a large collection of them. In signal processing, sparse coding consists of representing data with linear combinations of a few dictionary elements. Subsequently, the corresponding tools have been widely adopted by several scientific communities such as neuroscience, bioinformatics, or computer vision. The goal of this monograph is to offer a self-contained view of sparse modeling for visual recognition and image processing. More specifically, we focus on applications where the dictionary is learned and adapted to data, yielding a compact representation that has been successful in various contexts. (Comment: 205 pages; to appear in Foundations and Trends in Computer Graphics and Vision.)
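
    The sparse-coding step described above, representing a signal as a linear combination of a few dictionary atoms, is commonly posed as an l1-regularized least-squares problem. Below is a textbook ISTA (iterative soft-thresholding) sketch with a fixed dictionary `D`; it illustrates the principle, not the monograph's learned-dictionary pipeline, and `lam` and `n_iter` are illustrative.

    ```python
    import numpy as np

    def sparse_code_ista(x, D, lam=0.1, n_iter=200):
        """Solve min_a 0.5*||x - D a||^2 + lam*||a||_1 by ISTA.
        D has one atom per column; a is the sparse code."""
        L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
        a = np.zeros(D.shape[1])
        for _ in range(n_iter):
            grad = D.T @ (D @ a - x)         # gradient of the smooth term
            z = a - grad / L                  # gradient step
            a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
        return a
    ```

    Dictionary learning alternates steps like this one (coding with D fixed) with updates of D itself, which is how the representation gets adapted to the data.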

    New methods in image compression using multi-level transforms and adaptive statistical encoding

    The need to meet the demand for high-quality digital images with comparatively modest storage requirements is driving the development of new image compression techniques. This demand has spurred new techniques based on spatial-to-frequency transformation methods. At the core of these methods is a family of transformations built on basis sets called wavelets. The wavelet transform permits an image to be represented in a substantially reduced space by transferring the energy of the image to a smaller set of coefficients. Although these techniques are lossy, and increasingly so as the compression ratio rises, very adequate reconstructions can be made from surprisingly small sets of coefficients. This work explores the transformation process, the storage of the representation, and the application of these techniques to 24-bit color images. A working color image compression model is illustrated.
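
    The energy-compaction idea can be sketched with PyWavelets: decompose a channel, keep only the largest-magnitude coefficients, and reconstruct. This is a minimal sketch under stated assumptions, not the thesis's codec: a real system would also quantize and entropy-code the surviving coefficients, and the wavelet, level, and `keep` fraction below are illustrative.

    ```python
    import numpy as np
    import pywt  # PyWavelets

    def wavelet_compress(channel, wavelet="db2", level=3, keep=0.05):
        """Keep only the largest `keep` fraction of wavelet coefficients
        of one image channel and reconstruct from them."""
        channel = np.asarray(channel, dtype=float)
        coeffs = pywt.wavedec2(channel, wavelet, level=level)
        arr, slices = pywt.coeffs_to_array(coeffs)
        # Zero all but the largest-magnitude coefficients.
        thresh = np.quantile(np.abs(arr), 1.0 - keep)
        arr[np.abs(arr) < thresh] = 0.0
        coeffs = pywt.array_to_coeffs(arr, slices, output_format="wavedec2")
        return pywt.waverec2(coeffs, wavelet)

    # For a 24-bit color image, apply the same transform to each of the
    # three 8-bit channels (or, better, to a luma/chroma representation).
    ```

    Because the wavelet transform concentrates most of the image energy in few coefficients, even a 5% survival rate often yields a visually adequate reconstruction, which is exactly the trade-off the abstract describes.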