133 research outputs found

    Fast search algorithms for ECVQ using projection pyramids and variance of codewords

    Kanazawa University, Graduate School of Natural Science and Technology / Faculty of Engineering. Vector quantization for image compression is computationally expensive because the closest codeword must be found by searching the codebook. Codebook design based on empirical data for entropy-constrained vector quantization (ECVQ) involves a time-consuming training phase in which a Lagrangian cost measure has to be minimized over the set of codebook vectors. In this paper, we propose two fast codebook generation methods for ECVQ. In the first, we use an appropriate topological structure of input vectors and codewords to reject many codewords that cannot be candidates for the best codeword. In the second, we add a variance test that enables the first algorithm to reject even more codewords. These algorithms significantly accelerate the codebook design process. Experimental results on image block data show that the new algorithms outperform previously known methods.
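The variance-test idea above rests on a standard lower bound: for n-dimensional vectors with means m and (population) standard deviations s, ||x - c||^2 >= n(m_x - m_c)^2 + n(s_x - s_c)^2. A codeword whose bound already exceeds the current best distortion can be rejected without computing the full distance. The following is a minimal sketch of that rejection test alone, not the paper's full ECVQ algorithm (which also folds the entropy term into the Lagrangian cost and uses projection pyramids):

```python
import numpy as np

def nearest_codeword(x, codebook, cw_means, cw_stds):
    """Nearest-codeword search with a mean/variance rejection test.

    Uses the bound ||x - c||^2 >= n*(m_x - m_c)^2 + n*(s_x - s_c)^2
    to skip codewords that cannot beat the current best distortion.
    cw_means / cw_stds are precomputed per-codeword statistics.
    """
    n = x.size
    m_x, s_x = x.mean(), x.std()
    best_i, best_d = -1, np.inf
    for i, c in enumerate(codebook):
        # Cheap lower bound on the true squared distance.
        lb = n * (m_x - cw_means[i]) ** 2 + n * (s_x - cw_stds[i]) ** 2
        if lb >= best_d:
            continue  # rejected without computing the full distance
        d = np.sum((x - c) ** 2)
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d
```

The per-codeword statistics (`codebook.mean(axis=1)`, `codebook.std(axis=1)`) are computed once per codebook, so the test adds only O(1) work per codeword during search.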

    Detecting Nature in Pictures

    With the advent of large-scale geo-tagged image sharing on the internet, on websites such as Flickr and Panoramio, there are now large sources of data ready to be mined for useful information. Using this data to automatically create a map of the man-made and natural areas of our planet can provide additional knowledge to decision-makers responsible for world conservation. The problem of determining the degree of naturalness of an image, a precondition for creating such a map, can be generalized as a scene classification task. Experiments were performed to better understand how well each of the identified scene classification techniques distinguishes man-made from natural images. Their advantages and limitations, such as their computational costs, are detailed. With a careful selection of techniques and their parameters it was possible to build a classifier capable of distinguishing between natural and man-made scenery with high accuracy, while also processing a large number of pictures within a reasonable time frame.
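A classic low-cost cue used in this kind of man-made vs. natural scene classification is the edge-orientation histogram: man-made scenes concentrate gradient energy along vertical and horizontal directions. The sketch below (a generic illustration, not the thesis's chosen feature set) computes such a feature with NumPy; the resulting vector would then be fed to any off-the-shelf linear classifier:

```python
import numpy as np

def edge_orientation_histogram(gray, n_bins=16):
    """Magnitude-weighted histogram of gradient orientations.

    gray : 2-D float array (grayscale image).
    Returns an L1-normalised histogram over orientations in [0, pi);
    peaks near 0 and pi/2 suggest man-made (rectilinear) structure.
    """
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # fold orientations into [0, pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0.0, np.pi), weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

Because the histogram is normalised, images of different sizes yield comparable feature vectors, which matters when mining heterogeneous photo collections like Flickr.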

    Feature Extraction and Recognition for Human Action Recognition

    Human action recognition is the task of automatically labeling videos containing human motions. Traditional human action recognition algorithms use RGB videos as input, which is challenging because of large intra-class variations of actions, cluttered backgrounds, possible camera movement, and illumination variations. Recently, the introduction of cost-effective depth cameras has provided a new way to address these difficult issues, but it also brings new challenges such as noisy depth maps and time alignment. In this dissertation, effective and computationally efficient feature extraction and recognition algorithms are proposed for human action recognition. At the feature extraction step, two novel spatial-temporal feature descriptors are proposed which can be combined with local feature detectors. The first is the Shape and Motion Local Ternary Pattern (SMltp) descriptor, which dramatically reduces the number of features generated by dense sampling without sacrificing accuracy. The second is the Center-Symmetric Motion Local Ternary Pattern (CS-Mltp) descriptor, which describes spatial and temporal gradient-like features. Both descriptors (SMltp and CS-Mltp) inherit the advantages of the Local Binary Pattern (LBP) texture operator: tolerance to illumination change, robustness in homogeneous regions, and computational efficiency. For better feature representation, this dissertation presents a new Dictionary Learning (DL) method to learn an overcomplete set of representative vectors (atoms) so that any input feature can be approximated by a linear combination of these atoms with minimum reconstruction error. Instead of simultaneously learning one overcomplete dictionary for all classes, we learn class-specific sub-dictionaries to increase discrimination. In addition, group sparsity and a geometry constraint are added to the learning process to further increase the discriminative power, so that features are well reconstructed by atoms from the same class, and features from the same class with high similarity are forced to have similar coefficients. To evaluate the proposed algorithms, three applications are explored: single-view action recognition, distributed multi-view action recognition, and RGB-D action recognition. Experimental results on benchmark datasets and comparative analyses with state-of-the-art methods show the effectiveness and merits of the proposed algorithms.
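To make the descriptor idea concrete, here is a minimal sketch of center-symmetric local ternary coding on a single grey-level 3x3 patch. Each of the 4 center-symmetric neighbour pairs is compared with a tolerance t (the ternary generalisation of CS-LBP), giving a compact code; the dissertation's CS-Mltp additionally applies this coding to motion/gradient channels across frames, which this sketch does not attempt:

```python
import numpy as np

def cs_ltp_code(patch, t=0.05):
    """Center-symmetric local ternary pattern code for one 3x3 patch.

    Each of the 4 opposite-neighbour differences maps to a ternary
    digit: 0 if within [-t, t], 1 if above +t, 2 if below -t. The
    4 digits are packed into a single integer in [0, 80] (3**4 codes).
    """
    assert patch.shape == (3, 3)
    # Neighbour coordinates ordered so coords[k] is opposite coords[k + 4].
    coords = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for k in range(4):
        d = patch[coords[k]] - patch[coords[k + 4]]
        digit = 0 if abs(d) <= t else (1 if d > t else 2)
        code = code * 3 + digit
    return code
```

The tolerance band around zero is what gives ternary patterns their robustness in homogeneous regions: small noise-level differences all map to digit 0 instead of flipping a binary bit. A per-region histogram of these codes forms the descriptor.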

    Online Multi-Stage Deep Architectures for Feature Extraction and Object Recognition

    Multi-stage visual architectures have recently found success in achieving high classification accuracies over image datasets with large variations in pose, lighting, and scale. Inspired by techniques currently at the forefront of deep learning, such architectures are typically composed of one or more layers of preprocessing, feature encoding, and pooling to extract features from raw images. Training these components traditionally relies on large sets of patches extracted from a potentially large image dataset. In this context, high-dimensional feature space representations often help obtain the best classification performance and provide a higher degree of invariance to object transformations, but large datasets with high-dimensional features complicate the implementation of visual architectures in memory-constrained environments. This dissertation constructs online learning replacements for the components within a multi-stage architecture and demonstrates that the proposed replacements (namely fuzzy competitive clustering, an incremental covariance estimator, and a multi-layer neural network) can offer performance competitive with their offline batch counterparts while providing a reduced memory footprint. The online nature of this solution allows for a method of adjusting parameters within the architecture via stochastic gradient descent. Testing over multiple datasets shows the potential benefits of this methodology when appropriate priors on the initial parameters are unknown. Alternatives to batch-based decompositions for a whitening preprocessing stage, which take advantage of natural image statistics and allow simple dictionary learners to work well in the problem domain, are also explored. Expansions of the architecture using additional pooling statistics and multiple layers are presented and indicate that larger codebook sizes are not the only step toward higher classification accuracies. Experimental results from these expansions further indicate the important role of sparsity and appropriate encodings within multi-stage visual feature extraction architectures.
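An incremental covariance estimator of the kind mentioned above can be sketched with a Welford-style streaming update: patches are consumed one at a time, so the full patch set never has to be held in memory, and the running estimate feeds a ZCA whitening matrix for the preprocessing stage. This is a generic illustration under those assumptions, not the dissertation's exact estimator:

```python
import numpy as np

class StreamingCovariance:
    """Incremental mean/covariance estimate for online whitening.

    Welford-style update: each sample touches O(d^2) memory regardless
    of how many samples have been seen, unlike a batch estimate that
    must store every patch.
    """
    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros((dim, dim))  # running sum of residual outer products

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Outer product uses the residual w.r.t. the *updated* mean,
        # which keeps the estimate numerically stable.
        self.m2 += np.outer(delta, x - self.mean)

    def covariance(self):
        return self.m2 / max(self.n - 1, 1)  # unbiased (ddof=1) estimate

    def whitening_matrix(self, eps=1e-5):
        # ZCA whitening C^{-1/2} via eigendecomposition, regularised by eps.
        vals, vecs = np.linalg.eigh(self.covariance())
        return vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
```

After enough patches have been absorbed, `whitening_matrix()` can be applied to incoming patches before feature encoding; the estimate matches the batch covariance exactly, so nothing is lost by streaming.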