15 research outputs found

    Minimalistic Unsupervised Learning with the Sparse Manifold Transform

    Full text link
    We describe a minimalistic and interpretable method for unsupervised learning, without resorting to data augmentation, hyperparameter tuning, or other engineering designs, that achieves performance close to the SOTA SSL methods. Our approach leverages the sparse manifold transform, which unifies sparse coding, manifold learning, and slow feature analysis. With a one-layer deterministic sparse manifold transform, one can achieve 99.3% KNN top-1 accuracy on MNIST, 81.1% KNN top-1 accuracy on CIFAR-10 and 53.2% on CIFAR-100. With a simple gray-scale augmentation, the model gets 83.2% KNN top-1 accuracy on CIFAR-10 and 57% on CIFAR-100. These results significantly close the gap between simplistic ``white-box'' methods and the SOTA methods. Additionally, we provide visualization to explain how an unsupervised representation transform is formed. The proposed method is closely connected to latent-embedding self-supervised methods and can be treated as the simplest form of VICReg. Though there remains a small performance gap between our simple constructive model and SOTA methods, the evidence points to this as a promising direction for achieving a principled and white-box approach to unsupervised learning

    Artificial Intelligence for Multimedia Signal Processing

    Get PDF
    Artificial intelligence technologies are also actively applied to broadcasting and multimedia processing technologies. A lot of research has been conducted in a wide variety of fields, such as content creation, transmission, and security, and these attempts have been made in the past two to three years to improve image, video, speech, and other data compression efficiency in areas related to MPEG media processing technology. Additionally, technologies such as media creation, processing, editing, and creating scenarios are very important areas of research in multimedia processing and engineering. This book contains a collection of some topics broadly across advanced computational intelligence algorithms and technologies for emerging multimedia signal processing as: Computer vision field, speech/sound/text processing, and content analysis/information mining

    Sparse Coding with Structured Sparsity Priors and Multilayer Architecture for Image Classification

    Get PDF
    Applying sparse coding on large dataset for image classification is a long standing problem in the field of computer vision. It has been found that the sparse coding models exhibit disappointing performance on these large datasets where variability is broad and anomalies are common. Conversely, deep neural networks thrive on bountiful data. Their success has encouraged researchers to try and augment the learning capacity of traditionally shallow sparse coding methods by adding layers. Multilayer sparse coding networks are expected to combine the best of both sparsity regularizations and deep architectures. To date, however, endeavors to marry the two techniques have not achieved significant improvements over their individual counterparts. In this thesis, we first briefly review multiple structured sparsity priors as well as various supervised dictionary learning techniques with applications on hyperspectral image classification. Based on the structured sparsity priors and dictionary learning techniques, we then develop a novel multilayer sparse coding network that contains thirteen sparse coding layers. The proposed sparse coding network learns both the dictionaries and the regularization parameters simultaneously using an end-to-end supervised learning scheme. We show empirical evidence that the regularization parameters can adapt to the given training data. We also propose applying dimension reduction within sparse coding networks to dramatically reduce the output dimensionality of the sparse coding layers and mitigate computational costs. Moreover, our sparse coding network is compatible with other powerful deep learning techniques such as drop out, batch normalization and shortcut connections. Experimental results show that the proposed multilayer sparse coding network produces classification accuracy competitive with the deep neural networks while using significantly fewer parameters and layers

    Object Recognition in Videos Utilizing Hierarchical and Temporal Objectness with Deep Neural Networks

    Get PDF
    This dissertation develops a novel system for object recognition in videos. The input of the system is a set of unconstrained videos containing a known set of objects. The output is the locations and categories for each object in each frame across all videos. Initially, a shot boundary detection algorithm is applied to the videos to divide them into multiple sequences separated by the identified shot boundaries. Since each of these sequences still contains moderate content variations, we further use a cost optimization-based key frame extraction method to select key frames in each sequence and use these key frames to divide the videos into shorter sub-sequences with little content variations. Next, we learn object proposals on the first frame of each sub-sequence. Building upon the state-of-the-art object detection algorithms, we develop a tree-based hierarchical model to improve the object detection. Using the learned object proposals as the initial object positions in the first frame of each sub-sequence, we apply the SPOT tracker to track the object proposals and re-rank them using the proposed temporal objectness to obtain object proposals tubes by removing unlikely objects. Finally, we employ the deep Convolution Neural Network (CNN) to perform classification on these tubes. Experiments show that the proposed system significantly improves the object detection rate of the learned proposals when comparing with some state-of-the-art object detectors. Due to the improvement in object detection, the proposed system also achieves higher mean average precision at the stage of proposal classification than the state-of-the-art methods

    Multimodal Data Analytics and Fusion for Data Science

    Get PDF
    Advances in technologies have rapidly accumulated a zettabyte of “new” data every two years. The huge amount of data have a powerful impact on various areas in science and engineering and generates enormous research opportunities, which calls for the design and development of advanced approaches in data analytics. Given such demands, data science has become an emerging hot topic in both industry and academia, ranging from basic business solutions, technological innovations, and multidisciplinary research to political decisions, urban planning, and policymaking. Within the scope of this dissertation, a multimodal data analytics and fusion framework is proposed for data-driven knowledge discovery and cross-modality semantic concept detection. The proposed framework can explore useful knowledge hidden in different formats of data and incorporate representation learning from data in multimodalities, especial for disaster information management. First, a Feature Affinity-based Multiple Correspondence Analysis (FA-MCA) method is presented to analyze the correlations between low-level features from different features, and an MCA-based Neural Network (MCA-NN) ispro- posedto capture the high-level features from individual FA-MCA models and seamlessly integrate the semantic data representations for video concept detection. Next, a genetic algorithm-based approach is presented for deep neural network selection. Furthermore, the improved genetic algorithm is integrated with deep neural networks to generate populations for producing optimal deep representation learning models. Then, the multimodal deep representation learning framework is proposed to incorporate the semantic representations from data in multiple modalities efficiently. At last, fusion strategies are applied to accommodate multiple modalities. In this framework, cross-modal mapping strategies are also proposed to organize the features in a better structure to improve the overall performance

    Visual Scene Understanding by Deep Fisher Discriminant Learning

    No full text
    Modern deep learning has recently revolutionized several fields of classic machine learning and computer vision, such as, scene understanding, natural language processing and machine translation. The substitution of feature hand-crafting with automatic feature learning, provides an excellent opportunity for gaining an in-depth understanding of large-scale data statistics. Deep neural networks generally train models with huge numbers of parameters, facilitating efficient search for optimal and sub-optimal spaces of highly non-convex objective functions. On the other hand, Fisher discriminant analysis has been widely employed to impose class discrepancy, for the sake of segmentation, classification, and recognition tasks. This thesis bridges between contemporary deep learning and classic discriminant analysis, to accommodate some important challenges in visual scene understanding, i.e. semantic segmentation, texture classification, and object recognition. The aim is to accomplish specific tasks in some new high-dimensional spaces, covered by the statistical information of the datasets under study. Inspired by a new formulation of Fisher discriminant analysis, this thesis introduces some novel arrangements of well-known deep learning architectures, to achieve better performances on the targeted missions. The theoretical justifications are based upon a large body of experimental work, and consolidate the contribution of the proposed idea; Deep Fisher Discriminant Learning, to several challenges in visual scene understanding

    Texture feature extraction and classification : a comparative study between traditional methods and deep learning : a thesis presented in partial fulfilment of the requirements for the degree of Master of Information Science in Computer Sciences at Massey University, Auckland, New Zealand

    Get PDF
    Figure 3.1 (=Kaehler & Bradski, 2017 Fig 1-4, p. 9) was removed for copyright reasons.Image classification has always been a core problem of computer vision. With the development of deep learning, it also provides a good solution for us to solve the problem of image feature extraction in image classification. In this thesis we used machine learning and convolutional neural network to study texture feature extraction and classification problems. We implemented a pipeline within the sklearn framework that utilized Local Binary Pattern (LBP) and Haralick as our feature descriptor and various classifiers (namely KNearest Neighbors, Linear Discriminant Analysis, Support Vector Machines, Multilayer Perceptron, Gaussian Naive Bayes, Random Forest, AdaBoost, Logistic Regression and Decision Tree) to evaluate the performance on some popular texture datasets (Brodatz dataset, four extended Outex datasets and VisTex dataset). We also employed Linear Discriminant Analysis as our dimension reduction schema to observe the changes in classification accuracy. We also took advantage of Keras with TensorFlow backend framework and built a pipeline that uses ImageNet-trained convolutional neural network models to train and analyze classifier, extract image feature information and make predictions on test dataset samples. This allowed us to compare the results between traditional methods and CNN based methods. It was found that the classification accuracy has been greatly improved with the CNN based method
    corecore