
    Robust Content Identification and De-Duplication with Scalable Fisher Vector in Video with Temporal Sampling

    Thesis (M.S.)--School of Computing and Engineering, University of Missouri--Kansas City, 2017. Thesis advisor: Zhu Li. Title from PDF of title page, viewed August 29, 2017. Vita. Includes bibliographical references (pages 41-43).

    Robust content identification and de-duplication of video content in networks and caches have many important applications in content delivery networks. In this work, we propose a scalable hashing scheme based on Fisher Vector aggregation of selected key point features, together with a non-uniform temporal sampling scheme on the video segments driven by a frame significance function, to create a very compact binary representation of content fragments that is agnostic to typical coding and transcoding variations. The key innovations are a key point repeatability model that selects the best key point features, a non-uniform sampling scheme that significantly reduces the bits required to represent a segment, and scalability obtained from PCA feature dimension reduction and Fisher Vector features. Simulations with video contents of various frame sizes and bit rates for DASH streaming show that the proposed solution achieves very good precision-recall performance, reaching 100% precision in duplication detection with recall at 98% and above.

    Introduction -- Software description -- Image processing -- SIFT feature extraction -- Principal component analysis -- Fisher vector aggregation -- Simulation results and discussions -- Conclusion and future work -- Appendix
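    The core aggregation step the abstract describes can be illustrated with a minimal numpy sketch. This is not the thesis's implementation: it assumes a pre-trained diagonal-covariance GMM (means, standard deviations, mixture weights are hypothetical inputs), computes the standard Fisher vector gradient with respect to the means, applies the usual power and L2 normalization, and sign-binarizes the result into a compact hash.

    ```python
    import numpy as np

    def fisher_vector(descriptors, means, sigmas, weights):
        """Illustrative Fisher vector (gradient w.r.t. GMM means only)
        followed by sign binarization for compact hashing.

        descriptors: (N, D) local features (e.g. PCA-reduced SIFT)
        means, sigmas: (K, D) diagonal-GMM parameters (assumed pre-trained)
        weights: (K,) mixture weights
        """
        N, D = descriptors.shape
        K = means.shape[0]
        # Posterior responsibilities gamma[n, k] under the diagonal GMM,
        # computed in log space for numerical stability.
        diff = descriptors[:, None, :] - means[None, :, :]            # (N, K, D)
        logp = -0.5 * np.sum((diff / sigmas) ** 2
                             + np.log(2 * np.pi * sigmas ** 2), axis=2)
        logw = np.log(weights) + logp
        logw -= logw.max(axis=1, keepdims=True)
        gamma = np.exp(logw)
        gamma /= gamma.sum(axis=1, keepdims=True)                     # (N, K)
        # Gradient of the log-likelihood w.r.t. the means, normalized.
        fv = np.einsum('nk,nkd->kd', gamma, diff / sigmas)
        fv /= N * np.sqrt(weights)[:, None]
        fv = fv.ravel()                                               # (K * D,)
        # Power normalization, then L2 normalization.
        fv = np.sign(fv) * np.sqrt(np.abs(fv))
        fv /= np.linalg.norm(fv) + 1e-12
        # Sign binarization yields the compact binary representation.
        bits = (fv > 0).astype(np.uint8)
        return fv, bits
    ```

    In this sketch the binary code has K*D bits; the scalability the abstract mentions would come from shrinking D via PCA before aggregation.
    
    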

    Neural network based image representation for small scale object recognition

    Object recognition can be viewed abstractly as a two-stage process. The feature learning stage selects key information that can represent the input image in a compact, robust, and discriminative manner in some feature space. The classification stage then learns the rules to differentiate object classes based on the representations of their images in that feature space. Consequently, if the first stage can produce a highly separable feature set, simple and cost-effective classifiers can be used, making the recognition system more applicable in practice.

    Features, or representations, used to be engineered manually, with different assumptions about the data population keeping the complexity in a manageable range. As more practical problems are tackled, those assumptions are no longer valid, and neither are the representations built on them. More parameters and test cases have to be considered in these new challenges, which makes manual engineering too complicated. Machine learning approaches ease those difficulties by allowing computers to learn to identify the appropriate representation automatically. As the number of parameters increases with the divergence of the data, it is always beneficial to eliminate irrelevant information from the input to reduce the complexity of learning. Chapter 3 of the thesis reports a case study where removal of colour leads to an improvement in recognition accuracy.

    Deep learning appears to be a very strong representation learner, with new achievements arriving on a monthly basis. While the training phase of deep structures requires a huge amount of data, tremendous computation, and careful calibration, the inference phase is affordable and straightforward. Utilizing the knowledge in trained deep networks is therefore promising for efficient feature extraction in smaller systems. Many approaches have been proposed under the name of "transfer learning", aiming to take advantage of that "deep knowledge". However, the results achieved so far leave room for improvement. Chapter 4 presents a new method that utilizes a trained deep convolutional structure as a feature extractor and achieves state-of-the-art accuracy on the Washington RGBD dataset.

    Despite some good results, the potential of transfer learning is only barely exploited. On the one hand, dimensionality reduction can make the deep neural network representation even more computationally efficient and enable a wider range of use cases. Inspired by the structure of the network itself, a new random orthogonal projection method for dimensionality reduction is presented in the first half of Chapter 5. A t-SNE-mimicking neural network for low-dimensional embedding is also discussed in this part, with promising results. On the other hand, feature encoding can be used to improve deep neural network features for classification applications. Thanks to their spatially organized structure, deep neural network features can be treated as local image descriptors, so traditional feature encoding approaches such as the Fisher vector can be applied to improve them. This method combines the advantages of both discriminative and generative learning to boost feature performance in difficult scenarios, such as when data is noisy or incomplete. The problem of high dimensionality in deep neural network features is alleviated by a Fisher vector based on sparse coding, where an infinite number of Gaussian mixture components is used to model the feature space. In the second half of Chapter 5, the regularized Fisher encoding is shown to be effective in improving classification results on difficult classes. Also, low-cost incremental k-means learning is shown to be a potential dictionary learning approach that can replace the slow and computationally expensive sparse coding method.
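    The random orthogonal projection idea can be sketched generically (this is not the thesis's specific method, whose details are not given here): draw a random Gaussian matrix, orthonormalize its columns via QR decomposition, and project the high-dimensional network features onto that basis. The function name and parameters are illustrative.

    ```python
    import numpy as np

    def random_orthogonal_projection(X, out_dim, seed=0):
        """Project (n, d) features onto out_dim random orthonormal directions.

        A generic sketch: Q comes from the QR decomposition of a random
        Gaussian matrix, so its columns are orthonormal and the projection
        never increases a vector's norm.
        """
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        G = rng.standard_normal((d, out_dim))   # requires out_dim <= d
        Q, _ = np.linalg.qr(G)                  # (d, out_dim), orthonormal columns
        return X @ Q                            # (n, out_dim)
    ```

    Because Q has orthonormal columns, the map is a contraction that approximately preserves geometry for moderate out_dim, which is what makes it attractive as a cheap, data-independent dimensionality reducer for deep features.
    
    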