10 research outputs found

    Structured Models for Representation Inference and Data Generation

    Full text link
    University of Technology Sydney, Faculty of Engineering and Information Technology. Many problems, e.g., image segmentation and image generation, involve generating complex structured outputs that consist of several correlated variables. Developing structured models that effectively capture the correlation among these variables is essential to solving these problems. This dissertation mainly focuses on two different types of structured models: Principal Component Analysis (PCA) and deep generative models. We explore how to utilize these two models to predict structured outputs in machine learning problems.

    In real-world applications, collected data may contain random noise, and data compression is necessary when transferring high-dimensional, large-scale data. This dissertation presents an online compressed robust PCA model that efficiently recovers the structured low-rank component of high-dimensional data from the compressed data and removes the random noise. Although severe information loss occurs in the data compression process, our method can asymptotically recover the low-rank component under mild conditions. The proposed method is memory efficient since it processes data in an online fashion.

    This dissertation also presents a special generative model, i.e., an energy network, for structured output prediction. The energy network is an implicit model that measures the quality of outputs by assigning different energy values to different output configurations. Previous energy-based methods suffer from substantial computation costs due to the enormous number of gradient steps in the inference process. Our method addresses this issue by learning an inference network to estimate good initializations and reduce the search space for the inference process. We propose a novel framework, analogous to the adversarial learning framework, to learn the inference network and the energy network: the inference network is treated as a generator, the energy network as a discriminator, and the two networks benefit each other mutually throughout training. On the one hand, the inference network generates training samples for the energy network; on the other hand, the energy network evaluates the quality of the outputs generated by the inference network and provides guidance for the training of the inference network.

    Finally, this dissertation presents two works on image generation. The first generates realistic images from text descriptions. We propose a novel Dynamic Memory Generative Adversarial Network (DM-GAN) to address two main problems in recent text-to-image methods: (1) the quality of generated high-resolution images heavily depends on the quality of the initial low-resolution images; (2) different words in the text description contribute differently when depicting image contents. Our method introduces a dynamic memory module to refine the initial images and select the important text information based on the initial image content. The second work generates normal-light images from extremely low-light images. We first generate initial images by utilizing a multi-exposure fusion network to combine well-exposed areas of images with different exposure times. Then, we utilize an edge enhancement module to refine the initial image with the help of the edge information.
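
    The low-rank-plus-sparse decomposition that the online compressed robust PCA model targets can be illustrated with the minimal batch sketch below, which uses standard principal component pursuit iterations in NumPy. The function names and parameter defaults are illustrative assumptions; it does not reproduce the dissertation's online, compressed algorithm.

        import numpy as np

        def svd_shrink(M, tau):
            # Singular-value soft-thresholding: proximal operator of the nuclear norm.
            U, s, Vt = np.linalg.svd(M, full_matrices=False)
            return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

        def soft_threshold(M, tau):
            # Entrywise soft-thresholding: proximal operator of the l1 norm.
            return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

        def robust_pca(X, lam=None, mu=None, n_iter=200):
            # Decompose X into a low-rank part L and a sparse noise part S (X ~ L + S)
            # by alternating proximal steps (principal component pursuit style).
            m, n = X.shape
            lam = lam or 1.0 / np.sqrt(max(m, n))
            mu = mu or (m * n) / (4.0 * np.abs(X).sum())
            L = np.zeros_like(X); S = np.zeros_like(X); Y = np.zeros_like(X)
            for _ in range(n_iter):
                L = svd_shrink(X - S + Y / mu, 1.0 / mu)      # update low-rank component
                S = soft_threshold(X - L + Y / mu, lam / mu)  # update sparse noise
                Y = Y + mu * (X - L - S)                      # dual (multiplier) update
            return L, S

        # toy usage: a rank-2 matrix corrupted by sparse noise
        rng = np.random.default_rng(0)
        X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 50))
        X[rng.random(X.shape) < 0.05] += 5.0
        L, S = robust_pca(X)

    An online variant, as described in the abstract, would instead update the low-rank estimate incrementally from compressed measurements of each arriving sample rather than holding the full data matrix in memory.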

    Adversarial Localized Energy Network for Structured Prediction

    No full text
    This paper focuses on energy-based structured output prediction. While energy-based models can handle sophisticated structured cases, previous deep energy-based methods suffer from the substantial computation cost introduced by the enormous number of gradient steps in the inference process. To boost the efficiency and accuracy of energy-based models on structured output prediction, we propose a novel method analogous to the adversarial learning framework. Specifically, in our framework the generator consists of an inference network while the discriminator is an energy network. The two sub-modules, i.e., the inference network and the energy network, benefit each other mutually during the whole computation process. On the one hand, the inference network boosts efficiency by predicting good initializations and reducing the search space for the inference process; on the other hand, inheriting the benefits of the energy network, the energy module evaluates the quality of the outputs generated by the inference network and correspondingly provides informative guidance for the training of the inference network. Ideally, the adversarial learning strategy drives the two sub-modules to an equilibrium state after sufficient training steps. We conduct extensive experiments to verify the effectiveness and efficiency of our proposed method.
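
    Below is a minimal sketch of the adversarial interplay described above, with the inference network acting as a generator and the energy network as a discriminator. The toy architectures, the margin-based energy loss, and all hyperparameters are assumptions made for illustration, not the paper's exact formulation.

        import torch
        import torch.nn as nn

        # Hypothetical toy setup: x is an input vector, y is a structured output vector
        # (e.g., a multi-label indicator). Names and architectures are illustrative only.
        dim_x, dim_y = 32, 16
        inference_net = nn.Sequential(nn.Linear(dim_x, 64), nn.ReLU(),
                                      nn.Linear(64, dim_y), nn.Sigmoid())
        energy_net = nn.Sequential(nn.Linear(dim_x + dim_y, 64), nn.ReLU(),
                                   nn.Linear(64, 1))
        opt_inf = torch.optim.Adam(inference_net.parameters(), lr=1e-3)
        opt_energy = torch.optim.Adam(energy_net.parameters(), lr=1e-3)

        def energy(x, y):
            return energy_net(torch.cat([x, y], dim=1)).squeeze(1)

        for step in range(1000):
            x = torch.randn(8, dim_x)                      # stand-in for a minibatch of inputs
            y_true = (torch.rand(8, dim_y) > 0.5).float()  # stand-in for ground-truth outputs

            # Discriminator step: the energy network should assign lower energy to
            # ground-truth outputs than to outputs proposed by the inference network.
            y_pred = inference_net(x).detach()
            loss_e = torch.relu(1.0 + energy(x, y_true) - energy(x, y_pred)).mean()
            opt_energy.zero_grad(); loss_e.backward(); opt_energy.step()

            # Generator step: the inference network learns to propose low-energy outputs.
            y_pred = inference_net(x)
            loss_i = energy(x, y_pred).mean()
            opt_inf.zero_grad(); loss_i.backward(); opt_inf.step()

    At test time, the inference network's prediction can serve directly as the output or as a warm start for a few energy-minimizing gradient steps, which is where the reduced search space saves computation.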

    Multiple graph unsupervised feature selection

    No full text
    Feature selection improves model quality by filtering out noisy or redundant features. In unsupervised scenarios, selection is challenging because labels are unavailable. To overcome this, graphs that unfold the geometric structure of the data manifold are commonly used to regularize the selection process. These graphs can be constructed from either a local view or a global view. Because the local graph is more discriminative, previous methods tended to use the local graph rather than the global graph, yet the global graph also carries useful information. In light of this, this paper proposes a multiple-graph unsupervised feature selection method that leverages information from both local and global graphs. In addition, we enforce a sparsity-inducing ℓ-norm to achieve more flexible sparse learning. Experiments inspecting the effects of the multiple graphs and the ℓ-norm are conducted on various datasets, and comparisons with other mainstream methods are also presented. The results support that multiple graphs can outperform a single graph in unsupervised feature selection, and that the overall performance of the proposed method is higher than that of the compared methods.
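
    A minimal sketch of how local and global graphs could be built for such a regularizer is given below; the heat-kernel weighting, the k-nearest-neighbour construction, and the function names are assumptions, not necessarily the paper's exact choices.

        import numpy as np

        def heat_kernel_weights(X, sigma=1.0):
            # Pairwise heat-kernel similarities between samples (rows of X).
            d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2 * sigma ** 2))

        def graph_laplacian(W):
            # Unnormalized graph Laplacian L = D - W.
            return np.diag(W.sum(1)) - W

        def local_and_global_laplacians(X, k=5, sigma=1.0):
            W = heat_kernel_weights(X, sigma)
            np.fill_diagonal(W, 0.0)
            # Global graph: every pair of samples is connected.
            L_global = graph_laplacian(W)
            # Local graph: keep only each sample's k nearest neighbours (symmetrized).
            W_local = np.zeros_like(W)
            for i in range(W.shape[0]):
                nbrs = np.argsort(-W[i])[:k]
                W_local[i, nbrs] = W[i, nbrs]
            W_local = np.maximum(W_local, W_local.T)
            L_local = graph_laplacian(W_local)
            return L_local, L_global

        # A combined manifold regularizer on a projection matrix P could then weight
        # trace(P^T X^T L_local X P) against trace(P^T X^T L_global X P).
        X = np.random.default_rng(0).normal(size=(50, 10))
        L_local, L_global = local_and_global_laplacians(X)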

    Image Classification of Wheat Rust Based on Ensemble Learning

    No full text
    Rust is a common disease in wheat that significantly impacts its growth and yield. Stem rust and leaf rust of wheat are difficult to distinguish, and manual detection is time-consuming. To improve this situation, this study proposes a method for identifying wheat rust based on ensemble learning (WR-EL). The WR-EL method extracts and integrates multiple convolutional neural network (CNN) models, namely VGG, ResNet-101, ResNet-152, DenseNet-169, and DenseNet-201, based on bagging, snapshot ensembling, and the stochastic gradient descent with warm restarts (SGDR) algorithm. The identification results of the WR-EL method were compared to those of the five individual CNN models; identification accuracy increases by 32%, 19%, 15%, 11%, and 8%, respectively. Additionally, we propose the SGDR-S algorithm, which improves the F1 scores for healthy wheat, stem-rust wheat, and leaf-rust wheat by 2%, 3%, and 2%, respectively, compared to the SGDR algorithm. The method identifies wheat rust more accurately and can support timely prevention and control, which not only avoids economic losses caused by the disease but also improves the yield and quality of wheat.
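
    For reference, a minimal sketch of the SGDR learning-rate schedule that snapshot ensembling relies on is shown below; the constants are illustrative, and the authors' SGDR-S variant is not reproduced here.

        import math

        def sgdr_lr(step, lr_min=1e-5, lr_max=0.1, cycle_len=10, cycle_mult=2):
            # Cosine-annealed learning rate with warm restarts (SGDR): the rate decays
            # from lr_max to lr_min within each cycle, jumps back to lr_max at every
            # restart, and the cycle length grows by cycle_mult after each restart.
            t, length = step, cycle_len
            while t >= length:
                t -= length
                length *= cycle_mult
            return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / length))

        # Snapshot ensembling saves one model checkpoint at the end of each cycle
        # (right before a restart) and averages the snapshots' predictions at test time.
        schedule = [sgdr_lr(epoch) for epoch in range(70)]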

    Compact representation for large-scale unconstrained video analysis

    No full text
    Recently, newly invented features (e.g., Fisher vectors, VLAD) have achieved state-of-the-art performance in large-scale video analysis systems that aim to understand video content, such as concept recognition and event detection. However, these features are high-dimensional, which remarkably increases computation costs and correspondingly deteriorates the performance of subsequent learning tasks. Notably, the situation becomes even worse when dealing with large-scale video data where the number of class labels is limited. To address this problem, we propose a novel algorithm to compactly represent huge amounts of unconstrained video data. Specifically, redundant feature dimensions are removed by our proposed feature selection algorithm. Since unlabeled videos are easy to obtain on the web, we apply this feature selection algorithm in a semi-supervised framework to cope with the shortage of class information. Unlike most existing semi-supervised feature selection algorithms, our algorithm does not rely on manifold approximation, i.e., a graph Laplacian, which is quite expensive for large amounts of data. Thus, the proposed algorithm can be applied to a real large-scale video analysis system. In addition, because the objective function is non-smooth and difficult to solve, we develop an efficient iterative approach to seek the global optimum. Extensive experiments are conducted on several real-world video datasets, including KTH, CCV, and HMDB. The experimental results demonstrate the effectiveness of the proposed algorithm.
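
    As a rough, generic illustration of iteratively solving a non-smooth feature selection objective, the sketch below ranks feature dimensions with a supervised ℓ2,1-regularized least-squares model solved by iterative reweighting. It is a standard stand-in, not the paper's semi-supervised, Laplacian-free algorithm, and all names and constants are assumptions.

        import numpy as np

        def l21_feature_selection(X, Y, lam=0.1, n_iter=50):
            # Iteratively-reweighted solver for min_W ||XW - Y||_F^2 + lam * ||W||_{2,1}.
            # Rows of W with large norm indicate informative feature dimensions.
            n, d = X.shape
            W = np.zeros((d, Y.shape[1]))
            D = np.eye(d)                     # reweighting matrix for the non-smooth term
            for _ in range(n_iter):
                W = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)
                row_norms = np.sqrt((W ** 2).sum(axis=1)) + 1e-8
                D = np.diag(1.0 / (2.0 * row_norms))
            return np.argsort(-row_norms)     # feature indices ranked by importance

        # toy usage: 200 clips, 50-dim features, 3 concept labels (one-hot)
        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 50))
        Y = np.eye(3)[rng.integers(0, 3, size=200)]
        ranked = l21_feature_selection(X, Y)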