294 research outputs found

    Rail Infrastructure Defect Detection Through Video Analytics

    University of Technology Sydney. Faculty of Engineering and Information Technology.
    Compared with the traditional railway infrastructure maintenance process, which relies on manual inspection by professional maintenance engineers, inspection through automatic video analytics will significantly improve working efficiency and reduce safety risks by limiting physical contact between maintenance engineers and infrastructure facilities. However, defects do not always have a stable appearance and involve many uncertainties in cluttered environments. In addition, different brands of the same device are widely used on the railway, so the same equipment appears in diverse physical forms. These factors create many challenges for existing computer vision algorithms for defect detection. In this thesis, two key challenges are identified in video/image analytics using computer vision techniques for railway infrastructure defect detection: fine-grained defect recognition and learning from limited labelled data (few-shot learning). This thesis summarizes the work conducted on different methods to address these two challenges. The first challenge is fine-grained defect recognition. In railway infrastructure inspection, damage or wear is usually confined to small parts of the equipment. That is, the differences between defective and standard items are fine-grained, so finding these subtle defects is a fine-grained recognition problem. This thesis proposes a bilinear CNN model to tackle the defect detection problem, which effectively captures an invariant representation of the dataset and learns high-order discriminative features for fine-grained defect recognition. The second challenge is limited labelled data. In many scenarios, obtaining abundant labelled samples is laborious. For example, in industrial defect detection, most defects fall into a few common categories, while most other categories contain only a small portion of defects. Moreover, annotating a large-scale dataset of defects is labour-intensive and requires considerable expertise in railway maintenance. Thus, how to obtain an effective model from sparse labelled samples remains an open problem. To address this issue, this thesis proposes a framework that simultaneously reduces intra-class variance and enlarges inter-class discrimination, for both fine-grained defect recognition and general fine-grained recognition under the few-shot setting. Three models are designed according to this framework, and comprehensive experimental analyses are provided to validate their effectiveness. This thesis further studies the few-shot learning problem by mining unlabelled information to boost the few-shot learner for defect/general object recognition, and proposes a Poisson Transfer Model to maximize the value of the extra unlabelled data through robust classifier construction and self-supervised representation learning.
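
    The bilinear pooling idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the thesis's model: the function name `bilinear_pool`, the toy feature values, and the use of signed square root followed by L2 normalisation (a common post-processing choice for bilinear CNNs) are all assumptions made for illustration. It shows how outer products of two feature streams, averaged over spatial locations, yield second-order features.

```python
import math

def bilinear_pool(features_a, features_b):
    """Average the outer product of two feature streams over all locations.

    features_a: list of locations, each a list of m channel values.
    features_b: list of locations, each a list of n channel values.
    Returns a flattened, normalised descriptor of length m * n.
    """
    m, n = len(features_a[0]), len(features_b[0])
    pooled = [0.0] * (m * n)
    for fa, fb in zip(features_a, features_b):
        for i in range(m):
            for j in range(n):
                pooled[i * n + j] += fa[i] * fb[j]  # second-order interaction
    pooled = [v / len(features_a) for v in pooled]
    # Signed square root, then L2 normalisation (assumed post-processing).
    pooled = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / norm for v in pooled] if norm > 0 else pooled

# Toy example: 2 spatial locations, 2 channels in stream A, 3 in stream B.
stream_a = [[1.0, 2.0], [0.5, -1.0]]
stream_b = [[1.0, 0.0, -1.0], [2.0, 1.0, 0.0]]
descriptor = bilinear_pool(stream_a, stream_b)
print(len(descriptor))
```

    In a real network the two streams would be convolutional feature maps from one or two CNN backbones, and the descriptor would feed a linear classifier.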

    Fine-Grained Image Analysis with Deep Learning: A Survey

    Fine-grained image analysis (FGIA) is a longstanding and fundamental problem in computer vision and pattern recognition, and underpins a diverse set of real-world applications. The task of FGIA targets analyzing visual objects from subordinate categories, e.g., species of birds or models of cars. The small inter-class and large intra-class variation inherent to fine-grained image analysis makes it a challenging problem. Capitalizing on advances in deep learning, in recent years we have witnessed remarkable progress in deep learning powered FGIA. In this paper we present a systematic survey of these advances, where we attempt to re-define and broaden the field of FGIA by consolidating two fundamental fine-grained research areas -- fine-grained image recognition and fine-grained image retrieval. In addition, we also review other key issues of FGIA, such as publicly available benchmark datasets and related domain-specific applications. We conclude by highlighting several research directions and open problems which need further exploration from the community.
    Comment: Accepted by IEEE TPAM

    Locally-Enriched Cross-Reconstruction for Few-Shot Fine-Grained Image Classification

    Few-shot fine-grained image classification has attracted considerable attention in recent years because its setting realistically imitates how humans conduct recognition tasks. Metric-based few-shot classifiers have achieved high accuracies. However, their metric functions usually take two vectors as arguments, and transforming or reshaping three-dimensional feature maps into vectors can lose spatial information. Image reconstruction is therefore used to retain more appearance details: each test image is reconstructed from the features of each class and then assigned to the class with the smallest reconstruction error. However, discriminative local information, which is vital for distinguishing highly similar sub-categories in fine-grained images, is not well captured when only the base features from a standard embedding module are used for reconstruction. Hence, we propose the novel local content-enriched cross-reconstruction network (LCCRN) for few-shot fine-grained classification. In LCCRN, we design two new modules: the local content-enriched module (LCEM) to learn the discriminative local features, and the cross-reconstruction module (CRM) to fully engage the local features with the appearance details obtained from a separate embedding module. The classification score is calculated as the weighted sum of the reconstruction errors of the cross-reconstruction tasks, with weights learnt during training. Extensive experiments on four fine-grained datasets showcase the superior classification performance of LCCRN compared with state-of-the-art few-shot classification methods. Code is available at: https://github.com/lutsong/LCCRN
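
    The reconstruction-based classification step described above (reconstruct the test sample from each class, pick the smallest error) can be sketched as follows. This is a generic ridge-regression illustration, not LCCRN itself: the helper names `solve`, `reconstruction_error`, and `classify`, the `ridge` value, and the toy feature vectors are all hypothetical, and the local-enrichment and cross-reconstruction modules are not modelled.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def reconstruction_error(query, support, ridge=0.1):
    """Squared error of the ridge-regularised reconstruction of `query`
    as a linear combination of the rows of `support`."""
    k = len(support)
    gram = [[dot(support[i], support[j]) + (ridge if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    weights = solve(gram, [dot(s, query) for s in support])
    recon = [sum(weights[i] * support[i][d] for i in range(k))
             for d in range(len(query))]
    return sum((q - r) ** 2 for q, r in zip(query, recon))

def classify(query, class_supports, ridge=0.1):
    """Assign the query to the class that reconstructs it with least error."""
    errors = {label: reconstruction_error(query, s, ridge)
              for label, s in class_supports.items()}
    return min(errors, key=errors.get), errors

# Toy 2-way 2-shot episode with 3-dimensional features (made-up values).
supports = {
    "class_a": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "class_b": [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]],
}
label, errors = classify([0.8, 0.1, 0.0], supports)
print(label)
```

    In reconstruction-based few-shot methods the rows of `support` would typically be the spatial feature vectors of a class's support images, so spatial detail is retained rather than pooled away.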

    Learning Deep SPD Visual Representation for Image Classification

    Symmetric positive definite (SPD) visual representations are effective due to their ability to capture high-order statistics to describe images. Reliably and efficiently computing an SPD matrix representation from small-sized feature maps with a large number of channels in a CNN is challenging. This thesis presents three novel methods to address this challenge. The first method, called Relation Dropout (ReDro), is inspired by the fact that the eigendecomposition of a block-diagonal matrix can be obtained efficiently by eigendecomposing each block separately. Thus, instead of using a full covariance matrix as in the literature, this thesis randomly groups the channels and forms a covariance matrix per group. ReDro is inserted as an additional layer preceding the matrix normalisation step, and the random grouping is made transparent to all subsequent layers. ReDro can be seen as a dropout-related regularisation which discards some pair-wise channel relationships across groups. The second method, called FastCOV, exploits the intrinsic connection between the eigensystems of XX^T and X^T X. Specifically, it computes a position-wise covariance matrix from convolutional feature maps instead of the typical channel-wise covariance matrix. As the spatial size of feature maps is usually much smaller than the number of channels, eigendecomposing the position-wise covariance matrix avoids rank deficiency and is faster than decomposing the channel-wise covariance matrix. The eigenvalues and eigenvectors of the normalised channel-wise covariance matrix can then be recovered through the connection between the XX^T and X^T X eigensystems. The third method, iSICE, deals with reliable covariance estimation from small-sized, high-dimensional CNN feature maps. It exploits the prior structure of the covariance matrix to estimate a sparse inverse covariance, an approach developed in the literature to deal with the small-sample issue in covariance estimation. Given a covariance matrix, this thesis iteratively minimises its log-likelihood penalised by a sparsity term using gradient descent. The resultant representation characterises partial correlation instead of the indirect correlation characterised by the covariance representation. As experimentally demonstrated, all three proposed methods improve image classification performance, and the first two also reduce the computational cost of learning large SPD visual representations.
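
    The connection between the eigensystems of XX^T and X^T X that the abstract above relies on can be checked numerically. The sketch below is not the FastCOV implementation: the toy matrix `X` (4 channels by 2 positions, uncentred for simplicity), the helper names, and the use of power iteration are assumptions for illustration. It uses the identity that if X^T X v = λ v with λ > 0, then u = X v / sqrt(λ) is a unit eigenvector of XX^T with the same eigenvalue.

```python
import math

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def dominant_eigenpair(m, iterations=200):
    """Power iteration for the largest eigenpair of a symmetric PSD matrix."""
    v = [1.0] * len(m)
    for _ in range(iterations):
        w = matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(m, v)))
    return eigenvalue, v

# Hypothetical feature matrix: 4 channels (rows) by 2 spatial positions (columns).
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
small = matmul(transpose(X), X)   # position-wise matrix X^T X, 2 x 2
large = matmul(X, transpose(X))   # channel-wise matrix X X^T, 4 x 4

# Decompose only the small matrix, then lift the eigenvector to the large one.
lam, v = dominant_eigenpair(small)
u = [x / math.sqrt(lam) for x in matvec(X, v)]

# Check that (lam, u) is an eigenpair of the large matrix.
residual = max(abs(a - lam * b) for a, b in zip(matvec(large, u), u))
print(residual < 1e-8)
```

    The saving comes from decomposing the 2 x 2 matrix instead of the 4 x 4 one; with real CNN feature maps the gap between the number of positions and the number of channels is usually much larger.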

    A Survey on Knowledge Graphs: Representation, Acquisition and Applications

    Human knowledge provides a formal understanding of the world. Knowledge graphs that represent structural relations between entities have become an increasingly popular research direction towards cognition and human-level intelligence. In this survey, we provide a comprehensive review of knowledge graphs, covering research topics on 1) knowledge graph representation learning, 2) knowledge acquisition and completion, 3) temporal knowledge graphs, and 4) knowledge-aware applications, and summarize recent breakthroughs and prospective directions to facilitate future research. We propose a full-view categorization and new taxonomies on these topics. Knowledge graph embedding is organized from four aspects: representation space, scoring function, encoding models, and auxiliary information. For knowledge acquisition, especially knowledge graph completion, we review embedding methods, path inference, and logical rule reasoning. We further explore several emerging topics, including meta relational learning, commonsense reasoning, and temporal knowledge graphs. To facilitate future research on knowledge graphs, we also provide a curated collection of datasets and open-source libraries on different tasks. Finally, we provide a thorough outlook on several promising research directions.