5 research outputs found

    Keeping the Human in the Loop: Towards Automatic Visual Monitoring in Biodiversity Research

    More and more methods in biodiversity research build upon new opportunities arising from modern sensing devices that, in principle, make it possible to continuously record sensor data from the environment. However, while these devices make it easy to record huge amounts of data, evaluating the data is difficult, if not impossible, due to the enormous effort of manual inspection by researchers. At the same time, we observe impressive results in computer vision and machine learning that are based on two major developments: first, the increased performance of hardware together with the advent of powerful graphics processing units applied in scientific computing; second, the huge amount of partly annotated image data provided by today's generation of Facebook and Twitter users, which is easily available through databases (e.g., Flickr) and/or search engines. However, for biodiversity applications, appropriate databases of annotated images are still missing. In this presentation, we discuss already available methods from computer vision and machine learning together with upcoming challenges in automatic monitoring in biodiversity research. We argue that the key element for the success of any automatic method is the possibility to keep the human in the loop, whether for correcting errors and improving the system's quality over time, for providing annotation data at moderate effort, or for acceptance and validation reasons. Thus, we summarize existing techniques from active and lifelong learning together with the enormous developments in automatic visual recognition during the past years. In addition, to allow detection of the unexpected, such an automatic system must be capable of finding anomalies or novel events in the data. We discuss a generic framework for automatic monitoring in biodiversity research, which is the result of collaboration between computer scientists and ecologists over the past years.
The key ingredients of such a framework are an initial generic classifier (for example, a powerful deep learning architecture), active learning to reduce costly annotation effort by experts, fine-grained recognition to differentiate between visually very similar species, and efficient incremental updates of the classifier's model over time. For most of these challenges, we present initial solutions in sample applications. The results comprise the automatic evaluation of images from camera traps, attribute estimation for species, and the monitoring of in-situ data in environmental science. Overall, we would like to demonstrate the potential and open issues in bringing together computer scientists and ecologists to open new research directions for both areas.
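
The active-learning ingredient named above can be illustrated with a minimal sketch of uncertainty sampling: the images whose predicted class distribution has the highest entropy are handed to the expert first. The abstract does not say which query strategy the authors use, so the entropy criterion, the function names, and the example probabilities are all assumptions made for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_for_annotation(predictions, budget):
    """Return the indices of the `budget` most uncertain predictions.

    `predictions` holds one list of class probabilities per unlabeled image.
    """
    ranked = sorted(range(len(predictions)),
                    key=lambda i: entropy(predictions[i]),
                    reverse=True)
    return ranked[:budget]

# Hypothetical classifier outputs for four camera-trap images (three species).
predictions = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # almost uniform, hence most uncertain
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]
queries = select_for_annotation(predictions, budget=2)
print(queries)
```

Only the selected images would be shown to an expert, and the resulting labels would then feed the incremental model update.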

    Classification-Specific Parts for Improving Fine-Grained Visual Categorization

    Fine-grained visual categorization is a classification task for distinguishing categories with high intra-class and small inter-class variance. While global approaches aim at using the whole image for performing the classification, part-based solutions gather additional local information in terms of attentions or parts. We propose a novel classification-specific part estimation that uses an initial prediction as well as back-propagation of feature importance via gradient computations in order to estimate relevant image regions. The subsequently detected parts are then not only selected by a-posteriori classification knowledge, but also have an intrinsic spatial extent that is determined automatically. This is in contrast to most part-based approaches and even to available ground-truth part annotations, which only provide point coordinates and no additional scale information. Our experiments on various widely used fine-grained datasets show the effectiveness of the proposed part selection method in conjunction with the extracted part features. Comment: Presented at the GCPR201
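
As a rough illustration of estimating a region with a spatial extent from gradient-based importance, the sketch below uses a toy linear class score, for which the gradient with respect to the pixels is simply the weight map. The importance measure |gradient * input|, the relative threshold, and the bounding-box step are assumptions for illustration and are not taken from the paper; a deep network would obtain the gradient by back-propagation.

```python
def class_score_gradient(weights):
    """Gradient of a linear class score s = sum(w * x) w.r.t. the pixels x.

    For a linear model this is the weight map itself.
    """
    return [row[:] for row in weights]

def saliency_map(image, weights):
    """Per-pixel importance, computed as |gradient * input|."""
    grad = class_score_gradient(weights)
    return [[abs(g * x) for g, x in zip(g_row, x_row)]
            for g_row, x_row in zip(grad, image)]

def part_box(saliency, rel_threshold=0.5):
    """Bounding box (top, left, bottom, right) of all pixels whose
    saliency reaches rel_threshold times the maximum saliency."""
    peak = max(max(row) for row in saliency)
    cutoff = rel_threshold * peak
    hits = [(r, c) for r, row in enumerate(saliency)
            for c, value in enumerate(row) if value >= cutoff]
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return min(rows), min(cols), max(rows), max(cols)

# Hypothetical 4x4 image (all ones) and weight map of the predicted class.
image = [[1.0] * 4 for _ in range(4)]
weights = [
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 1.0, 0.7],
    [0.0, 0.0, 0.2, 0.0],
]
box = part_box(saliency_map(image, weights))
print(box)
```

The box covers a region rather than a single point, which is the property the abstract contrasts with point-only part annotations.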

    End-to-end Learning of a Fisher Vector Encoding for Part Features in Fine-grained Recognition

    Part-based approaches for fine-grained recognition do not show the expected performance gain over global methods, despite being able to explicitly focus on small details that are relevant for distinguishing highly similar classes. We assume that part-based methods suffer from a missing representation of local features, which is invariant to the order of parts and can handle a varying number of visible parts appropriately. The order of parts is artificial and often only given by ground-truth annotations, whereas viewpoint variations and occlusions result in parts that are not observable. Therefore, we propose integrating a Fisher vector encoding of part features into convolutional neural networks. The parameters for this encoding are estimated jointly with those of the neural network in an end-to-end manner. Our approach improves state-of-the-art accuracies for bird species classification on CUB-200-2011 from 90.40% to 90.95%, on NA-Birds from 89.20% to 90.30%, and on Birdsnap from 84.30% to 86.97%.
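
To make the two properties named above concrete (invariance to the order of parts and a fixed-size output for any number of visible parts), here is a minimal sketch of the standard mean-gradient Fisher vector for a diagonal Gaussian mixture. The mixture parameters are fixed, made-up values; in the paper they are learned jointly with the network, which this sketch does not attempt. It assumes at least one visible part.

```python
import math

def fisher_vector(parts, weights, means, variances):
    """Mean-gradient Fisher vector of a set of part features.

    parts:     list of D-dimensional feature lists (any number >= 1)
    weights:   K mixture weights
    means:     K lists of D component means
    variances: K lists of D diagonal variances
    Returns a list of length K * D, independent of len(parts).
    """
    K, D, N = len(weights), len(means[0]), len(parts)
    fv = [0.0] * (K * D)
    for x in parts:
        # Log of (weight * Gaussian density) for every mixture component.
        logs = []
        for k in range(K):
            ll = math.log(weights[k])
            for d in range(D):
                ll -= 0.5 * math.log(2 * math.pi * variances[k][d])
                ll -= 0.5 * (x[d] - means[k][d]) ** 2 / variances[k][d]
            logs.append(ll)
        top = max(logs)
        unnorm = [math.exp(l - top) for l in logs]
        total = sum(unnorm)
        for k in range(K):
            gamma = unnorm[k] / total  # soft assignment of x to component k
            for d in range(D):
                fv[k * D + d] += (gamma * (x[d] - means[k][d])
                                  / math.sqrt(variances[k][d]))
    return [v / (N * math.sqrt(weights[idx // D])) for idx, v in enumerate(fv)]

# Hypothetical two-component mixture over 2-D part features.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [1.0, 1.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
parts = [[0.1, 0.2], [1.0, 0.9], [0.4, 0.5]]

fv_all = fisher_vector(parts, weights, means, variances)
fv_shuffled = fisher_vector(parts[::-1], weights, means, variances)
fv_occluded = fisher_vector(parts[:2], weights, means, variances)
```

Because the encoding sums over parts, reordering them changes the result only by floating-point rounding, and dropping an occluded part leaves the output length unchanged.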

    Pre-trained models are not enough: active and lifelong learning is important for long-term visual monitoring of mammals in biodiversity research—Individual identification and attribute prediction with image features from deep neural networks and decoupled decision models applied to elephants and great apes

    Animal re-identification based on image data, either recorded manually by photographers or automatically with camera traps, is an important task for ecological studies on biodiversity and conservation that can be largely automated with algorithms from computer vision and machine learning. However, fixed identification models trained only on standard datasets before deployment will quickly reach their limits, especially for long-term monitoring with changing environmental conditions, visual appearances of individuals that vary over time and differ considerably from those in the training data, and newly occurring individuals that have not been observed before. Hence, we believe that active learning with a human in the loop and continuous lifelong learning are important to tackle these challenges and to obtain high-performance recognition systems when dealing with the huge amounts of additional data that become available during the application. Our general approach with image features from deep neural networks and decoupled decision models can be applied to many different mammalian species and is perfectly suited for continuous improvement of the recognition systems via lifelong learning. In our identification experiments, we consider four different taxa: two elephant species (African forest elephants and Asian elephants) and two species of great apes (gorillas and chimpanzees). Going beyond classical re-identification, our decoupled approach can also be used for predicting attributes of individuals such as gender or age using classification or regression methods. Although the approach is applicable to small datasets of individuals as well, we argue that even better recognition performance will be achieved by improving decision models gradually via lifelong learning to exploit huge datasets and continuous recordings from long-term applications. We highlight that algorithms for deploying lifelong learning in real observational studies exist and are ready for use.
Hence, lifelong learning might become a valuable concept that supports practitioners when analyzing large-scale image data during long-term monitoring of mammals.
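
The idea of a decision model that is decoupled from the feature extractor and improved gradually can be sketched with a nearest-class-mean identifier. This is a stand-in chosen for brevity; the abstract does not specify the authors' decision model. The feature vectors are hypothetical placeholders for what a deep network would produce, and the individual names are invented.

```python
class NearestMeanIdentifier:
    """Decision model kept separate from a fixed feature extractor."""

    def __init__(self):
        self.sums = {}    # individual -> running sum of feature vectors
        self.counts = {}  # individual -> number of labeled examples

    def update(self, feature, individual):
        """Add one newly labeled example without retraining from scratch."""
        if individual not in self.sums:
            self.sums[individual] = [0.0] * len(feature)
            self.counts[individual] = 0
        self.sums[individual] = [s + f for s, f
                                 in zip(self.sums[individual], feature)]
        self.counts[individual] += 1

    def predict(self, feature):
        """Return the individual whose mean feature vector is closest."""
        def sq_dist(name):
            n = self.counts[name]
            return sum((f - s / n) ** 2
                       for f, s in zip(feature, self.sums[name]))
        return min(self.sums, key=sq_dist)

model = NearestMeanIdentifier()
model.update([1.0, 0.0], "elephant_A")
model.update([0.9, 0.1], "elephant_A")
model.update([0.0, 1.0], "elephant_B")
before = model.predict([0.5, 0.9])

# A previously unseen individual is labeled by an expert and added later.
model.update([0.5, 1.0], "elephant_C")
after = model.predict([0.5, 0.9])
print(before, after)
```

Adding a new individual only touches the small decision model, so the feature extractor never has to be retrained in this sketch.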

    Deep Clustering and Deep Network Compression

    The use of deep learning has grown steadily in recent years, thereby becoming a much-discussed topic across a diverse range of fields, especially in computer vision, text mining, and speech recognition. Deep learning methods have proven to be robust in representation learning and have achieved extraordinary results. Their success is primarily due to the ability of deep learning to discover and automatically learn feature representations by mapping input data into abstract and composite representations in a latent space. Deep learning’s ability to deal with high-level representations from data has inspired us to make use of learned representations, aiming to enhance unsupervised clustering and evaluate the characteristic strength of internal representations to compress and accelerate deep neural networks. Traditional clustering algorithms attain a limited performance as the dimensionality increases. Therefore, the ability to extract high-level representations provides beneficial components that can support such clustering algorithms. In this work, we first present DeepCluster, a clustering approach embedded in a deep convolutional auto-encoder. We introduce two clustering methods, namely DCAE-Kmeans and DCAE-GMM. DeepCluster allows data points to be grouped into their respective clusters in the latent space through a joint cost function that simultaneously optimizes the clustering objective and the DCAE objective, producing stable representations that are suitable for the clustering process. Both qualitative and quantitative evaluations of the proposed methods are reported, showing the efficiency of deep clustering on several public datasets in comparison to the previous state-of-the-art methods. Following this, we propose a new version of the DeepCluster model to include varying degrees of discriminative power. This introduces a mechanism which enables the imposition of regularization techniques and the involvement of a supervision component.
The key idea of our approach is to distinguish the discriminatory power of numerous structures when searching for a compact structure to form robust clusters. The effectiveness of injecting various levels of discriminatory power into the learning process is investigated alongside the exploration and analytical study of the discriminatory power obtained through the use of two discriminative attributes: data-driven discriminative attributes with the support of regularization techniques, and supervision discriminative attributes with the support of the supervision component. An evaluation is provided on four different datasets. The use of neural networks in various applications is accompanied by a dramatic increase in computational costs and memory requirements. Making use of the characteristic strength of learned representations, we propose an iterative pruning method that simultaneously identifies the critical neurons and prunes the model during training without involving any pre-training or fine-tuning procedures. We introduce a majority voting technique to compare the activation values among neurons and assign a voting score to evaluate their importance quantitatively. This mechanism effectively reduces model complexity by eliminating the less influential neurons and aims to determine a subset of the whole model that can represent the reference model with far fewer parameters within the training process. Empirically, we demonstrate that our pruning method is robust across various scenarios, including fully-connected networks (FCNs), sparsely-connected networks (SCNs), and convolutional neural networks (CNNs), using two public datasets. Moreover, we also propose a novel framework to measure the importance of individual hidden units by computing a measure of relevance to identify the most critical filters and prune them to compress and accelerate CNNs.
Unlike existing methods, we introduce the use of the activation of feature maps to detect valuable information and the essential semantic parts, with the aim of evaluating the importance of feature maps, inspired by novel neural network interpretability. A majority voting technique based on the degree of alignment between a semantic concept and individual hidden unit representations is utilized to evaluate feature maps’ importance quantitatively. We also propose a simple yet effective method to estimate new convolution kernels based on the remaining crucial channels to accomplish effective CNN compression. Experimental results show the effectiveness of our filter selection criteria, which outperform the state-of-the-art baselines. To conclude, we present a comprehensive, detailed review of time-series data analysis, with emphasis on deep time-series clustering (DTSC), and a founding contribution to the area of applying deep clustering to time-series data by presenting the first case study in the context of movement behavior clustering utilizing the DeepCluster method. The results are promising, showing that the latent space encodes sufficient patterns to facilitate accurate clustering of movement behaviors. Finally, we identify the state of the art and present an outlook on this important field of DTSC from five important perspectives.
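
The joint cost function mentioned for DeepCluster, which combines the auto-encoder objective with the clustering objective, might look roughly like the sketch below for a k-means-style variant. The abstract does not give the formula, so the mean squared reconstruction error, the nearest-centroid clustering term, and the `trade_off` weight are assumptions; all inputs are hypothetical toy values rather than outputs of a real auto-encoder.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def joint_cost(inputs, reconstructions, latents, centroids, trade_off=0.1):
    """Reconstruction error plus trade_off times the k-means error
    measured in the latent space, both averaged over the samples."""
    n = len(inputs)
    reconstruction = sum(squared_distance(x, r)
                         for x, r in zip(inputs, reconstructions)) / n
    clustering = sum(min(squared_distance(z, c) for c in centroids)
                     for z in latents) / n
    return reconstruction + trade_off * clustering

# Toy values standing in for encoder and decoder outputs.
inputs = [[1.0, 0.0], [0.0, 1.0]]
reconstructions = [[0.9, 0.1], [0.0, 1.0]]
latents = [[0.0], [2.0]]
centroids = [[0.0], [1.0]]

cost = joint_cost(inputs, reconstructions, latents, centroids)
```

Minimizing both terms together pulls latent codes toward their nearest centroid while still requiring that the inputs can be reconstructed from them.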