48 research outputs found

    Applications

    Volume 3 describes how resource-aware machine learning methods and techniques are used to successfully solve real-world problems. The book provides numerous specific application examples: in health and medicine for risk modelling, diagnosis, and treatment selection for diseases; in electronics, steel production, and milling for quality control during manufacturing processes; in traffic and logistics for smart cities; and in mobile communications.

    Data Management for Dynamic Multimedia Analytics and Retrieval

    Multimedia data in its various manifestations poses a unique challenge from a data storage and data management perspective, especially when search, analysis, and analytics over large data corpora are considered. The inherently unstructured nature of the data itself, and the curse of dimensionality that afflicts the representations we typically work with in its stead, cause a broad range of issues that require sophisticated solutions at different levels. This has given rise to a large body of research focusing on techniques that allow for effective and efficient multimedia search and exploration. Many of these contributions have led to an array of purpose-built multimedia search systems. However, recent progress in multimedia analytics and interactive multimedia retrieval has demonstrated that several of the assumptions usually made for such multimedia search workloads do not hold once a session has a human user in the loop. Firstly, many of the required query operations cannot be expressed by mere similarity search, and since the concrete requirements cannot always be anticipated, one needs a flexible and adaptable data management and query framework. Secondly, the widespread assumption that data collections are static does not hold for analytics workloads, whose purpose is to produce and store new insights and information. And finally, it is impossible even for an expert user to specify exactly how a data management system should produce and arrive at the desired outcomes of the potentially many different queries. Guided by these shortcomings, and motivated by the fact that similar questions have once been answered for structured data in classical database research, this thesis presents three contributions that seek to mitigate the aforementioned issues. We present a query model that generalises the notion of proximity-based query operations and formalises the connection between those queries and high-dimensional indexing.
    We complement this with a cost model that makes the often implicit trade-off between query execution speed and result quality transparent to the system and the user. And we describe a model for the transactional and durable maintenance of high-dimensional index structures. All contributions are implemented in the open-source multimedia database system Cottontail DB, on top of which we present an evaluation that demonstrates the effectiveness of the proposed models. We conclude by discussing avenues for future research in the quest to converge the fields of databases on the one hand and (interactive) multimedia retrieval and analytics on the other.
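    The speed/quality trade-off that such a cost model exposes can be illustrated with a toy sketch (not Cottontail DB's actual implementation; the sample-based strategy, sizes, and numbers here are all hypothetical): an approximate proximity query trades result quality for execution speed by scanning only a fraction of the collection.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    corpus = rng.normal(size=(10_000, 64))  # stand-ins for high-dimensional multimedia descriptors
    query = rng.normal(size=64)

    def knn_exact(corpus, query, k):
        """Exact proximity query: scan every vector (slow, perfect quality)."""
        dists = np.linalg.norm(corpus - query, axis=1)
        return np.argsort(dists)[:k]

    def knn_sampled(corpus, query, k, fraction, rng):
        """Approximate proximity query: scan a random sample (faster, lossy).
        The sample fraction is the knob a cost model could surface to the user."""
        idx = rng.choice(len(corpus), size=int(fraction * len(corpus)), replace=False)
        dists = np.linalg.norm(corpus[idx] - query, axis=1)
        return idx[np.argsort(dists)[:k]]

    exact = knn_exact(corpus, query, k=10)
    approx = knn_sampled(corpus, query, k=10, fraction=0.2, rng=rng)
    recall = len(set(exact) & set(approx)) / 10  # result quality relative to the exact answer
    ```

    Running both variants side by side shows how a system can report an expected recall for a chosen sample fraction instead of leaving the trade-off implicit.
    
    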

    Fundamentals

    Volume 1 establishes the foundations of this new field. It goes through all the steps, from data collection, through summarization and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to their resource requirements and to how scalability can be enhanced on diverse computing architectures, ranging from embedded systems to large computing clusters.

    Applications of Markov Random Field Optimization and 3D Neural Network Pruning in Computer Vision

    Recent years have witnessed the rapid development of Convolutional Neural Networks (CNNs) in various computer vision applications that were traditionally addressed by Markov Random Field (MRF) optimization methods. Even though CNN-based methods achieve high accuracy in these tasks, highly refined results remain difficult to achieve. For instance, a pairwise MRF optimization method is capable of segmenting objects using auxiliary edge information through its second-order terms, which a deep neural network alone is unlikely to achieve. MRF optimization methods, however, can enhance performance with explicit theoretical and experimental support via iterative energy minimization. Secondly, such an edge detector can itself be learned by a CNN, so transferring the output of one CNN to serve another task becomes valuable. It is desirable to fuse the superpixel contours from one state-of-the-art CNN with the semantic segmentation results from another, so that the object contours in the semantic segmentation align with the superpixel contours. This kind of fusion is not limited to semantic segmentation but extends to other tasks that combine multiple off-the-shelf CNNs. While fusing multiple CNNs is useful for enhancing performance, each such CNN is usually specifically designed and trained with an empirical configuration of resources. When the fused model requires a large batch size, however, joint CNN training can run out of GPU memory. This problem commonly arises when training CNNs efficiently under limited resources, and it is more pronounced in 3D CNNs than in 2D CNNs due to their higher training resource requirements.
    To solve the first problem, we propose two fast and differentiable message passing algorithms, namely Iterative Semi-Global Matching Revised (ISGMR) and Parallel Tree-Reweighted Message Passing (TRWP), for both energy minimization problems and deep learning applications. Our experiments on stereo vision and image inpainting datasets validate the effectiveness and efficiency of our methods, achieving minimum energies comparable to the state-of-the-art algorithm TRWS while greatly improving forward and backward propagation speed through CUDA programming on massively parallel trees. Applying these two methods to deep semantic segmentation on PASCAL VOC 2012 with Canny edges yields enhanced segmentation results measured by mean Intersection over Union (mIoU). For the second problem, to effectively fuse and finetune multiple CNNs, we present a transparent initialization module that identically maps the output of a multiple-layer module to its input at the early stage of finetuning. The pretrained model parameters then gradually diverge during training as the loss decreases. This transparent initialization has a higher initialization rate than Net2Net and a higher recovery rate than random and Xavier initialization. Our experiments validate the effectiveness of the proposed transparent initialization and of the sparse encoder with sparse matrix operations. The edges of segmented objects achieve a higher performance ratio and a higher F-measure than other comparable methods. For the third problem, to compress a CNN effectively, especially resource-intensive 3D CNNs, we propose a single-shot neuron pruning method with resource constraints. The pruning principle is to remove neurons with low importance, corresponding to small connection sensitivities. A reweighting strategy based on the layerwise consumption of memory or FLOPs improves pruning by avoiding the infeasible removal of entire layers.
    Our experiments on the ShapeNet point cloud dataset and the BraTS'18 medical image dataset demonstrate the effectiveness of our method. Applying our method to video classification on the UCF101 dataset using MobileNetV2 and I3D further strengthens its benefits.
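    The single-shot pruning principle described above can be sketched in a few lines. This is a simplified NumPy illustration with made-up layer shapes, costs, and a hypothetical 50% pruning ratio, not the thesis implementation: neuron importance is the connection sensitivity |w * g| summed over a neuron's connections, reweighted by a per-layer resource cost, with a guard against pruning a layer entirely.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    # Toy per-layer weights and loss gradients, stand-ins for a real 3D CNN.
    weights = [rng.normal(size=(8, 4)), rng.normal(size=(4, 6))]
    grads = [rng.normal(size=(8, 4)), rng.normal(size=(4, 6))]
    layer_cost = [8.0, 1.0]  # hypothetical per-layer FLOPs used for reweighting

    all_scores = []
    for w, g, c in zip(weights, grads, layer_cost):
        importance = np.abs(w * g).sum(axis=0)  # connection sensitivity per output neuron
        all_scores.append(importance / c)       # reweight: expensive layers are pruned first

    flat = np.concatenate(all_scores)
    threshold = np.quantile(flat, 0.5)          # prune the bottom half globally (single shot)

    keep_masks = []
    for s in all_scores:
        mask = s > threshold
        if not mask.any():                      # never remove an entire layer (infeasible)
            mask[np.argmax(s)] = True
        keep_masks.append(mask)
    ```

    Dividing each layer's scores by its cost steers the global threshold toward removing neurons in expensive layers, while the guard keeps every layer functional.
    
    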

    Object Recognition

    Vision-based object recognition tasks are very familiar in our everyday activities, such as driving our car in the correct lane. We perform these tasks effortlessly in real time. In recent decades, with the advancement of computer technology, researchers and application developers have been trying to mimic the human capability of visual recognition. Such a capability will allow machines to free humans from tedious or dangerous jobs.

    Pattern Recognition

    Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, applications, and others. The signals processed are commonly one-, two-, or three-dimensional; the processing is done in real time or takes hours and days; some systems look for one narrow object class, while others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms, and encompasses several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments in pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented work. The authors of these 25 works present and advocate recent achievements of their research in the field of pattern recognition.

    Robust and Accurate Camera Localisation at a Large Scale

    The task of camera-based localization aims to quickly and precisely pinpoint the location (and viewing direction) at which an image was taken, against a pre-stored large-scale map of the environment. This technique can be used in many 3D computer vision applications, e.g., AR/VR and autonomous driving. Mapping the world is the first step towards enabling camera-based localization, since a pre-stored map serves as a reference for a query image/sequence. In this thesis, we exploit three readily available sources: (i) satellite images; (ii) ground-view images; (iii) 3D point clouds. Based on these three sources, we propose solutions that localize a query camera both effectively and efficiently, i.e., accurately localizing a query camera under a variety of lighting and viewing conditions within a small amount of time. The main contributions are summarized as follows. In chapter 3, we present minimal 4-point and 2-point solvers to estimate relative and absolute camera poses, respectively. The core idea is to exploit the vertical direction, from an IMU or a vanishing point, to derive closed-form solutions of a quartic equation for the relative pose and a quadratic equation for the absolute pose. In chapter 4, we localize a ground-view query image against a satellite map. Inspired by the insight that humans commonly use orientation information as an important cue for spatial localization, we propose a method that endows deep neural networks with the 'commonsense' of orientation. We design a Siamese network that explicitly encodes each pixel's orientation in the ground-view and satellite images. Our method boosts the discriminative power of the learned deep features, outperforming all previous methods. In chapter 5, we localize a ground-view query image against a ground-view image database. We propose a representation learning method with higher location-discriminating power. The core idea is to learn discriminative image embeddings.
    Similarities among intra-place images (viewing the same landmarks) are maximized, while similarities among inter-place images (viewing different landmarks) are minimized. The method is easy to implement and pluggable into any CNN. Experiments show that our method outperforms all previous methods. In chapter 6, we localize a ground-view query image against a large-scale 3D point cloud with visual descriptors. To address the ambiguities in direct 2D--3D feature matching, we introduce a global matching method that harnesses the global contextual information exhibited both within the query image and among all the 3D points in the map. The core idea is to find the optimal 2D-set-to-3D-set matching. Tests on standard benchmark datasets show the effectiveness of our method. In chapter 7, we localize a ground-view query image against a 3D point cloud with only coordinates. This problem is also known as blind Perspective-n-Point. We propose a deep CNN model that simultaneously solves for both the 6-DoF absolute camera pose and the 2D--3D correspondences. The core idea is to extract point-wise 2D and 3D features from their coordinates and to match them effectively in a global feature matching module. Extensive tests on both real and simulated data show that our method substantially outperforms existing approaches. Last, in chapter 8, we study the potential of using 3D lines. Specifically, we study the problem of aligning two partially overlapping 3D line reconstructions in Euclidean space. This technique can be used for localization with respect to a 3D line database when query 3D line reconstructions are available (e.g., from stereo triangulation). We propose a neural network that takes Plücker representations of lines as input and solves for line-to-line matches and a 6-DoF rigid transformation. Experiments on indoor and outdoor datasets show that our method's registration (rotation and translation) precision significantly outperforms the baselines.
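    The intra-/inter-place objective from chapter 5 resembles a standard triplet margin loss. The following is a minimal NumPy sketch with synthetic L2-normalised embeddings (the shapes, noise level, and margin are hypothetical, and this is not the thesis's actual training code):

    ```python
    import numpy as np

    def l2norm(x):
        """Project embeddings onto the unit sphere."""
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    rng = np.random.default_rng(2)
    # Synthetic embeddings: anchor and positive view the same landmark,
    # the negative views a different one.
    anchor = l2norm(rng.normal(size=(5, 128)))
    positive = l2norm(anchor + 0.1 * rng.normal(size=(5, 128)))
    negative = l2norm(rng.normal(size=(5, 128)))

    def triplet_loss(a, p, n, margin=0.3):
        """Zero once each intra-place pair is closer than the inter-place
        pair by at least the margin."""
        d_pos = np.linalg.norm(a - p, axis=1)  # intra-place distance (minimize)
        d_neg = np.linalg.norm(a - n, axis=1)  # inter-place distance (maximize)
        return np.maximum(d_pos - d_neg + margin, 0.0).mean()
    ```

    Because the loss only depends on the embeddings, it can be attached to the output of any CNN backbone, which is what makes this family of objectives "pluggable".
    
    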

    Pattern Recognition

    A wealth of advanced pattern recognition algorithms is emerging from the interdisciplinary area between technologies for effective visual features and the human-brain cognition process. Effective visual features are made possible by rapid developments in appropriate sensor equipment, novel filter designs, and viable information processing architectures, while the understanding of the human-brain cognition process broadens the ways in which computers can perform pattern recognition tasks. The present book collects representative research from around the globe focusing on low-level vision, filter design, features and image descriptors, data mining and analysis, and biologically inspired algorithms. The 27 chapters covered in this book disclose recent advances and new ideas in promoting the techniques, technology, and applications of pattern recognition.

    Video anomaly detection using deep generative models

    Video anomaly detection faces three challenges: (a) no explicit definition of abnormality; (b) scarce labelled data; and (c) dependence on hand-crafted features. This thesis introduces novel detection systems based on unsupervised generative models, which address the first two challenges. By working directly on raw pixels, they also bypass the third.
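    A minimal illustration of the underlying idea, reconstruction-based anomaly scoring trained only on normal data, using a linear PCA "autoencoder" as a stand-in for the deep generative models in the thesis (all data here is synthetic and the latent size is arbitrary):

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    normal_frames = rng.normal(size=(500, 32))  # stand-ins for raw-pixel training frames
    anomalous = normal_frames[0] + 8.0 * rng.normal(size=32)  # frame unlike the training data

    # Fit a linear "autoencoder" (PCA) on normal data only -- no anomaly labels needed.
    mean = normal_frames.mean(axis=0)
    _, _, Vt = np.linalg.svd(normal_frames - mean, full_matrices=False)
    components = Vt[:8]  # low-dimensional latent code

    def anomaly_score(frame):
        """Reconstruction error: high when the model cannot reproduce the frame."""
        code = (frame - mean) @ components.T
        recon = code @ components + mean
        return np.linalg.norm(frame - recon)

    scores_normal = np.array([anomaly_score(f) for f in normal_frames[:50]])
    score_anom = anomaly_score(anomalous)
    ```

    Frames the model has learned to reconstruct score low; frames unlike the training distribution score high, so a threshold on the score flags anomalies without any labelled abnormal examples.
    
    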