HW-Flow: A Multi-Abstraction Level HW-CNN Codesign Pruning Methodology
Convolutional neural networks (CNNs) have produced unprecedented accuracy for many computer vision problems in the recent past. In power and compute-constrained embedded platforms, deploying modern CNNs can present many challenges. Most CNN architectures do not run in real-time due to the high number of computational operations involved during the inference phase. This emphasizes the role of CNN optimization techniques in early design space exploration. To estimate their efficacy in satisfying the target constraints, existing techniques are either hardware (HW) agnostic, pseudo-HW-aware by considering parameter and operation counts, or HW-aware through inflexible hardware-in-the-loop (HIL) setups. In this work, we introduce HW-Flow, a framework for optimizing and exploring CNN models based on three levels of hardware abstraction: Coarse, Mid and Fine. Through these levels, CNN design and optimization can be iteratively refined towards efficient execution on the target hardware platform. We present HW-Flow in the context of CNN pruning by augmenting a reinforcement learning agent with key metrics to understand the influence of its pruning actions on the inference hardware. With 2× reduction in energy and latency, we prune ResNet56, ResNet50, and DeepLabv3 with minimal accuracy degradation on the CIFAR-10, ImageNet, and CityScapes datasets, respectively.
Scalable discovery of hybrid process models in a cloud computing environment
Process descriptions are used to create products and deliver services. To achieve better processes and services, the first step
is to learn a process model. Process discovery is a technique that automatically extracts process models from event logs.
Although various discovery techniques have been proposed, they focus either on constructing formal models, which are very powerful
but complex, or on creating informal models, which are intuitive but lack semantics. In this work, we introduce a novel method that returns
hybrid process models to bridge this gap. Moreover, to cope with today’s big event logs, we propose an efficient method called f-HMD,
which aims at scalable hybrid model discovery in a cloud computing environment. We present the detailed implementation of our approach
over the Spark framework, and our experimental results demonstrate that the proposed method is efficient and scalable.
Generic 3D Representation via Pose Estimation and Matching
Though a large body of computer vision research has investigated developing
generic semantic representations, efforts towards developing a similar
representation for 3D have been limited. In this paper, we learn a generic 3D
representation through solving a set of foundational proxy 3D tasks:
object-centric camera pose estimation and wide baseline feature matching. Our
method is based upon the premise that by providing supervision over a set of
carefully selected foundational tasks, generalization to novel tasks and
abstraction capabilities can be achieved. We empirically show that the internal
representation of a multi-task ConvNet trained to solve the above core problems
generalizes to novel 3D tasks (e.g., scene layout estimation, object pose
estimation, surface normal estimation) without the need for fine-tuning and
shows traits of abstraction abilities (e.g., cross-modality pose estimation).
In the context of the core supervised tasks, we demonstrate our representation
achieves state-of-the-art wide baseline feature matching results without
requiring a priori rectification (unlike SIFT and the majority of learned
features). We also show 6DOF camera pose estimation given a pair of local image
patches. The accuracy of both supervised tasks is comparable to that of humans.
Finally, we contribute a large-scale dataset composed of object-centric street
view scenes along with point correspondences and camera pose information, and
conclude with a discussion on the learned representation and open research
questions.
Comment: Published in ECCV16. See the project website
http://3drepresentation.stanford.edu/ and dataset website
https://github.com/amir32002/3D_Street_Vie
Incremental Learning Using a Grow-and-Prune Paradigm with Efficient Neural Networks
Deep neural networks (DNNs) have become a widely deployed model for numerous
machine learning applications. However, their fixed architecture, substantial
training cost, and significant model redundancy make it difficult to
efficiently update them to accommodate previously unseen data. To solve these
problems, we propose an incremental learning framework based on a
grow-and-prune neural network synthesis paradigm. When new data arrive, the
neural network first grows new connections based on the gradients to increase
the network capacity to accommodate new data. Then, the framework iteratively
prunes away connections based on the magnitude of weights to enhance network
compactness, and hence recover efficiency. Finally, the model rests at a
lightweight DNN that is both ready for inference and suitable for future
grow-and-prune updates. The proposed framework improves accuracy, shrinks
network size, and significantly reduces the additional training cost for
incoming data compared to conventional approaches, such as training from
scratch and network fine-tuning. For the LeNet-300-100 and LeNet-5 neural
network architectures derived for the MNIST dataset, the framework reduces
training cost by up to 64% (63%) and 67% (63%) compared to training from
scratch (network fine-tuning), respectively. For the ResNet-18 architecture
derived for the ImageNet dataset and DeepSpeech2 for the AN4 dataset, the
corresponding training cost reductions against training from scratch (network
fine-tuning) are 64% (60%) and 67% (62%), respectively. Our derived models
contain fewer network parameters but achieve higher accuracy relative to
conventional baselines.
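The grow and prune phases described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a flat list of connections with a boolean mask, and the function names `grow` and `prune`, the counts `n_new` and `n_drop`, and the sample numbers are all hypothetical. Growth activates the dormant connections with the largest gradient magnitude; pruning deactivates the active connections with the smallest weight magnitude.

```python
from typing import List


def grow(mask: List[bool], grads: List[float], n_new: int) -> List[bool]:
    """Activate the n_new dormant connections with the largest |gradient|."""
    dormant = [i for i, active in enumerate(mask) if not active]
    dormant.sort(key=lambda i: abs(grads[i]), reverse=True)
    new_mask = list(mask)
    for i in dormant[:n_new]:
        new_mask[i] = True
    return new_mask


def prune(mask: List[bool], weights: List[float], n_drop: int) -> List[bool]:
    """Deactivate the n_drop active connections with the smallest |weight|."""
    active = [i for i, is_active in enumerate(mask) if is_active]
    active.sort(key=lambda i: abs(weights[i]))
    new_mask = list(mask)
    for i in active[:n_drop]:
        new_mask[i] = False
    return new_mask


# Hypothetical toy values: five connections, three of them active.
mask = [True, False, True, False, True]
grads = [0.1, 0.8, 0.2, -0.3, 0.0]

# Growth phase: connection 1 has the largest gradient among dormant ones.
grown = grow(mask, grads, n_new=1)

# Assumed weights after retraining the grown network on the new data.
weights_after_training = [0.9, 0.6, -0.05, 0.0, 0.1]

# Pruning phase: drop the two weakest active connections (indices 2 and 4).
pruned = prune(grown, weights_after_training, n_drop=2)
print(grown, pruned)
```

In a real network the masks would be applied per layer and training would run between the two phases; the sketch only shows the selection rules.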
Feature Map Filtering: Improving Visual Place Recognition with Convolutional Calibration
Convolutional Neural Networks (CNNs) have recently been shown to excel at
performing visual place recognition under changing appearance and viewpoint.
Previously, place recognition has been improved by intelligently selecting
relevant spatial keypoints within a convolutional layer and also by selecting
the optimal layer to use. Rather than extracting features out of a particular
layer, or a particular set of spatial keypoints within a layer, we propose the
extraction of features using a subset of the channel dimensionality within a
layer. Each feature map learns to encode a different set of weights that
activate for different visual features within the set of training images. We
propose a method of calibrating a CNN-based visual place recognition system,
which selects the subset of feature maps that best encodes the visual features
that are consistent between two different appearances of the same location.
Using just 50 calibration images, all collected at the beginning of the current
environment, we demonstrate a significant and consistent recognition
improvement across multiple layers for two different neural networks. We
evaluate our proposal on three datasets with different types of appearance
changes - afternoon to morning, winter to summer and night to day.
Additionally, the dimensionality reduction approach improves the computational
processing speed of the recognition system.
Comment: Accepted to the Australasian Conference on Robotics and Automation
201
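The feature map selection idea in this abstract can be sketched as follows. The abstract does not give the selection criterion, so this example assumes a simple one: keep the channels whose responses differ least, on average, between paired calibration images of the same place under two appearances. The function name `select_feature_maps`, the per-channel scalar responses, and the sample values are hypothetical.

```python
from typing import List, Sequence


def select_feature_maps(
    ref: Sequence[Sequence[float]],
    qry: Sequence[Sequence[float]],
    keep: int,
) -> List[int]:
    """Return indices of the `keep` channels most consistent across pairs.

    ref[i][c] and qry[i][c] hold the (assumed scalar) response of channel c
    for calibration pair i under the two appearance conditions.
    """
    n_channels = len(ref[0])
    scores = []
    for c in range(n_channels):
        # Mean absolute difference across all calibration pairs.
        diff = sum(abs(r[c] - q[c]) for r, q in zip(ref, qry)) / len(ref)
        scores.append(diff)
    ranked = sorted(range(n_channels), key=lambda c: scores[c])
    return sorted(ranked[:keep])


# Hypothetical calibration data: two image pairs, three channels.
ref = [[1.0, 0.2, 0.5], [0.8, 0.9, 0.4]]
qry = [[0.9, 0.9, 0.5], [0.7, 0.1, 0.3]]

selected = select_feature_maps(ref, qry, keep=2)
print(selected)
```

Matching would then use only the selected channels, which also reduces the descriptor dimensionality mentioned in the abstract.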