Exploring Context with Deep Structured models for Semantic Segmentation

Author: Hengel Anton van den
Lin Guosheng
Reid Ian
Shen Chunhua
Publication date: 01/01/2017

State-of-the-art semantic image segmentation methods are mostly based on training deep convolutional neural networks (CNNs). In this work, we propose to improve semantic segmentation with the use of contextual information. In particular, we explore 'patch-patch' context and 'patch-background' context in deep CNNs. We formulate deep structured models by combining CNNs and Conditional Random Fields (CRFs) for learning the patch-patch context between image regions. Specifically, we formulate CNN-based pairwise potential functions to capture semantic correlations between neighboring patches. Efficient piecewise training of the proposed deep structured model is then applied in order to avoid repeated expensive CRF inference during the course of back propagation. For capturing the patch-background context, we show that a network design with traditional multi-scale image inputs and sliding pyramid pooling is very effective for improving performance. We perform comprehensive evaluation of the proposed method. We achieve new state-of-the-art performance on a number of challenging semantic segmentation datasets including NYUDv2, PASCAL VOC2012, Cityscapes, PASCAL Context, SUN RGBD, SIFT flow, and KITTI. Particularly, we report an intersection-over-union score of 77.8 on the PASCAL VOC2012 dataset.

Comment: 16 pages. Accepted to IEEE T. Pattern Analysis & Machine Intelligence, 2017. Extended version of arXiv:1504.0101
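The intersection-over-union score quoted in this abstract can be illustrated with a small sketch. It assumes flat lists of per-pixel class labels and averages over the classes that occur; the function name `mean_iou` and the toy labels are hypothetical, and the exact evaluation protocol behind the reported 77.8 (for example, how void pixels are handled or how counts are accumulated over a dataset) is not described in the abstract.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target.

    pred and target are equal-length sequences of integer class labels,
    one per pixel (an assumed, simplified input format).
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes that never occur
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six pixels (made-up labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

Benchmark scores are usually reported as percentages, so a value such as 0.778 from a function like this would correspond to 77.8.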

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Speed/accuracy trade-offs for modern convolutional object detectors

Author: Fathi Alireza
Fischer Ian
Guadarrama Sergio
Huang Jonathan
Korattikara Anoop
Murphy Kevin
Rathod Vivek
Song Yang
Sun Chen
Wojna Zbigniew
Zhu Menglong
Publication date: 24/04/2017

The goal of this paper is to serve as a guide for selecting a detection architecture that achieves the right speed/memory/accuracy balance for a given application and platform. To this end, we investigate various ways to trade accuracy for speed and memory usage in modern convolutional object detection systems. A number of successful systems have been proposed in recent years, but apples-to-apples comparisons are difficult due to different base feature extractors (e.g., VGG, Residual Networks), different default image resolutions, as well as different hardware and software platforms. We present a unified implementation of the Faster R-CNN [Ren et al., 2015], R-FCN [Dai et al., 2016] and SSD [Liu et al., 2015] systems, which we view as "meta-architectures", and trace out the speed/accuracy trade-off curve created by using alternative feature extractors and varying other critical parameters such as image size within each of these meta-architectures. At one extreme of this spectrum, where speed and memory are critical, we present a detector that achieves real-time speeds and can be deployed on a mobile device. At the opposite extreme, where accuracy is critical, we present a detector that achieves state-of-the-art performance measured on the COCO detection task.

Comment: Accepted to CVPR 201

arXiv.org e-Print Archive

Crossref

Event-based Row-by-Row Multi-convolution engine for Dynamic-Vision Feature Extraction on FPGA

Author: Domínguez Morales Juan Pedro
Domínguez Morales Manuel Jesús
Gutiérrez Galán Daniel
Jiménez Fernández Ángel Francisco
Linares Barranco Alejandro
Ríos Navarro José Antonio
Tapiador Morales Ricardo
Publication venue: IEEE Computer Society
Publication date: 01/01/2018

Neural network algorithms are commonly used to recognize patterns from different data sources such as audio or vision. In image recognition, Convolutional Neural Networks are one of the most effective techniques due to the high accuracy they achieve. These algorithms require billions of addition and multiplication operations over all pixels of an image. However, it is possible to reduce the number of operations by using computer vision techniques other than frame-based ones, e.g. neuromorphic frame-free techniques. There exist many neuromorphic vision sensors that detect pixels whose luminosity has changed. In this study, an event-based convolution engine for FPGA is presented. This engine models an array of leaky integrate-and-fire neurons. It is able to apply different kernel sizes, from 1x1 to 7x7, which are computed row by row, with a maximum of 64 different convolution kernels. The design presented is able to process 64 feature maps of 7x7 with a latency of 8.98 s.

Funding: Ministerio de Economía y Competitividad TEC2016-77785-
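As a rough software illustration of the idea in this abstract, the sketch below applies one convolution kernel to an array of leaky integrate-and-fire neurons each time a pixel event arrives, visiting the kernel one row at a time. It is not the FPGA design: the function name `process_event`, the linear leak, the reset-to-zero rule, and the `threshold` and `leak_per_tick` parameters are all assumptions made for the example.

```python
def process_event(state, last_t, kernel, x, y, t,
                  threshold=1.0, leak_per_tick=0.0):
    """Add `kernel`, centred on the event at (x, y), to the membrane
    potentials in `state`; return the (x, y) positions of neurons that fire.

    state  : 2D list of membrane potentials (modified in place)
    last_t : 2D list holding each neuron's last update time
    t      : timestamp of the incoming event (arbitrary ticks)
    """
    half = len(kernel) // 2
    height, width = len(state), len(state[0])
    fired = []
    for ky, kernel_row in enumerate(kernel):  # one kernel row at a time
        ny = y + ky - half
        if not 0 <= ny < height:
            continue
        for kx, weight in enumerate(kernel_row):
            nx = x + kx - half
            if not 0 <= nx < width:
                continue
            # Assumed linear leak toward zero since the last update.
            v = state[ny][nx]
            decay = leak_per_tick * (t - last_t[ny][nx])
            v = max(0.0, v - decay) if v > 0 else min(0.0, v + decay)
            v += weight  # integrate the kernel weight
            last_t[ny][nx] = t
            if v >= threshold:  # fire and reset (assumed reset to zero)
                fired.append((nx, ny))
                v = 0.0
            state[ny][nx] = v
    return fired


# Toy usage: a 5x5 neuron array and a 3x3 kernel of 0.6 weights.
state = [[0.0] * 5 for _ in range(5)]
last_t = [[0] * 5 for _ in range(5)]
kernel = [[0.6] * 3 for _ in range(3)]
first = process_event(state, last_t, kernel, x=2, y=2, t=0)
second = process_event(state, last_t, kernel, x=2, y=2, t=1)
```

Work is only done for the neighbourhood of each event, which is where the reduction in operations compared with frame-based convolution comes from.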

idUS. Depósito de Investigación Universidad de Sevilla

Best-first heuristic search for multicore machines

Author: Burns Ethan
Lemons Sofia N.
Ruml Wheeler
Zhou Rong
Publication venue: University of New Hampshire Scholars' Repository
Publication date: 01/12/2010

To harness modern multicore processors, it is imperative to develop parallel versions of fundamental algorithms. In this paper, we compare different approaches to parallel best-first search in a shared-memory setting. We present a new method, PBNF, that uses abstraction to partition the state space and to detect duplicate states without requiring frequent locking. PBNF allows speculative expansions when necessary to keep threads busy. We identify and fix potential livelock conditions in our approach, proving its correctness using temporal logic. Our approach is general, allowing it to extend easily to suboptimal and anytime heuristic search. In an empirical comparison on STRIPS planning, grid pathfinding, and sliding tile puzzle problems using 8-core machines, we show that A*, weighted A* and Anytime weighted A* implemented using PBNF yield faster search than improved versions of previous parallel search proposals.
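For readers unfamiliar with the sequential baselines named in this abstract, here is a minimal sketch of weighted A* on a 4-connected grid with unit step costs. It is a plain single-threaded version, not PBNF; the function name `weighted_astar`, the grid encoding (0 = free, 1 = blocked), and the Manhattan-distance heuristic are assumptions chosen for the example. With `weight=1.0` it behaves as ordinary A*.

```python
import heapq


def weighted_astar(grid, start, goal, weight=1.0):
    """Return the cost of a path from start to goal, or None if unreachable.

    grid   : 2D list, 0 = free cell, 1 = blocked cell (assumed encoding)
    start, goal : (row, col) tuples
    weight : heuristic inflation factor; 1.0 gives ordinary A*
    """
    def h(cell):
        # Manhattan distance, admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(weight * h(start), 0, start)]  # entries are (f, g, cell)
    best_g = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            return g
        if g > best_g.get(cell, float("inf")):
            continue  # stale entry; a cheaper path to this cell was found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                    and grid[nr][nc] == 0):
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    f = ng + weight * h((nr, nc))
                    heapq.heappush(open_heap, (f, ng, (nr, nc)))
    return None


# Toy usage: a wall forces a detour around the right-hand side.
grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
cost = weighted_astar(grid, start=(0, 0), goal=(2, 0))
```

Inflating the heuristic with `weight > 1` typically expands fewer nodes in exchange for a solution whose cost may exceed the optimum by at most that factor.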

UNH Scholars' Repository

A2-RL: Aesthetics Aware Reinforcement Learning for Image Cropping

Author: Huang Kaiqi
Li Debang
Wu Huikai
Zhang Junge
Publication date: 12/03/2018

Image cropping aims at improving the aesthetic quality of images by adjusting their composition. Most weakly supervised cropping methods (without bounding box supervision) rely on the sliding window mechanism. The sliding window mechanism requires fixed aspect ratios and therefore cannot produce cropping regions of arbitrary size. Moreover, the sliding window method usually produces tens of thousands of windows on the input image, which is very time-consuming. Motivated by these challenges, we first formulate aesthetic image cropping as a sequential decision-making process and propose a weakly supervised Aesthetics Aware Reinforcement Learning (A2-RL) framework to address this problem. In particular, the proposed method develops an aesthetics-aware reward function that especially benefits image cropping. Similar to human decision making, we use a comprehensive state representation including both the current observation and the historical experience. We train the agent using the actor-critic architecture in an end-to-end manner. The agent is evaluated on several popular unseen cropping datasets. Experimental results show that our method achieves state-of-the-art performance with far fewer candidate windows and much less time than previous weakly supervised methods.

Comment: Accepted by CVPR 201

arXiv.org e-Print Archive

Crossref