Attribute-Aware Attention Model for Fine-grained Representation Learning
How to learn a discriminative fine-grained representation is a key point in
many computer vision applications, such as person re-identification,
fine-grained classification, and fine-grained image retrieval. Most previous
methods focus on metric learning or ensembles to derive a better global
representation, which usually lacks local information. To address this, we
propose a novel Attribute-Aware Attention Model, which learns local attribute
representations and global category representations simultaneously in an
end-to-end manner. The proposed model contains two attention modules: an
attribute-guided attention module uses attribute information to help select
category features in different regions, while a category-guided attention
module selects local features of different attributes with the help of
category cues. Through this attribute-category
reciprocal process, local and global features benefit from each other. Finally,
the resulting feature contains more intrinsic information for image recognition
instead of the noisy and irrelevant features. Extensive experiments conducted
on Market-1501, CompCars, CUB-200-2011 and CARS196 demonstrate the
effectiveness of our model. Code is available at
https://github.com/iamhankai/attribute-aware-attention.
Comment: Accepted by ACM Multimedia 2018 (Oral).
Drosophila-Inspired 3D Moving Object Detection Based on Point Clouds
3D moving object detection is one of the most critical tasks in dynamic scene
analysis. In this paper, we propose a novel Drosophila-inspired 3D moving
object detection method using Lidar sensors. According to the theory of
elementary motion detector, we have developed a motion detector based on the
shallow visual neural pathway of Drosophila. This detector is sensitive to the
movement of objects and can well suppress background noise. Designing neural
circuits with different connection modes, the approach searches for motion
areas in a coarse-to-fine fashion and extracts point clouds of each motion area
to form moving object proposals. An improved 3D object detection network is
then applied to the point clouds of each proposal and efficiently generates
the 3D bounding boxes and object categories. We evaluate the proposed approach
on the widely used KITTI benchmark, where it achieves state-of-the-art
performance on the task of motion detection.
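The abstract does not say how its motion detector is implemented. As a rough illustration of the elementary motion detector theory it cites, the following is a minimal sketch of the classic Hassenstein-Reichardt correlator on two neighbouring 1D receptor signals. The function name `reichardt_emd`, the `delay` parameter, and the toy signals are assumptions made for this example; this is not the authors' point-cloud method.

```python
def reichardt_emd(left, right, delay=1):
    """Hassenstein-Reichardt correlator for two neighbouring receptor signals.

    Illustrative sketch only: each signal is delayed by `delay` samples,
    multiplied with the undelayed neighbour, and the two mirror-symmetric
    half-detectors are subtracted. Positive output indicates left-to-right
    motion, negative output right-to-left motion.
    """
    if delay < 1:
        raise ValueError("delay must be at least 1 sample")
    # Delayed copies, zero-padded at the start so lengths are preserved.
    left_d = [0.0] * delay + [float(v) for v in left[:-delay]]
    right_d = [0.0] * delay + [float(v) for v in right[:-delay]]
    # Opponent subtraction of the two half-detectors.
    return [ld * r - rd * l for ld, r, rd, l in zip(left_d, right, right_d, left)]


# A pulse reaches the left receptor at t=2 and the right receptor at t=3.
left_signal = [0, 0, 1, 0, 0, 0]
right_signal = [0, 0, 0, 1, 0, 0]

rightward = sum(reichardt_emd(left_signal, right_signal))
leftward = sum(reichardt_emd(right_signal, left_signal))
static = sum(reichardt_emd([1, 1, 1, 1], [1, 1, 1, 1]))
```

With these toy signals the summed response is positive for the left-to-right pulse, negative when the two inputs are swapped, and zero for a constant (static) input, which is the background-suppression property the abstract alludes to.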
3D Pose Estimation for Fine-Grained Object Categories
Existing object pose estimation datasets are related to generic object types
and there is so far no dataset for fine-grained object categories. In this
work, we introduce a new large dataset to benchmark pose estimation for
fine-grained objects, made possible by the recent availability of both 2D and
3D fine-grained data. Specifically, we augment two popular fine-grained recognition
datasets (StanfordCars and CompCars) by finding a fine-grained 3D CAD model for
each sub-category and manually annotating each object in images with 3D pose.
We show that, with enough training data, a full perspective model with
continuous parameters can be estimated using 2D appearance information alone.
We achieve this via a framework based on Faster/Mask R-CNN. This goes beyond
previous works on category-level pose estimation, which only estimate
discrete/continuous viewpoint angles or recover rotation matrices often with
the help of key points. Furthermore, with fine-grained 3D models available, we
incorporate a dense 3D representation called a location field into the
CNN-based pose estimation framework to further improve performance. The new
dataset is available at www.umiacs.umd.edu/~wym/3dpose.html
Comment: 4th International Workshop on Recovering 6D Object Pose (ECCVW 2018).
arXiv admin note: text overlap with arXiv:1810.0926
Unsupervised Feature Learning Toward a Real-time Vehicle Make and Model Recognition
Vehicle Make and Model Recognition (MMR) systems provide a fully automatic
framework to recognize and classify different vehicle models. Several
approaches have been proposed to address this challenge; however, they
perform only in restricted conditions. Here, we formulate vehicle make and model
recognition as a fine-grained classification problem and propose a new
configurable on-road vehicle make and model recognition framework. We benefit
from unsupervised feature learning methods; in more detail, we employ the
Locality-constrained Linear Coding (LLC) method as a fast feature encoder for
encoding the input SIFT features. The proposed method can perform in real
environments under different conditions. This framework can recognize fifty models
of vehicles and has the advantage of classifying every other vehicle not belonging
to one of the specified fifty classes as an unknown vehicle. The proposed MMR
framework can be configured to become faster or more accurate based on the
application domain. The proposed approach is evaluated on two datasets: an
Iranian on-road vehicle dataset and the CompuCar dataset. The Iranian on-road
vehicle dataset contains images of 50 models of vehicles captured in real
situations by traffic cameras in different weather and lighting conditions.
Experimental results show the superiority of the proposed framework over
state-of-the-art methods on the Iranian on-road vehicle dataset and comparable
results on the CompuCar dataset, with 97.5% and 98.4% accuracy, respectively.
Comment: 15 pages including 14 figures and 5 tables
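The abstract states that vehicles outside the fifty known classes are labeled as unknown, but it does not say how. One common way to obtain this behaviour is to reject predictions whose best score falls below a threshold. The sketch below assumes that approach; `classify_with_reject`, the threshold value, and the example labels and scores are all hypothetical and not taken from the paper.

```python
def classify_with_reject(scores, labels, threshold=0.5):
    """Return the best-scoring label, or "unknown" if confidence is too low.

    Illustrative open-set rule (an assumption, not the paper's method):
    `scores` holds one classifier score per known class, aligned with `labels`.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal length")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best] if scores[best] >= threshold else "unknown"


# Hypothetical class names and scores, for illustration only.
known_models = ["model_a", "model_b", "model_c"]
confident = classify_with_reject([0.1, 0.8, 0.1], known_models)
uncertain = classify_with_reject([0.35, 0.33, 0.32], known_models)
```

Raising the threshold rejects more inputs as unknown at the cost of rejecting some known vehicles, so in practice it would be tuned on held-out data.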
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
The paper presents futuristic challenges discussed in the cvpaper.challenge. In
2015 and 2016, we thoroughly studied 1,600+ papers in several
conferences/journals such as CVPR/ICCV/ECCV/NIPS/PAMI/IJCV.
Cascaded Models for Better Fine-Grained Named Entity Recognition
Named Entity Recognition (NER) is an essential precursor task for many
natural language applications, such as relation extraction or event extraction.
Much of the NER research has been done on datasets with few classes of entity
types (e.g. PER, LOC, ORG, MISC), but many real world applications (disaster
relief, complex event extraction, law enforcement) can benefit from a larger
NER typeset. More recently, datasets were created that have hundreds to
thousands of types of entities, sparking new lines of research (Sekine,
2008; Ling and Weld, 2012; Gillick et al., 2014; Choi et al., 2018). In this
paper we present a cascaded approach to labeling fine-grained NER, applied to
a newly released fine-grained NER dataset that was used in the TAC KBP 2019
evaluation (Ji et al., 2019), inspired by the fact that training data is
available for some of the coarse labels. Using a combination of transformer
networks, we show that performance can be improved by about 20 F1 absolute, as
compared with the straightforward model built on the full fine-grained types,
and show that, surprisingly, using coarse-labeled data in three languages leads
to an improvement on the English data.
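To make the cascading idea concrete, here is a minimal sketch in which a coarse tagger runs first and its output selects a type-specific fine-grained classifier. The abstract uses transformer networks; the lookup-based stand-in models, the function `cascade_label`, and the fine type names below are hypothetical placeholders chosen only to show the control flow.

```python
def cascade_label(mention, coarse_model, fine_models):
    """Label a mention in two stages: coarse type first, then a fine type.

    Illustrative sketch: `coarse_model` maps a mention to a coarse label, and
    `fine_models` maps each coarse label to a classifier restricted to that
    label's subtypes. If no fine model exists, the coarse label is kept.
    """
    coarse = coarse_model(mention)
    fine_model = fine_models.get(coarse)
    fine = fine_model(mention) if fine_model is not None else coarse
    return coarse, fine


# Toy stand-ins for trained models (hypothetical, not from the paper).
def toy_coarse(mention):
    return "LOC" if mention.endswith("River") else "PER"

toy_fine = {
    "LOC": lambda m: "LOC.River",       # hypothetical fine type
    # No fine model registered for PER: the cascade falls back to "PER".
}

river = cascade_label("Hudson River", toy_coarse, toy_fine)
person = cascade_label("Ada Lovelace", toy_coarse, toy_fine)
```

One appeal of this design is that the first stage can be trained on the more plentiful coarse-labeled data the abstract mentions, while each second-stage model only has to discriminate among a small set of subtypes.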
FAIR1M: A Benchmark Dataset for Fine-grained Object Recognition in High-Resolution Remote Sensing Imagery
With the rapid development of deep learning, many deep learning-based
approaches have achieved great success in object detection tasks. It is
generally known that deep learning is a data-driven method. Data directly
impact the performance of object detectors to some extent. Although existing
datasets have included common objects in remote sensing images, they still have
some limitations in terms of scale, categories, and images. Therefore, there is
a strong requirement for establishing a large-scale benchmark on object
detection in high-resolution remote sensing images. In this paper, we propose a
novel benchmark dataset with more than 1 million instances and more than 15,000
images for Fine-grAined object recognItion in high-Resolution remote sensing
imagery, named FAIR1M. All objects in the FAIR1M dataset are
annotated with respect to 5 categories and 37 sub-categories by oriented
bounding boxes. Compared with existing detection datasets dedicated to object
detection, the FAIR1M dataset has 4 particular characteristics: (1) it is much
larger than other existing object detection datasets both in terms of the
quantity of instances and the quantity of images, (2) it provides richer
fine-grained category information for objects in remote sensing images, (3) it
contains geographic information such as latitude, longitude and resolution, (4)
it provides better image quality owing to a careful data cleaning procedure. To
establish a baseline for fine-grained object recognition, we propose a novel
evaluation method and benchmark fine-grained object detection tasks and a
visual classification task using several State-Of-The-Art (SOTA) deep
learning-based models on our FAIR1M dataset. Experimental results strongly
indicate that the FAIR1M dataset is closer to practical application and
considerably more challenging than existing datasets.
Comment: 19 pages, 13 figures
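For readers unfamiliar with oriented bounding boxes, the sketch below converts one common parameterization (center, width, height, rotation angle) into four corner points. This parameterization and the function `obb_corners` are assumptions made for illustration; the abstract does not specify how FAIR1M stores its boxes.

```python
import math


def obb_corners(cx, cy, width, height, angle):
    """Return the four corners of an oriented box as (x, y) tuples.

    Illustrative sketch: the box is centered at (cx, cy) and rotated
    counter-clockwise by `angle` radians. Corners start at the unrotated
    bottom-left corner and proceed counter-clockwise.
    """
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    half_w, half_h = width / 2.0, height / 2.0
    offsets = [(-half_w, -half_h), (half_w, -half_h),
               (half_w, half_h), (-half_w, half_h)]
    # Rotate each offset about the origin, then translate to the center.
    return [(cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
            for dx, dy in offsets]


axis_aligned = obb_corners(0.0, 0.0, 4.0, 2.0, 0.0)
quarter_turn = obb_corners(0.0, 0.0, 4.0, 2.0, math.pi / 2)
```

Unlike an axis-aligned box, this representation fits elongated objects at arbitrary headings tightly, which matters for objects seen from overhead in remote sensing imagery.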
City-Scale Road Audit System using Deep Learning
Road networks in cities are massive and are a critical component of mobility.
Fast response to defects, which can occur not only due to regular wear and tear
but also because of extreme events like storms, is essential. Hence there is a
need for an automated system that is quick, scalable and cost-effective for
gathering information about defects. We propose a system for city-scale road
audit, using some of the most recent developments in deep learning and semantic
segmentation. For building and benchmarking the system, we curated a dataset
which has annotations required for road defects. However, many of the labels
required for road audit have high ambiguity which we overcome by proposing a
label hierarchy. We also propose a multi-step deep learning model that segments
the road, subdivides the road further into defects, tags the frame for each
defect, and finally localizes the defects on a map gathered using GPS. We
analyze and evaluate the models on image tagging as well as segmentation at
different levels of the label hierarchy.
Comment: IROS'1
AI Oriented Large-Scale Video Management for Smart City: Technologies, Standards and Beyond
Deep learning has achieved substantial success in a series of tasks in
computer vision. Intelligent video analysis, which can be broadly applied to
video surveillance in various smart city applications, can also be driven by
such powerful deep learning engines. To practically facilitate deep neural
network models in the large-scale video analysis, there are still unprecedented
challenges for the large-scale video data management. Deep feature coding,
instead of video coding, provides a practical solution for handling the
large-scale video surveillance data. To enable interoperability in the context
of deep feature coding, standardization is urgent and important. However, due
to the explosion of deep learning algorithms and the particularity of feature
coding, there are numerous remaining problems in the standardization process.
This paper envisions the future deep feature coding standard for the AI
oriented large-scale video management, and discusses existing techniques,
standards and possible solutions for these open problems.
Comment: 8 pages, 8 figures, 5 tables
cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey
The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki
Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers
on computer vision, pattern recognition, and related fields. For this
particular review, we focused on reading all 602 conference papers
presented at CVPR2015, the premier annual computer vision event held in
June 2015, in order to grasp the trends in the field. Further, we propose
"DeepSurvey" as a mechanism embodying the entire process, from reading
through all the papers and generating ideas to writing a paper.
Comment: Survey Paper