Scene Parsing via Dense Recurrent Neural Networks with Attentional Selection
Recurrent neural networks (RNNs) have shown the ability to improve scene
parsing through capturing long-range dependencies among image units. In this
paper, we propose dense RNNs for scene labeling by exploring various long-range
semantic dependencies among image units. Different from existing RNN based
approaches, our dense RNNs are able to capture richer contextual dependencies
for each image unit by enabling immediate connections between each pair of
image units, which significantly enhances their discriminative power. In addition,
to select relevant dependencies and suppress irrelevant ones for
each unit among the dense connections, we introduce an attention model into dense
RNNs. The attention model automatically assigns more importance to
helpful dependencies and less weight to irrelevant ones. Integrating
with convolutional neural networks (CNNs), we develop an end-to-end scene
labeling system. Extensive experiments on three large-scale benchmarks
demonstrate that the proposed approach can improve the baselines by large
margins and outperform other state-of-the-art algorithms.
Comment: 10 pages. arXiv admin note: substantial text overlap with
arXiv:1801.0683
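The attention step described in this abstract (more weight on helpful dependencies, less on irrelevant ones) can be illustrated with a generic softmax weighting. This is a minimal sketch under stated assumptions, not the paper's model: the names `attention_weights` and `attend`, the raw relevance scores, and the toy feature vectors are all hypothetical.

```python
import math


def attention_weights(scores):
    """Turn raw relevance scores into weights that sum to 1 (stable softmax)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(scores, dependency_features):
    """Weighted sum of the dependency feature vectors for one image unit."""
    weights = attention_weights(scores)
    dim = len(dependency_features[0])
    return [
        sum(w * feat[d] for w, feat in zip(weights, dependency_features))
        for d in range(dim)
    ]


# Hypothetical example: one image unit with three candidate dependencies.
scores = [2.0, 0.0, -1.0]                      # assumed relevance scores
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # assumed feature vectors
weights = attention_weights(scores)
context = attend(scores, features)
```

Higher-scoring dependencies dominate the resulting context vector, while low-scoring ones are damped rather than removed outright.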
Multi-level Contextual RNNs with Attention Model for Scene Labeling
Context is crucial for scene labeling, yet existing methods exploit only
local context generated from a small area surrounding an image patch
or a pixel, while long-range and global contextual information is
ignored. To address this issue, we propose a novel approach for
scene labeling by exploring multi-level contextual recurrent neural networks
(ML-CRNNs). Specifically, we encode three kinds of contextual cues, i.e., local
context, global context and image topic context in structural recurrent neural
networks (RNNs) to model long-range local and global dependencies in the image. In
this way, our method is able to "see" the image in terms of both long-range
local and holistic views, and make a more reliable inference for image
labeling. Besides, we integrate the proposed contextual RNNs into hierarchical
convolutional neural networks (CNNs), and exploit dependence relationships at
multiple levels to provide rich spatial and semantic information. Moreover, we
adopt an attention model to effectively merge multiple levels and show
that it outperforms average- or max-pooling fusion strategies. Extensive
experiments demonstrate that the proposed approach achieves new
state-of-the-art results on the CamVid, SiftFlow and Stanford-background
datasets.
Comment: 8 pages, 8 figures
Data-Driven Shape Analysis and Processing
Data-driven methods play an increasingly important role in discovering
geometric, structural, and semantic relationships between 3D shapes in
collections, and applying this analysis to support intelligent modeling,
editing, and visualization of geometric data. In contrast to traditional
approaches, a key feature of data-driven approaches is that they aggregate
information from a collection of shapes to improve the analysis and processing
of individual shapes. In addition, they are able to learn models that reason
about properties and relationships of shapes without relying on hard-coded
rules or explicitly programmed instructions. We provide an overview of the main
concepts and components of these techniques, and discuss their application to
shape classification, segmentation, matching, reconstruction, modeling and
exploration, as well as scene analysis and synthesis, through reviewing the
literature and relating the existing works with both qualitative and numerical
comparisons. We conclude our report with ideas that can inspire future research
in data-driven shape analysis and processing.
Comment: 10 pages, 19 figures
Dense Recurrent Neural Networks for Scene Labeling
Recently, recurrent neural networks (RNNs) have demonstrated the ability to
improve scene labeling through capturing long-range dependencies among image
units. In this paper, we propose dense RNNs for scene labeling by exploring
various long-range semantic dependencies among image units. In comparison with
existing RNN based approaches, our dense RNNs are able to capture richer
contextual dependencies for each image unit via dense connections between each
pair of image units, which significantly enhances their discriminative power.
In addition, to select relevant dependencies and suppress irrelevant ones for
each unit among the dense connections, we introduce an attention model into dense
RNNs. The attention model automatically assigns more importance to
helpful dependencies and less weight to irrelevant ones. Integrated
with convolutional neural networks (CNNs), our method achieves state-of-the-art
performance on the PASCAL Context, MIT ADE20K and SiftFlow benchmarks.
Comment: Tech. Report
Cross-Sentence N-ary Relation Extraction with Graph LSTMs
Past work in relation extraction has focused on binary relations in single
sentences. Recent NLP inroads in high-value domains have sparked interest in
the more general setting of extracting n-ary relations that span multiple
sentences. In this paper, we explore a general relation extraction framework
based on graph long short-term memory networks (graph LSTMs) that can be easily
extended to cross-sentence n-ary relation extraction. The graph formulation
provides a unified way of exploring different LSTM approaches and incorporating
various intra-sentential and inter-sentential dependencies, such as sequential,
syntactic, and discourse relations. A robust contextual representation is
learned for the entities, which serves as input to the relation classifier.
This simplifies handling of relations with arbitrary arity, and enables
multi-task learning with related relations. We evaluate this framework in two
important precision medicine settings, demonstrating its effectiveness with
both conventional supervised learning and distant supervision. Cross-sentence
extraction produced larger knowledge bases, and multi-task learning
significantly improved extraction accuracy. A thorough analysis of various LSTM
approaches yielded useful insight into the impact of linguistic analysis on
extraction accuracy.
Comment: Conditionally accepted by TACL in December 2016; published in April
2017; presented at ACL in August 201
Beneath the Tip of the Iceberg: Current Challenges and New Directions in Sentiment Analysis Research
Sentiment analysis as a field has come a long way since it was first
introduced as a task nearly 20 years ago. It has widespread commercial
applications in various domains like marketing, risk management, market
research, and politics, to name a few. Given its saturation in specific
subtasks -- such as sentiment polarity classification -- and datasets, there is
an underlying perception that this field has reached its maturity. In this
article, we discuss this perception by pointing out the shortcomings and
under-explored, yet key aspects of this field that are necessary to attain true
sentiment understanding. We analyze the significant leaps responsible for its
current relevance. Further, we attempt to chart a possible course for this
field that covers many overlooked and unanswered questions.
Comment: Published in the IEEE Transactions on Affective Computing (TAFFC)
Parsing Geometry Using Structure-Aware Shape Templates
Real-life man-made objects often exhibit strong and easily-identifiable
structure, as a direct result of their design or their intended functionality.
Structure typically appears in the form of individual parts and their
arrangement. Knowing about object structure can be an important cue for object
recognition and scene understanding - a key goal for various AR and robotics
applications. However, commodity RGB-D sensors used in these scenarios only
produce raw, unorganized point clouds, without structural information about the
captured scene. Moreover, the generated data is commonly partial and
susceptible to artifacts and noise, which makes inferring the structure of
scanned objects challenging. In this paper, we organize large shape collections
into parameterized shape templates to capture the underlying structure of the
objects. The templates allow us to transfer the structural information onto new
objects and incomplete scans. We employ a deep neural network that matches the
partial scan with one of the shape templates, then match and fit it to complete
and detailed models from the collection. This allows us to faithfully label its
parts and to guide the reconstruction of the scanned object. We showcase the
effectiveness of our method by comparing it to other state-of-the-art
approaches.
SEGCloud: Semantic Segmentation of 3D Point Clouds
3D semantic scene labeling is fundamental to agents operating in the real
world. In particular, labeling raw 3D point sets from sensors provides
fine-grained semantics. Recent works leverage the capabilities of Neural
Networks (NNs), but are limited to coarse voxel predictions and do not
explicitly enforce global consistency. We present SEGCloud, an end-to-end
framework to obtain 3D point-level segmentation that combines the advantages of
NNs, trilinear interpolation (TI) and fully connected Conditional Random Fields
(FC-CRF). Coarse voxel predictions from a 3D Fully Convolutional NN are
transferred back to the raw 3D points via trilinear interpolation. Then the
FC-CRF enforces global consistency and provides fine-grained semantics on the
points. We implement the latter as a differentiable Recurrent NN to allow joint
optimization. We evaluate the framework on two indoor and two outdoor 3D
datasets (NYU V2, S3DIS, KITTI, Semantic3D.net), and show performance
comparable or superior to the state-of-the-art on all datasets.
Comment: Accepted as a spotlight at the International Conference of 3D Vision
(3DV 2017)
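The abstract's step of transferring coarse voxel predictions back to raw 3D points relies on trilinear interpolation. The sketch below shows the standard formula for a single scalar score per voxel corner. It is not SEGCloud's implementation: the function name `trilinear_interpolate`, the nested-list grid layout, and the clamping at the grid border are assumptions made for illustration.

```python
import math


def trilinear_interpolate(grid, x, y, z):
    """Interpolate a scalar stored at integer grid corners.

    grid[i][j][k] is the value at corner (i, j, k); (x, y, z) is a point in
    continuous grid coordinates. The grid needs at least 2 samples per axis.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    # Clamp the base corner so the 2x2x2 neighborhood stays inside the grid.
    x0 = min(max(int(math.floor(x)), 0), nx - 2)
    y0 = min(max(int(math.floor(y)), 0), ny - 2)
    z0 = min(max(int(math.floor(z)), 0), nz - 2)
    tx, ty, tz = x - x0, y - y0, z - z0

    value = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                # Each corner's weight is the product of per-axis linear weights.
                w = ((tx if dx else 1.0 - tx)
                     * (ty if dy else 1.0 - ty)
                     * (tz if dz else 1.0 - tz))
                value += w * grid[x0 + dx][y0 + dy][z0 + dz]
    return value


# Hypothetical 2x2x2 grid holding the linear field f = i + 2j + 3k.
grid = [[[i + 2 * j + 3 * k for k in range(2)] for j in range(2)] for i in range(2)]
center_value = trilinear_interpolate(grid, 0.5, 0.5, 0.5)
```

For per-class predictions, the same weights would be applied to each class score independently, so a point inherits a blend of the eight surrounding voxel predictions.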
Global Relation Embedding for Relation Extraction
We study the problem of textual relation embedding with distant supervision.
To combat the wrong labeling problem of distant supervision, we propose to
embed textual relations with global statistics of relations, i.e., the
co-occurrence statistics of textual and knowledge base relations collected from
the entire corpus. This approach turns out to be more robust to the training
noise introduced by distant supervision. On a popular relation extraction
dataset, we show that the learned textual relation embedding can be used to
augment existing relation extraction models and significantly improve their
performance. Most remarkably, for the top 1,000 relational facts discovered by
the best existing model, the precision can be improved from 83.9% to 89.3%.
Comment: Accepted to NAACL HLT 201
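The reported figure for the top 1,000 facts reads as a precision-at-k measurement. As a rough illustration of that metric only, here is a minimal sketch: the function name `precision_at_k`, the triple format, and the toy gold set are hypothetical and are not taken from the paper's evaluation code.

```python
def precision_at_k(ranked_facts, gold_facts, k):
    """Fraction of the top-k ranked facts that appear in the gold set."""
    top = ranked_facts[:k]
    if not top:
        return 0.0
    correct = sum(1 for fact in top if fact in gold_facts)
    return correct / len(top)


# Hypothetical extracted facts, sorted by model confidence (highest first).
ranked = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "causes", "headache"),
    ("ibuprofen", "treats", "fever"),
    ("ibuprofen", "treats", "fracture"),
]
# Hypothetical set of facts judged correct.
gold = {
    ("aspirin", "treats", "headache"),
    ("ibuprofen", "treats", "fever"),
}
p_at_3 = precision_at_k(ranked, gold, 3)
```

Comparing two models then amounts to ranking each model's extractions by confidence and evaluating both at the same cutoff k.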
Towards Building a Knowledge Base of Monetary Transactions from a News Collection
We address the problem of extracting structured representations of economic
events from a large corpus of news articles, using a combination of natural
language processing and machine learning techniques. The developed techniques
allow for semi-automatic population of a financial knowledge base, which, in
turn, may be used to support a range of data mining and exploration tasks. The
key challenge we face in this domain is that the same event is often reported
multiple times, with varying correctness of details. We address this challenge
by first collecting all information pertinent to a given event from the entire
corpus, then considering all possible representations of the event, and
finally using a supervised learning method to rank these representations by
their associated confidence scores. A main innovative element of our approach is
that it jointly extracts and stores all attributes of the event as a single
representation (quintuple). Using a purpose-built test set, we demonstrate that
our supervised learning approach can achieve a 25% improvement in F1-score over
baseline methods that consider the earliest, the latest or the most frequent
reporting of the event.
Comment: Proceedings of the 17th ACM/IEEE-CS Joint Conference on Digital
Libraries (JCDL '17), 201