53,476 research outputs found
Fighting with the Sparsity of Synonymy Dictionaries
Graph-based synset induction methods, such as MaxMax and Watset, induce
synsets by performing a global clustering of a synonymy graph. However, such
methods are sensitive to the structure of the input synonymy graph: sparseness
of the input dictionary can substantially reduce the quality of the extracted
synsets. In this paper, we propose two different approaches designed to
alleviate the incompleteness of the input dictionaries. The first one performs
a pre-processing of the graph by adding missing edges, while the second one
performs a post-processing by merging similar synset clusters. We evaluate
these approaches on two datasets for the Russian language and discuss their
impact on the performance of synset induction methods. Finally, we perform an
extensive error analysis of each approach and discuss prominent alternative
methods for coping with the problem of the sparsity of the synonymy
dictionaries.Comment: In Proceedings of the 6th Conference on Analysis of Images, Social
Networks, and Texts (AIST'2017): Springer Lecture Notes in Computer Science
(LNCS
Abstractive Multi-Document Summarization via Phrase Selection and Merging
We propose an abstraction-based multi-document summarization framework that
can construct new sentences by exploring more fine-grained syntactic units than
sentences, namely, noun/verb phrases. Different from existing abstraction-based
approaches, our method first constructs a pool of concepts and facts
represented by phrases from the input documents. Then new sentences are
generated by selecting and merging informative phrases to maximize the salience
of phrases and meanwhile satisfy the sentence construction constraints. We
employ integer linear optimization for conducting phrase selection and merging
simultaneously in order to achieve the global optimal solution for a summary.
Experimental results on the benchmark data set TAC 2011 show that our framework
outperforms the state-of-the-art models under automated pyramid evaluation
metric, and achieves reasonably well results on manual linguistic quality
evaluation.Comment: 11 pages, 1 figure, accepted as a full paper at ACL 201
Lattice score based data cleaning for phrase-based statistical machine translation
Statistical machine translation relies heavily
on parallel corpora to train its models
for translation tasks. While more and
more bilingual corpora are readily available,
the quality of the sentence pairs
should be taken into consideration. This
paper presents a novel lattice score-based
data cleaning method to select proper sentence
pairs from the ones extracted from a
bilingual corpus by the sentence alignment
methods. The proposed method is carried
out as follows: firstly, an initial phrasebased
model is trained on the full sentencealigned
corpus; then for each of the sentence
pairs in the corpus, word alignments
are used to create anchor pairs and sourceside
lattices; thirdly, based on the translation
model, target-side phrase networks
are expanded on the lattices and Viterbi
searching is used to find approximated decoding
results; finally, BLEU score thresholds
are used to filter out the low-score
sentence pairs for the data cleaning purpose.
Our experiments on the FBIS corpus
showed improvements of BLEU score
from 23.78 to 24.02 in Chinese-English
3D Registration of Aerial and Ground Robots for Disaster Response: An Evaluation of Features, Descriptors, and Transformation Estimation
Global registration of heterogeneous ground and aerial mapping data is a
challenging task. This is especially difficult in disaster response scenarios
when we have no prior information on the environment and cannot assume the
regular order of man-made environments or meaningful semantic cues. In this
work we extensively evaluate different approaches to globally register UGV
generated 3D point-cloud data from LiDAR sensors with UAV generated point-cloud
maps from vision sensors. The approaches are realizations of different
selections for: a) local features: key-points or segments; b) descriptors:
FPFH, SHOT, or ESF; and c) transformation estimations: RANSAC or FGR.
Additionally, we compare the results against standard approaches like applying
ICP after a good prior transformation has been given. The evaluation criteria
include the distance which a UGV needs to travel to successfully localize, the
registration error, and the computational cost. In this context, we report our
findings on effectively performing the task on two new Search and Rescue
datasets. Our results have the potential to help the community take informed
decisions when registering point-cloud maps from ground robots to those from
aerial robots.Comment: Awarded Best Paper at the 15th IEEE International Symposium on
Safety, Security, and Rescue Robotics 2017 (SSRR 2017
- …