Character-level Convolutional Networks for Text Classification
This article offers an empirical exploration on the use of character-level
convolutional networks (ConvNets) for text classification. We constructed
several large-scale datasets to show that character-level convolutional
networks could achieve state-of-the-art or competitive results. Comparisons are
offered against traditional models such as bag of words, n-grams and their
TFIDF variants, and deep learning models such as word-based ConvNets and
recurrent neural networks.
Comment: An early version of this work entitled "Text Understanding from
Scratch" was posted in Feb 2015 as arXiv:1502.01710. The present paper has
considerably more experimental results and a rewritten introduction. Advances
in Neural Information Processing Systems 28 (NIPS 2015)
TGSum: Build Tweet Guided Multi-Document Summarization Dataset
The development of summarization research has been significantly hampered by
the costly acquisition of reference summaries. This paper proposes an effective
way to automatically collect large-scale news-related multi-document
summaries with reference to social media's reactions. We utilize two types of
social labels in tweets, i.e., hashtags and hyper-links. Hashtags are used to
cluster documents into different topic sets. Also, a tweet with a hyper-link
often highlights certain key points of the corresponding document. We
synthesize a linked document cluster to form a reference summary which can
cover most key points. To this aim, we adopt the ROUGE metrics to measure the
coverage ratio, and develop an Integer Linear Programming solution to discover
the sentence set reaching the upper bound of ROUGE. Since we allow summary
sentences to be selected from both documents and high-quality tweets, the
generated reference summaries could be abstractive. Both informativeness and
readability of the collected summaries are verified by manual judgment. In
addition, we train a Support Vector Regression summarizer on DUC generic
multi-document summarization benchmarks. With the collected data as an extra
training resource, the performance of the summarizer improves substantially on
all the test sets. We release this dataset for further research.
Comment: 7 pages, 1 figure in AAAI 201
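The coverage idea in the TGSum abstract above can be illustrated with a small sketch. It assumes the linked tweets play the role of the ROUGE reference and uses a simplified ROUGE-1-style unigram recall. The abstract describes an Integer Linear Programming solution; the greedy loop below is a simpler stand-in, not the authors' method, and the names `unigram_recall` and `greedy_select` are hypothetical.

```python
from collections import Counter


def unigram_recall(summary_sents, reference_texts):
    """Simplified ROUGE-1-style recall: share of reference unigrams covered."""
    ref = Counter(w for t in reference_texts for w in t.lower().split())
    summ = Counter(w for s in summary_sents for w in s.lower().split())
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clipped overlap: a reference word counts at most as often as it occurs.
    overlap = sum(min(count, summ[w]) for w, count in ref.items())
    return overlap / total


def greedy_select(candidates, reference_texts, max_sentences=2):
    """Greedy stand-in for the ILP: add the sentence that raises recall most."""
    chosen = []
    remaining = list(candidates)
    while remaining and len(chosen) < max_sentences:
        current = unigram_recall(chosen, reference_texts)
        best = max(
            remaining,
            key=lambda s: unigram_recall(chosen + [s], reference_texts),
        )
        # Stop once no remaining sentence improves coverage.
        if unigram_recall(chosen + [best], reference_texts) <= current:
            break
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy data, invented for illustration only.
tweets = ["storm hit coast", "evacuations ordered"]
sentences = [
    "the storm hit the coast",
    "officials ordered evacuations",
    "sports scores were announced",
]
selected = greedy_select(sentences, tweets)
```

A greedy pass does not guarantee reaching the ROUGE upper bound that the ILP formulation targets; it only shows how the coverage ratio can drive sentence selection.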
Sentiment Analysis: State of the Art
We present the state of the art in sentiment analysis, covering the purpose of sentiment analysis, the levels of sentiment analysis, and the processes that could be used to measure polarity and classify labels. Moreover, brief details about some resources for sentiment analysis are included.