We propose a multi-view network for text classification. Our method
automatically creates various views of its input text, each taking the form of
soft attention weights that distribute the classifier's focus among a set of
base features. For a bag-of-words representation, each view focuses on a
different subset of the text's words. Aggregating many such views results in a
more discriminative and robust representation. Through a novel architecture
that both stacks and concatenates views, we produce a network that emphasizes
both depth and width, allowing training to converge quickly. Using our
multi-view architecture, we establish new state-of-the-art accuracies on two
benchmark tasks.Comment: 6 page