Weakly supervised aspect extraction for domain-specific texts

Abstract

Aspect extraction, identifying the aspects of text segments from a pre-defined set of aspects, is a cornerstone of text understanding. It benefits numerous applications, including sentiment analysis and product review summarization. Most existing aspect extraction methods rely heavily on human-curated aspect annotations of massive text segments, making them expensive to apply in specific domains. Recent attempts leveraging clustering methods can alleviate this annotation effort, but they require domain-specific knowledge and effort to further filter, aggregate, and align the clustering results to the desired aspects. Therefore, in this paper, we explore extracting aspects from domain-specific raw texts with very limited supervision: only a few user-provided seed words per aspect. Specifically, our proposed neural model is equipped with multi-head attention and self-training. The multi-head attention is learned from the seed words to ensure that aspect-related words in text segments are weighted more heavily than unrelated ones. The self-training mechanism augments the limited supervision with pseudo labels. Extensive experiments on real-world datasets demonstrate the superior performance of our proposed framework, as well as the effectiveness of both the attention module and the self-training mechanism. Case studies on the attention weights further shed light on the interpretability of our aspect extraction results.
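
To make the two components concrete, below is a minimal PyTorch sketch of seed-guided multi-head attention plus a self-training step. All names and choices here (SeedGuidedAttention, self_training_step, averaging seed-word embeddings into one query per aspect, the 0.9 confidence threshold) are illustrative assumptions for this sketch, not the authors' actual implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SeedGuidedAttention(nn.Module):
        # Multi-head attention whose queries come from per-aspect seed-word
        # embeddings, so aspect-related tokens receive higher attention weights.
        def __init__(self, embed_dim, num_heads, seed_embs):
            super().__init__()
            # seed_embs: (num_aspects, embed_dim), e.g. the mean embedding of
            # each aspect's seed words (an assumption made for this sketch).
            self.register_buffer("seed_embs", seed_embs)
            self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

        def forward(self, token_embs):
            # token_embs: (batch, seq_len, embed_dim)
            batch = token_embs.size(0)
            # One query per aspect, shared across the batch.
            queries = self.seed_embs.unsqueeze(0).expand(batch, -1, -1)
            # Attend over the segment's tokens; `weights` exposes which tokens
            # each aspect focuses on, supporting attention-based case studies.
            ctx, weights = self.attn(queries, token_embs, token_embs)
            # Score aspect a by the dot product between its attended context
            # vector and its seed embedding.
            logits = (ctx * self.seed_embs).sum(dim=-1)  # (batch, num_aspects)
            return logits, weights

    def self_training_step(model, unlabeled_embs, threshold=0.9):
        # Turn confident predictions on unlabeled segments into pseudo labels;
        # the fixed 0.9 threshold is an assumption, not the paper's schedule.
        with torch.no_grad():
            logits, _ = model(unlabeled_embs)
            conf, pseudo = F.softmax(logits, dim=-1).max(dim=-1)
            mask = conf > threshold  # keep only confident pseudo labels
        if not mask.any():
            return torch.zeros((), requires_grad=True)
        logits, _ = model(unlabeled_embs)
        return F.cross_entropy(logits[mask], pseudo[mask])

    # Toy usage: 2 aspects, 8-dim embeddings, 4 segments of 10 tokens each.
    seeds = torch.randn(2, 8)
    model = SeedGuidedAttention(embed_dim=8, num_heads=2, seed_embs=seeds)
    segments = torch.randn(4, 10, 8)
    logits, attn_weights = model(segments)
    loss = self_training_step(model, segments)

One design point worth noting: deriving the attention queries from fixed seed embeddings, rather than learning them freely, is what ties the attention weights to the user-provided aspects and keeps the extraction interpretable.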