On Semi-Supervised Multiple Representation Behavior Learning
We propose a novel paradigm of semi-supervised learning (SSL): semi-supervised
multiple representation behavior learning (SSMRBL). SSMRBL aims to tackle the
difficulty of learning a grammar for natural language parsing, where the data
are natural-language texts and the 'labels' marking the data are parsing trees
and/or grammar rule pieces. We call such 'labels' compound structured labels;
producing them for training is laborious. SSMRBL is an incremental learning
process that can learn more than one representation, which makes it well
suited both to the scarcity of labeled training data in the age of big data
and to the heavy workload of learning compound structured labels. We also
present a typical instance of SSMRBL: behavior learning in the form of a
grammatical approach to domain-based multiple text summarization (DBMTS).
DBMTS works under the framework of rhetorical structure theory (RST). SSMRBL
includes two representations: a text embedding (representing the information
contained in the texts) and a grammar model (representing parsing as a
behavior). The first representation is learned as embedded numerical vectors,
called impacts, in a low-dimensional space. The grammar model is learned
iteratively. An automatic domain-oriented multi-text summarization approach is
then built on these two representations. Experimental results on the
large-scale Chinese dataset SogouCA indicate that the proposed method achieves
good performance with respect to our defined automated metrics, even when only
a few labeled texts are used for training.

Comment: 18 pages, 7 figures
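To make the incremental-learning idea concrete, here is a minimal,
hypothetical sketch of an SSMRBL-style self-training loop. It is not the
paper's actual algorithm: the grammar model is replaced by a toy per-token
rule table, the `threshold` and pseudo-labeling policy are assumptions, and
the fixed embedding representation is omitted so the sketch focuses only on
the iteratively refined representation.

```python
# Hypothetical sketch of an SSMRBL-style iterative loop (not the paper's
# method): start from a few labeled texts, then repeatedly retrain a toy
# "grammar model" and absorb confident pseudo-labels from unlabeled texts.
from collections import Counter

def train_rules(labeled):
    """Stand-in 'grammar model': per-token label counts."""
    counts = {}
    for text, label in labeled:
        for tok in text.split():
            counts.setdefault(tok, Counter())[label] += 1
    return counts

def predict(rules, text):
    """Score labels by summed token votes; return (label, confidence)."""
    votes = Counter()
    for tok in text.split():
        votes.update(rules.get(tok, Counter()))
    if not votes:
        return None, 0.0
    label, n = votes.most_common(1)[0]
    return label, n / sum(votes.values())

def ssmrbl_loop(labeled, unlabeled, rounds=3, threshold=0.8):
    """Incrementally grow the labeled set with confident pseudo-labels."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        rules = train_rules(labeled)
        deferred = []
        for text in pool:
            label, conf = predict(rules, text)
            if label is not None and conf >= threshold:
                labeled.append((text, label))  # accept pseudo-label
            else:
                deferred.append(text)          # retry in a later round
        pool = deferred
    return train_rules(labeled)
```

For example, seeding the loop with two labeled texts and two unlabeled ones
lets previously unseen tokens (such as "rally" below) acquire a label through
the texts they co-occur with:

```python
rules = ssmrbl_loop(
    [("economy market stock", "finance"), ("match goal team", "sport")],
    ["stock market rally", "team wins match"],
)
predict(rules, "rally")  # now votes "finance" via its pseudo-labeled context
```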