Unsupervised Spoken Term Detection with Spoken Queries by Multi-level Acoustic Patterns with Varying Model Granularity
This paper presents a new approach for unsupervised Spoken Term Detection
with spoken queries using multiple sets of acoustic patterns automatically
discovered from the target corpus. The different pattern HMM
configurations (number of states per model, number of distinct models, number of
Gaussians per state) form a three-dimensional model granularity space. The
sets of acoustic patterns discovered at different points suitably
distributed over this three-dimensional space are complementary to one another
and can therefore jointly capture the characteristics of the spoken terms. By
representing the spoken content and spoken query as sequences of acoustic
patterns, a series of approaches for matching the pattern index sequences while
considering the signal variations are developed. In this way, the
on-line computation load is reduced, and the differing signal distributions
caused by different speakers and acoustic conditions are reasonably accounted
for. The results indicate that this approach significantly outperformed the
unsupervised feature-based DTW baseline by 16.16% in mean average precision on
the TIMIT corpus.
Comment: Accepted by ICASSP 201
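As a rough illustration of matching a spoken query against spoken content once both are represented as pattern index sequences, the sketch below finds the cheapest alignment of a query sequence to any contiguous span of a document sequence using edit distance. This is a generic approximate-matching technique chosen for illustration, not necessarily one of the approaches developed in the paper; the function name `best_match_cost`, the unit insertion and deletion costs, and the optional `sub_cost` callback (a stand-in for a pattern-to-pattern distance) are all assumptions.

```python
def best_match_cost(query, document, sub_cost=None):
    """Minimum edit cost of aligning `query` to any contiguous span of `document`.

    Both arguments are sequences of acoustic pattern indices (plain ints here).
    `sub_cost(a, b)` is a hypothetical pattern-to-pattern distance; by default
    identical indices cost 0 and different indices cost 1.
    """
    if sub_cost is None:
        sub_cost = lambda a, b: 0.0 if a == b else 1.0

    # prev[j]: cost of aligning the query prefix processed so far to a span
    # of the document that ends at position j. The empty prefix matches
    # anywhere for free, which lets the span start at any position.
    prev = [0.0] * (len(document) + 1)
    for i, q in enumerate(query, start=1):
        curr = [float(i)] + [0.0] * len(document)
        for j, d in enumerate(document, start=1):
            curr[j] = min(
                prev[j - 1] + sub_cost(q, d),  # match or substitute
                prev[j] + 1.0,                 # query pattern with no counterpart
                curr[j - 1] + 1.0,             # extra document pattern inside the span
            )
        prev = curr
    # The span may end anywhere, so take the best end position.
    return min(prev)


if __name__ == "__main__":
    print(best_match_cost([3, 7, 2], [5, 3, 7, 2, 9]))
    print(best_match_cost([3, 7, 2], [5, 3, 8, 2, 9]))
```

Because the comparison runs over short index sequences rather than frame-level features, the on-line cost depends on the sequence lengths only. A lower cost means a more likely occurrence of the query term.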
Keyphrase Based Evaluation of Automatic Text Summarization
Handling the informative content of text units during matching is a major
challenge for automatic summary evaluation systems that use fixed n-gram
matching. This limitation causes inaccurate matching between units in peer
and reference summaries. The present study introduces KpEval, a new
keyphrase-based summary evaluator for automatic summaries. KpEval relies on
keyphrases because they convey the most important concepts of a text. In the
evaluation process, the keyphrases are used in their lemma form as the matching
text unit. The system was applied to evaluate different summaries of the Arabic
multi-document data set presented at TAC2011. The results showed that the new
evaluation technique correlates well with established evaluation systems:
Rouge1, Rouge2, RougeSU4, and AutoSummENG MeMoG. KpEval correlates most
strongly with AutoSummENG MeMoG, with Pearson and Spearman correlation
coefficients of 0.8840 and 0.9667, respectively.
Comment: 4 pages, 1 figure, 3 tables
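The idea of matching on lemmatized keyphrases and then correlating the resulting scores with another metric can be sketched as follows. The abstract does not give KpEval's scoring formula, so the recall-style score here is an assumption, as are the function names `keyphrase_recall` and `pearson`; lemmatization is assumed to have been done beforehand.

```python
from math import sqrt


def keyphrase_recall(peer_lemmas, reference_keyphrases):
    """Fraction of reference keyphrases found contiguously in the peer summary.

    `peer_lemmas` is the peer summary as a list of lemmas; each reference
    keyphrase is a tuple of lemmas. This recall-style score is an assumed
    stand-in, not the published KpEval formula.
    """
    def contains(tokens, phrase):
        n = len(phrase)
        return any(
            tuple(tokens[i:i + n]) == tuple(phrase)
            for i in range(len(tokens) - n + 1)
        )

    if not reference_keyphrases:
        return 0.0
    hits = sum(contains(peer_lemmas, kp) for kp in reference_keyphrases)
    return hits / len(reference_keyphrases)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    peer = ["the", "summary", "evaluation", "system", "use", "keyphrase"]
    refs = [("summary", "evaluation"), ("keyphrase",), ("discourse", "tree")]
    print(keyphrase_recall(peer, refs))
```

In practice one score would be computed per system summary, and the list of scores would then be correlated with the scores that another evaluator assigns to the same summaries.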
Better Document-level Sentiment Analysis from RST Discourse Parsing
Discourse structure is the hidden link between surface features and
document-level properties, such as sentiment polarity. We show that the
discourse analyses produced by Rhetorical Structure Theory (RST) parsers can
improve document-level sentiment analysis, via composition of local information
up the discourse tree. First, we show that reweighting discourse units
according to their position in a dependency representation of the rhetorical
structure can yield substantial improvements on lexicon-based sentiment
analysis. Next, we present a recursive neural network over the RST structure,
which offers significant improvements over classification-based methods.
Comment: Published at Empirical Methods in Natural Language Processing (EMNLP
2015)
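A minimal sketch of the first idea, reweighting discourse units by their position in a dependency representation before summing lexicon scores, might look like the following. The linear decay with a floor, the parameter names `decay` and `floor`, and the toy lexicon are illustrative assumptions; the abstract does not specify the weighting function.

```python
def document_sentiment(units, lexicon, decay=0.2, floor=0.5):
    """Depth-weighted lexicon score for a document.

    `units` is a list of (tokens, depth) pairs, where `depth` is the distance
    of the discourse unit from the root of the discourse dependency tree.
    Units closer to the root get larger weights. `decay` and `floor` are
    hypothetical parameters chosen for illustration.
    """
    total = 0.0
    for tokens, depth in units:
        weight = max(floor, 1.0 - decay * depth)
        unit_score = sum(lexicon.get(token, 0.0) for token in tokens)
        total += weight * unit_score
    return total


if __name__ == "__main__":
    lexicon = {"great": 1.0, "boring": -1.0}
    units = [(["great", "film"], 0), (["boring", "start"], 3)]
    print(document_sentiment(units, lexicon))
```

With uniform weights the two units in the usage example would cancel out; weighting by depth lets the unit nearest the root dominate the document-level score.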
MORSE: Semantic-ally Drive-n MORpheme SEgment-er
In this paper, we present a novel framework for morpheme segmentation that
uses the morpho-syntactic regularities preserved by word representations, in
addition to orthographic features, to segment words into morphemes. This
framework is the first to consider vocabulary-wide syntactico-semantic
information for this task. We also analyze the deficiencies of available
benchmarking datasets and introduce our own dataset that was created on the
basis of compositionality. We validate our algorithm across datasets and
present state-of-the-art results.