Variable Word Rate N-grams
The rate of occurrence of words is not uniform but varies from document to
document. Despite this observation, parameters for conventional n-gram language
models are usually derived using the assumption of a constant word rate. In
this paper we investigate the use of a variable word rate assumption, modelled
by a Poisson distribution or a continuous mixture of Poissons. We present an
approach to estimating the relative frequencies of words or n-grams that takes
prior information about their occurrences into account. Discounting and
smoothing schemes are also considered. On the Broadcast News task, the approach
demonstrates a reduction in perplexity of up to 10%.
Comment: 4 pages, 4 figures, ICASSP-200
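As an illustrative sketch (not the paper's estimator), the contrast between a constant word rate and a variable, Gamma-mixed rate shows up in the probability that a word occurs at least once in a document; a Gamma mixture of Poissons yields a negative binomial, and all rate and shape values below are made-up:

```python
import math

def p_at_least_once_poisson(rate):
    # Constant word rate: per-document occurrences ~ Poisson(rate),
    # so P(word appears at least once) = 1 - exp(-rate).
    return 1.0 - math.exp(-rate)

def p_at_least_once_neg_binomial(mean, shape):
    # Variable word rate: the per-document rate is Gamma-distributed with
    # shape k and mean m, so counts follow a negative binomial and
    # P(at least once) = 1 - (k / (k + m))**k.  Small k means the rate
    # varies strongly ("bursty" words); k -> infinity recovers the Poisson.
    return 1.0 - (shape / (shape + mean)) ** shape

mean_rate = 0.5      # same average occurrence rate under both assumptions
bursty_shape = 0.2   # small shape => highly document-dependent rate

print(p_at_least_once_poisson(mean_rate))                     # ~0.393
print(p_at_least_once_neg_binomial(mean_rate, bursty_shape))  # ~0.222
```

Even at the same mean rate, the variable-rate model puts more mass on "the word never occurs in this document", which is the kind of document-to-document variation the constant-rate assumption cannot express.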
Combining semantic and syntactic structure for language modeling
Structured language models for speech recognition have been shown to remedy
the weaknesses of n-gram models. All current structured language models are,
however, limited in that they do not take into account dependencies between
non-headwords. We show that non-headword dependencies contribute to a
significantly improved word error rate, and that a data-oriented parsing (DOP)
model trained on semantically and syntactically annotated data can exploit
these dependencies. This paper also contains the first DOP model trained by
means of a maximum-likelihood reestimation procedure, which resolves some of
the theoretical shortcomings of previous DOP models.
Comment: 4 pages
Relating Turing's Formula and Zipf's Law
An asymptote is derived from Turing's local reestimation formula for
population frequencies, and a local reestimation formula is derived from Zipf's
law for the asymptotic behavior of population frequencies. The two are shown to
be qualitatively different asymptotically, but nevertheless to be instances of
a common class of reestimation-formula-asymptote pairs, in which they
constitute the upper and lower bounds of the convergence region of the
cumulative of the frequency function, as rank tends to infinity. The results
demonstrate that Turing's formula is qualitatively different from the various
extensions to Zipf's law, and suggest that it smooths the frequency estimates
towards a geometric distribution.
Comment: 9 pages, uuencoded, gzipped PostScript; some typos removed
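Turing's local reestimation formula referred to above is the familiar Good-Turing adjustment r* = (r + 1) N_{r+1} / N_r, where N_r counts the species observed exactly r times; a minimal sketch on a toy frequency sample:

```python
from collections import Counter

def good_turing_adjusted_counts(counts):
    # Turing's local reestimation formula: a species observed r times is
    # given the adjusted count r* = (r + 1) * N_{r+1} / N_r, where N_r is
    # the number of species observed exactly r times.
    freq_of_freqs = Counter(counts)
    adjusted = {}
    for r, n_r in freq_of_freqs.items():
        n_r_plus_1 = freq_of_freqs.get(r + 1, 0)
        adjusted[r] = (r + 1) * n_r_plus_1 / n_r
    return adjusted

# Toy sample of observed word frequencies: N_1 = 4, N_2 = 2, N_3 = 1.
adjusted = good_turing_adjusted_counts([3, 2, 2, 1, 1, 1, 1])
# r=1 -> 2 * 2 / 4 = 1.0 ; r=2 -> 3 * 1 / 2 = 1.5 ; r=3 -> 4 * 0 / 1 = 0.0
print(adjusted)
```

The adjusted count for the highest observed r comes out as zero because N_{r+1} is empty; in practice the raw N_r are smoothed before the formula is applied, which is exactly where asymptotic assumptions such as the Zipfian one enter.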
Conservative Direct Estimation of Likelihood Ratios Based on Observed Frequencies
In problems that treat data probabilistically, the estimation of statistical
measures underpins both method design and data analysis. This paper addresses
the problem of estimating one such measure, the likelihood ratio, from observed
frequencies drawn from a discrete sample space. A naive approach follows the
definition of the likelihood ratio: estimate the two probability distributions
that form the ratio by maximum likelihood, and divide one by the other. When
the likelihood ratio is computed from low frequencies, however, this approach
can overestimate it severely. We therefore propose a method, adapted from the
direct likelihood-ratio estimator uLSIF, that estimates the likelihood ratio
conservatively (i.e. on the low side). The proposed method is a framework in
which the likelihood ratio obtained by maximum-likelihood estimation is
adjusted via a regularization parameter. Experiments characterize the behaviour
of the proposed method and demonstrate its effectiveness; further experiments
using the bootstrap method in natural language processing demonstrate its
practicality.
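To see why the naive maximum-likelihood ratio overshoots at low frequencies, and how a regularization parameter can pull it down, here is a deliberately simple sketch; the shrinkage below is plain additive smoothing of the denominator, chosen for illustration only, not the uLSIF-based estimator the paper proposes:

```python
def naive_likelihood_ratio(count_p, total_p, count_q, total_q):
    # Maximum-likelihood estimate of each distribution, then their ratio.
    # With low counts this can wildly overestimate the true ratio: a
    # denominator count of 1 in a small sample makes the ratio explode.
    return (count_p / total_p) / (count_q / total_q)

def shrunk_likelihood_ratio(count_p, total_p, count_q, total_q, lam=1.0):
    # Illustrative conservative variant: additive smoothing of the
    # denominator pushes low-count ratios down; lam plays the role of a
    # regularization parameter (larger lam = more conservative).
    return (count_p / total_p) / ((count_q + lam) / (total_q + lam))

# A rare denominator event: 10/100 vs 1/100.
print(naive_likelihood_ratio(10, 100, 1, 100))   # 10.0
print(shrunk_likelihood_ratio(10, 100, 1, 100))  # 5.05, deliberately lower
```

The point of the sketch is only the direction of the adjustment: a conservative estimator should err low exactly where the frequencies are too small to trust.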
A Neural Network Approach for Mixing Language Models
The performance of Neural Network (NN)-based language models is steadily
improving due to the emergence of new architectures, which are able to learn
different natural language characteristics. This paper presents a novel
framework, which shows that a significant improvement can be achieved by
combining different existing heterogeneous models in a single architecture.
This is done through 1) a feature layer, which separately learns different
NN-based models and 2) a mixture layer, which merges the resulting model
features. In doing so, this architecture benefits from the learning
capabilities of each model with no noticeable increase in the number of model
parameters or the training time. Extensive experiments conducted on the Penn
Treebank (PTB) and the Large Text Compression Benchmark (LTCB) corpus showed a
significant reduction of the perplexity when compared to state-of-the-art
feedforward as well as recurrent neural network architectures.
Comment: Published at IEEE International Conference on Acoustics, Speech and
Signal Processing (ICASSP) 2017. arXiv admin note: text overlap with
arXiv:1703.0806
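A minimal pure-Python sketch of the feature-layer/mixture-layer idea: random vectors stand in for the hidden states of the individual NN-based models, and the mixture layer merges them and projects to the vocabulary; all sizes and weights are arbitrary placeholders, not the paper's architecture:

```python
import math
import random

random.seed(0)
VOCAB, D1, D2 = 50, 16, 24   # illustrative vocabulary and feature sizes

# Feature layer: each sub-model yields its own hidden representation of the
# word history (random stand-ins here for, e.g., a feedforward model's and a
# recurrent model's hidden states).
h_ffnn = [random.gauss(0, 1) for _ in range(D1)]
h_rnn = [random.gauss(0, 1) for _ in range(D2)]

# Mixture layer: merge the per-model features and project to the vocabulary.
h = h_ffnn + h_rnn
W = [[random.gauss(0, 0.1) for _ in range(D1 + D2)] for _ in range(VOCAB)]
logits = [sum(w_i * h_i for w_i, h_i in zip(row, h)) for row in W]

# Softmax over the next-word vocabulary.
m = max(logits)
exps = [math.exp(x - m) for x in logits]
z = sum(exps)
probs = [e / z for e in exps]
```

Because the merge happens on already-learned features rather than by duplicating each model's output layer, the parameter overhead of the combination is limited to the shared projection, which matches the abstract's claim of no noticeable parameter increase.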