Skip to main content
Article thumbnail
Location of Repository

Sequence Prediction Exploiting Similarity Information

By István Bíró, Zoltán Szamonek and Csaba Szepesvári


When data is scarce or the alphabet is large, smoothing the probability estimates becomes inescapable when estimating n-gram models. In this paper we propose a method that implements a form of smoothing by exploiting similarity information of the alphabet elements. The idea is to view the log-conditional probability function as a smooth function defined over the similarity graph. The algorithm that we propose uses the eigenvectors of the similarity graph as the basis of the expansion of the log conditional probability function whose coefficients are found by solving a regularized logistic regression problem. The experimental results demonstrate the superiority of the method when the similarity graph contains relevant information, whilst the method still remains competitive with state-of-the-art smoothing methods even in the lack of such information.

Year: 2013
OAI identifier: oai:CiteSeerX.psu:
Provided by: CiteSeerX
Download PDF:
Sorry, we are unable to provide the full text but you may find it at the following location(s):
  • (external link)
  • (external link)
  • Suggested articles

    To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.