Fast and secure retrieval of DNA sequences

Abstract

Sequence models are retrieved from a sequence index. Each sequence model represents a DNA or RNA sequence stored in a database and comprises a finite memory tree source model together with the parameters of that model. One or more DNA or RNA sequences stored in the database are identified as most similar to a query DNA or RNA sequence based on how well the retrieved sequence models fit the query sequence. The sequence models may be context tree weighting (CTW) models {Sx, θSx}, where Sx denotes the context tree model for the DNA or RNA sequence x stored in the database and θSx denotes the parameters of the context tree model Sx. The fitting may include, for each CTW model {Sx, θSx}, computing the codeword length of the query DNA or RNA sequence y using that model.
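
The fitting step can be illustrated with a minimal sketch. It is not the CTW method described above: as a simplifying assumption it uses a single fixed-depth context model per stored sequence instead of a weighted context tree, and the names `fit_model`, `codeword_length`, `most_similar`, the `depth` parameter, and the four-letter DNA alphabet are all hypothetical choices made for illustration. The idea it shows is the one in the abstract: the codeword length of the query y under the model of x is the sum of -log2 p(symbol | context), and the stored sequence whose model yields the shortest codeword is taken as the most similar.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: DNA only; an RNA index would use "ACGU"

def fit_model(x, depth=2):
    """Count next-symbol occurrences after every length-`depth` context in x.

    Stand-in for {Sx, theta_Sx}: the set of contexts plays the role of the
    tree, the counts play the role of its parameters.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(depth, len(x)):
        counts[x[i - depth:i]][x[i]] += 1
    return {"depth": depth, "counts": dict(counts)}

def codeword_length(y, model):
    """Ideal code length in bits of y under a fixed (non-adaptive) model.

    The first `depth` symbols of y have no full context and are not coded.
    """
    depth, counts = model["depth"], model["counts"]
    bits = 0.0
    for i in range(depth, len(y)):
        ctx = counts.get(y[i - depth:i], {})
        total = sum(ctx.values())
        # Add-1/2 smoothing (in the style of the Krichevsky-Trofimov estimator)
        # keeps the probability of unseen symbols and contexts above zero.
        p = (ctx.get(y[i], 0) + 0.5) / (total + 0.5 * len(ALPHABET))
        bits -= math.log2(p)
    return bits

def most_similar(y, index):
    """Return the id of the stored sequence whose model codes y most compactly."""
    return min(index, key=lambda seq_id: codeword_length(y, index[seq_id]))

if __name__ == "__main__":
    # Toy index mapping hypothetical sequence ids to their fitted models.
    index = {
        "seq_at": fit_model("ATATATATATATATAT"),
        "seq_gc": fit_model("GCGCGCGCGCGCGCGC"),
    }
    print(most_similar("ATATATAT", index))
```

Only the models are consulted at query time in this sketch, which mirrors the abstract's point that similarity is judged from the retrieved sequence models rather than from the stored sequences themselves.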
