Data compression for sequencing data
Post-Sanger sequencing methods produce tons of data, and there is general agreement that the challenge of storing and processing them must be addressed with data compression. In this review we first answer the question "why compression" in a quantitative manner. Then we also answer the questions "what" and "how", by sketching the fundamental compression ideas, describing the main sequencing data types and formats, and comparing the specialized compression algorithms and tools. Finally, we go back to the question "why compression" and give other, perhaps surprising answers, demonstrating the pervasiveness of data compression techniques in computational biology.
Fuzzy Substring Matching: On-device Fuzzy Friend Search at Snapchat
About 50% of all queries on the Snapchat app are targeted at finding the right
friend to interact with. Since everyone has a unique list of friends and that
list is not very large (maximum a few thousand), it makes sense to perform this
search locally, on users' devices. In addition, the friend list is already
available for other purposes, such as showing the chat feed, and the latency
savings can be significant by avoiding a server round-trip call. Historically,
we resorted to substring matching, ranking prefix matches at the top of the
result list. Introducing the ability to perform fuzzy search on a
resource-constrained device and in an environment where typos are prevalent
is both prudent and challenging. In this paper, we describe our efficient and
accurate two-step approach to fuzzy search, characterized by a skip-bigram
retrieval layer and a novel local Levenshtein distance computation used for
final ranking.
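
The two-step idea in this abstract can be illustrated with a minimal Python sketch. The abstract does not define its skip-bigrams or its "local" Levenshtein distance, so everything below is an assumption made for illustration: skip-bigrams are taken to be ordered character pairs with at most `max_skip` characters between them, and the local distance is replaced by the standard approximate-substring edit distance (the query may match anywhere inside a name at no extra cost). All function and parameter names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def skip_bigrams(text, max_skip=1):
    """Ordered character pairs with at most `max_skip` characters between them."""
    text = text.lower()
    grams = set()
    for i in range(len(text)):
        for j in range(i + 1, min(i + 2 + max_skip, len(text))):
            grams.add(text[i] + text[j])
    return grams


def build_index(names, max_skip=1):
    """Inverted index: skip-bigram -> set of positions in `names`."""
    index = defaultdict(set)
    for pos, name in enumerate(names):
        for gram in skip_bigrams(name, max_skip):
            index[gram].add(pos)
    return index


def retrieve(query, index, max_skip=1, min_shared=1):
    """Step 1 (assumed form): names sharing enough skip-bigrams with the query."""
    counts = defaultdict(int)
    for gram in skip_bigrams(query, max_skip):
        for pos in index.get(gram, ()):
            counts[pos] += 1
    return [pos for pos, shared in counts.items() if shared >= min_shared]


def substring_edit_distance(query, name):
    """Smallest edit distance between `query` and any substring of `name`."""
    query, name = query.lower(), name.lower()
    # Row 0 is all zeros: a match may start anywhere in `name` for free.
    prev = [0] * (len(name) + 1)
    for i, qc in enumerate(query, start=1):
        curr = [i]
        for j, nc in enumerate(name, start=1):
            curr.append(min(prev[j] + 1,                 # drop a query character
                            curr[j - 1] + 1,             # absorb an extra name character
                            prev[j - 1] + (qc != nc)))   # match or substitute
        prev = curr
    # Taking the minimum over the last row lets the match end anywhere too.
    return min(prev)


def fuzzy_search(query, names, index, limit=5):
    """Step 2 (assumed form): rank retrieved candidates by substring edit distance."""
    candidates = retrieve(query, index)
    ranked = sorted(
        candidates,
        key=lambda pos: (substring_edit_distance(query, names[pos]), names[pos]),
    )
    return [names[pos] for pos in ranked[:limit]]
```

With a friend list such as `["Jonathan Smith", "Joanna Lee", "Maria Garcia"]` indexed once via `build_index`, a mistyped query like `"jonh"` is first narrowed to the names that share a skip-bigram with it, and only those candidates pay for the quadratic distance computation. That split between cheap retrieval and more expensive ranking is the part of the design the abstract emphasizes for resource-constrained devices.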