2,299 research outputs found

Space-Economical Partial Gram Indices for Exact Substring Matching

Author: Boncz P.A. (Peter)
Sidirourgos E. (Eleftherios)
Tang N. (Nan)
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/11/2009

CWI's Institutional Repository

Pigeonring: A Principle for Faster Thresholded Similarity Search

Author: Jianbin Qin
Chuan Xiao
Publication venue: 'VLDB Endowment'
Publication date: 01/09/2018

Crossref

Edinburgh Research Explorer

Data compression for sequencing data

Author: Sebastian Deorowicz
Szymon Grabowski
Publication venue: Springer Nature
Publication date: 01/01/2013

Post-Sanger sequencing methods produce tons of data, and there is general agreement that the challenge of storing and processing them must be addressed with data compression. In this review we first answer the question “why compression” in a quantitative manner. Then we also answer the questions “what” and “how”, by sketching the fundamental compression ideas, describing the main sequencing data types and formats, and comparing the specialized compression algorithms and tools. Finally, we go back to the question “why compression” and give other, perhaps surprising answers, demonstrating the pervasiveness of data compression techniques in computational biology.

Springer - Publisher Connector

PubMed Central

Fuzzy Substring Matching: On-device Fuzzy Friend Search at Snapchat

Author: Pihur Vasyl
Thompson Scott
Publication date: 08/11/2022

About 50% of all queries on the Snapchat app are targeted at finding the right friend to interact with. Since everyone has a unique list of friends and that list is not very large (a few thousand at most), it makes sense to perform this search locally, on users' devices. In addition, the friend list is already available for other purposes, such as showing the chat feed, and avoiding a server round-trip call can save significant latency. Historically, we resorted to substring matching, ranking prefix matches at the top of the result list. Introducing the ability to perform fuzzy search on a resource-constrained device, in an environment where typos are prevalent, is both prudent and challenging. In this paper, we describe our efficient and accurate two-step approach to fuzzy search, characterized by a skip-bigram retrieval layer and a novel local Levenshtein distance computation used for final ranking.

arXiv.org e-Print Archive
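
The two-step shape described in the Snapchat abstract (a skip-bigram retrieval layer followed by an edit-distance ranking step) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define their skip-bigrams or their "local Levenshtein" distance, so the sketch assumes skip-bigrams are ordered character pairs with at most `max_skip` characters between them, and it swaps in the classic approximate-substring edit distance (free start and end positions in the candidate name) for the ranking step. All function names and parameters below are hypothetical.

```python
from typing import Dict, List, Set


def skip_bigrams(text: str, max_skip: int = 1) -> Set[str]:
    """Ordered character pairs with at most `max_skip` characters between them.

    Assumption: this is one plausible reading of "skip-bigram", not the paper's definition.
    """
    s = text.lower()
    grams: Set[str] = set()
    for i in range(len(s)):
        for j in range(i + 1, min(i + 2 + max_skip, len(s))):
            grams.add(s[i] + s[j])
    return grams


def build_index(names: List[str]) -> Dict[str, Set[int]]:
    """Inverted index: skip-bigram -> positions of the names containing it."""
    index: Dict[str, Set[int]] = {}
    for pos, name in enumerate(names):
        for gram in skip_bigrams(name):
            index.setdefault(gram, set()).add(pos)
    return index


def substring_edit_distance(query: str, text: str) -> int:
    """Smallest Levenshtein distance between `query` and any substring of `text`.

    Stand-in for the paper's local Levenshtein distance (an assumption).
    Row 0 is all zeros, so a match may start anywhere in `text`;
    taking the minimum of the last row lets it end anywhere.
    """
    q, t = query.lower(), text.lower()
    prev = [0] * (len(t) + 1)
    for i in range(1, len(q) + 1):
        cur = [i] + [0] * len(t)
        for j in range(1, len(t) + 1):
            cost = 0 if q[i - 1] == t[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return min(prev)


def fuzzy_search(query: str, names: List[str], index: Dict[str, Set[int]],
                 min_shared: int = 1, limit: int = 5) -> List[str]:
    """Step 1: retrieve names sharing skip-bigrams. Step 2: rank by edit distance."""
    shared: Dict[int, int] = {}
    for gram in skip_bigrams(query):
        for pos in index.get(gram, ()):
            shared[pos] = shared.get(pos, 0) + 1
    candidates = [pos for pos, count in shared.items() if count >= min_shared]
    ranked = sorted(
        candidates,
        key=lambda pos: (substring_edit_distance(query, names[pos]), -shared[pos], names[pos]),
    )
    return [names[pos] for pos in ranked[:limit]]


if __name__ == "__main__":
    friends = ["Jonathan Smith", "Joan Baez", "Nathan Drake", "Maria Lopez"]
    idx = build_index(friends)
    print(fuzzy_search("jonthan", friends, idx))
```

The retrieval step only touches names that share at least one skip-bigram with the query, so the quadratic edit-distance computation runs on a short candidate list rather than on the whole friend list. That matters on a resource-constrained device.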