143,003 research outputs found
Efficient seeding techniques for protein similarity search
We apply the concept of subset seeds proposed in [1] to similarity search in
protein sequences. The main question studied is the design of efficient seed
alphabets to construct seeds with optimal sensitivity/selectivity trade-offs.
We propose several different design methods and use them to construct several
alphabets.We then perform an analysis of seeds built over those alphabet and
compare them with the standard Blastp seeding method [2,3], as well as with the
family of vector seeds proposed in [4]. While the formalism of subset seed is
less expressive (but less costly to implement) than the accumulative principle
used in Blastp and vector seeds, our seeds show a similar or even better
performance than Blastp on Bernoulli models of proteins compatible with the
common BLOSUM62 matrix
Efficient seeding techniques for protein similarity search
We apply the concept of subset seeds proposed in [1] to similarity search in
protein sequences. The main question studied is the design of efficient seed
alphabets to construct seeds with optimal sensitivity/selectivity trade-offs.
We propose several different design methods and use them to construct several
alphabets.We then perform an analysis of seeds built over those alphabet and
compare them with the standard Blastp seeding method [2,3], as well as with the
family of vector seeds proposed in [4]. While the formalism of subset seed is
less expressive (but less costly to implement) than the accumulative principle
used in Blastp and vector seeds, our seeds show a similar or even better
performance than Blastp on Bernoulli models of proteins compatible with the
common BLOSUM62 matrix
When Hashing Met Matching: Efficient Spatio-Temporal Search for Ridesharing
Carpooling, or sharing a ride with other passengers, holds immense potential
for urban transportation. Ridesharing platforms enable such sharing of rides
using real-time data. Finding ride matches in real-time at urban scale is a
difficult combinatorial optimization task and mostly heuristic approaches are
applied. In this work, we mathematically model the problem as that of finding
near-neighbors and devise a novel efficient spatio-temporal search algorithm
based on the theory of locality sensitive hashing for Maximum Inner Product
Search (MIPS). The proposed algorithm can find near-optimal potential
matches for every ride from a pool of rides in time and space for a small . Our
algorithm can be extended in several useful and interesting ways increasing its
practical appeal. Experiments with large NY yellow taxi trip datasets show that
our algorithm consistently outperforms state-of-the-art heuristic methods
thereby proving its practical applicability
- …