8,702 research outputs found
A resource-frugal probabilistic dictionary and applications in (meta)genomics
Genomic and metagenomic fields, generating huge sets of short genomic
sequences, brought their own share of high performance problems. To extract
relevant pieces of information from the huge data sets generated by current
sequencing techniques, one must rely on extremely scalable methods and
solutions. Indexing billions of objects is a task considered too expensive
while being a fundamental need in this field. In this paper we propose a
straightforward indexing structure that scales to billions of element and we
propose two direct applications in genomics and metagenomics. We show that our
proposal solves problem instances for which no other known solution scales-up.
We believe that many tools and applications could benefit from either the
fundamental data structure we provide or from the applications developed from
this structure.Comment: Submitted to PSC 201
Efficient seeding techniques for protein similarity search
We apply the concept of subset seeds proposed in [1] to similarity search in
protein sequences. The main question studied is the design of efficient seed
alphabets to construct seeds with optimal sensitivity/selectivity trade-offs.
We propose several different design methods and use them to construct several
alphabets.We then perform an analysis of seeds built over those alphabet and
compare them with the standard Blastp seeding method [2,3], as well as with the
family of vector seeds proposed in [4]. While the formalism of subset seed is
less expressive (but less costly to implement) than the accumulative principle
used in Blastp and vector seeds, our seeds show a similar or even better
performance than Blastp on Bernoulli models of proteins compatible with the
common BLOSUM62 matrix
Efficient seeding techniques for protein similarity search
We apply the concept of subset seeds proposed in [1] to similarity search in
protein sequences. The main question studied is the design of efficient seed
alphabets to construct seeds with optimal sensitivity/selectivity trade-offs.
We propose several different design methods and use them to construct several
alphabets.We then perform an analysis of seeds built over those alphabet and
compare them with the standard Blastp seeding method [2,3], as well as with the
family of vector seeds proposed in [4]. While the formalism of subset seed is
less expressive (but less costly to implement) than the accumulative principle
used in Blastp and vector seeds, our seeds show a similar or even better
performance than Blastp on Bernoulli models of proteins compatible with the
common BLOSUM62 matrix
Reference face graph for face recognition
Face recognition has been studied extensively; however, real-world face recognition still remains a challenging task. The demand for unconstrained practical face recognition is rising with the explosion of online multimedia such as social networks, and video surveillance footage where face analysis is of significant importance. In this paper, we approach face recognition in the context of graph theory. We recognize an unknown face using an external reference face graph (RFG). An RFG is generated and recognition of a given face is achieved by comparing it to the faces in the constructed RFG. Centrality measures are utilized to identify distinctive faces in the reference face graph. The proposed RFG-based face recognition algorithm is robust to the changes in pose and it is also alignment free. The RFG recognition is used in conjunction with DCT locality sensitive hashing for efficient retrieval to ensure scalability. Experiments are conducted on several publicly available databases and the results show that the proposed approach outperforms the state-of-the-art methods without any preprocessing necessities such as face alignment. Due to the richness in the reference set construction, the proposed method can also handle illumination and expression variation
Automatic Synchronization of Multi-User Photo Galleries
In this paper we address the issue of photo galleries synchronization, where
pictures related to the same event are collected by different users. Existing
solutions to address the problem are usually based on unrealistic assumptions,
like time consistency across photo galleries, and often heavily rely on
heuristics, limiting therefore the applicability to real-world scenarios. We
propose a solution that achieves better generalization performance for the
synchronization task compared to the available literature. The method is
characterized by three stages: at first, deep convolutional neural network
features are used to assess the visual similarity among the photos; then, pairs
of similar photos are detected across different galleries and used to construct
a graph; eventually, a probabilistic graphical model is used to estimate the
temporal offset of each pair of galleries, by traversing the minimum spanning
tree extracted from this graph. The experimental evaluation is conducted on
four publicly available datasets covering different types of events,
demonstrating the strength of our proposed method. A thorough discussion of the
obtained results is provided for a critical assessment of the quality in
synchronization.Comment: ACCEPTED to IEEE Transactions on Multimedi
- …