777 research outputs found

Re-ranking Permutation-Based Candidate Sets with the n-Simplex Projection

Author: A Esuli
B Thomee
D Novak
E Chávez
G Amato
IJ Schoenberg
J Pennington
LM Blumenthal
ML Micó
P Zezula
R Connor
R Weber
V Pestov
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

In metric search, permutation-based approaches have shown very good performance in indexing and supporting approximate search on large databases. These methods embed the metric objects into a permutation space where candidate results for a given query can be efficiently identified. Typically, to achieve high effectiveness, the permutation-based result set is refined by directly comparing each candidate object to the query. One drawback of these approaches is therefore that the original dataset must be stored and then accessed during the refining step. We propose a refining approach based on a metric embedding, called n-Simplex projection, that can be used on metric spaces meeting the n-point property. The n-Simplex projection provides upper and lower bounds of the actual distance, derived using the distances between the data objects and a finite set of pivots. We propose to reuse the distances computed for building the data permutations to derive these bounds, and we show how to use them to improve the permutation-based results. Our approach is particularly advantageous in all cases where the traditional refining step is too costly, e.g., a very large dataset or a very expensive metric function.
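As a rough illustration of the bounds described in this abstract, the following Python sketch builds a simplex from pivot-pivot distances, places each object as an apex above it using only object-pivot distances, and derives a lower and an upper bound on the distance between two objects from their apex coordinates. The function names (`apex`, `build_base`, `project`, `bounds`, `rerank`), the Euclidean test metric, and the policy of re-ranking by lower bound are assumptions made for the example, not details taken from the paper. The sketch also assumes the pivots are in general position, so that no altitude in the base simplex is zero.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def apex(base, dists):
    """Coordinates of a point lying at distance dists[i] from base vertex i.

    base[i] is a list of i coordinates (base[0] is the origin), so the
    system is triangular and is solved by forward substitution. The last
    returned coordinate is the non-negative altitude above the base.
    """
    x = []
    for i in range(1, len(base)):
        v = base[i]
        norm_v_sq = sum(c * c for c in v)
        rhs = (dists[0] ** 2 + norm_v_sq - dists[i] ** 2) / 2.0
        rhs -= sum(x[j] * v[j] for j in range(i - 1))
        x.append(rhs / v[i - 1])  # assumes a non-degenerate base (v[i-1] != 0)
    altitude_sq = dists[0] ** 2 - sum(c * c for c in x)
    x.append(math.sqrt(max(altitude_sq, 0.0)))  # clip rounding noise
    return x

def build_base(pivots, dist=euclidean):
    """Build the base simplex one pivot at a time from pivot-pivot distances."""
    base = [[]]  # the first pivot is the origin
    for i in range(1, len(pivots)):
        to_previous = [dist(pivots[i], pivots[j]) for j in range(i)]
        base.append(apex(base, to_previous))
    return base

def project(obj, pivots, base, dist=euclidean):
    """Map an object to apex coordinates using only its pivot distances."""
    return apex(base, [dist(obj, p) for p in pivots])

def bounds(x, y):
    """Lower and upper bound on the original distance between two objects."""
    head = sum((a - b) ** 2 for a, b in zip(x[:-1], y[:-1]))
    lower = math.sqrt(head + (x[-1] - y[-1]) ** 2)
    upper = math.sqrt(head + (x[-1] + y[-1]) ** 2)
    return lower, upper

def rerank(query_apex, candidate_apexes):
    """Candidate indices sorted by increasing lower bound (an assumed policy)."""
    return sorted(range(len(candidate_apexes)),
                  key=lambda i: bounds(query_apex, candidate_apexes[i])[0])
```

Only the projected coordinates are needed at re-ranking time, so in this sketch the original objects are never accessed once `project` has been applied to them.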

Crossref

Stirling Online Research Repository (RIOXX)

Stirling Online Research Repository

University of St. Andrews - Pure

SPLX-Perm: A Novel Permutation-Based Representation for Approximate Metric Search

Author: Connor Richard
Falchi Fabrizio
Gennaro Claudio
Rabitti Fausto
Vadicamo Lucia
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Many approaches for approximate metric search rely on a permutation-based representation of the original data objects. The main advantage of transforming metric objects into permutations is that the latter can be efficiently indexed and searched using data structures such as inverted files and prefix trees. Typically, the permutation is obtained by ordering the identifiers of a set of pivots according to their distances to the object to be represented. In this paper, we present a novel approach to transform metric objects into permutations. It uses the object-pivot distances in combination with a metric transformation called n-Simplex projection. The resulting permutation-based representation, named SPLX-Perm, is suitable only for the large class of metric spaces satisfying the n-point property. We tested the proposed approach on two benchmarks for similarity search. Our preliminary results are encouraging and open new perspectives for further investigations on the use of the n-Simplex projection for supporting permutation-based indexing.
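The traditional pivot-ordering step mentioned in this abstract can be sketched in a few lines of Python. This is a minimal illustration of the classic representation, not of SPLX-Perm itself. The helper names, the Euclidean metric, the tie-breaking rule, and the use of the Spearman footrule to compare permutations are all assumptions chosen for the example.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pivot_permutation(obj, pivots, dist=euclidean):
    """Pivot identifiers ordered by increasing distance to obj."""
    distances = [dist(obj, p) for p in pivots]
    # Ties are broken by pivot identifier so the result is deterministic.
    return sorted(range(len(pivots)), key=lambda i: (distances[i], i))

def spearman_footrule(perm_a, perm_b):
    """Sum of absolute differences between each pivot's rank in the two permutations."""
    rank_a = {pivot: rank for rank, pivot in enumerate(perm_a)}
    rank_b = {pivot: rank for rank, pivot in enumerate(perm_b)}
    return sum(abs(rank_a[p] - rank_b[p]) for p in rank_a)
```

For example, with pivots `[(0, 0), (10, 0), (0, 10)]`, the object `(1, 0)` sees the pivots in the order `[0, 1, 2]`, while `(9, 1)` sees them as `[1, 0, 2]`. Objects that are close in the original space tend to produce similar orderings, which is what makes the permutations usable for candidate selection.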

Stirling Online Research Repository (RIOXX)

Stirling Online Research Repository

University of St. Andrews - Pure

Dissecting the genetic basis of comorbid epilepsy phenotypes in neurodevelopmental disorders.

Author: Amini Hajar
Chow Julie
Girirajan Santhosh
Hormozdiari Farhad
Hormozdiari Fereydoun
Jensen Matthew
Penn Osnat
Shifman Sagiv
Publication venue: eScholarship, University of California
Publication date: 01/10/2019

BACKGROUND: Neurodevelopmental disorders (NDDs) such as autism spectrum disorder, intellectual disability, developmental disability, and epilepsy are characterized by abnormal brain development that may affect cognition, learning, behavior, and motor skills. High co-occurrence (comorbidity) of NDDs indicates a shared, underlying biological mechanism. The genetic heterogeneity and overlap observed in NDDs make it difficult to identify the genetic causes of specific clinical symptoms, such as seizures. METHODS: We present a computational method, MAGI-S, to discover modules or groups of highly connected genes that together potentially perform a similar biological function. MAGI-S integrates protein-protein interaction and co-expression networks to form modules centered around the selection of a single "seed" gene, yielding modules consisting of genes that are highly co-expressed with the seed gene. We aim to dissect the epilepsy phenotype from a general NDD phenotype by providing MAGI-S with high confidence NDD seed genes with varying degrees of association with epilepsy, and we assess the enrichment of de novo mutation, NDD-associated genes, and relevant biological function of constructed modules. RESULTS: The newly identified modules account for the increased rate of de novo non-synonymous mutations in autism, intellectual disability, developmental disability, and epilepsy, and enrichment of copy number variations (CNVs) in developmental disability. We also observed that modules seeded with genes strongly associated with epilepsy tend to have a higher association with epilepsy phenotypes than modules seeded at other neurodevelopmental disorder genes. Modules seeded with genes strongly associated with epilepsy (e.g., SCN1A, GABRA1, and KCNB1) are significantly associated with synaptic transmission, long-term potentiation, and calcium signaling pathways. On the other hand, modules found with seed genes that are not associated or weakly associated with epilepsy are mostly involved with RNA regulation and chromatin remodeling. CONCLUSIONS: In summary, our method identifies modules enriched with de novo non-synonymous mutations and can capture specific networks that underlie the epilepsy phenotype and display distinct enrichment in relevant biological processes. MAGI-S is available at https://github.com/jchow32/magi-s

eScholarship - University of California

Projection pursuit for discrete data

Author: Diaconis Persi
Salzman Julia
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2007

This paper develops projection pursuit for discrete data using the discrete Radon transform. Discrete projection pursuit is presented as an exploratory method for finding informative low-dimensional views of data such as binary vectors, rankings, phylogenetic trees or graphs. We show that for most data sets, most projections are close to uniform. Thus, informative summaries are ones deviating from uniformity. Syllabic data from several of Plato's great works are used to illustrate the methods. Along with some basic distribution theory, an automated procedure for computing informative projections is introduced. Comment: Published at http://dx.doi.org/10.1214/193940307000000482 in the IMS Collections (http://www.imstat.org/publications/imscollections.htm) by the Institute of Mathematical Statistics (http://www.imstat.org).

arXiv.org e-Print Archive

CiteSeerX

Crossref

Visualising many-objective populations

Author: Everson Richard M.
Fieldsend Jonathan E.
Walker David J.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2012

Copyright © 2012 ACM. 14th International Conference on Genetic and Evolutionary Computation (GECCO 2012), Philadelphia, USA, 7-11 July 2012. Optimisation problems often comprise a large set of objectives, and visualising the set of solutions to a problem can help with understanding them, assisting a decision maker. If the set of objectives is larger than three, visualising solutions to the problem is a difficult task. Techniques for visualising high-dimensional data are often difficult to interpret. Conversely, discarding objectives so that the solutions can be visualised in two or three spatial dimensions results in a loss of potentially important information. We demonstrate four methods for visualising many-objective populations, two of which use the complete set of objectives to present solutions in a clear and intuitive fashion and two that compress the objectives of a population into two dimensions whilst minimising the information that is lost. All of the techniques are illustrated on populations of solutions to optimisation test problems.

Crossref

Open Research Exeter

Hashing for Similarity Search: A Survey

Author: Ji Jianqiu
Shen Heng Tao
Song Jingkuan
Wang Jingdong
Publication date: 13/08/2014

Similarity search (nearest neighbor search) is the problem of finding, in a large database, the data items whose distances to a query item are the smallest. Various methods have been developed to address this problem, and recently much effort has been devoted to approximate search. In this paper, we present a survey on one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide the hashing algorithms into two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review them from various aspects, including hash function design, distance measure, and search scheme in the hash coding space.
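To make the first category concrete, here is a minimal Python sketch of one well-known data-independent scheme, random hyperplane hashing for angular similarity. It is offered only as an example of a hash function designed without looking at the data. The function names, the number of bits, and the fixed seed are assumptions made for the illustration and do not come from the survey.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplane normals, independent of any data."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def hash_code(vector, hyperplanes):
    """One bit per hyperplane: which side of the hyperplane the vector falls on."""
    return tuple(
        1 if sum(h * x for h, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )

def hamming(code_a, code_b):
    """Number of differing bits, used as the distance in the hash coding space."""
    return sum(a != b for a, b in zip(code_a, code_b))
```

Vectors separated by a small angle agree on most bits, so a search can compare short binary codes with `hamming` first and compute exact distances only for the closest codes.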

arXiv.org e-Print Archive

CiteSeerX