Search CORE

9 research outputs found

Intrinsic Dimensionality

Author: Pestov Vladimir
Publication date: 01/01/2010

This entry for the SIGSPATIAL Special July 2010 issue on Similarity Searching in Metric Spaces discusses the notion of intrinsic dimensionality of data in the context of similarity search. Comment: 4 pages, 4 figures, LaTeX; diagram (c) has been corrected.

arXiv.org e-Print Archive

CiteSeerX
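One widely used way to put a number on intrinsic dimensionality in metric-space search is the distance-histogram measure of Chávez, Navarro, Baeza-Yates and Marroquín, ρ = μ²/(2σ²), where μ and σ² are the mean and variance of pairwise distances. The entry above does not say which definition it adopts, so the sketch below is an illustration under that assumption only. It uses the Euclidean metric, uniform random data, and hypothetical helper names (`uniform_cube`, `intrinsic_dimensionality`).

```python
import math
import random


def uniform_cube(n, dim, seed=0):
    """n points drawn uniformly from the unit cube [0, 1]^dim."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


def intrinsic_dimensionality(points, n_pairs=2000, seed=0):
    """Estimate rho = mu^2 / (2 * sigma^2), where mu and sigma^2 are the
    mean and variance of the distance histogram, from n_pairs random
    pairs of points under the Euclidean metric (assumed measure)."""
    rng = random.Random(seed)
    dists = []
    for _ in range(n_pairs):
        a, b = rng.sample(points, 2)  # two different entries of the list
        dists.append(math.dist(a, b))
    mu = sum(dists) / len(dists)
    var = sum((d - mu) ** 2 for d in dists) / len(dists)
    if var == 0:
        return math.inf  # all sampled distances equal: fully concentrated
    return mu * mu / (2 * var)


for dim in (2, 8, 32, 128):
    rho = intrinsic_dimensionality(uniform_cube(500, dim))
    print(f"dim={dim:4d}  rho={rho:7.1f}")
```

On uniform data, ρ increases with the representational dimension because the distance histogram grows tall and narrow relative to its mean. A large ρ is the usual signal that distance-based pruning will struggle.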

Indexability, concentration, and VC theory

Author: Pestov Vladimir
Publication venue: 'Elsevier BV'
Publication date: 21/05/2011

Degrading performance of indexing schemes for exact similarity search in high dimensions has long been linked to the concentration of the distributions (histograms) of distances and of other 1-Lipschitz functions. We discuss this observation in the framework of the phenomenon of concentration of measure on structures of high dimension and of the Vapnik-Chervonenkis theory of statistical learning. Comment: 17 pages, final submission to J. Discrete Algorithms (an expanded, improved and corrected version of the SISAP'2010 invited paper, this e-print, v3)

arXiv.org e-Print Archive

Elsevier - Publisher Connector
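The concentration effect described above can be observed with a single 1-Lipschitz function, the distance x ↦ d(q, x) to a fixed query q. The sketch below measures its spread on a random sample with the relative contrast (d_max − d_min)/d_min; that choice of statistic, the uniform data, and the name `relative_contrast` are illustrative assumptions, not taken from the paper.

```python
import math
import random


def relative_contrast(n_points, dim, seed=0):
    """Return (max - min) / min of the distances from one random query q
    to n_points random points, all uniform in [0, 1]^dim. The map
    x -> d(q, x) is 1-Lipschitz, so a small value means this function
    is concentrated around its mean on the sample."""
    rng = random.Random(seed)
    q = [rng.random() for _ in range(dim)]
    dists = [math.dist(q, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  contrast={relative_contrast(1000, dim):.3f}")
```

As the dimension grows, the nearest and farthest points become nearly equidistant from q, so bounds derived from distances (such as triangle-inequality pruning) discard fewer candidates.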

Performance evaluation of the R-Tree and M-Tree access methods for k-nearest-neighbor queries on high-dimensional datasets

Author: Assumpção Pedro Nogueira Gomes
Publication venue: 'Associacao Portuguesa de Sistemas de Informacao'
Publication date: 19/07/2018

Undergraduate final project. Data storage is a constant activity in computing, and the use of access methods to search efficiently is essential. This work presents two indexing structures for complex data (R-tree and M-tree) and the k-nearest-neighbor similarity search algorithm. It evaluates search performance with these two indexing methods and also compares them with the sequential method, in order to understand the cases in which using an indexing method is less efficient than using no access method at all (because of the curse of dimensionality). The experiments showed that one of the structures (R-tree) performed worse than the sequential method on the tested datasets, while the other structure (M-tree) was more efficient on all datasets.

Repositório Institucional da Universidade Federal de Uberlândia
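The sequential method used as the baseline above can be sketched as an exact linear scan. It always performs n distance computations, so an index pays off only if it avoids enough of them. The thesis's own implementation is not shown here; this sketch assumes the Euclidean distance, and `knn_sequential` is a hypothetical name.

```python
import heapq
import math


def knn_sequential(data, query, k):
    """Exact k-nearest-neighbor search by linear scan.

    Computes d(query, x) for every x in data and keeps the k smallest,
    returned as (distance, index) pairs in ascending order of distance
    (ties broken by index). It always performs exactly len(data)
    distance computations, regardless of dimension.
    """
    candidates = ((math.dist(query, x), i) for i, x in enumerate(data))
    return heapq.nsmallest(k, candidates)


data = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
print(knn_sequential(data, (0.0, 0.0), 2))  # [(0.0, 0), (1.0, 1)]
```

Because its cost does not depend on how distances are distributed, the linear scan is the natural yardstick when distance concentration erodes the pruning power of a tree index.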

Design and analysis of algorithms for similarity search based on intrinsic dimension

Author: Ma Xiguo
Publication venue: Digital Commons @ NJIT
Publication date: 31/01/2015

One of the most fundamental operations employed in data mining tasks such as classification, cluster analysis, and anomaly detection is similarity search. It has been used in numerous application fields, such as multimedia, information retrieval, recommender systems, and pattern recognition. Specifically, a similarity query aims to retrieve from the database the objects most similar to a query object, where the underlying similarity measure is usually expressed as a distance function. The cost of processing similarity queries has typically been assessed in terms of the representational dimension of the data, that is, the number of features used to represent individual data objects. High representational dimension generally results in a significant increase in the cost of processing similarity queries. This relationship is often attributed to an amalgamation of phenomena collectively referred to as the curse of dimensionality. However, the effects of dimensionality observed in practice may not be as severe as expected. This has led to the development of models that quantify the complexity of data in terms of some measure of intrinsic dimensionality. The generalized expansion dimension (GED) is one such model: it estimates the intrinsic dimension in the vicinity of a query point q by observing the ranks and distances of pairs of neighbors with respect to q. This dissertation is mainly concerned with the design and analysis of search algorithms based on the GED model. In particular, three variants of the similarity search problem are considered: adaptive similarity search, flexible aggregate similarity search, and subspace similarity search. The good practical performance of the proposed algorithms demonstrates the effectiveness of dimensionality-driven design of search algorithms.

Digital Commons @ New Jersey Institute of Technology (NJIT)
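The idea behind estimating dimension from ranks and distances of neighbor pairs can be illustrated with the two-ball expansion estimate. If a ball of radius r1 around q holds k1 points and a ball of radius r2 holds k2 points, and counts grow like r^m, then ln(k2/k1)/ln(r2/r1) recovers m. The sketch below applies this to one pair of neighbor ranks. How the dissertation aggregates over many pairs is not stated in the abstract, so no aggregation is shown, and the function names are hypothetical.

```python
import math


def expansion_dimension(k1, r1, k2, r2):
    """Dimension estimate from two balls centered at the same query q:
    k1 points within radius r1 and k2 points within radius r2, with
    0 < k1 < k2 and 0 < r1 < r2. If the count grows like c * r**m,
    then ln(k2 / k1) / ln(r2 / r1) equals m."""
    if not (0 < k1 < k2 and 0 < r1 < r2):
        raise ValueError("need 0 < k1 < k2 and 0 < r1 < r2")
    return math.log(k2 / k1) / math.log(r2 / r1)


def ged_pair_estimate(sorted_dists, k1, k2):
    """Estimate from the k1-th and k2-th nearest neighbors of q
    (1-based ranks); sorted_dists[i] is the distance to the (i+1)-th NN."""
    return expansion_dimension(k1, sorted_dists[k1 - 1],
                               k2, sorted_dists[k2 - 1])


# Synthetic neighbor distances where the k-th neighbor lies at radius
# k ** (1/3), i.e. the neighbor count grows like r**3.
dists = [k ** (1 / 3) for k in range(1, 101)]
print(ged_pair_estimate(dists, 10, 80))  # approximately 3.0
```

Because the estimate is local to q, a search algorithm can use it to adapt per query, for example deciding how far to expand a candidate set, instead of relying on the global feature count.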

Approaches to Quantifying EEG Features for Design Protocol Analysis

Author: Nguyen Philon
Publication date: 16/03/2017

Recently, physiological signals such as eye-tracking and gesture analysis, galvanic skin response (GSR), electrocardiograms (ECG) and electroencephalograms (EEG) have been used by design researchers to extract significant information to describe the conceptual design process. We study a set of video-based design protocols recorded on subjects performing design tasks on a sketchpad while having their EEG monitored. The conceptual design process is rich with information on how designers do design. Many methods exist to analyze the conceptual design process, the most popular being concurrent verbal protocols. A recurring problem in design protocol analysis is to segment and code protocol data into logical and semantic units. This is usually a manual step, and little work has been done on fully automated segmentation techniques. Also, verbal protocols are known to fail in some circumstances, such as when dealing with creativity, insight (e.g. Aha! experience, gestalt), concurrent, nonverbalizable (e.g. facial recognition) and nonconscious processes. We propose different approaches to study the conceptual design process using electroencephalograms (EEG). More specifically, we use spatio-temporal and frequency domain features. Our research is based on machine learning techniques applied to EEG signals (functional microstate analysis), source localization (LORETA) and a novel method of segmentation for design protocols based on EEG features. Using these techniques, we measure mental effort, fatigue and concentration in the conceptual design process, in addition to creativity and insight/nonverbalizable processing. We discuss the strengths and weaknesses of such approaches.

Concordia University Research Repository

Curse of Dimensionality in Pivot Based Indexes

Authors: Ilya Volnyansky, Vladimir Pestov
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009

CiteSeerX

Crossref