Intrinsic Dimensionality
This entry for the SIGSPATIAL Special July 2010 issue on Similarity Searching
in Metric Spaces discusses the notion of intrinsic dimensionality of data in
the context of similarity search. Comment: 4 pages, 4 figures, latex; diagram (c) has been corrected
Indexability, concentration, and VC theory
Degrading performance of indexing schemes for exact similarity search in high
dimensions has long since been linked to histograms of distributions of
distances and other 1-Lipschitz functions getting concentrated. We discuss this
observation in the framework of the phenomenon of concentration of measure on
the structures of high dimension and the Vapnik-Chervonenkis theory of
statistical learning. Comment: 17 pages, final submission to J. Discrete Algorithms (an expanded,
improved and corrected version of the SISAP'2010 invited paper, this e-print,
v3
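The concentration effect mentioned in this abstract can be illustrated with a small experiment. The sketch below is not taken from the paper: it assumes uniformly random points in the unit cube and the Euclidean distance, and the helper name `relative_spread` is hypothetical. It reports the ratio of the standard deviation to the mean of the distances from one query point, which shrinks as the dimension grows, so the distance histogram becomes narrower relative to its mean.

```python
import math
import random
import statistics


def relative_spread(dim, n_points=200, seed=0):
    """Std/mean of Euclidean distances from one random query to
    n_points uniform random points in the unit cube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return statistics.pstdev(dists) / statistics.mean(dists)


# The ratio drops as the dimension grows: distances concentrate.
for dim in (2, 20, 200):
    print(dim, round(relative_spread(dim), 3))
```

When the ratio is small, almost all points lie at nearly the same distance from the query, which is the situation in which distance-based pruning in an index has little to work with.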
Performance evaluation of the R-Tree and M-Tree access methods for k-nearest-neighbor queries on high-dimensional datasets
Undergraduate thesis (Trabalho de Conclusão de Curso). Data storage is a constant activity in the field of computing. Using
access methods to search efficiently is essential. This work
presents two indexing structures for complex data (R-tree and M-tree) and the
k-nearest-neighbor similarity search algorithm, evaluating search performance
with these two indexing methods and also comparing them with the sequential method,
in order to understand cases in which using an indexing method is less efficient than
using no access method at all (caused by the curse of dimensionality).
The experiments showed that one of the structures (R-tree) performed worse
than the sequential method on the datasets tested, and that the other structure (M-tree) was more
efficient on all datasets
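The sequential method used as the baseline in this comparison can be sketched in a few lines. This is only an illustrative version, assuming points stored as tuples and the Euclidean distance; the function name `knn_sequential` is hypothetical and is not taken from the thesis.

```python
import heapq
import math


def knn_sequential(points, query, k):
    """Scan every point and return the k closest to query, nearest first."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))


data = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (2.0, 0.5)]
print(knn_sequential(data, (0.9, 0.9), 2))
```

The scan computes one distance per stored object regardless of dimension, which is why an index that cannot prune effectively in high dimensions can end up slower than this baseline.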
Design and analysis of algorithms for similarity search based on intrinsic dimension
One of the most fundamental operations employed in data mining tasks such as classification, cluster analysis, and anomaly detection, is that of similarity search. It has been used in numerous fields of application such as multimedia, information retrieval, recommender systems and pattern recognition. Specifically, a similarity query aims to retrieve from the database the most similar objects to a query object, where the underlying similarity measure is usually expressed as a distance function.
The cost of processing similarity queries has been typically assessed in terms of the representational dimension of the data involved, that is, the number of features used to represent individual data objects. It is generally the case that high representational dimension would result in a significant increase in the processing cost of similarity queries. This relation is often attributed to an amalgamation of phenomena, collectively referred to as the curse of dimensionality. However, the observed effects of dimensionality in practice may not be as severe as expected. This has led to the development of models quantifying the complexity of data in terms of some measure of the intrinsic dimensionality.
The generalized expansion dimension (GED) is one such model; it estimates the intrinsic dimension in the vicinity of a query point q from the ranks and distances of pairs of neighbors with respect to q. This dissertation is mainly concerned with the design and analysis of search algorithms based on the GED model. In particular, three variants of the similarity search problem are considered: adaptive similarity search, flexible aggregate similarity search, and subspace similarity search. The good practical performance of the proposed algorithms demonstrates the effectiveness of dimensionality-driven design of search algorithms.
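A rank-and-distance estimate of this kind can be sketched as follows. The abstract does not state the formula, so the code assumes the commonly cited two-neighbor form, log(k2/k1) / log(r2/r1), where a neighbor of rank k1 lies at distance r1 from q and a neighbor of rank k2 lies at distance r2. The function names `ged_estimate` and `ged_at_query` are hypothetical, and the dissertation's own estimator may differ in detail.

```python
import math


def ged_estimate(k1, r1, k2, r2):
    """Dimension estimate from two neighbors of a query:
    ranks k1 < k2 at distances r1 < r2 (assumed two-neighbor form)."""
    if not (0 < k1 < k2 and 0 < r1 < r2):
        raise ValueError("need 0 < k1 < k2 and 0 < r1 < r2")
    return math.log(k2 / k1) / math.log(r2 / r1)


def ged_at_query(points, query, k1, k2):
    """Apply ged_estimate to the k1-th and k2-th nearest neighbors of query."""
    dists = sorted(math.dist(p, query) for p in points if p != query)
    return ged_estimate(k1, dists[k1 - 1], k2, dists[k2 - 1])


# Doubling the radius quadruples the neighbor count: dimension 2.
print(ged_estimate(10, 1.0, 40, 2.0))

# Points on a flat 2-D grid: the estimate should come out near 2.
grid = [(x, y) for x in range(-20, 21) for y in range(-20, 21)]
print(ged_at_query(grid, (0, 0), 50, 400))
```

The estimate depends only on how fast the neighbor count grows with the radius around q, not on the number of coordinates used to store each point, which is the sense in which it is an intrinsic rather than a representational dimension.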
Approaches to Quantifying EEG Features for Design Protocol Analysis
Recently, physiological signals such as eye-tracking and gesture analysis, galvanic skin response (GSR), electrocardiograms (ECG) and electroencephalograms (EEG) have been used by design researchers to extract significant information to describe the conceptual design process. We study a set of video-based design protocols recorded on subjects performing design tasks on a sketchpad while having their EEG monitored. The conceptual design process is rich with information on how designers design. Many methods exist to analyze the conceptual design process, the most popular one being concurrent verbal protocols. A recurring problem in design protocol analysis is to segment and code protocol data into logical and semantic units. This is usually a manual step, and little work has been done on fully automated segmentation techniques. Also, verbal protocols are known to fail in some circumstances, such as when dealing with creativity, insight (e.g. Aha! experience, gestalt), concurrent, nonverbalizable (e.g. facial recognition) and nonconscious processes. We propose different approaches to study the conceptual design process using electroencephalograms (EEG). More specifically, we use spatio-temporal and frequency domain features. Our research is based on machine learning techniques used on EEG signals (functional microstate analysis), source localization (LORETA) and on a novel method of segmentation for design protocols based on EEG features. Using these techniques, we measure mental effort, fatigue and concentration in the conceptual design process, in addition to creativity and insight/nonverbalizable processing. We discuss the strengths and weaknesses of such approaches.