9 research outputs found

    Intrinsic Dimensionality

    Full text link
    This entry for the SIGSPATIAL Special July 2010 issue on Similarity Searching in Metric Spaces discusses the notion of intrinsic dimensionality of data in the context of similarity search.Comment: 4 pages, 4 figures, latex; diagram (c) has been correcte

    Indexability, concentration, and VC theory

    Get PDF
    Degrading performance of indexing schemes for exact similarity search in high dimensions has long since been linked to histograms of distributions of distances and other 1-Lipschitz functions getting concentrated. We discuss this observation in the framework of the phenomenon of concentration of measure on the structures of high dimension and the Vapnik-Chervonenkis theory of statistical learning.Comment: 17 pages, final submission to J. Discrete Algorithms (an expanded, improved and corrected version of the SISAP'2010 invited paper, this e-print, v3

    Avaliação de desempenho dos métodos de acesso R-Tree e M-Tree para consultas aos k-vizinhos mais próximos em conjuntos de dados de alta dimensionalidade

    Get PDF
    Trabalho de Conclusão de Curso (Graduação)Armazenamento de dados é uma atividade constante no campo da computação. A utilização de métodos de acesso para realizar buscas com eficiência é primordial. Esse trabalho apresenta duas estruturas de indexação para dados complexos (R-tree e M-tree) e o algoritmo de busca por similaridade k-vizinhos mais próximos avaliando o desempenho da busca nesses dois métodos de indexação comparando também como método sequencial, para entender casos em que a utilização do método de indexação é menos eficiente do que a não utilização de um método de acesso (causado pela maldição da dimensionalidade). Os experimentos apresentaram que uma das estruturas (R-tree) teve um desempenho pior que o método sequencial para as bases testadas e que a outra estrutura (M-tree) foi mais eficiente em todas as bases

    Design and analysis of algorithms for similarity search based on intrinsic dimension

    Get PDF
    One of the most fundamental operations employed in data mining tasks such as classification, cluster analysis, and anomaly detection, is that of similarity search. It has been used in numerous fields of application such as multimedia, information retrieval, recommender systems and pattern recognition. Specifically, a similarity query aims to retrieve from the database the most similar objects to a query object, where the underlying similarity measure is usually expressed as a distance function. The cost of processing similarity queries has been typically assessed in terms of the representational dimension of the data involved, that is, the number of features used to represent individual data objects. It is generally the case that high representational dimension would result in a significant increase in the processing cost of similarity queries. This relation is often attributed to an amalgamation of phenomena, collectively referred to as the curse of dimensionality. However, the observed effects of dimensionality in practice may not be as severe as expected. This has led to the development of models quantifying the complexity of data in terms of some measure of the intrinsic dimensionality. The generalized expansion dimension (GED) is one of such models, which estimates the intrinsic dimension in the vicinity of a query point q through the observation of the ranks and distances of pairs of neighbors with respect to q. This dissertation is mainly concerned with the design and analysis of search algorithms, based on the GED model. In particular, three variants of similarity search problem are considered, including adaptive similarity search, flexible aggregate similarity search, and subspace similarity search. The good practical performance of the proposed algorithms demonstrates the effectiveness of dimensionality-driven design of search algorithms

    Approaches to Quantifying EEG Features for Design Protocol Analysis

    Get PDF
    Recently, physiological signals such as eye-tracking and gesture analysis, galvanic skin response (GSR), electrocardiograms (ECG) and electroencephalograms (EEG) have been used by design researchers to extract significant information to describe the conceptual design process. We study a set of video-based design protocols recorded on subjects performing design tasks on a sketchpad while having their EEG monitored. The conceptual design process is rich with information on how designer’s do design. Many methods exist to analyze the conceptual design process, the most popular one being concurrent verbal protocols. A recurring problem in design protocol analysis is to segment and code protocol data into logical and semantic units. This is usually a manual step and little work has been done on fully automated segmentation techniques. Also, verbal protocols are known to fail in some circumstances such as when dealing with creativity, insight (e.g. Aha! experience, gestalt), concurrent, nonverbalizable (e.g. facial recognition) and nonconscious processes. We propose different approaches to study the conceptual design process using electroencephalograms (EEG). More specifically, we use spatio-temporal and frequency domain features. Our research is based on machine learning techniques used on EEG signals (functional microstate analysis), source localization (LORETA) and on a novel method of segmentation for design protocols based on EEG features. Using these techniques, we measure mental effort, fatigue and concentration in the conceptual design process, in addition to creativity and insight/nonverbalizable processing. We discuss the strengths and weaknesses of such approaches

    Curse of Dimensionality in Pivot Based Indexes

    No full text
    a
    corecore