3,369 research outputs found

    Efficient similarity search in high-dimensional data spaces

    Get PDF
    Similarity search in high-dimensional data spaces is a popular paradigm for many modern database applications, such as content based image retrieval, time series analysis in financial and marketing databases, and data mining. Objects are represented as high-dimensional points or vectors based on their important features. Object similarity is then measured by the distance between feature vectors and similarity search is implemented via range queries or k-Nearest Neighbor (k-NN) queries. Implementing k-NN queries via a sequential scan of large tables of feature vectors is computationally expensive. Building multi-dimensional indexes on the feature vectors for k-NN search also tends to be unsatisfactory when the dimensionality is high. This is due to the poor index performance caused by the dimensionality curse. Dimensionality reduction using the Singular Value Decomposition method is the approach adopted in this study to deal with high-dimensional data. Noting that for many real-world datasets, data distribution tends to be heterogeneous, dimensionality reduction on the entire dataset may cause a significant loss of information. More efficient representation is sought by clustering the data into homogeneous subsets of points, and applying dimensionality reduction to each cluster respectively, i.e., utilizing local rather than global dimensionality reduction. The thesis deals with the improvement of the efficiency of query processing associated with local dimensionality reduction methods, such as the Clustering and Singular Value Decomposition (CSVD) and the Local Dimensionality Reduction (LDR) methods. Variations in the implementation of CSVD are considered and the two methods are compared from the viewpoint of the compression ratio, CPU time, and retrieval efficiency. An exact k-NN algorithm is presented for local dimensionality reduction methods by extending an existing multi-step k-NN search algorithm, which is designed for global dimensionality reduction. Experimental results show that the new method requires less CPU time than the approximate method proposed original for CSVD at a comparable level of accuracy. Optimal subspace dimensionality reduction has the intent of minimizing total query cost. The problem is complicated in that each cluster can retain a different number of dimensions. A hybrid method is presented, combining the best features of the CSVD and LDR methods, to find optimal subspace dimensionalities for clusters generated by local dimensionality reduction methods. The experiments show that the proposed method works well for both real-world datasets and synthetic datasets

    Giving eyes to ICT!, or How does a computer recognize a cow?

    Get PDF
    Het door Schouten en andere onderzoekers op het CWI ontwikkelde systeem berust op het beschrijven van beelden met behulp van fractale meetkunde. De menselijke waarneming blijkt mede daardoor zo efficiënt omdat zij sterk werkt met gelijkenissen. Het ligt dus voor de hand het te zoeken in wiskundige methoden die dat ook doen. Schouten heeft daarom beeldcodering met behulp van 'fractals' onderzocht. Fractals zijn zelfgelijkende meetkundige figuren, opgebouwd door herhaalde transformatie (iteratie) van een eenvoudig basispatroon, dat zich daardoor op steeds kleinere schalen vertakt. Op elk niveau van detaillering lijkt een fractal op zichzelf (Droste-effect). Met fractals kan men vrij eenvoudig bedrieglijk echte natuurvoorstellingen maken. Fractale beeldcodering gaat ervan uit dat het omgekeerde ook geldt: een beeld effectief opslaan in de vorm van de basispatronen van een klein aantal fractals, samen met het voorschrift hoe het oorspronkelijke beeld daaruit te reconstrueren. Het op het CWI in samenwerking met onderzoekers uit Leuven ontwikkelde systeem is mede gebaseerd op deze methode. ISBN 906196502

    An information-driven framework for image mining

    Get PDF
    [Abstract]: Image mining systems that can automatically extract semantically meaningful information (knowledge) from image data are increasingly in demand. The fundamental challenge in image mining is to determine how low-level, pixel representation contained in a raw image or image sequence can be processed to identify high-level spatial objects and relationships. To meet this challenge, we propose an efficient information-driven framework for image mining. We distinguish four levels of information: the Pixel Level, the Object Level, the Semantic Concept Level, and the Pattern and Knowledge Level. High-dimensional indexing schemes and retrieval techniques are also included in the framework to support the flow of information among the levels. We believe this framework represents the first step towards capturing the different levels of information present in image data and addressing the issues and challenges of discovering useful patterns/knowledge from each level

    The multi-fractal structure of contrast changes in natural images: from sharp edges to textures

    Full text link
    We present a formalism that leads very naturally to a hierarchical description of the different contrast structures in images, providing precise definitions of sharp edges and other texture components. Within this formalism, we achieve a decomposition of pixels of the image in sets, the fractal components of the image, such that each set only contains points characterized by a fixed stregth of the singularity of the contrast gradient in its neighborhood. A crucial role in this description of images is played by the behavior of contrast differences under changes in scale. Contrary to naive scaling ideas where the image is thought to have uniform transformation properties \cite{Fie87}, each of these fractal components has its own transformation law and scaling exponents. A conjecture on their biological relevance is also given.Comment: 41 pages, 8 figures, LaTe

    High-dimensional indexing methods utilizing clustering and dimensionality reduction

    Get PDF
    The emergence of novel database applications has resulted in the prevalence of a new paradigm for similarity search. These applications include multimedia databases, medical imaging databases, time series databases, DNA and protein sequence databases, and many others. Features of data objects are extracted and transformed into high-dimensional data points. Searching for objects becomes a search on points in the high-dimensional feature space. The dissimilarity between two objects is determined by the distance between two feature vectors. Similarity search is usually implemented as nearest neighbor search in feature vector spaces. The cost of processing k-nearest neighbor (k-NN) queries via a sequential scan increases as the number of objects and the number of features increase. A variety of multi-dimensional index structures have been proposed to improve the efficiency of k-NN query processing, which work well in low-dimensional space but lose their efficiency in high-dimensional space due to the curse of dimensionality. This inefficiency is dealt in this study by Clustering and Singular Value Decomposition - CSVD with indexing, Persistent Main Memory - PMM index, and Stepwise Dimensionality Increasing - SDI-tree index. CSVD is an approximate nearest neighbor search method. The performance of CSVD with indexing is studied and the approximation to the distance in original space is investigated. For a given Normalized Mean Square Error - NMSE, the higher the degree of clustering, the higher the recall. However, more clusters require more disk page accesses. Certain number of clusters can be obtained to achieve a higher recall while maintaining a relatively lower query processing cost. Clustering and Indexing using Persistent Main Memory - CIPMM framework is motivated by the following consideration: (a) a significant fraction of index pages are accessed randomly, incurring a high positioning time for each access; (b) disk transfer rate is improving 40% annually, while the improvement in positioning time is only 8%; (c) query processing incurs less CPU time for main memory resident than disk resident indices. CIPMM aims at reducing the elapsed time for query processing by utilizing sequential, rather than random disk accesses. A specific instance of the CIPMM framework CIPOP, indexing using Persistent Ordered Partition - OP-tree, is elaborated and compared with clustering and indexing using the SR-tree, CISR. The results show that CIPOP outperforms CISR, and the higher the dimensionality, the higher the performance gains. The SDI-tree index is motivated by fanouts decrease with dimensionality increasing and shorter vectors reduce cache misses. The index is built by using feature vectors transformed via principal component analysis, resulting in a structure with fewer dimensions at higher levels and increasing the number of dimensions from one level to the other. Dimensions are retained in nonincreasing order of their variance according to a parameter p, which specifies the incremental fraction of variance at each level of the index. Experiments on three datasets have shown that SDL-trees with carefully tuned parameters access fewer disk accesses than SR-trees and VAMSR-trees and incur less CPU time than VA-Files in addition

    Evaluation of fractal dimension effectiveness for damage detection in retinal background

    Full text link
    [EN] This work investigates the characterization of bright lesions in retinal fundus images using texture analysis techniques. Exudates and drusen are evidences of retinal damage in diabetic retinopathy (DR) and age-related macular degeneration (AMD) respectively. An automatic detection of pathological tissues could make possible an early detection of these diseases. In this work, fractal analysis is explored in order to discriminate between pathological and healthy retinal texture. After a deep preprocessing step, in which spatial and colour normalization are performed, the fractal dimension is extracted locally by computing the Hurst exponent (H) along different directions. The greyscale image is described by the increments of the fractional Brownian motion model and the H parameter is computed by linear regression in the frequency domain. The ability of fractal dimension to detect pathological tissues is demonstrated using a home-made system, based on fractal analysis and Support Vector Machine, able to achieve around a 70% and 83% of accuracy in E-OPHTHA and DIARETDB1 public databases respectively. In a second experiment, the fractal descriptor is combined with texture information, extracted by the Local Binary Patterns, improving the bright lesion detection. Accuracy, sensitivity and specificity values higher than 89%, 80% and 90% respectively suggest that the method presented in this paper is a robust algorithm for describing retina texture and can be useful in the automatic detection of DR and AMD.This paper was supported by the European Union's Horizon 2020 research and innovation programme under the Project GALAHAD [H2020-ICT-2016-2017, 732613]. In addition, this work was partially funded by the Ministerio de Economia y Competitividad of Spain, Project SICAP [DPI2016-77869-C2-1-R]. The work of Adrian Colomer has been supported by the Spanish Government under a FPI Grant [BES-2014-067889]. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.Colomer, A.; Naranjo Ornedo, V.; Janvier, T.; Mossi García, JM. (2018). Evaluation of fractal dimension effectiveness for damage detection in retinal background. Journal of Computational and Applied Mathematics. 337:341-353. https://doi.org/10.1016/j.cam.2018.01.005S34135333
    corecore