253 research outputs found

    Efficient similarity search in high-dimensional data spaces

    Similarity search in high-dimensional data spaces is a popular paradigm for many modern database applications, such as content-based image retrieval, time series analysis in financial and marketing databases, and data mining. Objects are represented as high-dimensional points or vectors based on their important features. Object similarity is then measured by the distance between feature vectors, and similarity search is implemented via range queries or k-Nearest Neighbor (k-NN) queries. Implementing k-NN queries via a sequential scan of large tables of feature vectors is computationally expensive. Building multi-dimensional indexes on the feature vectors for k-NN search also tends to be unsatisfactory when the dimensionality is high, because index performance degrades under the curse of dimensionality. Dimensionality reduction using the Singular Value Decomposition method is the approach adopted in this study to deal with high-dimensional data. Since the data distribution of many real-world datasets tends to be heterogeneous, dimensionality reduction applied to the entire dataset may cause a significant loss of information. A more efficient representation is sought by clustering the data into homogeneous subsets of points and applying dimensionality reduction to each cluster separately, i.e., utilizing local rather than global dimensionality reduction. The thesis deals with improving the efficiency of query processing for local dimensionality reduction methods, such as the Clustering and Singular Value Decomposition (CSVD) and the Local Dimensionality Reduction (LDR) methods. Variations in the implementation of CSVD are considered, and the two methods are compared with respect to compression ratio, CPU time, and retrieval efficiency. An exact k-NN algorithm is presented for local dimensionality reduction methods by extending an existing multi-step k-NN search algorithm that was designed for global dimensionality reduction. Experimental results show that the new method requires less CPU time than the approximate method originally proposed for CSVD at a comparable level of accuracy. Selecting the optimal subspace dimensionality aims at minimizing the total query cost; the problem is complicated by the fact that each cluster can retain a different number of dimensions. A hybrid method combining the best features of the CSVD and LDR methods is presented to find optimal subspace dimensionalities for the clusters generated by local dimensionality reduction methods. The experiments show that the proposed method works well for both real-world and synthetic datasets.
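
    As a rough illustration of the local dimensionality reduction idea summarized above, the sketch below clusters the data with k-means and applies a truncated SVD within each cluster, then answers k-NN queries by comparing the query against every cluster in that cluster's own subspace. The dataset, the numpy/scikit-learn calls, and all parameter choices are illustrative assumptions, not the thesis's CSVD or LDR implementation.

        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(0)
        X = rng.normal(size=(10_000, 64))        # toy feature vectors

        k_clusters, kept_dims = 8, 16            # illustrative settings
        labels = KMeans(n_clusters=k_clusters, random_state=0).fit_predict(X)

        subspaces = []                           # (centroid, SVD basis, projected points, row ids) per cluster
        for c in range(k_clusters):
            idx = np.where(labels == c)[0]
            Xc = X[idx]
            mu = Xc.mean(axis=0)
            _, _, Vt = np.linalg.svd(Xc - mu, full_matrices=False)
            basis = Vt[:kept_dims]               # top singular vectors of this cluster
            subspaces.append((mu, basis, (Xc - mu) @ basis.T, idx))

        def knn(query, k=10):
            # Approximate k-NN: score the query against each cluster in that cluster's subspace.
            dists, ids = [], []
            for mu, basis, proj, idx in subspaces:
                q = (query - mu) @ basis.T       # project the query into the cluster subspace
                dists.append(np.linalg.norm(proj - q, axis=1))
                ids.append(idx)
            dists, ids = np.concatenate(dists), np.concatenate(ids)
            order = np.argsort(dists)[:k]
            return ids[order], dists[order]

        neighbor_ids, neighbor_dists = knn(X[0])

    Because each per-cluster basis is orthonormal, the subspace distances computed above lower-bound the true distances; a multi-step exact search of the kind extended in the thesis can exploit such bounds by re-checking candidates against the full-dimensional vectors, whereas this sketch stops at the approximate stage.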

    Extreme Quantum Advantage for Rare-Event Sampling

    We introduce a quantum algorithm for efficient biased sampling of the rare events generated by classical memoryful stochastic processes. We show that this quantum algorithm gives an extreme advantage over known classical biased sampling algorithms in terms of the memory resources required. The quantum memory advantage ranges from polynomial to exponential, and when sampling the rare equilibrium configurations of spin systems the quantum advantage diverges. Comment: 11 pages, 9 figures; http://csc.ucdavis.edu/~cmg/compmech/pubs/eqafbs.ht
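
    The abstract does not spell out the construction, but as a point of reference for what classical biased sampling of a memoryful process involves, the following sketch performs importance sampling of a rare event in a two-state Markov chain, tilting the transition probabilities toward the rare behavior and correcting with likelihood-ratio weights. The chain, the tilt, and the threshold are made-up illustrative values; this is a classical baseline, not the quantum algorithm of the paper.

        import numpy as np

        rng = np.random.default_rng(1)
        P = np.array([[0.95, 0.05],               # original transition matrix
                      [0.60, 0.40]])
        Q = np.array([[0.85, 0.15],               # tilted matrix: visits state 1 more often
                      [0.40, 0.60]])
        L, threshold, n_samples = 200, 60, 5_000  # rare event: at least 60 visits to state 1

        estimate = 0.0
        for _ in range(n_samples):
            state, ones, log_w = 0, 0, 0.0
            for _ in range(L):
                nxt = rng.choice(2, p=Q[state])
                log_w += np.log(P[state, nxt]) - np.log(Q[state, nxt])  # likelihood-ratio weight
                state = nxt
                ones += state
            if ones >= threshold:
                estimate += np.exp(log_w)
        estimate /= n_samples
        print(f"importance-sampling estimate of P(visits >= {threshold}): {estimate:.3e}")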

    Fractal Analysis

    Fractal analysis is becoming more and more common in all walks of life, including biomedical engineering, steganography, and art. Writing one book on all of these topics is a very difficult task; for this reason, this book covers only selected topics. Interested readers will find in it the topics of image compression, groundwater quality, establishing downscaling and spatio-temporal scale conversion models of NDVI, modelling and optimization of 3T fractional nonlinear generalized magneto-thermoelastic multi-material, algebraic fractals in steganography, strain-induced microstructures in metals, and much more. The book will definitely be of interest to scientists dealing with fractal analysis, as well as to biomedical or IT engineers. I encourage you to view the individual chapters.

    Cosine-Based Clustering Algorithm Approach

    Because many applications require the management of spatial data, clustering large spatial databases is an important problem, which aims to find the densely populated regions in the feature space for use in data mining, knowledge discovery, or efficient information retrieval. A good clustering approach should be efficient and detect clusters of arbitrary shapes. It must be insensitive to outliers (noise) and to the order of the input data. In this paper, Cosine Cluster is proposed based on the cosine transformation, and it satisfies all the above requirements. Using the multi-resolution property of cosine transforms, clusters of arbitrary shape can be effectively identified at different degrees of accuracy. Cosine Cluster is also shown to be highly efficient in terms of time complexity. Experimental results on very large data sets are presented, which show the efficiency and effectiveness of the proposed approach compared to other recent clustering methods.
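
    The paper itself defines Cosine Cluster; the sketch below is only a guess at the flavor of a cosine-transform clustering pass, under the assumption that the points are first binned onto a grid, the grid is smoothed by keeping a low-frequency block of DCT coefficients, and connected dense cells are labeled as clusters. The grid size, the retained coefficient block, and the density threshold are illustrative choices, not the paper's settings.

        import numpy as np
        from scipy.fft import dctn, idctn
        from scipy.ndimage import label

        rng = np.random.default_rng(2)
        pts = np.vstack([rng.normal([2, 2], 0.3, (500, 2)),
                         rng.normal([7, 7], 0.5, (500, 2)),
                         rng.uniform(0, 10, (100, 2))])       # two blobs plus background noise

        grid, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=64, range=[[0, 10], [0, 10]])

        coeffs = dctn(grid, norm="ortho")
        keep = 8                                              # size of the low-frequency block kept
        mask = np.zeros_like(coeffs)
        mask[:keep, :keep] = 1
        density = idctn(coeffs * mask, norm="ortho")          # smoothed density surface

        dense = density > density.mean() + density.std()      # cells considered significant
        clusters, n_found = label(dense)                      # connected dense regions = clusters
        print("clusters found:", n_found)

    Varying the size of the retained coefficient block plays the role of the multi-resolution property: a small block yields coarse, smooth clusters, while a larger block resolves finer structure.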

    The Oscillatory Universe, phantom crossing and the Hubble tension

    We investigate the validity of cosmological models with an oscillating scale factor in relation to late-time cosmological observations. We show that these models not only meet the required late-time observational constraints but can also alleviate the Hubble tension. As a generic feature of the model, the Hubble parameter increases near the current epoch due to its cyclical nature, exhibiting phantom behavior and thereby addressing the aforementioned tension associated with late-time acceleration. Comment: 10 pages, 7 figures
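
    As a purely illustrative toy computation of the effect described (not the model of the paper), the snippet below superposes a small oscillation on a matter-like scale factor and shows that the Hubble parameter H = a'/a is modulated by the cycle and, for this choice of phase, ends up larger at the present epoch than in the smooth reference model. The functional form, amplitude, and frequency are ad-hoc assumptions.

        import numpy as np

        t = np.linspace(0.1, 1.0, 1000)           # cosmic time in units of the present age
        eps, omega = 0.05, 20.0                   # ad-hoc oscillation amplitude and frequency

        a_smooth = t ** (2.0 / 3.0)               # matter-dominated reference scale factor
        a_osc = a_smooth * (1.0 + eps * np.sin(omega * t))

        H_smooth = np.gradient(a_smooth, t) / a_smooth
        H_osc = np.gradient(a_osc, t) / a_osc     # oscillating Hubble parameter

        print("H today (smooth):     ", H_smooth[-1])
        print("H today (oscillating):", H_osc[-1])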

    Statistical Analysis of Seismicity Catalog of Alaska: The mysteries of the timeseries

    This dissertation presents the results of applying three independent statistical techniques to the seismic catalog of the Alaska and Aleutian subduction zone. I perform Visibility Graph Analysis and Multifractal Detrended Fluctuation Analysis on the seismic catalogs of several defined seismogenic zones, both at the surface and at depth. Forecasting earthquake hazard is based on the assumption that the Gutenberg-Richter relation represents the size distribution of future earthquakes, and we show that the series produced by these methods have properties closely correlated with the b-value of the Gutenberg-Richter law. Visibility graph analysis maps a time series into a network of nodes and connections, and we show that the resulting network retains a relationship with the seismic characteristics of the region. The same holds for multifractal detrended fluctuation analysis, which studies the multifractality of the seismic catalogs treated as magnitude time series. I also improve the spatial information of the catalog using the condensation method based on the location error. It produces a new catalog that differs from the original one in the new weights assigned to the events according to their accuracy relative to neighboring events. Using this statistical method will contribute to the discovery of previously unknown active structures and a better understanding of seismic hazards in Alaska.
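
    The visibility graph mapping mentioned above is the standard natural visibility construction: two samples of the magnitude series are linked whenever every intermediate sample lies below the straight line joining them. The sketch below implements that rule on a random stand-in series; the data and parameters are illustrative, not the Alaska catalog.

        import numpy as np

        def visibility_graph(y):
            # Natural visibility graph: connect (a, y[a]) and (b, y[b]) if every
            # intermediate sample lies strictly below the line joining them.
            n = len(y)
            edges = []
            for a in range(n):
                for b in range(a + 1, n):
                    visible = all(
                        y[c] < y[b] + (y[a] - y[b]) * (b - c) / (b - a)
                        for c in range(a + 1, b)
                    )
                    if visible:
                        edges.append((a, b))
            return edges

        rng = np.random.default_rng(3)
        mags = rng.exponential(scale=1.0, size=200) + 3.0   # toy magnitude series
        g = visibility_graph(mags)
        degree = np.bincount(np.array(g).ravel(), minlength=len(mags))
        print("edges:", len(g), "max degree:", degree.max())

    Properties of the resulting network, such as its degree distribution, are what analyses of this kind relate to seismic characteristics like the Gutenberg-Richter b-value.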

    An empirical study on the various stock market prediction methods

    Investment in the stock market is one of the most admired investment activities. However, prediction of the stock market has remained a hard task because of the non-linearity it exhibits. The non-linearity is due to multiple affecting factors such as the global economy, political situations, sector performance, economic indicators, foreign institutional investment, domestic institutional investment, and so on. A proper set of such representative factors must be analyzed to build an efficient prediction model. Even a marginal improvement in prediction accuracy can be gainful for investors. This review provides a detailed analysis of research papers presenting stock market prediction techniques. These techniques are assessed in the time series analysis and sentiment analysis sections. A detailed discussion of research gaps and issues is presented. The reviewed articles are analyzed based on their use of prediction techniques, optimization algorithms, feature selection methods, datasets, toolsets, evaluation metrics, and input parameters. The techniques are further investigated to analyze the relations of the prediction methods with the feature selection algorithms, datasets, and input parameters. In addition, major problems raised by the present techniques are also discussed. This survey will provide researchers with deeper insight into various aspects of current stock market prediction methods.

    Seeking the Truth Beyond the Data. An Unsupervised Machine Learning Approach

    Clustering is an unsupervised machine learning methodology in which unlabeled elements/objects are grouped together with the aim of constructing well-established clusters whose elements are classified according to their similarity. The goal of this process is to provide a useful aid that helps the researcher identify patterns in the data. When dealing with large databases, such patterns may not be easily detectable without the contribution of a clustering algorithm. This article provides an in-depth description of the most widely used clustering methodologies, accompanied by useful guidance on suitable parameter selection and initialization. At the same time, this article is not only a review highlighting the major elements of the examined clustering techniques, but also compares these algorithms' clustering efficiency on 3 datasets, revealing their weaknesses and capabilities through accuracy and complexity when confronting discrete and continuous observations. The produced results help us extract valuable conclusions about the appropriateness of the examined clustering techniques with respect to dataset size. Comment: This paper has been accepted for publication in the proceedings of the 3rd International Scientific Forum on Computer and Energy Sciences (WFCES 2022)
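
    As a minimal, illustrative companion to the comparison described above, the snippet below runs one widely used technique (k-means) on a synthetic dataset and scores candidate cluster counts with the silhouette coefficient; the choice of algorithm, data, and metric is an assumption made for illustration and does not reproduce the article's experiments.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.datasets import make_blobs
        from sklearn.metrics import silhouette_score

        X, _ = make_blobs(n_samples=600, centers=3, cluster_std=1.2, random_state=0)

        scores = {}
        for k in range(2, 7):                         # try several cluster counts
            labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
            scores[k] = silhouette_score(X, labels)   # higher means better-separated clusters

        best_k = max(scores, key=scores.get)
        print("silhouette by k:", {k: round(s, 3) for k, s in scores.items()})
        print("best k:", best_k)

    Running the same kind of loop over several algorithms and real datasets, rather than one algorithm on synthetic blobs, is the comparison the article carries out at larger scale.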