253 research outputs found

    Efficient similarity search in high-dimensional data spaces

    Similarity search in high-dimensional data spaces is a popular paradigm for many modern database applications, such as content-based image retrieval, time series analysis in financial and marketing databases, and data mining. Objects are represented as high-dimensional points or vectors based on their important features. Object similarity is then measured by the distance between feature vectors, and similarity search is implemented via range queries or k-Nearest Neighbor (k-NN) queries. Implementing k-NN queries via a sequential scan of large tables of feature vectors is computationally expensive. Building multi-dimensional indexes on the feature vectors for k-NN search also tends to be unsatisfactory when the dimensionality is high, because index performance degrades under the curse of dimensionality. Dimensionality reduction using the Singular Value Decomposition method is the approach adopted in this study to deal with high-dimensional data. Since the data distribution of many real-world datasets tends to be heterogeneous, dimensionality reduction applied to the entire dataset may cause a significant loss of information. A more efficient representation is sought by clustering the data into homogeneous subsets of points and applying dimensionality reduction to each cluster separately, i.e., utilizing local rather than global dimensionality reduction. The thesis deals with improving the efficiency of query processing for local dimensionality reduction methods, such as the Clustering and Singular Value Decomposition (CSVD) and the Local Dimensionality Reduction (LDR) methods. Variations in the implementation of CSVD are considered, and the two methods are compared with respect to compression ratio, CPU time, and retrieval efficiency. An exact k-NN algorithm is presented for local dimensionality reduction methods by extending an existing multi-step k-NN search algorithm that was designed for global dimensionality reduction. Experimental results show that the new method requires less CPU time than the approximate method originally proposed for CSVD at a comparable level of accuracy. Selecting the optimal subspace dimensionality aims at minimizing the total query cost; the problem is complicated by the fact that each cluster can retain a different number of dimensions. A hybrid method combining the best features of the CSVD and LDR methods is presented to find optimal subspace dimensionalities for the clusters generated by local dimensionality reduction methods. The experiments show that the proposed method works well for both real-world and synthetic datasets.
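
    As a rough illustration of the local dimensionality reduction idea summarized above, the sketch below clusters the data with k-means and applies a truncated SVD within each cluster, then answers k-NN queries by comparing the query against every cluster in that cluster's own subspace. The dataset, the numpy/scikit-learn calls, and all parameter choices are illustrative assumptions, not the thesis's CSVD or LDR implementation.

        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(0)
        X = rng.normal(size=(10_000, 64))        # toy feature vectors

        k_clusters, kept_dims = 8, 16            # illustrative settings
        labels = KMeans(n_clusters=k_clusters, random_state=0).fit_predict(X)

        subspaces = []                           # (centroid, SVD basis, projected points, row ids) per cluster
        for c in range(k_clusters):
            idx = np.where(labels == c)[0]
            Xc = X[idx]
            mu = Xc.mean(axis=0)
            _, _, Vt = np.linalg.svd(Xc - mu, full_matrices=False)
            basis = Vt[:kept_dims]               # top singular vectors of this cluster
            subspaces.append((mu, basis, (Xc - mu) @ basis.T, idx))

        def knn(query, k=10):
            # Approximate k-NN: score the query against each cluster in that cluster's subspace.
            dists, ids = [], []
            for mu, basis, proj, idx in subspaces:
                q = (query - mu) @ basis.T       # project the query into the cluster subspace
                dists.append(np.linalg.norm(proj - q, axis=1))
                ids.append(idx)
            dists, ids = np.concatenate(dists), np.concatenate(ids)
            order = np.argsort(dists)[:k]
            return ids[order], dists[order]

        neighbor_ids, neighbor_dists = knn(X[0])

    Because each per-cluster basis is orthonormal, the subspace distances computed above lower-bound the true distances; a multi-step exact search of the kind extended in the thesis can exploit such bounds by re-checking candidates against the full-dimensional vectors, whereas this sketch stops at the approximate stage.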

    Extreme Quantum Advantage for Rare-Event Sampling

    We introduce a quantum algorithm for efficient biased sampling of the rare events generated by classical memoryful stochastic processes. We show that this quantum algorithm gives an extreme advantage over known classical biased sampling algorithms in terms of the memory resources required. The quantum memory advantage ranges from polynomial to exponential, and when sampling the rare equilibrium configurations of spin systems the quantum advantage diverges. Comment: 11 pages, 9 figures; http://csc.ucdavis.edu/~cmg/compmech/pubs/eqafbs.ht
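
    The abstract does not spell out the construction, but as a point of reference for what classical biased sampling of a memoryful process involves, the following sketch performs importance sampling of a rare event in a two-state Markov chain, tilting the transition probabilities toward the rare behavior and correcting with likelihood-ratio weights. The chain, the tilt, and the threshold are made-up illustrative values; this is a classical baseline, not the quantum algorithm of the paper.

        import numpy as np

        rng = np.random.default_rng(1)
        P = np.array([[0.95, 0.05],               # original transition matrix
                      [0.60, 0.40]])
        Q = np.array([[0.85, 0.15],               # tilted matrix: visits state 1 more often
                      [0.40, 0.60]])
        L, threshold, n_samples = 200, 60, 5_000  # rare event: at least 60 visits to state 1

        estimate = 0.0
        for _ in range(n_samples):
            state, ones, log_w = 0, 0, 0.0
            for _ in range(L):
                nxt = rng.choice(2, p=Q[state])
                log_w += np.log(P[state, nxt]) - np.log(Q[state, nxt])  # likelihood-ratio weight
                state = nxt
                ones += state
            if ones >= threshold:
                estimate += np.exp(log_w)
        estimate /= n_samples
        print(f"importance-sampling estimate of P(visits >= {threshold}): {estimate:.3e}")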

    Fractal Analysis

    Fractal analysis is becoming more and more common in all walks of life, including biomedical engineering, steganography, and art. Writing one book on all of these topics is a very difficult task; for this reason, this book covers only selected topics. Interested readers will find in it the topics of image compression, groundwater quality, establishing downscaling and spatio-temporal scale conversion models of NDVI, modelling and optimization of 3T fractional nonlinear generalized magneto-thermoelastic multi-material, algebraic fractals in steganography, strain-induced microstructures in metals, and much more. The book will definitely be of interest to scientists dealing with fractal analysis, as well as to biomedical or IT engineers. I encourage you to view the individual chapters.

    Cosine-Based Clustering Algorithm Approach

    Because many applications require the management of spatial data, clustering large spatial databases is an important problem, which aims to find the densely populated regions in the feature space for use in data mining, knowledge discovery, or efficient information retrieval. A good clustering approach should be efficient and detect clusters of arbitrary shapes. It must be insensitive to outliers (noise) and to the order of the input data. In this paper, Cosine Cluster is proposed based on the cosine transformation, and it satisfies all the above requirements. Using the multi-resolution property of cosine transforms, clusters of arbitrary shape can be effectively identified at different degrees of accuracy. Cosine Cluster is also shown to be highly efficient in terms of time complexity. Experimental results on very large data sets are presented, which show the efficiency and effectiveness of the proposed approach compared to other recent clustering methods.
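
    The paper itself defines Cosine Cluster; the sketch below is only a guess at the flavor of a cosine-transform clustering pass, under the assumption that the points are first binned onto a grid, the grid is smoothed by keeping a low-frequency block of DCT coefficients, and connected dense cells are labeled as clusters. The grid size, the retained coefficient block, and the density threshold are illustrative choices, not the paper's settings.

        import numpy as np
        from scipy.fft import dctn, idctn
        from scipy.ndimage import label

        rng = np.random.default_rng(2)
        pts = np.vstack([rng.normal([2, 2], 0.3, (500, 2)),
                         rng.normal([7, 7], 0.5, (500, 2)),
                         rng.uniform(0, 10, (100, 2))])       # two blobs plus background noise

        grid, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=64, range=[[0, 10], [0, 10]])

        coeffs = dctn(grid, norm="ortho")
        keep = 8                                              # size of the low-frequency block kept
        mask = np.zeros_like(coeffs)
        mask[:keep, :keep] = 1
        density = idctn(coeffs * mask, norm="ortho")          # smoothed density surface

        dense = density > density.mean() + density.std()      # cells considered significant
        clusters, n_found = label(dense)                      # connected dense regions = clusters
        print("clusters found:", n_found)

    Varying the size of the retained coefficient block plays the role of the multi-resolution property: a small block yields coarse, smooth clusters, while a larger block resolves finer structure.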

    The Oscillatory Universe, phantom crossing and the Hubble tension

    We investigate the validity of cosmological models with an oscillating scale factor in relation to late-time cosmological observations. We show that these models not only meet the required late-time observational constraints but can also alleviate the Hubble tension. As a generic feature of the model, the Hubble parameter increases near the current epoch due to its cyclical nature, exhibiting phantom behavior and thereby addressing the aforementioned tension associated with late-time acceleration. Comment: 10 pages, 7 figures
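
    As a purely illustrative toy computation of the effect described (not the model of the paper), the snippet below superposes a small oscillation on a matter-like scale factor and shows that the Hubble parameter H = a'/a is modulated by the cycle and, for this choice of phase, ends up larger at the present epoch than in the smooth reference model. The functional form, amplitude, and frequency are ad-hoc assumptions.

        import numpy as np

        t = np.linspace(0.1, 1.0, 1000)           # cosmic time in units of the present age
        eps, omega = 0.05, 20.0                   # ad-hoc oscillation amplitude and frequency

        a_smooth = t ** (2.0 / 3.0)               # matter-dominated reference scale factor
        a_osc = a_smooth * (1.0 + eps * np.sin(omega * t))

        H_smooth = np.gradient(a_smooth, t) / a_smooth
        H_osc = np.gradient(a_osc, t) / a_osc     # oscillating Hubble parameter

        print("H today (smooth):     ", H_smooth[-1])
        print("H today (oscillating):", H_osc[-1])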

    Statistical Analysis of Seismicity Catalog of Alaska: The mysteries of the timeseries

    This dissertation presents the results of applying three independent statistical techniques to the seismic catalog of the Alaska and Aleutian subduction zone. I perform Visibility Graph Analysis and Multifractal Detrended Fluctuation Analysis on the seismic catalogs of several defined seismogenic zones, both at the surface and at depth. Forecasting earthquake hazard is based on the assumption that the Gutenberg-Richter relation represents the size distribution of future earthquakes, and we show that the series produced by these methods have properties closely correlated with the b-value of the Gutenberg-Richter law. Visibility graph analysis maps a time series into a network of nodes and connections, and we show that the resulting network retains a relationship with the seismic characteristics of the region. The same holds for multifractal detrended fluctuation analysis, which studies the multifractality of the seismic catalogs treated as magnitude time series. I also improve the spatial information of the catalog using the condensation method based on the location error. It produces a new catalog that differs from the original one in the new weights assigned to the events according to their accuracy relative to neighboring events. Using this statistical method will contribute to the discovery of previously unknown active structures and a better understanding of seismic hazards in Alaska.
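
    The visibility graph mapping mentioned above is the standard natural visibility construction: two samples of the magnitude series are linked whenever every intermediate sample lies below the straight line joining them. The sketch below implements that rule on a random stand-in series; the data and parameters are illustrative, not the Alaska catalog.

        import numpy as np

        def visibility_graph(y):
            # Natural visibility graph: connect (a, y[a]) and (b, y[b]) if every
            # intermediate sample lies strictly below the line joining them.
            n = len(y)
            edges = []
            for a in range(n):
                for b in range(a + 1, n):
                    visible = all(
                        y[c] < y[b] + (y[a] - y[b]) * (b - c) / (b - a)
                        for c in range(a + 1, b)
                    )
                    if visible:
                        edges.append((a, b))
            return edges

        rng = np.random.default_rng(3)
        mags = rng.exponential(scale=1.0, size=200) + 3.0   # toy magnitude series
        g = visibility_graph(mags)
        degree = np.bincount(np.array(g).ravel(), minlength=len(mags))
        print("edges:", len(g), "max degree:", degree.max())

    Properties of the resulting network, such as its degree distribution, are what analyses of this kind relate to seismic characteristics like the Gutenberg-Richter b-value.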

    An empirical study on the various stock market prediction methods

    Investment in the stock market is one of the most admired investment activities. However, prediction of the stock market has remained a hard task because of the non-linearity it exhibits. The non-linearity is due to multiple affecting factors such as the global economy, political situations, sector performance, economic indicators, foreign institutional investment, domestic institutional investment, and so on. A proper set of such representative factors must be analyzed to build an efficient prediction model. Even a marginal improvement in prediction accuracy can be gainful for investors. This review provides a detailed analysis of research papers presenting stock market prediction techniques. These techniques are assessed in the time series analysis and sentiment analysis sections. A detailed discussion of research gaps and issues is presented. The reviewed articles are analyzed based on their use of prediction techniques, optimization algorithms, feature selection methods, datasets, toolsets, evaluation metrics, and input parameters. The techniques are further investigated to analyze the relations of the prediction methods with the feature selection algorithms, datasets, and input parameters. In addition, major problems raised by the present techniques are also discussed. This survey will provide researchers with deeper insight into various aspects of current stock market prediction methods.

    Seeking the Truth Beyond the Data. An Unsupervised Machine Learning Approach

    Clustering is an unsupervised machine learning methodology in which unlabeled elements/objects are grouped together with the aim of constructing well-established clusters whose elements are classified according to their similarity. The goal of this process is to provide a useful aid that helps the researcher identify patterns in the data. When dealing with large databases, such patterns may not be easily detectable without the contribution of a clustering algorithm. This article provides an in-depth description of the most widely used clustering methodologies, accompanied by useful guidance on suitable parameter selection and initialization. At the same time, this article is not only a review highlighting the major elements of the examined clustering techniques, but also compares these algorithms' clustering efficiency on 3 datasets, revealing their weaknesses and capabilities through accuracy and complexity when confronting discrete and continuous observations. The produced results help us extract valuable conclusions about the appropriateness of the examined clustering techniques with respect to dataset size. Comment: This paper has been accepted for publication in the proceedings of the 3rd International Scientific Forum on Computer and Energy Sciences (WFCES 2022)
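
    As a minimal, illustrative companion to the comparison described above, the snippet below runs one widely used technique (k-means) on a synthetic dataset and scores candidate cluster counts with the silhouette coefficient; the choice of algorithm, data, and metric is an assumption made for illustration and does not reproduce the article's experiments.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.datasets import make_blobs
        from sklearn.metrics import silhouette_score

        X, _ = make_blobs(n_samples=600, centers=3, cluster_std=1.2, random_state=0)

        scores = {}
        for k in range(2, 7):                         # try several cluster counts
            labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
            scores[k] = silhouette_score(X, labels)   # higher means better-separated clusters

        best_k = max(scores, key=scores.get)
        print("silhouette by k:", {k: round(s, 3) for k, s in scores.items()})
        print("best k:", best_k)

    Running the same kind of loop over several algorithms and real datasets, rather than one algorithm on synthetic blobs, is the comparison the article carries out at larger scale.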