High-dimensional indexing methods utilizing clustering and dimensionality reduction

Author: Zhang Lijuan
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/2005

The emergence of novel database applications has made similarity search a prevalent paradigm. These applications include multimedia databases, medical imaging databases, time series databases, DNA and protein sequence databases, and many others. Features of data objects are extracted and transformed into high-dimensional data points, so searching for objects becomes a search on points in the high-dimensional feature space. The dissimilarity between two objects is determined by the distance between their feature vectors, and similarity search is usually implemented as nearest neighbor search in feature vector spaces. The cost of processing k-nearest neighbor (k-NN) queries via a sequential scan increases as the number of objects and the number of features increase. A variety of multi-dimensional index structures have been proposed to improve the efficiency of k-NN query processing; they work well in low-dimensional space but lose their efficiency in high-dimensional space due to the curse of dimensionality. This study addresses that inefficiency with three methods: Clustering and Singular Value Decomposition (CSVD) with indexing, the Persistent Main Memory (PMM) index, and the Stepwise Dimensionality Increasing (SDI-tree) index. CSVD is an approximate nearest neighbor search method. The performance of CSVD with indexing is studied and its approximation to the distance in the original space is investigated. For a given Normalized Mean Square Error (NMSE), the higher the degree of clustering, the higher the recall. However, more clusters require more disk page accesses. A suitable number of clusters can be chosen to achieve a higher recall while maintaining a relatively low query processing cost.
The Clustering and Indexing using Persistent Main Memory (CIPMM) framework is motivated by the following considerations: (a) a significant fraction of index pages are accessed randomly, incurring a high positioning time for each access; (b) disk transfer rate is improving 40% annually, while positioning time improves by only 8%; (c) query processing incurs less CPU time for main-memory-resident than for disk-resident indices. CIPMM aims at reducing the elapsed time for query processing by utilizing sequential, rather than random, disk accesses. A specific instance of the CIPMM framework, CIPOP (indexing using the Persistent Ordered Partition tree, or OP-tree), is elaborated and compared with clustering and indexing using the SR-tree (CISR). The results show that CIPOP outperforms CISR, and the higher the dimensionality, the higher the performance gains. The SDI-tree index is motivated by two observations: fanout decreases as dimensionality increases, and shorter vectors reduce cache misses. The index is built using feature vectors transformed via principal component analysis, resulting in a structure with fewer dimensions at higher levels and an increasing number of dimensions from one level to the next. Dimensions are retained in nonincreasing order of their variance according to a parameter p, which specifies the incremental fraction of variance at each level of the index. Experiments on three datasets have shown that SDI-trees with carefully tuned parameters incur fewer disk accesses than SR-trees and VAMSR-trees and incur less CPU time than VA-Files in addition
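
The role of the parameter p can be illustrated with a small sketch. It is not taken from the thesis: it assumes that level ℓ of the index (counting from the root) keeps the smallest number of leading principal components whose cumulative variance fraction reaches min(ℓ·p, 1). The function name `dims_per_level` and this exact rule are hypothetical.

```python
def dims_per_level(variances, p):
    """Return how many leading dimensions each index level would keep.

    Assumption (not from the source): level l keeps the fewest dimensions,
    taken in nonincreasing order of variance, whose cumulative share of the
    total variance is at least min(l * p, 1.0).
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    ordered = sorted(variances, reverse=True)  # nonincreasing variance
    total = sum(ordered)
    levels = []
    cumulative = 0.0
    kept = 0
    level = 1
    while kept < len(ordered):
        goal = min(level * p, 1.0)
        # Add dimensions until this level's variance goal is met.
        while kept < len(ordered) and cumulative / total < goal - 1e-12:
            cumulative += ordered[kept]
            kept += 1
        levels.append(kept)
        if goal >= 1.0:
            break
        level += 1
    return levels


if __name__ == "__main__":
    # Four principal components with variances 4, 3, 2, 1 and p = 0.4.
    print(dims_per_level([4, 3, 2, 1], 0.4))
```

Under this assumed rule, a smaller p yields more levels with shorter vectors near the root, which is the property the abstract links to higher fanout and fewer cache misses.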

Digital Commons @ New Jersey Institute of Technology (NJIT)

Perfect Hash Function Generation on the GPU with RecSplit

Author: Bez Dominik
Publication venue: Karlsruher Institut für Technologie
Publication date: 15/11/2022

Minimal perfect hash functions (MPHFs) map a static set S of arbitrary keys bijectively onto the first |S| natural numbers, i.e., every hash value is used exactly once. They are useful in many applications, for example to implement hash tables with guaranteed constant access time. MPHFs can be very compact: fewer than 2 bits per key are possible. On the other hand, MPHFs cannot decide whether a given key belongs to S. RecSplit is currently the most space-efficient MPHF. It offers various trade-offs between space consumption, construction time, and query time. For example, RecSplit can construct an MPHF with 1.56 bits per key in less than 2 ms per key. However, that is too slow for large inputs. This thesis presents new RecSplit implementations that use multithreading, SIMD, and the power of GPUs to improve construction time. Together with our new bijection-rotation method, we achieve speedups of up to 333 for our SIMD implementation on an 8-core CPU and up to 1873 for our GPU implementation, compared with the original sequential RecSplit implementation. This allows us to construct MPHFs with 1.56 bits per key in less than 1.5 μs per key
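
To make the definition of a minimal perfect hash function concrete, the sketch below brute-forces a seed under which a seeded hash maps a small key set bijectively onto 0..|S|-1. It is only an illustration of the MPHF property, not RecSplit or the GPU implementation from the thesis; the names `seeded_hash` and `find_bijection_seed` are hypothetical, and BLAKE2b is used merely as a convenient deterministic hash.

```python
import hashlib


def seeded_hash(key: str, seed: int, m: int) -> int:
    """Hash `key` into range(m) using `seed` as the BLAKE2b salt."""
    digest = hashlib.blake2b(
        key.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little") % m


def find_bijection_seed(keys, max_seed=100_000):
    """Try seeds until every key gets a distinct value in range(len(keys))."""
    n = len(keys)
    for seed in range(max_seed):
        if len({seeded_hash(k, seed, n) for k in keys}) == n:
            return seed
    raise RuntimeError("no bijection found; raise max_seed or split the keys")


if __name__ == "__main__":
    keys = ["apple", "banana", "cherry", "date", "elder"]
    seed = find_bijection_seed(keys)
    for k in keys:
        print(k, seeded_hash(k, seed, len(keys)))
    # A key outside the set still hashes to some slot: the function alone
    # cannot tell members from non-members.
    print("fig", seeded_hash("fig", seed, len(keys)))
```

Brute force is only feasible for very small sets, because the chance that a random function is a bijection on n keys is n!/n^n.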

KITopen

Pervasive Data Access in Wireless and Mobile Computing Environments

Author: Lee Ken C. K.
Lee Wang-Chien
Madria Sanjay Kumar
Publication venue: Scholars' Mine
Publication date: 12/09/2006

The rapid advance of wireless and portable computing technology has brought considerable research interest and momentum to the area of mobile computing. One research focus is pervasive data access. With wireless connections, users can access information at any place and at any time. However, various constraints such as limited client capability, limited bandwidth, weak connectivity, and client mobility pose many challenging technical issues. In the past years, tremendous research efforts have been put forth to address the issues related to pervasive data access, and a number of interesting research results have been reported in the literature. This survey paper reviews important works in two dimensions of pervasive data access: data broadcast and client caching. In addition, data access techniques aimed at various application requirements (such as time, location, semantics, and reliability) are covered

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine

FlexQueue: Simple and Efficient Priority Queue for System Software

Author: Zhang Yifan
Publication venue: University of Waterloo
Publication date: 15/05/2018

Existing studies of priority queue implementations often focus on improving canonical operations such as insert and deleteMin, while sacrificing design simplicity and predictable worst-case latency. Design simplicity is sacrificed as the algorithm becomes more and more optimized, taking into account characteristics of the input workload distribution. Predictable worst-case latency is sacrificed when operations such as memory allocation and structural re-organization are deferred until absolutely necessary. While these techniques often yield some performance improvement, it is possible to take a step back and ask a more basic question: is it possible to achieve similar performance while retaining a simple design? By combining techniques such as hierarchical bit-vectors and dynamic horizon resizing, all of which are straightforward in principle, this thesis presents a new priority queue design called FlexQueue that answers this question with a definitive “yes”
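
The hierarchical bit-vector idea mentioned in the abstract can be sketched as follows. This is a generic two-level bitmap queue over a fixed priority range, not FlexQueue itself (dynamic horizon resizing is not modeled); the class name `BitmapPriorityQueue` and the 64 x 64 layout are assumptions made for illustration.

```python
from collections import deque

WORD_BITS = 64  # assumed word width; 64 words give priorities 0..4095


def lowest_set_bit(x: int) -> int:
    """Index of the least significant 1 bit of a positive integer."""
    return (x & -x).bit_length() - 1


class BitmapPriorityQueue:
    """Two-level bit-vector queue: a summary word marks non-empty words,
    and each word marks non-empty priority buckets."""

    def __init__(self):
        self.summary = 0
        self.words = [0] * WORD_BITS
        self.buckets = [deque() for _ in range(WORD_BITS * WORD_BITS)]

    def insert(self, priority: int, item) -> None:
        if not 0 <= priority < WORD_BITS * WORD_BITS:
            raise ValueError("priority out of range")
        word, bit = divmod(priority, WORD_BITS)
        self.buckets[priority].append(item)
        self.words[word] |= 1 << bit
        self.summary |= 1 << word

    def delete_min(self):
        if self.summary == 0:
            raise IndexError("delete_min from empty queue")
        word = lowest_set_bit(self.summary)    # first non-empty word
        bit = lowest_set_bit(self.words[word])  # first non-empty bucket in it
        priority = word * WORD_BITS + bit
        bucket = self.buckets[priority]
        item = bucket.popleft()
        if not bucket:  # clear the bits once the bucket drains
            self.words[word] &= ~(1 << bit)
            if self.words[word] == 0:
                self.summary &= ~(1 << word)
        return priority, item


if __name__ == "__main__":
    q = BitmapPriorityQueue()
    for prio, name in [(300, "c"), (5, "a"), (5, "b"), (4095, "d")]:
        q.insert(prio, name)
    while q.summary:
        print(q.delete_min())
```

Both operations touch a fixed number of words and allocate nothing after construction, which is the kind of predictable worst-case behavior the abstract argues for.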

University of Waterloo's Institutional Repository

Enabling near-term prediction of status for intelligent transportation systems: Management techniques for data on mobile objects

Author: Heendaliya Lasanthi Nilmini
Publication venue: Scholars' Mine
Publication date: 01/01/2015

Location Dependent Queries (LDQs) benefit from the rapid advances in communication and Global Positioning System (GPS) technologies to track moving objects' locations, and improve quality of life by providing location-relevant services and information to end users. The enormity of the underlying data maintained by LDQ applications (a large quantity of mobile objects and their frequent mobility) is, however, a major obstacle to providing effective and efficient services. Motivated by this obstacle, this thesis sets out to find improved methods to efficiently index, access, retrieve, and update volatile LDQ-related mobile object data and information. Challenges and research issues are discussed in detail, and solutions are presented and examined. --Abstract, page iii

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine

High Performance Spatial Indexing for Parallel I/O and Centralized Architectures

Author: Kamel Ibrahim
Publication date: 15/10/1998

Recently, spatial databases have attracted increasing interest in the database field. Because of the volume of data they deal with, the performance of spatial database systems is important. The R-tree is an efficient spatial access method. It is a generalization of the B-tree to multidimensional space. This thesis investigates how to improve the performance of R-trees. We consider both parallel I/O and centralized architectures. For a parallel I/O environment we propose an R-tree design for a server with one CPU and multiple disks. On this architecture, the nodes of the R-tree are distributed between the different disks with cross-disk pointers (the 'Multiplexed R-tree'). When a new node is created, we have to decide on which disk it will be stored. We propose and examine several criteria for choosing a disk for a new node. The most successful one, termed 'Proximity Index' or PI, estimates the similarity of the new node to other R-tree nodes already on a disk and chooses the disk with the least degree of similarity. For a centralized environment, we propose a new packing technique for R-trees for static databases. We use space-filling curves, and specifically the Hilbert curve, to achieve better ordering of rectangles and eventually to achieve better packing. For dynamic databases we introduce the Hilbert R-tree, in which every node has a well-defined set of sibling nodes; we can thus use the concept of local rotation [47]. By adjusting the split policy, the Hilbert R-tree can achieve a degree of space utilization as high as is desired. (Also cross-referenced as UMIACS-TR-94-131)
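
The packing idea for static databases (order rectangles along a Hilbert curve, then fill leaf nodes in that order) can be sketched as below. This is a simplified illustration, not the thesis's algorithm: it assumes integer coordinates on a 2^order by 2^order grid, orders rectangles by the Hilbert index of their centers, and uses the widely known iterative coordinate-to-index conversion. The names `hilbert_index` and `hilbert_pack` are hypothetical.

```python
def hilbert_index(order: int, x: int, y: int) -> int:
    """Position of grid cell (x, y) along a Hilbert curve covering a
    2**order by 2**order grid (standard iterative conversion)."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_pack(rectangles, order: int, capacity: int):
    """Group rectangles (xmin, ymin, xmax, ymax) into leaf nodes of at most
    `capacity` entries, in Hilbert order of their center points."""
    def center_key(rect):
        xmin, ymin, xmax, ymax = rect
        return hilbert_index(order, (xmin + xmax) // 2, (ymin + ymax) // 2)

    ordered = sorted(rectangles, key=center_key)
    return [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]


if __name__ == "__main__":
    rects = [(3, 0, 3, 0), (0, 0, 0, 0), (0, 3, 0, 3), (3, 3, 3, 3)]
    for leaf in hilbert_pack(rects, order=2, capacity=2):
        print(leaf)
```

Because consecutive Hilbert indices always belong to neighboring cells, rectangles that end up in the same leaf tend to be spatially close, which keeps leaf bounding boxes small.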

Digital Repository at the University of Maryland

Computational Methods in Multi-Messenger Astrophysics using Gravitational Waves and High Energy Neutrinos

Author: Countryman Stefan Trklja
Publication date: 01/01/2023

This dissertation describes advancements made in computational methods for multi-messenger astrophysics (MMA) using gravitational waves (GW) and neutrinos during Advanced LIGO (aLIGO)’s first through third observing runs (O1-O3) and, looking forward, describes novel computational techniques suited to the challenges of both the burgeoning MMA field and high-performance computing as a whole. The first two chapters provide an overview of MMA as it pertains to gravitational wave/high energy neutrino (GWHEN) searches, including a summary of expected astrophysical sources as well as the GW, neutrino, and gamma-ray detectors used in their detection. These are followed in the third chapter by an in-depth discussion of LIGO’s timing system, particularly the diagnostic subsystem, describing both its role in MMA searches and the author’s contributions to the system itself. The fourth chapter provides a detailed description of the Low-Latency Algorithm for Multi-messenger Astrophysics (LLAMA), the GWHEN pipeline developed by the author and used in O2 and O3. Relevant past multi-messenger searches are described first, followed by the O2 and O3 analysis methods, the pipeline’s performance, scientific results, and finally, an in-depth account of the library’s structure and functionality. In particular, the author’s high-performance multi-order coordinates (MOC) HEALPix image analysis library, HPMOC, is described. HPMOC increases the performance of HEALPix image manipulations by several orders of magnitude vs. naive single-resolution approaches while presenting a simple high-level interface, and should prove useful for diverse future MMA searches. The performance improvements it provides for LLAMA are also covered. The final chapter of this dissertation builds on the approaches taken in developing HPMOC, presenting several novel methods for efficiently storing and analyzing large data sets, with applications to MMA and other data-intensive fields.
A family of depth-first multi-resolution orderings of HEALPix images (DEPTH9, DEPTH19, and DEPTH40) is defined, along with algorithms and use cases where it can improve on current approaches, including high-speed streaming calculations suitable for serverless compute or FPGAs. For performance-constrained analyses on HEALPix data (e.g. image analysis in multi-messenger search pipelines) using SIMD processors, breadth-first data structures can provide short-circuiting calculations in a data-parallel way on compressed data; a simple compression method is described with application to further improving LLAMA performance. A new storage scheme and associated algorithms for efficiently compressing and contracting tensors of varying sparsity are presented; these demuxed tensors (D-Tensors) have equivalent asymptotic time and space complexity to optimal representations of both dense and sparse matrices, and could be used as a universal drop-in replacement to reduce code complexity and developer effort while improving performance of existing non-optimized numerical code. Finally, the big bucket hash table (B-Table), a novel type of hash table making guarantees on data layout (vs. load factor), is described, along with optimizations it allows for (like hardware acceleration, online rebuilds, and hard realtime applications) that are not possible with existing hash table approaches. These innovations are presented in the hope that some will prove useful for improving future MMA searches and other data-intensive applications

Columbia University Academic Commons

Multidimensional access methods

Author: ABEL D. J.
ANG C.
AREF W. G.
BAYER R.
BECKER B.
BECKMANN N.
BELUSSI A.
BENTLEY J. L.
BERCHTOLD S.
BLANKEN H.
BRINKHOFF T.
BRODSKY A.
BURKHARD W.
BURKHARD W. A.
EVANGELIDIS G.
FALOUTSOS C.
FINKEL R.
FLAJOLET P.
FRANK A.
FREESTON M.
GAEDE V.
GREENE D.
GUNTHER O.
GUTING R. H.
GUTTMAN A.
HELLERSTEIN J. M.
HENRICH A.
HOEL E. G.
HUTFLESZ A.
JAGADISH H. V.
KAMEL I.
KANELLAKIS P. C.
KEDEM G.
KLINGER A.
KNOTT G.
KOLOVSON C.
KORNACKER M.
KRIEGEL H.-P.
KUMAR A.
LARSON P. A.
LIN K.-I.
LITWIN W.
LOMET D. B.
MATSUYAMA T.
MCDONELL K. J.
NELSON R.
NG R. T.
NG V.
NIEVERGELT
NIEVERGELT ICHS
OHSAWA Y.
Oliver Günther
OOI
ORENSTEIN J.
ORENSTEIN J. A.
OTOO E. J.
OUKSEL M.
PAGEL B. U.
PAPADIAS D.
PAPADOPOULOS A.
RAVISHANKAR C.
ROBINSON J. T.
ROTEM D.
ROUSSOPOULOS N.
SCHNEIDER R.
SCHOLL M.
SEEGER B.
SELLIS T.
SEVCIK K.
SEXTON P.
SHEKHAR S.
SIEMENS
SIX H.
SMITH T. R.
STONEBRAKER M.
STUCKEY P.
SUBRAMANIAN S.
TAMMINEN M.
THEODORIDIS Y.
TROPF H.
Volker Gaede
WHITE M.
WIDMAYER P.
Publication venue: Association for Computing Machinery (ACM)

Crossref

New Approaches to Similarity Searching in Metric Spaces

Author: Celik Cengiz
Publication date: 24/04/2006

The complex and unstructured nature of many types of data, such as multimedia objects, text documents, and protein sequences, requires the use of similarity search techniques for retrieval of information from databases. One popular approach to similarity searching is mapping database objects into feature vectors, which introduces an undesirable element of indirection into the process. A more direct approach is to define a distance function directly between objects. Typically such a function defines a metric space, which satisfies a number of properties, such as the triangle inequality. Index structures that work for metric spaces have been shown to provide satisfactory performance, and were reported to outperform vector-based counterparts in many applications. Metric spaces also provide a more general framework, and for some domains defining a distance between objects can be accomplished more intuitively than mapping objects to feature vectors. In this thesis we investigate new efficient methods for similarity searching in metric spaces. We first show that current solutions to indexing in metric spaces have several drawbacks. Tree-based solutions do not provide the best tradeoffs between construction time and query performance. Tree structures are also difficult to make dynamic without further degrading their performance. There is also a family of flat structures that address some of the deficiencies of tree-based indices, but they introduce their own problems in terms of higher construction cost, higher space usage, and extra CPU overhead. In this thesis a new family of flat structures is introduced, which are very flexible and simple. We show that dynamic operations can easily be performed, and that the structures can be customized to work under different performance requirements. They also address many of the general drawbacks of flat structures outlined above.
A new framework, composite metrics, is also introduced, which provides a more flexible similarity searching process by allowing several metrics to be combined in one search structure. Two indexing structures are introduced that can handle similarity queries in this setting, and it is shown that they provide competitive query performance with respect to data structures for standard metrics
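
The reason the triangle inequality matters for metric-space indexing can be shown with a minimal single-pivot filter. This is a generic textbook technique, not one of the structures proposed in the thesis; the function names and the choice of Euclidean distance are assumptions for illustration. For a query q, pivot p, and object o, the triangle inequality gives |d(q,p) - d(o,p)| <= d(q,o), so any object with |d(q,p) - d(o,p)| > r cannot lie within radius r of q and its distance need not be computed.

```python
import math


def build_pivot_table(objects, pivot, dist):
    """Precompute each object's distance to the pivot (done once, offline)."""
    return [(obj, dist(obj, pivot)) for obj in objects]


def range_query(table, pivot, query, radius, dist):
    """Return (matches, number of query-to-object distances computed)."""
    d_qp = dist(query, pivot)
    matches = []
    computed = 0
    for obj, d_op in table:
        # Lower bound on d(query, obj) from the triangle inequality.
        if abs(d_qp - d_op) > radius:
            continue  # pruned without computing the real distance
        computed += 1
        if dist(query, obj) <= radius:
            matches.append(obj)
    return matches, computed


if __name__ == "__main__":
    points = [(0, 0), (1, 0), (5, 5), (10, 10), (0, 9)]
    pivot = (0, 0)
    table = build_pivot_table(points, pivot, math.dist)
    print(range_query(table, pivot, (1, 1), 1.5, math.dist))
```

Only the distance function is used, never coordinates, so the same filter works for any metric, such as edit distance on strings.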

Digital Repository at the University of Maryland

On packet switch design

Author: Minkenberg C.J.A.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2001

Repository TU/e

Pure OAI Repository