    Classifiers Based on Inverted Distances
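
    Only the title is indexed for this entry. Read literally, "inverted distances" suggests the family of nearest-neighbor classifiers that weight each neighbor's vote by the reciprocal of its distance to the query; whether that matches this paper's construction is an assumption here. A minimal sketch of that reading (all names and parameters are illustrative):

```python
import math
from collections import defaultdict

def idw_classify(train, labels, query, k=5, eps=1e-12):
    """Vote for a class with weights proportional to 1/distance.

    train  : list of feature tuples
    labels : class label for each training point
    query  : feature tuple to classify
    """
    # Sort training points by Euclidean distance to the query.
    dists = sorted((math.dist(x, query), y) for x, y in zip(train, labels))
    votes = defaultdict(float)
    for d, y in dists[:k]:
        votes[y] += 1.0 / (d + eps)  # closer neighbors count more
    return max(votes, key=votes.get)

# Tiny usage example with two 2-D classes.
train = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
labels = ["a", "a", "b", "b"]
print(idw_classify(train, labels, (0.2, 0.1), k=3))  # -> "a"
```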

    High dimensional kNN-graph construction using space filling curves
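
    No abstract is indexed for this entry, so the following is a sketch of the general technique the title names rather than of this paper's algorithm. A common space-filling-curve approach maps each point to a one-dimensional key (here a Z-order/Morton code, an assumption), sorts the points by that key, and links each point to its best candidates within a small window of the sorted order:

```python
import math

def morton_key(point, bits=10):
    """Interleave the bits of quantized coordinates (Z-order curve)."""
    coords = [min(int(c * (1 << bits)), (1 << bits) - 1) for c in point]
    key = 0
    for b in range(bits):                # bit position
        for j, c in enumerate(coords):   # dimension
            key |= ((c >> b) & 1) << (b * len(coords) + j)
    return key

def knn_graph(points, k=2, window=4):
    """Approximate kNN graph: sort by Morton key, scan a local window."""
    order = sorted(range(len(points)), key=lambda i: morton_key(points[i]))
    graph = {}
    for pos, i in enumerate(order):
        lo, hi = max(0, pos - window), min(len(order), pos + window + 1)
        cands = [j for j in order[lo:hi] if j != i]
        cands.sort(key=lambda j: math.dist(points[i], points[j]))
        graph[i] = cands[:k]
    return graph

pts = [(0.1, 0.1), (0.12, 0.09), (0.8, 0.8), (0.82, 0.79), (0.5, 0.5)]
print(knn_graph(pts, k=1))
```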

    Data Management Techniques For Fast Function Approximation

    Computing and information technology is fundamentally changing the face of modern science. Traditional methods of performing scientific studies are now making way for the next generation of methods that use computing technology. However, the kinds of calculations that scientists wish to perform, and the ways in which they want to collect, archive and analyze information, have posed several new challenges in data management and algorithm design. Achieving the goal of using computing technology effectively in scientific applications has become an important area of research in computer science, called eScience or data-driven science. Most data-driven scientific applications are aimed at studying and understanding some real-world physical phenomenon. The general methodology followed by a scientist is to first model the physical phenomenon, either directly from the mathematical equations governing the phenomenon or from a large dataset of observations about it. Recent advances in data management, data mining and machine learning have addressed numerous challenges that arise in this first stage of model building. However, the state-of-the-art methods are inadequate in addressing challenges that arise in the second stage of a data-driven scientific study, where the scientist uses the model she has built to help her understand the physical phenomenon, using tools such as computer simulation and visualization. This thesis identifies and addresses data management challenges that arise when a complex model built for a real-world phenomenon is analyzed by a scientist to gain insights about the phenomenon. The first part of the thesis concentrates on high-dimensional function approximation (HFA), a problem relevant to virtually all applications that use computer simulation as the methodology for understanding complex models. We explore various aspects of HFA in depth, identify key data management problems, and propose solutions that significantly speed up long-running scientific simulations. Besides computer simulation, visualizing low-dimensional summaries of a complex model is another method commonly used by scientists to understand models. Most real-world models are complex and involve thousands of attributes. In order to gain a thorough understanding of a model, a scientist generates a very large number of low-dimensional summaries for it. Generating large sets of summaries for a complex model presents a challenging data management task, and the second part of the thesis develops scalable algorithms for solving this problem.
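
    The abstract does not spell out the HFA techniques themselves. Purely as an illustration of the problem setting, and not as the thesis's actual method, one simple way to speed up a long-running simulation is to cache expensive function evaluations and answer later queries from the nearest cached sample whenever it lies within a tolerance:

```python
import math

class ApproxCache:
    """Approximate an expensive function by reusing nearby cached samples.

    Brute-force nearest-neighbor lookup keeps the sketch short; a real
    system would use a spatial index (k-d tree, grid, etc.).
    """

    def __init__(self, fn, tol):
        self.fn = fn          # the expensive function to approximate
        self.tol = tol        # reuse a sample within this distance
        self.samples = []     # list of (point, value) pairs

    def __call__(self, x):
        if self.samples:
            pt, val = min(self.samples, key=lambda s: math.dist(s[0], x))
            if math.dist(pt, x) <= self.tol:
                return val            # cheap approximate answer
        val = self.fn(x)              # expensive exact evaluation
        self.samples.append((x, val))
        return val

# Usage: pretend `simulate` is a long-running simulation step.
simulate = lambda x: math.sin(x[0]) * math.cos(x[1])
f = ApproxCache(simulate, tol=0.05)
print(f((0.30, 0.70)), f((0.31, 0.69)))  # second call reuses the first
```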

    Accounting for Boundary Effects in Nearest Neighbor Searching

    Given n data points in d-dimensional space, nearest neighbor searching involves determining the nearest of these data points to a given query point. Most average-case analyses of nearest neighbor searching algorithms are made under the simplifying assumption that d is fixed and that n is so large relative to d that boundary effects can be ignored. This means that for any query point the statistical distribution of the data points surrounding it is independent of the location of the query point. However, in many applications of nearest neighbor searching (such as data compression by vector quantization) this assumption is not met, since the number of data points n grows roughly as 2^d. Largely for this reason, the actual performances of many nearest neighbor algorithms tend to be much better than their theoretical analyses would suggest. We present evidence of why this is the case. We provide an accurate analysis of the number of cells visited in nearest neighbor searching by the bucketing and k-d tree algorithms. We assume..
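
    To make the measured quantity concrete, here is a minimal sketch of the bucketing algorithm the abstract refers to (an illustration, not the authors' code): hash the points into a uniform grid, examine cells in expanding rings around the query's cell, stop once the best point found so far is provably closer than any unvisited cell, and count the cells along the way.

```python
import math, random
from collections import defaultdict

def nn_bucketing(points, query, m=4):
    """Nearest neighbor via an m x m grid over [0,1]^2; returns the
    neighbor and the number of grid cells visited."""
    grid = defaultdict(list)
    for p in points:
        grid[(min(int(p[0] * m), m - 1), min(int(p[1] * m), m - 1))].append(p)
    qc = (min(int(query[0] * m), m - 1), min(int(query[1] * m), m - 1))
    best, best_d, visited = None, float("inf"), 0
    for r in range(m):  # expanding square rings of cells around qc
        for i in range(qc[0] - r, qc[0] + r + 1):
            for j in range(qc[1] - r, qc[1] + r + 1):
                if max(abs(i - qc[0]), abs(j - qc[1])) != r:
                    continue  # only the ring's boundary cells
                if 0 <= i < m and 0 <= j < m:
                    visited += 1
                    for p in grid[(i, j)]:
                        d = math.dist(p, query)
                        if d < best_d:
                            best, best_d = p, d
        # every unvisited cell lies at least r cell-widths away
        if best is not None and best_d < r / m:
            break
    return best, visited

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(200)]
print(nn_bucketing(pts, (0.5, 0.5)))
```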

    New Fundamental Technologies in Data Mining

    The progress of data mining technology and its broad public popularity establish a need for a comprehensive text on the subject. The series of books entitled "Data Mining" addresses this need by presenting in-depth descriptions of novel mining algorithms and many useful applications. Beyond explaining each topic in depth, the two books offer useful hints and strategies for solving the problems posed in the following chapters. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence lead to significant development in the field of data mining.

    Accounting for Boundary Effects in Nearest-Neighbor Searching (Discrete & Computational Geometry, © 1996 Springer-Verlag New York Inc.)

    Given n data points in d-dimensional space, nearest-neighbor searching involves determining the nearest of these data points to a given query point. Most average-case analyses of nearest-neighbor searching algorithms are made under the simplifying assumption that d is fixed and that n is so large relative to d that boundary effects can be ignored. This means that for any query point the statistical distribution of the data points surrounding it is independent of the location of the query point. However, in many applications of nearest-neighbor searching (such as data compression by vector quantization) this assumption is not met, since the number of data points n grows roughly as 2^d. Largely for this reason, the actual performances of many nearest-neighbor algorithms tend to be much better than their theoretical analyses would suggest. We present evidence of why this is the case. We provide an accurate analysis of the number of cells visited in nearest-neighbor searching by the bucketing and k-d tree algorithms. We assume m^d points uniformly distributed in dimension d, where m is a fixed integer ≥ 2. Further, we assume that distances are measured in the L∞ metric. Our analysis is tight in the limit as d approaches infinity.
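
    The m^d setup invites a quick empirical check of the boundary effect: under the L∞ metric, a query near a corner of the unit cube sees a more distant nearest neighbor on average than a query at the center, because much of its metric ball falls outside the data region. A small simulation of that setup (parameter values chosen arbitrarily for illustration):

```python
import random

def linf(a, b):
    """L-infinity (Chebyshev) distance between two points."""
    return max(abs(x - y) for x, y in zip(a, b))

def avg_nn_dist(query, m=2, d=10, trials=30):
    """Average L-infinity NN distance from `query` to m**d uniform points."""
    random.seed(0)  # same point sets for both queries, for a fair comparison
    total = 0.0
    for _ in range(trials):
        pts = [tuple(random.random() for _ in range(d)) for _ in range(m ** d)]
        total += min(linf(p, query) for p in pts)
    return total / trials

d = 10
center = tuple(0.5 for _ in range(d))
corner = tuple(0.0 for _ in range(d))
print("center:", avg_nn_dist(center, d=d))  # smaller on average
print("corner:", avg_nn_dist(corner, d=d))  # boundary inflates NN distance
```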