189 research outputs found

    HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces

    Full text link
    Nearest neighbor searching of large databases in high-dimensional spaces is inherently difficult due to the curse of dimensionality. A flavor of approximation is, therefore, necessary to practically solve the problem of nearest neighbor search. In this paper, we propose a novel yet simple indexing scheme, HD-Index, to solve the problem of approximate k-nearest neighbor queries in massive high-dimensional databases. HD-Index consists of a set of novel hierarchical structures called RDB-trees built on Hilbert keys of database objects. The leaves of the RDB-trees store distances of database objects to reference objects, thereby allowing efficient pruning using distance filters. In addition to triangular inequality, we also use Ptolemaic inequality to produce better lower bounds. Experiments on massive (up to billion scale) high-dimensional (up to 1000+) datasets show that HD-Index is effective, efficient, and scalable.Comment: PVLDB 11(8):906-919, 201

    Simultaneous Graph Representation Problems

    Get PDF
    Many graphs arising in practice can be represented in a concise and intuitive way that conveys their structure. For example: A planar graph can be represented in the plane with points for vertices and non-crossing curves for edges. An interval graph can be represented on the real line with intervals for vertices and intersection of intervals representing edges. The concept of ``simultaneity'' applies for several types of graphs: the idea is to find representations for two graphs that share some common vertices and edges, and ensure that the common vertices and edges are represented the same way. Simultaneous representation problems arise in any situation where two related graphs should be represented consistently. A main instance is for temporal relationships, where an old graph and a new graph share some common parts. Pairs of related graphs arise in many other situations. For example, two social networks that share some members; two schedules that share some events, overlap graphs of DNA fragments of two similar organisms, circuit graphs of two adjacent layers on a computer chip etc. In this thesis, we study the simultaneous representation problem for several graph classes. For planar graphs the problem is defined as follows. Let G1 and G2 be two graphs sharing some vertices and edges. The simultaneous planar embedding problem asks whether there exist planar embeddings (or drawings) for G1 and G2 such that every vertex shared by the two graphs is mapped to the same point and every shared edge is mapped to the same curve in both embeddings. Over the last few years there has been a lot of work on simultaneous planar embeddings, which have been called `simultaneous embeddings with fixed edges'. A major open question is whether simultaneous planarity for two graphs can be tested in polynomial time. We give a linear-time algorithm for testing the simultaneous planarity of any two graphs that share a 2-connected subgraph. Our algorithm also extends to the case of k planar graphs, where each vertex [edge] is either common to all graphs or belongs to exactly one of them. Next we introduce a new notion of simultaneity for intersection graph classes (interval graphs, chordal graphs etc.) and for comparability graphs. For interval graphs, the problem is defined as follows. Let G1 and G2 be two interval graphs sharing some vertices I and the edges induced by I. G1 and G2 are said to be `simultaneous interval graphs' if there exist interval representations of G1 and G2 such that any vertex of I is assigned to the same interval in both the representations. The `simultaneous representation problem' for interval graphs asks whether G1 and G2 are simultaneous interval graphs. The problem is defined in a similar way for other intersection graph classes. For comparability graphs and any intersection graph class, we show that the simultaneous representation problem for the graph class is equivalent to a graph augmentation problem: given graphs G1 and G2, sharing vertices I and the corresponding induced edges, do there exist edges E' between G1-I and G2-I such that the graph G1 U G_2 U E' belongs to the graph class. This equivalence implies that the simultaneous representation problem is closely related to other well-studied classes in the literature, namely, sandwich graphs and probe graphs. We give efficient algorithms for solving the simultaneous representation problem for interval graphs, chordal graphs, comparability graphs and permutation graphs. Further, our algorithms for comparability and permutation graphs solve a more general version of the problem when there are multiple graphs, any two of which share the same common graph. This version of the problem also generalizes probe graphs

    Gene Expression and its Discontents: Developmental disorders as dysfunctions of epigenetic cognition

    Get PDF
    Systems biology presently suffers the same mereological and sufficiency fallacies that haunt neural network models of high order cognition. Shifting perspective from the massively parallel space of gene matrix interactions to the grammar/syntax of the time series of expressed phenotypes using a cognitive paradigm permits import of techniques from statistical physics via the homology between information source uncertainty and free energy density. This produces a broad spectrum of possible statistical models of development and its pathologies in which epigenetic regulation and the effects of embedding environment are analogous to a tunable enzyme catalyst. A cognitive paradigm naturally incorporates memory, leading directly to models of epigenetic inheritance, as affected by environmental exposures, in the largest sense. Understanding gene expression, development, and their dysfunctions will require data analysis tools considerably more sophisticated than the present crop of simplistic models abducted from neural network studies or stochastic chemical reaction theory

    Numerical Linear Algebra applications in Archaeology: the seriation and the photometric stereo problems

    Get PDF
    The aim of this thesis is to explore the application of Numerical Linear Algebra to Archaeology. An ordering problem called the seriation problem, used for dating findings and/or artifacts deposits, is analysed in terms of graph theory. In particular, a Matlab implementation of an algorithm for spectral seriation, based on the use of the Fiedler vector of the Laplacian matrix associated with the problem, is presented. We consider bipartite graphs for describing the seriation problem, since the interrelationship between the units (i.e. archaeological sites) to be reordered, can be described in terms of these graphs. In our archaeological metaphor of seriation, the two disjoint nodes sets into which the vertices of a bipartite graph can be divided, represent the excavation sites and the artifacts found inside them. Since it is a difficult task to determine the closest bipartite network to a given one, we describe how a starting network can be approximated by a bipartite one by solving a sequence of fairly simple optimization problems. Another numerical problem related to Archaeology is the 3D reconstruction of the shape of an object from a set of digital pictures. In particular, the Photometric Stereo (PS) photographic technique is considered

    Complex Realities, Simple Beauties: Interactions between the Development of Physics Ideas and Western Civilization, from Ancient Times to the Late Nineteenth Century

    Get PDF
    An instructive text covering the history of Physics concepts within the western tradition. It begins with a brief history of the human species, including discussions of food-gathering technology, early settlements, and the development of culture. It continues on to trace the development of human intellectual culture through ancient history and European history, charting a course through Mesopotamian, Egyptian, Greek, and Arabic mathematical and scientific contributions. Much of the book examines the interaction of science with historical factors such as war and rule changes. It challenges readers to think about ways of knowing and the process of developing systematic knowledge

    Spatial thinking and external representation : towards a historical epistemology of space

    Get PDF
    • …
    corecore