HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces
Nearest neighbor searching of large databases in high-dimensional spaces is
inherently difficult due to the curse of dimensionality. A flavor of
approximation is, therefore, necessary to practically solve the problem of
nearest neighbor search. In this paper, we propose a novel yet simple indexing
scheme, HD-Index, to solve the problem of approximate k-nearest neighbor
queries in massive high-dimensional databases. HD-Index consists of a set of
novel hierarchical structures called RDB-trees built on Hilbert keys of
database objects. The leaves of the RDB-trees store distances of database
objects to reference objects, thereby allowing efficient pruning using distance
filters. In addition to triangular inequality, we also use Ptolemaic inequality
to produce better lower bounds. Experiments on massive (up to billion scale)
high-dimensional (up to 1000+) datasets show that HD-Index is effective,
efficient, and scalable. Comment: PVLDB 11(8):906-919, 201
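The distance filters mentioned in the abstract above can be illustrated with a small sketch. It is not the HD-Index implementation: it only shows the standard triangle-inequality and Ptolemaic lower bounds on a query-to-object distance computed from precomputed distances to reference objects, assuming Euclidean distance. The function names, the reference points, and the sample query and object are all hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triangle_lower_bound(q_to_refs, o_to_refs):
    """Best lower bound on d(q, o) from |d(q, r) - d(o, r)| over all references r."""
    return max(abs(dq - do) for dq, do in zip(q_to_refs, o_to_refs))


def ptolemaic_lower_bound(q_to_refs, o_to_refs, ref_to_ref):
    """Best lower bound on d(q, o) from Ptolemy's inequality over reference pairs:
    d(q, o) >= |d(q, ri) * d(o, rj) - d(q, rj) * d(o, ri)| / d(ri, rj).
    Valid for Euclidean distance (an assumption of this sketch)."""
    best = 0.0
    for i, j in combinations(range(len(q_to_refs)), 2):
        denom = ref_to_ref[i][j]
        if denom > 0:
            num = abs(q_to_refs[i] * o_to_refs[j] - q_to_refs[j] * o_to_refs[i])
            best = max(best, num / denom)
    return best


# Hypothetical data: three reference objects, one query, one database object.
refs = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
query = (1.0, 1.0)
obj = (3.0, 2.0)

q_to_refs = [euclidean(query, r) for r in refs]
o_to_refs = [euclidean(obj, r) for r in refs]  # would be stored in the index
ref_to_ref = [[euclidean(a, b) for b in refs] for a in refs]

true_dist = euclidean(query, obj)
tri = triangle_lower_bound(q_to_refs, o_to_refs)
pto = ptolemaic_lower_bound(q_to_refs, o_to_refs, ref_to_ref)
```

An object can be pruned without computing its true distance whenever either bound already exceeds the distance to the current k-th nearest candidate; both bounds never exceed `true_dist`.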
Simultaneous Graph Representation Problems
Many graphs arising in practice can be represented in a concise and intuitive way that conveys their structure. For example, a planar graph can be represented in the plane with points for vertices and non-crossing curves for edges. An interval graph can be represented on the real line with intervals for vertices and intersection of intervals representing edges. The concept of "simultaneity" applies to several types of graphs: the idea is to find representations for two graphs that share some common vertices and edges, and ensure that the common vertices and edges are represented the same way. Simultaneous representation problems arise in any situation where two related graphs should be represented consistently. A main instance is for temporal relationships, where an old graph and a new graph share some common parts. Pairs of related graphs arise in many other situations, for example, two social networks that share some members; two schedules that share some events; overlap graphs of DNA fragments of two similar organisms; circuit graphs of two adjacent layers on a computer chip, etc. In this thesis, we study the simultaneous representation problem for several graph classes.
For planar graphs the problem is defined as follows. Let G1 and G2 be two graphs sharing some vertices and edges. The simultaneous planar embedding problem asks whether there exist planar embeddings (or drawings) for G1 and G2 such that every vertex shared by the two graphs is mapped to the same point and every shared edge is mapped to the same curve in both embeddings. Over the last few years there has been a lot of work on simultaneous planar embeddings, which have been called "simultaneous embeddings with fixed edges". A major open question is whether simultaneous planarity for two graphs can be tested in polynomial time. We give a linear-time algorithm for testing the simultaneous planarity of any two graphs that share a 2-connected subgraph. Our algorithm also extends to the case of k planar graphs, where each vertex [edge] is either common to all graphs or belongs to exactly one of them.
Next we introduce a new notion of simultaneity for intersection graph classes (interval graphs, chordal graphs etc.) and for comparability graphs. For interval graphs, the problem is defined as follows. Let G1 and G2 be two interval graphs sharing some vertices I and the edges induced by I. G1 and G2 are said to be "simultaneous interval graphs" if there exist interval representations of G1 and G2 such that any vertex of I is assigned to the same interval in both representations. The "simultaneous representation problem" for interval graphs asks whether G1 and G2 are simultaneous interval graphs. The problem is defined in a similar way for other intersection graph classes.
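The definition of simultaneous interval graphs can be made concrete with a small checker. This is only a sketch of the definition, not one of the thesis's algorithms: it verifies a given candidate representation rather than finding one. Intervals are assumed closed, each graph is given as a (vertex set, edge list) pair, and all names and sample data are hypothetical. Because a single dictionary assigns one interval per vertex, shared vertices automatically receive the same interval in both graphs.

```python
from itertools import combinations


def intersects(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def represents(intervals, vertices, edges):
    """True if the intervals restricted to `vertices` realize exactly `edges`:
    two vertices are adjacent if and only if their intervals intersect."""
    edge_set = {frozenset(e) for e in edges}
    for u, v in combinations(sorted(vertices), 2):
        if intersects(intervals[u], intervals[v]) != (frozenset((u, v)) in edge_set):
            return False
    return True


def is_simultaneous_representation(intervals, g1, g2):
    """Check one shared interval assignment against both graphs."""
    return represents(intervals, *g1) and represents(intervals, *g2)


# Hypothetical example: G1 and G2 share the vertices I = {b, c} and the edge bc.
g1 = ({"a", "b", "c"}, [("a", "b"), ("b", "c")])
g2 = ({"b", "c", "d"}, [("b", "c"), ("c", "d")])

intervals = {"a": (0, 1), "b": (1, 3), "c": (2, 5), "d": (4, 6)}
ok = is_simultaneous_representation(intervals, g1, g2)

# Stretching d to overlap b creates an intersection that is not an edge of G2.
bad_intervals = dict(intervals, d=(2.5, 6))
not_ok = is_simultaneous_representation(bad_intervals, g1, g2)
```

Note that the checker places no constraint on pairs such as a and d, which belong to different graphs; only pairs within the same graph must match its edge set.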
For comparability graphs and any intersection graph class, we show that the simultaneous representation problem for the graph class is equivalent to a graph augmentation problem: given graphs G1 and G2, sharing vertices I and the corresponding induced edges, do there exist edges E' between G1-I and G2-I such that the graph G1 ∪ G2 ∪ E' belongs to the graph class? This equivalence implies that the simultaneous representation problem is closely related to other well-studied classes in the literature, namely, sandwich graphs and probe graphs.
We give efficient algorithms for solving the simultaneous representation problem for interval graphs, chordal graphs, comparability graphs and permutation graphs. Further, our algorithms for comparability and permutation graphs solve a more general version of the problem when there are multiple graphs, any two of which share the same common graph. This version of the problem also generalizes probe graphs.
Gene Expression and its Discontents: Developmental disorders as dysfunctions of epigenetic cognition
Systems biology presently suffers the same mereological and sufficiency fallacies that haunt neural network models of high order cognition. Shifting perspective from the massively parallel space of gene matrix interactions to the grammar/syntax of the time series of expressed phenotypes using a cognitive paradigm permits import of techniques from statistical physics via the homology between information source uncertainty and free energy density. This produces a broad spectrum of possible statistical models of development and its pathologies in which epigenetic regulation and the effects of embedding environment are analogous to a tunable enzyme catalyst. A cognitive paradigm naturally incorporates memory, leading directly to models of epigenetic inheritance, as affected by environmental exposures, in the largest sense. Understanding gene expression, development, and their dysfunctions will require data analysis tools considerably more sophisticated than the present crop of simplistic models abducted from neural network studies or stochastic chemical reaction theory
Numerical Linear Algebra applications in Archaeology: the seriation and the photometric stereo problems
The aim of this thesis is to explore the application of Numerical Linear Algebra to Archaeology. An ordering problem called the seriation problem, used for dating findings and/or artifact deposits, is analysed in terms of graph theory. In particular, a Matlab implementation of an algorithm for spectral seriation, based on the use of the Fiedler vector of the Laplacian matrix associated with the problem, is presented. We consider bipartite graphs for describing the seriation problem, since the interrelationship between the units (i.e. archaeological sites) to be reordered can be described in terms of these graphs. In our archaeological metaphor of seriation, the two disjoint node sets into which the vertices of a bipartite graph can be divided represent the excavation sites and the artifacts found inside them.
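The idea of spectral seriation can be sketched in a few lines. This is not the thesis's Matlab code, which is not reproduced here; it is a minimal pure-Python illustration under stated assumptions: the site similarity is taken as the number of shared artifact types (rows of a 0/1 site-by-artifact matrix), the similarity graph is assumed connected, and the Fiedler vector is approximated by power iteration on a shifted Laplacian rather than by a full eigensolver. All names and the sample data are hypothetical.

```python
import math


def similarity(rows):
    """Site-by-site similarity: number of artifact types two sites share."""
    n = len(rows)
    return [[sum(a * b for a, b in zip(rows[i], rows[j])) if i != j else 0
             for j in range(n)] for i in range(n)]


def fiedler_vector(weights, iterations=1000):
    """Approximate the eigenvector of the second-smallest eigenvalue of the
    graph Laplacian L = D - W by power iteration on (shift * I - L),
    projecting out the constant vector at every step.
    Assumes a connected graph with at least one edge."""
    n = len(weights)
    degree = [sum(row) for row in weights]
    shift = 2 * max(degree)  # Gershgorin bound on the largest eigenvalue of L
    x = [i - (n - 1) / 2 for i in range(n)]  # start orthogonal to constants
    for _ in range(iterations):
        # y = (shift * I - L) x = (shift - D) x + W x
        y = [(shift - degree[i]) * x[i]
             + sum(weights[i][j] * x[j] for j in range(n)) for i in range(n)]
        mean = sum(y) / n
        y = [v - mean for v in y]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x


def spectral_seriation(rows):
    """Return row indices sorted by their Fiedler vector entries."""
    v = fiedler_vector(similarity(rows))
    return sorted(range(len(rows)), key=lambda i: v[i])


# Hypothetical data: five sites whose true chronological order is 0..4,
# presented in shuffled order. Columns are artifact types.
true_rows = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]
labels = [2, 0, 4, 1, 3]
shuffled = [true_rows[k] for k in labels]

order = spectral_seriation(shuffled)
recovered = [labels[i] for i in order]
```

The recovered sequence is determined only up to reversal, since the sign of an eigenvector is arbitrary; deciding which end is older requires outside information.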
Since it is a difficult task to determine the closest bipartite network to a given one, we describe how a starting network can be approximated by a bipartite one by solving a sequence of fairly simple optimization problems.
Another numerical problem related to Archaeology is the 3D reconstruction of the shape of an object from a set of digital pictures. In particular, the Photometric Stereo (PS) photographic technique is considered.
Complex Realities, Simple Beauties: Interactions between the Development of Physics Ideas and Western Civilization, from Ancient Times to the Late Nineteenth Century
An instructive text covering the history of Physics concepts within the western tradition. It begins with a brief history of the human species, including discussions of food-gathering technology, early settlements, and the development of culture. It continues on to trace the development of human intellectual culture through ancient history and European history, charting a course through Mesopotamian, Egyptian, Greek, and Arabic mathematical and scientific contributions. Much of the book examines the interaction of science with historical factors such as war and rule changes. It challenges readers to think about ways of knowing and the process of developing systematic knowledge
Overcoming the Intuition Wall: Measurement and Analysis in Computer Architecture
These are exciting times for computer architecture research. Today there is significant demand to improve the performance and energy-efficiency of emerging, transformative applications which are being hammered out by the hundreds for new computing platforms and usage models. This booming growth of applications and the variety of programming languages used to create them is challenging our ability as architects to rapidly and rigorously characterize these applications. Concurrently, hardware has become more complex with the emergence of accelerators, multicore systems, and heterogeneity caused by further divergence between processor market segments. No one architect can now understand all the complexities of many systems and reason about the full impact of changes or new applications.
To that end, this dissertation presents four case studies in quantitative methods. Each case study attacks a different application and proposes a new measurement or analytical technique. In each case study we find at least one surprising or unintuitive result which would likely not have been found without the application of our method.