Concise Fuzzy Representation of Big Graphs: a Dimensionality Reduction Approach
The enormous amount of data represented in large graphs can exceed the
resources of a conventional computer. Edges in particular can consume a
considerable amount of memory relative to the number of nodes. However,
exact edge storage is not always essential for drawing the needed
conclusions. An analogous problem takes records with many variables and
attempts to extract the most discernible features; the "dimension" of such
data is said to be reduced. Following an approach with the same objective in
mind, we map a graph representation to a k-dimensional space and answer
queries about neighboring nodes by measuring Euclidean distances. The
accuracy of the answers decreases, but this is compensated for by fuzzy
logic, which gives an idea of the likelihood of error. The method allows for
a compact representation in memory while retaining a fair amount of useful
information. Promising preliminary results are obtained and reported by
testing the proposed approach on a number of Facebook graphs.
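The query scheme the abstract describes can be sketched as follows. This is an illustrative interpretation, not the paper's exact method: given an embedding of the nodes in k-dimensional space, a neighbor query is answered from the Euclidean distance between two embedded nodes, and a fuzzy membership function (here a simple linear ramp between two assumed thresholds, `d_edge` and `d_far`) expresses how confident the answer is.

```python
import numpy as np

def neighbor_likelihood(emb, u, v, d_edge, d_far):
    """Fuzzy answer to 'is v a neighbor of u?' from a k-dimensional
    embedding `emb` (an n x k array of node coordinates).

    Sketch only: nodes closer than d_edge are called neighbors with
    degree 1.0, nodes farther than d_far with degree 0.0, and the
    degree ramps down linearly in between. The thresholds are
    assumed tuning parameters, not values from the paper."""
    d = np.linalg.norm(emb[u] - emb[v])
    if d <= d_edge:
        return 1.0
    if d >= d_far:
        return 0.0
    return (d_far - d) / (d_far - d_edge)
```

The fuzzy degree stands in for the edge list: only the n·k coordinates are stored, so memory grows with the number of nodes rather than the (typically much larger) number of edges.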
Distributed Dimension Reduction Algorithms for Widely Dispersed Data
It is well known that information retrieval, clustering and visualization can often be improved by reducing the dimensionality of high dimensional data. Classical techniques offer optimality but are much too slow for extremely large databases. The problem becomes harder yet when data are distributed across geographically dispersed machines. To address this need, an effective distributed dimension reduction algorithm is developed. Motivated by the success of the serial (non-distributed) FastMap heuristic of Faloutsos and Lin, the distributed method presented here is intended to be fast, accurate and reliable. It runs in linear time and requires very little data transmission. A series of experiments is conducted to gauge how the algorithm's emphasis on minimal data transmission affects solution quality. Stress function measurements indicate that the distributed algorithm is highly competitive with the original FastMap heuristic.
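The serial FastMap heuristic that motivates the distributed method can be sketched as follows. For each output dimension, FastMap picks a far-apart pivot pair, projects every object onto the line through the pivots via the cosine law, and then works in the residual distance that remains after subtracting the coordinates found so far. The sketch below assumes the objects are given as coordinate vectors so that Euclidean distance can serve as the input metric; FastMap itself only requires a distance function.

```python
import numpy as np

def fastmap(points, k, seed=0):
    """Project `points` (an n x m array) into k dimensions with the
    FastMap heuristic of Faloutsos and Lin (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = len(points)
    coords = np.zeros((n, k))  # the embedding built column by column

    def dist2(i, j, col):
        # squared residual distance: original squared distance minus
        # the contribution of the first `col` FastMap coordinates
        d2 = np.sum((points[i] - points[j]) ** 2)
        d2 -= np.sum((coords[i, :col] - coords[j, :col]) ** 2)
        return max(d2, 0.0)

    for col in range(k):
        # farthest-point heuristic to choose a well-separated pivot pair
        a = int(rng.integers(n))
        for _ in range(5):
            b = max(range(n), key=lambda j: dist2(a, j, col))
            a, b = b, a
        dab2 = dist2(a, b, col)
        if dab2 == 0.0:
            break  # all residual distances vanished; stop early
        # cosine-law projection of object i onto the pivot line:
        # x_i = (d(a,i)^2 + d(a,b)^2 - d(b,i)^2) / (2 d(a,b))
        for i in range(n):
            coords[i, col] = (dist2(a, i, col) + dab2
                              - dist2(b, i, col)) / (2 * np.sqrt(dab2))
    return coords
```

Each dimension costs O(n) distance evaluations, which is the linear-time behavior the distributed variant described above aims to preserve while minimizing data transmission between machines.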