2 research outputs found

    Concise Fuzzy Representation of Big Graphs: a Dimensionality Reduction Approach

    The amount of data represented by large graphs can, in some cases, exceed the resources of a conventional computer. Edges in particular can consume considerably more memory than the nodes themselves. However, rigorously storing every edge is not always essential for drawing the needed conclusions. An analogous problem takes records with many variables and attempts to extract the most discernible features; the "dimension" of such data is said to be reduced. Following an approach with the same objective, we map a graph to a k-dimensional space and answer queries about neighboring nodes by measuring Euclidean distances. The accuracy of the answers decreases, but this is compensated for by fuzzy logic, which gives an indication of the likelihood of error. The method allows a compact representation in memory while retaining a fair amount of useful information. Promising preliminary results are obtained and reported by testing the proposed approach on a number of Facebook graphs.
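    The abstract does not specify the embedding or the membership function, so the short Python sketch below is only an illustration of the general idea: nodes are mapped to k-dimensional points (here by a random projection of their adjacency rows, an assumed stand-in), neighboring-node queries are answered by Euclidean distance, and a Gaussian fuzzy membership (also an assumption) expresses how likely each returned node is to be a true neighbor.

    # Illustrative sketch only: the embedding (random projection of adjacency rows)
    # and the fuzzy membership function (Gaussian) are assumptions for demonstration,
    # not the paper's actual construction.
    import numpy as np

    def embed_graph(adj_matrix: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
        """Map each node to a k-dimensional point by randomly projecting its
        adjacency row (a stand-in for whatever embedding the paper uses)."""
        rng = np.random.default_rng(seed)
        n = adj_matrix.shape[0]
        projection = rng.normal(size=(n, k)) / np.sqrt(k)
        return adj_matrix @ projection

    def fuzzy_neighbors(coords: np.ndarray, query: int, sigma: float = 1.0):
        """Rank nodes by Euclidean distance to the query node and attach a fuzzy
        membership score in [0, 1] expressing confidence that the node is a true
        graph neighbor (the Gaussian form of the membership is an assumption)."""
        dists = np.linalg.norm(coords - coords[query], axis=1)
        membership = np.exp(-(dists ** 2) / (2 * sigma ** 2))
        order = np.argsort(dists)
        return [(int(i), float(membership[i])) for i in order if i != query]

    # Toy 5-node graph: edges are needed only to build the embedding; afterwards,
    # neighbor queries use the k-dimensional coordinates and memberships instead.
    A = np.array([[0, 1, 1, 0, 0],
                  [1, 0, 1, 0, 0],
                  [1, 1, 0, 1, 0],
                  [0, 0, 1, 0, 1],
                  [0, 0, 0, 1, 0]], dtype=float)
    coords = embed_graph(A, k=3)
    print(fuzzy_neighbors(coords, query=2))

    Once the coordinates are built, the edge list itself need not be kept in memory, which is where the space saving described in the abstract would come from.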

    Distributed Dimension Reduction Algorithms for Widely Dispersed Data

    It is well known that information retrieval, clustering and visualization can often be improved by reducing the dimensionality of high-dimensional data. Classical techniques offer optimality but are much too slow for extremely large databases. The problem becomes harder yet when data are distributed across geographically dispersed machines. To address this need, an effective distributed dimension reduction algorithm is developed. Motivated by the success of the serial (non-distributed) FastMap heuristic of Faloutsos and Lin, the distributed method presented here is intended to be fast, accurate and reliable. It runs in linear time and requires very little data transmission. A series of experiments is conducted to gauge how the algorithm's emphasis on minimal data transmission affects solution quality. Stress function measurements indicate that the distributed algorithm is highly competitive with the original FastMap heuristic.
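    The abstract names FastMap as the starting point but does not give the distributed details (how sites agree on pivots, or what is transmitted), so the sketch below shows only the serial FastMap core of Faloutsos and Lin, with Euclidean distances on raw feature vectors assumed as the input distance function.

    # A minimal sketch of the serial FastMap heuristic that the distributed
    # algorithm builds on; the distributed machinery itself is not detailed in
    # the abstract and is therefore not shown.
    import numpy as np

    def fastmap(data: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
        """Embed `data` (n objects x m features) into k dimensions with FastMap."""
        rng = np.random.default_rng(seed)
        n = data.shape[0]
        coords = np.zeros((n, k))

        def dist2(i, j, depth):
            # Squared original distance minus what earlier axes already explain.
            d2 = np.sum((data[i] - data[j]) ** 2)
            d2 -= np.sum((coords[i, :depth] - coords[j, :depth]) ** 2)
            return max(d2, 0.0)

        for depth in range(k):
            # Pivot heuristic: start from a random object and walk to the
            # farthest object twice to get a well-separated pivot pair (a, b).
            a = rng.integers(n)
            b = max(range(n), key=lambda j: dist2(a, j, depth))
            a = max(range(n), key=lambda j: dist2(b, j, depth))
            dab2 = dist2(a, b, depth)
            if dab2 == 0.0:          # all remaining distances are zero
                break
            for i in range(n):       # project every object onto the pivot line
                coords[i, depth] = (dist2(a, i, depth) + dab2 - dist2(b, i, depth)) / (2 * np.sqrt(dab2))
        return coords

    # Example: reduce 100 random 20-dimensional points to 3 dimensions.
    points = np.random.default_rng(1).normal(size=(100, 20))
    print(fastmap(points, k=3)[:5])

    One plausible reading of the abstract's emphasis on minimal data transmission is that, in the distributed setting, only pivot objects and projected coordinates need to cross machine boundaries, though the abstract does not confirm this.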