2 research outputs found

    Concise Fuzzy Representation of Big Graphs: a Dimensionality Reduction Approach

    The amount of data represented by large graphs can, in some cases, exceed the resources of a conventional computer. Edges in particular can consume considerably more memory than the nodes themselves. However, rigorously storing every edge is not always essential for drawing the needed conclusions. An analogous problem takes records with many variables and attempts to extract the most discernible features; the "dimension" of such data is said to be reduced. Following an approach with the same objective, we map a graph to a k-dimensional space and answer queries about neighboring nodes by measuring Euclidean distances. The accuracy of the answers decreases, but this is compensated for by fuzzy logic, which gives an indication of the likelihood of error. The method allows a compact representation in memory while retaining a fair amount of useful information. Promising preliminary results are obtained and reported by testing the proposed approach on a number of Facebook graphs.
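    The abstract does not specify the embedding or the membership function, so the short Python sketch below is only an illustration of the general idea: nodes are mapped to k-dimensional points (here by a random projection of their adjacency rows, an assumed stand-in), neighboring-node queries are answered by Euclidean distance, and a Gaussian fuzzy membership (also an assumption) expresses how likely each returned node is to be a true neighbor.

    # Illustrative sketch only: the embedding (random projection of adjacency rows)
    # and the fuzzy membership function (Gaussian) are assumptions for demonstration,
    # not the paper's actual construction.
    import numpy as np

    def embed_graph(adj_matrix: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
        """Map each node to a k-dimensional point by randomly projecting its
        adjacency row (a stand-in for whatever embedding the paper uses)."""
        rng = np.random.default_rng(seed)
        n = adj_matrix.shape[0]
        projection = rng.normal(size=(n, k)) / np.sqrt(k)
        return adj_matrix @ projection

    def fuzzy_neighbors(coords: np.ndarray, query: int, sigma: float = 1.0):
        """Rank nodes by Euclidean distance to the query node and attach a fuzzy
        membership score in [0, 1] expressing confidence that the node is a true
        graph neighbor (the Gaussian form of the membership is an assumption)."""
        dists = np.linalg.norm(coords - coords[query], axis=1)
        membership = np.exp(-(dists ** 2) / (2 * sigma ** 2))
        order = np.argsort(dists)
        return [(int(i), float(membership[i])) for i in order if i != query]

    # Toy 5-node graph: edges are needed only to build the embedding; afterwards,
    # neighbor queries use the k-dimensional coordinates and memberships instead.
    A = np.array([[0, 1, 1, 0, 0],
                  [1, 0, 1, 0, 0],
                  [1, 1, 0, 1, 0],
                  [0, 0, 1, 0, 1],
                  [0, 0, 0, 1, 0]], dtype=float)
    coords = embed_graph(A, k=3)
    print(fuzzy_neighbors(coords, query=2))

    Once the coordinates are built, the edge list itself need not be kept in memory, which is where the space saving described in the abstract would come from.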

    Distributed Dimension Reduction Algorithms for Widely Dispersed Data

    It is well known that information retrieval, clustering and visualization can often be improved by reducing the dimensionality of high-dimensional data. Classical techniques offer optimality but are much too slow for extremely large databases. The problem becomes harder yet when data are distributed across geographically dispersed machines. To address this need, an effective distributed dimension reduction algorithm is developed. Motivated by the success of the serial (non-distributed) FastMap heuristic of Faloutsos and Lin, the distributed method presented here is intended to be fast, accurate and reliable. It runs in linear time and requires very little data transmission. A series of experiments is conducted to gauge how the algorithm's emphasis on minimal data transmission affects solution quality. Stress function measurements indicate that the distributed algorithm is highly competitive with the original FastMap heuristic.
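    The abstract names FastMap as the starting point but does not give the distributed details (how sites agree on pivots, or what is transmitted), so the sketch below shows only the serial FastMap core of Faloutsos and Lin, with Euclidean distances on raw feature vectors assumed as the input distance function.

    # A minimal sketch of the serial FastMap heuristic that the distributed
    # algorithm builds on; the distributed machinery itself is not detailed in
    # the abstract and is therefore not shown.
    import numpy as np

    def fastmap(data: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
        """Embed `data` (n objects x m features) into k dimensions with FastMap."""
        rng = np.random.default_rng(seed)
        n = data.shape[0]
        coords = np.zeros((n, k))

        def dist2(i, j, depth):
            # Squared original distance minus what earlier axes already explain.
            d2 = np.sum((data[i] - data[j]) ** 2)
            d2 -= np.sum((coords[i, :depth] - coords[j, :depth]) ** 2)
            return max(d2, 0.0)

        for depth in range(k):
            # Pivot heuristic: start from a random object and walk to the
            # farthest object twice to get a well-separated pivot pair (a, b).
            a = rng.integers(n)
            b = max(range(n), key=lambda j: dist2(a, j, depth))
            a = max(range(n), key=lambda j: dist2(b, j, depth))
            dab2 = dist2(a, b, depth)
            if dab2 == 0.0:          # all remaining distances are zero
                break
            for i in range(n):       # project every object onto the pivot line
                coords[i, depth] = (dist2(a, i, depth) + dab2 - dist2(b, i, depth)) / (2 * np.sqrt(dab2))
        return coords

    # Example: reduce 100 random 20-dimensional points to 3 dimensions.
    points = np.random.default_rng(1).normal(size=(100, 20))
    print(fastmap(points, k=3)[:5])

    One plausible reading of the abstract's emphasis on minimal data transmission is that, in the distributed setting, only pivot objects and projected coordinates need to cross machine boundaries, though the abstract does not confirm this.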