    Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

    Big data systems development is full of challenges, given the variety of application areas and domains that this technology promises to serve. Fundamental design decisions in big data systems typically include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies into an optimized solution for a specific real-world problem, big data systems are no exception. As far as storage is concerned, the primary facet is the storage infrastructure, and NoSQL seems to be the right technology to fulfil its requirements. However, every big data application has different data characteristics, and thus its data fits a different data model. This paper presents a feature and use case analysis and comparison of the four main data models, namely document-oriented, key-value, graph and wide-column. Moreover, a feature analysis of 80 NoSQL solutions is provided, elaborating on the criteria and points that a developer must consider when making a choice. Typically, big data storage needs to communicate with the execution engine and other processing and visualization technologies to create a comprehensive solution. This brings the second facet of big data storage, big data file formats, into the picture. The second half of the paper compares the advantages, shortcomings and possible use cases of the available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage, and their challenges and future prospects are also discussed.

    Visualising the structure of document search results: A comparison of graph theoretic approaches

    This is the post-print of the article. Copyright © 2010 Sage Publications.
    Previous work has shown that distance-similarity visualisation or ‘spatialisation’ can provide a potentially useful context in which to browse the results of a query search, enabling the user to adopt a simple local foraging or ‘cluster growing’ strategy to navigate through the retrieved document set. However, faithfully mapping feature-space models to visual space can be problematic owing to their inherent high dimensionality and non-linearity. Conventional linear approaches to dimension reduction tend to fail at this kind of task, sacrificing local structure in order to preserve a globally optimal mapping. In this paper the clustering performance of a recently proposed algorithm called isometric feature mapping (Isomap), which deals with non-linearity by transforming dissimilarities into geodesic distances, is compared to that of non-metric multidimensional scaling (MDS). Various graph pruning methods for geodesic distance estimation are also compared. Results show that Isomap is significantly better at preserving local structural detail than MDS, suggesting it is better suited to cluster growing and other semantic navigation tasks. Moreover, it is shown that applying a minimum-cost graph pruning criterion can provide a parameter-free alternative to the traditional K-neighbour method, resulting in spatial clustering that is equivalent to or better than that achieved using an optimal-K criterion.
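The geodesic-distance step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the traditional K-neighbour pruning and uses Floyd-Warshall for the shortest paths; the function name `geodesic_distances` and the toy matrix are hypothetical and are not taken from the article, whose own pruning criteria are not reproduced here.

```python
import math


def geodesic_distances(dissim, k):
    """Approximate geodesic distances from a symmetric dissimilarity matrix.

    Each point is linked to its k nearest neighbours (K-neighbour pruning),
    then all-pairs shortest paths through that graph are computed with
    Floyd-Warshall. Unreachable pairs stay at math.inf.
    """
    n = len(dissim)
    graph = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        # Indices of the k nearest neighbours of i, excluding i itself.
        neighbours = sorted(
            (j for j in range(n) if j != i), key=lambda j: dissim[i][j]
        )[:k]
        for j in neighbours:
            # Add the edge in both directions so the graph stays symmetric.
            graph[i][j] = graph[j][i] = dissim[i][j]
    # Floyd-Warshall: allow each point m in turn as an intermediate stop.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = graph[i][m] + graph[m][j]
                if via < graph[i][j]:
                    graph[i][j] = via
    return graph


# Hypothetical toy data: four documents on a bent chain whose two ends
# are close in direct dissimilarity (1.25) but far apart along the chain.
toy = [
    [0.0, 1.0, 1.5, 1.25],
    [1.0, 0.0, 0.75, 1.125],
    [1.5, 0.75, 0.0, 0.5],
    [1.25, 1.125, 0.5, 0.0],
]
geo = geodesic_distances(toy, k=1)
```

With `k=1` the pruned graph keeps only the chain edges, so the distance between the two end points is measured along the chain rather than across the gap. A multidimensional scaling step would then be applied to the returned matrix.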

    A simple physical model for scaling in protein-protein interaction networks

    It has recently been demonstrated that many biological networks exhibit a scale-free topology where the probability of observing a node with a certain number of edges (k) follows a power law: i.e. p(k) ~ k^-g. This observation has been reproduced by evolutionary models. Here we consider the network of protein-protein interactions and demonstrate that two published independent measurements of these interactions produce graphs that are only weakly correlated with one another despite their strikingly similar topology. We then propose a physical model based on the fundamental principle that (de)solvation is a major physical factor in protein-protein interactions. This model reproduces not only the scale-free nature of such graphs but also a number of higher-order correlations in these networks. Key support for the model is provided by the discovery of a significant correlation between the number of interactions made by a protein and the fraction of hydrophobic residues on its surface. The model presented in this paper represents the first physical model for experimentally determined protein-protein interactions that comprehensively reproduces the topological features of interaction networks. These results have profound implications for understanding not only protein-protein interactions but also other types of scale-free networks.
    Comment: 50 pages, 17 figures
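As a rough illustration of the power law p(k) ~ k^-g quoted in this abstract, the sketch below estimates the exponent g from a list of node degrees. It assumes a plain least-squares fit on log-log axes, which is a crude estimator and is not the method used in the paper; the function name `powerlaw_exponent` and the synthetic degree list are hypothetical.

```python
import math
from collections import Counter


def powerlaw_exponent(degrees):
    """Estimate g in p(k) ~ k^-g from a sequence of node degrees.

    Fits a straight line to log p(k) against log k by least squares and
    returns the negated slope. Needs at least two distinct positive degrees.
    """
    counts = Counter(d for d in degrees if d > 0)
    total = sum(counts.values())
    # Keys and values of a Counter iterate in the same order.
    xs = [math.log(k) for k in counts]
    ys = [math.log(c / total) for c in counts.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Synthetic degrees whose counts fall off exactly as 1600 / k^2.
synthetic = [1] * 1600 + [2] * 400 + [4] * 100 + [8] * 25
g = powerlaw_exponent(synthetic)
```

On the synthetic list the counts are proportional to k^-2 by construction, so the fitted exponent is 2 up to floating-point error. Measured interaction data would be noisier, especially in the high-degree tail.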

    ChoiceRank: Identifying Preferences from Node Traffic in Networks

    Understanding how users navigate in a network is of high interest in many applications. We consider a setting where only aggregate node-level traffic is observed and tackle the task of learning edge transition probabilities. We cast it as a preference learning problem, and we study a model where choices follow Luce's axiom. In this case, the O(n) marginal counts of node visits are a sufficient statistic for the O(n^2) transition probabilities. We show how to make the inference problem well-posed regardless of the network's structure, and we present ChoiceRank, an iterative algorithm that scales to networks that contain billions of nodes and edges. We apply the model to two clickstream datasets and show that it successfully recovers the transition probabilities using only the network structure and marginal (node-level) traffic data. Finally, we also consider an application to mobility networks and apply the model to one year of rides on New York City's bicycle-sharing system.
    Comment: Accepted at ICML 201
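To see why one parameter per node can be enough to describe every edge, the sketch below applies Luce's choice rule: the probability of moving from a node to one of its out-neighbours is that neighbour's strength divided by the summed strengths of all out-neighbours. This shows only the forward model, under the assumption that each node has a single positive strength; it is not the ChoiceRank inference algorithm, and the names `luce_transition_probs`, `out_neighbours` and `strength` are hypothetical.

```python
def luce_transition_probs(out_neighbours, strength):
    """Transition probabilities implied by Luce's choice rule.

    out_neighbours maps each node to the list of nodes it links to.
    strength maps each node to a positive number (one parameter per node).
    Returns a nested dict: probs[i][j] is the probability of moving i -> j.
    """
    probs = {}
    for i, targets in out_neighbours.items():
        # Normalise over the alternatives actually available from node i.
        total = sum(strength[j] for j in targets)
        probs[i] = {j: strength[j] / total for j in targets}
    return probs


# Hypothetical three-page site: page "c" is three times as attractive.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a", "b"]}
strengths = {"a": 1.0, "b": 1.0, "c": 3.0}
probs = luce_transition_probs(links, strengths)
```

Here three strengths determine all five edge probabilities; for example, a visitor on "a" moves to "c" with probability 3 / (1 + 3). Learning the strengths from observed node traffic is the inference problem the abstract describes.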

    25 years development of knowledge graph theory: the results and the challenge

    The project on knowledge graph theory began in 1982. At the initial stage, the goal was to use graphs to represent knowledge in the form of an expert system. By the end of the 1980s, expert systems in medical and social science had been developed successfully using knowledge graph theory. In the following stage, the goal of the project was broadened to represent natural language by knowledge graphs. Since then, the theory has been considered one of the methods for natural language processing. Knowledge graph representation has now been shown to be language independent. The theory can be applied to represent almost any characteristic feature in various languages. The objective of the paper is to summarize the results of 25 years of development of knowledge graph theory and to point out some challenges to be dealt with in the next stage of its development. The paper highlights the differences between this theory and other theories, such as conceptual graphs, developed and presented by Sowa in 1984, formal concept analysis by Wille, and semantic networks.