Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Developing big data systems is challenging, given the variety of application
areas and domains that this technology promises to serve. Fundamental design
decisions in big data systems typically include choosing appropriate storage
and computing infrastructures. In this age of heterogeneous systems that
integrate different technologies into an optimized solution for a specific
real-world problem, big data systems are no exception. As far as the storage
aspect of a big data system is concerned, the primary facet is the storage
infrastructure, and NoSQL seems to be the right technology to fulfill its
requirements. However, big data applications differ in their data
characteristics, so their data fits different data models. This paper presents
a feature and use-case analysis and comparison of the four main data models,
namely document-oriented, key-value, graph and wide-column. Moreover, it
provides a feature analysis of 80 NoSQL solutions, elaborating on the criteria
and points that a developer must consider when choosing among them.
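To make the four data models concrete, the following sketch stores the same hypothetical record (a user and one of their orders) in each model using plain Python structures, with one typical access pattern per model. All names (`kv_store`, `doc_store`, `wide_column`, the helper functions) are illustrative assumptions; real NoSQL systems add persistence, indexing and distribution that this omits.

```python
import json

# Hypothetical example: one user and one of their orders, stored four ways.

# Key-value: the store sees only an opaque value addressed by its key.
kv_store = {"user:42": json.dumps({"name": "Ada", "city": "London", "orders": [7]})}

# Document: a nested, self-describing record that can be queried by field.
doc_store = [{"_id": 42, "name": "Ada", "city": "London",
              "orders": [{"id": 7, "total": 19.5}]}]

# Wide column: row key -> column family -> column -> value; rows need not share columns.
wide_column = {"42": {"profile": {"name": "Ada", "city": "London"},
                      "orders": {"order:7": "19.5"}}}

# Graph: nodes plus typed edges, so relationships are first-class data.
nodes = {42: {"label": "User", "name": "Ada"}, 7: {"label": "Order", "total": 19.5}}
edges = [(42, "PLACED", 7)]


def kv_city(key):
    # Key-value access: fetch by key, then decode the whole value client-side.
    return json.loads(kv_store[key])["city"]


def doc_names_in(city):
    # Document access: filter on a field inside the stored records.
    return [d["name"] for d in doc_store if d["city"] == city]


def wide_column_family(row_key, family):
    # Wide-column access: read one column family of one row.
    return wide_column[row_key][family]


def graph_orders_of(user_id):
    # Graph access: follow outgoing PLACED edges from the user node.
    return [dst for src, rel, dst in edges if src == user_id and rel == "PLACED"]
```

The contrast is in what each store can see: the key-value store cannot filter on `city` without decoding every value, while the graph model makes the user-order relationship directly traversable.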
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings the second facet of big data storage, big data file
formats, into the picture. The second half of the paper compares the
advantages, shortcomings and possible use cases of the big data file formats
available for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage, and their challenges and future prospects are
also discussed.
Visualising the structure of document search results: A comparison of graph theoretic approaches
This is the post-print of the article. Copyright © 2010 Sage Publications.
Previous work has shown that distance-similarity visualisation or ‘spatialisation’ can provide a potentially useful context in which to browse the results of a query search, enabling the user to adopt a simple local foraging or ‘cluster growing’ strategy to navigate through the retrieved document set. However, faithfully mapping feature-space models to visual space can be problematic owing to their inherent high dimensionality and non-linearity. Conventional linear approaches to dimension reduction tend to fail at this kind of task, sacrificing local structure in order to preserve a globally optimal mapping. In this paper the clustering performance of a recently proposed algorithm called isometric feature mapping (Isomap), which deals with non-linearity by transforming dissimilarities into geodesic distances, is compared to that of non-metric multidimensional scaling (MDS). Various graph pruning methods for geodesic distance estimation are also compared. Results show that Isomap is significantly better at preserving local structural detail than MDS, suggesting it is better suited to cluster growing and other semantic navigation tasks. Moreover, applying a minimum-cost graph pruning criterion is shown to provide a parameter-free alternative to the traditional K-neighbour method, resulting in spatial clustering that is equivalent to or better than that achieved using an optimal-K criterion.
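Isomap begins by replacing straight-line dissimilarities with geodesic distances estimated on a neighbourhood graph. The sketch below assumes Euclidean input points and uses the traditional K-neighbour graph with Floyd-Warshall shortest paths. It does not implement the minimum-cost pruning criterion the paper proposes, nor the final embedding step, and the function names are illustrative.

```python
import math


def knn_graph(points, k):
    """Symmetric K-nearest-neighbour graph; absent edges have weight math.inf."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    graph = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        # Keep edges to the k nearest *other* points (ties resolved by index order).
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for j in nearest:
            graph[i][j] = graph[j][i] = dist[i][j]
    return graph


def geodesic_distances(graph):
    """All-pairs shortest paths (Floyd-Warshall), used as geodesic distance estimates."""
    n = len(graph)
    d = [row[:] for row in graph]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]
    return d
```

For five points evenly spaced on a unit semicircle with K = 2, the estimated geodesic between the two ends is 2√2 (it follows the curve), whereas the straight-line distance is 2.0. Floyd-Warshall is O(n³), which is adequate for a sketch but not for large result sets.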
A simple physical model for scaling in protein-protein interaction networks
It has recently been demonstrated that many biological networks exhibit a
scale-free topology where the probability of observing a node with a certain
number of edges (k) follows a power law: i.e. p(k) ~ k^-g. This observation has
been reproduced by evolutionary models. Here we consider the network of
protein-protein interactions and demonstrate that two published independent
measurements of these interactions produce graphs that are only weakly
correlated with one another despite their strikingly similar topology. We then
propose a physical model based on the fundamental principle that (de)solvation
is a major physical factor in protein-protein interactions. This model
reproduces not only the scale-free nature of such graphs but also a number of
higher-order correlations in these networks. Key support for the model is
provided by the discovery of a significant correlation between the number of
interactions made by a protein and the fraction of hydrophobic residues on its
surface. The model presented in this paper represents the first physical model
for experimentally determined protein-protein interactions that comprehensively
reproduces the topological features of interaction networks. These results have
profound implications for understanding not only protein-protein interactions
but also other types of scale-free networks.
Comment: 50 pages, 17 figures
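The quantity p(k) above can be computed directly from a list of interactions. The sketch below is a generic illustration, not the authors' physical model: it tabulates the empirical degree distribution of an undirected graph and estimates the exponent g with the approximate discrete maximum-likelihood estimator popularised by Clauset, Shalizi and Newman. The cutoff `k_min` is left to the caller (an assumption), and proteins with no interactions do not appear in an edge list, so they are not counted.

```python
import math
from collections import Counter


def degrees(edges):
    """Degree of every node appearing in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def degree_distribution(edges):
    """Empirical p(k): the fraction of nodes that have exactly k edges."""
    deg = degrees(edges)
    n = len(deg)
    counts = Counter(deg.values())
    return {k: c / n for k, c in sorted(counts.items())}


def powerlaw_exponent(degree_values, k_min):
    """Approximate discrete MLE of g for p(k) ~ k^-g, using only degrees k >= k_min >= 1."""
    tail = [k for k in degree_values if k >= k_min]
    return 1.0 + len(tail) / sum(math.log(k / (k_min - 0.5)) for k in tail)
```

For a three-edge star, `degree_distribution` returns `{1: 0.75, 3: 0.25}`. On real interaction data the estimate is only meaningful when the tail above `k_min` contains many nodes.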
ChoiceRank: Identifying Preferences from Node Traffic in Networks
Understanding how users navigate in a network is of high interest in many
applications. We consider a setting where only aggregate node-level traffic is
observed and tackle the task of learning edge transition probabilities. We cast
it as a preference learning problem, and we study a model where choices follow
Luce's axiom. In this case, the marginal counts of node visits are a
sufficient statistic for the transition probabilities. We show how to
make the inference problem well-posed regardless of the network's structure,
and we present ChoiceRank, an iterative algorithm that scales to networks that
contain billions of nodes and edges. We apply the model to two clickstream
datasets and show that it successfully recovers the transition probabilities
using only the network structure and marginal (node-level) traffic data.
Finally, we also consider an application to mobility networks and apply the
model to one year of rides on New York City's bicycle-sharing system.
Comment: Accepted at ICML 201
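Under Luce's axiom, each node j carries a positive strength, and a user at node i moves to successor j with probability proportional to j's strength. The sketch below assumes that `c_in[j]` and `c_out[i]` are the counts of transitions into and out of each node, and fits the strengths with a standard minorization-maximization (MM) fixed-point iteration for Luce choice models. It is not the paper's ChoiceRank algorithm. It omits whatever ChoiceRank does to make the problem well-posed on arbitrary networks, so it assumes a small, strongly connected graph with positive counts. All names are hypothetical.

```python
def luce_transition_probs(succ, strength):
    """Luce's axiom: p(j | i) = strength[j] / (sum of strength over i's successors)."""
    probs = {}
    for i, nbrs in succ.items():
        total = sum(strength[k] for k in nbrs)
        probs[i] = {j: strength[j] / total for j in nbrs}
    return probs


def fit_strengths(succ, c_in, c_out, max_iter=10_000, tol=1e-12):
    """MM-style fixed-point iteration for Luce strengths from node-level counts only.

    Assumes c_in[j] counts transitions into node j, c_out[i] counts transitions out of
    node i, every node has at least one predecessor, and all counts are positive.
    """
    n = len(c_in)
    pred = {j: [] for j in range(n)}
    for i, nbrs in succ.items():
        for j in nbrs:
            pred[j].append(i)
    strength = [1.0] * n
    for _ in range(max_iter):
        # gamma[i]: traffic leaving i divided by the total strength of i's successors.
        gamma = {i: c_out[i] / sum(strength[k] for k in nbrs) for i, nbrs in succ.items()}
        new = [c_in[j] / sum(gamma[i] for i in pred[j]) for j in range(n)]
        total = sum(new)
        new = [x / total for x in new]  # strengths are identifiable only up to scale
        if max(abs(a - b) for a, b in zip(new, strength)) < tol:
            return new
        strength = new
    return strength
```

Given the expected in- and out-counts generated from known strengths on a small four-node graph, the iteration recovers those strengths (normalized to sum to 1). This illustrates the abstract's point that node-level counts carry enough information to infer the transition probabilities.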
25 years development of knowledge graph theory: the results and the challenge
The project on knowledge graph theory was begun in 1982. At the initial stage, the goal was to use graphs to represent knowledge in the form of an expert system. By the end of the 1980s, expert systems in medical and social science had been developed successfully using knowledge graph theory. In the following stage, the goal of the project was broadened to representing natural language by knowledge graphs. Since then, the theory can be considered one of the methods for natural language processing. Knowledge graph representation has now been proven to be language independent, and the theory can be applied to represent almost any characteristic feature of various languages.
The objective of the paper is to summarize the results of 25 years of development of knowledge graph theory and to point out some challenges to be dealt with in the next stage of its development. The paper also highlights the differences between this theory and others, such as the theory of conceptual graphs developed and presented by Sowa in 1984, formal concept analysis by Wille, and semantic networks.