11,907 research outputs found
An introduction to Graph Data Management
A graph database is a database where the data structures for the schema
and/or instances are modeled as a (labeled)(directed) graph or generalizations
of it, and where querying is expressed by graph-oriented operations and type
constructors. In this article we present the basic notions of graph databases,
give an historical overview of its main development, and study the main current
systems that implement them
The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing: Extended Survey
Graph processing is becoming increasingly prevalent across many application
domains. In spite of this prevalence, there is little research about how graphs
are actually used in practice. We performed an extensive study that consisted
of an online survey of 89 users, a review of the mailing lists, source
repositories, and whitepapers of a large suite of graph software products, and
in-person interviews with 6 users and 2 developers of these products. Our
online survey aimed at understanding: (i) the types of graphs users have; (ii)
the graph computations users run; (iii) the types of graph software users use;
and (iv) the major challenges users face when processing their graphs. We
describe the participants' responses to our questions highlighting common
patterns and challenges. Based on our interviews and survey of the rest of our
sources, we were able to answer some new questions that were raised by
participants' responses to our online survey and understand the specific
applications that use graph data and software. Our study revealed surprising
facts about graph processing in practice. In particular, real-world graphs
represent a very diverse range of entities and are often very large,
scalability and visualization are undeniably the most pressing challenges faced
by participants, and data integration, recommendations, and fraud detection are
very popular applications supported by existing graph software. We hope these
findings can guide future research
Kolmogorov Complexity in perspective. Part II: Classification, Information Processing and Duality
We survey diverse approaches to the notion of information: from Shannon
entropy to Kolmogorov complexity. Two of the main applications of Kolmogorov
complexity are presented: randomness and classification. The survey is divided
in two parts published in a same volume. Part II is dedicated to the relation
between logic and information system, within the scope of Kolmogorov
algorithmic information theory. We present a recent application of Kolmogorov
complexity: classification using compression, an idea with provocative
implementation by authors such as Bennett, Vitanyi and Cilibrasi. This stresses
how Kolmogorov complexity, besides being a foundation to randomness, is also
related to classification. Another approach to classification is also
considered: the so-called "Google classification". It uses another original and
attractive idea which is connected to the classification using compression and
to Kolmogorov complexity from a conceptual point of view. We present and unify
these different approaches to classification in terms of Bottom-Up versus
Top-Down operational modes, of which we point the fundamental principles and
the underlying duality. We look at the way these two dual modes are used in
different approaches to information system, particularly the relational model
for database introduced by Codd in the 70's. This allows to point out diverse
forms of a fundamental duality. These operational modes are also reinterpreted
in the context of the comprehension schema of axiomatic set theory ZF. This
leads us to develop how Kolmogorov's complexity is linked to intensionality,
abstraction, classification and information system.Comment: 43 page
SODA: Generating SQL for Business Users
The purpose of data warehouses is to enable business analysts to make better
decisions. Over the years the technology has matured and data warehouses have
become extremely successful. As a consequence, more and more data has been
added to the data warehouses and their schemas have become increasingly
complex. These systems still work great in order to generate pre-canned
reports. However, with their current complexity, they tend to be a poor match
for non tech-savvy business analysts who need answers to ad-hoc queries that
were not anticipated. This paper describes the design, implementation, and
experience of the SODA system (Search over DAta Warehouse). SODA bridges the
gap between the business needs of analysts and the technical complexity of
current data warehouses. SODA enables a Google-like search experience for data
warehouses by taking keyword queries of business users and automatically
generating executable SQL. The key idea is to use a graph pattern matching
algorithm that uses the metadata model of the data warehouse. Our results with
real data from a global player in the financial services industry show that
SODA produces queries with high precision and recall, and makes it much easier
for business users to interactively explore highly-complex data warehouses.Comment: VLDB201
- …