12 research outputs found

    Lenguajes y modelos subyacentes a los grafos de conocimiento

    Get PDF
    Un grafo de conocimiento es una gran base de datos que integra informaci贸n desde distintas fuentes de datos, con el objetivo de poder extraer conocimiento y transformarlo en valor para los usuarios. Esta base de datos es representada como un grafo, donde los nodos representan entidades, y cada arista representa una relaci贸n entre dos nodos o un atributo de un nodo. El objetivo de este art铆culo es presentar una revisi贸n de los modelos de datos que se usan para representar grafos de conocimiento, y los lenguajes de consulta que permiten extraer la informaci贸n expl铆cita e impl铆cita contenida en dichos grafos

    Lenguajes y modelos subyacentes a los grafos de conocimiento

    Get PDF
    A knowledge graph is a large database that integrates information from different data sources, this with the objective of supporting knowledge extraction, and allowing the transformation of such knowledge into value for the users. Such database is represented as a graph where the nodes represent entities, and each edge represents a relationship between nodes, or the attribute of a node. The objective of this article is to review the data models used to represent knowledge graphs, and the query languages that allow to extract explicit and implicit information contained in such graphs.Un grafo de conocimiento es una gran base de datos que integra informaci贸n desde distintas fuentes de datos, esto con el objetivo de poder extraer conocimiento y transformarlo en valor para los usuarios. Dicha base de datos es representada como un grafo donde los nodos representan entidades, y cada arista representa una relaci贸n entre dos nodos, o un atributo de un nodo. El objetivo de este art铆culo es presentar una revisi贸n de los modelos de datos que se usan para representar grafos de conocimiento, y los lenguajes de consulta que permiten extraer la informaci贸n expl铆cita e impl铆cita contenida en dichos grafos

    An Empirical Comparison of Graph Databases.

    Get PDF
    Abstract-In recent years, more and more companies provide services that can not be anymore achieved efficiently using relational databases. As such, these companies are forced to use alternative database models such as XML databases, objectoriented databases, document-oriented databases and, more recently graph databases. Graph databases only exist for a few years. Although there have been some comparison attempts, they are mostly focused on certain aspects only. In this paper, we present a distributed graph database comparison framework and the results we obtained by comparing four important players in the graph databases market: Neo4j, OrientDB, Titan and DEX

    Graph Data Modeling for Political Communication on Twitter

    Get PDF
    Twitter has become a political reality where political parties, presidential candidates, legislatures and journalists post tweets about the latest events sharing texts, pictures, hashtags, URLs, and mentioning other users. Gaining insight from the vast amount of political data on Twitter is only possible with proper computational tools. We propose to store and manage Twitter data in an optimized Neo4j graph database for serving queries about political communication among state legislators of 50 U.S. states, state reporters, and presidential candidates for the 2016 presidential election. Our rationale for selecting this relatively new database technology is threefold: (1) ease of use in explicitly modeling and visualizing communication relationships among entities of interest; (2) flexibility to evolve the database overtime to quickly adapt to changes in user requirements; and (3) user-friendly intuitive query interface. We developed a Python-based Google App Engine application using Twitter API to collect tweets from the Twitter鈥檚 handlers of the aforementioned political actors. We employed best practice guidelines in graph database design to develop five different database models in order to distinguish the impact of each query optimization technique. We evaluated each of the models on the same set of tweets posted during January 1, 2016 to November 11, 2016 using the same set of queries of interest to political communication scholars in terms of the average query response times. Our experimental results confirmed the benefits of the best practice design guidelines. In addition, they show that the optimized database model is able to provide significant improvement in query response times. Reducing the number of hops used in the graph queries and using database indexes on most commonly used attributes reduced the average query response time in our dataset by as much as 74.52% and by 85.27%, respectively, compared to the reference model. Nevertheless, the reduction in the average query response time comes with the cost of the increase in graph database relationship store size by 5.49% compared to the reference model. Our contributions are as follows. (1) The optimized Neo4j graph database that will be updated weekly with new tweets; the access to this database can be made available to political communication scholars. (2) The above findings added to currently limited guidelines in graph database designs. (3) The findings about political communication prior to the Iowa caucus of the 2016 primary presidential election

    Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries

    Full text link
    Graph processing has become an important part of multiple areas of computer science, such as machine learning, computational sciences, medical applications, social network analysis, and many others. Numerous graphs such as web or social networks may contain up to trillions of edges. Often, these graphs are also dynamic (their structure changes over time) and have domain-specific rich data associated with vertices and edges. Graph database systems such as Neo4j enable storing, processing, and analyzing such large, evolving, and rich datasets. Due to the sheer size of such datasets, combined with the irregular nature of graph processing, these systems face unique design challenges. To facilitate the understanding of this emerging domain, we present the first survey and taxonomy of graph database systems. We focus on identifying and analyzing fundamental categories of these systems (e.g., triple stores, tuple stores, native graph database systems, or object-oriented systems), the associated graph models (e.g., RDF or Labeled Property Graph), data organization techniques (e.g., storing graph data in indexing structures or dividing data into records), and different aspects of data distribution and query execution (e.g., support for sharding and ACID). 51 graph database systems are presented and compared, including Neo4j, OrientDB, or Virtuoso. We outline graph database queries and relationships with associated domains (NoSQL stores, graph streaming, and dynamic graph algorithms). Finally, we describe research and engineering challenges to outline the future of graph databases

    Graph Processing in Main-Memory Column Stores

    Get PDF
    Evermore, novel and traditional business applications leverage the advantages of a graph data model, such as the offered schema flexibility and an explicit representation of relationships between entities. As a consequence, companies are confronted with the challenge of storing, manipulating, and querying terabytes of graph data for enterprise-critical applications. Although these business applications operate on graph-structured data, they still require direct access to the relational data and typically rely on an RDBMS to keep a single source of truth and access. Existing solutions performing graph operations on business-critical data either use a combination of SQL and application logic or employ a graph data management system. For the first approach, relying solely on SQL results in poor execution performance caused by the functional mismatch between typical graph operations and the relational algebra. To the worse, graph algorithms expose a tremendous variety in structure and functionality caused by their often domain-specific implementations and therefore can be hardly integrated into a database management system other than with custom coding. Since the majority of these enterprise-critical applications exclusively run on relational DBMSs, employing a specialized system for storing and processing graph data is typically not sensible. Besides the maintenance overhead for keeping the systems in sync, combining graph and relational operations is hard to realize as it requires data transfer across system boundaries. A basic ingredient of graph queries and algorithms are traversal operations and are a fundamental component of any database management system that aims at storing, manipulating, and querying graph data. Well-established graph traversal algorithms are standalone implementations relying on optimized data structures. The integration of graph traversals as an operator into a database management system requires a tight integration into the existing database environment and a development of new components, such as a graph topology-aware optimizer and accompanying graph statistics, graph-specific secondary index structures to speedup traversals, and an accompanying graph query language. In this thesis, we introduce and describe GRAPHITE, a hybrid graph-relational data management system. GRAPHITE is a performance-oriented graph data management system as part of an RDBMS allowing to seamlessly combine processing of graph data with relational data in the same system. We propose a columnar storage representation for graph data to leverage the already existing and mature data management and query processing infrastructure of relational database management systems. At the core of GRAPHITE we propose an execution engine solely based on set operations and graph traversals. Our design is driven by the observation that different graph topologies expose different algorithmic requirements to the design of a graph traversal operator. We derive two graph traversal implementations targeting the most common graph topologies and demonstrate how graph-specific statistics can be leveraged to select the optimal physical traversal operator. To accelerate graph traversals, we devise a set of graph-specific, updateable secondary index structures to improve the performance of vertex neighborhood expansion. Finally, we introduce a domain-specific language with an intuitive programming model to extend graph traversals with custom application logic at runtime. We use the LLVM compiler framework to generate efficient code that tightly integrates the user-specified application logic with our highly optimized built-in graph traversal operators. Our experimental evaluation shows that GRAPHITE can outperform native graph management systems by several orders of magnitude while providing all the features of an RDBMS, such as transaction support, backup and recovery, security and user management, effectively providing a promising alternative to specialized graph management systems that lack many of these features and require expensive data replication and maintenance processes
    corecore