12 research outputs found

    Cypher: An Evolving Query Language for Property Graphs

    The Cypher property graph query language is an evolving language, originally designed and implemented as part of the Neo4j graph database, and it is currently used by several commercial database products and researchers. We describe Cypher 9, which is the first version of the language governed by the openCypher Implementers Group. We first introduce the language by example, and describe its uses in industry. We then provide a formal semantic definition of the core read-query features of Cypher, including its variant of the property graph data model, and its "ASCII Art" graph pattern matching mechanism for expressing subgraphs of interest to an application. We compare the features of Cypher to other property graph query languages, and describe extensions, at an advanced stage of development, which will form part of Cypher 10, turning the language into a compositional language which supports graph projections and multiple named graphs.

    Visões em bancos de dados de grafos: uma abordagem multifoco para dados heterogêneos [Views in graph databases: a multifocus approach for heterogeneous data]

    Advisor: Claudia Maria Bauzer Medeiros. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação. Abstract: Scientific research has become data-intensive and data-dependent. This new research paradigm requires sophisticated computer science techniques and technologies to support the life cycle of scientific data and collaboration among scientists from distinct areas. A major requirement is that researchers working in data-intensive interdisciplinary teams demand the construction of multiple perspectives of the world, built over the same datasets. Present solutions cover a wide range of aspects, from the design of interoperability standards to the use of non-relational database management systems. None of these efforts, however, adequately meets the need for multiple perspectives, which are called foci in the thesis. Basically, a focus is designed and built to cater to a research group (even within a single project) that needs to deal with a subset of data of interest, under multiple aggregation/generalization levels. The definition and creation of a focus are complex tasks that require mechanisms and engines to manipulate multiple representations of the same real-world phenomenon. This PhD research aims to provide multiple foci over heterogeneous data. To meet this challenge, we deal with four research problems. The first two were (1) choosing an appropriate data management paradigm and (2) eliciting multifocus requirements. Our work on these problems led us to choose graph databases to answer (1) and the concept of views, from relational databases, for (2). However, there is no consensual data model for graph databases, and views are seldom discussed in this context. Thus, the remaining research problems are (3) specifying an adequate graph data model and (4) defining a framework to handle views on graph databases. Our research on these problems results in the main contributions of this thesis: (i) to present the case for the use of graph databases as the persistence layer in multifocus research, a schemaless and relationship-driven type of database that provides a full understanding of data connections; (ii) to define views for graph databases to support the need for multiple foci, considering graph data manipulation, graph algorithms and traversal tasks; (iii) to propose a property graph data model (PGDM) to fill the gap left by the absence of a full-fledged data model for graphs; (iv) to specify and implement a framework, named Graph-Kaleidoscope, that supports views over graph databases; and (v) to validate our framework with real data in applications from two domains, biodiversity and environmental resources, which are typical examples of multidisciplinary research that involves the analysis of interactions of phenomena using heterogeneous data. Degree: Doctorate in Computer Science.

    Comparative analysis of PropertyFirst vs. EntityFirst modeling approaches in graph databases

    While relational databases still hold the primary position in database technology, and have held it longer than any other Computer Science technology, they now have, for the first time, a valid and worthy opponent in the NoSQL database movement. NoSQL databases, although unfamiliar to many people, including a significant number of computer scientists, have spread rapidly in many shapes and forms, and have done so in a rather chaotic fashion. Like their emergence and spread, design and modeling for them have been undertaken in an unstructured manner. They are currently subcategorized into four main groups: key-value stores, column-family stores, document stores and graph databases. In this thesis, different modeling approaches for graph databases, applied to the same domain, are analyzed and compared, especially from a design perspective. The database selected as the implementation technology is Neo4J by Neo Technologies, a directed property graph database, which means that each relationship between its data entities must have a starting and an ending (or source and destination) node. This research provides an overview of two competing modeling approaches and evaluates them in the context of a real-world example. The work shows that both modeling approaches are valid, that a data model can be fully developed from the same domain data with either approach, and that both can later support application access in a similar fashion. One of the models provides faster access to data, but at the cost of higher maintenance and increased complexity.

    Optimization of Regular Path Queries in Graph Databases

    Regular path queries offer a powerful navigational mechanism in graph databases. Recently, there has been renewed interest in such queries in the context of the Semantic Web. The extension of SPARQL in version 1.1 with property paths offers a type of regular path query for RDF graph databases. While eminently useful, such queries are difficult to optimize and evaluate efficiently. We design and implement a cost-based optimizer we call Waveguide for SPARQL queries with property paths. Waveguide builds a query plan, which we call a waveplan (WP), that guides the query evaluation. There are numerous choices in the construction of a plan, and a number of optimization methods, so the space of plans for a query can be quite large. Execution costs of plans for the same query can vary by orders of magnitude, with the best plan often offering excellent performance. A WP's cost can be estimated, which opens the way to cost-based optimization. We demonstrate that Waveguide properly subsumes existing techniques and that the new plans it adds are relevant. We analyze the effective plan space which is enabled by Waveguide and design an efficient enumerator for it. We implement a prototype of a Waveguide cost-based optimizer on top of an open-source relational RDF store. Finally, we perform a comprehensive performance study of the state of the art for evaluation of SPARQL property paths and demonstrate the significant performance gains that Waveguide offers.

    Prosessitietomallin toteutus graafitietokannassa [Implementation of a process information model in a graph database]

    This thesis deals with traceability and the modeling of supply chains in a graph database. A traceability graph is used as an example of supply chain modeling. The thesis presents an implementation of the traceability graph in the Neo4j graph database. The graph database is connected to a Lucene inverted index, which makes it possible to combine traceability data with keyword search over documents. The aim is to study how information related to processes and products can be retrieved from the database built on the traceability graph and from the inverted index connected to it. For this purpose, a search program was written in the Java programming language, and information retrieval is illustrated with example queries run through it.

    Acta Cybernetica : Volume 13. Number 3.


    Linguaggi di interrogazione per basi di dati semistrutturati a grafo [Query languages for semistructured graph databases]

    In recent years, thanks in part to the advent of Web 2.0, large sources of semistructured data have become available. The research community has therefore taken an interest in database models suited to representing this kind of information better and querying it more efficiently: a well-known example proposed by the W3C is the "Resource Description Framework". This thesis analyzes several graph models, for semistructured data and otherwise, and compares their query languages in terms of expressive power.

    Investigating renewable energy systems using artificial intelligence techniques

    This research investigated applying Artificial Intelligence (AI) and Machine Learning (ML) to renewable energy through three studies. The first study characterized and mapped the recent research landscape of AI applications for various renewable energy systems using Natural Language Processing (NLP) and ML models. It considered documents published in the Scopus database in the period 2000-2021. The second study built a hybrid Catboost-CNN-LSTM pipeline to predict an industrial-scale biogas plant's daily biogas production and to investigate the importance of the feedstock components for it. The third study investigated predicting the biogas yield of various substrates and the significance of each organic component (carbohydrates, proteins, fats/lipids, and lignin) in biogas production using a hybrid VAE-XGboost model. The first study identified seven main metatopics and showed that the ascent of "deep learning (DL)" as a prominent methodology led to an increase in intricate subjects, including the optimization of power costs and the prediction of wind patterns. It also showed a growing use of DL approaches for analyzing renewable energy data, particularly for wind and solar photovoltaic systems. The research themes and trends observed in the first study signify substantial recent investment in advanced AI learning techniques. The Catboost-CNN-LSTM pipeline achieved significant results and offered a superior approach compared with previous relevant studies: it eliminates the need for feature engineering and predicts biogas yield directly, without converting the problem into a classification task. The VAE-XGboost pipeline overcame data limitations in the field and produced significant results. It showed that the "fats" category is the most influential group for methane production in biogas plants, whereas "proteins" had the lowest impact on biogas production.

    Detection of potential misuse in information systems based on temporal graph anomalies

    Users of complex information systems can have various roles, which define their permissions. By acting in a coordinated manner, users can carry out complex misuse without overstepping their permissions, and cause damage or gain illegal benefits. This kind of internal threat, in which multiple authorized users act in coordination without overstepping their permissions, has not been sufficiently researched. This thesis proposes a general method for identifying potential misuse that is independent of data semantics and of familiarity with the system's business rules. The method relies on the existence of an audit trail for the information system's relational database. Implementation and testing indicate that the method recognizes potential misuse. The proposed model of a completely-timed graph, the algorithms for converting relational and temporal relational databases to graphs, the frequent completely-timed subgraph mining algorithm and the completely-timed graph comparison algorithm can all be used for general purposes. Scientific contributions: 1) an algorithm for converting relational databases to graph databases, with special emphasis on converting temporal relational databases to completely-timed graphs; 2) an algorithm for mining frequent completely-timed subgraphs; 3) an algorithm for detecting anomalies relative to frequent completely-timed subgraphs; 4) a method for detecting potential information system misuse based on anomalies in frequent completely-timed subgraphs.
