
    A Survey on Graph Database Management Techniques for Huge Unstructured Data

    Data analysis, data management, and big data have played a major role from both social and business perspectives over the last decade. Graph databases are currently a very active research topic. A graph database is preferred for handling the dynamic and complex relationships in connected data and can offer better results. Every data element is represented as a node. For example, on a social media site a person is represented as a node with properties such as name, age, likes, and dislikes, and nodes are connected to one another by relationships, represented as edges. Graph databases are expected to benefit businesses and social networking sites that generate huge volumes of unstructured data, since such Big Data requires proper and efficient computational techniques to handle. This paper reviews existing graph data computational techniques and related research in order to outline future research directions in graph database management.
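
The node, property, and edge model described in this abstract can be illustrated with a minimal in-memory sketch. This is not taken from the surveyed paper or from any graph database product; the names `Node`, `PropertyGraph`, `add_node`, `add_edge`, `neighbours`, and the `FRIEND_OF` relationship type are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node: a label plus free-form properties (hypothetical type)."""
    label: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph; a real graph database adds indexing and storage."""

    def __init__(self):
        self.nodes = {}  # node id -> Node
        self.edges = []  # (source id, relationship type, target id)

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(label, properties)

    def add_edge(self, source, rel_type, target):
        self.edges.append((source, rel_type, target))

    def neighbours(self, node_id, rel_type):
        """Ids of nodes reachable from node_id over one edge of rel_type."""
        return [t for s, r, t in self.edges if s == node_id and r == rel_type]


# A person is a node with properties; a relationship is a directed edge.
graph = PropertyGraph()
graph.add_node("alice", "Person", name="Alice", age=30, likes=["hiking"])
graph.add_node("bob", "Person", name="Bob", age=27, likes=["chess"])
graph.add_edge("alice", "FRIEND_OF", "bob")

friends_of_alice = [
    graph.nodes[n].properties["name"]
    for n in graph.neighbours("alice", "FRIEND_OF")
]
```

Scanning the whole edge list in `neighbours` keeps the sketch short; a practical store would keep per-node adjacency lists so that following a relationship does not depend on the total number of edges.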

    Comparison of Graph Databases and Relational Databases When Handling Large-Scale Social Data

    Over the past few years, with the rapid development of mobile technology, more people have come to use mobile social applications such as Facebook, Twitter and Weibo in their daily lives, producing an ever-increasing amount of social data. Finding a suitable approach to storing and processing this data, especially at large scale, is therefore important for social network companies. Traditionally, relational databases, which represent data as tables, have been widely used in legacy applications. Graph databases, a kind of NoSQL database, are developing rapidly to handle the growing amount of unstructured and semi-structured data. Each storage approach has its own advantages: a relational database is the more mature approach, while a graph database handles graph-like data more easily. This research compares the capabilities of relational and graph databases for storing and processing large-scale social data. The comparison combines two kinds of analysis: a quantitative analysis of storage cost and execution time, and a qualitative analysis of five criteria (maturity, ease of programming, flexibility, security and data visualization). A simple mobile social application is also developed for the experiments. The comparison is used to determine which kind of database is more suitable for handling large-scale social data; future research can compare more graph database models using real-world social data sets.

    Data management in cloud environments: NoSQL and NewSQL data stores

    Advances in Web technology and the proliferation of mobile devices and sensors connected to the Internet have resulted in immense processing and storage requirements. Cloud computing has emerged as a paradigm that promises to meet these requirements. This work focuses on the storage aspect of cloud computing, specifically on data management in cloud environments. Traditional relational databases were designed in a different hardware and software era and are facing challenges in meeting the performance and scale requirements of Big Data. NoSQL and NewSQL data stores present themselves as alternatives that can handle huge volumes of data. Because of the large number and diversity of existing NoSQL and NewSQL solutions, it is difficult to comprehend the domain and even more challenging to choose an appropriate solution for a specific task. Therefore, this paper reviews NoSQL and NewSQL solutions with the objectives of: (1) providing a perspective on the field, (2) providing guidance to practitioners and researchers in choosing the appropriate data store, and (3) identifying challenges and opportunities in the field. Specifically, the most prominent solutions are compared, focusing on data models, querying, scaling, and security-related capabilities. Features driving the ability to scale read requests, write requests, or data storage are investigated, in particular partitioning, replication, consistency, and concurrency control. Furthermore, use cases and scenarios in which NoSQL and NewSQL data stores have been used are discussed, and the suitability of various solutions for different sets of applications is examined. Consequently, this study has identified challenges in the field, including the immense diversity and inconsistency of terminologies, limited documentation, sparse comparison and benchmarking criteria, and the nonexistence of standardized query languages.

    The Future is Big Graphs! A Community View on Graph Processing Systems

    Graphs are by nature unifying abstractions that can leverage interconnectedness to represent, explore, predict, and explain real- and digital-world phenomena. Although real users and consumers of graph instances and graph workloads understand these abstractions, future problems will require new abstractions and systems. What needs to happen in the next decade for big graph processing to continue to succeed?
    Comment: 12 pages, 3 figures, collaboration between the large-scale systems and data management communities, work started at the Dagstuhl Seminar 19491 on Big Graph Processing Systems, to be published in the Communications of the AC

    Graphical Database Architecture For Clinical Trials

    The general area of the research is Health Informatics. The research focuses on creating a novel solution for managing and analyzing clinical trials data. It constructs a Graphical Database Architecture (GDA) for Clinical Trials (CT) using New Technology for Java (Neo4j) as a robust, scalable and high-performance database. The purpose of the research project is to develop architecture-based concepts and techniques that accelerate clinical data navigation at lower cost. The research design uses a positivist approach to empirical research. The research is significant because it proposes a new approach to clinical trials through graph theory and designs a responsive structure for clinical data that can be deployed across the health informatics landscape. It contributes to the scholarly literature on Not only SQL (NoSQL) graph databases, mainly Neo4j in CT, for future research in clinical informatics. A prototype is created and examined to validate the concepts, taking advantage of Neo4j’s high availability, scalability, and powerful graph query language (Cypher). The study finds that integrating search methodologies and information retrieval with the graphical database provides a solid starting point for managing, querying, and analyzing clinical trials data. Furthermore, the design and development of a prototype demonstrate the conceptual model of this study. Likewise, the proposed clinical trials ontology (CTO) incorporates all data elements of a standard clinical study, which facilitates a heuristic overview of the treatments, interventions, and outcome results of these studies.

    GLL-based Context-Free Path Querying for Neo4j

    We propose a GLL-based context-free path querying algorithm that handles queries in Extended Backus-Naur Form (EBNF) using Recursive State Machines (RSMs). Using EBNF allows one to combine traditional regular expressions and mutually recursive patterns in constraints natively. The proposed algorithm solves both the reachability-only and the all-paths problems for the all-pairs and the multiple-sources cases. Evaluation on real-world graphs demonstrates that using RSMs improves query evaluation performance. Implemented as a stored procedure for Neo4j, our solution demonstrates better performance than a similar solution for RedisGraph. For regular path queries, the performance of our solution is comparable with that of the native Neo4j solution, and in some cases our solution requires significantly less memory.
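
To make the reachability-only, all-pairs problem from this abstract concrete, here is a minimal sketch of context-free path querying. It is not the GLL/RSM algorithm the authors propose: it swaps in a naive worklist closure over a grammar in Chomsky normal form, and the function name `cfl_reachability`, the grammar encoding, and the toy graph are assumptions made for illustration.

```python
from collections import deque


def cfl_reachability(edges, terminal_rules, binary_rules, start):
    """Return all (u, v) joined by a path whose labels derive from `start`.

    edges:          iterable of (source, label, target)
    terminal_rules: label -> nonterminals N with a rule N -> label
    binary_rules:   (X, Y) -> nonterminals N with a rule N -> X Y
    """
    facts = set()       # (nonterminal, u, v): some u..v path derives it
    worklist = deque()

    def add(fact):
        if fact not in facts:
            facts.add(fact)
            worklist.append(fact)

    # Seed with single edges matched by terminal rules.
    for u, label, v in edges:
        for head in terminal_rules.get(label, ()):
            add((head, u, v))

    # Combine adjacent facts until no new fact appears.
    while worklist:
        sym, u, v = worklist.popleft()
        for other, x, y in list(facts):
            if x == v:  # sym-path followed by other-path
                for head in binary_rules.get((sym, other), ()):
                    add((head, u, y))
            if y == u:  # other-path followed by sym-path
                for head in binary_rules.get((other, sym), ()):
                    add((head, x, v))

    return {(u, v) for sym, u, v in facts if sym == start}


# Grammar for a^n b^n (n >= 1): S -> A B | A S1, S1 -> S B, A -> a, B -> b
terminal_rules = {"a": ["A"], "b": ["B"]}
binary_rules = {("A", "B"): ["S"], ("A", "S1"): ["S"], ("S", "B"): ["S1"]}

# Chain graph: 0 -a-> 1 -a-> 2 -b-> 3 -b-> 4
edges = [(0, "a", 1), (1, "a", 2), (2, "b", 3), (3, "b", 4)]

pairs = cfl_reachability(edges, terminal_rules, binary_rules, "S")
```

On this chain, `pairs` contains `(1, 3)` (labels `a b`) and `(0, 4)` (labels `a a b b`). The closure rescans every known fact for each popped item, so it is only suitable for small graphs; avoiding that cost is precisely where specialised algorithms such as the one in the abstract matter.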

    Multi-threaded execution of Cypher queries

    In this report we investigate parallel execution of queries in graph databases. We analyse different methods of parallelization, how to introduce query parallelization to a graph database, which query operations are suitable for parallelization, and whether we can improve the execution time of a single query. We do this by designing and implementing a parallel runtime for the Cypher query language in the graph database Neo4j, but many of the design ideas and operators investigated are applicable to any graph database. We focus on increasing performance for a select few operators, while still being fully integrated with Neo4j. We take much inspiration from a design called morsel-driven parallelism. This means that we strive to split the workload into many small pieces, “morsels”, and then hand these morsels to the threads executing the query. This is in contrast to a more classical parallelization approach, where the workload is split into a few big parts of equal size. We conclude that the operators best suited for parallelization are those that can be split into several smaller parts, where each part can be computed independently. We successfully introduce parallel execution of Cypher queries to Neo4j and by doing so we increase the performance of a single query by up to 15 times under certain conditions.
    Graph databases are becoming increasingly common, while the number of processors in modern computers keeps growing. In this work we look at how parallelized querying can lead to performance gains in the popular graph database management system Neo4j. To find out whether a single query in a graph database can be parallelized, and how much this affects response times, we created our own modified version of Neo4j. We began by determining which parts of the software were best suited to parallelization, taking into account how often they occurred in queries and how heavily they loaded the processor. After selecting a number of these, we went on to develop methods for splitting them into smaller tasks that could run on different parts of the processor simultaneously, and finally introduced these changes into Neo4j. The result is a version of Neo4j that, under the right conditions, answers individual queries up to 15 times faster.
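
The morsel idea described in this abstract, many small pieces handed to worker threads rather than a few large equal parts, can be sketched as follows. This is not the authors' Neo4j runtime; the names `split_into_morsels`, `filter_morsel`, and `parallel_filter`, as well as the morsel size and worker count, are hypothetical, and the filter stands in for any operator whose parts can be computed independently.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_morsels(rows, morsel_size):
    """Cut the input into many small, independent chunks ("morsels")."""
    return [rows[i:i + morsel_size] for i in range(0, len(rows), morsel_size)]


def filter_morsel(morsel, predicate):
    """Operator applied to one morsel; it needs no data from other morsels."""
    return [row for row in morsel if predicate(row)]


def parallel_filter(rows, predicate, morsel_size=4, workers=3):
    """Hand morsels to a pool of worker threads and merge results in order."""
    morsels = split_into_morsels(rows, morsel_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Idle threads pick up the next pending morsel; map() keeps input order.
        parts = pool.map(lambda m: filter_morsel(m, predicate), morsels)
        return [row for part in parts for row in part]


ages = [17, 42, 8, 65, 30, 12, 51, 19, 3, 77]
adults = parallel_filter(ages, lambda age: age >= 18)
```

Because there are more morsels than threads, a thread that finishes early simply takes another morsel, which balances uneven work better than a fixed split into a few large parts. In CPython the global interpreter lock prevents a real speed-up for pure-Python work like this filter, so the sketch shows only the scheduling pattern, not the performance gain reported in the abstract.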

    EpiGeNet: A graph database of interdependencies between genetic and epigenetic events in colorectal cancer

    The development of colorectal cancer (CRC)—the third most common cancer type—has been associated with deregulations of cellular mechanisms stimulated by both genetic and epigenetic events. StatEpigen is a manually curated and annotated database, containing information on interdependencies between genetic and epigenetic signals, and specialized currently for CRC research. Although StatEpigen provides a well-developed graphical user interface for information retrieval, advanced queries involving associations between multiple concepts can benefit from more detailed graph representation of the integrated data. This can be achieved by using a graph database (NoSQL) approach. Data were extracted from StatEpigen and imported to our newly developed EpiGeNet, a graph database for storage and querying of conditional relationships between molecular (genetic and epigenetic) events observed at different stages of colorectal oncogenesis. We illustrate the enhanced capability of EpiGeNet for exploration of different queries related to colorectal tumor progression; specifically, we demonstrate the query process for (i) stage-specific molecular events, (ii) most frequently observed genetic and epigenetic interdependencies in colon adenoma, and (iii) paths connecting key genes reported in CRC and associated events. The EpiGeNet framework offers improved capability for management and visualization of data on molecular events specific to CRC initiation and progression