18 research outputs found

    NoXperanto: Crowdsourced Polyglot Persistence

    This paper proposes NoXperanto, a novel crowdsourcing approach to querying data collections managed in polyglot persistence settings. The main contribution of NoXperanto is its ability to answer complex queries spanning different data stores by exploiting queries from expert users (i.e., a crowd of database administrators, data engineers, domain experts, etc.), on the assumption that these users submit meaningful queries. NoXperanto uses the results of those queries to facilitate subsequent query answering. In particular, query results are used to (i) help non-expert users work with the multi-database environment and (ii) improve the performance of the multi-database environment, which not only uses disk and memory resources but also relies heavily on network bandwidth. NoXperanto employs a layer that keeps track of the information produced by the crowd, modeled as a Property Graph and managed in a Graph Database Management System (GDBMS).

    Evaluating The Semantic Mapping

    As links become increasingly important in networked data, they deserve more attention when mapping the relational model to a graph model. Semantic abstraction gaps often arise during the mapping process when a link in the real world is mapped to a node in the graph model. This paper focuses on evaluating the result of mapping and converting without losing semantics. We propose evaluating our approach using schema.org as the semantic standard. Experiments on three data sets show that the semantic mapping approach is effective: the average matching score is 0.6922 without considering the gap index and 0.5264 with the gap index considered, and the average precision score is 0.7042.

    Graph4Med: a web application and a graph database for visualizing and analyzing medical databases

    Background: Medical databases normally contain large amounts of data in a variety of forms. Although they offer significant insights into diagnosis and treatment, data exploration is hard to implement on current medical databases, since these are often based on a relational schema from which information for cohort analysis and visualization cannot easily be extracted. As a consequence, valuable information regarding cohort distribution or patient similarity may be missed. With the rapid advancement of biomedical technologies, new forms of data from methods such as Next Generation Sequencing (NGS) or chromosome microarray (array CGH) are constantly being generated; hence the amount and complexity of medical data can be expected to rise and push relational database systems to their limits. Description: We present Graph4Med, a web application that relies on a graph database obtained by transforming a relational database. Graph4Med provides straightforward visualization and analysis of a selected patient cohort. Our use case is a database of pediatric Acute Lymphoblastic Leukemia (ALL). Alongside routine patient health records, it also contains results from recent technologies such as NGS. We developed a suitable graph data schema to convert the relational data into a graph data structure and store it in Neo4j. We used NeoDash to build a dashboard for querying and displaying patient cohort analyses. In this way our tool (1) quickly displays an overview of patient cohort information such as distributions of gender, age, mutations (fusions) and diagnosis; (2) provides mutation (fusion) based similarity search and displays the results in a maneuverable graph; (3) generates an interactive graph of any selected patient and facilitates the identification of interesting patterns among patients. Conclusion: We demonstrate the feasibility and advantages of a graph database for storing and querying medical databases. Our dashboard allows fast and interactive analysis and visualization of complex medical data. It is especially useful for patient similarity search based on mutations (fusions), for which vast amounts of data have been generated by NGS in recent years. It can discover relationships and patterns in patient cohorts that are normally hard to grasp. Expanding Graph4Med to more medical databases will bring novel insights into diagnostics and research.

    Visual Exploration System for Analyzing Trends in Annual Recruitment Using Time-varying Graphs

    Annual recruitment data on new graduates are analyzed manually by human resources (HR) specialists in industry, which signals a need to evaluate the recruitment strategy of HR specialists. Every year, different applicants send job applications to companies. The relationships between applicants' attributes (e.g., English skill or academic credential) can be used to analyze changes in recruitment trends across multiple years of data. However, most attributes are unnormalized and thus require thorough preprocessing. Such unnormalized data hinder effective comparison of the relationships between applicants in the early stage of data analysis. A visual exploration system is therefore needed to gain insight from an overview of the relationships between applicants across multiple years. In this study, we propose the Polarizing Attributes for Network Analysis of Correlation on Entities Association (Panacea) visualization system. The proposed system integrates a time-varying graph model and dynamic graph visualization for heterogeneous tabular data. Using this system, HR specialists can interactively inspect the relationships between two attributes of prospective employees across multiple years. Further, we demonstrate the usability of Panacea with representative examples of finding hidden trends in real-world datasets and then describe the feedback obtained from HR specialists throughout Panacea's development. The proposed Panacea system enables HR specialists to visually explore the annual recruitment of new graduates.

    Modelagem em grafos a partir de bancos de dados relacionais [Graph modeling from relational databases]

    The importance of bringing data from the relational model into other models and technologies has been widely discussed, for example publishing data as graphs. The graph model makes it possible to run topological analyses, as in social network analysis, link prediction and recommender systems. There are initiatives for mapping a relational database to a graph representation. However, they do not consider the different ways of generating these graphs, especially when the goal is to perform topological analyses. This work proposes heuristics to help systematize the mapping of data from the relational model to a graph representation. The main contribution is that the choice of graph model should take into account the type of topological analysis the user will perform. Experiments are presented and show interesting results, including heuristics to support the user in choosing the graph modeling. Sociedad Argentina de Informática e Investigación Operativa (SADIO)

    Konverzija relacijskih u grafovske baze podataka orijentirana na svojstva [Property-oriented conversion of relational databases to graph databases]

    Analysis of data stored in a graph enables the discovery of information that can be hard to see if the data are stored using some other model (e.g., relational). However, the vast majority of data in today's information systems is stored in relational databases, which have dominated the data management field over recent decades. In spite of the rise of NoSQL technologies, the development of new information systems is still mostly based on relational databases. Given the increasing awareness of the benefits of data analysis, as well as current research interest in graph mining techniques, we aim to enable the use of those techniques on relational data. To that end, we propose a universal relational-to-graph data conversion algorithm that can be used to prepare data for graph mining analysis. Our approach leverages the property graph model, which is widely used by current graph databases, while maintaining the level of clarity of the relational data.
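    To make the idea of relational-to-property-graph conversion concrete, here is a minimal sketch of one common mapping: each row becomes a node labeled with its table name, non-key columns become node properties, and each foreign key value becomes an edge. This is not the algorithm proposed in the paper above; the names `PropertyGraph`, `rows_to_graph`, the `id` primary-key column and the sample tables are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    # node id -> (label, properties); a node id is the pair (table, primary key)
    nodes: dict = field(default_factory=dict)
    # list of (source node id, edge type, target node id)
    edges: list = field(default_factory=list)


def rows_to_graph(tables, foreign_keys):
    """Convert rows to a property graph (illustrative mapping only).

    tables: {table name: [row dicts, each with an 'id' primary key]}
    foreign_keys: [(table, column, referenced table)]
    """
    graph = PropertyGraph()
    fk_columns = {(table, column) for table, column, _ in foreign_keys}

    # Every row becomes a node; foreign-key columns are left out of the
    # properties because they are represented as edges instead.
    for table, rows in tables.items():
        for row in rows:
            props = {k: v for k, v in row.items() if (table, k) not in fk_columns}
            graph.nodes[(table, row["id"])] = (table, props)

    # Every non-null foreign-key value becomes a directed edge.
    for table, column, ref_table in foreign_keys:
        for row in tables[table]:
            if row.get(column) is not None:
                graph.edges.append(
                    ((table, row["id"]), column, (ref_table, row[column]))
                )
    return graph


# Hypothetical sample data: one author who wrote one paper.
example_tables = {
    "author": [{"id": 1, "name": "Ada"}],
    "paper": [{"id": 10, "title": "Graphs", "author_id": 1}],
}
example_graph = rows_to_graph(example_tables, [("paper", "author_id", "author")])
```

    A real converter would also have to handle composite keys and pure join tables (which are often better mapped to edges than to nodes); the sketch leaves those cases out.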

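    Rancang Bangun Aplikasi Visualisasi Database SQL Server dengan Dynamic Management View Berbasis Graph NEO4J untuk Memetakan Relasi Implisit pada Database [Design and Development of a Neo4j Graph-Based SQL Server Database Visualization Application Using Dynamic Management Views to Map Implicit Relationships in the Database]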

    The availability of information technology (IT) documentation is key to successful IT management in companies. Proper documentation makes maintenance and learning easier and supports knowledge sharing within an organization. In practice, however, IT documentation is often neglected. Over the last five years, Microsoft SQL Server has remained one of the most popular database platforms. When companies neglect to document SQL Server, understanding of the database system becomes tacit knowledge that appears to be reserved for senior staff. This makes it harder for newcomers to learn the existing database system, so learning takes longer. The absence of SQL documentation can in principle be addressed with database visualization tools such as DBVisualizer. Unfortunately, despite its full set of features and support, DBVisualizer cannot trace implicit relationships. Relationships between SQL tables are often designed without foreign keys, so programs such as DBVisualizer have difficulty tracing them. The author therefore offers a solution that visualizes the SQL database as a visual graph by using one of the features of SQL Server, the Dynamic Management View. In addition, a graph DBMS is used as the basis of the visualization, so that the information presented is complete and accurate without losing simplicity and ease of use.

    Systems for Graph Extraction from Tabular Data

    Connections amongst real-world entities provide significant insights for numerous real-life applications in social networks, the semantic web, road maps and finance, among others. Graphs are perhaps the most natural way to model such connections in application data. However, in many enterprises, application data is still primarily stored in an RDBMS in a tabular format, and users extract graphs out of the RDBMS and store them in specialized graph processing systems. As a result, many users face two major challenges before conducting any graph analysis. First, extracting graphs from an RDBMS requires building an ETL pipeline, which can take a significant amount of time. Second, keeping the extracted graph in the graph processing system, such as a graph database management system (GDBMS), in sync with the original data in the RDBMS requires developing additional non-trivial synchronization code. In this thesis, we study and address these two challenges and present two software systems, GraphWrangler and R2GSync, that we have developed to solve them. GraphWrangler is an interactive system that streamlines the ETL pipeline. Users connect to an RDBMS using GraphWrangler and, with several simple interactions such as dragging and dropping rows and columns and drawing edges on the screen, describe table-to-graph mappings. In this way, users can describe the graphs they would like to extract without writing any custom scripts. In addition, GraphWrangler allows users to immediately visualize their tables in the form of a graph. Our second system, R2GSync, uses the mappings of an extracted graph and maintains a consistent (i.e., in-sync) copy of this graph in a GDBMS as updates happen to the original RDBMS from which the graph was extracted. Querying the extracted graph inside the GDBMS requires new querying functionality inside the GDBMS that we call edge views. We describe our implementation of edge views and several optimizations that make queries containing edge views more efficient.

    Mapeamento, conversão e migração automática de bancos de dados relacionais para orientados a grafos [Automatic mapping, conversion and migration of relational databases to graph-oriented databases]

    Relational databases are the most widely used model in many applications because of the ease of use of their query language and their suitability for multi-user environments. With the large volume of information available today, and with that information increasingly interrelated, graph-oriented databases have emerged as a way to deal with this new demand, given the difficulties the relational model faces in this new scenario. In view of this, this research addresses the processes of mapping, conversion and migration from the relational model to the graph-oriented model, dealing above all with the semantic overload of constructors between the two models. The purpose of this study was the development of an application, called ThrusterDB, that performs this conversion from the relational model to the graph-oriented one automatically. The research contributes by integrating the mapping, conversion and automatic migration phases from a relational database to a graph-oriented one. This dissertation presents results showing that the database generated by the process provides better performance in the average time of the queries carried out, while preserving the semantics of the source relational database without any loss or redundancy of data.

    Graph Data Processing and Analysis: From Algorithms to System Development

    There are many real-world application domains where data can be naturally modelled as graphs, such as social networks and computer networks. The amount of data generated and published is rapidly increasing with the explosion of information. Effective storage and querying of graph data has become a significant challenge, and graph databases are emerging to address it. When handling graph data, graph databases have unique advantages in modelling, capturing and navigating complex relationships and in recursive path querying. In this thesis, we enhance graph databases from both system and algorithm perspectives. Firstly, we propose two systems, SQL2Cypher and FSPS, to improve the usability and efficiency of graph databases. SQL2Cypher automatically migrates data from a relational database to a graph database. This system also supports translating SQL queries into Cypher queries. FSPS is the first FPGA-based system for accelerating graph queries on massive graphs. FSPS has the following features: 1) a CPU-FPGA co-designed framework, 2) a fully pipelined FPGA execution, and 3) reduced data transfer from the FPGA's external memory. FSPS supports the two most fundamental types of graph queries, namely subgraph and path queries. Performance evaluation shows that FSPS outperforms the most popular graph database, Neo4j, by up to three orders of magnitude. All the draft demo videos can be found at https://www.youtube.com/watch?v=oSpHtJ8iVio and https://www.youtube.com/watch?v=eGaeBrVTJws. Secondly, graph database systems (e.g., Neo4j and PatMat) do not widely support cohesive subgraph models. Many real-world relationships can be naturally represented as bipartite graphs, such as customer-product, user-item, and author-paper. Therefore, we investigate the bipartite hierarchy model and efficient algorithms for constructing it. The bipartite hierarchy is the first model to discover the hierarchical structure of bipartite graphs based on the concept of the (alpha, beta)-core and graph connectivity. These algorithms can effectively identify the affected regions to limit the computation scope and avoid rebuilding the bipartite hierarchy from scratch. Extensive experiments on 10 real-world graphs demonstrate the effectiveness of the proposed bipartite hierarchy and validate the efficiency of our hierarchy construction algorithms.
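    The (alpha, beta)-core mentioned in the abstract above can be illustrated with a short sketch. In the usual definition, the (alpha, beta)-core of a bipartite graph is the largest subgraph in which every upper-layer vertex has at least alpha neighbors and every lower-layer vertex has at least beta neighbors. The code below computes it by repeatedly peeling away vertices that violate their threshold. It is a plain illustration of the concept, not the hierarchy construction algorithm of the thesis; the function name `alpha_beta_core` and the customer-product sample data are assumptions.

```python
def alpha_beta_core(edges, alpha, beta):
    """Return (upper, lower) vertex sets of the (alpha, beta)-core.

    edges: iterable of (upper vertex, lower vertex) pairs.
    """
    upper_adj, lower_adj = {}, {}
    for u, v in edges:
        upper_adj.setdefault(u, set()).add(v)
        lower_adj.setdefault(v, set()).add(u)

    changed = True
    while changed:
        changed = False
        # Peel upper vertices whose remaining degree is below alpha.
        for u in [u for u, nbrs in upper_adj.items() if len(nbrs) < alpha]:
            for v in upper_adj.pop(u):
                lower_adj[v].discard(u)
            changed = True
        # Peel lower vertices whose remaining degree is below beta.
        for v in [v for v, nbrs in lower_adj.items() if len(nbrs) < beta]:
            for u in lower_adj.pop(v):
                upper_adj[u].discard(v)
            changed = True
    return set(upper_adj), set(lower_adj)


# Hypothetical customer-product graph: c3 bought only one product.
purchases = [
    ("c1", "p1"), ("c1", "p2"),
    ("c2", "p1"), ("c2", "p2"),
    ("c3", "p1"),
]
core_2_2 = alpha_beta_core(purchases, alpha=2, beta=2)
core_2_3 = alpha_beta_core(purchases, alpha=2, beta=3)
```

    With these sample edges, the (2, 2)-core keeps customers c1 and c2 and both products, while the (2, 3)-core is empty, because removing c3 leaves no product with three buyers.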