    Graph databases and their application to the Italian Business Register for efficient search of relationships among companies

    We studied and tested three of the major graph databases and compared them with a relational database. Working on a dataset representing equity participations among companies, we found that the strong points of graph databases are their purpose-designed storage techniques and their query languages. The largest performance gains were obtained when querying graph-heavy situations; for simpler situations and queries, a relational database performs equally well.
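    The multi-hop relationship searches described above can be illustrated with a breadth-first traversal over equity participations. This is a minimal in-memory sketch, assuming a hypothetical adjacency map `holdings` (company → companies in which it holds shares) and a hypothetical helper `reachable_companies`; it is not the authors' implementation and does not use any graph database's query language.

```python
from collections import deque


def reachable_companies(holdings: dict[str, list[str]], start: str,
                        max_depth: int) -> dict[str, int]:
    """Return every company reachable from `start` through chains of
    equity participations, mapped to the minimum number of hops.

    `holdings` is a hypothetical adjacency map: holder -> companies held.
    """
    depth = {start: 0}          # also acts as the visited set (handles cycles)
    queue = deque([start])
    while queue:
        company = queue.popleft()
        if depth[company] == max_depth:
            continue            # do not expand beyond the hop limit
        for held in holdings.get(company, []):
            if held not in depth:
                depth[held] = depth[company] + 1
                queue.append(held)
    del depth[start]            # the starting company is not its own result
    return depth
```

    In SQL the same search is typically written as a recursive common table expression (`WITH RECURSIVE`), joining the participation table once per hop.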

    A design space for RDF data representations

    The ability of RDF triplestores to store and query knowledge bases augmented with semantic annotations has attracted the attention of both research and industry. A multitude of systems offer varying data representation and indexing schemes. However, as recently shown for the design of data structures, many design choices are biased by outdated considerations and may not yield the most efficient data representation for a given query workload. To overcome this limitation, we identify a novel three-dimensional design space. Within this design space, we map the trade-offs between the different RDF data representations employed in RDF triplestores and identify unexplored solutions. We complement the review with an empirical evaluation of ten standard SPARQL benchmarks to examine the prevalence of these access patterns in synthetic and real query workloads. We find some access patterns to be both prevalent in the workloads and under-supported by existing triplestores. This shows that our model can be used by RDF store designers to reason about different design choices, and that it can allow a (possibly artificially intelligent) designer to evaluate the fit between a given system design and a query workload.
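    One familiar way to serve different triple-pattern access patterns is to keep several permutation indexes (SPO, POS, OSP) and choose one according to which positions of the pattern are bound. The toy class below, `TripleIndex` (a hypothetical name), sketches that single point in the design space under this assumption; it is not the paper's model nor the layout of any particular triplestore.

```python
from collections import defaultdict


class TripleIndex:
    """Toy in-memory triplestore keeping three permutation indexes."""

    def __init__(self):
        self.spo = defaultdict(lambda: defaultdict(set))  # s -> p -> {o}
        self.pos = defaultdict(lambda: defaultdict(set))  # p -> o -> {s}
        self.osp = defaultdict(lambda: defaultdict(set))  # o -> s -> {p}

    def add(self, s, p, o):
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def match(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None marks an unbound position."""
        if s is not None and (p is not None or o is None):
            # SPO serves (s,?,?), (s,p,?) and (s,p,o)
            by_p = self.spo.get(s, {})
            preds = [p] if p is not None else list(by_p)
            for pp in preds:
                for oo in by_p.get(pp, ()):
                    if o is None or oo == o:
                        yield (s, pp, oo)
        elif p is not None:
            # POS serves (?,p,?) and (?,p,o)
            by_o = self.pos.get(p, {})
            objs = [o] if o is not None else list(by_o)
            for oo in objs:
                for ss in by_o.get(oo, ()):
                    yield (ss, p, oo)
        elif o is not None:
            # OSP serves (?,?,o) and (s,?,o)
            by_s = self.osp.get(o, {})
            subs = [s] if s is not None else list(by_s)
            for ss in subs:
                for pp in by_s.get(ss, ()):
                    yield (ss, pp, o)
        else:
            # Fully unbound pattern: scan the whole SPO index
            for ss, by_p in self.spo.items():
                for pp, objs in by_p.items():
                    for oo in objs:
                        yield (ss, pp, oo)
```

    Each extra permutation speeds up another access pattern at the cost of storing every triple again, which is the kind of trade-off a workload-aware design would weigh.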

    Data management in cloud environments: NoSQL and NewSQL data stores

    Advances in Web technology and the proliferation of mobile devices and sensors connected to the Internet have resulted in immense processing and storage requirements. Cloud computing has emerged as a paradigm that promises to meet these requirements. This work focuses on the storage aspect of cloud computing, specifically on data management in cloud environments. Traditional relational databases were designed in a different hardware and software era and face challenges in meeting the performance and scale requirements of Big Data. NoSQL and NewSQL data stores present themselves as alternatives that can handle huge volumes of data. Because of the large number and diversity of existing NoSQL and NewSQL solutions, it is difficult to comprehend the domain and even more challenging to choose an appropriate solution for a specific task. Therefore, this paper reviews NoSQL and NewSQL solutions with the objectives of: (1) providing a perspective on the field, (2) guiding practitioners and researchers in choosing the appropriate data store, and (3) identifying challenges and opportunities in the field. Specifically, the most prominent solutions are compared with a focus on data models, querying, scaling, and security-related capabilities. The features that drive the ability to scale read requests, write requests, or data storage are investigated, in particular partitioning, replication, consistency, and concurrency control. Furthermore, use cases and scenarios in which NoSQL and NewSQL data stores have been used are discussed, and the suitability of various solutions for different sets of applications is examined. Finally, this study identifies challenges in the field, including the immense diversity and inconsistency of terminology, limited documentation, sparse comparison and benchmarking criteria, and the absence of standardized query languages.
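    Partitioning and replication, two of the scaling features the review investigates, are often combined through consistent hashing, as popularized by Amazon's Dynamo. The sketch below is a minimal illustration under that assumption: the names `HashRing` and `replicas` are hypothetical, and MD5 is used only as a convenient stable hash. Each key is assigned to `n` distinct nodes found by walking clockwise around a ring of virtual nodes.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash (MD5 used only for its even spread, not security)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hashing ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, vnodes=16):
        # Each physical node appears `vnodes` times to even out the load
        self._ring = sorted((_hash(f"{n}#{i}"), n)
                            for n in nodes for i in range(vnodes))
        self._keys = [h for h, _ in self._ring]

    def replicas(self, key: str, n: int) -> list[str]:
        """Return up to n distinct nodes clockwise from the key's position."""
        start = bisect.bisect(self._keys, _hash(key)) % len(self._ring)
        result: list[str] = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in result:
                result.append(node)
                if len(result) == n:
                    break
        return result
```

    Adding or removing a node only moves the keys adjacent to its virtual nodes on the ring, which suits elastic scaling; consistency and concurrency control are separate concerns not shown here.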

    An exploratory study of a NoSQL database for a clinical data repository

    The need to implement a distributed Clinical Data Repository (CDR) at a healthcare facility arose largely from the high volume of data and the disparity of its sources. Over the years, Relational Database Management Systems (RDBMS) began to have difficulty meeting the needs of various organizations with respect to manipulating large amounts of data and scaling. It was therefore necessary to explore other techniques in order to choose the appropriate technology for building the CDR. NoSQL emerged as a new type of database that is well suited to working with multiple and different types of data. In addition, NoSQL offers a number of useful features, such as distribution, scalability, elasticity, and fault tolerance. Oracle NoSQL Database, using its key-value storage, was the NoSQL solution chosen for this case study. This article proposes a CDR architecture based on the functionalities of Oracle NoSQL Database. A single-node database was deployed to better understand its features in preparation for a future implementation. The work has been supported by FCT – Fundação para a Ciência e Tecnologia within the Project Scope UID/CEC/00319/2019 and DSAIPA/DS/0084/2018.
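    To make the key-value approach concrete, the sketch below models hierarchical keys such as `("patient", <id>, <record type>, <date>)` in a plain Python dict and retrieves all records of one patient with a prefix scan. The class `ToyKVStore`, its methods, and the key layout are hypothetical illustrations; they neither use nor reproduce Oracle NoSQL Database's API or the article's architecture.

```python
from typing import Any, Optional


class ToyKVStore:
    """Hypothetical in-memory key-value store with tuple (path-like) keys."""

    def __init__(self) -> None:
        self._data: dict[tuple[str, ...], Any] = {}

    def put(self, key: tuple[str, ...], value: Any) -> None:
        self._data[tuple(key)] = value

    def get(self, key: tuple[str, ...]) -> Optional[Any]:
        return self._data.get(tuple(key))

    def scan_prefix(self, prefix: tuple[str, ...]) -> list[tuple[tuple[str, ...], Any]]:
        """Return all (key, value) pairs whose key starts with `prefix`,
        ordered by key (e.g. every record stored under one patient)."""
        prefix = tuple(prefix)
        hits = [(k, v) for k, v in self._data.items() if k[:len(prefix)] == prefix]
        return sorted(hits, key=lambda kv: kv[0])
```

    Grouping a patient's records under a shared key prefix keeps them retrievable in a single scan; a real distributed store would also use such a prefix to decide which partition holds the records.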

    An automated materials and processes identification tool for material informatics using deep learning approach

    This article reports MatRec, a tool that enables materials informatics via a deep learning approach. The tool captures data, makes appropriate domain suggestions, extracts various entities such as materials and processes, and helps establish entity-value relationships. It uses keyword extraction, a document similarity index to suggest relevant documents, and a deep learning approach employing a Bi-LSTM for entity extraction. As an example, materials and processes for electrical charge storage under an electric double-layer capacitor (EDLC) mechanism are demonstrated. A knowledge graph approach finds and visualizes different latent knowledge sets in the processed information. MatRec achieved F1 scores of ~96% for entity extraction, ~83% for material-value relationship extraction, and ~87% for process-value relationship extraction. The proposed MatRec could be extended to solve material selection problems for various applications and could be a valuable tool for academia and industry.
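    The F1 scores quoted above are the harmonic mean of precision and recall. The sketch below computes an entity-level F1 under the assumption of exact matching between predicted and gold `(text, label)` pairs; the abstract does not state MatRec's matching criterion, and the function name `entity_f1` and the example entities are hypothetical.

```python
def entity_f1(predicted: set[tuple[str, str]], gold: set[tuple[str, str]]) -> float:
    """Exact-match entity-level F1 = 2PR / (P + R)."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0  # also avoids division by zero when either set is empty
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

    For example, with three predictions of which two are correct, against four gold entities, precision is 2/3, recall is 1/2, and F1 is 4/7 (about 0.57).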