A parametric prototype for spatiotemporal databases
The main goal of this project is to design and implement the parametric database (ParaDB). Conceptually, ParaDB consists of the parametric data model (ParaDM) and the parametric structured query language (ParaSQL). ParaDM is a data model for multi-dimensional databases such as temporal, spatial, spatiotemporal, or multi-level secure databases. The main difference from the classical relational data model is that ParaDM models an object as a single tuple, with each attribute defined as a function over parametric elements. The set of parametric elements is closed under union, intersection, and complementation; these operations are the counterparts of "or", "and", and "not" in a natural language such as English. The closure properties therefore provide a flexible way to query objects without the additional self-join operations frequently required in other multi-dimensional database models.
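As a rough illustration of the closure idea described above (the representation and names here are our own, not ParaDB's actual design), parametric elements over a finite discrete time domain can be modeled as plain sets, whose closure under union, intersection, and complement lets a query combine conditions without self-joins:

```python
# Sketch: parametric elements over a discrete time domain (illustrative only;
# ParaDB's actual internal representation may differ).
DOMAIN = frozenset(range(10))  # assumed finite time domain 0..9

def union(p, q):        # counterpart of "or"
    return p | q

def intersect(p, q):    # counterpart of "and"
    return p & q

def complement(p):      # counterpart of "not", relative to the domain
    return DOMAIN - p

# An object is a single tuple; each attribute maps parametric elements to
# values, e.g. a salary of 50k during times 0..4 and 60k during times 5..9.
salary = {frozenset(range(5)): 50_000, frozenset(range(5, 10)): 60_000}

# "When was the salary 50k or 60k?" is answered by set union over the
# parametric elements -- no self-join over a timestamped table is needed.
when_either = union(*salary.keys())
print(sorted(when_either))  # the full domain, 0 through 9
```

Because the result of every such operation is again a parametric element, compound conditions compose freely, which is the flexibility the closure properties buy.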
Enhanced Encryption and Fine-Grained Authorization for Database Systems
The aim of this research is to enhance fine-grained authorization and encryption
so that database systems are equipped with the controls necessary to help
enterprises adhere to zero-trust security more effectively. For fine-grained
authorization, this thesis has extended database systems with three new
concepts: Row permissions, column masks and trusted contexts. Row
permissions and column masks provide data-centric security so the security
policy cannot be bypassed as it can be with database views, for example. They also
coexist in harmony with the rest of the database core tenets so that enterprises
are not forced to compromise either security or database functionality. Trusted
contexts provide applications in multitiered environments with a secure and
controlled manner to propagate user identities to the database and therefore
enable such applications to delegate the security policy to the database system
where it is enforced more effectively. Trusted contexts also protect against
application bypass so the application credentials cannot be abused to make
database changes outside the scope of the application's business logic. For
encryption, this thesis has introduced a holistic database encryption solution to
address the limitations of traditional database encryption methods. It too coexists
in harmony with the rest of the database core tenets so that enterprises are not
forced to choose between security and performance as with column encryption,
for example. Lastly, row permissions, column masks, trusted contexts and holistic
database encryption have all been implemented in IBM DB2, where they are relied
upon by thousands of organizations from around the world to protect critical data
and adhere to zero-trust security more effectively.
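To make the column-mask idea concrete (in DB2 these controls are defined in SQL; the function names and masking format below are hypothetical, purely for illustration), the key point is that the mask is applied centrally on every read, so no query path can route around it the way a privileged user can route around a view:

```python
# Hypothetical sketch of data-centric column masking (not DB2's actual API).

def mask_ssn(value: str, role: str) -> str:
    """Return the full SSN only to the PAYROLL role; others see a masked form."""
    if role == "PAYROLL":
        return value
    return "XXX-XX-" + value[-4:]

def query_employees(rows, role):
    # Every read passes through the mask -- data-centric security, unlike a
    # view, which a sufficiently privileged user could bypass by querying
    # the base table directly.
    return [(name, mask_ssn(ssn, role)) for name, ssn in rows]

rows = [("Ada", "123-45-6789"), ("Grace", "987-65-4321")]
print(query_employees(rows, "HR"))       # SSNs masked for non-payroll roles
print(query_employees(rows, "PAYROLL"))  # full SSNs for the authorized role
```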
On the Evaluation of RDF Distribution Algorithms Implemented over Apache Spark
Querying very large RDF data sets in an efficient manner requires a
sophisticated distribution strategy. Several innovative solutions have recently
been proposed for optimizing data distribution with predefined query workloads.
This paper presents an in-depth analysis and experimental comparison of five
representative and complementary distribution approaches. To ensure fair
experimental results, we use Apache Spark as a common parallel computing
framework, rewriting the algorithms concerned with the Spark API. Spark
provides guarantees in terms of fault tolerance, high availability and
scalability which are essential in such systems. Our different implementations
aim to highlight the fundamental implementation-independent characteristics of
each approach in terms of data preparation, load balancing, data replication,
and, to some extent, query answering cost and performance. The presented
measures are obtained by testing each system on one synthetic and one
real-world data set over query workloads with differing characteristics and
different partitioning constraints.
Comment: 16 pages, 3 figures
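One common baseline among the distribution strategies such comparisons cover is subject-based hash partitioning, which co-locates all triples sharing a subject so that subject-star queries need no cross-worker join. A minimal sketch (the details here are our own, not taken from any of the compared algorithms):

```python
# Sketch: subject-hash partitioning of RDF triples (illustrative baseline).
from collections import defaultdict
import zlib

def partition(triples, n_workers):
    """Assign each (s, p, o) triple to a worker by a stable hash of its subject."""
    parts = defaultdict(list)
    for s, p, o in triples:
        worker = zlib.crc32(s.encode()) % n_workers  # stable across runs
        parts[worker].append((s, p, o))
    return parts

triples = [
    (":alice", ":knows", ":bob"),
    (":alice", ":age",   "30"),
    (":bob",   ":knows", ":carol"),
]
parts = partition(triples, n_workers=4)
# Both :alice triples land on the same worker, so a star query on :alice
# is answered locally; the trade-off is skew when one subject dominates.
```

The same skeleton maps naturally onto Spark's `partitionBy` with a custom key function, which is one reason a common framework makes the approaches comparable.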
Analytical Queries: A Comprehensive Survey
Modern hardware heterogeneity brings efficiency and performance opportunities
for analytical query processing. In the presence of continuous data volume and
complexity growth, bridging the gap between recent hardware advancements and
the data processing tools ecosystem is paramount for improving the speed of ETL
and model development. In this paper, we present a comprehensive overview of
existing analytical query processing approaches as well as the use and design
of systems that use heterogeneous hardware for the task. We then analyze
state-of-the-art solutions and identify missing pieces. The last two chapters
discuss the identified problems and present our view on how the ecosystem
should evolve.
A Survey of Distributed Data Stream Processing Frameworks
Big data processing systems are evolving to be more stream oriented, where each data record is processed as it arrives by distributed and low-latency computational frameworks on a continuous basis. As stream processing technology matures and more organizations invest in digital transformations, new applications of stream analytics will be identified and implemented across a wide spectrum of industries. One of the challenges in developing a streaming analytics infrastructure is the difficulty of selecting the right stream processing framework for different use cases. With a view to addressing this issue, in this paper we present a taxonomy, a comparative study of distributed data stream processing and analytics frameworks, and a critical review of representative open source (Storm, Spark Streaming, Flink, Kafka Streams) and commercial (IBM Streams) distributed data stream processing frameworks. The paper also reports on our ongoing work toward a multilevel streaming analytics architecture that can serve as a guide for organizations and individuals planning to implement a real-time data stream processing and analytics framework.
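The record-at-a-time model shared by the frameworks surveyed above can be sketched in a few lines of plain Python (this illustrates the model only, not the API of Storm, Flink, or any other framework): each record is handled as it arrives, and state is emitted whenever a tumbling window closes.

```python
# Minimal sketch of record-at-a-time processing with tumbling windows
# (plain Python; not any framework's actual API).
from collections import Counter

def tumbling_counts(stream, window_size):
    """Process each (timestamp, key) record as it arrives; emit a Counter
    of key frequencies for every completed window of `window_size` units."""
    window, counts = None, Counter()
    for ts, key in stream:
        w = ts // window_size
        if window is not None and w != window:
            yield window, counts       # window closed: emit and reset state
            counts = Counter()
        window = w
        counts[key] += 1
    if window is not None:
        yield window, counts           # flush the final, still-open window

events = [(0, "a"), (1, "b"), (2, "a"), (5, "b"), (6, "b")]
results = list(tumbling_counts(events, window_size=5))
# -> [(0, Counter({'a': 2, 'b': 1})), (1, Counter({'b': 2}))]
```

Real frameworks add exactly what this sketch lacks: distribution of keys across workers, fault-tolerant state, and out-of-order (event-time) handling, which is where the selection criteria compared in the paper come in.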
An Educator's Perspective of the Tidyverse
Computing makes up a large and growing component of data science and statistics courses. Many of those courses, especially when taught by faculty who are statisticians by training, teach R as the programming language. A number of instructors have opted to build much of their teaching around use of the tidyverse. The tidyverse, in the words of its developers, "is a collection of R packages that share a high-level design philosophy and low-level grammar and data structures, so that learning one package makes it easier to learn the next" (Wickham et al. 2019). These shared principles have led to the widespread adoption of the tidyverse ecosystem. A large part of this usage is because the tidyverse tools have been intentionally designed to ease the learning process and make it easier for users to learn new functions as they engage with additional pieces of the larger ecosystem. Moreover, the functionality offered by the packages within the tidyverse spans the entire data science cycle, which includes data import, visualisation, wrangling, modeling, and communication. We believe the tidyverse provides an effective and efficient pathway for undergraduate students at all levels and majors to gain computational skills and thinking needed throughout the data science cycle. In this paper, we introduce the tidyverse from an educator's perspective. We provide a brief introduction to the tidyverse, demonstrate how foundational statistics and data science tasks are accomplished with the tidyverse, and discuss the strengths of the tidyverse, particularly in the context of teaching and learning.