21 research outputs found

    Polystore mathematics of relational algebra

    Financial transactions, internet search, and data analysis are all placing increasing demands on databases. SQL, NoSQL, and NewSQL databases have been developed to meet these demands and each offers unique benefits. SQL, NoSQL, and NewSQL databases also rely on different underlying mathematical models. Polystores seek to provide a mechanism to allow applications to transparently achieve the benefits of diverse databases while insulating applications from the details of these databases. Integrating the underlying mathematics of these diverse databases can be an important enabler for polystores as it enables effective reasoning across different databases. Associative arrays provide a common approach for the mathematics of polystores by encompassing the mathematics found in different databases: sets (SQL), graphs (NoSQL), and matrices (NewSQL). Prior work presented the SQL relational model in terms of associative arrays and identified key mathematical properties that are preserved within SQL. This work provides the rigorous mathematical definitions, lemmas, and theorems underlying these properties. Specifically, SQL Relational Algebra deals primarily with relations (multisets of tuples) and operations on and between those relations. These relations can be modeled as associative arrays by treating tuples as non-zero rows in an array. Operations in relational algebra are built as compositions of standard operations on associative arrays which mirror their matrix counterparts. These constructions provide insight into how relational algebra can be handled via array operations. As an example application, the composition of two projection operations is shown to also be a projection, and the projection of a union is shown to be equal to the union of the projections.
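The two properties named at the end of this abstract can be illustrated with a small Python sketch. This is not the paper's construction: it assumes a relation is stored as a sparse dictionary keyed by (row key, column name), that "union" means multiset union (SQL's UNION ALL), and that a row projected down to no columns simply disappears. All function names and the sample data are hypothetical.

```python
from collections import Counter

def to_array(rows):
    """Store a relation as a sparse associative array: (row_key, column) -> value."""
    return {(i, col): val for i, row in enumerate(rows) for col, val in row.items()}

def project(arr, cols):
    """Projection: keep only the entries whose column key is in cols."""
    return {(r, c): v for (r, c), v in arr.items() if c in cols}

def union(a, b):
    """Multiset union (assumed UNION ALL): tag row keys so rows never collide."""
    out = {(("a", r), c): v for (r, c), v in a.items()}
    out.update({(("b", r), c): v for (r, c), v in b.items()})
    return out

def as_multiset(arr):
    """Forget row keys and count how often each tuple of (column, value) pairs occurs."""
    rows = {}
    for (r, c), v in arr.items():
        rows.setdefault(r, []).append((c, v))
    return Counter(tuple(sorted(cells)) for cells in rows.values())

# Hypothetical sample relations.
r = to_array([
    {"name": "ann", "city": "Boston", "age": 30},
    {"name": "bob", "city": "Boston", "age": 41},
])
s = to_array([{"name": "cy", "city": "Boston", "age": 30}])
cols = {"city", "age"}

# Projection of a union equals the union of the projections.
lhs = project(union(r, s), cols)
rhs = union(project(r, cols), project(s, cols))
assert lhs == rhs

# Composing two projections is a projection onto the shared columns.
assert project(project(r, {"name", "city"}), {"city", "age"}) == project(r, {"city"})

# Duplicate tuples are kept, as expected for a multiset.
print(as_multiset(lhs))
```

Both checks hold here because projection only filters on the column key while union only relabels the row key, so the two operations do not interfere with each other.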

    Multi-Temporal Analysis and Scaling Relations of 100,000,000,000 Network Packets

    Our society has never been more dependent on computer networks. Effective utilization of networks requires a detailed understanding of the normal background behaviors of network traffic. Large-scale measurements of networks are computationally challenging. Building on prior work in interactive supercomputing and GraphBLAS hypersparse hierarchical traffic matrices, we have developed an efficient method for computing a wide variety of streaming network quantities on diverse time scales. Applying these methods to 100,000,000,000 anonymized source-destination pairs collected at a network gateway reveals many previously unobserved scaling relationships. These observations provide new insights into normal network background traffic that could be used for anomaly detection, AI feature engineering, and testing theoretical models of streaming networks. Comment: 6 pages, 6 figures, 3 tables, 49 references, accepted to IEEE HPEC 202

    HIL: designing an exokernel for the data center

    We propose a new Exokernel-like layer to allow mutually untrusting physically deployed services to efficiently share the resources of a data center. We believe that such a layer not only offers efficiency gains, but may also enable new economic models, new applications, and new security-sensitive uses. A prototype (currently in active use) demonstrates that the proposed layer is viable, and can support a variety of existing provisioning tools and use cases. Partial support for this work was provided by the MassTech Collaborative Research Matching Grant Program, National Science Foundation awards 1347525 and 1149232, as well as the several commercial partners of the Massachusetts Open Cloud, who may be found at http://www.massopencloud.or

    Genetic Insights Into Latent Autoimmune Diabetes In Adults

    ‘Latent autoimmune diabetes in adults’ (LADA) is a controversial subtype of diabetes characterized by initial insulin independence and the presence of diabetes-associated autoantibodies. As a result, LADA is often misclassified; it can represent 5-10% of apparent type 2 diabetes (T2D) cases and is potentially more prevalent than childhood-onset type 1 diabetes (T1D). Despite LADA sharing features with the two better characterized classic diabetes subtypes, the genetic etiology of LADA remains largely unknown. Once there is a more accurate definition of LADA, there will be an improvement in diabetes classification and consequently better treatment and therapeutic interventions. The objective of this thesis is to understand the genetic basis of LADA in order to bring clarity to the current definition of LADA by being the first to leverage genome-wide genotype data from a LADA cohort and the subsequent application of statistical genetics approaches. These investigations can be divided into three parts: 1) the role of T1D and T2D loci in LADA, 2) the first genome-wide association study (GWAS) of LADA, and 3) searching for genetic discrepancies between LADA and childhood-onset T1D in the human leukocyte antigen (HLA) region. Four out of the five strongest associations from the candidate locus study were known T1D loci (HLA, PTPN22, INS and SH2B3) and reached genome-wide significance in the GWAS meta-analysis. However, a novel independent signal at a known T1D locus was also observed to be genome-wide significant, near the PFKFB3 gene, which had not been implicated in previous T1D or T2D GWAS. Additionally, major T1D-susceptibility HLA haplotypes were observed to be less frequent in LADA. Furthermore, contrary to observations in childhood-onset T1D studies, HLA-B and HLA-A were not significantly associated with LADA, independent of HLA-DQB1 and HLA-DRB1 haplotypes.
    Overall, the genetics of LADA point to a strong T1D component, but a positive genetic correlation between LADA and T2D is also evident, strongly suggesting LADA has both a T1D and T2D component. However, it remains unresolved whether LADA is at the genetic intersection of T1D and T2D or simply a mixture of relatively poorly phenotyped individuals who have either T1D or T2D.