Data Management Solutions for Tackling Big Data Variety
Variety is one of the three defining characteristics of Big Data, the others being Volume and Velocity. This variety has several aspects: diversity in data formats (text, video, audio) and structure (relational, graph, etc.), variety in access methodologies (OLTP, OLAP), and distribution heterogeneity within workloads (read-heavy, high contention). Data management solutions for modern applications need to tackle this variety. This dissertation provides an understanding of the challenges associated with the different elements of variety and proposes several solutions for efficiently handling its various aspects. First, the dissertation studies the challenges related to variety in data structure and access methodologies, and the resulting heterogeneity at the data infrastructure level. Applications now employ several data-processing engines with different underlying representations, such as row, column, and graph, to process their data. We propose Janus, which introduces a novel data-movement pipeline that enables the use of different representations to support both high transaction throughput and diverse analytics, while still ensuring consistent real-time analytics in a scale-out setting. Janus partitions the data at the different representations, and allows distributed transactions and diverse partitioning strategies at those representations. Then, we propose Typhon and Cerberus, which define and enforce consistency semantics for application data spread across representations. Second, this dissertation proposes solutions for handling distribution heterogeneity within workloads. Workloads can have skewed distributions in terms of operation type, data access, or temporal variation. We propose strongly consistent quorum reads for Raft-like consensus protocols, which can be used to scale read-heavy workloads.
To support high-contention transaction workloads, we integrate an existing concurrency control mechanism based on dynamic timestamp allocation into a distributed OLTP setting and analyze its performance. Third, we study IoT applications, which have to deal with both the physical heterogeneity of sensors and diverse data-processing demands. We propose a multi-representation architecture catering to IoT applications, and also present the initial design of M-stream, a computation framework for enabling integration and monitoring of uncertain data from multiple sensors. Through analysis, illustrative examples, and extensive evaluation of the proposed protocols, this dissertation demonstrates that the proposed solutions can be employed to efficiently handle the different aspects of variety in data-intensive applications.
Center for Computational Sciences, University of Tsukuba: Annual Report for Fiscal Year 2018 (Heisei 30)
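Contents: Preface; 1. Center Organization and Members; 2. Activities in Fiscal Year 2018; 3. Reports from the Research Divisions: I. Division of Particle Physics; II. Division of Astrophysics; III. Division of Nuclear Physics; IV. Division of Quantum Condensed Matter Physics; V. Division of Life Sciences (V-1. Biological Function and Information Group; V-2. Molecular Evolution Group); VI. Division of Global Environmental Science; VII. Division of High Performance Computing Systems; VIII. Division of Computational Informatics (VIII-1. Database Group; VIII-2. Computational Media Group)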