33 research outputs found

    Improving the Performance of Big Data Analytics Platforms by Task and I/O Granularity Adjustment

    Department of Computer Science and Engineering
    With the massive increase in the amount of semi-structured and unstructured web data, big data analytics platforms have emerged and started to evolve rapidly. Apache Hadoop has been developed for batch processing on large datasets, and systems for interactive and general-purpose applications have been developed alongside NoSQL databases. Numerous efforts have been made to improve the performance of Hadoop and NoSQL databases, including utilizing a new device called NVMM for NoSQL databases. Nonetheless, their performance is still far from satisfactory due to inadequate granularity for tasks and I/O. In this dissertation, we present novel techniques to improve the performance of Apache Hadoop and NVMM-based LSM-trees by adjusting task and I/O granularity. First, we analyze YARN container overhead and present a dynamic input split size adjustment scheme, which logically combines multiple HDFS blocks and increases the input size of each container, thereby enabling a single map wave and reducing the number of containers and their initialization overhead. Experimental results show that we can avoid recurring container overhead by selecting the right size for input splits and reducing the number of containers. Second, we present a novel HDFS block coalescing scheme that mitigates the YARN container overhead. Our assorted block coalescing scheme combines multiple HDFS blocks and creates large input splits of various sizes, reducing the number of containers and their initialization overhead. Our experimental study shows that the block coalescing scheme significantly reduces the container overhead while achieving good load balancing and job scheduling fairness without impairing the degree of overlap between the map phase and the reduce phase. Third, we discuss the design choice of using NVMM for the indexing structure in NoSQL databases and present ZipperDB, a key-value store that redesigns the LSM-tree for byte-addressable persistent memory. To benefit from the byte-addressability of persistent memory, ZipperDB employs byte-addressable persistent SkipLists and performs Zipper Compaction, a novel in-place compaction algorithm that merges two adjacent persistent SkipLists without compromising failure atomicity. The byte-addressable compaction helps mitigate the write amplification problem, which is known to be the root cause of the write stall problem in LSM-trees. Finally, we present ListDB, a write-optimized key-value store for NVMM that overcomes the gap between DRAM and NVMM write latencies and thereby resolves the write stall problem. ListDB consists of three novel techniques: (i) byte-addressable Index-Unified Logging, which incrementally converts write-ahead logs into SkipLists, (ii) Braided SkipList, a simple NUMA-aware SkipList that effectively reduces the NUMA effects of NVMM, and (iii) NUMA-aware Zipper Compaction. Using the three techniques, ListDB makes background flush and compaction fast enough to resolve the infamous write stall problem and shows 1.6x and 25x higher write throughput than PACTree and Intel Pmem-RocksDB, respectively.

    Computationally efficient air quality forecasting tool: implementation of STOPS v1.5 model into CMAQ v5.0.2 for a prediction of Asian dust

    This study suggests a new modeling framework using a hybrid Eulerian-Lagrangian-based modeling tool (the Screening Trajectory Ozone Prediction System, STOPS) for the prediction of an Asian dust event in Korea. The new version of STOPS (v1.5) has been implemented into the Community Multi-scale Air Quality (CMAQ) model version 5.0.2. The STOPS modeling system is a moving nest (Lagrangian approach) between the source and the receptor inside the host Eulerian CMAQ model. The proposed model generates simulation results that are relatively consistent with those of CMAQ but in a comparatively shorter computation time. We find that standard CMAQ generally underestimates PM10 concentrations during the simulation period (February 2015) and fails to capture PM10 peaks during Asian dust events (22-24 February 2015). The underestimation in PM10 concentration is very likely due to missing dust emissions in CMAQ rather than incorrectly simulated meteorology, as the model meteorology agrees well with the observations. To improve the underestimated PM10 results from CMAQ, we used the STOPS model with constrained PM concentrations based on aerosol optical depth (AOD) data from the Geostationary Ocean Color Imager (GOCI), reflecting real-time initial and boundary conditions of dust particles near the Korean Peninsula. The simulated PM10 concentrations from the STOPS simulations improved significantly and closely matched the surface observations. With additional verification of the capabilities of the methodology on emission estimations and more STOPS simulations for various time periods, the STOPS model could prove to be a useful tool not just for the prediction of Asian dust but also for other unexpected events such as wildfires and oil spills.

    Minimizing Task Initialization Overhead of Hadoop via HDFS Block Coalescing

    Department of Computer Science and Engineering
    In this work, we present a novel HDFS block coalescing scheme that mitigates the YARN container overhead. YARN is designed to be a generic resource manager that decouples programming models from the resource management infrastructure. We show that YARN's generic design incurs significant overhead, as each container must perform various initialization steps, including authentication. In order to reduce the container overhead without making significant changes to the existing YARN framework, we propose to leverage the input split, which is the logical representation of physical HDFS blocks. The HDFS block coalescing scheme creates large input splits to enable a single map wave and to reduce the number of containers and their initialization overhead. Our experimental study shows that the block coalescing scheme significantly reduces the container overhead while achieving good load balancing and job scheduling fairness without impairing the degree of overlap between the map phase and the reduce phase.
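The idea of coalescing blocks into larger input splits can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `coalesce_blocks`, the `container_slots` parameter, and the rule of grouping consecutive blocks evenly are all assumptions made for illustration. The sketch only shows how capping the number of splits at the number of concurrently available containers would let the map phase finish in a single wave.

```python
from math import ceil

def coalesce_blocks(block_sizes, container_slots):
    """Group consecutive blocks into at most `container_slots` input splits.

    Hypothetical helper (not from the paper): each split is a list of
    block sizes, and one container would process one split.
    """
    if container_slots < 1:
        raise ValueError("container_slots must be >= 1")
    if not block_sizes:
        return []
    # Enough blocks per split that the split count never exceeds the slots.
    blocks_per_split = ceil(len(block_sizes) / container_slots)
    return [
        block_sizes[start:start + blocks_per_split]
        for start in range(0, len(block_sizes), blocks_per_split)
    ]

# Illustrative numbers only: ten 128 MB blocks, four concurrent containers.
splits = coalesce_blocks([128] * 10, container_slots=4)
print(len(splits), [sum(s) for s in splits])
```

Without coalescing, the ten blocks in this example would need ten containers, run in three waves on four slots. With it, four containers run once. The last split is smaller than the others, which hints at why a real scheme also has to consider load balancing.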

    Adaptive RTS/CTS-Exchange and Rate Prediction in IEEE 802.11 WLANs


    Coalescing HDFS Blocks to Avoid Recurring YARN Container Overhead

    Hadoop clusters have been transitioning from a dedicated cluster environment to a shared cluster environment. This trend has resulted in the YARN container abstraction, which isolates computing tasks from physical resources. With YARN containers, Hadoop has expanded to support various distributed frameworks. However, it has been reported that Hadoop tasks suffer from a significant overhead of container relaunch. In order to reduce the container overhead without making significant changes to the existing YARN framework, we propose leveraging the input split, which is the logical representation of physical HDFS blocks. Our assorted block coalescing scheme combines multiple HDFS blocks and creates large input splits of various sizes, reducing the number of containers and their initialization overhead. Our experimental study shows that the assorted block coalescing scheme reduces the container overhead by a large margin while achieving good load balance and job scheduling fairness without impairing the degree of overlap between the map phase and the reduce phase.

    2DVD dataset for GMD publication - Simulated prognostic approach of graupel density in a bulk-type cloud microphysics scheme and evaluation during the ICE-POP field campaign

    This archive contains the 2DVD measurements of graupel particles used in the GMD paper "Simulated prognostic approach of graupel density in a bulk-type cloud microphysics scheme and evaluation during the ICE-POP field campaign". For each identified graupel particle, the following are included: volume-equivalent diameter (mm), density (g cm⁻³), and fall velocity (m s⁻¹).

    Anisotropic hyperelastic modeling for face-centered cubic and diamond cubic structures

    A new hyperelastic model for crystal structures with a face-centered cubic or diamond cubic system is proposed. The proposed model can be easily embedded into a nonlinear finite element analysis framework and does not require information about the crystal structure. The hyperelastic constitutive relation of the model is expressed as a polynomial-based strain energy density function. Nine strain invariants of the crystal structure are used directly as the polynomial bases of the model. The hyperelastic material constants, which are the coefficients of the polynomials, are determined through numerical simulation using the least-squares method. In the simulation, the Cauchy-Born rule and interatomic potentials are used to calculate reference data under various deformation conditions. As a result of the fitting, the hyperelastic material constants for silicon, germanium, and six transition metals (Ni, Pd, Pt, Cu, Ag, and Au) are provided. Furthermore, numerical examples are performed using the proposed hyperelastic model.

    Asymmetric surface effect on the configuration of bilayer Si/SiGe nanosprings

    This study investigates the asymmetric surface effect on nanosprings composed of Si/SiGe bilayer thin films. The misfit strain between the Si and SiGe layers is known to be the driving force behind the deformation into the nanospring shape. The crystalline orientation and width-to-thickness ratio are the main factors that determine the deformed equilibrium configuration. In addition, as the thickness decreases to dozens of nanometers or less, the effect of the surface on the equilibrium configuration of the thin film is magnified. The diamond cubic crystal structure, unlike the face-centered or body-centered cubic structures, has asymmetric surface properties. Owing to this asymmetry, Si/SiGe bilayers with odd numbers of atomic layers have different surface configurations than those with even numbers of atomic layers. Finite element analysis including the surface effect has been performed to investigate its influence on the equilibrium configuration. It is observed that both size and surface configuration affect the equilibrium configuration of bilayer Si/SiGe nanosprings. An unexpected spring shape was observed when the film was aligned in the <100> direction, which is unlikely if the surface effect is neglected.