7 research outputs found

    Data-intensive Systems on Modern Hardware: Leveraging Near-Data Processing to Counter the Growth of Data

    Over the last decades, society has undergone a tremendous shift toward using information technology in almost every daily routine, entailing an enormous growth of data collected day by day by Web, IoT, and AI applications. At the same time, magneto-mechanical HDDs are being replaced by semiconductor storage such as SSDs, equipped with modern Non-Volatile Memories like Flash, which offer significantly lower access latencies and higher levels of parallelism. Likewise, the execution speed of processing units has increased considerably, as today's server architectures comprise up to several hundred independently working CPU cores along with a variety of specialized co-processors such as GPUs or FPGAs. However, the burden of moving the continuously growing data to the best-fitting processing unit is inherent in today's computer architecture, which is based on the data-to-code paradigm. In light of Amdahl's Law, this leads to the conclusion that even with today's powerful processing units, the achievable speedup is limited, since the fraction of parallelizable work is largely I/O-bound. Throughout this cumulative dissertation, we therefore investigate the paradigm shift toward code-to-data, formally known as Near-Data Processing (NDP), which relieves contention on the I/O bus by offloading processing to intelligent computational storage devices, where the data is originally located.

    Firstly, we identify Native Storage Management as the essential foundation for NDP due to its direct control of physical storage management within the database. On top of this, the interface is extended to propagate address-mapping information and to invoke NDP functionality on the storage device. As the former can become very large, we introduce Physical Page Pointers as a novel NDP abstraction for self-contained, immutable database objects.

    Secondly, the on-device navigation and interpretation of data are elaborated. To this end, we introduce cross-layer Parsers and Accessors as another NDP abstraction that can be executed on the heterogeneous processing capabilities of modern computational storage devices. Thereby, the compute placement and resource configuration per NDP request are identified as major performance criteria. Our experimental evaluation shows improvements in execution duration of 1.4x to 2.7x compared to traditional systems. Moreover, we propose a framework for the automatic generation of Parsers and Accessors on FPGAs to ease their application in NDP.

    Thirdly, we investigate the interplay of NDP and modern workload characteristics such as HTAP. We present different offloading models and focus on an intervention-free execution. By propagating the Shared State with the latest modifications of the database to the computational storage device, the device is able to process data with transactional guarantees. Thus, we extend the design space of HTAP with NDP by providing a solution that optimizes for performance isolation, data freshness, and the reduction of data transfers. In contrast to traditional systems, we experience no significant drop in performance when an OLAP query is invoked, but a steady throughput that is 30% faster.

    Lastly, in-situ result-set management and consumption as well as NDP pipelines are proposed to achieve flexibility in processing data on heterogeneous hardware. As these produce final and intermediary results, we investigate their management and find that on-device materialization comes at low cost while enabling novel consumption modes and reuse semantics. Thereby, we achieve significant performance improvements of up to 400x by reusing once-materialized results multiple times.
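    To make the code-to-data flow sketched in this abstract more concrete, the following Python sketch mimics a host shipping an NDP request (physical page pointers plus a Parser and an Accessor predicate) to a device-side stub that scans and filters the pages locally and materializes the result set on-device for later reuse. It is an illustrative sketch only: the dissertation targets real computational storage devices and FPGAs, and every class and function name here (NDPRequest, ComputationalStorageDevice, execute) is hypothetical.

```python
# Illustrative sketch only; not the dissertation's interface. Models the
# code-to-data idea: the host ships an NDP request (physical page pointers
# plus a Parser and an Accessor predicate) to the storage device, which
# scans and filters the pages locally and materializes the result set
# on-device so later requests can reuse it instead of rescanning.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class NDPRequest:
    """Host-side request: which pages to read and how to interpret them."""
    page_pointers: List[int]              # physical page addresses (PPPs)
    parse: Callable[[bytes], List[dict]]  # "Parser": raw page -> records
    accept: Callable[[dict], bool]        # "Accessor" predicate on records
    result_id: str                        # key for on-device materialization


class ComputationalStorageDevice:
    """Device-side stub: flash pages plus a small result-set store."""

    def __init__(self, pages: Dict[int, bytes]):
        self.pages = pages
        self.result_store: Dict[str, List[dict]] = {}  # in-situ results

    def execute(self, req: NDPRequest) -> List[dict]:
        # Reuse a previously materialized result set if one exists.
        if req.result_id in self.result_store:
            return self.result_store[req.result_id]
        out: List[dict] = []
        for ppp in req.page_pointers:
            for record in req.parse(self.pages[ppp]):
                if req.accept(record):
                    out.append(record)
        self.result_store[req.result_id] = out  # materialize on-device
        return out


# Hypothetical usage: pages hold "id,value" rows; filter values above 20.
pages = {0: b"1,10\n2,35\n", 1: b"3,7\n4,42\n"}
dev = ComputationalStorageDevice(pages)
req = NDPRequest(
    page_pointers=[0, 1],
    parse=lambda raw: [dict(zip(("id", "val"), map(int, line.split(b","))))
                       for line in raw.splitlines()],
    accept=lambda rec: rec["val"] > 20,
    result_id="val_gt_20",
)
print(dev.execute(req))  # first call scans pages; repeated calls hit the store
```

    The point of the sketch is the placement decision: the filter runs next to the pages it reads, and only the (much smaller) result set crosses the I/O bus or is kept on the device for reuse.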

    Rainfall-runoff modelling and numerical weather prediction for real-time flood forecasting

    This thesis focuses on integrating rainfall-runoff modelling with a mesoscale numerical weather prediction (NWP) model to make real-time flood forecasts at the catchment scale. The studies are based on catchments in Southwest England, with a main focus on the Brue catchment, which has an area of 135 km² and is covered by a dense network of 49 rain gauges and a C-band weather radar. The work comprises three main parts.

    Firstly, two data-mining issues are investigated to enable a better-calibrated rainfall-runoff model for flood forecasting. The Probability Distributed Model (PDM), which is widely used in the UK, is chosen. The first issue is the selection of appropriate data for model calibration with regard to data length and duration. It is found that the information quality of the calibration data is more important than its length in determining model performance after calibration, and an index named the Information Cost Function (ICF), built on the discrete wavelet decomposition, proves efficient in identifying the most appropriate calibration data scenario. The second issue is the impact of the temporal resolution of the model input data when the rainfall-runoff model is used for real-time forecasting. Through case studies and spectral analyses, the optimal data time interval is found to be positively related to the forecast lead time, i.e., the longer the lead time, the larger the time interval should be. This positive relation is more pronounced in catchments with a longer concentration time. A hypothetical curve is finally proposed to describe the general impact of the data time interval on real-time forecasting.

    In the second part, building on the fact that modern NWP models together with weather radar allow rainfall forecasts at high temporal and spatial resolution, numerical experiments for improving NWP rainfall forecasts are carried out with the newest-generation mesoscale NWP model, the Weather Research & Forecasting (WRF) model. The sensitivity of WRF performance is first investigated for different domain configurations and various storm types, characterized by the evenness of the rainfall distribution in time and space, and a two-dimensional verification scheme is developed to quantitatively evaluate WRF performance in the temporal and spatial dimensions. The WRF model is then run in cycling mode in tandem with the three-dimensional variational assimilation technique for continuous assimilation of radar reflectivity and conventional surface/upper-air observations. WRF shows its best performance, both in rainfall simulations and in forecasts improved through data assimilation, for storm events whose rainfall is distributed evenly in time and space; for highly convective storms with rainfall concentrated in a small area and a short time period, the results are not ideal and much work remains for the future.

    Finally, the rainfall-runoff model PDM and the rainfall forecasts from WRF are integrated with a real-time updating scheme, the Auto-Regressive Moving Average (ARMA) error model, to constitute a flood forecasting system. The system proves reliable in a small catchment such as the Brue, and the use of NWP rainfall products shows clear advantages for forecasting at lead times beyond the catchment concentration time.
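    As a rough illustration of the real-time updating step described above, the Python sketch below (not the thesis code) corrects a raw rainfall-runoff forecast by modelling the recent errors between observed and simulated flow autoregressively; the thesis applies a full ARMA scheme, whereas a single AR(1) term is used here for brevity, and all function names and example flows are hypothetical.

```python
# Illustrative sketch only, not the thesis code. It mimics the real-time
# updating step: recent errors between observed and simulated flow are
# modelled autoregressively (the thesis uses a full ARMA scheme; a single
# AR(1) term is used here for brevity) and the predicted error is added to
# the raw rainfall-runoff forecast. All names and numbers are hypothetical.
import numpy as np


def ar1_error_update(observed: np.ndarray,
                     simulated: np.ndarray,
                     raw_forecast: float,
                     lead_steps: int = 1) -> float:
    """Correct a raw model forecast with an AR(1) model of recent errors.

    observed, simulated : flows over the recent updating window (m^3/s)
    raw_forecast        : uncorrected model forecast at the lead time (m^3/s)
    lead_steps          : forecast lead time expressed in time steps
    """
    errors = observed - simulated                      # recent model errors
    # Least-squares estimate of the AR(1) coefficient phi in e_t = phi * e_{t-1}
    phi = np.dot(errors[1:], errors[:-1]) / np.dot(errors[:-1], errors[:-1])
    predicted_error = errors[-1] * phi ** lead_steps   # error propagated ahead
    return raw_forecast + predicted_error


# Hypothetical hourly flows (m^3/s) for the updating window
obs = np.array([5.9, 6.9, 7.7, 9.3, 10.6, 11.35])
sim = np.array([5.0, 6.2, 7.1, 8.8, 10.2, 11.0])
print(ar1_error_update(obs, sim, raw_forecast=13.5, lead_steps=2))
```

    The design choice this illustrates is that the updating scheme never touches the rainfall-runoff model itself; it only post-processes the forecast with the structure found in the recent error series, which is why it can run at every time step in real time.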
    Keywords: rainfall-runoff modelling, numerical weather prediction, flood forecasting, real-time updating, spectral analysis, data assimilation, weather radar.