53 research outputs found

    Utilizing query logs for data replication and placement in big data applications

    Get PDF
    Ankara: The Department of Computer Engineering and the Graduate School of Engineering and Science of Bilkent University, 2012. Thesis (Ph.D.) -- Bilkent University, 2012. Includes bibliographical references. The growth in the amount of data in today's computing problems and the level of parallelism dictated by large-scale computing economics necessitate high-level parallelism for many applications. This parallelism is generally achieved via data-parallel solutions that require effective data clustering (partitioning) or declustering schemes (depending on the application requirements). In addition to data partitioning/declustering, data replication, which is used for data availability and increased performance, has also become an inherent feature of many applications. The data partitioning/declustering and data replication problems are generally addressed separately. This thesis is centered around the idea of performing data replication and data partitioning/declustering simultaneously to obtain replicated data distributions that yield better parallelism. To this end, we utilize query logs to propose replicated data distribution solutions and extend the well-known Fiduccia-Mattheyses (FM) iterative improvement algorithm so that it can be used to generate replicated partitionings/declusterings of data. For the replicated declustering problem, we propose a novel replicated declustering scheme that utilizes query logs to improve the performance of a parallel database system. We also extend our replicated declustering scheme and propose a novel replicated re-declustering scheme such that, in the face of drastic query-pattern changes or server additions/removals in the parallel database system, new declustering solutions that require low migration overheads can be computed. For the replicated partitioning problem, we show how to utilize an effective single-phase replicated partitioning solution in two well-known applications (keyword-based search and Twitter). For these applications, we provide the algorithmic solutions we devised for the problems that replication brings, the engineering decisions we made to obtain the greatest benefit from the proposed data distribution, and the implementation details for realistic systems. The obtained results indicate that utilizing query logs and performing replication and partitioning/declustering in a single phase improves parallel performance. Türk, Ata. Ph.D.
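
    The abstract describes an FM-based heuristic without giving its details. As a rough, hypothetical illustration of the core idea (query-log-driven iterative improvement in which an item can be moved between parts or replicated into both), consider the Python sketch below. The query log, the ALPHA storage penalty and the two-part setting are invented for the example; a real FM implementation would add gain buckets, balance constraints and pass-level rollback.

```python
import random

# Toy query log: each query is the set of items it accesses.
queries = [{0, 1}, {1, 2}, {2, 3}, {0, 3}, {1, 3}]
items = sorted(set().union(*queries))

# part[i] is the set of parts (0/1) holding item i; replication = both parts.
random.seed(7)
part = {i: {random.randint(0, 1)} for i in items}

ALPHA = 0.4  # storage penalty per extra copy, so "replicate everything" is not free

def cost():
    # A query is "cut" if no single part holds all of its items.
    cut = sum(1 for q in queries
              if not any(all(p in part[i] for i in q) for p in (0, 1)))
    copies = sum(len(ps) - 1 for ps in part.values())
    return cut + ALPHA * copies

# FM-flavoured passes: repeatedly apply the single move or replication
# with the best positive gain until no improvement remains.
improved = True
while improved:
    improved = False
    base, best = cost(), None
    for i in items:
        if len(part[i]) > 1:
            continue                                   # already replicated
        old = set(part[i])
        for cand in ({1 - next(iter(old))}, {0, 1}):   # move, or replicate
            part[i] = cand
            gain = base - cost()
            if best is None or gain > best[0]:
                best = (gain, i, cand)
        part[i] = old
    if best and best[0] > 0:
        part[best[1]] = best[2]
        improved = True

print(part, "cost:", cost())
```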

    Scalable Storage for Digital Libraries

    Get PDF
    I propose a storage system optimised for digital libraries. Its key features are its heterogeneous scalability; its integration and exploitation of rich semantic metadata associated with digital objects; its use of a namespace; and its aggressive performance optimisation in the digital library domain.

    The Architecture of an Autonomic, Resource-Aware, Workstation-Based Distributed Database System

    Get PDF
    Distributed software systems that are designed to run over workstation machines within organisations are termed workstation-based. Workstation-based systems are characterised by dynamically changing sets of machines that are used primarily for other, user-centric tasks. They must be able to adapt to and utilise spare capacity when and where it is available, and ensure that the non-availability of an individual machine does not affect the availability of the system. This thesis focuses on the requirements and design of a workstation-based database system, motivated by an analysis of existing database architectures, which are typically run over static, specially provisioned sets of machines. A typical clustered database system -- one that is run over a number of specially provisioned machines -- executes queries interactively, returning a synchronous response to applications, with its data made durable and resilient to the failure of machines. There are no existing workstation-based databases. Furthermore, other workstation-based systems do not attempt to achieve the requirements of interactivity and durability, because they are typically used to execute asynchronous batch-processing jobs that tolerate data loss -- results can be re-computed. These systems use external servers to store the final results of computations rather than workstation machines. This thesis describes the design and implementation of a workstation-based database system and investigates its viability by evaluating its performance against existing clustered database systems and testing its availability during machine failures. Comment: Ph.D. Thesis
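
    As a loose, hypothetical sketch of the availability requirement described above (not the system built in the thesis), the Python fragment below keeps each data chunk replicated on R live workstations and re-replicates when a machine leaves; the Cluster class, R = 3 and the chunk naming are invented for illustration.

```python
import random

R = 3              # desired replication factor per chunk (assumption)
random.seed(0)

class Cluster:
    def __init__(self, machines):
        self.live = set(machines)
        self.replicas = {}                       # chunk id -> set of machines

    def place(self, chunk):
        # Place R copies on distinct live workstations.
        self.replicas[chunk] = set(random.sample(sorted(self.live), R))

    def machine_failed(self, m):
        # A failed workstation must not reduce availability: re-replicate
        # every chunk it held onto some other live machine.
        self.live.discard(m)
        for chunk, homes in self.replicas.items():
            if m in homes:
                homes.discard(m)
                spare = self.live - homes
                if spare:
                    homes.add(random.choice(sorted(spare)))

cluster = Cluster(["ws1", "ws2", "ws3", "ws4"])
cluster.place("chunk-0")
cluster.machine_failed("ws1")
print(cluster.replicas)
```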

    Improving the assessment of seismic hazard in the North Sea

    Get PDF
    This PhD thesis provides a comprehensive reassessment of probabilistic seismic hazard assessment (PSHA) in the North Sea. PSHA provides probabilistic representations of the expected ground shaking at sites of interest, which can be used to assess the seismic risk for structures located at (or proximal to) those sites. In the North Sea, the seismic risk for offshore infrastructure, including (1) oil and gas platforms and (2) wind turbine facilities, must be considered. This seismic risk is important to consider because certain levels of seismic damage can negatively affect (1) the environmental health of the North Sea, (2) the personal health of employees on or near the infrastructure and (3) the economic health of governments and corporations that rely upon it. The most recent publicly available North Sea PSHA was undertaken by Bungum et al. (2000). Two decades have passed since that study, during which substantial developments in PSHA have been made and additional North Sea ground-motion data have been collected. Furthermore, the 2001 Ekofisk earthquake was the first hydrocarbon-production-induced earthquake in the North Sea deemed of engineering significance for platforms in the region, but it was not considered within the Bungum et al. (2000) study. In this investigation, North Sea PSHA is reassessed in several ways. Firstly, a pre-existing ground-motion prediction equation (GMPE) that performs well in the North Sea is identified as a base model for a North Sea GMPE, using an additional 20 years of ground-motion records available since the Bungum et al. (2000) study. This base-model GMPE is then improved incrementally by constraining North Sea path and site effects using novel techniques. Following the development of this North Sea GMPE, the seismogenic source model of Bungum et al. (2000) is updated using an additional two decades of North Sea earthquake observations. The impact of the North Sea GMPE and the updated source model is evaluated using (1) macroseismic earthquake observations and (2) assessment of the seismic risk of offshore infrastructure in the region. The updated PSHA formulation developed within this investigation results in moderate but significant differences in the seismic risk for offshore infrastructure in the North Sea. These seismic risk estimates are potentially more appropriate than those computed using the Bungum et al. (2000) formulation, owing to the additional ground-motion data and the PSHA advancements available since that study. Ultimately, the improved seismic hazard estimates can help to better assess the structural health of offshore North Sea infrastructure, and subsequently minimise the likelihood of levels of seismic damage that could be detrimental to the North Sea environment or the personnel and economies operating within the region.
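
    To make the PSHA machinery concrete: a hazard curve gives the annual rate at which a ground-motion level is exceeded, λ(IM > x) = ν Σ_m P(m) P(IM > x | m, r), combining an activity rate, a Gutenberg-Richter magnitude distribution and a GMPE with lognormal scatter. The sketch below is a minimal single-source, fixed-distance example with placeholder coefficients; it is not the GMPE or source model developed in the thesis.

```python
import numpy as np
from scipy import stats

nu = 0.05                          # annual rate of events with M >= Mmin
mags = np.arange(4.0, 7.01, 0.1)   # magnitude bins
b = 1.0                            # Gutenberg-Richter b-value

# Truncated exponential (G-R) magnitude probabilities.
beta = b * np.log(10)
pdf = beta * np.exp(-beta * (mags - mags[0]))
p_m = pdf / pdf.sum()

dist_km = 30.0                     # single fixed site-to-source distance

def gmpe_ln_mean(m, r):
    # Toy GMPE with placeholder coefficients: ln PGA = c0 + c1*m - c2*ln(r)
    return -4.0 + 1.0 * m - 1.2 * np.log(r)

sigma = 0.6                        # aleatory std dev of ln PGA

pga = np.logspace(-3, 0, 50)       # PGA levels (g)
# Hazard curve: lambda(PGA > x) = nu * sum_m P(m) * P(PGA > x | m, r)
haz = np.array([
    nu * np.sum(p_m * stats.norm.sf(np.log(x), gmpe_ln_mean(mags, dist_km), sigma))
    for x in pga
])
print(haz[:5])
```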

    Quantitative Analysis of Shallow Earthquake Sequences and Regional Earthquake Behavior: Implications for Earthquake Forecasting

    No full text
    This study is a quantitative investigation and characterization of earthquake sequences in the Central Volcanic Region (CVR) of New Zealand, several other regions of New Zealand, and Southern California. We introduce CURATE, a new declustering algorithm that uses rate as the primary indicator of an earthquake sequence, and we show it has appreciable utility for analyzing seismicity. The algorithm is applied to the CVR and other regions around New Zealand, and these regions are also compared with the Southern California earthquake catalogue. There is a variety of behavior within these regions: areas that experience larger mainshock-aftershock (MS-AS) sequences have distinctly different general sequence parameters from those of more swarm-dominated regions. Analysis of the declustered catalog shows that Lake Taupo and at least three other North Island regions have correlated variations in rate over periods of ~5 years. These increases in rate are not due to individual large sequences but are instead caused by a general increase in earthquake and sequence occurrence. The most obvious increase in rate across the four North Island subsets follows the 1995-1996 magmatic eruption at Ruapehu volcano. The fact that these increases are geographically widespread and occur over years at a time suggests that the variations may reflect changes in the subduction system or a broad tectonic process. We examine basic sequence parameters of swarms and MS-AS sequences to provide better information for earthquake forecasting models. Like MS-AS sequences, swarm sequences contain a large amount of decay (decreasing rate) throughout their duration. We have tested this decay and found that 89% of MS-AS sequences and 55% of swarm sequences are better fit by an Omori's-law decay than by a linear rate. This result will be important to future efforts to forecast lower magnitude ranges or swarm-prone areas like the CVR. To examine what types of process may drive individual sequences and may be associated with the rate changes, we studied a series of swarms that occurred to the south of Lake Taupo in 2009. We relocated these earthquakes using the double-difference method hypoDD to obtain more accurate relative locations and depths. These swarms occur in an area of about 20 x 20 km and do not show systematic migration between sequences. The last swarm in the series is located in the most resistive area of the Tokaanu geothermal region and had two M = 4.4 earthquakes within just four hours of each other. The earthquakes in this swarm show an accelerating rate of occurrence leading up to the first M = 4.4 event and migrate upward in depth. The locations of earthquakes following the M = 4.4 event expand away from it at a rate consistent with fluid diffusion. Our statistical investigation of triggering due to large global (M ≥ 7) and regional (M ≥ 6) earthquakes concludes that more detailed (waveform-level) investigation of individual sequences will be necessary to conclusively identify triggering, but sequence catalogs may be useful in identifying potential targets for those investigations. We also analyzed the probability that a series of swarms in the central Southern Alps were triggered by the 2009 Dusky Sound Mw = 7.8 and the 2010 Darfield Mw = 7.1 earthquakes. There is less than a one-percent chance that the observed sequences occurred randomly in time. The triggered swarms do not differ significantly from the swarms occurring in that region at other times in the 1.5-year catalog. Waveform cross-correlation was performed on this central Southern Alps earthquake catalog by fellow PhD student Carolin Boese, and reveals that individual swarms are often composed of a single waveform family or multiple waveform families, in addition to earthquakes that do not show waveform similarities. The existence of earthquakes that do not share waveform similarity within the same swarm (2.5 km radius) as a waveform family indicates that similar-waveform groups may be unique in their location but do not necessarily imply a unique trigger or driver. In addition to these triggered swarms in the Southern Alps, we have also identified two swarms potentially triggered by slow-slip earthquakes along the Hikurangi margin in 2009 and 2010. The sequence catalogs generated by the CURATE method may be an ideal tool for searching for earthquake sequences triggered by slow slip.
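
    As an illustration of the Omori-versus-linear comparison mentioned above (not the CURATE implementation itself), the sketch below fits a modified Omori decay n(t) = K/(t + c)^p and a linear rate to synthetic binned sequence rates and compares residuals; the synthetic data and the least-squares fit via scipy's curve_fit are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import curve_fit

# Synthetic Omori-like rates with noise (days since mainshock, bin centres).
t = np.linspace(0.1, 30, 60)
rate = 50.0 / (t + 0.5) ** 1.1
rate += np.random.default_rng(0).normal(0, 0.3, t.size)

def omori(t, K, c, p):
    # Modified Omori law: aftershock rate decays as a power law in time.
    return K / (t + c) ** p

def linear(t, a, b):
    return a + b * t

po, _ = curve_fit(omori, t, rate, p0=(10.0, 1.0, 1.0))
pl, _ = curve_fit(linear, t, rate)

# The model with the smaller residual sum of squares fits the decay better.
rss_omori = np.sum((rate - omori(t, *po)) ** 2)
rss_linear = np.sum((rate - linear(t, *pl)) ** 2)
print("Omori RSS:", rss_omori, "Linear RSS:", rss_linear)
```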

    Data Hiding and Its Applications

    Get PDF
    Data hiding techniques have been widely used to provide copyright protection, data integrity, covert communication, non-repudiation, and authentication, among other applications. In the context of the increased dissemination and distribution of multimedia content over the internet, data hiding methods, such as digital watermarking and steganography, are becoming increasingly relevant in providing multimedia security. The goal of this book is to focus on the improvement of data hiding algorithms and their different applications (both traditional and emerging), bringing together researchers and practitioners from different research fields, including data hiding, signal processing, cryptography, and information theory, among others.
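
    As a concrete example of the kind of technique this literature covers, least-significant-bit (LSB) embedding is one of the simplest steganographic schemes: each sample of a cover signal donates its lowest bit to carry one message bit. A minimal numpy sketch (illustrative only, not a scheme from the book):

```python
import numpy as np

def embed(cover: np.ndarray, bits: np.ndarray) -> np.ndarray:
    # Clear the least significant bit of the first len(bits) samples
    # and overwrite it with the message bits.
    stego = cover.copy().ravel()
    stego[: bits.size] = (stego[: bits.size] & 0xFE) | bits
    return stego.reshape(cover.shape)

def extract(stego: np.ndarray, n: int) -> np.ndarray:
    # Recover the message by reading back the least significant bits.
    return stego.ravel()[:n] & 1

cover = np.random.default_rng(1).integers(0, 256, (8, 8), dtype=np.uint8)
message = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)
stego = embed(cover, message)
assert (extract(stego, message.size) == message).all()
```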

    Appraisal of the self-organization and evolutionary dynamics of seismicity based on (non-extensive) statistical physics and complexity science methods

    Get PDF
    A fundamental challenge in many scientific fields is to define regularities and laws of higher order in relation to the existing knowledge about phenomena of lower order. It has long been suggested that the active tectonic grain comprises a self-organized complex system; its expression (seismicity) should therefore be manifested in the temporal and spatial statistics of energy release rates and exhibit memory due to long-range interactions in a fractal-like space-time. Such attributes can be properly understood in terms of Non-Extensive Statistical Physics (NESP). In addition to energy release rates expressed by the magnitude M, measures of the temporal and spatial interactions are the interevent time (Δt) and hypocentral distance (Δd) between consecutive events above a magnitude threshold. Recent work indicated that if the distributions of M, Δt and Δd are independent, so that the joint probability p(M, Δt, Δd) factorizes into the probabilities of M, Δt and Δd, i.e. p(M ∪ Δt ∪ Δd) = p(M) p(Δt) p(Δd), then the frequency of earthquake occurrence is multiply related, not only to magnitude as the celebrated Gutenberg-Richter law predicts, but also to interevent time and distance, by means of well-defined power laws consistent with NESP. The present work applies these concepts to investigate the dynamics of seismogenetic systems along the NE-N boundary of the Pacific and North American plates and in the seismogenic zones of Greece - Western Turkey. The analysis is conducted on full and declustered catalogues, in which the aftershocks are removed with the stochastic declustering method of Zhuang et al. (2002). The statistical behaviour of seismicity suggests that crustal seismogenetic systems along the Pacific-North American plate boundaries in California, Alaska and the Aleutian Arc, and in the seismogenic zones of Greece - Western Turkey, are invariably sub-extensive; they exhibit prominent long-range interaction and long-term memory, and are therefore self-organized and possibly critical. The degree of sub-extensivity is neither uniform nor stationary: it varies dynamically between systems and may also vary with time, or in cycles. The only sub-crustal system studied herein (the Aleutian subduction) appears to be Poissonian. The results are consistent with simulations of small-world fault networks in which free boundary conditions at the edges (i.e. at the surface) allow self-organization and criticality to develop, while fixed boundary conditions within (i.e. at depth) do not. The results indicate that NESP is an excellent natural descriptor of earthquake statistics and appears to apply to the seismicity observed in different seismogenetic environments. The NESP formalism, although far from having settled questions and debates on the statistical physics of earthquakes, appears to be an effective and insightful tool for the investigation of seismicity and its associated complexity.
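
    The central function of the NESP formalism is the Tsallis q-exponential, exp_q(x) = [1 + (1 - q)x]^{1/(1-q)}, which reduces to the ordinary exponential as q → 1; interevent-time survival functions with q > 1 have the power-law tails associated with long-range correlation, while q = 1 corresponds to Poissonian behaviour. A minimal sketch (the τ and q values are arbitrary, for illustration only):

```python
import numpy as np

def q_exp(x, q):
    # Tsallis q-exponential; ordinary exponential in the q -> 1 limit.
    if abs(q - 1.0) < 1e-9:
        return np.exp(x)
    base = 1.0 + (1.0 - q) * x
    return np.where(base > 0, base ** (1.0 / (1.0 - q)), 0.0)

# Survival function of interevent times dt: P(>dt) = exp_q(-dt / tau).
dt = np.linspace(0, 100, 5)
tau = 10.0
for q in (1.0, 1.3, 1.6):       # q = 1: Poissonian; q > 1: sub-extensive
    print(q, q_exp(-dt / tau, q))
```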

    Machine learning to generate soil information

    Get PDF
    This thesis is concerned with the novel use of machine learning (ML) methods in soil science research. ML adoption in soil science has increased considerably, especially in pedometrics (the use of quantitative methods to study the variation of soils). In parallel, the size of soil datasets has also increased, thanks to global projects that aim to rescue legacy data and to new large-extent surveys that collect new information. Although we have large datasets and global projects, modelling is currently mostly based on "traditional" ML approaches that do not take full advantage of these large data compilations. The compilation of these global datasets is severely limited by privacy concerns, and no solution has yet been implemented to facilitate the process. Considering the performance differences between the generality of global models and the specificity of local models, there is still a debate on which approach is better. In both global and local digital soil mapping (DSM), most applications are static. Even with the large soil datasets available to date, there is not enough soil data to perform fully empirical space-time modelling. Considering these knowledge gaps, this thesis aims to introduce advanced ML algorithms and training techniques, specifically deep neural networks, for modelling large datasets at a global scale and providing new soil information. The research presented here has successfully applied the latest advances in ML to improve upon some of the current approaches for soil modelling with large datasets. It has also created opportunities to utilise information, such as descriptive data, that has generally been disregarded. ML methods have been embraced by the soil community and their adoption is increasing. In the particular case of neural networks, their flexibility in terms of structure and training makes them a good candidate to improve on current soil modelling approaches.
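
    As a purely illustrative sketch of the kind of model the thesis discusses (not its actual architecture or data), the fragment below trains a small neural-network regressor to map environmental covariates to a soil property; the synthetic covariates, the target and the scikit-learn MLPRegressor choice are assumptions made for the example.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a soil dataset: 3 covariates (think elevation,
# rainfall, NDVI) and a continuous target (think topsoil organic carbon).
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] ** 2 + rng.normal(0, 0.1, 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000,
                     random_state=0).fit(X_tr, y_tr)
print("R^2 on held-out sites:", model.score(X_te, y_te))
```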

    Actas da 10ª Conferência sobre Redes de Computadores

    Get PDF
    Universidade do Minho; CCTC; Centro Algoritmi; Cisco Systems; IEEE Portugal Section