
    Extending DBMSs with satellite databases

    In this paper, we propose an extensible architecture for database engines where satellite databases are used to scale out and to implement additional functionality for a centralized database engine. The architecture uses a middleware layer that offers consistent views and a single system image over a cluster of machines with database engines. One of these engines acts as a master copy while the others are read-only snapshots, which we call satellites. The satellites are lightweight DBMSs used for scalability and to provide functionality that is difficult or expensive to implement in the main engine. Our approach also supports the dynamic creation of satellites so that the system can autonomously adapt to varying loads. The paper presents the architecture, discusses the research problems it raises, and validates its feasibility with extensive experimental results.
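    As a rough illustration of the routing idea described above, the sketch below sends updates to the master copy and spreads reads across read-only satellites, with satellites attached dynamically under load. The SatelliteRouter class, its connection objects, and the load-balancing choice are illustrative assumptions, not the paper's middleware interface.

    # Minimal sketch of read/write routing over a master copy and satellite snapshots.
    # Names and the random load-balancing choice are illustrative, not the paper's API.
    import random

    class SatelliteRouter:
        def __init__(self, master, satellites):
            self.master = master                 # full read/write engine (master copy)
            self.satellites = list(satellites)   # lightweight read-only snapshot engines

        def add_satellite(self, conn):
            """Dynamically attach a new satellite, e.g., in response to rising load."""
            self.satellites.append(conn)

        def execute(self, sql):
            if self._is_read_only(sql) and self.satellites:
                # Scale out reads across satellites; a real middleware would also
                # track snapshot freshness to preserve a consistent view.
                return random.choice(self.satellites).execute(sql)
            # Updates always go to the master copy.
            return self.master.execute(sql)

        @staticmethod
        def _is_read_only(sql):
            return sql.lstrip().lower().startswith("select")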

    A Cross-Platform Database Infrastructure Monitoring Dashboard for The Hanover Insurance Group

    This MIS MQP developed a cross-platform database infrastructure monitoring dashboard for The Hanover Insurance Group. Both technical details and executive-level insight are comprehensively represented. Critical health and performance information from four database platforms that support over 2,000 databases is integrated with independent monitoring tools in one centralized dashboard. By increasing database monitoring efficiency and cross-organizational communication, the dashboard is a lean implementation tool designed to transform Hanover's reactive monitoring approach into a proactive one.

    Diagnosis of Arrhythmia Using Neural Networks

    This dissertation presents an intelligent framework for the classification of heart arrhythmias. It is a framework of a cascaded discrete wavelet transform and Fourier transform as preprocessing stages for a neural network. This work exploits the information about heart activity contained in the ECG signal, the power of the wavelet and Fourier transforms in characterizing the signal, and the learning power of neural networks. First, the ECG signals are decomposed with a four-level discrete wavelet transform using a filter bank and the 'db2' mother wavelet. All the detail coefficients are then discarded, retaining only the approximation coefficients at the fourth level. The retained approximation coefficients are Fourier transformed using a 16-point FFT. The FFT is symmetric, so only the first 8 points are needed to characterize the spectrum; the last 8 points resulting from the FFT are discarded during feature selection. The 8-point feature vector is then used to train a feedforward neural network with one hidden layer of 20 units and three outputs. The neural network is trained using the Scaled Conjugate Gradient backpropagation algorithm (SCG). This was implemented in a MATLAB environment using the MATLAB Neural Network Toolbox GUI. This approach yields an accuracy of 94.66% over three arrhythmia classes, namely Ventricular Flutter (VFL), Ventricular Tachycardia (VT), and Supraventricular Tachyarrhythmia (SVTA). We conclude that, for the amount of information retained and the number of features used, the performance is fairly competitive.
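    The preprocessing chain described above (four-level 'db2' decomposition, keep only the level-4 approximation, 16-point FFT, keep the first 8 bins) can be sketched as follows. This is a reconstruction in Python using PyWavelets and NumPy, not the authors' MATLAB implementation.

    # Sketch of the described ECG feature-extraction pipeline.
    # Assumes PyWavelets (pywt) and NumPy are available.
    import numpy as np
    import pywt

    def ecg_features(ecg_segment):
        # Four-level discrete wavelet decomposition with the 'db2' mother wavelet.
        coeffs = pywt.wavedec(ecg_segment, 'db2', level=4)
        cA4 = coeffs[0]                      # keep only the level-4 approximation coefficients
        # 16-point FFT of the retained coefficients; the magnitude spectrum of a
        # real signal is symmetric, so the first 8 bins characterize it.
        spectrum = np.abs(np.fft.fft(cA4, n=16))
        return spectrum[:8]                  # 8-point feature vector for the neural network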

    Magnetic edge states

    Magnetic edge states are responsible for various phenomena of magneto-transport. Their importance is due to the fact that, unlike the bulk of the eigenstates in a magnetic system, they carry electric current along the boundary of a confined domain. Edge states can exist both as interior (quantum dot) and exterior (anti-dot) states. In the present report we develop a consistent and practical spectral theory for the edge states encountered in magnetic billiards. It provides an objective definition for the notion of edge states, is applicable to interior and exterior problems, facilitates efficient quantization schemes, and forms a convenient starting point for both the semiclassical description and the statistical analysis. After elaborating these topics we use the semiclassical spectral theory to uncover nontrivial spectral correlations between the interior and the exterior edge states. We show that they are the quantum manifestation of a classical duality between the trajectories in an interior and an exterior magnetic billiard. Comment: 170 pages, 48 figures (high-quality version available at http://www.klaus-hornberger.de).
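    For context, the setting is the standard magnetic billiard problem: a charged particle in a uniform field perpendicular to the plane, confined to a domain whose boundary supports the edge states. The textbook Hamiltonian in symmetric gauge with Dirichlet boundary conditions is given below as background; it is not a formula quoted from the report.

    % Single particle of charge q and mass m in a uniform field B normal to the plane,
    % confined to a domain \Omega (the Dirichlet condition defines the billiard boundary).
    H = \frac{1}{2m}\left(\mathbf{p} - q\,\mathbf{A}\right)^{2},
    \qquad
    \mathbf{A} = \tfrac{B}{2}\,(-y,\ x),
    \qquad
    \psi\big|_{\partial\Omega} = 0 .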

    Automatic acquisition of LFG resources for German - as good as it gets

    We present data-driven methods for the acquisition of LFG resources from two German treebanks. We discuss problems specific to semi-free word order languages as well as problems arising from the data structures determined by the design of the different treebanks. We compare two ways of encoding semi-free word order, as done in the two German treebanks, and argue that the design of the TiGer treebank is more adequate for the acquisition of LFG resources. Furthermore, we describe an architecture for LFG grammar acquisition for German, based on the two German treebanks, and compare our results with a hand-crafted German LFG grammar.

    Optimizing Hierarchical Storage Management For Database System

    Caching is a classical but effective way to improve system performance. To improve performance, servers such as database servers and storage servers contain significant amounts of memory that act as a fast cache. Meanwhile, as new storage devices such as flash-based solid state drives (SSDs) are added to storage systems over time, using the memory cache is not the only way to improve system performance. In this thesis, we address the problems of how to manage the cache of a storage server and how to utilize the SSD in a hybrid storage system. Traditional caching policies are known to perform poorly for storage server caches. One promising approach to solving this problem is to use hints from the storage clients to manage the storage server cache. Previous hinting approaches are ad hoc, in that a predefined reaction to specific types of hints is hard-coded into the caching policy. With ad hoc approaches, it is difficult to ensure that the best hints are being used, and it is difficult to accommodate multiple types of hints and multiple client applications. In this thesis, we propose CLient-Informed Caching (CLIC), a generic hint-based technique for managing storage server caches. CLIC automatically interprets hints generated by storage clients and translates them into a server caching policy, without explicit knowledge of the application-specific hint semantics. We demonstrate, using trace-based simulation of database workloads, that CLIC outperforms hint-oblivious and state-of-the-art hint-aware caching policies. We also demonstrate that the space required to track and interpret hints is small. SSDs are becoming a part of the storage system. Adding an SSD to a storage system not only raises the question of how to manage the SSD, but also the question of whether current buffer pool algorithms will still work effectively. We are interested in the use of hybrid storage systems, consisting of SSDs and hard disk drives (HDDs), for database management. We present cost-aware replacement algorithms for both the DBMS buffer pool and the SSD. These algorithms are aware of the different I/O performance of HDDs and SSDs. In such a hybrid storage system, the physical access pattern to the SSD depends on the management of the DBMS buffer pool. We study the impact of the buffer pool caching policies on the access patterns of the SSD, and based on these studies we design a caching policy to effectively manage the SSD. We implemented these algorithms in MySQL's InnoDB storage engine and used the TPC-C workload to demonstrate that these cost-aware algorithms outperform previous algorithms.
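    As a toy illustration of what "cost-aware replacement" can mean in a hybrid SSD/HDD setting, the sketch below charges each buffer-pool page a miss cost that depends on which device would serve a re-read, and evicts the page with the lowest combined cost/age score. The class, cost constants, and scoring rule are assumptions made for illustration, not the thesis' InnoDB modifications.

    # Illustrative cost-aware eviction: prefer to evict pages that are cheap to
    # re-read (e.g., resident on SSD) over pages that would need a slow HDD read.
    HDD_READ_COST = 10.0   # relative cost units, assumed for illustration
    SSD_READ_COST = 1.0

    class Page:
        def __init__(self, page_id, on_ssd, last_access):
            self.page_id = page_id
            self.on_ssd = on_ssd
            self.last_access = last_access

        def miss_cost(self):
            # Cost of re-fetching this page if it is evicted and referenced again.
            return SSD_READ_COST if self.on_ssd else HDD_READ_COST

    def choose_victim(buffer_pool, now):
        # Combine device-aware miss cost with recency: cheaper-to-refetch and
        # older pages are better eviction candidates.
        return min(buffer_pool,
                   key=lambda p: (p.miss_cost(), -(now - p.last_access)))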

    Auditing database systems through forensic analysis

    The majority of sensitive and personal data is stored in a number of different Database Management Systems (DBMSs). For example, Oracle is frequently used to store corporate data, MySQL serves as the back-end storage for many webstores, and SQLite stores personal data such as SMS messages or browser bookmarks. Consequently, the pervasive use of DBMSs has led to an increase in the rate at which they are exploited in cybercrimes. After a cybercrime occurs, investigators need forensic tools and methods to recreate a timeline of events and determine the extent of the security breach. When a breach involves a compromised system, these tools must make few assumptions about the system (e.g., corrupt storage, poorly configured logging, data tampering). Since DBMSs manage storage independently of the operating system, they require their own set of forensic tools. This dissertation presents 1) our database-agnostic forensic methods to examine DBMS contents from any evidence source (e.g., disk images or RAM snapshots) without using a live system and 2) applications of our forensic analysis methods to secure data. The foundation of this analysis is page carving, our novel database forensic method that we implemented as the tool DBCarver. We demonstrate that DBCarver is capable of reconstructing DBMS contents, including metadata and deleted data, from various types of digital evidence. Since DBMS storage is managed independently of the operating system, DBCarver can be used for new methods to securely delete data (i.e., data sanitization). In the event of suspected log tampering or direct modification to DBMS storage, DBCarver can be used to verify log integrity and discover storage inconsistencies.
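    The core idea of page carving, scanning raw evidence for structures that look like database pages rather than querying a live system, can be caricatured in a few lines. The page size and header markers below are placeholders; DBCarver's actual signatures and page-parsing logic are not reproduced here.

    # Toy illustration of page carving: slide over a raw evidence image in
    # page-sized steps and record offsets whose bytes look like a DBMS page header.
    # PAGE_SIZE and the signature bytes are hypothetical placeholders.
    PAGE_SIZE = 8192
    CANDIDATE_SIGNATURES = [b"\xa5\xc3", b"PAGE"]   # placeholder header markers

    def carve_pages(image_path):
        hits = []
        with open(image_path, "rb") as img:
            offset = 0
            while True:
                page = img.read(PAGE_SIZE)
                if len(page) < PAGE_SIZE:
                    break
                if any(page.startswith(sig) for sig in CANDIDATE_SIGNATURES):
                    hits.append(offset)              # candidate page start found
                offset += PAGE_SIZE
        return hits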

    Database server workload characterization in an e-commerce environment

    A typical E-commerce system that is deployed on the Internet has multiple layers, including Web users, Web servers, application servers, and a database server. As system use and user request frequency increase, Web/application servers can be scaled up by replication, with a load-balancing proxy routing user requests to individual machines that perform the same functionality. To address the increasing workload while avoiding replicating the database server, various dynamic caching policies have been proposed to reduce the database workload in E-commerce systems. However, the nature of the changes seen by the database server as a result of dynamic caching remains unknown. A good understanding of this change is fundamental for tuning a database server to get better performance. In this study, the TPC-W (a transactional Web E-commerce benchmark) workloads on a database server are characterized under two different dynamic caching mechanisms, which are generalized and implemented as a query-result cache and a table cache. The characterization focuses on response time, CPU computation, buffer pool references, disk I/O references, and workload classification. This thesis combines a variety of analysis techniques: simulation, real-time measurement, and data mining. The experimental results in this thesis reveal some interesting effects that dynamic caching has on the database server workload characteristics. The main observations are: (a) a dynamic cache can considerably reduce the CPU usage of the database server and the number of database page references when it is heavily loaded; (b) a dynamic cache can also reduce the database reference locality, but to a smaller degree than that reported for file servers. The data classification results in this thesis show that with a dynamic cache, the database server sees TPC-W profiles that look more like online transaction processing workloads.
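    The query-result-cache mechanism characterized above can be illustrated at a very high level: identical SELECTs are answered from memory, and cached results are dropped when a table they depend on is updated. The class name and the deliberately naive dependency tracking below are assumptions for illustration, not the thesis' implementation.

    # Simplified query-result cache in front of a database connection.
    class QueryResultCache:
        def __init__(self, db):
            self.db = db
            self.results = {}      # SQL text -> cached result rows
            self.deps = {}         # table name -> set of cached SQL texts

        def query(self, sql, tables):
            if sql in self.results:
                return self.results[sql]      # hit: no work for the database server
            rows = self.db.execute(sql)       # miss: forward to the database
            self.results[sql] = rows
            for t in tables:
                self.deps.setdefault(t, set()).add(sql)
            return rows

        def update(self, sql, tables):
            self.db.execute(sql)              # writes always go to the database
            for t in tables:                  # drop results that may now be stale
                for stale in self.deps.pop(t, set()):
                    self.results.pop(stale, None)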

    Second-tier Cache Management to Support DBMS Workloads

    Enterprise Database Management Systems (DBMSs) often run on computers with dedicated storage systems. Their data access requests need to go through two tiers of cache, i.e., a database bufferpool and a storage server cache, before reaching the storage media, e.g., disk platters. A tremendous amount of work has been done to improve the performance of the first-tier cache, i.e., the database bufferpool. However, the amount of work focusing on second-tier cache management to support DBMS workloads is comparably small. In this thesis we propose several novel techniques for managing second-tier caches to boost DBMS performance in terms of query throughput and query response time. The main purpose of second-tier cache management is to reduce the I/O latency endured by database query executions. This goal can be achieved by minimizing the number of reads and writes issued from second-tier caches to storage devices. The first part of our research focuses on reducing the number of read I/Os issued by second-tier caches. We observe that DBMSs issue I/O requests for various reasons. The rationales behind these I/O requests provide useful information to second-tier caches because they can be used to estimate the temporal locality of the data blocks being requested. A second-tier cache can exploit this information when making replacement decisions. In this thesis we propose a technique to pass this information from DBMSs to second-tier caches and to use it in guiding cache replacements. The second part of this thesis focuses on reducing the number of writes issued by second-tier caches. Our work here is twofold. First, we observe that although there are second-tier caches within computer systems, today's DBMSs cannot take full advantage of them. For example, most commercial DBMSs use forced writes to propagate bufferpool updates to permanent storage for data durability reasons. We notice that enforcing such a practice is more conservative than necessary. Some of the writes can be issued as unforced requests and can be cached in the second-tier cache without immediate synchronization. This gives the second-tier cache opportunities to cache and consolidate multiple writes into one request. Unfortunately, the current POSIX-compliant file system interfaces provided by mainstream operating systems (e.g., Unix and Windows) are not flexible enough to support such dynamic synchronization. We propose to extend such interfaces to let DBMSs take advantage of unforced writes whenever possible. Additionally, we observe that existing cache replacement algorithms are designed solely to maximize read cache hits (i.e., to minimize read I/Os). The purpose is to minimize the read latency, which is on the critical path of query executions. We argue that minimizing read requests is not the only objective of cache replacement. When I/O bandwidth becomes a bottleneck, the objective should be to minimize the total number of I/Os, including both reads and writes, to achieve the best performance. We propose to associate a new type of replacement cost, i.e., the total number of I/Os caused by the replacement, with each cache page; and we also present a partial characterization of an optimal algorithm which minimizes the total number of I/Os generated by caches. Based on this knowledge, we extend several existing replacement algorithms, which are write-oblivious (they focus only on reducing reads), to be write-aware and observe promising performance gains in the evaluations.
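    The write-aware replacement cost at the end of the abstract can be paraphrased in a few lines: charge a candidate eviction for the write-back it forces (if the page is dirty) plus the expected future read if the page is referenced again, then evict the candidate with the smallest total. The function names and the re-reference estimate below are illustrative assumptions, not the thesis' algorithms.

    # Sketch of write-aware eviction cost: count both the write-back of a dirty page
    # and the expected read I/O if the page is referenced again after eviction.
    # Pages are assumed to expose a boolean `dirty` attribute.
    def eviction_io_cost(page, reread_probability):
        writes = 1 if page.dirty else 0       # write-back forced by evicting a dirty page
        reads = reread_probability            # expected future read I/O
        return writes + reads

    def choose_victim(cache, estimate_reref):
        # Evict the page whose eviction adds the fewest total I/Os (reads + writes),
        # rather than only maximizing read hits as write-oblivious policies do.
        return min(cache, key=lambda p: eviction_io_cost(p, estimate_reref(p)))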