1,594 research outputs found

    Techniques for Ensuring Index Usage Predictability in Microsoft SQL Server

    Get PDF
    The demand for carrying out high-performance operations on data is growing in parallel with the vast growth of data itself. The retrieval of data for analysis, the manipulation of data, and its insertion into data stores must all be performed efficiently, using techniques that ensure speed, reliability, and accuracy. This paper investigates techniques and practices that improve the performance of data retrieval using SQL and Microsoft SQL Server. Because SQL is a declarative language that specifies what should be produced as a result rather than how to achieve that result, the paper examines the internals of SQL Server that affect the "how" of queries and data operations in order to propose techniques that ensure performance gains. The paper aims to shed light on the limitations and variance in index usage and to answer the question of why indexes are sometimes used, and other times not, for the same query. To overcome these index limitations, the "index fusion" technique is proposed.
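    The variance the paper targets can be reproduced outside SQL Server as well. The sketch below uses Python's built-in sqlite3 module (not SQL Server, and not the paper's proposed "index fusion" technique) to show one common reason the same logical lookup sometimes uses an index and sometimes does not: a predicate that wraps the indexed column in a function is no longer sargable, so the planner falls back to a full scan. Table and column names are made up for illustration.

```python
# Minimal illustration (using Python's built-in sqlite3, not SQL Server) of why
# logically equivalent queries may or may not use an index: the planner can only
# seek an index when the predicate references the indexed column directly.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
con.execute("CREATE INDEX idx_users_email ON users(email)")
con.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(10_000)],
)

def explain(sql, params=()):
    print(sql)
    for row in con.execute("EXPLAIN QUERY PLAN " + sql, params):
        print("   ", row[-1])   # last column holds the plan detail text

# Sargable predicate: the index on email can be used for a seek.
explain("SELECT id FROM users WHERE email = ?", ("user42@example.com",))

# Non-sargable predicate: wrapping the column in a function hides it from the
# index, so the planner falls back to a full table scan.
explain("SELECT id FROM users WHERE lower(email) = ?", ("user42@example.com",))
```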

    Neo: A Learned Query Optimizer

    Full text link
    Query optimization is one of the most challenging problems in database systems. Despite the progress made over the past decades, query optimizers remain extremely complex components that require a great deal of hand-tuning for specific workloads and datasets. Motivated by this shortcoming and inspired by recent advances in applying machine learning to data management challenges, we introduce Neo (Neural Optimizer), a novel learning-based query optimizer that relies on deep neural networks to generate query execution plans. Neo bootstraps its query optimization model from existing optimizers and continues to learn from incoming queries, building upon its successes and learning from its failures. Furthermore, Neo naturally adapts to underlying data patterns and is robust to estimation errors. Experimental results demonstrate that Neo, even when bootstrapped from a simple optimizer like PostgreSQL, can learn a model that offers performance similar to state-of-the-art commercial optimizers and in some cases even surpasses them.
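    As a rough intuition for how a learned optimizer operates, the sketch below reduces the idea to its smallest form: featurize candidate plans, fit a regression model on observed latencies (the kind of experience Neo bootstraps from an existing optimizer), and pick the plan with the lowest predicted latency. The plan representation, feature set, and linear model here are simplified stand-ins; Neo itself uses deep neural networks over plan trees.

```python
# Sketch of the core loop behind a learned query optimizer: featurize candidate
# plans, fit a regression model on observed latencies, and choose the plan with
# the lowest predicted latency. The features and the linear model are toy
# stand-ins, not Neo's actual architecture.
import numpy as np

def featurize(plan):
    # Toy features: total estimated rows, plan depth, number of joins.
    # A real system encodes the full plan tree (operators, predicates, tables).
    return np.array([
        sum(op["est_rows"] for op in plan["ops"]) / 1e6,
        plan["depth"],
        sum(1 for op in plan["ops"] if op["type"] == "join"),
    ], dtype=float)

def train(experience, epochs=500, lr=0.01):
    """Fit a linear latency model on (plan, observed_latency_ms) pairs."""
    X = np.stack([featurize(p) for p, _ in experience])
    y = np.array([latency for _, latency in experience])
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        residual = X @ w + b - y          # prediction error per example
        w -= lr * (X.T @ residual) / len(y)
        b -= lr * residual.mean()
    return w, b

def choose_plan(candidates, w, b):
    """Return the candidate plan with the lowest predicted latency."""
    return min(candidates, key=lambda p: featurize(p) @ w + b)
```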

    EcoGIS – GIS tools for ecosystem approaches to fisheries management

    Get PDF
    Executive Summary: The EcoGIS project was launched in September 2004 to investigate how Geographic Information Systems (GIS), marine data, and custom analysis tools can better enable fisheries scientists and managers to adopt Ecosystem Approaches to Fisheries Management (EAFM). EcoGIS is a collaborative effort between NOAA’s National Ocean Service (NOS), the National Marine Fisheries Service (NMFS), and four regional Fishery Management Councils. The project has focused on four priority areas: Fishing Catch and Effort Analysis, Area Characterization, Bycatch Analysis, and Habitat Interactions. Of these four functional areas, the project team first focused on developing a working prototype for catch and effort analysis: the Fishery Mapper Tool. This ArcGIS extension creates time-and-area summarized maps of fishing catch and effort from logbook, observer, or fishery-independent survey data sets. Source data may come from Oracle, Microsoft Access, or other file formats. Feedback from beta testers of the Fishery Mapper was used to debug the prototype, enhance performance, and add features. This report describes the four priority functional areas, the development of the Fishery Mapper tool, and several themes that emerged through the parallel evolution of the EcoGIS project, the concept and implementation of the broader field of Ecosystem Approaches to Management (EAM), data management practices, and other EAM toolsets. In addition, a set of six succinct recommendations is proposed on page 29. One major conclusion from this work is that there is no single “super-tool” to enable Ecosystem Approaches to Management; as such, tools should be developed for specific purposes with attention given to interoperability and automation. Future work should be coordinated with other GIS development projects in order to provide “value added” and minimize duplication of effort. In addition to custom tools, the development of cross-cutting Regional Ecosystem Spatial Databases will enable access to quality data to support the analyses required by EAM. GIS tools will be useful in developing Integrated Ecosystem Assessments (IEAs) and providing pre- and post-processing capabilities for spatially explicit ecosystem models. Continued funding will enable the EcoGIS project to develop GIS tools that are immediately applicable to today’s needs. These tools will enable simplified and efficient data query, the ability to visualize data over time, and ways to synthesize multidimensional data from diverse sources. These capabilities will provide new information for analyzing issues from an ecosystem perspective, which will ultimately result in better understanding of fisheries and better support for decision-making. (PDF file contains 45 pages.)
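    The core of the Fishery Mapper's catch-and-effort analysis is time-and-area summarization. The sketch below illustrates that operation in plain Python (the tool itself is an ArcGIS extension reading from Oracle, Access, or other sources): logbook-style records are binned into a lat/lon grid cell and a month, and catch and effort are summed per bin. The record fields and grid size are hypothetical.

```python
# Illustrative sketch (not the Fishery Mapper itself, which is an ArcGIS
# extension) of time-and-area summarization: bin logbook records into a lat/lon
# grid and a month, then sum catch and effort per bin. Field names are made up.
from collections import defaultdict

def summarize(records, cell_deg=0.5):
    """records: dicts with lat, lon, date ("YYYY-MM-DD"), catch_kg, effort_hours."""
    bins = defaultdict(lambda: {"catch_kg": 0.0, "effort_hours": 0.0})
    for r in records:
        cell = (int(r["lat"] // cell_deg), int(r["lon"] // cell_deg))
        month = r["date"][:7]                      # "YYYY-MM"
        key = (cell, month)
        bins[key]["catch_kg"] += r["catch_kg"]
        bins[key]["effort_hours"] += r["effort_hours"]
    return dict(bins)

example = [
    {"lat": 41.2, "lon": -70.8, "date": "2004-09-10", "catch_kg": 120.0, "effort_hours": 6.0},
    {"lat": 41.4, "lon": -70.6, "date": "2004-09-22", "catch_kg": 80.0,  "effort_hours": 4.5},
]
print(summarize(example))
```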

    The Family of MapReduce and Large Scale Data Processing Systems

    Full text link
    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architectures and large-scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as data distribution, scheduling, and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in follow-up work since its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large-scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both the research and industrial communities. We also cover a set of systems that provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large-scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions.
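    The programming model at the center of this survey is small enough to sketch in a few lines. The single-process example below shows only the user-facing contract of MapReduce, a map function and a reduce function joined by a group-by shuffle; everything the framework actually provides across a cluster (data distribution, scheduling, fault tolerance) is deliberately absent.

```python
# Single-process sketch of the MapReduce programming model: the user supplies a
# map function and a reduce function, and the in-memory group-by below stands in
# for the framework's distributed shuffle.
from collections import defaultdict

def map_fn(doc_id, text):
    for word in text.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    yield word, sum(counts)

def mapreduce(inputs, map_fn, reduce_fn):
    groups = defaultdict(list)                 # the "shuffle" phase
    for key, value in inputs:
        for k, v in map_fn(key, value):
            groups[k].append(v)
    out = []
    for k, vs in groups.items():
        out.extend(reduce_fn(k, vs))
    return out

docs = [("d1", "map reduce map"), ("d2", "reduce everything")]
print(sorted(mapreduce(docs, map_fn, reduce_fn)))
# [('everything', 1), ('map', 2), ('reduce', 2)]
```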

    Hillview: A trillion-cell spreadsheet for big data

    Get PDF
    Hillview is a distributed spreadsheet for browsing very large datasets that cannot be handled by a single machine. As a spreadsheet, Hillview provides a high degree of interactivity that permits data analysts to explore information quickly along many dimensions while switching visualizations on a whim. To provide the required responsiveness, Hillview introduces visualization sketches, or vizketches, as a simple idea for producing compact data visualizations. Vizketches combine algorithmic techniques for data summarization with computer graphics principles for efficient rendering. While simple, vizketches are effective at scaling the spreadsheet by parallelizing computation, reducing communication, providing progressive visualizations, and offering precise accuracy guarantees. Using Hillview running on eight servers, we can navigate and visualize datasets of tens of billions of rows and trillions of cells, far beyond the published capabilities of competing systems.
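    The sketch below illustrates the summarization half of a vizketch: each partition computes a small fixed-size histogram independently, and partial results merge by bucket-wise addition, so rendering cost depends on the number of buckets rather than the number of rows. This is only the general mergeable-summary idea; Hillview's vizketches additionally handle progressive results and accuracy guarantees, and its actual API is not reproduced here.

```python
# Mergeable histogram summaries computed per partition and combined, so the
# final "rendering" input is 50 bucket counts regardless of row count. This is
# a generic sketch of the idea, not Hillview's implementation.
from concurrent.futures import ProcessPoolExecutor
from functools import reduce

def histogram(values, lo, hi, buckets):
    counts = [0] * buckets
    width = (hi - lo) / buckets
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), buckets - 1)] += 1
    return counts

def merge(a, b):
    return [x + y for x, y in zip(a, b)]

if __name__ == "__main__":
    partitions = [range(i, 1_000_000, 8) for i in range(8)]   # stand-in shards
    with ProcessPoolExecutor() as pool:
        sketches = pool.map(histogram, partitions,
                            [0] * 8, [1_000_000] * 8, [50] * 8)
    final = reduce(merge, sketches)
    print(sum(final), len(final))   # all rows counted, only 50 buckets to draw
```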

    The LDBC Social Network Benchmark Interactive workload v2: A transactional graph query benchmark with deep delete operations

    Full text link
    The LDBC Social Network Benchmark's Interactive workload captures an OLTP scenario operating on a correlated social network graph. It consists of complex graph queries executed concurrently with a stream of update operations. Since its initial release in 2015, the Interactive workload has become the de facto industry standard for benchmarking transactional graph data management systems. As graph systems have matured and the community's understanding of graph processing features has evolved, we initiated a renewal of this benchmark. This paper describes the Interactive v2 workload with several new features: delete operations, a cheapest path-finding query, support for larger data sets, and a novel temporal parameter curation algorithm that ensures stable runtimes for path queries.
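    To make the new cheapest path-finding query concrete, the sketch below runs a generic Dijkstra search over a weighted "knows" graph. The benchmark specifies its query over the LDBC social network schema with a particular interaction-based edge weight; the graph and weights here are placeholder values for illustration.

```python
# Generic cheapest-path (Dijkstra) sketch to illustrate what a cheapest
# path-finding query computes; the edge weights and graph below are made-up
# placeholders, not the benchmark's actual weighting scheme.
import heapq

def cheapest_path(graph, source, target):
    """graph: {node: [(neighbor, weight), ...]}; returns (cost, path) or None."""
    frontier = [(0.0, source, [source])]
    best = {}
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == target:
            return cost, path
        if node in best and best[node] <= cost:
            continue
        best[node] = cost
        for neighbor, weight in graph.get(node, []):
            heapq.heappush(frontier, (cost + weight, neighbor, path + [neighbor]))
    return None

knows = {                       # person-knows-person edges with interaction weights
    "alice": [("bob", 1.0), ("carol", 0.5)],
    "carol": [("dave", 0.5)],
    "bob":   [("dave", 1.0)],
    "dave":  [],
}
print(cheapest_path(knows, "alice", "dave"))   # (1.0, ['alice', 'carol', 'dave'])
```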

    Weiterentwicklung analytischer Datenbanksysteme (Further Development of Analytical Database Systems)

    Get PDF
    This thesis contributes to the state of the art in analytical database systems. First, we identify and explore extensions to better support analytics on event streams. Second, we propose a novel polygon index to enable efficient geospatial data processing in main memory. Third, we contribute a new deep learning approach to cardinality estimation, which is the core problem in cost-based query optimization.
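    Of the three contributions, the polygon index is the easiest to ground in a small example. The abstract does not describe the index structure itself, so the sketch below shows only the core predicate such an index accelerates, a standard ray-casting point-in-polygon test; an index over polygons would avoid evaluating this test against every polygon for every query point.

```python
# Standard ray-casting point-in-polygon test: the core geospatial predicate a
# main-memory polygon index would accelerate. The thesis's actual index
# structure is not reproduced here.
def point_in_polygon(x, y, polygon):
    """polygon: list of (x, y) vertices in order; returns True if (x, y) is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a horizontal ray extending right from (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon(2, 2, square))   # True
print(point_in_polygon(5, 2, square))   # False
```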