2,097,929 research outputs found

    Statistical structures for internet-scale data management

    Efficient query processing in traditional database management systems relies on statistics on base data. For centralized systems, there is a rich body of research results on such statistics, from simple aggregates to more elaborate synopses such as sketches and histograms. For Internet-scale distributed systems, on the other hand, statistics management still poses major challenges. With the work in this paper we aim to endow peer-to-peer data management over structured overlays with the power associated with such statistical information, with emphasis on meeting the scalability challenge. To this end, we first contribute efficient, accurate, and decentralized algorithms that can compute key aggregates such as Count, CountDistinct, Sum, and Average. We show how to construct several types of histograms, such as simple Equi-Width, Average-Shifted Equi-Width, and Equi-Depth histograms. We present a full-fledged open-source implementation of these tools for distributed statistical synopses, and report on a comprehensive experimental performance evaluation of our contributions in terms of efficiency, accuracy, and scalability.

    USING DATA BASE MANAGEMENT SYSTEMS IN STATISTICAL DATA PROCESSING

    National and international statistical bureaus produce ca. 25,000 tables for publication each year, based on hundreds of inter-related object-types with thousands of attributes. It would appear that this environment should be well suited to the application of data base management techniques for the administration of the data. This paper presents a data oriented model of the statistical production process which is used as a basis for a review of the state of experience within statistical offices with commercially available data base management systems. We conclude with a presentation of some important data management facilities which must be enhanced or developed in order to support statistical production processing. Information Systems Working Papers Series

    Interfaces between statistical analysis packages and the ESRI geographic information system

    Interfaces between ESRI's geographic information system (GIS) data files and real-valued data files, written to facilitate statistical analysis and display of spatially referenced multivariable data, are described. An example of data analysis that used both the GIS and the statistical analysis system is presented to illustrate the utility of combining the analytic capability of a statistical package with the data management and display features of the GIS.

    Integrating R and Hadoop for Big Data Analysis

    Analyzing and working with big data can be very difficult using classical means like relational database management systems or desktop software packages for statistics and visualization. Instead, big data requires large clusters with hundreds or even thousands of computing nodes. Official statistics is increasingly considering big data for deriving new statistics because big data sources could produce more relevant and timely statistics than traditional sources. One of the software tools widely and successfully used for storage and processing of big data sets on clusters of commodity hardware is Hadoop. The Hadoop framework contains libraries, a distributed file system (HDFS), and a resource-management platform, and implements a version of the MapReduce programming model for large-scale data processing. In this paper we investigate the possibilities of integrating Hadoop with R, a popular software package for statistical computing and data visualization. We present three ways of integrating them: R with Streaming, Rhipe, and RHadoop, and we emphasize the advantages and disadvantages of each solution. Comment: Romanian Statistical Review no. 2 / 201

    State of Our Estuaries 2006

    The 2006 State of the Estuaries Report includes twelve indicators intended to report on the health and environmental quality of New Hampshire’s estuaries. The New Hampshire Estuaries Project (NHEP) developed and now implements a Monitoring Plan to track environmental indicators, inform management decisions, and report on environmental progress and status. The Monitoring Plan describes the methods and data for 34 indicators used to determine if the environmental goals and objectives of the Management Plan are being met. For each indicator, the Monitoring Plan defines the monitoring objective, management goal, data quality objectives, data analysis and statistical methods, and data sources. Just as implementation of the Management Plan for New Hampshire’s estuaries involves the collaboration of many organizations and agencies, the NHEP Monitoring Plan relies on data compiled from organizations that are leaders in the management and protection of the state’s estuaries and coastal watershed resources.

    General specifications for the development of a USL NASA PC R and D statistical analysis support package

    The University of Southwestern Louisiana (USL) NASA PC R and D statistical analysis support package is designed as a three-level package that supports statistical analysis for a variety of applications within the USL Data Base Management System (DBMS) contract work. The design addresses usage of the statistical facilities as a library package, as an interactive statistical analysis system, and as a batch processing package.

    Fish assemblages and indicator species: reef fishes off the southeastern United States

    For many fish stocks, resource management cannot be based on stock assessment because data are insufficient, a situation that requires alternative approaches to management. One possible approach is to manage data-limited stocks as part of an assemblage and to determine the status of the entire unit by a data-rich indicator species. The utility of this approach was evaluated in analyses of 15 years of commercial and 34 years of recreational logbook data from reef fisheries off the southeastern United States coast. Multivariate statistical analyses successfully revealed three primary assemblages. Within assemblages, however, there was little evidence of synchrony in population dynamics of member species, and thus, no support for the use of indicator species. Nonetheless, assemblages could prove useful as management units. Their identification offers opportunities for implementing management to address such ecological considerations as bycatch and species interrelations.