Statistical structures for internet-scale data management
Efficient query processing in traditional database management systems relies on statistics on base data. For centralized systems, there is a rich body of research results on such statistics, from simple aggregates to more elaborate synopses such as sketches and histograms. For Internet-scale distributed systems, on the other hand, statistics management still poses major challenges. With the work in this paper we aim to endow peer-to-peer data management over structured overlays with the power associated with such statistical information, with emphasis on meeting the scalability challenge. To this end, we first contribute efficient, accurate, and decentralized algorithms that can compute key aggregates such as Count, CountDistinct, Sum, and Average. We show how to construct several types of histograms, such as simple Equi-Width, Average-Shifted Equi-Width, and Equi-Depth histograms. We present a full-fledged open-source implementation of these tools for distributed statistical synopses and report on a comprehensive experimental evaluation of their efficiency, accuracy, and scalability.
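To make the two basic histogram types named in this abstract concrete, here is a minimal single-machine sketch in Python. It is not the paper's decentralized algorithm; the function names `equi_width` and `equi_depth` and the sample data are assumptions chosen only for illustration.

```python
from typing import List, Sequence


def equi_width(values: Sequence[float], buckets: int) -> List[int]:
    """Count values per bucket when every bucket spans the same value range."""
    lo, hi = min(values), max(values)
    # Fall back to width 1.0 when all values are equal (hypothetical choice).
    width = (hi - lo) / buckets or 1.0
    counts = [0] * buckets
    for v in values:
        # Clamp the index so the maximum value lands in the last bucket.
        idx = min(int((v - lo) / width), buckets - 1)
        counts[idx] += 1
    return counts


def equi_depth(values: Sequence[float], buckets: int) -> List[float]:
    """Return bucket upper boundaries so each bucket holds about the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    boundaries = []
    for i in range(buckets):
        # Rank of the last element in bucket i: ceil((i + 1) * n / buckets).
        rank = ((i + 1) * n + buckets - 1) // buckets
        boundaries.append(ordered[rank - 1])
    return boundaries


data = [1, 2, 2, 3, 3, 3, 10, 20]
width_counts = equi_width(data, 4)
depth_bounds = equi_depth(data, 4)
```

On skewed data such as this sample, the equi-width counts concentrate in the first bucket, while the equi-depth boundaries adapt to where the values actually lie.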
Using data base management systems in statistical data processing
National and international statistical bureaus produce ca. 25,000 tables for publication each year, based on hundreds of inter-related object-types with thousands of attributes. It would appear that this environment should be well suited to the application of data base management techniques for the administration of the data. This paper presents a data-oriented model of the statistical production process which is used as a basis for a review of the state of experience within statistical offices with commercially available data base management systems. We conclude with a presentation of some important data management facilities which must be enhanced or developed in order to support statistical production processing. Information Systems Working Papers Series
Interfaces between statistical analysis packages and the ESRI geographic information system
Interfaces are described between ESRI's geographic information system (GIS) data files and real-valued data files written to facilitate statistical analysis and display of spatially referenced multivariable data. An example of data analysis that used both the GIS and the statistical analysis system illustrates the utility of combining the analytic capability of a statistical package with the data management and display features of the GIS.
Integrating R and Hadoop for Big Data Analysis
Analyzing and working with big data can be very difficult using classical means like relational database management systems or desktop software packages for statistics and visualization. Instead, big data requires large clusters with hundreds or even thousands of computing nodes. Official statistics is increasingly considering big data for deriving new statistics because big data sources could produce more relevant and timely statistics than traditional sources. One of the software tools widely and successfully used for storage and processing of big data sets on clusters of commodity hardware is Hadoop. The Hadoop framework contains libraries, a distributed file system (HDFS), and a resource-management platform, and implements a version of the MapReduce programming model for large-scale data processing. In this paper we investigate the possibilities of integrating Hadoop with R, a popular software environment for statistical computing and data visualization. We present three ways of integrating them: R with Streaming, Rhipe, and RHadoop, and we emphasize the advantages and disadvantages of each solution. Comment: Romanian Statistical Review no. 2 / 201
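The MapReduce programming model mentioned in this abstract can be illustrated with a small in-process sketch. This is plain Python, not R and not the Hadoop, Rhipe, or RHadoop API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and only mirror the three conceptual stages.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three stages sequentially in one process (no cluster involved)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data", "big clusters"])
```

On a real cluster the map and reduce functions would run in parallel on many nodes and the grouping step would be handled by the framework; the sketch only shows how the stages fit together.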
State of Our Estuaries 2006
The 2006 State of the Estuaries Report includes twelve indicators intended to report on the health and environmental quality of New Hampshire’s estuaries. The New Hampshire Estuaries Project (NHEP) developed and now implements a Monitoring Plan to track environmental indicators, inform management decisions, and report on environmental progress and status. The Monitoring Plan describes the methods and data for 34 indicators used to determine if the environmental goals and objectives of the Management Plan are being met. For each indicator, the Monitoring Plan defines the monitoring objective, management goal, data quality objectives, data analysis and statistical methods, and data sources. Just as implementation of the Management Plan for New Hampshire’s estuaries involves the collaboration of many organizations and agencies, the NHEP Monitoring Plan relies on data compiled from organizations that are leaders in the management and protection of the state’s estuaries and coastal watershed resources.
General specifications for the development of a USL NASA PC R and D statistical analysis support package
The University of Southwestern Louisiana (USL) NASA PC R and D statistical analysis support package is designed to be a three-level package to allow statistical analysis for a variety of applications within the USL Data Base Management System (DBMS) contract work. The design addresses usage of the statistical facilities as a library package, as an interactive statistical analysis system, and as a batch processing package.
Fish assemblages and indicator species: reef fishes off the southeastern United States
For many fish stocks, resource management cannot be based on stock assessment because data are insufficient, a situation that requires alternative approaches to management. One possible approach is to manage data-limited stocks as part of an assemblage and to determine the status of the entire unit by a data-rich indicator species. The utility of this approach was evaluated in analyses of 15 years of commercial and 34 years of recreational logbook data from reef fisheries off the southeastern United States coast. Multivariate statistical analyses successfully revealed three primary assemblages. Within assemblages, however, there was little evidence of synchrony in population dynamics of member species, and thus, no support for the use of indicator species. Nonetheless, assemblages could prove useful as management units. Their identification offers opportunities for implementing management to address such ecological considerations as bycatch and species interrelations.