Statistical structures for internet-scale data management

Author: Ntarmos N.
Triantafillou P.
Weikum G.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Efficient query processing in traditional database management systems relies on statistics on base data. For centralized systems, there is a rich body of research results on such statistics, from simple aggregates to more elaborate synopses such as sketches and histograms. For Internet-scale distributed systems, on the other hand, statistics management still poses major challenges. With the work in this paper we aim to endow peer-to-peer data management over structured overlays with the power associated with such statistical information, with emphasis on meeting the scalability challenge. To this end, we first contribute efficient, accurate, and decentralized algorithms that can compute key aggregates such as Count, CountDistinct, Sum, and Average. We show how to construct several types of histograms, such as simple Equi-Width, Average-Shifted Equi-Width, and Equi-Depth histograms. We present a full-fledged open-source implementation of these tools for distributed statistical synopses, and report on a comprehensive experimental performance evaluation of our contributions in terms of efficiency, accuracy, and scalability.
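
For orientation, the two basic histogram types named in the abstract can be sketched on a single machine as follows. This is a minimal, centralized illustration only; it is not the decentralized peer-to-peer algorithm the paper contributes, and the function names `equi_width` and `equi_depth` are hypothetical.

```python
from typing import List, Tuple

Bucket = Tuple[float, float, int]  # (lower bound, upper bound, count)


def equi_width(values: List[float], buckets: int) -> List[Bucket]:
    """Split the value range into `buckets` intervals of equal width."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / buckets or 1.0
    counts = [0] * buckets
    for v in values:
        # Clamp so the maximum value lands in the last bucket.
        idx = min(int((v - lo) / width), buckets - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i])
            for i in range(buckets)]


def equi_depth(values: List[float], buckets: int) -> List[Bucket]:
    """Split the sorted values into `buckets` groups of (near) equal size."""
    ordered = sorted(values)
    n = len(ordered)
    result = []
    for i in range(buckets):
        chunk = ordered[i * n // buckets:(i + 1) * n // buckets]
        if chunk:  # Skip empty groups when there are fewer values than buckets.
            result.append((chunk[0], chunk[-1], len(chunk)))
    return result


skewed = [1, 1, 1, 1, 1, 1, 1, 1, 50, 100]
width_hist = equi_width(skewed, 2)
depth_hist = equi_depth(skewed, 2)
```

On skewed data such as `skewed`, the equi-width buckets hold very uneven counts, while the equi-depth buckets hold equal counts but span very different value ranges.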

CiteSeerX

Springer - Publisher Connector

Enlighten

MPG.PuRe

USING DATA BASE MANAGEMENT SYSTEMS IN STATISTICAL DATA PROCESSING

Author: Veim Joan C.
Publication venue: Stern School of Business, New York University
Publication date: 01/01/1981

National and international statistical bureaus produce ca. 25,000 tables for publication each year, based on hundreds of inter-related object-types with thousands of attributes. It would appear that this environment should be well suited to the application of data base management techniques for the administration of the data. This paper presents a data oriented model of the statistical production process which is used as a basis for a review of the state of experience within statistical offices with commercially available data base management systems. We conclude with a presentation of some important data management facilities which must be enhanced or developed in order to support statistical production processing. Information Systems Working Papers Series

New York University Faculty Digital Archive

Interfaces between statistical analysis packages and the ESRI geographic information system

Author: Masuoka E.

Interfaces between ESRI's geographic information system (GIS) data files and real valued data files written to facilitate statistical analysis and display of spatially referenced multivariable data are described. An example of data analysis which utilized the GIS and the statistical analysis system is presented to illustrate the utility of combining the analytic capability of a statistical package with the data management and display features of the GIS.

NASA Technical Reports Server

Integrating R and Hadoop for Big Data Analysis

Author: Dragoescu Raluca Mariana
Oancea Bogdan
Publication date: 01/06/2014

Analyzing and working with big data can be very difficult using classical means like relational database management systems or desktop software packages for statistics and visualization. Instead, big data requires large clusters with hundreds or even thousands of computing nodes. Official statistics is increasingly considering big data for deriving new statistics because big data sources could produce more relevant and timely statistics than traditional sources. One of the software tools successfully and widely used for storage and processing of big data sets on clusters of commodity hardware is Hadoop. The Hadoop framework contains libraries, a distributed file system (HDFS), and a resource-management platform, and implements a version of the MapReduce programming model for large-scale data processing. In this paper we investigate the possibilities of integrating Hadoop with R, which is a popular software environment for statistical computing and data visualization. We present three ways of integrating them: R with Streaming, Rhipe, and RHadoop, and we emphasize the advantages and disadvantages of each solution. Comment: Romanian Statistical Review no. 2 / 201
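
The MapReduce programming model mentioned in the abstract can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions: it runs on one machine, uses no Hadoop, R, Rhipe, or RHadoop API, and the function names and the `key,value` record format are hypothetical choices made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(records: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Map step: parse each 'key,value' line into a (key, value) pair."""
    for line in records:
        key, value = line.split(",")
        yield key.strip(), float(value)


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Shuffle step: group all values that share a key."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[float]]) -> Dict[str, float]:
    """Reduce step: collapse each group to a single statistic (the mean)."""
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


def run_job(records: Iterable[str]) -> Dict[str, float]:
    """Chain the three steps, as a cluster framework would across nodes."""
    return reduce_phase(shuffle(map_phase(records)))


# Hypothetical input: one measurement per line, keyed by region.
sample = ["north,10", "south,4", "north,20", "south,6"]
means = run_job(sample)
```

In a real cluster the map and reduce steps run in parallel on different nodes and the shuffle moves data over the network; here the three functions are simply chained to show the data flow.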

arXiv.org e-Print Archive

Directory of Open Access Journals

State of Our Estuaries 2006

Author: Piscataqua Region Estuaries Partnership
Publication venue: University of New Hampshire Scholars' Repository
Publication date: 01/01/2006

The 2006 State of the Estuaries Report includes twelve indicators intended to report on the health and environmental quality of New Hampshire’s estuaries. The New Hampshire Estuaries Project (NHEP) developed and now implements a Monitoring Plan to track environmental indicators, inform management decisions, and report on environmental progress and status. The Monitoring Plan describes the methods and data for 34 indicators used to determine if the environmental goals and objectives of the Management Plan are being met. For each indicator, the Monitoring Plan defines the monitoring objective, management goal, data quality objectives, data analysis and statistical methods, and data sources. Just as implementation of the Management Plan for New Hampshire’s estuaries involves the collaboration of many organizations and agencies, the NHEP Monitoring Plan relies on data compiled from organizations that are leaders in the management and protection of the state’s estuaries and coastal watershed resources

UNH Scholars' Repository

General specifications for the development of a USL NASA PC R and D statistical analysis support package

Author: Bassari Jinous
Dominick Wayne D.
Triantafyllopoulos Spiros

The University of Southwestern Louisiana (USL) NASA PC R and D statistical analysis support package is designed to be a three-level package to allow statistical analysis for a variety of applications within the USL Data Base Management System (DBMS) contract work. The design addresses usage of the statistical facilities as a library package, as an interactive statistical analysis system, and as a batch processing package.

NASA Technical Reports Server

Fish assemblages and indicator species: reef fishes off the southeastern United States

Author: Shertzer Kyle W.
Williams Erik W.
Publication date: 01/01/2008

For many fish stocks, resource management cannot be based on stock assessment because data are insufficient, a situation that requires alternative approaches to management. One possible approach is to manage data-limited stocks as part of an assemblage and to determine the status of the entire unit by a data-rich indicator species. The utility of this approach was evaluated in analyses of 15 years of commercial and 34 years of recreational logbook data from reef fisheries off the southeastern United States coast. Multivariate statistical analyses successfully revealed three primary assemblages. Within assemblages, however, there was little evidence of synchrony in population dynamics of member species, and thus, no support for the use of indicator species. Nonetheless, assemblages could prove useful as management units. Their identification offers opportunities for implementing management to address such ecological considerations as bycatch and species interrelations.

Aquatic Commons