2,929 research outputs found

    Enabling On-Demand Database Computing with MIT SuperCloud Database Management System

    The MIT SuperCloud database management system allows for rapid creation and flexible execution of a variety of the latest scientific databases, including Apache Accumulo and SciDB. It is designed to permit these databases to run on a High Performance Computing Cluster (HPCC) platform as seamlessly as any other HPCC job. It ensures the seamless migration of the databases to the resources assigned by the HPCC scheduler and centralized storage of the database files when not running. It also permits snapshotting of databases to allow researchers to experiment and push the limits of the technology without concerns for data or productivity loss if the database becomes unstable. Comment: 6 pages; accepted to IEEE High Performance Extreme Computing (HPEC) conference 2015. arXiv admin note: text overlap with arXiv:1406.492
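    The abstract does not specify the system's interfaces, so the sketch below is hypothetical: it only illustrates the lifecycle the abstract describes (stage database files to scheduler-assigned resources, run, archive back to central storage, snapshot). The paths and function names are invented for illustration, not the MIT SuperCloud API.

```python
import datetime
import shutil
from pathlib import Path

# Hypothetical layout; the real system's storage scheme is not given in the abstract.
CENTRAL_STORE = Path("/central/databases")   # files live here when the DB is not running
SNAPSHOT_STORE = Path("/central/snapshots")

def stage_database(name: str, scratch: Path) -> Path:
    """Copy a database's files from central storage to the node-local
    scratch space assigned by the scheduler, so the server can run
    there like any other HPCC job."""
    dst = scratch / name
    shutil.copytree(CENTRAL_STORE / name, dst)
    return dst

def archive_database(name: str, scratch: Path) -> None:
    """Migrate the files back to centralized storage when the job ends."""
    dst = CENTRAL_STORE / name
    if dst.exists():
        shutil.rmtree(dst)
    shutil.copytree(scratch / name, dst)

def snapshot_database(name: str) -> Path:
    """Snapshot the centrally stored files so an experiment that leaves
    the database unstable can be rolled back without data loss."""
    stamp = datetime.datetime.now().strftime("%Y%m%dT%H%M%S")
    dst = SNAPSHOT_STORE / f"{name}-{stamp}"
    shutil.copytree(CENTRAL_STORE / name, dst)
    return dst
```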

    Performance comparison of point and spatial access methods

    In the past few years a large number of multidimensional point access methods, also called multiattribute index structures, have been suggested, all of them claiming good performance. Since no performance comparison of these structures under arbitrary (strongly correlated, nonuniform, in short "ugly") data distributions and under various types of queries has been performed, database researchers and designers were hesitant to use any of these new point access methods. As shown in a recent paper, such point access methods are not only important in traditional database applications. In new applications such as CAD/CIM and geographic or environmental information systems, access methods for spatial objects are needed. As recently shown, such access methods are based on point access methods in terms of functionality and performance. Our performance comparison naturally consists of two parts. In part I we compare multidimensional point access methods, whereas in part II spatial access methods for rectangles are compared. In part I we present a survey and classification of existing point access methods. Then we carefully select the following four methods for implementation and performance comparison under seven different data files (distributions) and various types of queries: the 2-level grid file, the BANG file, the hB-tree and a new scheme, called the BUDDY hash tree. We were surprised to see one method emerge as the clear winner: the BUDDY hash tree. It exhibits at least 20% better average performance than its competitors and is robust under ugly data and queries. In part II we compare spatial access methods for rectangles. After presenting a survey and classification of existing spatial access methods, we carefully selected the following four methods for implementation and performance comparison under six different data files (distributions) and various types of queries: the R-tree, the BANG file, PLOP hashing and the BUDDY hash tree. This comparison produced two winners: the BANG file and the BUDDY hash tree. This comparison is a first step towards a standardized testbed or benchmark. We offer our data and query files to each designer of a new point or spatial access method so that they can run their implementation in our testbed.
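    To make the object of comparison concrete, here is a deliberately simplified point access method: a uniform grid that hashes points into fixed-size cells, so a range query inspects only the cells overlapping the query rectangle. The real contenders above (2-level grid file, BANG file, hB-tree, BUDDY hash tree) adapt their directories to the data distribution, which is exactly what this toy index does not do.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]

class UniformGrid:
    """Toy 2-d point index: a static grid directory with one bucket per cell."""

    def __init__(self, cell: float = 1.0):
        self.cell = cell
        self.buckets: Dict[Tuple[int, int], List[Point]] = defaultdict(list)

    def _key(self, p: Point) -> Tuple[int, int]:
        return (int(p[0] // self.cell), int(p[1] // self.cell))

    def insert(self, p: Point) -> None:
        self.buckets[self._key(p)].append(p)

    def range_query(self, lo: Point, hi: Point) -> List[Point]:
        """Report all points inside the axis-aligned rectangle [lo, hi]."""
        out = []
        (x0, y0), (x1, y1) = self._key(lo), self._key(hi)
        for gx in range(x0, x1 + 1):
            for gy in range(y0, y1 + 1):
                for (px, py) in self.buckets.get((gx, gy), []):
                    if lo[0] <= px <= hi[0] and lo[1] <= py <= hi[1]:
                        out.append((px, py))
        return out

g = UniformGrid(cell=2.0)
for p in [(0.5, 0.5), (3.2, 1.1), (5.0, 5.0)]:
    g.insert(p)
print(g.range_query((0.0, 0.0), (4.0, 4.0)))   # [(0.5, 0.5), (3.2, 1.1)]
```

    On "ugly" (strongly correlated, nonuniform) data, most points fall into a few cells and this static grid degenerates; that is precisely the weakness the adaptive methods in the comparison are designed to avoid.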

    Viewpoints: A high-performance high-dimensional exploratory data analysis tool

    Scientific data sets continue to increase in both size and complexity. In the past, dedicated graphics systems at supercomputing centers were required to visualize large data sets, but as the price of commodity graphics hardware has dropped and its capability has increased, it is now possible, in principle, to view large complex data sets on a single workstation. To do this in practice, an investigator needs software written to take advantage of the relevant graphics hardware. The Viewpoints visualization package described herein is an example of such software. Viewpoints is an interactive tool for exploratory visual analysis of large, high-dimensional (multivariate) data. It leverages the capabilities of modern graphics boards (GPUs) to run on a single workstation or laptop. Viewpoints is minimalist: it attempts to do a small set of useful things very well (or at least very quickly) in comparison with similar packages today. Its basic feature set includes linked scatter plots with brushing, dynamic histograms, normalization, and outlier detection/removal. Viewpoints was originally designed for astrophysicists, but it has since been used in fields ranging from astronomy, quantum chemistry, fluid dynamics, machine learning, bioinformatics, and finance to information technology server log mining. In this article, we describe the Viewpoints package and show examples of its usage. Comment: 18 pages, 3 figures, PASP in press; this version corresponds more closely to that to be published
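    As an illustration of two items in that feature set, the following numpy sketch performs column-wise normalization and z-score outlier removal. The three-sigma rule here is an assumption for illustration, not Viewpoints' actual algorithm (the package itself is GPU-accelerated and interactive).

```python
import numpy as np

def normalize(data: np.ndarray) -> np.ndarray:
    """Scale each column (variable) to zero mean and unit variance."""
    mean, std = data.mean(axis=0), data.std(axis=0)
    std = np.where(std == 0, 1.0, std)   # leave constant columns untouched
    return (data - mean) / std

def remove_outliers(data: np.ndarray, z_max: float = 3.0) -> np.ndarray:
    """Drop rows in which any variable lies more than z_max standard
    deviations from its column mean (assumed threshold, for illustration)."""
    z = np.abs(normalize(data))
    return data[(z <= z_max).all(axis=1)]

rng = np.random.default_rng(0)
sample = rng.normal(size=(1000, 5))   # 1000 points, 5 dimensions
sample[0] = 50.0                      # plant an obvious outlier row
print(sample.shape, "->", remove_outliers(sample).shape)   # outlier row dropped
```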

    Best matching processes in distributed systems

    The growing complexity and dynamic behavior of modern manufacturing and service industries, along with competitive and globalized markets, have gradually transformed traditional centralized systems into distributed networks of e-Systems (electronic systems). Emerging examples include e-Factories, virtual enterprises, smart farms, automated warehouses, and intelligent transportation systems. These (and similar) distributed systems, regardless of context and application, have a property in common: they all involve certain types of interactions (collaborative, competitive, or both) among their distributed individuals—from clusters of passive sensors and machines to complex networks of computers, intelligent robots, humans, and enterprises. Having this common property, such systems may encounter common challenges in terms of suboptimal interactions and thus poor performance, caused by potential mismatches between individuals. For example, mismatched subassembly parts, vehicles—routes, suppliers—retailers, employees—departments, and products—automated guided vehicles—storage locations may lead to low-quality products, congested roads, unstable supply networks, conflicts, and low service level, respectively. This research refers to this problem as best matching, and investigates it as a major design principle of CCT, the Collaborative Control Theory. The original contribution of this research is to elaborate on the fundamentals of best matching in distributed and collaborative systems, by providing general frameworks for (1) systematic analysis, inclusive taxonomy, and analogical and structural comparison between different matching processes; (2) specification and formulation of problems, and development of algorithms and protocols for best matching; (3) validation of the models, algorithms, and protocols through extensive numerical experiments and case studies. The first goal is addressed by investigating matching problems in distributed production, manufacturing, supply, and service systems based on a recently developed reference model, the PRISM Taxonomy of Best Matching. Following the second goal, the identified problems are then formulated as mixed-integer programs. Due to the computational complexity of matching problems, various optimization algorithms are developed for solving different problem instances, including modified genetic algorithms, tabu search, and neighbourhood search heuristics. The dynamic and collaborative/competitive behaviors of matching processes in distributed settings are also formulated and examined through various collaboration, best matching, and task administration protocols. In line with the third goal, four case studies are conducted on various manufacturing, supply, and service systems to highlight the impact of best matching on their operational performance, including service level, utilization, stability, and cost-effectiveness, and to validate the computational merits of the developed solution methodologies.
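    A minimal, invented instance of best matching: when each individual on one side must be matched to exactly one on the other and the cost of every pairing is known, the problem reduces to the classical assignment problem, which the sketch below solves with SciPy. The thesis's mixed-integer formulations and protocols cover far more general settings.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Invented mismatch costs between 4 suppliers (rows) and 4 retailers (columns).
cost = np.array([
    [4.0, 2.0, 8.0, 5.0],
    [3.0, 7.0, 6.0, 4.0],
    [9.0, 4.0, 3.0, 2.0],
    [5.0, 6.0, 2.0, 7.0],
])

# Hungarian-style algorithm: find the one-to-one matching of minimum total cost.
rows, cols = linear_sum_assignment(cost)
for supplier, retailer in zip(rows, cols):
    print(f"supplier {supplier} -> retailer {retailer} (cost {cost[supplier, retailer]})")
print("total mismatch cost:", cost[rows, cols].sum())
```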

    Integrating Algorithmic and Systemic Load Balancing Strategies in Parallel Scientific Applications

    Load imbalance is a major source of performance degradation in parallel scientific applications. Load balancing increases the efficient use of existing resources and improves the performance of parallel applications running in distributed environments. At a coarse level of granularity, advances in runtime systems for parallel programs have been proposed in order to control available resources as efficiently as possible, by utilizing idle resources and using task migration. At a finer level of granularity, advances in algorithmic strategies for dynamically balancing computational loads by data redistribution have been proposed in order to respond to variations in processor performance during the execution of a given parallel application. Algorithmic and systemic load balancing strategies have complementary sets of advantages. An integration of these two techniques is possible, and it should result in a system that delivers advantages over each technique used in isolation. This thesis presents the design and implementation of a system that combines an algorithmic fine-grained data-parallel load balancing strategy called Fractiling with a systemic coarse-grained task-parallel load balancing system called Hector. It also reports on experimental results of running N-body simulations under this integrated system. The experimental results indicate that a distributed runtime environment which combines both algorithmic and systemic load balancing strategies can provide performance advantages with little overhead, underscoring the importance of this approach in large complex scientific applications.
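    Fractiling itself combines factoring-style scheduling with space-filling-curve tiling; the toy generator below sketches only the factoring idea, under assumed parameters: each batch hands out a fixed fraction of the remaining iterations, so early chunks are large (low scheduling overhead) and later chunks shrink, letting slower processors be compensated at the end of the run.

```python
def factoring_chunks(total: int, workers: int, alpha: float = 0.5):
    """Yield successively smaller chunk sizes: each batch splits a
    fraction `alpha` of the remaining iterations evenly among the
    workers. (A sketch of the factoring idea only, not Fractiling's
    actual tiling scheme.)"""
    remaining = total
    while remaining > 0:
        chunk = max(1, int(remaining * alpha / workers))
        for _ in range(workers):
            take = min(chunk, remaining)
            if take == 0:
                break
            yield take
            remaining -= take

# 1000 iterations over 4 workers: chunk sizes decay 125, 125, ..., 62, ..., 1
print(list(factoring_chunks(1000, workers=4)))
```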