59 research outputs found

    Supporting Complex Scientific Database Schemas in a Grid Middleware

    Get PDF
    “This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder." “Copyright IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.” DOI: 10.1109/AINA.2009.129The volume of digital scientific data has increased considerably with advancing technologies of computing devices and scientific instruments. We are exploring the use of emerging Grid technologies for the management and manipulation of very large distributed scientific datasets. Taking as an example a terabyte-size scientific database with complex database schema, this paper focuses on the potential of a well-known Grid middleware - OGSA-DQP - for distributing such datasets. In particular, we investigate and extend the data type support in this system to handle a complex schema of a real scientific database - the Sloan Digital Sky Survey database

    AstroGrid-D: Grid Technology for Astronomical Science

    Full text link
    We present status and results of AstroGrid-D, a joint effort of astrophysicists and computer scientists to employ grid technology for scientific applications. AstroGrid-D provides access to a network of distributed machines with a set of commands as well as software interfaces. It allows simple use of computer and storage facilities and to schedule or monitor compute tasks and data management. It is based on the Globus Toolkit middleware (GT4). Chapter 1 describes the context which led to the demand for advanced software solutions in Astrophysics, and we state the goals of the project. We then present characteristic astrophysical applications that have been implemented on AstroGrid-D in chapter 2. We describe simulations of different complexity, compute-intensive calculations running on multiple sites, and advanced applications for specific scientific purposes, such as a connection to robotic telescopes. We can show from these examples how grid execution improves e.g. the scientific workflow. Chapter 3 explains the software tools and services that we adapted or newly developed. Section 3.1 is focused on the administrative aspects of the infrastructure, to manage users and monitor activity. Section 3.2 characterises the central components of our architecture: The AstroGrid-D information service to collect and store metadata, a file management system, the data management system, and a job manager for automatic submission of compute tasks. We summarise the successfully established infrastructure in chapter 4, concluding with our future plans to establish AstroGrid-D as a platform of modern e-Astronomy.Comment: 14 pages, 12 figures Subjects: data analysis, image processing, robotic telescopes, simulations, grid. Accepted for publication in New Astronom

    AstroGrid-D: Enhancing Astronomic Science with Grid Technology

    Get PDF
    We present AstroGrid-D, a project bringing together astronomers and experts in Grid technology to enhance astronomic science in many aspects. First, by sharing currently dispersed resources, scientists can calculate their models in more detail. Second, by developing new mechanisms to efficiently access and process existing datasets, scientific problems can be investigated that were until now impossible to solve. Third, by adopting Grid technology large instruments such as robotic telescopes and complex scientific workflows from data aquisition to analysis can be managed in an integrated manner. In this paper, we present prominent astronomic use cases, discuss requirements on a Grid middleware and present our approach to extend/augment existing middleware to facilitate the improvements mentioned above

    AstroGrid-D: Enhancing Astronomic Science with Grid Technology

    Get PDF
    We present AstroGrid-D, a project bringing together astronomers and experts in Grid technology to enhance astronomic science in many aspects. First, by sharing currently dispersed resources, scientists can calculate their models in more detail. Second, by developing new mechanisms to efficiently access and process existing datasets, scientific problems can be investigated that were until now impossible to solve. Third, by adopting Grid technology large instruments such as robotic telescopes and complex scientific workflows from data aquisition to analysis can be managed in an integrated manner. In this paper, we present prominent astronomic use cases, discuss requirements on a Grid middleware and present our approach to extend/augment existing middleware to facilitate the improvements mentioned above

    Formal Representation of the SS-DB Benchmark and Experimental Evaluation in EXTASCID

    Full text link
    Evaluating the performance of scientific data processing systems is a difficult task considering the plethora of application-specific solutions available in this landscape and the lack of a generally-accepted benchmark. The dual structure of scientific data coupled with the complex nature of processing complicate the evaluation procedure further. SS-DB is the first attempt to define a general benchmark for complex scientific processing over raw and derived data. It fails to draw sufficient attention though because of the ambiguous plain language specification and the extraordinary SciDB results. In this paper, we remedy the shortcomings of the original SS-DB specification by providing a formal representation in terms of ArrayQL algebra operators and ArrayQL/SciQL constructs. These are the first formal representations of the SS-DB benchmark. Starting from the formal representation, we give a reference implementation and present benchmark results in EXTASCID, a novel system for scientific data processing. EXTASCID is complete in providing native support both for array and relational data and extensible in executing any user code inside the system by the means of a configurable metaoperator. These features result in an order of magnitude improvement over SciDB at data loading, extracting derived data, and operations over derived data.Comment: 32 pages, 3 figure

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Full text link
    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also helps to provide an easy way for new practitioners to understand this complex area of research.Comment: 46 pages, 16 figures, Technical Repor

    Grids and the Virtual Observatory

    Get PDF
    We consider several projects from astronomy that benefit from the Grid paradigm and associated technology, many of which involve either massive datasets or the federation of multiple datasets. We cover image computation (mosaicking, multi-wavelength images, and synoptic surveys); database computation (representation through XML, data mining, and visualization); and semantic interoperability (publishing, ontologies, directories, and service descriptions)

    Regionally distributed architecture for dynamic e-learning environment (RDADeLE)

    Get PDF
    e-Learning is becoming an influential role as an economic method and a flexible mode of study in the institutions of higher education today which has a presence in an increasing number of college and university courses. e-Learning as system of systems is a dynamic and scalable environment. Within this environment, e-learning is still searching for a permanent, comfortable and serviceable position that is to be controlled, managed, flexible, accessible and continually up-to-date with the wider university structure. As most academic and business institutions and training centres around the world have adopted the e-learning concept and technology in order to create, deliver and manage their learning materials through the web, it has become the focus of investigation. However, management, monitoring and collaboration between these institutions and centres are limited. Existing technologies such as grid, web services and agents are promising better results. In this research a new architecture has been developed and adopted to make the e-learning environment more dynamic and scalable by dividing it into regional data grids which are managed and monitored by agents. Multi-agent technology has been applied to integrate each regional data grid with others in order to produce an architecture which is more scalable, reliable, and efficient. The result we refer to as Regionally Distributed Architecture for Dynamic e-Learning Environment (RDADeLE). Our RDADeLE architecture is an agent-based grid environment which is composed of components such as learners, staff, nodes, regional grids, grid services and Learning Objects (LOs). These components are built and organised as a multi-agent system (MAS) using the Java Agent Development (JADE) platform. The main role of the agents in our architecture is to control and monitor grid components in order to build an adaptable, extensible, and flexible grid-based e-learning system. Two techniques have been developed and adopted in the architecture to build LOs' information and grid services. The first technique is the XML-based Registries Technique (XRT). In this technique LOs' information is built using XML registries to be discovered by the learners. The registries are written in Dublin Core Metadata Initiative (DCMI) format. The second technique is the Registered-based Services Technique (RST). In this technique the services are grid services which are built using agents. The services are registered with the Directory Facilitator (DF) of a JADE platform in order to be discovered by all other components. All components of the RDADeLE system, including grid service, are built as a multi-agent system (MAS). Each regional grid in the first technique has only its own registry, whereas in the second technique the grid services of all regional grids have to be registered with the DF. We have evaluated the RDADeLE system guided by both techniques by building a simulation of the prototype. The prototype has a main interface which consists of the name of the system (RDADeLE) and a specification table which includes Number of Regional Grids, Number of Nodes, Maximum Number of Learners connected to each node, and Number of Grid Services to be filled by the administrator of the RDADeLE system in order to create the prototype. Using the RST technique shows that the RDADeLE system can be built with more regional grids with less memory consumption. Moreover, using the RST technique shows that more grid services can be registered in the RDADeLE system with a lower average search time and the search performance is increased compared with the XRT technique. Finally, using one or both techniques, the XRT or the RST, in the prototype does not affect the reliability of the RDADeLE system.Royal Commission for Jubail and Yanbu - Directorate General For Jubail Project Kingdom of Saudi Arabi

    Resource Allocation for Query Optimization in Data Grid Systems: Static Load Balancing Strategies

    Get PDF
    International audienceResource allocation is one of the principal stages of relational query processing in data grid systems. Static allocation methods allocate nodes to relational operations during query compilation. Existing heuristics did not take into account the multi-queries environment, where some nodes may become overloaded because they are allocated to too many concurrent queries. Dynamic resource allocation mechanisms are currently developed to modify the physical plan during query execution. In fact, when a node is detected to be overloaded, some of the operations on it will migrate. However, if the resource contention is too heavy in the initial execution plan, the operation migration cost may be very high. In this paper, we propose two load balancing strategies adopted during the static resource allocation phase, so that the workload is balanced at the beginning, the operation migration cost is decreased during the query execution, and therefore the average response time is reduced

    Navigating Diverse Datasets in the Face of Uncertainty

    Get PDF
    When exploring big volumes of data, one of the challenging aspects is their diversity of origin. Multiple files that have not yet been ingested into a database system may contain information of interest to a researcher, who must curate, understand and sieve their content before being able to extract knowledge. Performance is one of the greatest difficulties in exploring these datasets. On the one hand, examining non-indexed, unprocessed files can be inefficient. On the other hand, any processing before its understanding introduces latency and potentially un- necessary work if the chosen schema matches poorly the data. We have surveyed the state-of-the-art and, fortunately, there exist multiple proposal of solutions to handle data in-situ performantly. Another major difficulty is matching files from multiple origins since their schema and layout may not be compatible or properly documented. Most surveyed solutions overlook this problem, especially for numeric, uncertain data, as is typical in fields like astronomy. The main objective of our research is to assist data scientists during the exploration of unprocessed, numerical, raw data distributed across multiple files based solely on its intrinsic distribution. In this thesis, we first introduce the concept of Equally-Distributed Dependencies, which provides the foundations to match this kind of dataset. We propose PresQ, a novel algorithm that finds quasi-cliques on hypergraphs based on their expected statistical properties. The probabilistic approach of PresQ can be successfully exploited to mine EDD between diverse datasets when the underlying populations can be assumed to be the same. Finally, we propose a two-sample statistical test based on Self-Organizing Maps (SOM). This method can outperform, in terms of power, other classifier-based two- sample tests, being in some cases comparable to kernel-based methods, with the advantage of being interpretable. Both PresQ and the SOM-based statistical test can provide insights that drive serendipitous discoveries
    corecore