11 research outputs found

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also help to provide an easy way for new practitioners to understand this complex area of research. Comment: 46 pages, 16 figures, Technical Report

    Mining Large Data Sets on Grids: Issues and Prospects

    When data mining and knowledge discovery techniques must be used to analyze large amounts of data, high-performance parallel and distributed computers can help to provide better computational performance and, as a consequence, deeper and more meaningful results. Recently, grids, composed of large-scale, geographically distributed platforms working together, have emerged as effective architectures for high-performance decentralized computation. It is natural to consider grids as tools for distributed data-intensive applications such as data mining, but the underlying patterns of computation and data movement in such applications are different from those of more conventional high-performance computation. These differences require a different kind of grid, or at least a grid with significantly different emphases. This paper discusses the main issues, requirements, and design approaches for the implementation of grid-based knowledge discovery systems. Furthermore, some prospects and promising research directions in data-centric and knowledge-discovery-oriented grids are outlined.

    Distributed Deployment of Code and Data on Grids Using Compression and Caching Techniques

    Study, design and implementation of a scalable deployment system for Grids. The prototype multicasts large amounts of data by decomposing them into fingerprinted blocks and replicating those blocks in a distributed fashion. It uses compression and caching techniques to optimise network bandwidth and data access times, and to reuse data left over from previous deployments. The system is optimised for sending sets of files to sets of nodes, all of which may be disjoint. The library that was designed and implemented keeps the deployment time nearly constant as the number of destination nodes grows, and maintains a relative efficiency of up to 100% as the amount of data to be sent increases.

    Content rendering and interaction technologies for digital heritage systems

    Existing digital heritage systems accommodate a huge amount of digital repository information; however, their content rendering and interaction components generally lack the more interesting functionality that allows better interaction with heritage contents. Many digital heritage libraries are simply collections of 2D images with associated metadata and textual content, i.e. little more than museum catalogues presented online. However, over the last few years, largely as a result of EU framework projects, some 3D representations of digital heritage objects have begun to appear in a digital library context. In the cultural heritage domain, where researchers and museum visitors like to observe cultural objects as closely as possible and to feel their existence and use in the past, giving the user only 2D images along with textual descriptions significantly limits interaction and hence understanding of their heritage. The availability of powerful content rendering technologies, such as 3D authoring tools to create 3D objects and heritage scenes, grid tools for rendering complex 3D scenes, gaming engines to display 3D interactively, and recent advances in motion capture technologies for embodied immersion, allows the development of unique solutions for enhancing user experience and interaction with digital heritage resources and objects, giving a higher level of understanding and greater benefit to the community. This thesis describes DISPLAYS (Digital Library Services for Playing with Shared Heritage Resources), which is a novel conceptual framework where five unique services are proposed for digital content: creation, archival, exposition, presentation and interaction services. These services or tools are designed to allow the heritage community to create, interpret, use and explore digital heritage resources organised as an online exhibition (or virtual museum).
    This thesis presents innovative solutions for two of these services or tools: a content creation service, where a cost-effective render grid is proposed; and an interaction service, where a heritage scenario is presented online using a real-time motion capture and digital puppeteer solution so that users can explore their digital heritage through embodied immersive interaction.

    Applications Development for the Computational Grid


    Parallel and Distributed Astronomical Data Analysis on Grid Datafarm

    A comprehensive study of the entire petabyte-scale archival data of astronomical observatories could yield new science and new knowledge in the field, but it has not been feasible so far for lack of an adequate data analysis environment. The Grid Datafarm architecture is designed for global petabyte-scale data-intensive computing. It provides a Grid file system with file replica management for fault tolerance and load balancing, together with parallel and distributed computing support for sets of files, to meet the requirements of such a comprehensive study. In this paper, we discuss worldwide parallel and distributed data analysis in observational astronomy. The archival data is stored, replicated and dispersed in a Gfarm file system. All the astronomical data analysis tools successfully access files in the Gfarm file system without any code modification, regardless of file replica locations, by using a syscall hooking library. Performance evaluation of the parallel data analysis in several ways shows that file-affinity process scheduling plays an essential role in scalable and efficient parallel file I/O performance. A data calibration tool shows scalable file I/O performance, achieving 5.9 GB/sec and 4.0 GB/sec for reading and writing FITS files, respectively, using 30 cluster nodes (60 CPUs). On-demand file replica creation mitigates the overhead of access concentration. Another tool shows a sixfold performance improvement for reading a shared file when file replicas are created.