A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also help to provide an easy way for new practitioners to understand
this complex area of research. Comment: 46 pages, 16 figures, Technical Report
Mining Large Data Sets on Grids: Issues and Prospects
When data mining and knowledge discovery techniques must be used to analyze large amounts of data, high-performance parallel and distributed computers can help to provide better computational performance and, as a consequence, deeper and more meaningful results. Recently, grids, composed of large-scale, geographically distributed platforms working together, have emerged as effective architectures for high-performance decentralized computation. It is natural to consider grids as tools for distributed data-intensive applications such as data mining, but the underlying patterns of computation and data movement in such applications are different from those of more conventional high-performance computation. These differences require a different kind of grid, or at least a grid with significantly different emphases. This paper discusses the main issues, requirements, and design approaches for the implementation of grid-based knowledge discovery systems. Furthermore, some prospects and promising research directions in data-centric and knowledge-discovery-oriented grids are outlined.
Distributed Deployment of Code and Data on Grids Using Compression and Caching Techniques (original title: Deployment Distribuito di codice e dati su Grid mediante Tecniche di Compressione e di Caching)
Study, design and implementation of a scalable deployment system for Grids. The prototype multicasts large amounts of data by means of block decomposition with fingerprinting and distributed replication.
It uses compression and caching techniques to optimise network bandwidth and data access times, and to reuse data left over from previous deployments.
The system is optimised for sending sets of files to sets of nodes, all of which may be disjoint.
The library that was designed and implemented keeps deployment time nearly constant as the number of destination nodes grows, and maintains a relative efficiency that reaches up to 100% as the amount of data to be sent grows.
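The abstract above mentions block decomposition with fingerprinting and the reuse of data cached from earlier deployments. The following is only a minimal sketch of that general idea, not the thesis's actual library: the function names (`split_into_blocks`, `fingerprint`, `plan_transfer`, `reassemble`), the fixed block size and the choice of SHA-256 as the fingerprint are all assumptions made for illustration.

```python
import hashlib


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the payload into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def fingerprint(block: bytes) -> str:
    """Identify a block by its SHA-256 digest (an assumed fingerprint function)."""
    return hashlib.sha256(block).hexdigest()


def plan_transfer(data: bytes, cache: dict[str, bytes], block_size: int):
    """Return (manifest, missing).

    manifest: ordered list of fingerprints describing the whole payload.
    missing:  fingerprint -> block, only for blocks the destination cache lacks.
    """
    manifest = []
    missing = {}
    for block in split_into_blocks(data, block_size):
        fp = fingerprint(block)
        manifest.append(fp)
        if fp not in cache and fp not in missing:
            missing[fp] = block
    return manifest, missing


def reassemble(manifest: list[str], cache: dict[str, bytes]) -> bytes:
    """Rebuild the payload on the destination from cached blocks."""
    return b"".join(cache[fp] for fp in manifest)


# Hypothetical destination node with an initially empty block cache.
node_cache: dict[str, bytes] = {}

# First deployment: every block has to be sent.
first = b"AAAABBBBCCCC"
manifest1, missing1 = plan_transfer(first, node_cache, block_size=4)
node_cache.update(missing1)

# Second deployment shares two blocks with the first one.
second = b"AAAABBBBDDDD"
manifest2, missing2 = plan_transfer(second, node_cache, block_size=4)
node_cache.update(missing2)
```

In this sketch the second deployment only needs to ship the one block that the node has never seen, which is the kind of bandwidth saving the abstract attributes to caching. Compression and multicast to many nodes are left out to keep the example short.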
Resource aware load distribution strategies for scheduling divisible loads on large-scale data intensive computational grid systems
Ph.D. (Doctor of Philosophy)
Content rendering and interaction technologies for digital heritage systems
Existing digital heritage systems accommodate a huge amount of digital repository information; however, their content rendering and interaction components generally lack the more interesting functionality that allows better interaction with heritage contents. Many digital heritage libraries are simply collections of 2D images with associated metadata and textual content, i.e. little more than museum catalogues presented online. However, over the last few years, largely as a result of EU framework projects, some 3D representations of digital heritage objects are beginning to appear in a digital library context. In the cultural heritage domain, where researchers and museum visitors like to observe cultural objects as closely as possible and to feel their existence and use in the past, giving the user only 2D images along with textual descriptions significantly limits interaction and hence understanding of their heritage.
The availability of powerful content rendering technologies, such as 3D authoring tools to create 3D objects and heritage scenes, grid tools for rendering complex 3D scenes, gaming engines to display 3D interactively, and recent advances in motion capture technologies for embodied immersion, allows the development of unique solutions for enhancing user experience and interaction with digital heritage resources and objects, giving a higher level of understanding and greater benefit to the community.
This thesis describes DISPLAYS (Digital Library Services for Playing with Shared Heritage Resources), which is a novel conceptual framework where five unique services are proposed for digital content: creation, archival, exposition, presentation and interaction services. These services or tools are designed to allow the heritage community to create, interpret, use and explore digital heritage resources organised as an online exhibition (or virtual museum). This thesis presents innovative solutions for two of these services or tools: content creation, where a cost-effective render grid is proposed; and an interaction service, where a heritage scenario is presented online using a real-time motion capture and digital puppeteer solution for the user to explore their digital heritage through embodied immersive interaction.
Parallel and Distributed Astronomical Data Analysis on Grid Datafarm
A comprehensive study of the entire petabyte-scale archival data of astronomical observatories could yield new science and new knowledge in the field, but it has not been feasible so far for lack of an adequate data analysis environment. The Grid Datafarm architecture is designed for global petabyte-scale data-intensive computing. It provides a Grid file system with file replica management for fault tolerance and load balancing, together with parallel and distributed computing support for sets of files, to meet the requirements of such a comprehensive study. In this paper, we discuss worldwide parallel and distributed data analysis in observational astronomy. The archival data is stored, replicated and dispersed in a Gfarm file system. All the astronomical data analysis tools successfully access files in the Gfarm file system without any code modification, using a syscall hooking library, regardless of file replica locations. Performance evaluation of the parallel data analysis, carried out in several ways, shows that file-affinity process scheduling plays an essential role in scalable and efficient parallel file I/O performance. A data calibration tool shows scalable file I/O performance, achieving 5.9 GB/sec and 4.0 GB/sec for reading and writing FITS files, respectively, using 30 cluster nodes (60 CPUs). On-demand file replica creation mitigates the overhead of access concentration. Another tool shows a performance improvement by a factor of six for reading a shared file by creating file replicas.
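The abstract above credits file-affinity process scheduling, that is, running each process on a node that already holds a replica of its input file, for the scalable I/O results. The sketch below illustrates that general idea only; it is not the Gfarm scheduler or its API. The function name `schedule_by_file_affinity`, the replica map and the tie-breaking rule (least-loaded node, then node name) are assumptions chosen for the example.

```python
def schedule_by_file_affinity(files, replicas, nodes):
    """Assign one task per file, preferring nodes that hold a replica.

    files:    ordered list of file names, one task each.
    replicas: file name -> list of nodes storing a replica (hypothetical map).
    nodes:    list of available compute nodes.
    Returns a dict mapping file name -> chosen node.
    """
    load = {node: 0 for node in nodes}
    assignment = {}
    for name in files:
        # Nodes that can read the file locally, restricted to available nodes.
        holders = [n for n in replicas.get(name, []) if n in load]
        # Fall back to any node when no local replica exists (remote read).
        candidates = holders or list(nodes)
        # Balance load among candidates; break ties by node name.
        chosen = min(candidates, key=lambda n: (load[n], n))
        load[chosen] += 1
        assignment[name] = chosen
    return assignment


# Hypothetical cluster and replica layout.
nodes = ["n1", "n2", "n3"]
replicas = {
    "a.fits": ["n1"],
    "b.fits": ["n1", "n2"],
    "c.fits": ["n3"],
    "d.fits": [],
}
assignment = schedule_by_file_affinity(
    ["a.fits", "b.fits", "c.fits", "d.fits"], replicas, nodes
)
```

Here `b.fits` goes to `n2` rather than `n1` because both hold a replica and `n2` is idle, while `d.fits`, which has no replica on any listed node, falls back to the least-loaded node. A fuller model could create a replica on demand in that fallback case, as the abstract describes, instead of reading remotely.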