4 research outputs found
Generic Metadata Handling in Scientific Data Life Cycles
Scientific data life cycles define how data is created, handled, accessed, and analyzed by users. Such data life cycles become increasingly sophisticated as the sciences they deal with become more and more demanding and complex with the coming advent of exascale data and computing. The overarching data life cycle management background includes multiple abstraction categories with data sources, data and metadata management, computing and workflow management, security, data sinks, and methods on how to enable utilization. Challenges in this context are manifold. One is to hide the complexity from the user and to enable seamlessness in using resources to usability and efficiency. Another one is to enable generic metadata management that is not restricted to one use case but can be adapted with limited effort to further ones.
Metadata management is essential to enable scientists to save time by avoiding the need for manually keeping track of data, meaning for example by its content and location. As the number of files grows into the millions, managing data without metadata becomes increasingly difficult. Thus, the solution is to employ metadata management to enable the organization of data based on information about it. Previously, use cases tended to only support highly specific or no metadata management at all. Now, a generic metadata management concept is available that can be used to efficiently integrate metadata capabilities with use cases.
The concept was implemented within the MoSGrid data life cycle that enables molecular simulations on distributed HPC-enabled data and computing infrastructures. The implementation enables easy-to-use and effective metadata management. Automated extraction, annotation, and indexing of metadata was designed, developed, integrated, and search capabilities provided via a seamless user interface. Further analysis runs can be directly started based on search results. A complete evaluation of the concept both in general and along the example implementation is presented. In conclusion, generic metadata management concept advances the state of the art in scientific date life cycle management
Descoberta de recursos para sistemas de escala arbitrarias
Doutoramento em InformáticaTecnologias de Computação DistribuÃda em larga escala tais como Cloud,
Grid, Cluster e Supercomputadores HPC estão a evoluir juntamente com a
emergência revolucionária de modelos de múltiplos núcleos (por exemplo:
GPU, CPUs num único die, Supercomputadores em single die, Supercomputadores
em chip, etc) e avanços significativos em redes e soluções de
interligação. No futuro, nós de computação com milhares de núcleos podem
ser ligados entre si para formar uma única unidade de computação
transparente que esconde das aplicações a complexidade e a natureza distribuÃda desses sistemas com múltiplos núcleos. A fim de beneficiar de forma
eficiente de todos os potenciais recursos nesses ambientes de computação
em grande escala com múltiplos núcleos ativos, a descoberta de recursos é um elemento crucial para explorar ao máximo as capacidade de todos
os recursos heterogéneos distribuÃdos, através do reconhecimento preciso e
localização desses recursos no sistema. A descoberta eficiente e escalável
de recursos ´e um desafio para tais sistemas futuros, onde os recursos e as
infira-estruturas de computação e comunicação subjacentes são altamente
dinâmicas, hierarquizadas e heterogéneas. Nesta tese, investigamos o problema
da descoberta de recursos no que diz respeito aos requisitos gerais da
escalabilidade arbitrária de ambientes de computação futuros com múltiplos
núcleos ativos. A principal contribuição desta tese ´e a proposta de uma
entidade de descoberta de recursos adaptativa hÃbrida (Hybrid Adaptive
Resource Discovery - HARD), uma abordagem de descoberta de recursos eficiente
e altamente escalável, construÃda sobre uma sobreposição hierárquica
virtual baseada na auto-organizaçãoo e auto-adaptação de recursos de processamento
no sistema, onde os recursos computacionais são organizados
em hierarquias distribuÃdas de acordo com uma proposta de modelo de
descriçãoo de recursos multi-camadas hierárquicas. Operacionalmente, em
cada camada, que consiste numa arquitetura ponto-a-ponto de módulos que,
interagindo uns com os outros, fornecem uma visão global da disponibilidade
de recursos num ambiente distribuÃdo grande, dinâmico e heterogéneo. O
modelo de descoberta de recursos proposto fornece a adaptabilidade e flexibilidade
para executar consultas complexas através do apoio a um conjunto
de caracterÃsticas significativas (tais como multi-dimensional, variedade e
consulta agregada) apoiadas por uma correspondência exata e parcial, tanto
para o conteúdo de objetos estéticos e dinâmicos. Simulações mostram
que o HARD pode ser aplicado a escalas arbitrárias de dinamismo, tanto
em termos de complexidade como de escala, posicionando esta proposta
como uma arquitetura adequada para sistemas futuros de múltiplos núcleos.
Também contribuÃmos com a proposta de um regime de gestão eficiente
dos recursos para sistemas futuros que podem utilizar recursos distribuÃos
de forma eficiente e de uma forma totalmente descentralizada. Além disso,
aproveitando componentes de descoberta (RR-RPs) permite que a nossa
plataforma de gestão de recursos encontre e aloque dinamicamente recursos
disponÃeis que garantam os parâmetros de QoS pedidos.Large scale distributed computing technologies such as Cloud, Grid, Cluster
and HPC supercomputers are progressing along with the revolutionary emergence
of many-core designs (e.g. GPU, CPUs on single die, supercomputers
on chip, etc.) and significant advances in networking and interconnect solutions.
In future, computing nodes with thousands of cores may be connected
together to form a single transparent computing unit which hides from applications
the complexity and distributed nature of these many core systems. In
order to efficiently benefit from all the potential resources in such large scale
many-core-enabled computing environments, resource discovery is the vital
building block to maximally exploit the capabilities of all distributed heterogeneous
resources through precisely recognizing and locating those resources
in the system. The efficient and scalable resource discovery is challenging for
such future systems where the resources and the underlying computation and
communication infrastructures are highly-dynamic, highly-hierarchical and
highly-heterogeneous. In this thesis, we investigate the problem of resource
discovery with respect to the general requirements of arbitrary scale future
many-core-enabled computing environments. The main contribution of this
thesis is to propose Hybrid Adaptive Resource Discovery (HARD), a novel
efficient and highly scalable resource-discovery approach which is built upon
a virtual hierarchical overlay based on self-organization and self-adaptation
of processing resources in the system, where the computing resources are
organized into distributed hierarchies according to a proposed hierarchical
multi-layered resource description model. Operationally, at each layer, it
consists of a peer-to-peer architecture of modules that, by interacting with
each other, provide a global view of the resource availability in a large,
dynamic and heterogeneous distributed environment. The proposed resource
discovery model provides the adaptability and flexibility to perform complex
querying by supporting a set of significant querying features (such as
multi-dimensional, range and aggregate querying) while supporting exact
and partial matching, both for static and dynamic object contents. The
simulation shows that HARD can be applied to arbitrary scales of dynamicity,
both in terms of complexity and of scale, positioning this proposal as a
proper architecture for future many-core systems. We also contributed to
propose a novel resource management scheme for future systems which
efficiently can utilize distributed resources in a fully decentralized fashion.
Moreover, leveraging discovery components (RR-RPs) enables our resource
management platform to dynamically find and allocate available resources
that guarantee the QoS parameters on demand
Parallele Simulation der globalen Beleuchtung in komplexen Architekturmodellen
von Olaf SchmidtPaderborn, Univ.-GH, Diss., 200