15 research outputs found

    Building the Future Internet through FIRE

    Get PDF
    The Internet as we know it today is the result of a continuous activity for improving network communications, end user services, computational processes and also information technology infrastructures. The Internet has become a critical infrastructure for the human-being by offering complex networking services and end-user applications that all together have transformed all aspects, mainly economical, of our lives. Recently, with the advent of new paradigms and the progress in wireless technology, sensor networks and information systems and also the inexorable shift towards everything connected paradigm, first as known as the Internet of Things and lately envisioning into the Internet of Everything, a data-driven society has been created. In a data-driven society, productivity, knowledge, and experience are dependent on increasingly open, dynamic, interdependent and complex Internet services. The challenge for the Internet of the Future design is to build robust enabling technologies, implement and deploy adaptive systems, to create business opportunities considering increasing uncertainties and emergent systemic behaviors where humans and machines seamlessly cooperate

    Proceedings of the 5th bwHPC Symposium

    Get PDF
    In modern science, the demand for more powerful and integrated research infrastructures is growing constantly to address computational challenges in data analysis, modeling and simulation. The bwHPC initiative, founded by the Ministry of Science, Research and the Arts and the universities in Baden-Württemberg, is a state-wide federated approach aimed at assisting scientists with mastering these challenges. At the 5th bwHPC Symposium in September 2018, scientific users, technical operators and government representatives came together for two days at the University of Freiburg. The symposium provided an opportunity to present scientific results that were obtained with the help of bwHPC resources. Additionally, the symposium served as a platform for discussing and exchanging ideas concerning the use of these large scientific infrastructures as well as its further development

    Parallel optimization algorithms for high performance computing : application to thermal systems

    Get PDF
    The need of optimization is present in every field of engineering. Moreover, applications requiring a multidisciplinary approach in order to make a step forward are increasing. This leads to the need of solving complex optimization problems that exceed the capacity of human brain or intuition. A standard way of proceeding is to use evolutionary algorithms, among which genetic algorithms hold a prominent place. These are characterized by their robustness and versatility, as well as their high computational cost and low convergence speed. Many optimization packages are available under free software licenses and are representative of the current state of the art in optimization technology. However, the ability of optimization algorithms to adapt to massively parallel computers reaching satisfactory efficiency levels is still an open issue. Even packages suited for multilevel parallelism encounter difficulties when dealing with objective functions involving long and variable simulation times. This variability is common in Computational Fluid Dynamics and Heat Transfer (CFD & HT), nonlinear mechanics, etc. and is nowadays a dominant concern for large scale applications. Current research in improving the performance of evolutionary algorithms is mainly focused on developing new search algorithms. Nevertheless, there is a vast knowledge of sequential well-performing algorithmic suitable for being implemented in parallel computers. The gap to be covered is efficient parallelization. Moreover, advances in the research of both new search algorithms and efficient parallelization are additive, so that the enhancement of current state of the art optimization software can be accelerated if both fronts are tackled simultaneously. The motivation of this Doctoral Thesis is to make a step forward towards the successful integration of Optimization and High Performance Computing capabilities, which has the potential to boost technological development by providing better designs, shortening product development times and minimizing the required resources. After conducting a thorough state of the art study of the mathematical optimization techniques available to date, a generic mathematical optimization tool has been developed putting a special focus on the application of the library to the field of Computational Fluid Dynamics and Heat Transfer (CFD & HT). Then the main shortcomings of the standard parallelization strategies available for genetic algorithms and similar population-based optimization methods have been analyzed. Computational load imbalance has been identified to be the key point causing the degradation of the optimization algorithm¿s scalability (i.e. parallel efficiency) in case the average makespan of the batch of individuals is greater than the average time required by the optimizer for performing inter-processor communications. It occurs because processors are often unable to finish the evaluation of their queue of individuals simultaneously and need to be synchronized before the next batch of individuals is created. Consequently, the computational load imbalance is translated into idle time in some processors. Several load balancing algorithms have been proposed and exhaustively tested, being extendable to any other population-based optimization method that needs to synchronize all processors after the evaluation of each batch of individuals. Finally, a real-world engineering application that consists on optimizing the refrigeration system of a power electronic device has been presented as an illustrative example in which the use of the proposed load balancing algorithms is able to reduce the simulation time required by the optimization tool.El aumento de las aplicaciones que requieren de una aproximación multidisciplinar para poder avanzar se constata en todos los campos de la ingeniería, lo cual conlleva la necesidad de resolver problemas de optimización complejos que exceden la capacidad del cerebro humano o de la intuición. En estos casos es habitual el uso de algoritmos evolutivos, principalmente de los algoritmos genéticos, caracterizados por su robustez y versatilidad, así como por su gran coste computacional y baja velocidad de convergencia. La multitud de paquetes de optimización disponibles con licencias de software libre representan el estado del arte actual en tecnología de optimización. Sin embargo, la capacidad de adaptación de los algoritmos de optimización a ordenadores masivamente paralelos alcanzando niveles de eficiencia satisfactorios es todavía una tarea pendiente. Incluso los paquetes adaptados al paralelismo multinivel tienen dificultades para gestionar funciones objetivo que requieren de tiempos de simulación largos y variables. Esta variabilidad es común en la Dinámica de Fluidos Computacional y la Transferencia de Calor (CFD & HT), mecánica no lineal, etc. y es una de las principales preocupaciones en aplicaciones a gran escala a día de hoy. La investigación actual que tiene por objetivo la mejora del rendimiento de los algoritmos evolutivos está enfocada principalmente al desarrollo de nuevos algoritmos de búsqueda. Sin embargo, ya se conoce una gran variedad de algoritmos secuenciales apropiados para su implementación en ordenadores paralelos. La tarea pendiente es conseguir una paralelización eficiente. Además, los avances en la investigación de nuevos algoritmos de búsqueda y la paralelización son aditivos, por lo que el proceso de mejora del software de optimización actual se verá incrementada si se atacan ambos frentes simultáneamente. La motivación de esta Tesis Doctoral es avanzar hacia una integración completa de las capacidades de Optimización y Computación de Alto Rendimiento para así impulsar el desarrollo tecnológico proporcionando mejores diseños, acortando los tiempos de desarrollo del producto y minimizando los recursos necesarios. Tras un exhaustivo estudio del estado del arte de las técnicas de optimización matemática disponibles a día de hoy, se ha diseñado una librería de optimización orientada al campo de la Dinámica de Fluidos Computacional y la Transferencia de Calor (CFD & HT). A continuación se han analizado las principales limitaciones de las estrategias de paralelización disponibles para algoritmos genéticos y otros métodos de optimización basados en poblaciones. En el caso en que el tiempo de evaluación medio de la tanda de individuos sea mayor que el tiempo medio que necesita el optimizador para llevar a cabo comunicaciones entre procesadores, se ha detectado que la causa principal de la degradación de la escalabilidad o eficiencia paralela del algoritmo de optimización es el desequilibrio de la carga computacional. El motivo es que a menudo los procesadores no terminan de evaluar su cola de individuos simultáneamente y deben sincronizarse antes de que se cree la siguiente tanda de individuos. Por consiguiente, el desequilibrio de la carga computacional se convierte en tiempo de inactividad en algunos procesadores. Se han propuesto y testado exhaustivamente varios algoritmos de equilibrado de carga aplicables a cualquier método de optimización basado en una población que necesite sincronizar los procesadores tras cada tanda de evaluaciones. Finalmente, se ha presentado como ejemplo ilustrativo un caso real de ingeniería que consiste en optimizar el sistema de refrigeración de un dispositivo de electrónica de potencia. En él queda demostrado que el uso de los algoritmos de equilibrado de carga computacional propuestos es capaz de reducir el tiempo de simulación que necesita la herramienta de optimización

    Ontology matching in a distributed environment

    Get PDF

    Interactive High Performance Volume Rendering

    Get PDF
    This thesis is about Direct Volume Rendering on high performance computing systems. As direct rendering methods do not create a lower-dimensional geometric representation, the whole scientific dataset must be kept in memory. Thus, this family of algorithms has a tremendous resource demand. Direct Volume Rendering algorithms in general are well suited to be implemented for dedicated graphics hardware. Nevertheless, high performance computing systems often do not provide resources for hardware accelerated rendering, so that the visualization algorithm must be implemented for the available general-purpose hardware. Ever growing datasets that imply copying large amounts of data from the compute system to the workstation of the scientist, and the need to review intermediate simulation results, make porting Direct Volume Rendering to high performance computing systems highly relevant. The contribution of this thesis is twofold. As part of the first contribution, after devising a software architecture for general implementations of Direct Volume Rendering on highly parallel platforms, parallelization issues and implementation details for various modern architectures are discussed. The contribution results in a highly parallel implementation that tackles several platforms. The second contribution is concerned with the display phase of the “Distributed Volume Rendering Pipeline”. Rendering on a high performance computing system typically implies displaying the rendered result at a remote location. This thesis presents a remote rendering technique that is capable of hiding latency and can thus be used in an interactive environment

    Infrastructural Security for Virtualized Grid Computing

    Get PDF
    The goal of the grid computing paradigm is to make computer power as easy to access as an electrical power grid. Unlike the power grid, the computer grid uses remote resources located at a service provider. Malicious users can abuse the provided resources, which not only affects their own systems but also those of the provider and others. Resources are utilized in an environment where sensitive programs and data from competitors are processed on shared resources, creating again the potential for misuse. This is one of the main security issues, since in a business environment competitors distrust each other, and the fear of industrial espionage is always present. Currently, human trust is the strategy used to deal with these threats. The relationship between grid users and resource providers ranges from highly trusted to highly untrusted. This wide trust relationship occurs because grid computing itself changed from a research topic with few users to a widely deployed product that included early commercial adoption. The traditional open research communities have very low security requirements, while in contrast, business customers often operate on sensitive data that represents intellectual property; thus, their security demands are very high. In traditional grid computing, most users share the same resources concurrently. Consequently, information regarding other users and their jobs can usually be acquired quite easily. This includes, for example, that a user can see which processes are running on another user´s system. For business users, this is unacceptable since even the meta-data of their jobs is classified. As a consequence, most commercial customers are not convinced that their intellectual property in the form of software and data is protected in the grid. This thesis proposes a novel infrastructural security solution that advances the concept of virtualized grid computing. The work started back in 2007 and led to the development of the XGE, a virtual grid management software. The XGE itself uses operating system virtualization to provide a virtualized landscape. Users’ jobs are no longer executed in a shared manner; they are executed within special sandboxed environments. To satisfy the requirements of a traditional grid setup, the solution can be coupled with an installed scheduler and grid middleware on the grid head node. To protect the prominent grid head node, a novel dual-laned demilitarized zone is introduced to make attacks more difficult. In a traditional grid setup, the head node and the computing nodes are installed in the same network, so a successful attack could also endanger the user´s software and data. While the zone complicates attacks, it is, as all security solutions, not a perfect solution. Therefore, a network intrusion detection system is enhanced with grid specific signatures. A novel software called Fence is introduced that supports end-to-end encryption, which means that all data remains encrypted until it reaches its final destination. It transfers data securely between the user´s computer, the head node and the nodes within the shielded, internal network. A lightweight kernel rootkit detection system assures that only trusted kernel modules can be loaded. It is no longer possible to load untrusted modules such as kernel rootkits. Furthermore, a malware scanner for virtualized grids scans for signs of malware in all running virtual machines. Using virtual machine introspection, that scanner remains invisible for most types of malware and has full access to all system calls on the monitored system. To speed up detection, the load is distributed to multiple detection engines simultaneously. To enable multi-site service-oriented grid applications, the novel concept of public virtual nodes is presented. This is a virtualized grid node with a public IP address shielded by a set of dynamic firewalls. It is possible to create a set of connected, public nodes, either present on one or more remote grid sites. A special web service allows users to modify their own rule set in both directions and in a controlled manner. The main contribution of this thesis is the presentation of solutions that convey the security of grid computing infrastructures. This includes the XGE, a software that transforms a traditional grid into a virtualized grid. Design and implementation details including experimental evaluations are given for all approaches. Nearly all parts of the software are available as open source software. A summary of the contributions and an outlook to future work conclude this thesis

    A multi-tier cached I/O architecture for massively parallel supercomputers

    Get PDF
    Recent advances in storage technologies and high performance interconnects have made possible in the last years to build, more and more potent storage systems that serve thousands of nodes. The majority of storage systems of clusters and supercomputers from Top 500 list are managed by one of three scalable parallel file systems: GPFS, PVFS, and Lustre. Most large-scale scientific parallel applications are written in Message Passing Interface (MPI), which has become the de-facto standard for scalable distributed memory machines. One part of the MPI standard is related to I/O and has among its main goals the portability and efficiency of file system accesses. All of the above mentioned parallel file systems may be accessed also through the MPI-IO interface. The I/O access patterns of scientific parallel applications often consist of accesses to a large number of small, non-contiguous pieces of data. For small file accesses the performance is dominated by the latency of network transfers and disks. Parallel scientific applications lead to interleaved file access patterns with high interprocess spatial locality at the I/O nodes. Additionally, scientific applications exhibit repetitive behaviour when a loop or a function with loops issues I/O requests. When I/O access patterns are repetitive, caching and prefetching can effectively mask their access latency. These characteristics of the access patterns motivated several researchers to propose parallel I/O optimizations both at library and file system levels. However, these optimizations are not always integrated across different layers in the systems. In this dissertation we propose a novel generic parallel I/O architecture for clusters and supercomputers. Our design is aimed at large-scale parallel architectures with thousands of compute nodes. Besides acting as middleware for existing parallel file systems, our architecture provides on-line virtualization of storage resources. Another objective of this thesis is to factor out the common parallel I/O functionality from clusters and supercomputers in generic modules in order to facilitate porting of scientific applications across these platforms. Our solution is based on a multi-tier cache architecture, collective I/O, and asynchronous data staging strategies hiding the latency of data transfer between cache tiers. The thesis targets to reduce the file access latency perceived by the data-intensive parallel scientific applications by multi-layer asynchronous data transfers. In order to accomplish this objective, our techniques leverage the multi-core architectures by overlapping computation with communication and I/O in parallel threads. Prototypes of our solutions have been deployed on both clusters and Blue Gene supercomputers. Performance evaluation shows that the combination of collective strategies with overlapping of computation, communication, and I/O may bring a substantial performance benefit for access patterns common for parallel scientific applications.-----------------------------------------------------------------------------------------------------------------------------En los últimos años se ha observado un incremento sustancial de la cantidad de datos producidos por las aplicaciones científicas paralelas y de la necesidad de almacenar estos datos de forma persistente. Los sistemas de ficheros paralelos como PVFS, Lustre y GPFS han ofrecido una solución escalable para esta demanda creciente de almacenamiento. La mayoría de las aplicaciones científicas son escritas haciendo uso de la interfaz de paso de mensajes (MPI), que se ha convertido en un estándar de-facto de programación para las arquitecturas de memoria distribuida. Las aplicaciones paralelas que usan MPI pueden acceder a los sistemas de ficheros paralelos a través de la interfaz ofrecida por MPI-IO. Los patrones de acceso de las aplicaciones científicas paralelas consisten en un gran número de accesos pequeños y no contiguos. Para tamaños de acceso pequeños, el rendimiento viene limitado por la latencia de las transferencias de red y disco. Además, las aplicaciones científicas llevan a cabo accesos con una alta localidad espacial entre los distintos procesos en los nodos de E/S. Adicionalmente, las aplicaciones científicas presentan típicamente un comportamiento repetitivo. Cuando los patrones de acceso de E/S son repetitivos, técnicas como escritura demorada y lectura adelantada pueden enmascarar de forma eficiente las latencias de los accesos de E/S. Estas características han motivado a muchos investigadores en proponer optimizaciones de E/S tanto a nivel de biblioteca como a nivel del sistema de ficheros. Sin embargo, actualmente estas optimizaciones no se integran siempre a través de las distintas capas del sistema. El objetivo principal de esta tesis es proponer una nueva arquitectura genérica de E/S paralela para clusters y supercomputadores. Nuestra solución está basada en una arquitectura de caches en varias capas, una técnica de E/S colectiva y estrategias de acceso asíncronas que ocultan la latencia de transferencia de datos entre las distintas capas de caches. Nuestro diseño está dirigido a arquitecturas paralelas escalables con miles de nodos de cómputo. Además de actuar como middleware para los sistemas de ficheros paralelos existentes, nuestra arquitectura debe proporcionar virtualización on-line de los recursos de almacenamiento. Otro de los objeticos marcados para esta tesis es la factorización de las funcionalidades comunes en clusters y supercomputadores, en módulos genéricos que faciliten el despliegue de las aplicaciones científicas a través de estas plataformas. Se han desplegado distintos prototipos de nuestras soluciones tanto en clusters como en supercomputadores. Las evaluaciones de rendimiento demuestran que gracias a la combicación de las estratégias colectivas de E/S y del solapamiento de computación, comunicación y E/S, se puede obtener una sustancial mejora del rendimiento en los patrones de acceso anteriormente descritos, muy comunes en las aplicaciones paralelas de caracter científico

    GRID superscalar: a programming model for the Grid

    Get PDF
    Durant els darrers anys el Grid ha sorgit com una nova plataforma per la computació distribuïda. La tecnologia Gris permet unir diferents recursos de diferents dominis administratius i formar un superordinador virtual amb tots ells. Molts grups de recerca han dedicat els seus esforços a desenvolupar un conjunt de serveis bàsics per oferir un middleware de Grid: una capa que permet l'ús del Grid. De tota manera, utilitzar aquests serveis no és una tasca fácil per molts usuaris finals, cosa que empitjora si l'expertesa d'aquests usuaris no està relacionada amb la informàtica.Això té una influència negativa a l'hora de que la comunitat científica adopti la tecnologia Grid. Es veu com una tecnologia potent però molt difícil de fer servir. Per facilitar l'ús del Grid és necessària una capa extra que amagui la complexitat d'aquest i permeti als usuaris programar o portar les seves aplicacions de manera senzilla.Existeixen moltes propostes d'eines de programació pel Grid. En aquesta tesi fem un resum d'algunes d'elles, i podem veure que existeixen eines conscients i no-conscients del Grid (es programen especificant o no els detalls del Grid, respectivament). A més, molt poques d'aquestes eines poden explotar el paral·lelisme implícit de l'aplicació, i en la majoria d'elles, l'usuari ha de definir aquest paral·lelisme de manera explícita. Una altra característica que considerem important és si es basen en llenguatges de programació molt populars (com C++ o Java), cosa que facilita l'adopció per part dels usuaris finals.En aquesta tesi, el nostre objectiu principal ha estat crear un model de programació pel Grid basat en la programació seqüencial i els llenguatges més coneguts de la programació imperativa, capaç d'explotar el paral·lelisme implícit de les aplicacions i d'accelerar-les fent servir els recursos del Grid de manera concurrent. A més, com el Grid és de naturalesa distribuïda, heterogènia i dinàmica i degut també a que el nombre de recursos que pot formar un Grid pot ser molt gran, la probabilitat de que es produeixi una errada durant l'execució d'una aplicació és elevada. Per tant, un altre dels nostres objectius ha estat tractar qualsevol tipus d'error que pugui sorgir durant l'execució d'una aplicació de manera automàtica (ja siguin errors relacionats amb l'aplicació o amb el Grid). GRID superscalar (GRIDSs), la principal contribució d'aquesta tesi, és un model de programació que assoleix elsobjectius mencionats proporcionant una interfície molt petita i simple i un entorn d'execució que és capaç d'executar en paral·lel el codi proporcionat fent servir el Grid. La nostra interfície de programació permet a un usuari programar una aplicació no-conscient del Grid, amb llenguatges imperatius coneguts i populars (com C/C++, Java, Perl o Shell script) i de manera seqüencial, per tant dóna un pas important per ajudar als usuaris a adoptar la tecnologia Grid.Hem aplicat el nostre coneixement de l'arquitectura de computadors i el disseny de microprocessadors a l'entorn d'execució de GRIDSs. Tal com es fa a un processador superescalar, l'entorn d'execució de GRIDSs és capaç de realitzar un anàlisi de dependències entre les tasques que formen l'aplicació, i d'aplicar tècniques de renombrament per incrementar el seu paral·lelisme. GRIDSs genera automàticament a partir del codi principal de l'usuari un graf que descriu les dependències de dades en l'aplicació. També presentem casos d'ús reals del model de programació en els camps de la química computacional i la bioinformàtica, que demostren que els nostres objectius han estat assolits.Finalment, hem estudiat l'aplicació de diferents tècniques per detectar i tractar fallades: checkpoint, reintent i replicació de tasques. La nostra proposta és proporcionar un entorn capaç de tractar qualsevol tipus d'errors, de manera transparent a l'usuari sempre que sigui possible. El principal avantatge d'implementar aquests mecanismos al nivell del model de programació és que el coneixement a nivell de l'aplicació pot ser explotat per crear dinàmicament una estratègia de tolerància a fallades per cada aplicació, i evitar introduir sobrecàrrega en entorns lliures d'errors.During last years, the Grid has emerged as a new platform for distributed computing. The Grid technology allows joining different resources from different administrative domains and forming a virtual supercomputer with all of them.Many research groups have dedicated their efforts to develop a set of basic services to offer a Grid middleware: a layer that enables the use of the Grid. Anyway, using these services is not an easy task for many end users, even more if their expertise is not related to computer science. This has a negative influence in the adoption of the Grid technology by the scientific community. They see it as a powerful technology but very difficult to exploit. In order to ease the way the Grid must be used, there is a need for an extra layer which hides all the complexity of the Grid, and allows users to program or port their applications in an easy way.There has been many proposals of programming tools for the Grid. In this thesis we give an overview on some of them, and we can see that there exist both Grid-aware and Grid-unaware environments (programmed with or without specifying details of the Grid respectively). Besides, very few existing tools can exploit the implicit parallelism of the application and in the majority of them, the user must define the parallelism explicitly. Another important feature we consider is if they are based in widely used programming languages (as C++ or Java), so the adoption is easier for end users.In this thesis, our main objective has been to create a programming model for the Grid based on sequential programming and well-known imperative programming languages, able to exploit the implicit parallelism of applications and to speed them up by using the Grid resources concurrently. Moreover, because the Grid has a distributed, heterogeneous and dynamic nature and also because the number of resources that form a Grid can be very big, the probability that an error arises during an application's execution is big. Thus, another of our objectives has been to automatically deal with any type of errors which may arise during the execution of the application (application related or Grid related).GRID superscalar (GRIDSs), the main contribution of this thesis, is a programming model that achieves these mentioned objectives by providing a very small and simple interface and a runtime that is able to execute in parallel the code provided using the Grid. Our programming interface allows a user to program a Grid-unaware application with already known and popular imperative languages (such as C/C++, Java, Perl or Shell script) and in a sequential fashion, therefore giving an important step to assist end users in the adoption of the Grid technology.We have applied our knowledge from computer architecture and microprocessor design to the GRIDSs runtime. As it is done in a superscalar processor, the GRIDSs runtime system is able to perform a data dependence analysis between the tasks that form an application, and to apply renaming techniques in order to increase its parallelism. GRIDSs generates automatically from user's main code a graph describing the data dependencies in the application.We present real use cases of the programming model in the fields of computational chemistry and bioinformatics, which demonstrate that our objectives have been achieved.Finally, we have studied the application of several fault detection and treatment techniques: checkpointing, task retry and task replication. Our proposal is to provide an environment able to deal with all types of failures, transparently for the user whenever possible. The main advantage in implementing these mechanisms at the programming model level is that application-level knowledge can be exploited in order to dynamically create a fault tolerance strategy for each application, and avoiding to introduce overhead in error-free environments

    Security for Service-Oriented On-Demand Grid Computing

    Get PDF
    Grid Computing ist mittlerweile zu einem etablierten Standard für das verteilte Höchstleistungsrechnen geworden. Während die erste Generation von Grid Middleware-Systemen noch mit proprietären Schnittstellen gearbeitet hat, wurde durch die Einführung von service-orientierten Standards wie WSDL und SOAP durch die Open Grid Services Architecture (OGSA) die Interoperabilität von Grids signifikant erhöht. Dies hat den Weg für mehrere nationale und internationale Grid-Projekten bereitet, in denen eine groß e Anzahl von akademischen und eine wachsende Anzahl von industriellen Anwendungen im Grid ausgeführt werden, die die bedarfsgesteuerte (on-demand) Provisionierung und Nutzung von Ressourcen erfordern. Bedarfsgesteuerte Grids zeichnen sich dadurch aus, dass sowohl die Software, als auch die Benutzer einer starken Fluktuation unterliegen. Weiterhin sind sowohl die Software, als auch die Daten, auf denen operiert wird, meist proprietär und haben einen hohen finanziellen Wert. Dies steht in starkem Kontrast zu den heutigen Grid-Anwendungen im akademischen Umfeld, die meist offen im Quellcode vorliegen bzw. frei verfügbar sind. Um den Ansprüchen einer bedarfsgesteuerten Grid-Nutzung gerecht zu werden, muss das Grid administrative Komponenten anbieten, mit denen Anwender autonom Software installieren können, selbst wenn diese Root-Rechte benötigen. Zur gleichen Zeit muss die Sicherheit des Grids erhöht werden, um Software, Daten und Meta-Daten der kommerziellen Anwender zu schützen. Dies würde es dem Grid auch erlauben als Basistechnologie für das gerade entstehende Gebiet des Cloud Computings zu dienen, wo ähnliche Anforderungen existieren. Wie es bei den meisten komplexen IT-Systemen der Fall ist, sind auch in traditionellen Grid Middlewares Schwachstellen zu finden, die durch die geforderten Erweiterungen der administrativen Möglichkeiten potentiell zu einem noch größ erem Problem werden. Die Schwachstellen in der Grid Middleware öffnen einen homogenen Angriffsvektor auf die ansonsten heterogenen und meist privaten Cluster-Umgebungen. Hinzu kommt, dass anders als bei den privaten Cluster-Umgebungen und kleinen akademischen Grid-Projekten die angestrebten groß en und offenen Grid-Landschaften die Administratoren mit gänzlich unbekannten Benutzern und Verhaltenstrukturen konfrontieren. Dies macht das Erkennen von böswilligem Verhalten um ein Vielfaches schwerer. Als Konsequenz werden Grid-Systeme ein immer attraktivere Ziele für Angreifer, da standardisierte Zugriffsmöglichkeiten Angriffe auf eine groß e Anzahl von Maschinen und Daten von potentiell hohem finanziellen Wert ermöglichen. Während die Rechenkapazität, die Bandbreite und der Speicherplatz an sich schon attraktive Ziele darstellen können, sind die im Grid enthaltene Software und die gespeicherten Daten viel kritischere Ressourcen. Modelldaten für die neuesten Crash-Test Simulationen, eine industrielle Fluid-Simulation, oder Rechnungsdaten von Kunden haben einen beträchtlichen Wert und müssen geschützt werden. Wenn ein Grid-Anbieter nicht für die Sicherheit von Software, Daten und Meta-Daten sorgen kann, wird die industrielle Verbreitung der offenen Grid-Technologie nicht stattfinden. Die Notwendigkeit von strikten Sicherheitsmechanismen muss mit der diametral entgegengesetzten Forderung nach einfacher und schneller Integration von neuer Software und neuen Kunden in Einklang gebracht werden. In dieser Arbeit werden neue Ansätze zur Verbesserung der Sicherheit und Nutzbarkeit von service-orientiertem bedarfsgesteuertem Grid Computing vorgestellt. Sie ermöglichen eine autonome und sichere Installation und Nutzung von komplexer, service-orientierter und traditioneller Software auf gemeinsam genutzen Ressourcen. Neue Sicherheitsmechanismen schützen Software, Daten und Meta-Daten der Anwender vor anderen Anwendern und vor externen Angreifern. Das System basiert auf Betriebssystemvirtualisierungstechnologien und bietet dynamische Erstellungs- und Installationsfunktionalitäten für virtuelle Images in einer sicheren Umgebung, in der automatisierte Mechanismen anwenderspezifische Firewall-Regeln setzen, um anwenderbezogene Netzwerkpartitionen zu erschaffen. Die Grid-Umgebung wird selbst in mehrere Bereiche unterteilt, damit die Kompromittierung von einzelnen Komponenten nicht so leicht zu einer Gefährdung des gesamten Systems führen kann. Die Grid-Headnode und der Image-Erzeugungsserver werden jeweils in einzelne Bereiche dieser demilitarisierten Zone positioniert. Um die sichere Anbindung von existierenden Geschäftsanwendungen zu ermöglichen, werden der BPEL-Standard (Business Process Execution Language) und eine Workflow-Ausführungseinheit um Grid-Sicherheitskonzepte erweitert. Die Erweiterung erlaubt eine nahtlose Integration von geschützten Grid Services mit existierenden Web Services. Die Workflow-Ausführungseinheit bietet die Erzeugung und die Erneuerung (im Falle von lange laufenden Anwendungen) von Proxy-Zertifikaten. Der Ansatz ermöglicht die sichere gemeinsame Ausführung von neuen, fein-granularen, service-orientierten Grid Anwendungen zusammen mit traditionellen Batch- und Job-Farming Anwendungen. Dies wird durch die Integration des vorgestellten Grid Sandboxing-Systems in existierende Cluster Scheduling Systeme erreicht. Eine innovative Server-Rotationsstrategie sorgt für weitere Sicherheit für den Grid Headnode Server, in dem transparent das virtuelle Server Image erneuert wird und damit auch unbekannte und unentdeckte Angriffe neutralisiert werden. Um die Angriffe, die nicht verhindert werden konnten, zu erkennen, wird ein neuartiges Intrusion Detection System vorgestellt, das auf Basis von Datenstrom-Datenbanksystemen funktioniert. Als letzte Neuerung dieser Arbeit wird eine Erweiterung des modellgetriebenen Softwareentwicklungsprozesses eingeführt, die eine automatisierte Generierung von sicheren Grid Services ermöglicht, um die komplexe und damit unsichere manuelle Erstellung von Grid Services zu ersetzen. Eine prototypische Implementierung der Konzepte wird auf Basis des Globus Toolkits 4, der Sun Grid Engine und der ActiveBPEL Engine vorgestellt. Die modellgetriebene Entwicklungsumgebung wurde in Eclipse für das Globus Toolkit 4 realisiert. Experimentelle Resultate und eine Evaluation der kritischen Komponenten des vorgestellten neuen Grids werden präsentiert. Die vorgestellten Sicherheitsmechanismem sollen die nächste Phase der Evolution des Grid Computing in einer sicheren Umgebung ermöglichen

    IT-Management in der Praxis. Seminar WS 2004/05

    Get PDF
    corecore