10 research outputs found

    ZIH-Info

    IT service catalogue of TU Dresden - Conference access for eduroam - Adobe ETLA desktop framework agreement for Saxony - Workshop "Videokonferenzen im Wissenschaftsnetz" (video conferencing in the research network) - "Das ZIH läuft" - Announcement from Dezernat 8 - Opening of the ServiceCenterStudium front desk - ZIH publications - Events

    A comprehensive scenario agnostic Data LifeCycle model for an efficient data complexity management

    There is a vast amount of data being generated every day in the world, coming from a variety of sources, with different formats, quality levels, etc. This new data, together with the archived historical data, constitutes the seed for future knowledge discovery and value generation in several fields of eScience. Discovering value from data is a complex computing process where data is the key resource, not only during its processing but also during its entire life cycle. However, there is still a huge concern about how to organize and manage this data in all fields, and at all scales, for efficient usage and exploitation during the whole data life cycle. Although several specific Data LifeCycle (DLC) models have recently been defined for particular scenarios, we argue that there is no global and comprehensive DLC framework in wide use across different fields. For this reason, in this paper we present and describe a comprehensive scenario-agnostic Data LifeCycle (COSA-DLC) model that successfully addresses all challenges included in the 6Vs, namely Value, Volume, Variety, Velocity, Variability and Veracity; it is not tailored to any specific environment but is easy to adapt to fit the requirements of any particular field. We conclude that a comprehensive scenario-agnostic DLC model provides several advantages, such as facilitating global data organization and integration, easing the adaptation to any kind of scenario, guaranteeing good data quality levels, and helping to save design time and effort for the research and industrial communities.
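A scenario-agnostic lifecycle model of the kind the abstract describes can be pictured as a generic set of phases that each field specializes. The sketch below is only illustrative: the phase names, the `adapt` method, and the operation labels are assumptions, since the abstract does not enumerate the actual COSA-DLC phases.

```python
from dataclasses import dataclass, field

@dataclass
class Phase:
    name: str
    operations: list = field(default_factory=list)

@dataclass
class DataLifeCycle:
    phases: list

    def adapt(self, extra_ops):
        """Tailor the generic model by attaching scenario-specific operations."""
        return DataLifeCycle([
            Phase(p.name, p.operations + extra_ops.get(p.name, []))
            for p in self.phases
        ])

# Generic, scenario-agnostic model with commonly cited lifecycle stages
# (hypothetical stage names, not the paper's).
generic = DataLifeCycle([
    Phase("acquisition"), Phase("processing"),
    Phase("preservation"), Phase("dissemination"),
])

# Adapting to a hypothetical eScience scenario: the generic model is left
# untouched and a specialized copy is produced.
escience = generic.adapt({
    "acquisition": ["validate-schema"],
    "processing": ["provenance-tracking"],
})
```

The design choice mirrored here is the one the paper argues for: the generic model is defined once, and adaptation to a field only adds operations rather than redefining the lifecycle.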

    Towards a comprehensive Data LifeCycle model for big data environments

    A huge amount of data is constantly being produced in the world. Data coming from the IoT, from scientific simulations, or from any other field of eScience accumulate on top of historical data sets and form the seed for future Big Data processing, with the final goal of generating added value and discovering knowledge. In such computing processes, data are the main resource; however, organizing and managing data during their entire life cycle is a complex research topic. As part of this, Data LifeCycle (DLC) models have been proposed to efficiently organize large and complex data sets, from creation to consumption, in any field and at any scale, for effective data usage and big data exploitation. Several DLC frameworks can be found in the literature, each one defined for specific environments and scenarios. However, we realized that there is no global and comprehensive DLC model that can be easily adapted to different scientific areas. For this reason, in this paper we describe the Comprehensive Scenario Agnostic Data LifeCycle (COSA-DLC) model, a DLC model which: i) is shown to be comprehensive, as it addresses the 6Vs challenges (namely Value, Volume, Variety, Velocity, Variability and Veracity), and ii) can be easily adapted to any particular scenario and, therefore, fit the requirements of a specific scientific field. We also include two use cases to illustrate the ease of adaptation to different scenarios. We conclude that the comprehensive scenario-agnostic DLC model provides several advantages, such as facilitating global data management, organization and integration, easing the adaptation to any kind of scenario, guaranteeing good data quality levels and, therefore, saving design time and effort for the scientific and industrial communities.

    Generic Metadata Handling in Scientific Data Life Cycles

    Scientific data life cycles define how data is created, handled, accessed, and analyzed by users. Such data life cycles become increasingly sophisticated as the sciences they deal with become more demanding and complex with the coming advent of exascale data and computing. The overarching data life cycle management background includes multiple abstraction categories with data sources, data and metadata management, computing and workflow management, security, data sinks, and methods on how to enable utilization. The challenges in this context are manifold. One is to hide the complexity from the user and to enable seamless use of resources with regard to usability and efficiency. Another is to enable generic metadata management that is not restricted to one use case but can be adapted to further ones with limited effort. Metadata management is essential to enable scientists to save time by avoiding the need to manually keep track of data, for example of its content and location. As the number of files grows into the millions, managing data without metadata becomes increasingly difficult. Thus, the solution is to employ metadata management to enable the organization of data based on information about it. Previously, use cases tended to support either highly specific metadata management or none at all. Now, a generic metadata management concept is available that can be used to efficiently integrate metadata capabilities with use cases. The concept was implemented within the MoSGrid data life cycle, which enables molecular simulations on distributed HPC-enabled data and computing infrastructures. The implementation enables easy-to-use and effective metadata management. Automated extraction, annotation, and indexing of metadata were designed, developed, and integrated, and search capabilities are provided via a seamless user interface. Further analysis runs can be started directly from search results.
A complete evaluation of the concept, both in general and along the example implementation, is presented. In conclusion, the generic metadata management concept advances the state of the art in scientific data life cycle management.
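The pipeline the abstract describes, automated extraction of metadata followed by indexing and search, can be sketched in a few lines. The file format, metadata keys, and index structure below are assumptions for illustration; they are not the MoSGrid implementation.

```python
import re
from collections import defaultdict

def extract_metadata(text):
    """Pull simple 'key: value' annotations out of a simulation output file
    (a hypothetical format; real extractors are format-specific)."""
    meta = {}
    for m in re.finditer(r"^(\w+):\s*(.+)$", text, re.MULTILINE):
        meta[m.group(1).lower()] = m.group(2).strip()
    return meta

class MetadataIndex:
    """Inverted index mapping a (key, value) pair to the files that carry it,
    so data can be found by information about it rather than by location."""
    def __init__(self):
        self._index = defaultdict(set)

    def add(self, filename, metadata):
        for key, value in metadata.items():
            self._index[(key, value)].add(filename)

    def search(self, key, value):
        return sorted(self._index[(key, value)])

# Automated ingest: extraction and indexing happen together, so users never
# track files by hand.
index = MetadataIndex()
index.add("run1.log", extract_metadata("method: DFT\nmolecule: benzene"))
index.add("run2.log", extract_metadata("method: MD\nmolecule: benzene"))
```

A search such as `index.search("molecule", "benzene")` then returns every matching run, which is the hook for starting further analysis runs directly from search results.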

    Helmholtz Portfolio Theme Large-Scale Data Management and Analysis (LSDMA)

    The Helmholtz Association funded the "Large-Scale Data Management and Analysis" portfolio theme from 2012 to 2016. Four Helmholtz centres, six universities and another research institution in Germany joined forces to enable data-intensive science by optimising data life cycles in selected scientific communities. In our Data Life Cycle Labs, data experts performed joint R&D together with the scientific communities. The Data Services Integration Team focused on generic solutions applied by several communities.

    Hierarchical distributed fog-to-cloud data management in smart cities

    There is a vast amount of data being generated every day in the world, with different formats, quality levels, etc. This new data, together with the archived historical data, constitutes the seed for future knowledge discovery and value generation in several fields of science and big data environments. Discovering value from data is a complex computing process where data is the key resource, not only during its processing but also during its entire life cycle. However, there is still a huge concern about how to organize and manage this data in all fields for efficient usage and exploitation during the whole data life cycle. Although several specific Data LifeCycle (DLC) models have recently been defined for particular scenarios, we argue that there is no global and comprehensive DLC framework in wide use across different fields. As one particular scenario, smart cities are a current technological solution to handle the challenges and complexity of growing urban density. Traditionally, Smart City resource management relies on cloud-based solutions where sensor data are collected to provide a centralized and rich set of open data. The advantages of cloud-based frameworks are their ubiquity and an (almost) unlimited resource capacity. However, accessing data in the cloud implies large network traffic and high latencies that are usually not appropriate for real-time or critical solutions, as well as higher security risks. Alternatively, fog computing emerges as a promising technology to absorb these inconveniences. It proposes the use of devices at the edge to provide closer computing facilities, thereby reducing network traffic, drastically lowering latencies, and improving security. We have defined a new framework for data management in the context of a Smart City through a global fog-to-cloud resources management architecture.
This model combines the advantages of fog and cloud technologies, as it allows reduced latencies for critical applications while being able to use the high computing capabilities of cloud technology. In this thesis, we propose several novel ideas in the design of a novel Fog-to-Cloud (F2C) data management architecture for smart cities, as follows. First, we describe a comprehensive scenario-agnostic Data LifeCycle model that successfully addresses all challenges included in the 6Vs; it is not tailored to any specific environment but is easy to adapt to fit the requirements of any particular field. Then, we introduce the Smart City Comprehensive Data LifeCycle model, a data management architecture generated from the comprehensive scenario-agnostic model and tailored to the particular scenario of smart cities. We define the management of each data life phase and explain its implementation in a Smart City with F2C resources management. We then illustrate a novel architecture for data management in the context of a Smart City through a global fog-to-cloud resources management architecture, and show that this model has the advantages of both fog and cloud, as it allows reduced latencies for critical applications while being able to use the high computing capabilities of cloud technology. As a first experiment for the F2C data management architecture, a real Smart City is analyzed, corresponding to the city of Barcelona, with special emphasis on the layers responsible for collecting the data generated by the deployed sensors. The amount of daily sensor data transmitted through the network has been estimated, and a rough projection has been made assuming an exhaustive deployment that fully covers the whole city. We also provide solutions to both reduce the data transmission and improve the data management. Finally, we apply data filtering techniques (including data aggregation and data compression) to estimate the network traffic in this model during data collection and compare it with a traditional real system, and we estimate the total data storage sizes in the F2C scenario for the Barcelona smart city.
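The kind of back-of-the-envelope traffic projection described at the end of the abstract can be sketched as follows. All numbers here (sensor count, sampling rate, sample size, aggregation factor, compression ratio) are illustrative assumptions, not the Barcelona figures from the thesis.

```python
def daily_traffic_bytes(sensors, samples_per_day, bytes_per_sample):
    """Raw daily uplink volume if every sample is sent to the cloud."""
    return sensors * samples_per_day * bytes_per_sample

def after_filtering(raw_bytes, aggregation_factor, compression_ratio):
    """Aggregate samples at the fog layer, then compress before the cloud
    uplink; both steps shrink the traffic that actually crosses the network."""
    return raw_bytes / aggregation_factor * compression_ratio

# Hypothetical city-wide deployment: 10,000 sensors sampling once a minute,
# 64 bytes per sample.
raw = daily_traffic_bytes(sensors=10_000, samples_per_day=24 * 60,
                          bytes_per_sample=64)

# Fog nodes aggregate 10 samples into one report; compression halves the rest.
reduced = after_filtering(raw, aggregation_factor=10, compression_ratio=0.5)
# raw is about 0.92 GB/day; reduced is 1/20 of that.
```

The point of the exercise matches the thesis claim: moving aggregation and compression into the fog layer cuts the data crossing the network by an order of magnitude before it ever reaches the cloud.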


    Jahresbericht 2015 zur kooperativen DV-Versorgung

    Foreword; Overview of advertisers
    Part I: On the work of the IT steering committee; On the work of the extended IT steering committee
    Part II:
    1 The Centre for Information Services and High Performance Computing (ZIH): 1.1 Tasks, 1.2 Facts and figures (representative selection), 1.3 Budget, 1.4 On the work of the scientific advisory board, 1.5 Structure / staff, 1.6 Locations, 1.7 Committee work
    2 Communication infrastructure: 2.1 Usage overview of network services, 2.2 Network infrastructure, 2.3 Communication and information services
    3 Central service portfolio: 3.1 Service desk, 3.2 Trouble ticket system (OTRS), 3.3 Identity management, 3.4 Login service, 3.5 Provision of virtual servers, 3.6 Storage management, 3.7 PC pools, 3.8 Security, 3.9 Licence service, 3.10 Peripherals service, 3.11 Dresden Science Calendar
    4 Services for decentralised IT systems: 4.1 General, 4.2 Investment consulting, 4.3 PC and printer support, 4.4 Microsoft Windows support, 4.5 Central software procurement for TU Dresden
    5 High performance computing: 5.1 High performance computer/storage complex, 5.2 Usage overview of the HPC servers, 5.3 Special resources, 5.4 Grid resources, 5.5 Application software, 5.6 Visualisation, 5.7 Parallel programming tools
    6 Scientific projects and cooperations: 6.1 Competence centre for video conferencing services, 6.2 Scalable software tools to support application optimisation on HPC systems, 6.3 Performance and energy efficiency analysis for innovative computer architectures, 6.4 Data-intensive computing, distributed computing and cloud computing, 6.5 Data analysis, methods and modelling in the life sciences, 6.6 Parallel programming, algorithms and methods, 6.7 Initiative budget to support cooperation tasks of the Saxon universities, 6.8 Cooperations
    7 Vocational training and internships: 7.1 Training as IT specialist (application development), 7.2 Internships
    8 Events: 8.1 Training and further education events, 8.2 User training, 8.3 ZIH colloquia, 8.4 Workshops, 8.5 Booth presentations/talks/tours
    9 Publications
    Part III: Division of Mathematics and Natural Sciences; Division of Humanities and Social Sciences; Division of Engineering Sciences; Division of Civil and Environmental Engineering; Division of Medicine