27 research outputs found

    Machine learning as a service for high energy physics (MLaaS4HEP): a service for ML-based data analyses

    With the CERN LHC program underway, data growth in the High Energy Physics (HEP) field has accelerated, and the use of Machine Learning (ML) in HEP will be critical during the HL-LHC program, when the data produced will reach the exascale. ML techniques have been used successfully in many areas of HEP. Nevertheless, developing an ML project and implementing it for production use is highly time-consuming and requires specific skills. Complicating this scenario is the fact that HEP data is stored in the ROOT data format, which is mostly unknown outside the HEP community. The work presented in this thesis focuses on the development of an ML as a Service (MLaaS) solution for HEP, aiming to provide a cloud service that allows HEP users to run ML pipelines via HTTP calls. These pipelines are executed using the MLaaS4HEP framework, which allows reading data, processing data, and training ML models directly from ROOT files of arbitrary size, from local or distributed data sources. Such a solution gives HEP users who are not ML experts a tool for applying ML techniques in their analyses in a streamlined manner. Over the years the MLaaS4HEP framework has been developed, validated, and tested, and new features have been added. A first MLaaS solution was developed by automating the deployment of a platform equipped with the MLaaS4HEP framework. Then a service with APIs was developed, so that a user, after being authenticated and authorized, can submit MLaaS4HEP workflows that produce trained ML models ready for the inference phase. A working prototype of this service is currently running on a virtual machine of INFN-Cloud and meets the requirements to be added to the INFN Cloud portfolio of services.

    Technologies and Applications for Big Data Value

    This open access book explores cutting-edge solutions and best practices for big data and data-driven AI applications for the data-driven economy. It provides the reader with a basis for understanding how technical issues can be overcome to offer real-world solutions to major industrial areas. The book starts with an introductory chapter that provides an overview of the book by positioning the following chapters in terms of their contributions to technology frameworks that are key elements of the Big Data Value Public-Private Partnership and the upcoming Partnership on AI, Data and Robotics. The remainder of the book is arranged in two parts. The first part, “Technologies and Methods”, contains horizontal contributions of technologies and methods that enable data value chains to be applied in any sector. The second part, “Processes and Applications”, details experience reports and lessons from using big data and data-driven approaches in processes and applications. Its chapters are co-authored with industry experts and cover domains including health, law, finance, retail, manufacturing, mobility, and smart cities. Contributions emanate from the Big Data Value Public-Private Partnership and the Big Data Value Association, which have acted as the European data community's nucleus to bring together businesses with leading researchers to harness the value of data to benefit society, business, science, and industry. The book is of interest to two primary audiences: first, undergraduate and postgraduate students and researchers in various fields, including big data, data science, data engineering, machine learning, and AI; second, practitioners and industry experts engaged in data-driven systems, software design, and deployment projects who are interested in employing these advanced methods to address real-world problems.

    Earth Observation Open Science and Innovation

    geospatial analytics; social observatory; big earth data; open data; citizen science; open innovation; earth system science; crowdsourced geospatial data; science in society; data science

    Power Modeling and Resource Optimization in Virtualized Environments

    The provisioning of on-demand cloud services has revolutionized the IT industry. This emerging paradigm has drastically increased the growth of data centers (DCs) worldwide. Consequently, this rising number of DCs is contributing to a large amount of world total power consumption. This has directed the attention of researchers and service providers to investigate a power-aware solution for the deployment and management of these systems and networks. However, these solutions could be beneficial only if derived from a precisely estimated power consumption at run-time. Accuracy in power estimation is a challenge in virtualized environments due to the lack of certainty of actual resources consumed by virtualized entities and of their impact on applications’ performance. The heterogeneous cloud, composed of multi-tenancy architecture, has also raised several management challenges for both service providers and their clients. Task scheduling and resource allocation in such a system are considered as an NP-hard problem. The inappropriate allocation of resources causes the under-utilization of servers, hence reducing throughput and energy efficiency. In this context, the cloud framework needs an effective management solution to maximize the use of available resources and capacity, and also to reduce the impact of their carbon footprint on the environment with reduced power consumption. This thesis addresses the issues of power measurement and resource utilization in virtualized environments as two primary objectives. At first, a survey on prior work of server power modeling and methods in virtualization architectures is carried out. This helps investigate the key challenges that elude the precision of power estimation when dealing with virtualized entities. A different systematic approach is then presented to improve the prediction accuracy in these networks, considering the resource abstraction at different architectural levels. Resource usage monitoring at the host and guest helps in identifying the difference in performance between the two. Using virtual Performance Monitoring Counters (vPMCs) at a guest level provides detailed information that helps in improving the prediction accuracy and can be further used for resource optimization, consolidation and load balancing. Later, the research also targets the critical issue of optimal resource utilization in cloud computing. This study seeks a generic, robust but simple approach to deal with resource allocation in cloud computing and networking. The inappropriate scheduling in the cloud causes under- and over-utilization of resources which in turn increases the power consumption and also degrades the system performance. This work first addresses some of the major challenges related to task scheduling in heterogeneous systems. After a critical analysis of existing approaches, this thesis presents a rather simple scheduling scheme based on the combination of heuristic solutions. Improved resource utilization with reduced processing time can be achieved using the proposed energy-efficient scheduling algorithm.

    Intelligent Energy-Savings and Process Improvement Strategies in Energy-Intensive Industries

    As new technologies for energy-intensive industries continue to evolve, existing plants gradually fall behind in efficiency and productivity. Fierce market competition and environmental legislation force these traditional plants to cease operation and shut down. Process improvement and retrofit projects are essential for maintaining the operational performance of these plants. Current approaches to process improvement are mainly process integration, process optimisation, and process intensification. In general, these areas rely on mathematical optimisation, the practitioner's experience, and operational heuristics. These approaches serve as the basis for process improvement. However, their performance can be further improved with modern computational intelligence. The purpose of this work is therefore to apply advanced artificial intelligence and machine learning techniques to process improvement in energy-intensive industrial processes. The work adopts an approach that tackles this problem through simulation of industrial systems and makes the following contributions: (i) Application of machine learning techniques, including one-shot learning and neuro-evolution, for data-driven modelling and optimisation of individual units. (ii) Application of dimensionality reduction (e.g. principal component analysis, autoencoder) for multi-objective optimisation of a multi-unit process. (iii) Design of a new tool for analysing the problematic parts of a system in order to remove them (bottleneck tree analysis – BOTA). An extension of the tool that allows multidimensional problems to be solved with a data-driven approach was also proposed. (iv) Demonstration of the effectiveness of Monte-Carlo simulation, neural networks, and decision trees for decision-making when integrating a new process technology into existing processes. (v) Comparison of Hierarchical Temporal Memory (HTM) and dual optimisation with several predictive tools for supporting real-time operations management. (vi) Implementation of an artificial neural network within the interface of the conventional process graph (P-graph). (vii) Highlighting the future of artificial intelligence and process engineering in biosystems through a commercially based multi-omics paradigm. Keywords: industrial process improvement, data-driven model, process optimisation, machine learning, industrial systems, energy-intensive industries, artificial intelligence.

    Performance analysis of container-based networking solutions for high-performance computing cloud

    Recently, cloud service providers have been gradually changing from virtual machine-based cloud infrastructures to container-based cloud-native infrastructures that consider performance and workload-management issues. Several data network performance issues for virtual instances have arisen, and various networking solutions have been newly developed or utilized. In this paper, we propose a solution suitable for a high-performance computing (HPC) cloud through a performance comparison analysis of container-based networking solutions. We constructed a supercomputer-based test-bed cluster to evaluate the serviceability by executing HPC jobs

    Services de sécurité inter-locataire et multi-locataire pour les logiciels en tant que service

    Cloud computing has recently played a vital role in the evolution of computer technology. Software-as-a-Service (SaaS) is one of the cloud services that has attracted the providers and clients (tenants) of Web applications. On the one hand, outsourcing allows a SaaS provider to deploy an application in a public cloud instead of managing its underlying resources (physical machines). The resources of this application can be scaled dynamically and automatically according to the growth of the customer base and/or the amount of traffic. On the other hand, multi-tenancy (or resource pooling) enables a SaaS provider to significantly reduce infrastructure and maintenance costs by sharing the same application and database instances among several tenants. A tenant can subscribe to SaaS on demand and pay according to a pay-per-use model. However, outsourcing and multi-tenancy bring new challenges and security risks that must be identified and addressed by the SaaS provider. A tenant cannot deploy its preferred intrusion detection systems (IDS) since it controls neither the source code nor the infrastructure of the application (deployed by the provider in a public cloud). Therefore, the provider must not only integrate IDS as a service into its cloud infrastructure, but also protect each tenant according to its own security requirements. In a multi-tenant SaaS, the data of tenants, who may be competitors, are stored in the same database. Therefore, the provider must detect and prevent attacks carried out by a tenant (maliciously or accidentally) against the data of other tenants. The cloud-based IDS proposed in scientific research focus on the infrastructure (e.g., virtual networks, virtual machines). However, they do not detect attacks between the tenants of a SaaS and do not provide security as a service for both the SaaS provider and its tenants. Other scientific research and IT companies propose tenant data isolation mechanisms to reduce the risk of inter-tenant attacks. However, these mechanisms are not automated and do not prevent attacks between tenants sharing the same database.

    Intercloud-Kommunikation für Mehrwehrtdienste von Cloud-basierten Architekturen im Internet of Things

    The Internet of Things (IoT) is currently a young growth market whose importance for our society many people will only truly come to appreciate in the near future. Its subdomains, including smart home, smart grid, smart mobility, Industrie 4.0, smart health and many more, matter for our future competitiveness, for meeting the challenges of climate change, and for our health, but also for more trivial things such as comfort. On the other hand, the same major problem already known in a similar form from classical cloud computing arises here: vendor silos, which do not allow device data to be exchanged across manufacturers or providers, prevent the rapid spread of this new technology. Service providers must laboriously make their products available for countless technologies, which makes service development unnecessarily expensive and ultimately limits the overall range of services. Cloud computing will play an important role here in the future. The dissertation therefore addresses the problem of making IoT device data held in IoT clouds usable across platforms and across providers. The motivation and the research gap addressed show the need to engage with the topic. On this basis, the concept of a decentrally organised IoT intercloud is proposed, which is able to integrate heterogeneous IoT clouds. The analysis of the state of the art shows that IoT clouds share enough properties to allow, in the future, adaptation to a uniform interface for the IoT intercloud. The concept first comprises the component architecture of an intercloud broker for establishing an IoT intercloud. Building on this, more detailed sub-concepts cover a discovery service for finding device data and a push-stream provider for delivering IoT event notifications in real time. Finally, an evaluation demonstrates the practical feasibility, scalability, and performance of the concept and of the implemented prototype.
    Contents (chapter level): 1 Introduction; 2 Fundamentals of cloud computing in the Internet of Things; 3 State of the art; 4 Intercloud computing for the IoT; 5 Requirements analysis; 6 Intercloud architecture for the IoT; 7 Distributed discovery service; 8 Distributed push-stream provider; 9 Evaluation; 10 Summary; Appendices: A Figures, B Tables, C Algorithms, D Listings; Bibliography

    A FORENSICALLY-ENABLED IAAS CLOUD COMPUTING ARCHITECTURE

    Cloud computing has been advancing at an intense pace. It has become one of the most important research topics in computer science and information systems. Cloud computing offers enterprise-scale platforms in a short time frame with little effort. Thus, it delivers significant economic benefits to both commercial and public entities. Despite this, the security and subsequent incident management requirements are major obstacles to adopting the cloud. Current cloud architectures do not support digital forensic investigators, nor comply with today’s digital forensics procedures – largely due to the fundamental dynamic nature of the cloud. When an incident has occurred, an organization-based investigation will seek to provide potential digital evidence while minimising the cost of the investigation. Data acquisition is the first and most important process within digital forensics – to ensure data integrity and admissibility. However, access to data and the control of resources in the cloud is still very much provider-dependent and complicated by the very nature of the multi-tenanted operating environment. Thus, investigators have no option but to rely on the Cloud Service Providers (CSPs) to acquire evidence for them. Due to the cost and time involved in acquiring the forensic image, some cloud providers will not provide evidence beyond 1TB despite a court order served on them. Even assuming they are willing, or are required by law, to do so, the evidence collected is still questionable, as there is no way to verify the validity of the evidence or whether evidence has already been lost. Therefore, dependence on the CSPs is considered one of the most significant challenges when investigators need to acquire evidence in a timely yet forensically sound manner from cloud systems. This thesis proposes a novel architecture to support forensic acquisition and analysis of IaaS cloud-based systems. The approach, known as Cloud Forensic Acquisition and Analysis System (Cloud FAAS), is based on a cluster analysis of non-volatile memory that achieves forensically reliable images at the same level of integrity as the normal “gold standard” computer forensic acquisition procedures, with the additional capability to reconstruct the image at any point in time. Cloud FAAS fundamentally shifts access to the data back to the data owner rather than relying on a third party. In this manner, organisations are free to undertake investigations at will, requiring no intervention or cooperation from the cloud provider. The novel architecture is validated through a proof-of-concept prototype. A series of experiments are undertaken to illustrate and model how Cloud FAAS is capable of providing a richer and more complete set of admissible evidence than what current CSPs are able to provide. Using Cloud FAAS, investigators have the ability to obtain a forensic image of the system after, just prior to, or hours before the incident. Therefore, this approach can not only create images that are forensically sound but also provide access to deleted and, more importantly, overwritten files – which current computer forensic practices are unable to achieve. This results in an increased level of visibility for the forensic investigator and removes any limitations that data carving and fragmentation may introduce. In addition, an analysis of the economic overhead of operating Cloud FAAS is performed. This shows that the level of disk change that occurs is well within acceptable limits and is relatively small in comparison to the total volume of memory available. The results show Cloud FAAS has both a technical and economic basis for solving investigations involving cloud computing. Saudi Government