250 research outputs found

    Automatically responding to customers

    Get PDF

    Replication and Caching Systems for the support of VMs stored in File Systems with Snapshots

    Get PDF
    Recently, in a relatively short timeframe, there were fundamental changes in the way computing power is used. Virtualisation technology has changed both the model of a data centre’s infrastructure and the way physical computers are now managed. This shift is a consequence of today’s fast deployment rate of Virtual Machines (VM) in a high consolidation environment with minimal need for human management. New approaches to virtualisation techniques are being developed at a surprisingly fast rate, leading to a new exciting and vibrating ecosystem of platforms and services. We see the big industry players tackling problems such as Desktop Virtualisation with moderate success, but completely ignoring the computation power already present in their clients’ infrastructures and, instead, opting for a costly solution based on powerful new machines. There’s still room for improvement in Virtual Desktop Infrastructure (VDI) and development of new architectures that take advantage of the computation power available at the user’s desk, with a minimum effort on the management side; Infrastructure for Client-Based Desktops (iCBD) is one of these projects. This thesis focuses on the development of mechanisms for the replication and caching of VM images stored in a local filesystem, albeit one with the ability to perform snapshots. In this work, there are some challenges to address: the proposed architecture must be entirely distributed and completely integrated with the already existing client-based VDI platform; and it must be able to efficiently cope with very large, read-only files, (some of them snapshots) and handle their multiple versions. This work will also explore the challenges and advantages of deploying such a system in a high throughput network, with both high availability and scalability while efficiently supporting a large number of users (and their workstations)

    Performance Analytics of Cloud Networks

    Get PDF
    As the world becomes more inter-connected and dependent on the Internet, networks become ever more pervasive, and the stresses placed upon them more demanding. Similarly, the expectations of networks to maintain a high level of performance have also increased. Network performance is highly important to any business that operates online, depends on web traffic, runs any part of their infrastructure in a cloud environment, or even hosts their own network infrastructure. Depending upon the exact nature of a network, whether it be local or wide-area, 10 or 100 Gigabit, it will have distinct performance characteristics and it is important for a business or individual operating on the network to understand those performance characteristics and how they affect operations. To better understand our networks, it is necessary that we test them to measure their performance capabilities and track these metrics over time. In our work, we provide an in-depth analysis of how best to run cloud benchmarks to increase our network intelligence and how we can use the results of those benchmarks to predict future performance and identify performance anomalies. To achieve this, we explain how to effectively run cloud benchmarks and propose a scheduling algorithm for running large numbers of cloud benchmarks daily. We then use the performance data gathered from this method to conduct a thorough analysis of the performance characteristics of a cloud network, train neural networks to forecast future throughput based on historical results and detect performance anomalies as they occur

    Modern data analytics in the cloud era

    Get PDF
    Cloud Computing ist die dominante Technologie des letzten Jahrzehnts. Die Benutzerfreundlichkeit der verwalteten Umgebung in Kombination mit einer nahezu unbegrenzten Menge an Ressourcen und einem nutzungsabhängigen Preismodell ermöglicht eine schnelle und kosteneffiziente Projektrealisierung für ein breites Nutzerspektrum. Cloud Computing verändert auch die Art und Weise wie Software entwickelt, bereitgestellt und genutzt wird. Diese Arbeit konzentriert sich auf Datenbanksysteme, die in der Cloud-Umgebung eingesetzt werden. Wir identifizieren drei Hauptinteraktionspunkte der Datenbank-Engine mit der Umgebung, die veränderte Anforderungen im Vergleich zu traditionellen On-Premise-Data-Warehouse-Lösungen aufweisen. Der erste Interaktionspunkt ist die Interaktion mit elastischen Ressourcen. Systeme in der Cloud sollten Elastizität unterstützen, um den Lastanforderungen zu entsprechen und dabei kosteneffizient zu sein. Wir stellen einen elastischen Skalierungsmechanismus für verteilte Datenbank-Engines vor, kombiniert mit einem Partitionsmanager, der einen Lastausgleich bietet und gleichzeitig die Neuzuweisung von Partitionen im Falle einer elastischen Skalierung minimiert. Darüber hinaus führen wir eine Strategie zum initialen Befüllen von Puffern ein, die es ermöglicht, skalierte Ressourcen unmittelbar nach der Skalierung auszunutzen. Cloudbasierte Systeme sind von fast überall aus zugänglich und verfügbar. Daten werden häufig von zahlreichen Endpunkten aus eingespeist, was sich von ETL-Pipelines in einer herkömmlichen Data-Warehouse-Lösung unterscheidet. Viele Benutzer verzichten auf die Definition von strikten Schemaanforderungen, um Transaktionsabbrüche aufgrund von Konflikten zu vermeiden oder um den Ladeprozess von Daten zu beschleunigen. Wir führen das Konzept der PatchIndexe ein, die die Definition von unscharfen Constraints ermöglichen. PatchIndexe verwalten Ausnahmen zu diesen Constraints, machen sie für die Optimierung und Ausführung von Anfragen nutzbar und bieten effiziente Unterstützung bei Datenaktualisierungen. Das Konzept kann auf beliebige Constraints angewendet werden und wir geben Beispiele für unscharfe Eindeutigkeits- und Sortierconstraints. Darüber hinaus zeigen wir, wie PatchIndexe genutzt werden können, um fortgeschrittene Constraints wie eine unscharfe Multi-Key-Partitionierung zu definieren, die eine robuste Anfrageperformance bei Workloads mit unterschiedlichen Partitionsanforderungen bietet. Der dritte Interaktionspunkt ist die Nutzerinteraktion. Datengetriebene Anwendungen haben sich in den letzten Jahren verändert. Neben den traditionellen SQL-Anfragen für Business Intelligence sind heute auch datenwissenschaftliche Anwendungen von großer Bedeutung. In diesen Fällen fungiert das Datenbanksystem oft nur als Datenlieferant, während der Rechenaufwand in dedizierten Data-Science- oder Machine-Learning-Umgebungen stattfindet. Wir verfolgen das Ziel, fortgeschrittene Analysen in Richtung der Datenbank-Engine zu verlagern und stellen das Grizzly-Framework als DataFrame-zu-SQL-Transpiler vor. Auf dieser Grundlage identifizieren wir benutzerdefinierte Funktionen (UDFs) und maschinelles Lernen (ML) als wichtige Aufgaben, die von einer tieferen Integration in die Datenbank-Engine profitieren würden. Daher untersuchen und bewerten wir Ansätze für die datenbankinterne Ausführung von Python-UDFs und datenbankinterne ML-Inferenz.Cloud computing has been the groundbreaking technology of the last decade. The ease-of-use of the managed environment in combination with nearly infinite amount of resources and a pay-per-use price model enables fast and cost-efficient project realization for a broad range of users. Cloud computing also changes the way software is designed, deployed and used. This thesis focuses on database systems deployed in the cloud environment. We identify three major interaction points of the database engine with the environment that show changed requirements compared to traditional on-premise data warehouse solutions. First, software is deployed on elastic resources. Consequently, systems should support elasticity in order to match workload requirements and be cost-effective. We present an elastic scaling mechanism for distributed database engines, combined with a partition manager that provides load balancing while minimizing partition reassignments in the case of elastic scaling. Furthermore we introduce a buffer pre-heating strategy that allows to mitigate a cold start after scaling and leads to an immediate performance benefit using scaling. Second, cloud based systems are accessible and available from nearly everywhere. Consequently, data is frequently ingested from numerous endpoints, which differs from bulk loads or ETL pipelines in a traditional data warehouse solution. Many users do not define database constraints in order to avoid transaction aborts due to conflicts or to speed up data ingestion. To mitigate this issue we introduce the concept of PatchIndexes, which allow the definition of approximate constraints. PatchIndexes maintain exceptions to constraints, make them usable in query optimization and execution and offer efficient update support. The concept can be applied to arbitrary constraints and we provide examples of approximate uniqueness and approximate sorting constraints. Moreover, we show how PatchIndexes can be exploited to define advanced constraints like an approximate multi-key partitioning, which offers robust query performance over workloads with different partition key requirements. Third, data-centric workloads changed over the last decade. Besides traditional SQL workloads for business intelligence, data science workloads are of significant importance nowadays. For these cases the database system might only act as data delivery, while the computational effort takes place in data science or machine learning (ML) environments. As this workflow has several drawbacks, we follow the goal of pushing advanced analytics towards the database engine and introduce the Grizzly framework as a DataFrame-to-SQL transpiler. Based on this we identify user-defined functions (UDFs) and machine learning inference as important tasks that would benefit from a deeper engine integration and investigate approaches to push these operations towards the database engine

    File Fragment Classification Using Neural Networks with Lossless Representations

    Get PDF
    This study explores the use of neural networks as universal models for classifying file fragments. This approach differs from previous work in its lossless feature representation, with fragments’ bits as direct input, and its use of feedforward, recurrent, and convolutional networks as classifiers, whereas previous work has only tested feedforward networks. Due to the study’s exploratory nature, the models were not directly evaluated in a practical setting; rather, easily reproducible experiments were performed to attempt to answer the initial question of whether this approach is worthwhile to pursue further, especially due to its high computational cost. The experiments tested classification of fragments of homogeneous file types as an idealized case, rather than using a realistic set of types, because the types of interest are highly application-dependent. The recurrent networks achieved 98 percent accuracy in distinguishing 4 file types, suggesting that this approach may be capable of yielding models with sufficient performance for practical applications. The potential applications depend mainly on the model performance gains achievable by future work but include binary mapping, deep packet inspection, and file carving

    Building the Future Internet through FIRE

    Get PDF
    The Internet as we know it today is the result of a continuous activity for improving network communications, end user services, computational processes and also information technology infrastructures. The Internet has become a critical infrastructure for the human-being by offering complex networking services and end-user applications that all together have transformed all aspects, mainly economical, of our lives. Recently, with the advent of new paradigms and the progress in wireless technology, sensor networks and information systems and also the inexorable shift towards everything connected paradigm, first as known as the Internet of Things and lately envisioning into the Internet of Everything, a data-driven society has been created. In a data-driven society, productivity, knowledge, and experience are dependent on increasingly open, dynamic, interdependent and complex Internet services. The challenge for the Internet of the Future design is to build robust enabling technologies, implement and deploy adaptive systems, to create business opportunities considering increasing uncertainties and emergent systemic behaviors where humans and machines seamlessly cooperate
    • …
    corecore